YouTube Transcript:
Azure End-To-End Data Engineering Project (From Scratch!)
Video Transcript
this project helped me to crack multiple
offers and I got placed as an Azure
data engineer with the help of this
project in this end-to-end data engineering
project we will learn all the in-demand
Azure tools and technologies such as
Azure Data Factory Azure Databricks
Azure Synapse Analytics along with managed
identities API connections and many more
but what's so special in this particular
video which is not available in any
other video on YouTube well there are
three reasons let's uncover them reason
number one all the in demand Azure tools
and technologies such as Azure Data
Factory Azure Databricks Azure Synapse
Analytics and much more are covered in
one single video and do you know the
best thing all are covered from scratch
reason number two instead of showing
just simple approach we will be covering
some real-time scenarios which are asked
by the interviewers reason number three
we will be covering some interview
questions as well because the best way
to prepare for the interview is covering
those questions in the projects so
without delaying let's get started with
this amazing project so welcome welcome
welcome so this is your one-stop solution
for an Azure data engineering project in
which we will be covering an end-to-end
solution yes this project helped me a
lot to crack my first data engineer role
because in this particular project I
covered realtime scenarios which caught
the eyes of the interviewers and
definitely it highlighted my skill set I
know you are really excited to cover
this project so I am double excited to
share this knowledge with you all
because this channel is dedicated to
contribute to my data community so
without wasting any time let's get
started do you know what is the most
important step what is the most
important stage in any data project in
any project yes you guessed it right
it's the architecture so in our scenario
we are going to look at our data
architecture for this project if you
have a good blueprint for your project
you can succeed easily without any
hassles because you will have the right
path right so let's discuss the data
architecture of our project in
detail as you can see we are going to
use an HTTP connection as our source we could
have easily used some manual uploading
of CSV files into the data lake but I wanted
to show you some real world scenarios
where we will be pulling data directly
from the apis so in this scenario it
will be a GitHub account and we will be
pulling data directly from that API so
that you can learn how to fetch data
directly from the APIs this is going to be
fun because we will be using this
powerful orchestration tool and it's
called Azure Data Factory Azure Data Factory is
one of the most in-demand orchestration
tools right now in the world of Azure data
engineering because of its massive
power it's a low-code to no-code tool and yet
it is really really powerful I
personally love working with Azure Data
Factory and yes if you are familiar with
Synapse Analytics we have Azure Data
Factory there as well under the name
Synapse pipelines or something similar
to that but the core is the same Azure Data
Factory is there Azure Data Factory is
everywhere in any Azure data engineering
solution Azure Data Factory is there so
you should be very much familiar with
this tool and don't worry because I have
added realtime
scenarios while working with this tool
because when I was preparing for my data
engineering interviews I was facing lots
of problems because there were
not many real-time scenarios available
of the kind being asked by the
interviewers so this time we are going to
cover some real-time scenarios you want
some hint okay so instead of building
static pipelines we will be building
Dynamic pipelines we will be using
parameters we will be using loops and
much more so you will be learning a lot
about this powerful tool okay what will we
be doing using
this Azure Data Factory tool we will be
pulling this data from our source and
we will land our data in our bronze
layer what is this bronze medal and
what is it we don't know anything okay
so basically it is a kind of
architecture called Medallion
architecture so this is the kind of
approach that we follow in data
engineering Solutions so what we do in
this solution we make our data travel
through three different zones raw silver
gold okay you can also call it as like
bronze silver gold you can also call it
as like raw transformed and serving
layer there are so many names but the
fundamental is same three layers first
layer is the raw layer or bronze layer
in which we keep our data as it is that
is available in Source what does it mean
so let's say we have one file in Source
we will create exact replica of that
particular file in our raw zone we do
not want to apply any transformation
that's simple right okay once our data
lands in bronze layer what's next bro
the next is this Hulk this is called the
Hulk of Azure data engineering or any
data engineering because of its powerful
Spark clusters Databricks is one of the
most important tools right now
because its power is really really
insane while working with big data
Databricks is dominating the data engineering
world trust me I love working with
Databricks plus the demand for Databricks
is rising
exponentially companies are going crazy
after Databricks developers or
PySpark developers so do not worry at all
because we will be learning everything
about Databricks in this particular
project and I have added some real-time
scenarios as well so you will be
familiar with this tool and from scratch
that's the best thing right so what we
will be doing with this tool we will
pick the data from the bronze layer and
we will push the data
to the silver layer with some
transformation because when we push our
data to Silver layer we apply some
Transformations so we will look at some
crazy Transformations crazy functions
available in the world of data
engineering using Spark don't worry at
all we will be covering everything from
scratch okay what's next the next is the
serving layer okay what are we going to do
in this layer so after applying
transformations after applying so much
cleaning it's time to serve our data to
our stakeholders and in our scenario it
can be data analyst it can be data
scientists maybe some analytics managers
as well so we will be building data
warehouse and the most popular data
warehousing solution right now is Azure
Synapse Analytics so we will be learning
a lot about this technology as well and
this will be our gold layer where the
data is ready to be served to the
stakeholders or any other
developer okay that sounds amazing
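The three-layer flow described above can be made concrete with a toy example. This is only a minimal pure-Python sketch of the idea, not the Data Factory, Databricks, or Synapse implementation built later in the project; the sample rows, column names, and function names are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical source extract; the columns and values are assumptions for illustration.
SOURCE_CSV = """order_date,product,quantity
2017-01-01, Helmet ,2
2017-01-01,Gloves,1
2017-01-02,Helmet,3
"""


def to_bronze(source_text: str) -> str:
    """Bronze/raw layer: keep an exact copy of the source, no transformation."""
    return source_text


def to_silver(bronze_text: str) -> list[dict]:
    """Silver layer: clean and type the raw rows."""
    rows = csv.DictReader(io.StringIO(bronze_text))
    return [
        {
            "order_date": row["order_date"].strip(),
            "product": row["product"].strip(),
            "quantity": int(row["quantity"]),
        }
        for row in rows
    ]


def to_gold(silver_rows: list[dict]) -> dict[str, int]:
    """Gold/serving layer: aggregate into a shape that is ready for analysts."""
    totals: dict[str, int] = defaultdict(int)
    for row in silver_rows:
        totals[row["product"]] += row["quantity"]
    return dict(totals)


bronze = to_bronze(SOURCE_CSV)
silver = to_silver(bronze)
gold = to_gold(silver)
```

The point of the split is that bronze can always be replayed: if a silver rule turns out to be wrong, the untouched raw copy is still there to reprocess.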
what's next so we will be covering a
little bit about Power BI as well so that
you will learn how to establish
connections now you will say is this
really required bro the answer is yes
because when you want to become an
efficient data engineer you should know
end to end Solutions let's suppose you
are a data engineer and you have
prepared a data warehouse a data analyst
came to you hey I'm facing trouble I'm
facing a problem while establishing a
connection between Synapse and
Power BI you should be there to help the
person so it's really important to know
how to build connections how to build
maybe you can say linked services and how
to actually pull tables and data
warehouse facts and dimensions into
Power BI so it's really important to
learn that as well so that's why I have
covered a little bit about Power BI as
well not much in detail but yes a small
part so you will love learning that part
as well okay so it was all about our
data architecture and now I know you are
double excited to actually implement the
solution but before that I would like to
just discuss some of the fundamental
terms that will be used throughout the
project and if you are already familiar
with those terms it will be a good
revision for you and if you do not know
then it is for you so let's just quickly cover
some of the prerequisites some of the
fundamental terms that we should know as
an Azure data engineer and trust me you
can expect some interview questions from
these fundamental areas as well because
we sometimes overlook these areas
but you need to master these as well so
without wasting any time let's get
started so here are some of the prerequisites
that we need to complete this project
don't worry these are really really easy
to cover so first of all obviously you
need a laptop or PC or maybe MacBook I
don't have a Macbook so I don't know if
we can actually feel that vibe so it's up
to you laptop or PC we are simple people
bro laptop or PC with a stable internet
connection plus an Azure account don't worry
don't worry wait if you do not have an
Azure account I will tell you how you can
create an Azure account for free yes for
free it's not a promotion or it's not a
kind of you can say commission thing
Azure actually provides a free Azure
account so that you can use their
services and you can actually learn
Azure so don't worry I will tell you how
you can create your Azure account for
free don't worry at all and then
obviously you are learning one of the
most important one of the most in demand
technology you should have some
excitement to learn an Azure data solution
and that is an end-to-end solution not just one or
maybe two Services we'll be learning a
lot more let's start the project because
enough information is provided I accept
that but it was required it was
important so finally it's time to create
this project from scratch and as
I promised I will tell you how you can
create your free Azure account so first
let's create that and if you already
have that account it's good so without
wasting any time let's create Azure free
account to create your Azure free
account the steps are really really easy
let me show you you first need to go on
Google and just type Azure free
account and you will have your first
link on your screen just click on it
and then you will land on the Microsoft
website where you can clearly see the try
Azure for free tab just click on it don't
click on this one pay as you go because
this is a paid account and this will
charge you as much as you use so just pick
free so it will just ask you to put your
email ID if you do not have Microsoft
email account do not worry you do not
need to have microsoft.com at the end in
your email account you can use a Gmail
account but you need to create you need
to register that account under Microsoft
as well so if you do not have one just
click on create one if you already have
your Outlook account just use that one
it will work so I already have one so
you can just click on create one steps
are very easy just create your first
account if you do not have one and just
click on next let's say I just mention
my account like demo
account at the
gmail.com and just click on next
so after putting your credentials you
will land on this page so this is the
kind of form that you need to fill and
just put your all the details that you
have like name address everything and
just verify your phone number and then
just hit on this sign up button and let
me just show you what you can expect
with this account so this account is
free for like one year you can use these
services plus you will get a
$200 credit that you need to use in the
first 30 days so that you can just
create all the resources that require
credits to actually use those services
and do not worry these $200
credits are sufficient to
complete this project so do not worry at
all just go with it one thing that I
like to mention once you hit on this
sign up
page this one it will ask you to put
your details like banking details and
all do not worry at all because this
is Microsoft Azure they're not going
to charge you at all because this is
your trial account they will ask you to
convert it into a pay-as-you-go account
but if you do not want to do that you
simply decline it and if you do not take
any action they still cannot charge you
so just trust Azure and just go with it
and now the question is what will happen
after 30 days or maybe after a year so
all your resources will be freed or you
can say all will be removed
from your account but you will
not be charged so do not worry at all
just put in all the information that is
required and you are good to use your
Azure free account for 30 days I have
personally created that so do not worry
at all so just trust Azure and here is
your Azure free account so once you
create your Azure account we simply need
to go to the portal that is the Azure portal
and how we can do it let me show you so
it's time to go to the Azure portal and
how you can go because now you have your
Azure account with you so simply go on
Google and type portal.azure.com so
once you write it just press enter and
you will land on your Microsoft Azure
portal obviously you're going to enter
your credentials again I have already
entered them so it just took me to my Azure
portal account so this is our Azure home
screen don't worry at all because your
screen can look different because I
already have some resources built in my
account so do not worry at all if your
screen looks different so this is a kind
of UI you can expect in your Azure
portal now hold on now you have your
Azure account now it's time to follow
the right path according to our data
architecture the first step is data
source so just keep your Azure account aside
and now let's explore our data source so
that we will have a good understanding
like which data we are working with and
why we pick that data we will cover some
crazy questions there let's see our data
source so this is our data source okay
AdventureWorks this is a very popular
data set available in the world of data
engineering data analytics data
science why I picked this data set let
me tell
you this data set is full of tables and
this gives you an advantage when
you showcase a project that involves
so many tables because interviewers know that
this person has applied so many joins
so many lookups while completing this
project so this gives you an advantage
and that's why I also took this data set
and this gave me an advantage too
because in the real world scenarios we
work with so many tables while building
one solution so that's why it is really
important to work with different
different tables and let's have a look
let's have a look let's have a look so
first of all we have calendar.csv
obviously in the calendar table what can you
expect just a date column right so
okay then second we have customers table
in which we have all the information
related to customers like their name
address maybe like so many things birth
date and all so we will be just
aggregating so much of data don't worry
at all so I'm just giving you an
overview of the data that we have so
then we have product categories
obviously we are building a data set or
building a solution in which we have
sales data product categories product
information territories customers
information are required to apply the
necessary lookups so as you can see we
have products table product
subcategories product categories and
then yes we have the past three years of
data 2015 16 and
17 then we have an interesting table as
well we have return data as well so this
is an ideal data set if you want to work
with end to end solution because we can
perform so many joins we can perform so many
transformations and we can do a lot more
with these tables so that's why I picked
this data set and you are going to learn so
much with these tables so as you can
imagine like what we're going to do
obviously we have sales data right as
you can see we have past three years of
sales data we have returns data as well
okay we have returns data and these are
basically our fact tables don't worry
we'll discuss it in detail so these two
are our fact tables and we can build
dimensions on these tables why do
we need dimensions because if we want to
perform aggregations if we want to
provide contextual information to our
fact table
we need Dimensions so we will be just
building a model by keeping sales data
in the center and you can say returns
data in the center and lookups as our
dimensions so this is going to be fun while
working with data so are you excited to
just load all this data into Azure and
yes we will be using Azure Data Factory
for that so just to make things easy I
have already loaded all this data to my
GitHub account and now we will be just
pulling the data from the GitHub account
using the API and we will push this data to the
raw zone or bronze zone right so this is
phase one of the project where we will
be pulling data from the data source and
we'll be pushing this data to our data lake
bronze zone without wasting any time let
me take you to Azure and let's create the
necessary resources Resource Group and
much more
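Phase one, pulling each file from GitHub over HTTP and landing it in the bronze zone, lends itself to a parameter list that a loop can walk through instead of one hard-coded copy per file. The sketch below only builds that list in Python. The GitHub owner, repository, branch, folder, and exact file names are placeholders because the transcript does not give them, and the keys `rel_url`, `sink_folder`, and `sink_file` are hypothetical parameter names.

```python
# Base URL for raw file content on GitHub; each file is addressed relative to it.
BASE_URL = "https://raw.githubusercontent.com/"

# Placeholders: the real account and repository are not named in the video.
OWNER, REPO, BRANCH, FOLDER = "<owner>", "<repo>", "main", "Data"

# Hypothetical file names mirroring the tables described above.
FILES = [
    "Calendar", "Customers", "Product_Categories", "Product_Subcategories",
    "Products", "Returns", "Sales_2015", "Sales_2016", "Sales_2017", "Territories",
]


def build_copy_parameters(files: list[str]) -> list[dict[str, str]]:
    """One parameter set per file: where to read it and where to land it in bronze."""
    return [
        {
            "rel_url": f"{OWNER}/{REPO}/{BRANCH}/{FOLDER}/{name}.csv",
            "sink_folder": name,
            "sink_file": f"{name}.csv",
        }
        for name in files
    ]


parameters = build_copy_parameters(FILES)

# A plain loop stands in for a pipeline that copies one file per iteration.
for item in parameters:
    source = BASE_URL + item["rel_url"]
    target = f"bronze/{item['sink_folder']}/{item['sink_file']}"
    print(f"GET {source} -> {target}")
```

Adding a new table then means adding one entry to the list rather than building another copy step.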
let's start working with Azure so we are
here in our Azure portal so as you can
see create a resource plus button do not
click on it because I know you are really
excited to create a resource but before
that we first need to create a resource
Group because it is really important to
create a resource Group because that is
the place where you will keep all your
resources right so how you can create a
resource Group you simply need to click
on this button if it is not visible
to you because it just displays the
recent Services resources that we have
used so simply go on this search bar
this search bar will help you navigate
all your existing resources plus if you
want to create any new resource you can
create one from here do not worry so for
you all I will just click on this
search bar so I will
just type resource oops resource groups
and you should just see the popup here
and just click on that
and it will just show you all the
existing resource groups do not mind my
resource groups because I already have
some so so sorry for that so we will
just first create the new Resource Group
by simply clicking on this create button
simple just click on
it okay so this is the window a kind of
configuration that we need to do before
creating our Resource Group it is very
simple you simply need to provide the
resource Group name because obviously
you should have your resource Group name
right then this is our subscription so
don't worry about that as well because
this is your subscription and in your
case it will be just one subscription
because you are just one single
organization having just one department
just kidding so this resource group I
can just name let's say
aw project and it will check whether it is
available or not because our name should
be unique so we cannot reuse a name
so I can just put aw project and now
here comes the region part so this is
really important because it is
recommended to just pick the nearest
region because it will just give you low
latency and our project is not very big and it's
not very critical so you can pick any
region so let's pick East
US because I have already worked with
these regions and it works fine so don't
worry about that just click on next that
is tags now what are these tags so we
can totally ignore it but you should
have some knowledge why we have tags
there if we want to categorize our
resources or any resource groups maybe
because of billing structure billing
categories and all for any purpose if
you want to categorize it we just add
tags so this is for that purpose and for
now we do not need to worry about that
just skip it and click on review plus create
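To make the tags idea concrete, here is a small sketch of the settings the resource group form collects and of how tags can later group costs for billing. Everything in it is illustrative: the name `awproject`, the tag keys and values, and the cost figures are assumptions, and `eastus` is used as the programmatic name of the East US region.

```python
from collections import defaultdict

# Settings gathered by the "create a resource group" form.
# The name and tag values are hypothetical examples.
resource_group = {
    "name": "awproject",
    "location": "eastus",
    "tags": {"project": "adventureworks", "billing": "training"},
}

# Made-up monthly costs, used only to show how tags help categorize billing.
resources = [
    {"name": "datalake", "tags": {"billing": "training"}, "cost": 4.0},
    {"name": "datafactory", "tags": {"billing": "training"}, "cost": 1.5},
    {"name": "warehouse", "tags": {"billing": "analytics"}, "cost": 10.0},
]


def cost_by_tag(items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per value of one tag; resources without the tag go under 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        totals[item["tags"].get(tag_key, "untagged")] += item["cost"]
    return dict(totals)


billing_report = cost_by_tag(resources, "billing")
```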
so it will just validate it and finally
it will ask you hey bro please click on
create button so all you need to do is click on
create so it will just create your
resource group I think it has already been
created Azure is really
fast okay now we can see our resource
Group is ready just click on
it so you will see this window right so
here you can see it is empty because we
do not have any kind of resources now
there are two ways to create resources
one is just click on this create button
so what it will do it will automatically
pick this Resource Group and it will
just choose this as our default option
and it will just put all the resources
inside this Resource Group but I also
want to show you the other method so
let's cross this and let's click on this
Home tab this
one click on Home tab and similarly we
will just search it here
what should be our first resource just
tell me just tell me the First Resource
the very first
resource yes you guessed it right it is
the data lake okay so now here comes the
interview part as well why and how let
me show you the possible questions that
you can expect one question we have
already covered that is data redundancy
but I will show you how you can navigate
it second question I will tell you once
I create this data lake so I will
simply write storage account because
that is a storage account
right here comes a question for you if
you are already familiar with Azure then
you should answer this question and if
you are new don't worry at all bro
because I will tell you each and
everything so obviously it will show us
all the existing storage accounts so
these are my existing storage accounts
don't worry at all just ignore so simply
click on Create and once you create your
storage account it will ask ask you for
some of the
configurations so see now you can see
the resource Group is already selected
as our aw project so if you do not have
your resource Group ready while creating
resource you can click on create new and
you can create it from here as well but
it is a
recommended practice you can say that
you should have your resource Group
ready before creating resources okay now
we want to give storage account name
what name we should pick it totally
depends upon you so I would like to pick
a name like aw
storage data lake because our name
should be unique we cannot have the same
names so you cannot pick this name if
you want to pick it you can say aw
storage data lake and you can just put
your name as a suffix that's the
only option if you want to keep your
name as it is otherwise you can pick any
name at all
okay now primary service obviously
we want to use it as Azure
Blob Storage or Data Lake if you do not
pick it no need to worry it will just
create an Azure storage
account so let me show you let me deselect
it okay it is not giving us the option
to deselect it okay just keep it like
this now just keep this performance
setting as standard
then we have redundancy that we have
just discussed in our
prerequisites so this is the data redundancy
now by default it will just put your
data under GRS geo-redundant storage or
you can say geo-replication storage
because it replicates your data across
different regions across different
geographies so I call it
geo-replication storage you can call it
geo-redundant both are the same thing but
we do not want to do that
we will just keep it as LRS that is locally
redundant storage which will just store
your replica of the data in the same
data center simple then here is the
question okay the question is this is my
storage account right so by default it
creates a blob storage account what it
creates blob storage account it does not
create a data lake so if we want to
create a data lake out of this storage
account what do we need to do you have 3
seconds you can just comment and let me
see if you know this okay so now it's
time to disclose it so if you want to
create a data lake you just need to tick a
small box and it will do magic it's
not magic but it's a small option
that you need to pick and only then
will you have your data lake otherwise it
will just create blob storage
I know lots of data Enthusiast have the
question like what is the difference
between both let me show you bro let me
show you I'm here to tell you each and
everything don't worry okay so this is
the configuration that is hierarchical
namespace hierarchical namespace is the
option that we need to pick this is the
one what it does what it does okay let
me first click it when I click on
it it will create a data lake yes then it
will create a data lake otherwise it will
create blob storage so what is the
difference between the two let me tell you
so let's suppose this is our blob
storage account right and this is our
data lake okay in both accounts we
have containers right let's say we have
container number one container number
two container number three and within
these containers we save our files let's
say a customers.csv file right
maybe any files but
in blob storage we cannot create a real hierarchy of
folders what does it
mean let's say I am in my data lake I
can create this container that is
the first level of hierarchy but
within that container I can create
folders as well see I can create one
folder within that folder I can create
second folder within that folder I can
create third
folder then I will save my CSV file this
feature is not available in Blob storage
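One way to picture the difference is with two toy in-memory stores. This is a simplified model and not the storage service itself: in a flat namespace a "folder" is only a prefix inside each object name, while with a hierarchical namespace directories are real nodes. The store contents and function names are invented for illustration.

```python
# Flat namespace (blob storage): every object is one key; "folders" are only
# prefixes inside the name, so renaming a folder means rewriting each key.
flat_store = {
    "sales/2015/data.csv": b"...",
    "sales/2016/data.csv": b"...",
    "customers.csv": b"...",
}


def rename_prefix_flat(store: dict[str, bytes], old: str, new: str) -> int:
    """Rename a 'folder' in a flat store; returns how many objects were touched."""
    touched = 0
    for key in list(store):  # iterate over a snapshot because the dict is modified
        if key.startswith(old + "/"):
            store[new + key[len(old):]] = store.pop(key)
            touched += 1
    return touched


# Hierarchical namespace (data lake): directories are real nodes, so a folder
# can be renamed in one operation no matter how many files it holds.
tree_store = {
    "sales": {"2015": {"data.csv": b"..."}, "2016": {"data.csv": b"..."}},
    "customers.csv": b"...",
}


def rename_directory_tree(store: dict, old: str, new: str) -> int:
    """Rename a top-level directory; always a single operation."""
    store[new] = store.pop(old)
    return 1


flat_ops = rename_prefix_flat(flat_store, "sales", "sales_archive")
tree_ops = rename_directory_tree(tree_store, "sales", "sales_archive")
```

With two files the difference is small, but the flat rename grows with the number of objects under the prefix, which is why analytics workloads that build tables over folders tend to prefer the hierarchical option.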
so if you are working with data where
you need to analyze your data where you
need to build tables on it you should
pick data Lake you should pick data Lake
yes and this was the difference and if you
already knew that it's good because
it was a quick revision for you and if
you didn't this is a game changer because
this is the most important question
that you can expect and it's not just
about the questions it's the fundamental
thing that you should be aware of while
working with Azure so now you have enabled
hierarchical namespace so now you have a
data lake don't worry about
that okay now we just need to look at
the access tiers so these are the access
tiers we have hot cool cold hot means if
we want to frequently deal with data we
keep it as hot if we want
to if we do not want to frequently use
data and we want Low Cost Storage low
storage cost for our data then we keep
it as cool if we want to just have the
minimum cost for our data to store it we
keep it cold but if we want to use it we
need to pay more so it is for archiving
of data so it totally depends upon the
requirements we will be frequently use
our data so we will keep it as
hot okay
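The choices made on this form so far can be summarized as data. The sketch below is an assumption-laden summary, not the project's deployment code: the account name `awstoragedatalake` is a hypothetical spelling of the name chosen above, the property names (`sku`, `kind`, `isHnsEnabled`, `accessTier`) follow the storage account resource template as I understand it, and the name check encodes the rule that storage account names are 3 to 24 lowercase letters and digits.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Check the format rule only; global uniqueness can only be checked by Azure."""
    return NAME_PATTERN.fullmatch(name) is not None


def storage_account_settings(name: str) -> dict:
    """Collect the options picked in the walkthrough into one settings dictionary."""
    if not is_valid_storage_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return {
        "name": name,
        "sku": {"name": "Standard_LRS"},  # standard performance, locally redundant
        "kind": "StorageV2",
        "properties": {
            "isHnsEnabled": True,  # hierarchical namespace, i.e. a data lake
            "accessTier": "Hot",   # data will be read frequently
        },
    }


settings = storage_account_settings("awstoragedatalake")
```

Leaving `isHnsEnabled` off is the one choice here that changes what kind of storage you get, which is why it comes up as an interview question.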
let's check the configuration for one
more time and yeah all said just click
on next this is networking
so our networking has public access enabled
from all
networks okay we have options for VNets and
to disable public access and use private
access this will be very
relevant once you enter an organization
as a data engineer because some data will
be restricted and cannot be accessed
from public
endpoints so there you will be creating
private endpoints so for now everything is
sorted just click on review plus create
and just wait for the validation to be
completed once it is done you are ready
to click on create button just click on
Create and then it will create your
resource okay it will take some time
because it will just create a data lake so
you can expect a few seconds let's wait
yes it's done so now we can see the go to
Resource tab that means our resource is
ready and we can go and check that but
we don't want to do that now
because first we will create our Azure
data Factory and then we will just
establish the containers within our
storage account so here comes our second
resource how we can do that okay let me
show you simply go let's just type Azure
Data Factory in the search
bar okay just try typing data
factory yes
simple so currently I do not have any Azure
Data Factory created in my portal
account so I will simply click on create
simple and again we need to pick
Resource Group as I told you it is the
like necessary thing and we will be
keeping the same Resource Group because
we need to just store all our resources
within one Resource Group simply select
aw project then we need to obviously
name our ADF how can we name
it simple it's the same as we just named our
storage account simple so what we will
do I will just say ADF
aw project let's see if it is available
or not yes it is available oh it's
already taken see this is the error
that you will see if you try
to enter the same name that I'm using so
just try to pick a unique one you can say ADF
aw
project Ansh I think it should be
available
yes it's available
people with the name Ansh are limited
yeah because they are really really
talented just kidding so just click on
review plus create and it will just
create your Azure Data Factory and that's
it it was really simple right yeah it
was quick and after this we will be
creating containers what are containers
basically we will just dedicate one
container to each zone let's say
one container for bronze second
container for silver third container for
gold so I'm really excited and let's
click on the create button because our
configuration is complete so it will just validate it
and it will deploy it as soon as
possible so we are good with all the
resources that are required to complete
phase one of our project so are you excited
are you excited because I was really
excited when I was doing these projects
you should have excitement do not take
it as a burden do not take it as a part
of study it's like fun it's like the
real skill so let's go to our Resource
group and as you can see wow Azure Data
Factory I love working with Azure Data
Factory because this is the first
resource that I learned in the Azure
environment obviously data lake was the
first one because we just picked the
data and stored it there but I'm talking about
a real resource that we use for
ETL and ELT so this was my first one
and I love the UI this is my first love
yeah after college so just click on it
and let me show you what you can
expect after clicking on this Azure
Data Factory Studio and then we will
create our zones bronze silver gold so
let's click on this tab launch Studio it
will just open our Azure Data Factory Studio
and you are going to love the UI if
you are not familiar with ADF you're
going to love the UI okay so this is the
homepage this is the homepage that you should
see okay do not worry about the GitHub
repository right now so
these are the tabs that are
available do not worry about that we'll
cover everything so this is a kind of
home tab on which we are right now then
we have this tab this is the most
important one
this is the author tab because we will be
creating all our pipelines within this
tab so this is one of the most important
ones we just have four so all are
important so this is the monitor tab where
we can monitor our pipelines we can see
the status of our pipelines if any
pipeline is failed what is the reason
everything everything under monitor tab
it is like the monitor of our class so
it just monitors everything then we have
manage tab where we can just manage
things such as repositories GitHub
connections DevOps connections and linked
services wait wait wait wait what is a linked
service you will learn everything don't
worry about that I'm just giving you an
overview so this is manage Tab and this
is our Learning Center so this is the
kind of Center where you can just read
the resources documentation
pick some data sets
available simple and
sorted let me just show you the author tab
and okay so this is the area where we
pick the
ADF building blocks that are
pipelines activities functions notebooks
everything everything has been
summarized into this area
okay now
this is our ADF enough information has
been given so now it's time to prepare
our zones and then we will just start
pulling the data using ADF yes so let's
jump on to the storage account so I have
just switched back to my
Azure portal tab in case you are surprised
hey what is this window I just clicked
on my portal tab now I will just
click on the Home
tab okay
now let's go to our Resource Group and
click on our data lake and
just focus on this area because this is
our area of concern right now just click
on containers so this is the kind of
data lake that we got so we get four
storage services one is containers
that is also popularly known as the data
lake then file shares in which we
can just provide a one-stop solution to
store all the files throughout the
organization or the project then queues
a queue is a kind of service which helps
us to work with messages so let's say
we have messages and we want to store
them so we just store them in queues and
those messages can be in the form of JSON
or can be in the form of anything so it
depends and tables are like NoSQL
databases that means not only SQL so
that is very good if you want to perform
some analysis on semi-structured data
that means if you have key-value
pairs you can just use this
service okay sorted just click on
containers oh
that's what I want so now we want to
create three containers as I just
mentioned one container for bronze
second container for silver third
container is for gold that's how we will
create our three zones simple
sorted bro do not feel like this is
very difficult just try to grasp the
information and if you find any
difficulty just rewatch that part
because it's really really easy just
click on container and first of all
we'll create the bronze
container okay just type the
name and just click on the create
button perfect
similarly you can
create the silver container bronze silver
gold I'm just keeping the naming
convention as bronze silver gold if
you're familiar with raw transformed
serving layers you can keep that it all
depends upon the requirements and it's
all about the naming convention you are
familiar with so it's all up to you bros
and bros is not just about boys
when I'm saying bro that includes girls
as well so bros means boys and girls
both now the third is gold so let's
create this as well perfect now all
three of our zones are ready and now we can
just perform data loading
into this particular layer and as we
just mentioned that we will be using an API
so now it's time to create a linked
service let's go to our ADF that is this
tab click on
it are you excited to create your first
pipeline let me show you how you can do
that so to create a pipeline we need
source and
destination because we want to load our
data from source to a destination simple
and what are the prerequisites for that so
it's very simple let me show you the
activity first so that you can relate so
first of all just click on Pipelines and
obviously you won't see anything because
we do not have any pipeline ready but it
will be ready soon just click on it just
click on these three dots simple and you
will see the option called as new
pipeline click on
it okay so this is our canvas this is
our canvas okay so first of all we will
name our pipeline so as you can see you
can just name this pipeline from here so
I will say let's say Git to Raw
this is the kind of pipeline I want this
is the pipeline name that I want to pick
and just click on this small button this
will just hide this naming tab and we
have already named it so why do we need to
keep it here
just set it aside okay so now we have a
much bigger canvas area so how we can
create that activity first of all that
activity is known as copy activity and
how we can find it just click on this
Move and transform button it will
just show you all the possible
activities that we have under this
category Okay click on it we have copy
data drag it here simple simple don't
worry don't worry bro we'll cover all
these parts don't
worry okay so now first of all
name it because it's really important so
obviously we have just named our pipeline
here as you can see Git to Raw but we
need to name this activity as well how
we can do that just click on this and
let's say
copy raw data this is my activity name
that I want to pick right okay so as I
just mentioned that we want a source and
a destination to perform copy activity
because just imagine this is the
activity that will just load the data
from the source and it will just push
the data to the destination simple okay
so this is our source
and this is our sink sink is nothing but
just a destination it is a fancy name
for destination so when I was learning
it I was also like hey what is sink okay
sink is a kind of destination simple so
in our scenario what is the source in our
scenario it is the GitHub account and within
that GitHub account we have a folder
called Data let me show you so this is
my GitHub account where we have the data
repository click on it and this is the
folder oops this is the folder
and within that folder we have all the
CSV files and we want to fetch files
directly from this GitHub account let me
show you first like how it looks like
let's say I just pick this products.csv
file and if I want to look at the data I
just need to click on Raw just to have
the URL because we need to put the
URL just click on raw and here is the
data and we will use this URL to pull
this data this one
simple yes it is simple but we need to
create some connections okay let me tell
you so this is the copy activity simple
and we want to create a linked service
what is a linked service bro a linked
service is basically a
connection why is it necessary see this
is Azure it needs to read
the data from GitHub it needs to push
this data to the data lake simple so it
needs to build a
connection with the source as well then
it needs to build a connection with the
destination as well so these connections
are known as linked services this is just a
fancy name for a connection simple now
after the linked service we need to create a
data set now what is a data set
okay let's say you have created a
connection with this linked service
right let's clean all the clutter
first let's say you have created your
linked service with the GitHub account
okay fine within this GitHub account if
I go here I can see so many files within
this data folder now how would I know
which file to pick which data to pick
here comes the role of the data set it gives
us the detailed location of the data
which file we need to pick from this
whole connection so this is the
architecture behind a linked service and a
data set see it was so easy let's do it
then you will learn it and for a linked
service obviously we need a link we need
a URL so that it can locate this GitHub
account how we can do that simply go to the
products.csv file this one or any file
we just need the URL because first we
will be loading the AdventureWorks
products CSV because first we will
create a static pipeline then we will
create a dynamic pipeline but for that
you should have good hands-on practice
working with a static pipeline we will
grow step by step bro don't rush just
click on this products.csv file click
on Raw then you just need this URL this
URL yes now this URL actually has two
parts one is the base URL second is the relative
URL let me repeat first is the base URL
which is till here
till .com this one I will simply copy it and
the rest of the URL is the relative URL simple
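The base URL versus relative URL split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_raw_url` is made up here, and the example URL uses placeholder user, repository and branch names (the real ones are not given in the transcript). It assumes the raw file is served from `raw.githubusercontent.com`, which is where GitHub's Raw button normally points.

```python
from urllib.parse import urlsplit


def split_raw_url(raw_url: str) -> tuple[str, str]:
    """Split a full URL into (base URL, relative URL).

    The base URL is the scheme plus host (everything "till .com"),
    the relative URL is the rest of the path without the leading slash.
    """
    parts = urlsplit(raw_url)
    base_url = f"{parts.scheme}://{parts.netloc}"
    relative_url = parts.path.lstrip("/")
    if parts.query:
        relative_url += "?" + parts.query
    return base_url, relative_url


# Hypothetical raw-file URL: user, repo, branch and folder are placeholders.
example = "https://raw.githubusercontent.com/some-user/some-repo/main/Data/products.csv"
base, relative = split_raw_url(example)
print(base)      # goes into the linked service
print(relative)  # goes into the data set
```

The base part is what the HTTP linked service stores, and the relative part is what each data set supplies.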
okay let me go here and let me create
two linked services one is with my GitHub
account second is with my storage
account there are two ways I can
simply click on it and I can just click
on source and then I can just pick the linked
service but let me show you the
recommended way we should always have
linked services ready before performing
anything because it gives us a
real you can say strategy a right
road map before performing any kind of
activity because connections are a must if
we want to move our data from here to
there so how we can create them in the
recommended way so let me show you just
click on this Manage tab because it will
manage all our resources right okay
click on this and here you can see the
very first option is linked services
simply click on it and you will see this
plus button
simple click on plus new and then here
we have the whole connector list ADF
provides us a vast list of
connectors like we can build
connections with AWS Apache Impala
Redshift like there are so many options let
me just show you that's why I love it
here like you just name it you will have
the connector Dataverse Google Ads
HTTP everything ready so you
have so many connectors and we will use
these connectors to actually build a
connection right so in our scenario we
want to build a connection with the GitHub
account which is an HTTP connection as
you just saw that we have an HTTP link
there are two ways we can either use
the REST connector or HTTP but it is recommended
by Microsoft Azure that if you just have data
to fetch from your connection then you should
just go with the HTTP connector because
it is dedicatedly created for this
purpose only so let me just pick HTTP
how we can do it just simply search HTTP
and pick this one simple see we do not
need to write any code this is all low
code or you can say no code click on
continue oops I just pressed Escape
sorry so click on HTTP and click on
continue and then we need to name our
linked service how can we name our linked
service so as we all know that this is
our HTTP connection so I will say
HTTP linked service you can just name it
anything I just named it like this okay
now we need to give the base URL as I just
mentioned our base URL is this one till
.com just remove the slash and then for the
authentication type we need to select
Anonymous and then it's time to test
this connection how we can do that so
here we have a button test connection
just click on it it will just test our
connection for us let's wait and let's
see if it is successful yes it is
successful so now our connection is
established now we can use this linked
service to pick our data set but as I
just mentioned we will first create
the second linked service as well for our
data lake because whenever you want to
build any efficient data solution you
should have all the linked services ready
so let's click on create see we can
just see our linked service here click on
plus new so that you can create your
second linked service which is for your data
lake so what should I pick yes data lake
just search for
ADLS Gen2
okay this is the one click on it and
click on continue then we will
say storage maybe
storage data
lake simple linked service name then
you just need to pick the storage account
name with which storage account you want
to make the connection because in my
scenario I have so many storage accounts
in my portal account so how will Azure Data
Factory get to know which account
it needs to pick so here I will just
say hey just pick this storage account
see this is the value this is the
importance of you can say having linked
services in place so again test
connection which should be there yes it
is successful click on create both of our
linked services are created now it's
time to create the data set out of it
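Behind the UI, each linked service is stored as a small JSON definition. The sketch below builds roughly what the two linked services from this step could look like, as Python dictionaries. The names `HttpLinkedService` and `StorageDataLake` and the `<storage-account>` placeholder are assumptions for illustration, and the credential settings of the data lake connection are left out on purpose.

```python
import json

# HTTP linked service: only the base URL and anonymous authentication.
http_linked_service = {
    "name": "HttpLinkedService",  # hypothetical name
    "properties": {
        "type": "HttpServer",
        "typeProperties": {
            "url": "https://raw.githubusercontent.com",  # assumed base URL
            "authenticationType": "Anonymous",
        },
    },
}

# ADLS Gen2 linked service: points at one storage account.
# Authentication details (account key, managed identity, ...) are omitted here.
datalake_linked_service = {
    "name": "StorageDataLake",  # hypothetical name
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",  # placeholder
        },
    },
}

for linked_service in (http_linked_service, datalake_linked_service):
    print(json.dumps(linked_service, indent=2))
```

You can compare this with the JSON that the Studio shows when you open the code view of a linked service.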
how we can do that simply go to your
Author tab here and click on this copy
activity now we are ready to create the data
set okay I will simply go to source so
first we will create the data set for
our source we do not have any
data set ready right so we will simply
click on plus new and then
again we need to pick from where we
should pick our data set we should
pick our data set from
HTTP
okay and what is the file format for
that data it is CSV simply click on CSV
and click continue and now it is asking
us to provide a data set name so I will
use DS that is an abbreviation for data
set and I will just give a name such as DS
HTTP so that it will give me some
insight that this data set is for this
connection then as you can see we
have to pick the linked service that's why
I told you always create your linked service
first okay just click on this drop down
and just pick this HTTP linked service it
will just show you the available linked
services which are based on HTTP only
that's why you are not seeing the second
linked service that we created click on it
and obviously first row is header now it
is asking us for the relative URL
so we will just go to our source and we
will give this relative
URL just copy this one and just paste it
here simple now we will just click on
okay simple yes simple just click on
okay so now our source data set is ready
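As a rough picture of what was just configured, the source data set could be written as the following definition. This is a sketch, not the exact JSON from the video: the names `ds_http` and `HttpLinkedService` are illustrative, and the relative URL uses placeholder user, repository and branch names.

```python
import json

source_dataset = {
    "name": "ds_http",  # hypothetical data set name
    "properties": {
        # Which connection to use.
        "linkedServiceName": {
            "referenceName": "HttpLinkedService",  # hypothetical linked service name
            "type": "LinkedServiceReference",
        },
        # File format: CSV is called DelimitedText in ADF.
        "type": "DelimitedText",
        "typeProperties": {
            # Where exactly the file lives inside that connection.
            "location": {
                "type": "HttpServerLocation",
                "relativeUrl": "some-user/some-repo/main/Data/products.csv",  # placeholder
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(source_dataset, indent=2))
```

The linked service answers "which server", and the `location` block answers "which file on that server".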
if you want to preview this data we have
a button here just click on this preview
data I will just click on it and I
should see the data perfect I can see my
products table it looks good that means
we are good to go with this now we need
to do the same thing with the sink as well
so it is a kind of homework that you
need to do just do it on your own
without copying me right just do it on
your own on your laptop and if you make
any mistake or if you find any error
just watch me so again we need to create
a data set because we do not have any
sink data set right do not pick this one
because this data
set is our source data set we do not want to
pick the same data set we will click on
plus new and from where do we need to
pick our data set obviously it is Data
Lake Gen2 then click on continue and in
which format do we need to put our data we
just need to put the data in CSV
format because we do not need to apply
any transformations and it is the
recommended way that we should just
create an exact replica of the data okay
so we will pick DelimitedText that is
CSV then we need to just give the name
so I will say DS raw so this is my data
set which is pointing to the raw container or
bronze container hey let's say bronze
same thing don't worry raw and bronze
are the same thing then we need to pick the
linked service as you all know that we
have already created one linked service so
we will pick this linked service okay
let's click on this drop down click on
the storage account and then we need to
give the file path okay how we can give
the file path just click on this browse
button the small browse button and then
we will pick our bronze container within
that container we will click okay
because we do not have any folder but we
will create it how just click on
directory a directory is a kind of
folder within a container so I will say
products because that is the power of
hierarchical namespace because we can
create folders so now I will create a
dedicated folder for every file
that's my choice so I will say products
then I can just name the file as
products.csv simple and obviously
we need to keep first row as header
because this is a CSV file simple let me
just click on
okay schema import failed it shows this
thing just click on none don't worry
about that just click on okay now why did it
show this error because there is no
file there right now because we are
copying this file don't worry about that
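The sink data set follows the same pattern as the source one, only the location type changes. The sketch below assumes the names `ds_raw` and `StorageDataLake` for illustration; the container, directory and file name are the ones chosen in this step.

```python
import json

sink_dataset = {
    "name": "ds_raw",  # hypothetical data set name
    "properties": {
        "linkedServiceName": {
            "referenceName": "StorageDataLake",  # hypothetical linked service name
            "type": "LinkedServiceReference",
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "bronze",      # the container
                "folderPath": "products",    # the directory inside the container
                "fileName": "products.csv",  # the file to be written
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
        # No schema is imported because the file does not exist yet.
        "schema": [],
    },
}

print(json.dumps(sink_dataset, indent=2))
```

The three location fields here are the values that later become candidates for parameters.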
now here comes the mapping part
let's say if
you already have one
file there and if you want to put
another file and if you want to match
the schema then you can just click on
import schemas and you can just manually
map it if you have different
column names in the source and in the
sink so that totally depends on the
requirements in our scenario we do not
have any file there so everything is
fine so now it's time to actually run
this are you excited to run your
pipeline okay let's run it just click on
this debug button it will just load your
data from GitHub to your bronze
layer it will take some time like a few
seconds and it will just load this data
and once we have the data then it's time
to celebrate because we have built our
first pipeline which pulled the data
from an API to our bronze layer
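Putting the pieces together, the static pipeline is just one copy activity that reads from the source data set and writes to the sink data set. The following is a trimmed sketch of such a pipeline definition, with assumed names (`GitToRaw`, `CopyRawData`, `ds_http`, `ds_raw`); a pipeline exported from the Studio will carry more settings than shown here.

```python
import json

copy_pipeline = {
    "name": "GitToRaw",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {
                "name": "CopyRawData",  # hypothetical activity name
                "type": "Copy",
                # Source data set (HTTP) and sink data set (data lake).
                "inputs": [{"referenceName": "ds_http", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "ds_raw", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {"type": "HttpReadSettings", "requestMethod": "GET"},
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {"type": "AzureBlobFSWriteSettings"},
                    },
                },
            }
        ]
    },
}

print(json.dumps(copy_pipeline, indent=2))
```

Everything in it is hardcoded, which is why this version can only ever copy one file.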
see I saw the green flag I love it I
love it so now we can say that we have
successfully loaded the data from GitHub
to Azure using a pipeline do you want
to see that data yes I want to see it
so I'll just look at
the data how we can look at it just go
to your Azure portal click on your bronze
container hey we have a folder called
products then we have a products.csv
file okay and this is our data lake we
pulled the data from this GitHub API to
our bronze layer just click on it and
you can see some configuration just
click on edit you do not
need to edit your data you can but
you should not it is just a kind of
preview that I use to view the data so
this is the data yes the data looks so so
good whenever you see green flags in
your pipelines it's a
heartwarming thing so we are good
with our pipeline but this is the
beginning of your Azure journey why
because this is a kind of static
pipeline we have built okay let's
celebrate let's first publish this as
you can see the publish all button it
will just save all your progress
because this is also important
right okay so this will just save our
work to our Azure Data Factory and whenever
you come again like tomorrow or maybe
the day after tomorrow you will see all
the progress don't worry if you do not
click on this then you need to just
rebuild everything okay our publishing
is completed now here comes the serious
part so okay we have built a static
pipeline and as you know we have
successfully loaded one file and in total we have
I think 8 to 10 files one way of
doing it is the one explained by everyone
else on YouTube but we want to cover
something extra okay so instead of
relying on a static pipeline we will
create a dynamic pipeline okay let's first
discuss the static pipeline one way of
doing it is to just create the same copy
activity again and again and again that
is not the recommended way at all no you
cannot build this kind of solution in
a real world scenario you cannot build
this kind of solution in your interviews
I'm serious you cannot perform this
activity again and again and again
because why do we have something called
iterations and conditionals because
we need to use them and that's why we have
to use dynamic pipelines and this
part is kind of tricky not tricky you
just need to be more focused to watch
this part and you may require to watch
it like two to three times so do not
hesitate to watch it again and again
because when I was learning these things
I also watched it I think at least eight
to 10 times and I'm not kidding so if
you are watching it like for two to
three times the upcoming part not this
one this was easy so do not hesitate and
do not feel demotivated because it is a
part of learning and in the next few
minutes we'll be learning how we can
build dynamic pipelines and just be a
little serious because we are having fun
throughout the video so you need to be
just more focused in a few you can say
phases of the video so now it's time to
just build a dynamic pipeline so let's
first discuss the architecture like how
we need to build a dynamic pipeline and
trust me bro once you master this
technique once you master this skill you're
going to feel much more confident in
your ADF skills trust me because
90% of your solutions will require you to
work with these scenarios where you need
to iterate your pipelines where you need
to use parameters and dynamically move
your pipeline so it's time to really
really really focus on this video
and let's learn that now it's time to
cover dynamic pipelines so it's time to
just be more focused and try to learn
the scenario where we'll be learning
dynamically passing parameters into our
pipelines that makes dynamic pipelines
pipelines that makes Dynamic pipelines so what exactly we are trying to do let
so what exactly we are trying to do let me explain you so as you are aware that
me explain you so as you are aware that we have these files within our GitHub
we have these files within our GitHub repository or you can say in our source
repository or you can say in our source okay that is
okay that is sorted
sorted now
now we want to pull this data one way is
we want to pull this data one way is doing like repeating the activity that
doing like repeating the activity that is copy activity again and again
is copy activity again and again and we do not want to do that because
and we do not want to do that because that is a static approach and that is
that is a static approach and that is not a real time scenario that you should
not a real time scenario that you should do okay so what is the alternative of
do okay so what is the alternative of that step so we will be creating a loop
that step so we will be creating a loop so if you are coming from a programming
so if you are coming from a programming background so you will be aware of for
background so you will be aware of for Loop and if you are not from a
Loop and if you are not from a programming background so basically for
programming background so basically for Loop or any kind of loop is performed to
Loop or any kind of loop is performed to perform any kind of iteration so in this
perform any kind of iteration so in this scenario as you can see we have
scenario as you can see we have iterations to perform like one 1 2 3 4 5
iterations to perform like one 1 2 3 4 5 6 and so many iterations so what we will
6 and so many iterations so what we will be doing we will create a copy activity
be doing we will create a copy activity okay so we just created copy activity
okay so we just created copy activity what's different in this copy activity
what's different in this copy activity let me tell you let's say we have this
let me tell you let's say we have this copy
copy activity and instead of passing
activity and instead of passing information statically what information
information statically what information okay information number
okay information number one that is
relative URL because you will be having a different relative URL for a different
a different relative URL for a different file so one thing that will be changing
file so one thing that will be changing for our copy activity is relative URL
for our copy activity is relative URL number one right what is number two in
number one right what is number two in our scenario number two is folder in
our scenario number two is folder in which we will store our data let's say I
which we will store our data let's say I pick this file products. CSV right and I
pick this file products. CSV right and I push this data to a folder called
push this data to a folder called products so that is a dynamic parameter
products so that is a dynamic parameter that will be changing with every file so
that will be changing with every file so that is our second thing that will be
that is our second thing that will be changing right what is the third thing
changing right what is the third thing okay it is very simple the file itself
okay it is very simple the file itself so let's say we have a different folder
so let's say we have a different folder for products perfect so we need to say
for products perfect so we need to say products. CSV file within this folder so
products. CSV file within this folder so that means just these three values will
that means just these three values will be changing for this copy activity
be changing for this copy activity sorted so what we will be doing instead
sorted so what we will be doing instead of hardcoding these three values we will
of hardcoding these three values we will create three parameters we will create
create three parameters we will create three parameters and what we will do we
three parameters and what we will do we will keep on changing the values of
will keep on changing the values of these three parameters so every time
these three parameters so every time every iteration will have a different
every iteration will have a different set of parameters right so now you will
set of parameters right so now you will be wondering how we can do it okay let
be wondering how we can do it okay let me tell you so in aure data Factory we
me tell you so in aure data Factory we have an activity called for Loop like
have an activity called for Loop like it's not like named as for Loop but it's
it's not like named as for Loop but it's like for each activity which is
like for each activity which is equivalent to for Loop so that's why we
equivalent to for Loop so that's why we call it as like iteration activity or
call it as like iteration activity or for each activity whenever we want to
for each activity whenever we want to dynamically pass any kind of
dynamically pass any kind of activity then we make use of that
activity then we make use of that particular one activity so that is for
particular one activity so that is for for each activity right so what we will
for each activity right so what we will be doing we will just create a for each
be doing we will just create a for each activity like this and we will put this
activity like this and we will put this copy
copy activity inside this
activity inside this one here so it will keep on moving till
one here so it will keep on moving till our iterations are completed so this is
our iterations are completed so this is the scenario this is the scenario so
the scenario this is the scenario so this is the architecture that we will be
this is the architecture that we will be performing the it will be like building
performing the it will be like building an AJ data Factory so you do not need to
an AJ data Factory so you do not need to worry if you didn't understand 100% of
worry if you didn't understand 100% of it because you will get it once we will
it because you will get it once we will actually building it in your data
actually building it in your data Factory so this was a kind of overview
Factory so this was a kind of overview that I wanted to give you before
that I wanted to give you before actually performing it because now when
actually performing it because now when I'll be performing those steps you will
I'll be performing those steps you will feel that thing that what exactly we are
feel that thing that what exactly we are doing so this is all about
doing so this is all about parameterizing and let's get started and
parameterizing and let's get started and let's learn 100% of the so now we are
let's learn 100% of the so now we are into our Y data Factory portal so this
into our Y data Factory portal so this time we want to build a dynamic pipeline
time we want to build a dynamic pipeline simply go on these three dots click on
simply go on these three dots click on new pipeline simple so first of all
new pipeline simple so first of all let's rename it and let's say
let's rename it and let's say Dynamic get to Raw
Dynamic get to Raw simple okay so now we want to create a
simple okay so now we want to create a copy activity now you'll be saying we
copy activity now you'll be saying we just created one but this time we do not
just created one but this time we do not want to hard Cod it we want to pass
want to hard Cod it we want to pass parameters inside that inside that
parameters inside that inside that activity so let's create a parameterized
activity so let's create a parameterized copy activity how we can do that just
copy activity how we can do that just simply search copy graag the activity
simply search copy graag the activity here rename it let's
here rename it let's say
say Dynamic
Dynamic copy oop spelling mistake perfect so now
copy oop spelling mistake perfect so now we want to pick the data set so as we
we want to pick the data set so as we just mentioned that we will be creating
just mentioned that we will be creating parameterized copy activity that means
parameterized copy activity that means we need to pass parameters into the data
we need to pass parameters into the data set why because we need to have one data
set why because we need to have one data set for all the CSP files like one data
set for all the CSP files like one data set to pick all the files let me show
set to pick all the files let me show you
you uh oh yeah so one data set to pick all
uh oh yeah so one data set to pick all the files instead of creating multiple
the files instead of creating multiple data sets that's called the
data sets that's called the parameterization so simply create one
parameterization so simply create one data set for all the files and now we
data set for all the files and now we will be using parameters let's see how
will be using parameters let's see how we can do that simply click on plus new
we can do that simply click on plus new and in our case the data is in HTTP
and in our case the data is in HTTP Source okay perfect and format is CSV
Source okay perfect and format is CSV simple now we need to name it so we can
simple now we need to name it so we can say
say DS get
DS get Dynamic simple Now link service will be
Dynamic simple Now link service will be same why let me tell you the reason if
same why let me tell you the reason if you click on any file let's say let's
you click on any file let's say let's open this returns. CSV and click on Raw
open this returns. CSV and click on Raw so in every file our base URL will
so in every file our base URL will remain the same till do this will remain
remain the same till do this will remain the same because this is the connection
the same because this is the connection to my GitHub account but rest of the
to my GitHub account but rest of the things will be changing like returns
things will be changing like returns this thing Adventure Works return. CSV
this thing Adventure Works return. CSV products. CSV so relative URL will be
products. CSV so relative URL will be changing so that's why we will not be
changing so that's why we will not be creating a linked service in a
creating a linked service in a parameterized manner we will just create
parameterized manner we will just create be creating data set in a parameterized
be creating data set in a parameterized way so let's create let's pick the
way so let's create let's pick the existing link service now it is asking
existing link service now it is asking us to provide relative URL we know we
us to provide relative URL we know we know this URL see I can just simply go
know this URL see I can just simply go and paste this URL but I do not want to
and paste this URL but I do not want to do that why
do that why now you get it now you get it we want to
now you get it now you get it we want to pass the parameter so that we can just
pass the parameter so that we can just pass different different parameter
pass different different parameter different different URL for that
different different URL for that parameter so simply click on Advance tab
parameter so simply click on Advance tab to create the parameter inside our data
to create the parameter inside our data set okay simply click on Advanced click
set okay simply click on Advanced click on open this data set so this will show
on open this data set so this will show us a window where we have a detailed
us a window where we have a detailed configuration for data set so here
instead of hardcoding the value I will be using a parameter
how just click on this box you will see this option add dynamic content
so when we click on this thing it opens up a different window
where we can use system variables functions parameters so many other things
so simply click on this add dynamic content
and now as you can see we have the option to create parameters
now you're getting my thing
okay click on it and click on this plus sign
and now we will create a new parameter
and let's say I want to name it as p relative URL
simple and type is string and we do not need to pass in a default value
so let's keep it simple and just save it
and before clicking okay we obviously need to give something to our box
so we have just created this parameter we will pick this
as you can see it has automatically populated this variable
so this is the kind of syntax that we use if we want to pass the parameter
at the rate sign and then dataset dot parameter name
click okay now you can see this box has a parameter in it instead of a hardcoded value
so this is a parameterized data set okay
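As a rough sketch of the idea above, outside of ADF: the linked service holds the fixed base URL, and the data set parameter supplies the part that changes per file. The base URL, the example relative paths and the spelling `p_rel_url` are assumptions for illustration only. Inside ADF, the relative URL box would hold an expression of the form `@dataset().<parameter name>`.

```python
from urllib.parse import urljoin

# Assumed base URL held by the HTTP linked service (illustrative only).
BASE_URL = "https://raw.githubusercontent.com/"


def resolve_source_url(p_rel_url: str) -> str:
    """Join the fixed base URL with the per-file relative URL."""
    return urljoin(BASE_URL, p_rel_url.lstrip("/"))


# Hypothetical relative paths: only this part changes between files.
for rel in (
    "example-user/example-repo/main/Data/Returns.csv",
    "example-user/example-repo/main/Data/Products.csv",
):
    print(resolve_source_url(rel))
```

One data set can then serve every file, because only the parameter value differs between runs.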
so now this data set is completed
now let's go back to our data pipeline in which we have the copy activity
and our source is done is it done no
see it is asking us hey you have provided me a parameter I accepted it
but what is the value of that parameter
we will say bro wait because we will be using a loop to pass the value to this parameter
simple now you are getting the things clear in your head
so let's click on sink and do the same thing
try to complete this on your own let's do it
if you feel stuck then only watch this part of the video
because in the next few minutes we will be just doing the same thing
so just try to build it on your own and then just refer to this part
and simply select Data Lake Gen2 why because we will be using the data lake to push the data
click on continue okay and data is stored in the CSV format yes click on it
then we need to just name it so I will say DS and let's say sink
simple and let me add Dynamic to make it unique
and linked service will be the same because our data lake connection remains the same
we want to push the data to the same data lake so just pick the same data lake
now it is asking us to provide the path
so as we just discussed we will be using two parameters
one for folder second for file
so instead of choosing the files from this browse button we will be using parameters
so rules are the same just click on this Advanced button click on open this data set
and now first of all just write here bronze why because our container is the same
then directory we need to create a parameter for it let's create one
click on add dynamic content and click on the plus button
and let's say p sink folder
perfect default value nothing save
and just pick this parameter and fill it here
as you can see now we have a parameter there
same thing with file name as well so create a parameter
click on add dynamic content click on the plus sign
as you can see we already have one parameter because we just created it
so let's create another one we will say p file name
oops and default value nothing click on save
and this time pick p file name simple
so now our sink data set has two parameters one is this one one is this one
sorted yes sorted sir sorted
now I want to just go back to my copy activity and I want to see how it looks
okay so now you can see our source is done our sink is done
but this time it is parameterized
in our source it is asking us to provide a value for this parameter
in the sink it is asking us to provide values for these two parameters
so in total we need to provide values for three parameters sorted
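A minimal sketch of how the sink path comes together, assuming the container stays fixed as `bronze` and the two data set parameters are spelled `p_sink_folder` and `p_file_name` (the underscore spellings are an assumption based on the spoken names):

```python
import posixpath

# Fixed container name, typed directly into the data set.
CONTAINER = "bronze"


def resolve_sink_path(p_sink_folder: str, p_file_name: str) -> str:
    """Build container/folder/file from the two sink parameters."""
    return posixpath.join(CONTAINER, p_sink_folder, p_file_name)


# Hypothetical values for one file.
print(resolve_sink_path("Returns", "Returns.csv"))
```

Together with the source parameter, that makes the three values each loop iteration has to supply.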
now you will be saying like hey where is our ForEach activity the for loop activity
let me show you and that is the game changer
because we will be feeding these values using the ForEach activity
simple simple so let me drag that ForEach activity how you can do it
so simply come to this area like activities first of all remove this cop
so this is the area that you should see in your canvas
and click on this iteration and conditionals
so this is basically the area where you can see all the activities related to conditions and iterations
and we want to use ForEach simple so click on it drag it here
let's name it as like ForEach Git
simple now click on the settings button
okay now here you can see a small checkbox you can check it
I always do because I like to keep my things in sequence it looks sorted
then here comes the items part
now you'll be saying hey what do we need to pass in items
so bro you need to give something to run a loop right
you need to pass a list maybe you need to pass an array anything
but you need to give something so that this activity can iterate through each entity or element
so for that we need to create an array in which we will be passing these parameters
in the form of maybe you can say dictionaries
so one element for one dictionary and then second element for second dictionary
and how can we do it we can simply create a JSON file for that
yes you heard it right do not worry at all
because I will provide that JSON file in the description
and you can just download it from there and you can use it
but still I will show you how you can create one because it is really important
in real world scenarios you will be creating your own JSON file
so let's see so I will simply navigate to VS
Code and okay I already have one dummy JSON
okay let's create one and let me say git.json
so now I need to pass an array so how can we do that
first of all create a list an empty list hit enter
and within this list we need to pass the values for our parameters
what do I mean so let's say our first iteration should have three values right
for one file it's relative URL folder file done one file is done
then second iteration relative URL folder file done
so let me show you how you can create it
simply create a dictionary and just enter key value pairs
so first of all I will say p relative URL
okay this is my first element of first entity p relative URL
and I just need to copy the relative URL from here and let's go back
and this is the relative URL simple I will just copy it and I will paste it here
oops oops I think I didn't copy it correctly okay let me do that again
yeah it's done so this is my p relative URL for my first entry
then I will create the second one that is p what was that sink folder yeah
and this time we need to create the folder
and I can say if we are saving returns data then I can just say returns
simple and then I need to create my third parameter which is p sink file
and this time I can say returns.csv
so this is my one set of values this is my one iteration
second time I will repeat these steps I will simply copy it I will just repeat these steps
now I know you are getting the things clear
so this is done for one thing
so let's say this is our one entity having all the three parameters
one data is migrated second data this is our second data
I will just change the relative URL just now I'm just explaining you the concept
second data then third then fourth then fifth
so it will be feeding these three parameters to that activity
and it will be dynamically pushing that data to our Azure
this is the architecture I know now it seems easy right
but we need to complete it let's complete it first and run our pipeline without any error
then we will celebrate it together right
so let me just quickly complete this JSON file
and then I will just show you how you can import this file into Azure
and then we will just use this to feed our ForEach activity let's see
so I have completed all the entities as you can see
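The parameter file described above is just a JSON array with one object per file to copy. Here is a small sketch that generates such a file in Python instead of typing it by hand. The key spellings `p_rel_url`, `p_sink_folder` and `p_sink_file` follow the names spoken in the video but are assumptions, and the repository prefix and file list are hypothetical placeholders.

```python
import json

# Hypothetical repository prefix; use the part after the base URL of the raw file link.
REPO_PREFIX = "example-user/example-repo/main/Data/"


def make_entry(file_stem: str) -> dict:
    """One array element: the three values needed for one loop iteration."""
    return {
        "p_rel_url": REPO_PREFIX + file_stem + ".csv",
        "p_sink_folder": file_stem,
        "p_sink_file": file_stem + ".csv",
    }


# Hypothetical file list; add one name per file to copy.
entries = [make_entry(name) for name in ("Returns", "Products")]
git_json = json.dumps(entries, indent=2)
print(git_json)  # save this text as git.json
```

Adding another file to the pipeline then means adding one more name to the list, with no change to the pipeline itself.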
and now it's time to pass this JSON file to Azure
so let me first save it and upload it to Azure
because we need to pull this data directly from the data lake
so I will simply go to Google and obviously to my Azure account
so this is my Azure portal and here is my storage account
so simply go to the Containers tab
and we will be creating one new container for the parameters file
so let's create one and let's say parameters simple and create one
so in this particular parameters container we will be uploading our file
simple so this is my git.json file so I have uploaded it
you can just download it and you can also upload it
and if you want to create your own you can try
so click on upload and now it will be uploaded
so now let's jump to Azure Data Factory
so how can we pull this data in Azure
so now you will be learning a new activity called the Lookup activity
what is this and why do we use it
if we want to use the output of a file if we want to just have a look we should use the Lookup activity
it will just give us the output of the data
and let me drag the Lookup activity just search lookup drag it here
and I will say Lookup Git simple
and in settings obviously we should have a data set
if we want to pull any data we should have a data set for it
let's create one and click on plus new
our data is in Azure Data Lake continue
and this time our data is JSON not CSV it's JSON click on continue
name will be you can say DS Git parameters
simple linked service will be the same because our storage account is the same
so file path now we can just pick it using this browse button
and we know our file is in the parameters container and this is the file
simple click okay and this will just pull the data
simple now one thing is very important for this particular activity
that can be asked in your interview questions as well
so as we know we have at least I think 8 to 10 entities right
but by default this activity keeps the setting as first row only
so it will just return one entity just one entity by default
so if you pass this in your loop it will just run one time only
and we do not want that we want to run through all the values
so we need to uncheck this box
just keep this thing in your mind this is really important
so just uncheck it now you are good
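To make the first row only point concrete, here is a simplified Python model of the two output shapes. It is a sketch of the behaviour as I understand it from the Lookup documentation, not ADF code: with the box checked the output carries a single `firstRow` object, and with it unchecked it carries a `count` and a `value` array. The example rows are hypothetical.

```python
def lookup_output(rows: list, first_row_only: bool = True) -> dict:
    """Simplified model of the Lookup activity output for a list of rows."""
    if first_row_only:
        # Default setting: only the first element is returned.
        return {"firstRow": rows[0]}
    # Box unchecked: every element is returned under "value".
    return {"count": len(rows), "value": rows}


# Hypothetical parameter rows.
rows = [
    {"p_sink_folder": "Returns"},
    {"p_sink_folder": "Products"},
]

print(lookup_output(rows))                        # one entity only
print(lookup_output(rows, first_row_only=False))  # the full array
```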
so if you want to see the output like how it looks I want to show you
so before running anything just run this particular activity
and how to ignore the other ones
click on the other activities whatever you have
click on General and set activity state as deactivated
so if you have an activity in your canvas and you do not want to run it simply click on deactivated
so it will just deactivate it see the gray sign so now it will not be running
simple do the same thing with this as well deactivate
so now it will just run this particular activity
and I want to show you the output of this
because that is required to be passed in this particular activity the ForEach activity
because we are referring to this activity
so let me debug it and let me show you
I know things are getting more and more clear with time
so just be with this part for a few more minutes and everything will be crystal clear
simple because this is the major part
and I also faced many difficulties when I was learning so do not worry at all
you can just rewatch this part again and again
but once you master this thing trust me your ADF skills are gonna skyrocket
so it has succeeded it has succeeded
let me show you the output of it how you can just have a look at the output
just click on this button the exit button the little one click on it
now you can see this is the output
see now under the value tab the value key we have all the required information we want
this is the thing that we want to use
because what we'll be doing as you can clearly see let me just show you
this is a list and we want to pass a list in our ForEach activity
so what will we be doing just comment down if you know
yes you guessed it right so we will be using this value key
this value master key and this will be used as an array that we'll be passing in this ForEach activity
so it will just iterate through these entities that it has
like p relative URL then you can say sink folder sink file that is one entity
then other one then next one then next one
so this output will be used as an array
so how can we do it it is very simple let's close this and just activate this
okay so if we want to connect any activity like this one with any other activity
we need to select the nodes
these are the nodes like skipped succeeded failed and completed
so we will be using succeeded if this activity succeeded connect to this
and now it's time to put the items
so under items what will we write we will write dynamic content
why because this is a kind of system function that we are using we are not hardcoding anything
so just click on add dynamic content and just pick activity outputs
because we want to use the output of an activity as our items
so in our scenario we have three different activities but what do we want
yes you guessed it right the Lookup activity
click on this oops dynamic content click on this
now you just need to add one small thing and that was the reason I showed you the output
if I write here just output it will not do the work why
because output on its own is not what we need
we want to pass just the array and the array was stored inside the value key
it had so many keys but we just want the value key
so we will write output.value
this was the catch and this was another interview question
because these are small small things
and you cannot find these things in the you can say PDFs that are available for interview questions
because these are scenario based questions you learn when you actually do it
so we will be using dot value click on okay
now it is getting the array from this output it is done it is done
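The items expression takes the form `@activity('<lookup activity name>').output.value`. The sketch below mimics that selection in plain Python to show why `output` alone is not enough: it is an object with several keys, while the loop needs the array stored under `value`. The activity name `LookupGit`, the extra key and the rows are assumptions for illustration.

```python
# Assumed activity name; the expression typed into Items would have this form.
ITEMS_EXPRESSION = "@activity('LookupGit').output.value"

# Simplified stand-in for what the pipeline run records per activity.
activity_outputs = {
    "LookupGit": {
        "output": {
            "count": 2,
            "value": [
                {"p_sink_folder": "Returns"},
                {"p_sink_folder": "Products"},
            ],
            "otherMetadata": "placeholder for the remaining keys",
        }
    }
}


def foreach_items(outputs: dict, activity_name: str) -> list:
    """Select only the array under output.value for the loop."""
    return outputs[activity_name]["output"]["value"]


items = foreach_items(activity_outputs, "LookupGit")
print(type(activity_outputs["LookupGit"]["output"]).__name__)  # dict, not iterable row by row
print(len(items))  # number of loop iterations
```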
now rest of the steps are really really easy
because now we just need to pick this copy activity and embed it within ForEach
and then we will be passing this loop information within those parameters
simple so let me show you how you can do it
first of all press Ctrl X so it will just cut the activity
it is simple Ctrl X Ctrl C Ctrl V don't make it complicated
click on the ForEach activity and as you can see here is the activities tab
so it is asking us to provide some activities that we need to perform under each iteration
click on activities and click on this pencil sign this one
and it will just open a new canvas and now we are inside the ForEach activity
yes so now we will simply paste our copy activity
and simple now we have source yes we have source
we need to give the value of relative URL
yes now we have the value because now values are coming from this for loop
so how can we do it simply click on it again dynamic content because we are using parameters
click on it and then you just need to use the item
so as we just discussed let me again show you this is one item
this complete dictionary is one item okay
and every time it will be having a different item 1 2 3 4 5
so it was just a quick recap
so we'll be using item then item has three parameters
but for source what is the name of our key it is dot p relative URL
simple simple simple simple now you got it click on okay this is done
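Inside the loop, `item()` stands for the current array element, so an expression of the form `@item().<key>` reads one property of that element. Here is a rough Python equivalent for the source side, assuming the key is spelled `p_rel_url` and using hypothetical paths:

```python
def for_each(items: list, activity) -> list:
    """Run the inner activity once per array element, in order."""
    return [activity(item) for item in items]


def copy_source_parameters(item: dict) -> dict:
    """Mimic @item().p_rel_url: read one key from the current element."""
    return {"p_rel_url": item["p_rel_url"]}


# Hypothetical array as returned by the Lookup activity.
items = [
    {"p_rel_url": "example/Returns.csv", "p_sink_folder": "Returns"},
    {"p_rel_url": "example/Products.csv", "p_sink_folder": "Products"},
]

for resolved in for_each(items, copy_source_parameters):
    print(resolved)
```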
done click on sync same thing just just do it on your own just do it on your own
do it on your own just do it on your own on click on sync and for sync folder
on click on sync and for sync folder same thing activity item item has three
same thing activity item item has three parameters which parameter we need to
parameters which parameter we need to pick which key we need to pick yes it's
pick which key we need to pick yes it's p sync folder now you getting things
p sync folder now you getting things right now the third one file name same
right now the third one file name same thing we want to pick the whole item
thing we want to pick the whole item within those item within that item we
within those item within that item we have three parameters which parameter we
have three parameters which parameter we need to pick it is dot
need to pick it is dot B file
B file name
Simple, simple. This was the end of the solution that we were expected to build. Just rewatch this part at least three times. Even if you have grasped all the knowledge, just watch it for a quick revision so that you can absorb everything. Now we are all set to run our pipeline, and we want it to succeed without any errors, and once it is done we will celebrate together. Don't worry, just click on the Debug button. Whoo! Click on it, and just fingers crossed, crossed, crossed. And now let's see what it gives. And trust me, once you are familiar with this concept, you can handle any, you can say, real-time scenario, any kind of question, because this involves logic building, this involves deep understanding.
Ooh, we see so many crosses. Okay, this was expected. No, I'm not kidding. Okay, what was the reason behind the failure? Yes, yes, yes: because we do not have p_file_name in the JSON file. I did it intentionally, so sorry for that, but I wanted to explain it. The thing is, as you can see in the JSON file, this one, we have the name p_sink_file, but our parameter has a different name. We can keep the names different. It doesn't matter if we have a parameter with one name and a key with a different name; we can still use it, we can give any key to it. So let's correct it and then see all the greens instead of red. I hope you understood the scenario.
Just open your Copy activity, this is your Copy activity, and instead of p_file_name you need to give p_sink_file, because that is the thing we are receiving from the ForEach items. This is a confusion that many candidates face, so I did it intentionally. So sorry for that, but it is for your betterment. Now you know these kinds of errors can occur, and it is not compulsory to keep the parameter name and the key name exactly the same. No, not at all. So just write p_sink_ and then file. Simple.
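To make the failure and the fix concrete, here is a minimal Python sketch of what the ForEach plus @item() lookup is doing. It is not ADF code: the resolve and run_foreach helpers and the sample values are made up for illustration. The key names p_sink_folder and p_sink_file and the parameter name p_file_name come from the walkthrough, while p_rel_url is an assumed spelling of the relative URL key.

```python
# Hypothetical stand-in for the JSON array that the ForEach activity loops over.
# The values are invented; only the key names follow the walkthrough.
items = [
    {"p_rel_url": "data/calendar.csv", "p_sink_folder": "calendar", "p_sink_file": "calendar.csv"},
    {"p_rel_url": "data/customer.csv", "p_sink_folder": "customer", "p_sink_file": "customer.csv"},
]


def resolve(item, mapping):
    """Mimic @item().<key>: read each dataset parameter from a key of the current item."""
    missing = [key for key in mapping.values() if key not in item]
    if missing:
        # A reference to a key the item does not have makes the activity fail.
        raise KeyError(f"item has no key(s): {missing}")
    return {param: item[key] for param, key in mapping.items()}


def run_foreach(all_items, mapping):
    """Resolve the dataset parameters once per item, like one Copy activity per iteration."""
    return [resolve(item, mapping) for item in all_items]


# Dataset parameter name -> key read from the item.
# The parameter is called p_file_name, but the JSON key is p_sink_file.
broken = {"p_rel_url": "p_rel_url", "p_sink_folder": "p_sink_folder", "p_file_name": "p_file_name"}
fixed = {"p_rel_url": "p_rel_url", "p_sink_folder": "p_sink_folder", "p_file_name": "p_sink_file"}

if __name__ == "__main__":
    try:
        run_foreach(items, broken)
    except KeyError as err:
        print("first run failed:", err)
    for params in run_foreach(items, fixed):
        print(params)
```

The point of the second mapping is that the parameter name on the left and the key on the right are independent: only the key has to exist in the item.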
Okay, let's go here, and now let's run this. Now it should run fine, and if an error comes now, it is not intentional, and then we just need to debug it. Let's see, let's see. Errors are part of learning. If you are not seeing any errors while you are learning, that means you are not learning, you are just copy-pasting. When you try to build your own logic, when you try to use your own brain, you face errors, and once you know how to overcome the errors you become better and better. I love seeing errors, trust me. So don't worry at all: if we see errors right now as well, we will debug it. We will debug it, bro, don't worry at all. Let's see, let's see.
Hmm, okay, I can see two greens. Let me see more greens, more and more greens. I was really, really scared, because we obviously need to debug it if we see an error, and because last time was intentional, but this is not intentional. Let's see. But trust me, once you are familiar with this... Why am I repeating this again and again? Because this is really crucial. When I was learning it, I thought to skip it, "let's do it later," but I realized later that these things are really important, and these are the things that make you a pro. That is the reason you are doing real-time scenarios, real-time projects, end-to-end projects. You did your tutorials, you completed your tutorials; now it's time to gain the pro-level skills.
So on my channel you can expect these kinds of scenarios, these kinds of projects and tutorials, because I love messing with my projects, so you will learn a lot. I think that is a signal to just hit that subscribe button. It is for your betterment, because this channel is full of knowledge, full of support for Azure data engineering, Databricks, end-to-end projects, tutorials, and much more. And we'll be having fun like this, because I do not like boring classes. I was the kind of student who was studious but a backbencher at the same time, so you can just imagine, and I like having that environment when I study, so don't mind me.
And I can see many greens; just one more activity is pending. Let's see the status. And tell me in the comment section what color you are seeing. Just be honest, is it red or green? Be honest, because I saw many reds. Oh, one more succeeded.
Okay, and after this our first phase of the project is completed. Then we will be jumping to the next phase, that is Databricks, and I hope you learned a lot. Bro, everything has succeeded! See all the greens, all the greens. I love it, I love it. So now it's time to validate it. How can we validate it? We will just go to the bronze layer, and we should see all the folders. Let me go to my bronze layer. This is my Home tab, this is my storage account. Let me go to Containers, and within that we have the bronze container. I have all the folders, bro, I have all the folders, as you can see. And in those folders we should have a CSV file, like calendar.csv, customer.csv, all the CSV files.
So now it's time to celebrate. Not that kind of celebration, just a healthy congratulations to everyone who has completed this stage. Trust me, you have learned a lot in ADF so far if you have built this dynamic pipeline, so just pat yourself on the back. Good job, good job.
So now the first phase of our project is done, and we are entering phase two of the project. Why am I doing this? So, now we are entering phase two of the project. Are you excited to use Databricks? Let me show you how you can use Databricks in your project. Let's see. So now it's time to create another resource, that is Azure Databricks. Show some excitement for Azure Databricks! Azure Databricks is really in demand, that's why I'm saying it, bro. So without wasting time, let's get started with Azure Databricks. Click on the Home button, and obviously now you know the step: click on + Create a resource, and this time just type Databricks. Okay, search it.
And you should pick this Databricks, the one that is from Microsoft. Oops, let me just retype it; this Escape button is like hell. Okay, click on Create, this one, and again the same steps. Pick your resource group; the resource group is the AW project one. Workspace name, I would say ADB AW project. Hmm, nice name. Then you need to set the pricing tier: Standard, Premium, or Trial. You can choose any one. If you are using a free account you can just keep it as Trial, because you do not need to keep your data after 14 days, right? It is just for practice. So no need to worry, just keep anything. I'm keeping Premium.
And managed resource group: what is this managed resource group? The thing is, in Databricks we run all our transformations on clusters, but we are not the ones who look after the clusters; Databricks looks after them. So it will create a managed resource group where it will keep all the VMs (what are VMs? virtual machines) and the VNet (what is a VNet? virtual network). So all the cluster configuration will be stored in the managed resource group. The managed resource group is just a resource group, so I recommend giving it the same name as the workspace, so you can easily find it. Do not take much stress, just simply write managed and then ADB and AW project. Simple.
Then click on Next: Networking, and just click on Next: Encryption, click on Next: Security and compliance, next, next, and Review. Click on Review + create. It will just validate it and then we will create it, so click on Create.
I was laughing because I was remembering my childhood days, when I had just bought a computer, and it was the very first computer in my family. My friends told me: if you want to install any game, just click on next, next, next, and at the end click on Install.
So it was a kind of nostalgia that hit, because we never read anything that was written in those games' instructions and all. We just used to hit next, next, next, next, next, and at the end Install. Then we used to play computer games without internet.
Okay, so now it is being deployed, and once it is deployed, it's time to actually go deeper inside Databricks. Let's wait, it will take just a few more seconds and then it is done, and then we will be creating our first cluster, notebook, workspace, everything, everything from scratch. And after covering this part you can confidently mention Databricks in your resume, because after that you will be much more familiar with Databricks, and I will show you some of the transformations as well, so that you can confidently sit in the interviews and confidently answer, "Hey, I know how to perform these transformations."
It's just a game of confidence, that's it. No one knows everything, not even the interviewer, and you are also human. Just be confident in the interviews, that's it. If you do not know something, just say, "I don't know this thing, but I can still solve it, because I have the skill to debug anything." Because in a real-world scenario you will not be remembering everything and using only your brain in most of the scenarios. No, you will be using online resources, you will be using documentation. So you can just argue with the interviewer: "Hey, I don't know, but if you allow me I can go on Google and give you the solution, like this." It's all about presenting yourself, it's all about confidence.
"Okay, stop your philosophy and just start working on Databricks." Yep, I know you are saying this. Click on Go to resource, and then let's see what this thing is. Then again, same thing, we need to launch this workspace. Click on Launch Workspace. Hmm.
Okay, okay, so now it will take us to the Databricks workspace. Whoa. So this is the UI. By the way, they have just updated it. It used to be black here, like darkish black or darkish gray, but now it's white. I used to like the previous one. Okay, so this is the UI, this is Databricks, this is the most in-demand technology right now. Don't worry, we will be covering all the information that is required. So first of all, the very basic thing, that is Compute, because we will be applying transformations on the data, and we will be processing our data using compute. Compute is nothing but a cluster, so without wasting any time let's create the cluster first.
And obviously the other things are very common. Workspace: just like we created a folder in Azure Data Factory, it is the same thing, here we have workspaces. It's just a naming convention, the meaning is the same. Then we have Catalog, where we define the data catalogs, external locations and all; don't worry about that for now, just ignore it. Then we have Workflows, if you want to apply, you can say, orchestration. And then SQL Editor, Queries, Dashboards, Alerts, Job Runs, Delta Live Tables. Forget about those for now. Just create compute and we will start working with it, don't worry.
Click on Compute, and then you will see this interface. Obviously you won't have any compute ready, so just click on + New, so click on Create compute. Now this is the configuration that we need to complete to create a new cluster for our workloads.
So first of all, this is Policy, and by the way, you can change the cluster name as well, so maybe I can keep it as AW project; this is my cluster, AW project cluster. And keep the policy as Unrestricted. Yep. And obviously Single node, because we do not want to spend much, and we can easily process this much data using a single node, so just keep Single node. Okay, and Access mode is Single user; you can keep it as No isolation shared if you want.
Then you need to pick the runtime. This is the runtime. You can pick any runtime with long-term support; LTS means long-term support. Currently we have 15.4 LTS. Maybe free account users will not have this one, so for that reason I'm picking 14.3, this one. If you have 15.4 you can just try with it, don't worry, it's the same. And Node type: again we need to pick the node type. For the cheapest one, I'll be using General purpose DS3 v2, in which we get four cores, and it's enough.
Then we have Terminate after. It is really important if you want to save cost. Obviously if you are using a free account you do not need to worry, just use it, but if you have a paid account like me: it will automatically terminate the cluster if it is not in use for, like, 120 minutes. But 120 minutes is a big time range, so I will keep it as 20 minutes. Okay, then Tags: I don't want to add any tags. I also do not want to use Photon acceleration, so I will just use a basic cluster. Just click on Create compute here, and that's it. So now, as you can see, it is creating our compute, and you can also check it using the Spark UI. Okay, it's creating now.
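For reference, the choices made in the UI above can be written down as one specification. The sketch below is a plain Python dictionary that uses field names as I understand them from the Databricks Clusters REST API; the video itself only uses the UI, so treat the mapping as an assumption. The cluster name spelling is hypothetical, and single_user_name (which the API expects alongside single-user access mode) is left out because the transcript never shows it.

```python
import json

# Sketch of the cluster settings chosen in the walkthrough, expressed as a dict.
cluster_spec = {
    "cluster_name": "aw-project-cluster",   # hypothetical spelling of the spoken name
    "spark_version": "14.3.x-scala2.12",    # runtime 14.3 LTS
    "node_type_id": "Standard_DS3_v2",      # general purpose, 4 cores
    "num_workers": 0,                        # single node: driver only, no workers
    "autotermination_minutes": 20,           # "Terminate after" lowered from 120 to 20
    "data_security_mode": "SINGLE_USER",     # access mode: Single user
    "runtime_engine": "STANDARD",            # Photon acceleration not enabled
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}


def is_single_node(spec):
    """Return True when the spec describes a driver-only (single node) cluster."""
    profile = spec.get("spark_conf", {}).get("spark.databricks.cluster.profile")
    return spec.get("num_workers") == 0 and profile == "singleNode"


if __name__ == "__main__":
    print(json.dumps(cluster_spec, indent=2))
    print("single node:", is_single_node(cluster_spec))
```

Writing the settings down like this mainly helps when recreating the same cluster later; for this project, clicking through the UI as shown is enough.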
Just wait. And now, once our cluster is ready, we can actually start working with Databricks, because a cluster is the most important thing that we should already have in our Azure Databricks account. The thing is, the Databricks Community Edition provides a cluster, and obviously you can use that as well, but obviously not with Azure; that is just for Databricks. Don't worry, we will be creating videos for Databricks only in the future. But this is just for the project, and it is a totally different experience when you actually work with Azure storage or any cloud storage with Databricks, because in Databricks alone you will not get that vibe of connecting your data with the cloud, pulling that data, pushing that data. That's why it is really important to have this exposure as well, and when you are getting the opportunity to work with these areas for free, you should take advantage of it, right? And here I am, your friend, your bro, to help you and guide you on how you can use it, because when I was learning I was referring to so many resources, so now it's time to provide everything in one place. Yeah, simple, that's the whole intent.
So first of all, congratulations on completing phase one of our project, which was ingesting the data dynamically. You should be proud of yourself, because you learned how to build dynamic data pipelines in Azure Data Factory. So our phase one is completed, good job. Now we need to cover the next phase, and the next phase is all about Azure Databricks, this masterpiece. As we all know, Azure Databricks is one of the most in-demand technologies right now if we want to deal with big data, so we will be learning everything, don't worry about that. This particular phase of the project will be full of Databricks, Databricks, and Databricks, because this is the sole tool responsible for data transformations in our project, plus it is a go-to tool for data transformations in the real world as well. I decided to pick this technology because it really helped me a lot to crack data engineering interviews. Yes.
So what can you expect in this particular phase of the project? We will pick the data that is stored in the raw layer, then we will ingest this data into the silver layer with some transformations. This is what phase two of the project is all about, and this phase will be full of fun, so don't worry at all. Without wasting any time, let's get started, because we have already created an Azure Databricks workspace. I'm really, really excited to tell you the ins and outs of all the things that you should know before working with Databricks. Yes, because there are some things that are really important to know before actually working with Databricks, and yes, it's very much relevant, because you cannot start working with Databricks if you do not know this. What's that? Let's cover this.
So I was talking about storage access. What's that, what's that? Don't worry, let me explain. It's very simple, but it's really important, because you as a data engineer will be working on this part, and you should take ownership of it. So let's start. As we all know, our data is residing in the data lake. Sorted? Yes. Now we want to work with Azure Databricks. Okay, sorted. But just tell me one thing: this thing, this Azure Data Lake, is a resource that can be used as an independent resource, right? It is not linked to Databricks. So how will we use Databricks to access the data stored in the data lake? Obviously we need some permissions, some access-level permissions, some authority, some credentials to access the data stored in the data lake.
And how can we do that? Here comes the role of data access, that is, access to the data stored in the data lake. What is it? It is a kind of key. So what exactly will we be doing? We will create an application, which is known as a service-level application. This application will have access to this Azure Data Lake, okay? And this application will be used by Databricks.
application will be used by data bricks so let me give you an example let's
so let me give you an example let's suppose you want to enter into a museum
suppose you want to enter into a museum and you should have a ticket to visit
and you should have a ticket to visit that museum right so this application
that museum right so this application will act as a ticket will act as a
will act as a ticket will act as a credential that you should have u means
credential that you should have u means data breaks datab bricks should have
data breaks datab bricks should have this credential and then when this data
this credential and then when this data bricks will be
bricks will be going to a your data Lake and will say
going to a your data Lake and will say hey data Lake I want to access the data
hey data Lake I want to access the data that you have stored so it will say just
that you have stored so it will say just show me the ID card because I cannot
show me the ID card because I cannot allow everyone because I am the safety
allow everyone because I am the safety guard here then it will show this
guard here then it will show this credential you will say see this is my
credential you will say see this is my credential this is my ID card then data
credential this is my ID card then data L will say okay now you can enter into
L will say okay now you can enter into my zone and you can pick the data sorted
my zone and you can pick the data sorted this was the easiest example I can just
this was the easiest example I can just mention it here so this was all about
mention it here so this was all about data access and it's really easy let's
data access and it's really easy let's jump onto the asure portal and let's
jump onto the asure portal and let's actually execute this so this was all
actually execute this so this was all about the architecture or you can say
about the architecture or you can say behind the scenes and you are the owner
behind the scenes and you are the owner of this step so you need to do this so
of this step so you need to do this so let's get started to implement this and
let's get started to implement this and it will be official step towards our
it will be official step towards our phase two of projects so let's get
phase two of projects so let's get started bro so I am in my as your portal
So I am in my Azure portal. The first thing that you need to do: simply go to your search tab and search for Microsoft Entra ID. See, just click on this one, and then you will see an area where you have different things such as Users, Groups, External Identities, Roles and administrators. Do not worry about those; just focus on this part, App registrations, because we will be registering an application. Simply click on it, and here you just need to click on + New registration. Click on it.

Okay, so here we just need to name it. Let's say I want to call it AW project application, or just app. Simple. Then you don't need to change anything; just click on Register. That's it. It will create an application. And what will we be doing with an application? Let me tell you, bro, just take a deep breath.

So now our application is created. Very good. As we know, our second step is that we need to assign this application a role which can contribute to our data lake, right? Before doing that, I want to save this information. Which information? This information. Why? Because it will be needed: when we pull this application into Databricks, we need to pass these values. So let me store this information. Let me just copy this, open a notepad, and save it there. Let's create a ...

So I copied this, and this is the application ID, so I will simply write app ID equals this. Okay. Then I need to copy the object ID. As you can see, Object ID; I will simply click on Copy to clipboard and then write object ID. Don't worry about this, just copy it and save this information.

And the third thing is we need to create a secret. Why? Because it will be used in Databricks to pull this information. How can we create a secret? Simply click on Certificates & secrets, and then New client secret. Click on it and give a description, let's say AW project. Simple. The recommended expiry is 6 months, but I believe that my viewers, my data lovers, my data community will complete this project before six months, so do not worry at all. Just click on Add.

And here we have the secret, and we just need the value of this secret. So simply copy the Value, don't copy the Secret ID. Just click on Value and save it, and I will name it secret. I'm also not sure whether it was Value or Secret ID. We will test it, don't worry about that, because we just need to store the information. Let's store this one as well, because precaution is better than cure. Let's call it secret two. Simple. If the first one does not work, we will use the second one. Don't worry. Simple.
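The values saved in the notepad can be pictured as one small record. The sketch below is only an illustration: the class name, field names, and the GUIDs are made up, and in a real project the secret would normally live in a secret store rather than in plain text. Two things are worth knowing here: the sign-in endpoint used later expects the Directory (tenant) ID, which is shown on the same Overview page as the Object ID, and the client secret that works is the Value, not the Secret ID.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AppCredentials:
    """Hypothetical container for the values copied from the app registration."""

    application_id: str  # "Application (client) ID" on the app's Overview page
    directory_id: str    # "Directory (tenant) ID" on the same page
    client_secret: str   # the secret Value, not the Secret ID

    def __post_init__(self):
        # Both IDs are GUIDs; uuid.UUID raises ValueError for anything else,
        # which catches an accidental paste of the wrong text.
        uuid.UUID(self.application_id)
        uuid.UUID(self.directory_id)
        if not self.client_secret:
            raise ValueError("client_secret must not be empty")


# Placeholder values only; use the ones from your own app registration.
creds = AppCredentials(
    application_id="11111111-2222-3333-4444-555555555555",
    directory_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    client_secret="<secret-value>",
)
```

Keeping the three values together like this makes it harder to mix them up when they are pasted into the notebook later.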
Okay, so we have done our first step. Step number one was creating an application. Let's go to the second step: assigning a role to this application so that it can access the data lake. How can we do that? Simply go to your Home tab and select your storage account, because we need to assign the role in this storage account. Simply click on Access control (IAM). Then I can see Add, because I want to add a role, so I will click on Add, then Add role assignment. Simple.

Now, there are so many roles. The role that we want to apply is Storage Blob Contributor. It gives both read and write, so you can either write or read; it is the best access that you can give. So just search for storage blob contributor. Here it is: Storage Blob Data Contributor, same thing. Click on Next.

Now it is asking to select members, so click on Select members, because now we need to select our application. What we will do is simply search for AW project. Yes, here you can see your application. Simple. Just pick this and click Select, and then click on Review + assign. Click Review + assign again. Now it will be adding the role; you can see the notification, Adding role assignment. It will take some time, and I think it takes 10 to 15 minutes to be applied as well. I know it looks like it is already applied, but it takes some time. Don't worry, we have some work before that.
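The video does this step by clicking through the portal. For readers who prefer a script, the same role assignment can be expressed as an Azure CLI call. The sketch below only builds and prints the command; it assumes the Azure CLI is installed and signed in if you choose to run the printed line, and the function name, subscription ID, resource group, and storage account name are all hypothetical placeholders.

```python
import shlex


def role_assignment_command(app_id, subscription_id, resource_group,
                            storage_account,
                            role="Storage Blob Data Contributor"):
    """Build an `az role assignment create` command scoped to one storage account."""
    # The scope is the resource ID of the storage account.
    scope = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )
    return [
        "az", "role", "assignment", "create",
        "--assignee", app_id,   # application (client) ID of the registered app
        "--role", role,
        "--scope", scope,
    ]


# Placeholder values; replace them with your own before running the command.
cmd = role_assignment_command(
    app_id="11111111-2222-3333-4444-555555555555",
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="my-resource-group",
    storage_account="mystorageaccount",
)
print(shlex.join(cmd))
```

Scoping the role to the single storage account, rather than to the whole subscription, mirrors what the portal steps above do.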
So our two steps are done. Now comes the third step, and for that we need to go to Databricks. That's the loveliest part. So simply open your Databricks workspace and click Launch Workspace.

Okay, and you already know that we have already created the cluster, so that is the best thing: our cluster is ready. It automatically turns off after some time, so if your cluster is turned off you can turn it on; if it is already turned on, good, no worries at all. And now it's time to actually start working with Databricks. Are you excited to create your first notebook? Because I was really excited when I created my first notebook, and I hope that my data lovers will be happy as well.

So simply click on Workspace, because first of all we will create a folder, a workspace for us. Obviously this folder is empty, because we do not have anything in the workspace yet. Click on Create in this corner; we need to create a folder. What name do I want to give? I will give AW project. Simple. Create. Now, within this AW project folder, I want to create a notebook. So again click on Create, and this time I will create a notebook. Woohoo, this is our notebook.

Obviously, if you are familiar with pandas, if you're familiar with Jupyter, you must be familiar with this user interface. If you are not familiar, don't worry: this is a kind of, you can say, compiler like you would have seen in VS Code, but instead of running the whole code we run the code in chunks, in the cells. You will get to know everything, don't worry about that.

First of all, I like renaming my notebook, because that's the best thing you can do to navigate your notebooks in the future; it is easy to forget to name it. So I will say Transformations, or let's say silver layer. Simple. Okay, so I have renamed my notebook. This is File, Edit, View, Run, Help, and this is the cell in which we will be writing our code. I know you are familiar with this.
Now, first of all, I will turn on my cluster, because it was turned off. How can I do this? You can see a button called Connect. I will click on it, and it will show all the clusters that are available; in my case it is just one, so I will click on it and start it. So now you can see Starting. It will turn on my cluster, and until it is turned on I will just add some headings and clean up my notebook, because I am planning to upload this notebook to my GitHub repository as well, so that you guys can keep my notebook in parallel with yours and refer to the code. It will be good for your learning. See, I do care about you all so much.

So first of all I will add a heading. How can we add headings, when a cell expects us to write code? This is the best thing that I like about notebooks: we can add headings and titles as well. There are two ways. One way is to hover over your code cell, and you will see an option called Python. Just click on it and you will see multiple things: Markdown, SQL, Scala, R. That means you can use all these languages. But wait, Markdown is not a language. When you click on Markdown, you will see that now you can put headings.

Let's say I want to write silver layer script. Simple. And if I want to make it a bold heading, I can just add hashtags. When I put three hashtags, that means it is an H3 heading in HTML, if you are familiar with HTML. If you want to run it, simply press Shift+Enter. Simple. Now, as you can see, it gave us the heading.

So this is our heading, and now I will put another heading, let's say data loading. Okay, press Shift+Enter. Another heading. Let's make this heading a little bigger, so just remove two hashtags. See, now it is an H1 heading.
So everything is done, and now I think my cluster is about to start. Once it starts, we can start loading our data. And one important thing: yes, our step three is still pending. Before data loading, we need to create our application in Databricks. Basically, we do not create the application; we just write some code in which we pull in the credentials of the application and use that application within our code. Don't worry, you do not need to learn any code. All that code is available in the documentation, so you just need to copy and paste it. Let me show you how you can do it.

First of all, I will add one cell here in between, so you can click on + Code, then click on Markdown again, and this time I will say data access using application. This time I will hit Alt+Enter. What it does is run the cell and add a cell below it. So it's up to you whether you hit Shift+Enter or Alt+Enter; my duty is to tell you each and everything.
So now it's time to look at the documentation. In order to get the code, simply search on Google for access data lake using Databricks, and then look for the documentation page like this one: Connect to Azure Data Lake Storage Gen2 and Blob Storage. Simply click on it. Basically, whenever you want to access the data, you just need to copy the code from the documentation, because you do not need to learn all that coding; it is just for the purpose of data access, that's it.

When you scroll down, you will see the service principal section. This is the thing that we were looking for, so you simply need to use this one. It says to use the following format to set the cluster configuration; we have already set the cluster configuration, because Databricks does that for us. Okay, so now we just need to copy this code. Simple.
So I will click on Copy, go to my Databricks notebook, and paste it here, like this. Do not run it. Wait, do not run it, just paste it here, because we need to replace some values. Why, bro? This is generic code: it is asking for a secret, it is asking for a storage account, it is asking for some other things such as the application ID. So we need to pass the values that we have. Do not worry, I will tell you how you can do that.

First of all, remove this one, service credential. Just remove it, because we have already saved the credential in our notepad, don't worry about that. First thing, storage account. Let me just highlight it; do not make any mistake. Replace this storage account placeholder with your storage account name. For the storage account name, you can simply go to Azure, go to your Home tab, and get the storage account from here. Do not copy and paste my storage account; you need to put your own storage account name. So I will simply copy it from here and paste it here, Ctrl+V. Then again I need to paste it here. Okay, then again here, no worries, again here, and again here. Okay.
Now, along with this, we need to pass two more things. Application ID: as you already know, we saved it, so the application ID is this one. Again, do not copy my application ID, use your own. We will just replace it. Do not worry; once we have filled in all this information, you can pause the video and tally everything. Do not be in a hurry.

Then we need to pass the directory ID. The directory ID, I think, was this one, the object ID. Yes. Okay, if anything fails, don't worry, we can just go back and check the information again. It's not like we clicked back in the application and now we cannot go back. No, it's not like that; you can just go back. Just paste it here.

Now the service credential. The service credential is this one. Let's try this first; if it fails, we can use the second one. Okay. We just need to put it in double quotes. Simple.

So now I'm going to run this; you can take a screenshot or pause the video now. Let's see, if it throws any error we will tackle it together. Don't worry, because we are just using code that is already there. Oh, it ran successfully. Wow, wow, wow, wow.
So what have we actually done here? We have employed, or you can say called, our application here to allow us, meaning Databricks, to access the data, because Databricks cannot go and access the data directly. No, we are going indirectly, with a credential in our hand, and we are saying, "Hey, we have the required identity card, please allow us." 'Us' means Databricks: I am Databricks, so please allow me to access the data stored in the data lake. Simple. Sorted.
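For reference, the cell filled in above has roughly the following shape. This is a sketch based on the service principal snippet in the Databricks documentation, not a copy of the notebook from the video: the helper function name is made up, and every value in angle brackets is a placeholder for your own storage account name, application (client) ID, Directory (tenant) ID, and secret Value. The documentation version reads the secret from a Databricks secret scope; the video pastes it in as a plain string to keep things simple.

```python
def service_principal_conf(storage_account, application_id, directory_id,
                           client_secret):
    """Return the Spark settings for OAuth access to one ADLS Gen2 account."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": application_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        # The token endpoint is built from the Directory (tenant) ID.
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }


# Placeholders only; substitute the values saved in the notepad.
conf = service_principal_conf(
    storage_account="<storage-account>",
    application_id="<application-id>",
    directory_id="<directory-id>",
    client_secret="<client-secret>",
)

# In a Databricks notebook the `spark` session already exists;
# anywhere else this block is simply skipped.
if "spark" in globals():
    for key, value in conf.items():
        spark.conf.set(key, value)
```

The storage account name appears in all five keys, which is why it had to be pasted five times. As far as I know, setting these values does not contact Azure by itself, so the cell running without an error does not prove the IDs and secret are right; a wrong value would only surface when data is first read.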
So now it's time to test this connection. Obviously it will be tested once we load the data, so let's quickly read data set number one. What is our data set number one? Let me go to my storage account, click on Containers, then click on bronze, and first of all I will read this data, AdventureWorks_Calendar. Okay, AdventureWorks calendar.

How can we do that? First of all, let me write a heading. Here's the second way of putting a heading: just put %md. MD is the short form for Markdown, and the percent sign is a kind of syntax that we use; it's called a magic command. So I will say read calendar data, because I want to create a neat and clean notebook for you guys so that you can refer to it and learn better.
So first of all, let's read the calendar data. How can we do that? Let me show you the code;
it is very easy. Simply say df, or any variable name for your data frame: df = spark.read.format.
First of all, understand this: we are saying spark.read.format, then we just need to put the
format. The data is in CSV format, so just put csv. Then we need to say .option, and in the
option we want to set header to true, so we will say header, comma, True. Simple. Then another
option: inferSchema, comma, True. Don't worry, I will explain what all these terms are, but
first let me test this connection and load the data for you guys. Then I will explain what
these terms are for.
Then I will say .load, and now we need to give the location. There is a format for passing the
URL of the data stored in the data lake, right? Let me quickly put that here first; you can
just copy it from here. I will explain how you can remember this format, because sometimes it
becomes really confusing, and obviously an interviewer can ask you this question: what is the
format of the URL? This URL is not readily available in the data lake; you need to refer to
the Databricks documentation. I have had enough practice using Databricks and the data lake,
so I remember the format.
So let me just quickly write it here. So, @ and then the storage account name, which is
awstoragedatalake (don't worry, we will explain it in just a few seconds, just wait a few
seconds, okay), then .dfs.core.windows.net. Simple. Let me just run it. Hmm, okay, I can see
some errors. What is that error, bro? Oh, I see, let me mention the folder name as well. And
what was the folder name? AdventureWorks_Calendar. Let me just copy it. Okay, let's read this
again. Error: HTTP connection failed. Oh, I got it. It was not about that particular folder.
The connection that we made with this application failed. Now I want to dig deeper: which
piece of information did we put in wrong? What I think is that we have mentioned one of these
values wrong; that is 99% of my intuition. Let me check. Oh, I got it. The thing is, we copied
the object ID instead of the tenant ID. Just a silly mistake, and you know how to correct it.
Let me show you.
So now it's time to go to Azure, and let me show you from the beginning. Just go back to
Microsoft Entra ID and select App registrations, where we registered our application. Click
on All applications. Now here you see the list, so just click on the AW project application.
This is the one that I recently created; the other two are for my different projects, so
don't mind them. Just click on it and it will again show you the same page that we need. So
instead of the object ID we need to copy the tenant ID. Just a silly mistake. We just need to
replace this object ID, e70 something, with our tenant ID. Let's do that. Okay, so we need to
replace this value with this one. So let's rerun it, okay, and let's rerun this as well.
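To make the object ID versus tenant ID mix-up concrete, here is a minimal sketch of where the tenant ID ends up in a service principal setup. It assumes the OAuth client-credentials setting names documented for the ABFS driver on Databricks; the helper name `service_principal_conf` and every placeholder value are hypothetical, and the cell written earlier in the video may be laid out differently.

```python
def service_principal_conf(storage_account, client_id, client_secret, tenant_id):
    """Build the Spark settings for OAuth access to ADLS Gen2 (hypothetical helper)."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        # The tenant (directory) ID goes into the token endpoint. Pasting the
        # object ID here is the kind of mistake that makes the connection fail.
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values only; substitute the IDs shown on the app registration page.
conf = service_principal_conf(
    "awstoragedatalake", "<application-id>", "<client-secret>", "<tenant-id>"
)

# In a Databricks notebook `spark` already exists, so the settings could be applied with:
# for key, value in conf.items():
#     spark.conf.set(key, value)
```

Keeping the settings in one dictionary makes it easy to print and eyeball each value before applying it, which is how a swapped ID tends to get spotted.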
Now I should see some data. If not, we will tackle the errors, because that's how you become
a pro. Feel happy when you see errors. Really. I may be the first developer you have seen
saying this, but trust me, it will help you a lot in your learning journey. It's better to
see errors in your room than to see errors in front of interviewers, right?
So as you can see, we have successfully loaded the data. In order to display the data we have
a command called .display. Just write df.display() and it will display the data. As you can
see, we have just one column in our data source, so obviously it gave us only one column.
Congratulations, first of all. Congratulations.
Now let me explain what we have done in our reading code. The code is really simple, so let
me break it down for you all. First of all, we have already covered that this is the format,
right? Okay, this is the header. Obviously you are aware of headers: we set header to True
because we want the first row to be used as the column names. Then what is this inferSchema,
bro? By default, when data is stored as CSV, Spark reads all the columns as text (string)
columns. So we want Spark to infer the schema. What does that mean? Decide the schema on your
own, based on your own intelligence: just look at the data and decide the schema. We do not
want to explicitly define the schema again and again, and we do not want to pass any argument
for the schema. Based on its, you can say, intelligence, Spark can predict the schema. That's
it. It's very handy when you do not want to give a schema for your data frames again and
again, so I use inferSchema a lot. Then this is the load command. As the name suggests, we
need to load the data.
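Putting the pieces just described together, a minimal sketch of the reading code could look like this. The wrapper function `read_csv` is hypothetical (the video writes the chain directly in a notebook cell), and the path in the usage comment is assumed from the container, storage account, and folder names mentioned in the walkthrough.

```python
def read_csv(spark, path):
    """Read a CSV folder into a DataFrame using the options from the walkthrough."""
    return (
        spark.read.format("csv")
        .option("header", True)       # use the first row as column names
        .option("inferSchema", True)  # let Spark scan the data and pick column types
        .load(path)
    )


# In a Databricks notebook, where `spark` is predefined (assumed path):
# df = read_csv(spark, "abfss://bronze@awstoragedatalake.dfs.core.windows.net/AdventureWorks_Calendar")
# df.display()
```

Without inferSchema, every column of a CSV source comes back as a string, so date or numeric functions would need an explicit cast first.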
Now don't worry, I'm coming to your favorite part: how can we remember the format for the URL?
It's very simple. The format is, let me show you, if you look closely... First of all, I will
just take it to the next line, because I don't like keeping all the code on one line. Yeah, I
know I was the one who wrote it, but still, sorry. So yes, you can shift your code to the next
lines by adding one small thing, which is this backslash. Do not put a forward slash; the
forward slash is the regular slash. The backslash is the key that is available just above the
Enter key, so use that slash. Simple, simple, simple. Now you can move your code to the next
line; it will allow you.
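As a small illustration of the backslash tip, here is a sketch in plain Python so it runs anywhere; the string being processed is just a stand-in for the longer spark.read chain. It also shows the parenthesized form, which is an alternative not used in the video.

```python
# Backslash continuation: the backslash must be the very last character on the line.
text_a = "AdventureWorks_Calendar" \
    .lower() \
    .replace("_", " ")

# Wrapping the expression in parentheses allows line breaks without backslashes.
text_b = (
    "AdventureWorks_Calendar"
    .lower()
    .replace("_", " ")
)

print(text_a == text_b)
```

A stray space after a backslash is a syntax error, which is one reason the parenthesized form is often preferred for long method chains.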
So what is the format, bro? The format is very simple. First of all you need to write
abfss://, which is the scheme for the Azure Blob File System driver used with the data lake.
Then put your container name; our container name is bronze. Then put @ and the storage
account name, and our storage account name is, my, not our, my awstoragedatalake. Simple.
Then you need to put this part as it is: .dfs.core.windows.net. Simple. And then a slash and
your folder name. Simple. This is the format. I know, do not feel overwhelmed, because
obviously when you are just starting to work with Databricks it is very difficult to remember
the format, but you can just copy and paste it from here. Over the course of time you will
become familiar with it, and you will not need to look it up or copy and paste it. You will
just look at the storage account name and container name and form your URL without copying it
from anywhere else. But for now just copy it from here; I have paused my video so you can
take it. Okay, sorted.
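One way to remember the URL format is to build it from its parts. The helper below is a hypothetical sketch, not something written in the video; the container, storage account, and folder names are taken from the walkthrough as spoken, so the exact spelling is an assumption.

```python
def abfss_url(container, storage_account, folder=""):
    """Assemble abfss://<container>@<storage-account>.dfs.core.windows.net/<folder>."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    return f"{base}/{folder.strip('/')}" if folder else base


calendar_path = abfss_url("bronze", "awstoragedatalake", "AdventureWorks_Calendar")
print(calendar_path)
```

Only the container, the storage account, and the folder change between data sets; the scheme and the .dfs.core.windows.net suffix stay fixed.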
Now let's load the other data frames as well, because we have successfully loaded the
calendar data. Now it's time to load the other data sets. What we can do is simply copy and
paste the code that we used for reading the data; it will save us a lot of time. But keep one
thing in mind: add a suffix, let's say df_cal, so that it is identifiable in future that we
are reading calendar data, right? Just do this, and delete this, because we will be
displaying the data together. So instead of "Read Calendar Data", let me call the heading
"Data Reading", so it is generic for all the data reading, right? Then let me copy this code
and read the next data set, which is, I think, customers. Let me check: click on Containers
and then bronze. Yeah, it's customers. So I just need to change the folder name, because the
rest of the things will be exactly the same. And as this is customers, I want my data frame
to be known as df_cus. So I will just read it. Yes, successful. Then I will read another data
set, which is product categories. Okay, so I just need to say product categories. Simply run
it. Oops, I forgot to change the name, so I will say df_procat. Okay, let me rerun it, and we
need to rerun this as well. That's it.
Then the next one. This is just a data reading step, so we are reading all the data sets one
by one. For this one we just need to say products, so I will say df_pro. We can remove this;
it is products, so we just need to add an s. Simple. Then another one: returns. Simply run
this as well. Then we have sales 15, 16, 17. Okay, we'll do that. Simple. Let me show you one
thing. Instead of writing 15, 16, 17 again and again, what I can do is just put an asterisk
after this. Let's try this. It ran. So what have we done? We have told Databricks to pick all
the folders that follow this naming convention, AdventureWorks_Sales, whether it is 15, 16 or
17. It will pick all the files. Just a pro tip, and a potential interview question as well:
how can you read the files from multiple folders in one go?
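The asterisk trick is a glob pattern: the path is matched against folder names, and everything that matches is read into a single data frame. It is a different mechanism from Spark's `recursiveFileLookup` option. The sketch below uses Python's `fnmatch` only to illustrate which names such a pattern would select; the folder names are assumed from the spoken description, and Hadoop's glob rules are similar to, but not identical with, `fnmatch`.

```python
from fnmatch import fnmatch

# Assumed folder names in the bronze container.
folders = [
    "AdventureWorks_Sales_2015",
    "AdventureWorks_Sales_2016",
    "AdventureWorks_Sales_2017",
    "AdventureWorks_Returns",
]

pattern = "AdventureWorks_Sales*"
matched = [name for name in folders if fnmatch(name, pattern)]
print(matched)

# In the notebook the same pattern goes straight into the load path (assumed path):
# df_sales = spark.read.format("csv") \
#     .option("header", True) \
#     .option("inferSchema", True) \
#     .load("abfss://bronze@awstoragedatalake.dfs.core.windows.net/AdventureWorks_Sales*")
```

Because all matching folders land in one data frame, this only works cleanly when the files share the same columns.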
See? So now we can do territories as well. Okay, territories, and I will set the data frame
name here. And I need to change this name as well, because this one is for sales. So let me
rerun it, and I need to rerun df_pro as well, because I had just reused that data frame name.
So do not make this mistake. If you do, accept it and correct it.
Then we have subcategories. Okay, okay, let's write it. Just one more, just one more:
subcategories. And then we have, oh, I saw an error; don't worry, we'll correct it. Then we
have products. Oh, don't mind this one, because this was the one that we pulled when we were
testing our data. I was wondering why this naming convention is so different from the other
ones. So do not mind this one, because we already have the products file. Let me take you
back to the Azure Data Factory phase: when we were testing our static pipeline, this is that
data, so ignore it. So I know we have an error. What is the error? I think I have messed up
the spelling. AdventureWorks subcategories, what is the spelling here? Let me just copy it
from here; I think I have messed up the spelling. Oh yeah, I didn't add "product". Okay, now
let me run it. Let me run this. What? Oh, it's just product subcategories; it's not
AdventureWorks subcategories. Okay, makes sense. We are humans, we can make mistakes. The
difference is that we can correct them, if we want to. If you don't want to, well, I should
not speak about it. Okay.
want I should I should not speak about it okay so now our all the data frames
it okay so now our all the data frames are loaded congratulations you have
are loaded congratulations you have successfully loaded all the data frames
successfully loaded all the data frames and now we will be performing some kind
and now we will be performing some kind of Transformations and simultaneously we
of Transformations and simultaneously we will be pushing this data to to to to to
will be pushing this data to to to to to to let me show you this to this
to let me show you this to this container which is silver right so are
container which is silver right so are you excited to learn some of the crazy
you excited to learn some of the crazy functions available in data braks so
functions available in data braks so let's transform our data to some extent
let's transform our data to some extent obviously we are not just doing hardcore
obviously we are not just doing hardcore transformation because these open data
transformation because these open data sets are already clean but we will try
sets are already clean but we will try our best to cover as many functions as
our best to cover as many functions as we can so that we can be familiar with
we can so that we can be familiar with Transformations or functions available
Transformations or functions available in spark and vice spark so let's start
in spark and vice spark so let's start transform Transforming Our data and
transform Transforming Our data and pushing to the silver layer and complete
pushing to the silver layer and complete our phase two as well
our phase two as well successfully love you my data Community
successfully love you my data Community let's do that let me just put a heading
let's do that let me just put a heading here and then we'll start doing that as
here and then we'll start doing that as well so this is
Transformations okay perfect so now first of all let's
So now, first of all, let's transform the calendar data. Now you'll be asking: hey, what do
we need to transform? It just has one column. Well, we will transform it. First of all I will
display the data: df.display(). LOL, it is saying that the data frame doesn't exist. What is
not defined? Really? We defined it. Let me rerun it. Yeah, now I can just say display; it
should work. Yep.
So now, as you can see, we just have one column. Do you want to learn some date functions?
Let me tell you. Let's say we have this date and we want to create a column called month,
which holds the month of every date, right? That can be the scenario; I have worked with
scenarios where we just need to fetch the month and year. So let's create two columns, month
and year, and by applying this transformation you will get to know the date functions, right?
This is a special edition just for you. Okay, so how can we do that? It's very simple.
So I will simply say df.withColumn. If you are not aware of .withColumn, it is a function
available in the Spark library, and with its help we create a new column or modify an
existing one. How does it tell the difference between modifying an existing column and
creating a new one? If I keep the column name the same, let's say I write Date here and we
already have a Date column, it will just modify that one. But we do not want that; we want to
create a new one. So in this scenario I will pick a new name that is not already in my table
or data frame, right? I will simply say Month, then a comma, and then we need to put the
transformation we want to apply to this data frame. I want to use a function called month.
Yes, we do have a month function within the PySpark library.
I just saw a fishy thing, because I didn't see a suggestion for this function. And you know
why? Because I didn't import the library. So the first thing you should always do is click on
+ Code and import the modules, like: from pyspark.sql.functions import *, which imports all
the functions, and from pyspark.sql.types import *, which pulls in all the types. Just run
this.
Yeah, successful. So now, when I write month, let me rewrite it for you, now I should see a
suggestion. See? This is a quick tip for you guys: you can identify errors before the
compiler does it for you, so you are your own compiler. So I will simply write month. Now I
need to pass a column: I want to fetch the month of which column? I will use col. col is the
column object that we use with a column name, right? Then we just need to put our column
name, which is Date. Simple, it is done. And if you want to save it, we will simply say
df_cal equals this. Simple. And if I want to display it, I can add .display() here. So now
you will see the column created. Perfect. As you can see, it has given us the month: one,
one, one, one, one, because this is just month one. If I scroll down I can see 8, 9, 10 as
well.
Similarly, I will create another column. I will use a backslash to go to the next line, as
you all know by now, and then I will again say .withColumn. Then I will say Year, and then I
will use the year function; we have a year function as well. Very good. Then we just need to
repeat these steps: the column object, and then we pass the column name, which is Date.
Simple. Do not make things complicated, I mean difficult. Just remember one thing: you need
to put the column name in quotes, for example single quotes, whenever you are using the
column object, right? So now let me rerun it, and I should see two more columns, Month and
Year. Perfect. As you can see, the year is 2015, 16, 17. Simple.
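The two withColumn steps can be sketched as below. The function `add_month_year` is a hypothetical wrapper, and the column names Date, Month, and Year are assumed from the spoken description. The PySpark import sits inside the function so the plain-Python part at the bottom, which shows what month and year extract for a single date, runs without a cluster.

```python
from datetime import date


def add_month_year(df, date_col="Date"):
    """Return df with Month and Year columns derived from the date column."""
    # Imported here so this sketch only needs PySpark when the function is called.
    from pyspark.sql.functions import col, month, year

    return (
        df.withColumn("Month", month(col(date_col)))  # a new name creates a new column
          .withColumn("Year", year(col(date_col)))    # reusing an existing name would replace it
    )


# In the notebook (assumed data frame name from the walkthrough):
# df_cal = add_month_year(df_cal)
# df_cal.display()

# What the two functions pull out of one date value:
sample = date(2015, 8, 9)
print(sample.month, sample.year)
```

Explicit imports such as col, month, and year do the same job as the wildcard import in the video while keeping the notebook namespace smaller.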
And now I want to push this data, let me remove this, to the silver layer. How can we do
this? Let me show you. The code is a little bit similar to the data reading, but slightly
different. Write df_cal, that is your data frame name, then write; this is the PySpark writer
API, so just write df_cal.write. Then you need to mention the format in which you want to
push the data to silver, and I would say let's push this data in Parquet format. What is
Parquet format, bro?
Parquet is a columnar format and it is very much in demand; right now we work with Parquet
files a lot. I know we also use Delta, but Delta is built on top of Parquet files. So Parquet
files are very much in demand, and the format is well optimized for data reads because it is
a columnar file format. So now you are familiar with the Parquet file format as well. Let's
push this data in Parquet format: write parquet, then a backslash, then I want to say
.option. Oh, before option I will say mode.
Now what is this mode in the writer API? There are basically four modes available. Let me
write them down for you so that you can keep them in mind, because these are the things
interviewers can ask, and you should have them at your fingertips: just ask me and I will
say, hey, this is the mode and it is used for this purpose. So basically there are four modes
available. First of all we have append mode. Don't worry, I will explain each and every mode.
Then we have overwrite mode, then we have error mode, then we have ignore mode. Yes, so we
have these four modes.
yes so we have these four modes so what does this mode do so if we have data
does this mode do so if we have data already already stored in the folder and
already already stored in the folder and we want to append the data that means we
we want to append the data that means we want to merge that data with the
want to merge that data with the existing one we want to just apply a
existing one we want to just apply a union so we will use upend but if you
union so we will use upend but if you want to replace that data that is
want to replace that data that is already there and you want to store the
already there and you want to store the fresh data we use
fresh data we use overwrite what is this error so if data
overwrite what is this error so if data is already there and we are just trying
is already there and we are just trying to write the data to the same folder it
to write the data to the same folder it will say hey bro there's data there I
will say hey bro there's data there I cannot WR right so it will throw an
cannot WR right so it will throw an error okay now what this ignore will do
error okay now what this ignore will do so ignore will just ignore the things no
so ignore will just ignore the things no no no no I'm just kidding basically it
no no no I'm just kidding basically it ignores so if data is there in the
ignores so if data is there in the folder it will not throw any error but
folder it will not throw any error but it will not write the data as well so it
it will not write the data as well so it will not throw any error but it will not
will not throw any error but it will not write the data as well so it will just
write the data as well so it will just ignore right now I have have explained
ignore right now I have have explained all these four modes in the easiest
all these four modes in the easiest manner so I think now you will be
manner so I think now you will be keeping these modes in mind and just
keeping these modes in mind and just feel confident just answer like this
feel confident just answer like this whenever I ask you like what is this
whenever I ask you like what is this this mode for just say hey bro we
this mode for just say hey bro we already know so I will just use append
already know so I will just use append mode
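The four save modes can be summed up in a few lines of plain Python. This is only a toy model of the behaviour described here, not Spark code: the function name `apply_save_mode` and the idea of representing a folder as a list of rows are assumptions made for illustration. The mode strings themselves are the ones Spark's DataFrameWriter accepts; `error` can also be spelled `errorifexists` and is the default when no mode is set.

```python
# Save-mode strings accepted by Spark's DataFrameWriter.mode().
SAVE_MODES = ("append", "overwrite", "error", "ignore")


def apply_save_mode(existing, incoming, mode):
    """Toy model: return what the target folder holds after a write.

    `existing` is None when the folder is empty, otherwise a list of rows.
    """
    if mode not in SAVE_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    if existing is None:
        # Nothing is there yet, so every mode simply writes the new rows.
        return list(incoming)
    if mode == "append":
        return existing + list(incoming)  # union of old and new rows
    if mode == "overwrite":
        return list(incoming)             # old rows are replaced
    if mode == "ignore":
        return existing                   # silently skip the write
    # mode == "error": refuse to write over existing data
    raise FileExistsError("target already contains data")


old, new = [1, 2], [3]
print(apply_save_mode(old, new, "append"))     # [1, 2, 3]
print(apply_save_mode(old, new, "overwrite"))  # [3]
print(apply_save_mode(old, new, "ignore"))     # [1, 2]
```

Note that append on a notebook cell that is rerun will add the same rows again, which is worth remembering when a write cell is executed more than once.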
OK, once I set append mode I will just use a backslash and Enter. Then I need to
use option. What is this option? In this option we give the path where we need
to store the data, so I will say path, comma, and then the same format, yeah,
the format that we used while reading the data. Same path... not the same, I
mean to say the same type, the same format; obviously the path is different,
because that was the source and this is the destination. So the destination this
time is abfs, colon, double slash, and this time the container is silver, not
bronze. So silver, @, what is the storage account name? aw... let me copy this
one, because the rest of the things are the same; we just need to save this data
in the silver container instead of the bronze container. And we have
successfully put the URL here. Simple. One last thing: do not forget to add
.save, so it will actually save your data. Let's run this. OK, so it has
successfully written the data.
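A minimal sketch of the write chain built up in this step. The helper names `silver_path` and `write_parquet`, the `<storage-account>` placeholder, and the folder name `Calendar` are hypothetical; the storage account is not spelled out in the transcript, so it stays a placeholder. The sketch uses the `abfss://` scheme (the TLS variant of `abfs://`), which is an assumption, and parentheses instead of the backslash line continuations typed in the video.

```python
def silver_path(account, folder, container="silver"):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<folder>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder}"


def write_parquet(df, path, mode="append"):
    """Write a Spark DataFrame as Parquet: format, then mode, then path, then save."""
    (
        df.write.format("parquet")
        .mode(mode)            # append / overwrite / error / ignore
        .option("path", path)  # destination folder
        .save()                # nothing is written until save() runs
    )


# In a notebook cell (df_cal is a hypothetical name for the calendar DataFrame):
# write_parquet(df_cal, silver_path("<storage-account>", "Calendar"))
```

Passing the folder name as an argument keeps the destination visible at the call site, which makes it harder to reuse another table's path by accident.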
So now I want to check this. Let's go to your storage account and check your
silver folder, not folder, container. Yes, we have Returns data. Why do we have
Returns data? Oh, because we just picked the location of Returns. Sorry, let me
just change it. See, this is the advantage of data validation: I instantly
validated it and I caught the issue. Just remove this data and reload it. Let me
just rerun it. Simple. So now I should see the correct data. Let me just refresh
it. Yes, now we have the data.
So this was all about date functions and this was all about the calendar. Now
the next data frame that we want to transform and push to the silver layer is
customers. Let's do that. I think I should create a heading for you guys so that
you will not feel confused about which transformation is for which data frame.
Let's do that. So I will simply say Calendar. OK, so now you can easily navigate
through this notebook. OK, no worries. OK, so now let me just add another one,
which is Customers. Right.
Hmm, so first of all I want to see my customer data by running .display, and
then we will create some scenarios where we can apply transformations on top of
this data frame. In the customers data frame we can easily apply lots of
transformations, so this is one of the best data frames you can imagine while
working with Spark transformations, because we have so much to use. Right, and
this is what our data frame looks like. So we will cover three transformations
on it, and these three transformations will be really, really nice, trust me,
because I'm going to cover some crazy transformations. Not crazy, but yeah, they
will be really handy and really helpful for you. First transformation: the first
one will be a text-based transformation. We have already covered some date
transformations; it's time to cover some string transformations, or text
transformations, whatever you want to call them. So what will we be doing? We
have the prefix, that is Mr., Mrs., Ms., then we have the first name, then we
have the last name. So we will create a column called full name, in which we
will just concatenate all three columns. So let's create that transformation,
let's transform these three columns, and it will work like this:
df_cus equals df_cus.withColumn, then we need to say, hey, full name, this is
our column name, full name, and then the transformation that we want to apply.
So basically there are two ways, and that's why I picked this transformation,
because this is one of the favorite questions asked by interviewers. Yes, so
basically there are two functions available for concatenation. One is concat,
the simple concat, in which we just put all the columns and we put the separator
between them. Let me show you that first, then I will tell you the advanced one.
So first of all I will say concat, then the column names that I want to
concatenate. First of all I want prefix. Let me just check the column name. Yes,
it is prefix. Then after that column I want to add a space, so I will just do
this; it is similar to Excel, don't worry. So just put another comma, then I
will put my second column name, which is first name. Yeah, it is first name. OK,
first name, then again I will put a comma and then a space, then a comma and the
third column name. So this is the normal approach that everyone follows, and now
I'm going to tell you the advanced approach. OK, so this is my last name.
Perfect. So let me first display this to you, and we will not save this, because
we will save the advanced one. Right, so let me just display this: .display.
What? What is this error? Oh, I got it. So here is another learning for you; I
forgot to mention it. Nice. So the thing is, whenever you want to add a constant
(what does that mean? any constant: maybe a space, any kind of number, any kind
of letter), we need to make use of the lit function. lit. I forgot to mention
that because I haven't used that function in so long. I just looked at the
error: it is saying there's no such column name, and I said, I didn't use any
column name with that space. Then I realized, hey, I need to use the lit
function. So this is the function, lit, and then we can define the space within
it, because we cannot define a constant like this; we have to use the lit
function. Simple. Now it's fine. It should be fine. Fingers crossed. Simple. So
let's look at our new column. Wow, it worked. Yes.
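A minimal sketch of the concat approach with lit. The function name `add_full_name_concat`, the output column `fullName`, and the column names `Prefix`, `FirstName`, and `LastName` are assumptions, since the transcript only speaks them aloud. The error in the video comes from passing a bare `" "` to `concat`: PySpark treats plain strings there as column names, so the constant has to be wrapped in `lit`.

```python
def add_full_name_concat(df, out_col="fullName"):
    """Join Prefix, FirstName and LastName with single spaces using concat + lit."""
    # Imported inside the function so this sketch also loads where PySpark is absent.
    from pyspark.sql.functions import col, concat, lit

    return df.withColumn(
        out_col,
        concat(
            col("Prefix"), lit(" "),     # lit(" ") is a constant, not a column
            col("FirstName"), lit(" "),
            col("LastName"),
        ),
    )


# In a notebook cell, without saving the result:
# add_full_name_concat(df_cus).display()
```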
Now I will tell you the advanced approach. Not really an advanced approach, but
yes, this is a function that not everyone knows about. So what is this function?
You have just seen that we were putting this space thing again and again, right?
I do not want to do that. I just want to put my delimiter one time and then
mention the column names. For that we have a function called ws concat...
ws_concat. Yes, this is the function that we have. So I will just write df_cus,
and then df_cus.withColumn, and I will say full name, and I will use ws concat,
or concat ws? Yes, it's concat_ws. So just open this, and now, as you can see
here in the suggestion as well, it is asking us for the separator that we want,
and then we just need to put the column names. Simple. So here in the separator
I will use, let's say, a space. Here I can use the space without any lit
function, I think, because I used it a few months back. Let's test it, but I
feel like we can use it without lit; if not, we can add it, not a big deal. Then
just add all your columns: let's say col prefix, col first name, and then col
last name. Simple. Now I want to test it. Let's run it. See, we just need to put
the space; we do not need to use lit, because this function is specifically
built for this purpose, where we do not need to mention the separator again and
again. So let me just show you the data now. Just go to the next cell and write
df_cus.display. See, now you're learning all the advanced things. Now I should
see the column. Yes, it is perfect. And this time I have saved this column to my
data frame, so it will be pushed to the silver layer as well.
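The concat_ws version can be sketched the same way. As before, the function name `add_full_name_concat_ws` and the column names `Prefix`, `FirstName`, `LastName`, and `fullName` are assumptions. The separator is the first argument and is passed as a plain Python string, so no `lit` is needed. One behavioural difference worth knowing: `concat` returns null when any input is null, while `concat_ws` skips null inputs.

```python
def add_full_name_concat_ws(df, out_col="fullName", sep=" "):
    """Join Prefix, FirstName and LastName with `sep`, stated only once."""
    # Imported inside the function so this sketch also loads where PySpark is absent.
    from pyspark.sql.functions import col, concat_ws

    return df.withColumn(
        out_col,
        concat_ws(sep, col("Prefix"), col("FirstName"), col("LastName")),
    )


# In a notebook cell, saving the new column this time:
# df_cus = add_full_name_concat_ws(df_cus)
# df_cus.display()
```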
It's time to write this data, so just copy the code from above that we used for
the calendar, and do not forget to change the name, the mistake that we made
earlier. OK, so this time I will say Customers. Perfect. And our data frame is
df_cus. Simple. Just run the command and we are good. Yes, perfect. Now it's
time to pull the third data frame. Let me just refresh it first. OK, I can see
the data. OK, now it's time to call the third one, which is product categories.
For product categories I don't think we have much data, so I don't think we need
to apply any transformations on it; we will just read it and write it to the
silver layer. Simple. So first of all I will just write %md, Subcategories. OK,
Subcategories. So df_subcat equals... let me just copy the code... oh, we just
need to display it, sorry. Oh yeah, we just have three columns. Hmm, so we can
simply write this data to the silver layer without any transformation, because I
don't think we should apply any transformation; this is really simple. So now we
can just copy the code from the write cell, and then we can write this data:
df_subcat, dot, this one. Simple. And do not forget to change the column name,
sorry, the folder name: Subcategories. Simple. Just hit Shift+Enter. Simple.
So now it's time to read the products data. Yes, let's do that. I will simply
write %md and then Products. OK, Products. So let's see what we have in the
products data frame. Oh, we have many more columns. OK, so what are we going to
do in this particular data frame? As you can see, we have all the information
related to products. Very good. So I have a very beautiful scenario for this
particular table. As you can see, we have the product SKU column, and this is a
real-time scenario, because this is the kind of requirement that we get on
normal days as well. So this is product SKU. I just want to fetch the first two
letters of the product SKU; maybe this can be my mapping to categories or
anything. This is the requirement: I just want the first two letters, that's it.
Or let me level it up. OK, level up: I want all the letters. Let's say I have
three letters, because I can have three letters, so I want all the letters
before this hyphen. Yes, let's do this. So this is my requirement number one.
Requirement number two is about the product name. As you can see, in this
product name column we have all the names, obviously related to products. I just
want to fetch the first word of every product name: not the first letter, the
first word, before the space. As you can see, we have spaces, like LL, HL, ML,
then we have Long, AWC, Mountain, these kinds of words. I just want the very
first word. So here we will be learning a very popular and very strong function
called split. Yes, the split function. What this function does is split the
column into a list, so then we need to use indexing on top of it so that we can
fetch the desired index. In this scenario, as we talked about, we just need the
first one. So first we will split the data based on the hyphen, then we will
pick the first index, the very first index, which is also referred to as zero.
Are you excited to do that? Let me show you how you can do that.
OK, so this time we need to transform our existing columns instead of creating a
new one, so now you will also learn how to transform a column. It is very easy.
Simply write your df_pro equals df_pro.withColumn. OK, withColumn, and then I
will keep the column name the same, so that it will just modify the column:
product SKU. Now here comes the function called split. So how can we use this
function? In split we need to define the column first and then we need to define
the separator. In our scenario the column is product SKU and the separator is
comma. Then, once it creates a list (because the split function creates a list
based on the separator), I just want index number zero. Simple. This is the
transformation. Just focus on this part, because you need to do the indexing as
well; otherwise it will just create a list, and we do not want a list, we need
the zeroth index. So let me create the second column the same way, which is our
product name. OK, and I will simply write product name here, and here as well,
product name, and this time my separator is space. Yeah, a simple space. See,
space, and then again I want the zero index, the zeroth index.
So, OK, now I want to just run this. Let me run this. Invalid syntax? I think
there's one bracket missing. I can't see anything. Let me just rerun it. Oh
yeah, it was fine, I think. Just ignore these kinds of silly errors. So now I
want to look at this data frame's result. Now I should see just the first words,
using the split function. So as you can see, it has returned just the first word
of the product name. Very well done. Yes, and we have just made a small mistake:
instead of putting a hyphen we put a comma, and I don't know why I put a comma.
Let me just rerun it. OK, so now it has given the desired result. So let me just
repeat what we did: we just needed to mention the separator, and I don't know
why I put a comma. We need to put a hyphen, because our separator is a hyphen,
not a comma. I was just talking about commas and CSV files, so it came to my
mind and I put a comma; just ignore that. So now our products data frame is also
modified, and it's time to write it. And I know now you are feeling much more
confident with split, splitting a column into a list and then using indexing on
top of it. Very well done, because it's not an easy function and not an easy
thing to grasp, so if you have done it, then good job. And then we will just
write this data frame, df_pro, and Products.
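A minimal sketch of the two split transformations, with the corrected hyphen separator for the SKU. The function name `keep_first_tokens` and the column names `ProductSKU` and `ProductName` are assumptions, and the SKU string in the last line is only a sample value. In PySpark the second argument of `split` is a regular expression; a hyphen and a space both match literally there, so they are safe to pass as they are.

```python
def keep_first_tokens(df):
    """Overwrite ProductSKU and ProductName with the first piece of each value."""
    # Imported inside the function so this sketch also loads where PySpark is absent.
    from pyspark.sql.functions import col, split

    return (
        df
        # split() yields an array column; [0] keeps only its first element.
        .withColumn("ProductSKU", split(col("ProductSKU"), "-")[0])
        .withColumn("ProductName", split(col("ProductName"), " ")[0])
    )


# Plain-Python analogue of split plus indexing, on a sample SKU string:
print("HL-U509-R".split("-")[0])  # HL
```

Reusing the existing column name in `withColumn` replaces the column in place, which is why no new column appears in the result.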
OK, and then we have the returns file. OK, let's pick the returns file. Let me
see what we have in returns. Returns. Simple. Let me just see: df_ret.display.
Oh, I just made a mistake, I misspelled display. OK, just write display, and
then... oh, this is a small data set. OK, I don't think we need to apply any
transformation on this one either, so just write it as it is, because we do not
have much to transform in this data frame. Don't worry, I have a very special
thing to add to your transformation game. Do you want a hint? When you'll be
writing the data for sales, sales 2015, 16, 17, I will show you something very
special. It is about performing a kind of analysis. Don't worry, we will not be
performing lots of analytics, just two to three things, and I will show you how
you can build, not transformations, how you can build charts in Databricks. Can
you imagine? And you do not need to use any code; we just need to click on a few
things and you can create charts like a bar chart or a line chart. So there's a
special gift for you; just wait a few more seconds, because this is the
second-last data frame, and then we will be working with the sales data set. I
will show you just three to four transformations, and I will also show you how
you can build charts, so that you can play with your data frame, leverage that
in your industry, and showcase this talent in your project, that you also built
some transformations. Because every organization is looking towards performing
analysis on big data, you can leverage this as well, that you performed analysis
on big data in Databricks. So, just a special gift for you. So it's time to
write this data. I think I have already copied this code. Yes. So df_ret, and
then we just need to write Returns. Perfect, let me just write it. Yes.
it yes and I think now we have two data frames left sales and returns obviously
frames left sales and returns obviously we will be just covering sales after
we will be just covering sales after territories because that is a special
territories because that is a special gift first let's quickly transform
gift first let's quickly transform territories data and push it to Silver
territories data and push it to Silver and then come to our special gift
and then come to our special gift okay then we
Okay, then we have, uh, territories: territories dot display. Let me see. The DataFrame is not
defined? Hmm, we just defined it. Let me redefine it. Don't worry, maybe I just skipped it: I
wrote the code and didn't run it. No worries. [Music] Uh-huh, where's my data reading for this
DataFrame? Oh, here it is. Let me just run it; maybe it was missed. Don't worry, just rerun the
code. We have the notebook, and that is the biggest advantage, because when you have this
notebook, everything is in sequence, so you can refer to it and learn in a better way. So just
run this.
Oh, this is a basic, small dataset. Again, we do not have much to do in this DataFrame either,
so I think we should focus on our gift part and cover that DataFrame in a little more detail. So
let's quickly push this dataset as well, because I know you are really excited to work on that
dataset. I was really happy when I saw those transformations, those charts, those visualizations
in Databricks, because I didn't imagine that we could build that kind of analysis in Databricks
using no code. We do not need to write any seaborn code or any matplotlib code, nothing; we just
need to click on a few boxes and the charts will be built. So let's write this one quickly: this
is the territories DataFrame, and then the folder is territories. Simple.
Now all the other data is pushed to the silver layer. Now it's time to pick the sales DataFrame,
and I'm really excited for it. So let me first read the data and see what is inside the
DataFrame: sales, and let me just display it, sales dot display. We already have three years of
data within this, so that was a smart tip we picked up, that we merged the DataFrames together.
Okay, so we have some nice stuff to use in our analysis: we have date columns, we have order
number, we have customer, we have so many things, so we can perform many aggregations, and I
will show you how you can do that. So let's start with some transformations, and then we can
start our analysis. Okay.
So what transformations will we be doing? Let me show you. Let's say we want to cover one more
date function, which is to_timestamp: if we want to convert a date into a timestamp format, how
can we do it? We will cover that one, because we will transform this stock date column, right?
This is our requirement number one.
Then we need to replace. Yes, replace: I want to cover that function as well, because it is
really important. So what we'll be doing is replacing the letter S with, let's say, the letter
T. It's just a requirement, because it is really handy when you learn how you can replace
characters. So that is our second requirement.
Then the third requirement; yes, we have three requirements in this. We will simply perform some
kind of mathematical operation, because we haven't covered those. So we will create a multiply
column, where we will be multiplying order line item with order quantity. I know this doesn't
make any sense from a business perspective, but being a developer you should know how you can
perform operations like multiplication of columns, because it is really necessary when you are
working with numerical data, right? So we will be using some functions to achieve these results.
So we have three requirements. Let me quickly do those, and then we will perform some
aggregations, so you will be learning an aggregation function as well. What is that? The
function is group by, the most powerful function in all the tools and technologies in the data
world, because we want to aggregate data; we do not want transactional data. We are in the data
domain: we are the people who are building reports, building pipelines, analyzing the data, or
doing some work based on the data stored in data warehouses. So we do not want transactional
data, we want aggregated data. So let's do the transformations first, and let's get started.
So first of all, a simple requirement, which is converting the stock date column from the date
format into the timestamp format. How can we do that? Simply write df_sales equals df_sales dot
withColumn, obviously, and then we need to use the same column name, because we will be changing
this column instead of creating a new one. So the column name is stock date, that's correct.
Then the transformation that we need to apply is to_timestamp. As you can see, the function has
popped up, so click on it: to_timestamp. And which column do we need to convert to a timestamp?
We need to use stock date. Simple. So let's run this. Okay, this is finished.
So now, in order to replace the S with T, we will do our second transformation: df_sales equals
df_sales dot withColumn, and then we need to say order date... order number, sorry. Then we will
use a function called regexp_replace. Yes, this function is used to replace characters with some
other characters. So first of all I will write the column name, which is order number, because
this is the column in which we are replacing things, right? After mentioning this column, I want
to replace S with T. Simple. Let me just apply it. Okay, perfect.
Now let's apply the transformation which is the multiplication of these two columns, order line
item and order quantity. We will create a new column for this one, because this is just a random
column; we just need to use the multiplication operation, that's it. Okay: df dot withColumn...
uh, let me just save it: df_sales equals df_sales, and then dot withColumn. It's called
multiply, right? Then you need to say col of order line item, asterisk, col of order quantity.
Simple. All three transformations are done. Now I just want to look at my DataFrame.
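The three column transformations just described can be sketched on a single row. This is not the notebook code: the notebook uses PySpark's withColumn together with to_timestamp, regexp_replace and a product of two columns, while the sketch below is plain Python so that it runs anywhere. The column names (StockDate, OrderNumber, OrderLineItem, OrderQuantity), the date format and the sample values are assumptions made for illustration.

```python
import re
from datetime import datetime


def transform_row(row):
    """Apply the three transformations to one row, given as a dict."""
    out = dict(row)
    # 1. Parse the date string into a timestamp (midnight of that day).
    #    Assumed input format: YYYY-MM-DD.
    out["StockDate"] = datetime.strptime(row["StockDate"], "%Y-%m-%d")
    # 2. Replace every capital S in the order number with T.
    out["OrderNumber"] = re.sub("S", "T", row["OrderNumber"])
    # 3. Add a new column holding the product of two numeric columns.
    out["multiply"] = row["OrderLineItem"] * row["OrderQuantity"]
    return out


# Hypothetical sample row, not taken from the real dataset.
sample = {
    "StockDate": "2015-01-01",
    "OrderNumber": "SO45080",
    "OrderLineItem": 2,
    "OrderQuantity": 2,
}
transformed = transform_row(sample)
print(transformed["StockDate"], transformed["OrderNumber"], transformed["multiply"])
```

The function returns a new dict instead of changing the input, which mirrors how withColumn returns a new DataFrame rather than modifying the existing one.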
Wow, I can see the transformations: stock date is converted to a timestamp, in order number the
S is replaced with T, and then we have a new column called multiply, where we can see this
number is multiplied with this one: 2 times 2 equals 4. Simple.
Now here comes the sales analysis. Let's create another heading and type in "Sales Analysis".
Simple. So here we will be performing a group-by aggregation. How? The requirement is that I
want to aggregate my sales, meaning how many orders we received each day. So we want to apply a
group by on order date, and we want to find the count of order numbers for every day, right?
Let's do that.
So how can we do that? We have to apply an aggregation. Okay: df_sales dot groupBy, so there's a
function called groupBy. Then we need to write the column name on which we need to apply the
group by; this is order date. Let me just confirm the name; yes, it is order date. Then we need
to use the agg function, which stands for aggregation. It says: hey, I have grouped the data,
now which aggregation do you want to apply? We want to apply the count aggregation. On which
column? It's order number. Simple. Then the new column will be called total orders. Yes, this is
known as the alias function, so now you are familiar with the alias function as well. Simple. So
this was all about aggregation. Very nice. Let me run this.
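What the group by and count do can be sketched in plain Python. Again, this is an illustration rather than the notebook code, which the speaker writes with PySpark's groupBy, agg, count and alias. The rows, column names and the total_orders spelling below are made up for the example. Like a column count in Spark, the sketch only counts rows whose order number is not missing.

```python
from collections import Counter

# Hypothetical order rows; names and values are assumptions for illustration.
orders = [
    {"OrderDate": "2016-06-30", "OrderNumber": "TO1001"},
    {"OrderDate": "2016-07-01", "OrderNumber": "TO1002"},
    {"OrderDate": "2016-07-01", "OrderNumber": "TO1003"},
    {"OrderDate": "2016-07-01", "OrderNumber": None},  # missing value, not counted
    {"OrderDate": "2016-07-02", "OrderNumber": "TO1004"},
]

# Group by OrderDate and count the non-missing order numbers in each group.
counts = Counter(
    row["OrderDate"] for row in orders if row["OrderNumber"] is not None
)

# One output row per day, with the count stored under the alias "total_orders".
total_orders = [
    {"OrderDate": day, "total_orders": n} for day, n in sorted(counts.items())
]

for row in total_orders:
    print(row)
```

One difference worth knowing: a day whose order numbers are all missing would drop out of this sketch, whereas Spark would keep that group with a count of 0.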
And let me actually display this as well. Okay, dot display, so now I want to display it. So
this is our analysis number one: how many orders we have received every day. Wow.
And now the best part, I know, the part you are waiting for: we want to create charts. Okay,
bro. So here is the area where you can build charts. As you can see, there is the Table option.
Let's say I want to show my manager how many orders we have received every day. With a table it
is obviously very hard to see the trend, and I am interested in the trend. So what I can do is
simply click on this drop-down, where I can see a plus button, and then click on Visualization.
Simple. See, now I can see the total orders by day, and I can see a trend, so now it is very
evident: earlier it was quite stagnant, but suddenly it boomed. When? From July 1st, 2016. So
this is the kind of insight that is really helpful. See how easily you can build charts. And it
is not like you cannot modify it: simply go here and choose an area chart as well. See, you do
not need to write any code. This is crazy.
Now you can see you have so many options. If you want to change colors and so on, you can do it
from here. There are so many things: if you want to choose a group by, you can set it to, say,
total orders or anything, and you can even change the Y column as well, like the aggregation we
are using for our chart. And if you want to save it (you will be saying, hey, okay, I can see
it, but how can we save it in the notebook?), just click on the Save button, bro. That's it. Now
you can share this notebook with your manager. This is crazy. And obviously, if any interviewer
asks you to create visualizations, just say: we know how to create visualizations, and yes, it
is possible. So it is really cool.
Let me show you one more visualization. What I want to do is use the category column, because I
think in the category column we have some categorization, so I want to use a pie chart. So that
was analysis number one; let me pick analysis number two. For that, I first want to display the
data that is available for category. So, the DataFrame for category... subcategory... dot
display. I think it was product category. Yes, let me display it. Okay, so I want a
visualization for this, like a pie chart or anything where I can see the composition, the
distribution: which category is performing well, what percentage share it has. A simple
visualization. I could display a bar chart as well, right? But I want to display a pie chart.
Click on Pie chart. Wow. Cool, man, that's cool. Simple. That's amazing.
So, one more, as I promised that we would be doing three things. The third thing, hmm, what
should we pick? Okay, let's pick the territory data, because we didn't apply many
transformations on top of it, so let's use it in our analysis. For that we will be using the
territories DataFrame, dot display. I know you're going to play with it a lot, because I also
played with it a lot; it's a really cool feature.
So now, what do I want to create? Yes, I will be using country. I will pick Visualization, and
it will say "no data", because it does not know what will be my X column and what will be my Y
column. No worries. For the X column and Y column you can go to General and pick the X column,
which is region. Okay, and for the Y column... the scale is automatic, okay. I think you just
need to go to Y column, click Add column, and you can put anything; just play with it. So you
can ask: in every country, how many regions do we have? That is the kind of thing you can put in
your aggregations. As I can see, it's one, one, one, one, one, but then five regions in the US.
So this is the kind of analysis we can use if we want to find how many regions there are in each
country.
Then I can also pick, let's say, continent. Then I can pick sales territory key, and instead of
count we can choose min, median, average, anything. So you can just play with it; I will leave
that up to you. So let's save it, because this is the third one. These three are the analyses
that we have done, and you can play with it, because it's a really cool feature.
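The chart editor is doing a small aggregation behind each of these choices. As a rough plain-Python sketch (not something from the notebook): count the regions per country, then swap the count for the min, median or average of the territory key per continent. The rows below are assumed sample data shaped like the territories table, not a copy of it, and the column names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Assumed sample rows: (territory key, region, country, continent).
fields = ("SalesTerritoryKey", "Region", "Country", "Continent")
rows = [
    (1, "Northwest", "United States", "North America"),
    (2, "Northeast", "United States", "North America"),
    (3, "Central", "United States", "North America"),
    (4, "Southwest", "United States", "North America"),
    (5, "Southeast", "United States", "North America"),
    (6, "Canada", "Canada", "North America"),
    (7, "France", "France", "Europe"),
    (8, "Germany", "Germany", "Europe"),
    (9, "Australia", "Australia", "Pacific"),
    (10, "United Kingdom", "United Kingdom", "Europe"),
]
territories = [dict(zip(fields, r)) for r in rows]

# X column = Country, Y column = count of regions.
regions_per_country = Counter(t["Country"] for t in territories)

# X column = Continent, Y column = territory key with a different aggregation.
keys_by_continent = defaultdict(list)
for t in territories:
    keys_by_continent[t["Continent"]].append(t["SalesTerritoryKey"])

key_stats = {
    continent: {"min": min(keys), "median": median(keys), "average": mean(keys)}
    for continent, keys in keys_by_continent.items()
}

print(regions_per_country)
print(key_stats)
```

As the speaker notes about multiplying columns earlier, aggregating a key column has no business meaning; it only shows that the same grouping can feed different aggregations.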
And after playing with it, it's time to write the data. I think I have not written the sales
data yet, so it's time to write that as well. I will put the write command here, above the sales
analysis, so that when you refer to the notebook you know: okay, all the data has been written,
and now it's time to analyze the data. So to write the data, simply write df_sales... let me
just copy it, it will save us a lot of time. Okay, so now we just need to change it to df_sales,
and we need to create a folder called sales. Simple. Let me run it. Perfect.
So all the data is pushed to the silver layer, and we have done the analysis as well. Let me
just check whether we have the data in the silver layer or not, because we didn't validate it
for the other files. Yes, we have all the data. So that means we are done with phase two of the
project as well. We have successfully pushed our data from the bronze layer to the silver layer,
and we have used this powerful tool, Databricks. We have worked with the Parquet file format, we
have done analysis on top of our data, and we have built visualizations as well, so we did so
much crazy stuff within this phase. The next phase is the serving layer: we need to create a
serving layer in Synapse Analytics, and we will be doing that in the next phase of this project.
So we have covered phase one and phase two successfully, and I know you have learned a lot. Just
pat yourself on the back, because you have done a great job. So, congratulations.
And now it's time to deploy: it's time to work with our third phase, or you can say the third
resource, Synapse Analytics. So we are entering phase three of our project, and I think now you
know what we want to do next. Yes, we will be creating our resource, which is Synapse Analytics.
Basically, Azure Synapse Analytics is a data warehouse solution available in Azure. It was
really popular, and it is still really popular now. Obviously Fabric is the next step after
Synapse, but Azure Synapse is still very much in use, and Azure Synapse will be there. So do not
worry, do not believe in myths: Azure Synapse Analytics is there, so just learn the skill.
Without wasting any time, let's first create our resource. First of all, click on Create a
resource, and now you need to search for Synapse, Azure Synapse Analytics, and it will take us
to the Marketplace. Yeah, just click on this Microsoft one. Do you know what Synapse Analytics
actually is? Let me tell you. Just hold on: let's configure this first, and then I will give you
a brief and explain what exactly Synapse Analytics is once we enter the workspace, so it will be
a better experience for you as well, right?
So first of all we need to pick the resource group. Simple. Then, in Synapse Analytics as well,
we have a managed resource group, just as we have in Databricks, so it's the same thing here. I
will name it RG (that is, resource group), then managed, then AW project, and this one is for
Synapse. We do not need to worry about it, because Azure will take care of it and will keep all
of its own resources there. So just forget about it and create a managed resource group for it.
Now we need to name our workspace. I will say, hmm, synapse, and I will add AW project as my
prefix. Okay. And then we need to create a storage account as well. Yes, whenever we create
Synapse Analytics we need to set up a default storage account, because Synapse uses it. So it's
time to create a storage account. If we already have a storage account and want to use it as our
default storage, we can assign it, but ideally you should create a new one and not touch the
existing one, because we have our data stored in a different data lake. So this one is just for
Synapse. These are the kinds of things that need to be there for Synapse Analytics.
So we will click on Create new, and let's say I want to name it default synapse storage. Simple.
Oh, it's already taken, so let's say default synapse... oh, that one is available, yes, we can
pick it. Then we need a file system name, and there is none available, so we need to create one:
Create new, and I will say default file system. Perfect, it's available, yes. Okay.
So basically it is a default storage, because whenever we create something we need to upload
data somewhere, right? It's up to us whether we want to use the default storage or our own
external accounts. Obviously, in most scenarios we use our own external data lake, our own
external storage account. So just create it and ignore it.
it then simply click on review plus create simple or you can just click on
create simple or you can just click on next security so that you can have some
next security so that you can have some configuration regarding SQL Server so as
configuration regarding SQL Server so as you can see these are some of the
you can see these are some of the credentials that you can set for your
credentials that you can set for your dedicated SQL pool do not worry we'll
dedicated SQL pool do not worry we'll tell you what is dedicated SQL pool so
tell you what is dedicated SQL pool so let's me just set my password so I can
let's me just set my password so I can keep any password like obviously you can
keep any password like obviously you can set your own
password, because it will be required whenever you want to connect to the
SQL pool. And here I will say admin... okay, admin onch, okay, I can pick
it. Then select the option to grant the workspace network access to the
data lake using the workspace identity; just use it. Okay, workspace
encryption: it's fine, we do not need to encrypt anything. Then just
double-check everything. Everything is fine, yes. Just click on Networking
and disable the virtual network, because we are not using VNets here. A
VNet is simply a virtual network where we create our own network; you can
also relate it to private endpoints. Do not worry about that. It's all for
the restricted data that we use within the industry: when we are working
with sensitive data, we use VNets. Then click on Tags, Next, Review +
create, and then simply click on Create. It will create a resource for you,
which is Synapse Analytics. Once it is done I will give you a brief
overview of Synapse Analytics, like why it is in demand and what exactly it
offers us. I will tell you each and everything. So let's wait for a few
more seconds and see what it gives us. And yes, this is our serving layer,
our serving zone, where we'll be making data available for data analysts,
data scientists, reporting analysts, and BI developers. All of them will be
using data from this layer. So it is just creating the Synapse workspace.
Just hold on for a few more seconds, I think it should be
done. By the way, while it is being created: how is this video? If you find
it insightful, then definitely tell me in the comment section. Any
suggestions, any feedback, I'm open to all of it, so just give me
suggestions and feedback so that I can improve my content. And if you do
not want to give any suggestions, if you just want to give me love, then
just type anything that can make me happy and keep me motivated to create
more and more content in the future. I'll be creating content for you guys
so that you can practice, learn, and succeed in your data journey. That's
the whole intent of this channel, that's it. I just want to share my
knowledge with you all, I just want to guide you, and I just want to be a
part of your success journey. Even if it's just one person, I'll be happy.
So that's the whole intent. And yeah, I know you love me a lot. So the
deployment is still in...
Hm, let's fast forward it. So finally it is deployed, and now it's time to
explore Synapse Analytics. You are going to love this tool, trust me,
because of the reason that I'm going to tell you right now. So without
wasting any time, just click on this lovely button, Go to resource group.
Just go there, and here you can see your Synapse Analytics workspace. Plus,
just focus on this part: here you can see the default Synapse storage
account. I just told you that this is a default storage account that
Synapse uses, right? So you can put your data inside this as well, and you
do not need to worry about any access issues, because it will
automatically pull the data from it, since this is the default storage.
But ideally we do not use it; we use our external data sources. So do not
worry at all. You should be aware of this thing, so that's why I created
this as well. Let's click on our Synapse Analytics workspace. Click on
Open, and it will show the same kind of window, where we can see the, uh,
portal, the Synapse portal or Synapse
workspace. Okay, so this is our Azure Synapse Analytics. I know, I know,
this is similar to Azure Data Factory. So now it's time to explore Synapse,
and you will know why I was saying that this is such a lovely tool. Let me
give you an overview of this tool. First of all, we have here a beautiful
area where you can see similar stuff as we had in Azure Data Factory. So
it's time to explain Synapse Analytics. Basically, Synapse Analytics is a
unified platform. Why a unified platform? Because in Azure Synapse
Analytics we can combine ADF, which is Azure Data Factory, plus
Databricks. Not exactly Databricks, but a Spark cluster, so let me just
write Spark, or PySpark; let's say Spark, right? Plus data warehousing.
That's why it is called a unified platform. So within Synapse Analytics we
get the opportunity to work with Azure Data Factory, and here we have Data
Factory as well, embedded within this workspace; let me show you. And we
have Spark as well, yes. We have a SQL data warehouse as well, yes. This is
such a powerful tool. As you can see here, the symbol, this one: this is
the area where you can actually build your data pipelines with the exact
same user interface and the exact same functionality. Let me click on it
and show you. See, it is called Integrate. Just the name is different, but
everything is the same. Click on this plus button, then you will see
Pipeline. Just click on it, and then you will see your ADF, your Azure Data
Factory, is open. No, it is not Data Factory as a resource, it is Synapse
Analytics, but it has features available which give us the exact same
functionality and user interface as we get in Azure Data Factory. This is
the power of Synapse Analytics. So if you want to create your pipelines,
you can create them either in Azure Data Factory or in Synapse Analytics.
The choice is yours. And now you will ask me: do people still use Azure
Data Factory? Yes, because if they want to work with Azure Databricks, or
just Databricks, they use Azure Data Factory for orchestration. And
everything is the same: all the functions, all the things that we created,
like dynamic pipelines, functions, parameters, everything is exactly the
same. Okay, so
everything is same exact same okay so this was all about aure data Factory
this was all about aure data Factory what about Spark from where we can just
what about Spark from where we can just get this spark let me show you if you go
get this spark let me show you if you go to
to here this is called
here this is called Script this is called Script let me
Script this is called Script let me click on
click on it so here you can just build your all
it so here you can just build your all the scripts scripts scripts I mean SQL
the scripts scripts scripts I mean SQL scripts or you can say data brakes
scripts or you can say data brakes notebooks but here is a catch we cannot
notebooks but here is a catch we cannot call it as data brakes because datab
call it as data brakes because datab Brak is a different Identity or you can
Brak is a different Identity or you can say different organization different
say different organization different management layer for spark synapse
management layer for spark synapse analytics has a separate spark cluster
analytics has a separate spark cluster management layer which is called spark
management layer which is called spark pool so if I click on this plus
pool so if I click on this plus sign I will see an option called sqls
sign I will see an option called sqls script kql script which is custo query
script kql script which is custo query language just forget about that for now
language just forget about that for now then we have notebook if I click on
then we have notebook if I click on Notebook let me click it for you if I
Notebook let me click it for you if I click on Notebook see here you can see
click on Notebook see here you can see the exact same UI that we had in data
the exact same UI that we had in data Brak
Brak but I cannot say that this is data Brak
but I cannot say that this is data Brak no this is not data BR this is Park Pool
no this is not data BR this is Park Pool which is managed by synapse analytics
which is managed by synapse analytics datab braks is a different entity spark
datab braks is a different entity spark PO is a different entity but both are
PO is a different entity but both are managing spark clusters simple yes you
managing spark clusters simple yes you can apply all your ppar functions ppar
can apply all your ppar functions ppar coding because that is an open open
coding because that is an open open source Library you can use it anywhere
source Library you can use it anywhere but I talking about about the you can
but I talking about about the you can say the management layer that we get in
say the management layer that we get in the form of data braks or spark pool so
the form of data braks or spark pool so this was all about the spark
this was all about the spark functionality that we get in synaps ntic
functionality that we get in synaps ntic so here you can see languages we can
so here you can see languages we can pick all the languages that we want plus
pick all the languages that we want plus we have like all the same features we
we have like all the same features we want to add headings we can just use
want to add headings we can just use magic commands
magic commands everything then third thing where is our
everything then third thing where is our data warehousing let me show you that as
data warehousing let me show you that as well so this is the data tab so when you
well so this is the data tab so when you click on it this
click on it this one you can see
one you can see workspace and when you click on plus
workspace and when you click on plus button
button you will see the option to create a SQL
you will see the option to create a SQL database see this one so this is also
database see this one so this is also known as you can say data warehousing
known as you can say data warehousing solution where you can just create your
solution where you can just create your tables and you can just create facts and
tables and you can just create facts and dimensions and build your data warehouse
dimensions and build your data warehouse now what is this L database so as I just
now what is this L database so as I just mentioned that it gives us the ability
mentioned that it gives us the ability to use spark so in spark when we create
to use spark so in spark when we create the database it is called called lake
the database it is called called lake house or you can say Lake database so if
house or you can say Lake database so if we are creating our tables using spark
we are creating our tables using spark code then we call it as Lake database in
code then we call it as Lake database in synapse and in data brakes we call it as
synapse and in data brakes we call it as lake house
lake house simple so in this project we will be
simple so in this project we will be creating SQL databases so do not worry
creating SQL databases so do not worry about that thing as well so this was all
about that thing as well so this was all about the three the like the top three
about the three the like the top three layers of syap analytics ADF that is for
layers of syap analytics ADF that is for pipelines spark clusters for by Spar
pipelines spark clusters for by Spar coding third is data warehousing so this
coding third is data warehousing so this was all about uh
was all about uh atics plus like these two Tabs are
atics plus like these two Tabs are similar to the tabs that we got in enj
similar to the tabs that we got in enj data Factory Monitor and manage tab so
data Factory Monitor and manage tab so it is simple so how is it like what are
it is simple so how is it like what are what are your views on the synaps
what are your views on the synaps analytics workspace because I like it
analytics workspace because I like it very much because I can build my
very much because I can build my pipelines I can use spk code I can just
pipelines I can use spk code I can just create data warehouse and then I can
create data warehouse and then I can create my data warehouse with power
create my data warehouse with power such a massive tool like you can say
such a massive tool like you can say Unified solution everything in one place
Unified solution everything in one place this is the kind of tool that we should
this is the kind of tool that we should use nowadays and obviously everyone is
use nowadays and obviously everyone is using not everyone but obviously as your
using not everyone but obviously as your data engineers and companies who are
data engineers and companies who are relying on a your they use synapse
relying on a your they use synapse analytics a lot I also do like using
analytics a lot I also do like using synapse analytics a lot so this is my
synapse analytics a lot so this is my personal favorite as well so now it's
personal favorite as well so now it's time to create the data warehouse
time to create the data warehouse yes so so so so how we going to do it
yes so so so so how we going to do it let me just click on this home button
let me just click on this home button first let's forget let's forget
first let's forget let's forget everything and let's start from scratch
everything and let's start from scratch so you just need to repeat one more step
so you just need to repeat one more step what is that but it is much more easier
what is that but it is much more easier now so let me show you let's say this is
now so let me show you let's say this is our synapse
our synapse workspace this one and and this is our
workspace this one and and this is our data
data Lake right our external storage account
Lake right our external storage account same thing how this synapse analytics
same thing how this synapse analytics workspace will be accessing this data
workspace will be accessing this data Lake uhhuh a quick revision for you yes
Lake uhhuh a quick revision for you yes we need to allow the exess but here's a
we need to allow the exess but here's a good news for you because your snaps
good news for you because your snaps ntic and aure data L both are aure
ntic and aure data L both are aure product so we do not need to
product so we do not need to include a third party application here
include a third party application here or any service application any we do not
or any service application any we do not need to register any kind of
need to register any kind of application we can directly access this
application we can directly access this data Lake using this synaps analytics by
data Lake using this synaps analytics by just allowing our workspace the
just allowing our workspace the permission because by default our
permission because by default our synapse analytics has a
synapse analytics has a credential and we just need to assign
credential and we just need to assign the access assign the role to that cred
the access assign the role to that cred credential this is the best thing of
credential this is the best thing of using your services so without wasting
using your services so without wasting any time let's allow our synapse
any time let's allow our synapse workspace to access the data stored in
workspace to access the data stored in the data Lake again this is regarding
the data Lake again this is regarding the ownership of the data you as a data
the ownership of the data you as a data engineer need to take care of all these
engineer need to take care of all these things like data access and all so let's
things like data access and all so let's allow our synapse workspace to access
allow our synapse workspace to access this data again using using the identity
this data again using using the identity using the credential that our snaps
using the credential that our snaps workspace or any aure resource get by
workspace or any aure resource get by default and this can be an interview
default and this can be an interview question what is the name of that
question what is the name of that identity it is called managed Identity
identity it is called managed Identity or system managed
or system managed identity just keep this information in
identity just keep this information in your head because this can save you in
your head because this can save you in your
your interviews system managed identity SL
interviews system managed identity SL managed identity right so it's time to
managed identity right so it's time to just use this okay let me jump onto my
just use this okay let me jump onto my aure portal and again I will first go to
aure portal and again I will first go to my Resource Group then I will click on
my Resource Group then I will click on my external data L which is this
my external data L which is this one okay then again the same steps we
one okay then again the same steps we will be going to access control am this
will be going to access control am this is a revision for you guys this is a
is a revision for you guys this is a revision for you you need to do the
revision for you you need to do the following steps on your own do not do
following steps on your own do not do not look into the screen just do it on
not look into the screen just do it on your own and do not worry if you stuck
your own and do not worry if you stuck anywhere I'm here to help so after
anywhere I'm here to help so after clicking on this we just need to click
clicking on this we just need to click on ADD why I saying this because it will
on ADD why I saying this because it will be good for you it will be a quick
be good for you it will be a quick revision for you and when you revise
revision for you and when you revise things you can just save the that
things you can just save the that information in your mind so I was being
information in your mind so I was being a little concerned about you so click on
a little concerned about you so click on ADD and just click on ADD Ro
ADD and just click on ADD Ro assignment and here the same thing we
assignment and here the same thing we need to assign a role and role is known
need to assign a role and role is known as storage blob data contributor this
as storage blob data contributor this one click on it and just click on next
one click on it and just click on next simple then here's the catch as we are
simple then here's the catch as we are not using any kind of service principle
not using any kind of service principle or any kind of user group we are using
or any kind of user group we are using managed
managed identity again I'm explaining this thing
identity again I'm explaining this thing bro because this is really important
bro because this is really important managed identity is a kind of
managed identity is a kind of identity you can say it is a kind of ID
identity you can say it is a kind of ID card that every Azure resource get by
card that every Azure resource get by default so let's say we have created
default so let's say we have created aure snaps analytics let me draw it for
aure snaps analytics let me draw it for you so because you can capture visuals
you so because you can capture visuals in your head I do not want you to just
in your head I do not want you to just be silent when any interviewer asks this
be silent when any interviewer asks this question from you so let's say we
question from you so let's say we created an aure synaps analytics
created an aure synaps analytics workspace so by default by default it
workspace so by default by default it has this credential see this small ID
has this credential see this small ID card which is known as managed Identity
card which is known as managed Identity or system managed identity
or system managed identity it automatically gets it it is like a
it automatically gets it it is like a free combo you can just you can just
free combo you can just you can just follow this analogy this is a kind of
follow this analogy this is a kind of free combo like you create a resource
free combo like you create a resource you get a free ID card as well and now
you get a free ID card as well and now we just need to assign a role on this ID
we just need to assign a role on this ID card simple now I know this information
card simple now I know this information will be saved in your head just choose
will be saved in your head just choose the manage identity and then click on
the manage identity and then click on select members what member we need to
select members what member we need to pick now we need to pick the ID card of
pick now we need to pick the ID card of our synapse workspace so just search
our synapse workspace so just search your synapse workspace name here it is
your synapse workspace name here it is click on synapse workspace then it will
click on synapse workspace then it will ask you to select the name and you can
ask you to select the name and you can just select your synaps workspace name
just select your synaps workspace name which is aw project hyphen synapse it is
which is aw project hyphen synapse it is my synapse workspace so you will have a
my synapse workspace so you will have a different name do not worry click on it
different name do not worry click on it and just click on select so as you know
and just click on select so as you know this is known as managed
this is known as managed identity this is known as man man
identity this is known as man man identity click on select and simple
identity click on select and simple click on review plus assign so basically
click on review plus assign so basically it will just show me that role is added
it will just show me that role is added but it takes like 10 to 15 minutes so if
but it takes like 10 to 15 minutes so if you still cannot access the data do not
you still cannot access the data do not worry just chill have some coffee or tea
worry just chill have some coffee or tea and just wait for 10 to 15 minutes and
and just wait for 10 to 15 minutes and my gym lovers can have some shake and
my gym lovers can have some shake and all so without wasting in time let's
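For anyone who prefers the command line to the portal, the same role
assignment can be sketched with the Azure CLI. This is only a rough sketch,
not what is done in the video: the helper names below are hypothetical, and
the subscription ID, resource group, storage account name, and the object ID
of the workspace's managed identity are placeholders you would need to fill
in yourself. The script only builds and prints the command, it does not run
it.

```python
import shlex


def storage_account_scope(subscription_id, resource_group, account_name):
    """Build the Azure resource ID of a storage account (used as the RBAC scope)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(principal_object_id, scope,
                            role="Storage Blob Data Contributor"):
    """Return the `az role assignment create` argument list for a managed identity.

    A managed identity is assigned with the principal type ServicePrincipal.
    """
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_object_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # All values below are placeholders (assumptions), not values from the video.
    scope = storage_account_scope(
        "<subscription-id>", "<resource-group>", "<storage-account>"
    )
    command = role_assignment_command("<workspace-identity-object-id>", scope)
    # Print the command so it can be reviewed before running it in a shell.
    print(shlex.join(command))
```

The role name is the same one picked in the portal, and the scope is the
external data lake, so the effect should match the clicks above.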
all so without wasting in time let's jump back to the synapse workspace so
jump back to the synapse workspace so click on it and now we are good to go
click on it and now we are good to go now now our synapse workspace has the
now now our synapse workspace has the exess now we can use the data stored in
exess now we can use the data stored in our data L because obviously we need to
our data L because obviously we need to pick the data stored in the silver layer
pick the data stored in the silver layer silver container right yes that was all
silver container right yes that was all about giving excess to over snaps
about giving excess to over snaps workpace let me cancel this
workpace let me cancel this one just cross it yeah discard changes
one just cross it yeah discard changes because we do not want to save it
because we do not want to save it discard changes yes because we need to
discard changes yes because we need to just start from start from scratch so
just start from start from scratch so simply go on your develop tab this one
simply go on your develop tab this one which is also known as scripts I call it
which is also known as scripts I call it as scripts because we just create
as scripts because we just create scripts there click on it simple and now
scripts there click on it simple and now you will see a plus sign in the develop
you will see a plus sign in the develop tab click on it and then it will ask you
tab click on it and then it will ask you what you need to create we want to
what you need to create we want to create a SQL script
create a SQL script okay simple this is our SQL query editor
okay simple this is our SQL query editor where we will be just writing SQL
where we will be just writing SQL scripts where we will be querying our
scripts where we will be querying our data using SQL SQL like everyone is a of
data using SQL SQL like everyone is a of SQL I know SQL is like the bread and
SQL I know SQL is like the bread and butter of every data professional
butter of every data professional okay so first of all let me just tell
okay so first of all let me just tell like like okay you tell me what is the
like like okay you tell me what is the first thing we should have in
first thing we should have in SQL if you want to query if you want to
SQL if you want to query if you want to save our data what is the first thing
save our data what is the first thing yes it's database we should have a
yes it's database we should have a database because in the database we'll
database because in the database we'll be creating tables views everything
be creating tables views everything we'll be quering tables or views inside
we'll be quering tables or views inside the database so let's create the
the database so let's create the database first now you will see we
database first now you will see we already have a master database that is a
already have a master database that is a kind of by default database but we will
kind of by default database but we will be creating our own okay how we can
be creating our own okay how we can create a database you can either use
create a database you can either use command create database database name
command create database database name but we'll be using UI because we are
but we'll be using UI because we are learning synapse analytics so it's
learning synapse analytics so it's really good using synapse instead of
really good using synapse instead of just writing the code so just click on
just writing the code so just click on the drop down button and you will see
the drop down button and you will see nothing instead of Master because we can
nothing instead of Master because we can only see the master command okay so now
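If you did want to take the code route instead of the UI, the statement is
just CREATE DATABASE followed by the name. Here is a minimal sketch that
builds that statement as a string. The function names and the database name
awdatabase are made up for illustration; they are not from the video. You
would paste the printed statement into the SQL script editor and run it
there.

```python
def quote_identifier(name):
    """Wrap a T-SQL identifier in square brackets, escaping any closing bracket."""
    if not name:
        raise ValueError("database name must not be empty")
    return "[" + name.replace("]", "]]") + "]"


def create_database_sql(name):
    """Return a CREATE DATABASE statement for the given database name."""
    return f"CREATE DATABASE {quote_identifier(name)};"


if __name__ == "__main__":
    # "awdatabase" is a hypothetical name used only for this example.
    print(create_database_sql("awdatabase"))
```

The bracket quoting is only there so that a name containing spaces or
special characters still produces a valid statement.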
only see the master command okay so now if you want to create a database you
if you want to create a database you simply need to go on the data tab this
simply need to go on the data tab this one okay let me go there then click on
one okay let me go there then click on the plus tab simple then it will show
the plus tab simple then it will show you the option to create SQL database
you the option to create SQL database select the SQL
select the SQL database yes now is the time to discuss
database yes now is the time to discuss this as well okay finally so in the
this as well okay finally so in the synapse analytics we get two options
In Synapse Analytics we get two options: serverless and dedicated. What is the difference between a serverless SQL pool and a dedicated SQL pool?

Let me explain it in easy language. A dedicated SQL pool is the traditional way of storing data, where the data actually resides in the database, just like the databases you may have worked with through MySQL Workbench, PostgreSQL, or MS SQL Server. Then what is the difference? In a dedicated SQL pool we use distributions, a control node, and compute nodes; do not worry about those things for now. It is a kind of traditional database, but on the cloud, and it is optimized for query reads, for big data, for data warehousing. We get many options: we can set the distribution (hash, replicated, or round robin), we can set partitions, and so on.
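For reference, the distribution options mentioned above are chosen when a table is created in a dedicated SQL pool. Here is a minimal sketch; the table and column names are hypothetical and are not part of this project, which uses the serverless pool instead.

```sql
-- Hypothetical table in a dedicated SQL pool (illustration only).
-- DISTRIBUTION can be HASH(<column>), ROUND_ROBIN, or REPLICATE.
CREATE TABLE dbo.sales_example
(
    sale_id     INT            NOT NULL,
    customer_id INT            NOT NULL,
    amount      DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(customer_id),  -- rows with the same customer_id go to the same distribution
    CLUSTERED COLUMNSTORE INDEX        -- storage layout commonly used for warehouse tables
);
```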
What is serverless, then? You must have heard the word lakehouse, where we do not actually store the data in databases. Let me draw it for you. To understand serverless, you need to understand the lakehouse concept first.

Let's say this is your data lake, and you have data files in it. Now you want to create a database on top of it, which is called a serverless database. Let's say the data is in CSV format. Let me write CSV for you; don't mind my handwriting, because I used to be scolded in school every single day because of it, so just ignore it.

So we have CSV data in our data lake, and I want to create a database in which I can query this data. I want to run SELECT * on this file, but I do not want to save this data inside the database in the traditional way. What do I mean? I want my data to stay in the data lake, but I still want to run SELECT statements on it.

Here comes the lakehouse concept: the data resides in the data lake, and an abstraction layer, a kind of metadata layer, is created on top of it. Metadata means the columns, the headers, all the information about the data. Let's say this is me (I am much smarter than this guy, but let's say this is me) querying the database by writing SELECT * FROM my table. The engine applies the metadata, the columns, to the data stored in the data lake. I will feel like I am querying a traditional database, but actually I am only querying through this metadata layer. The serverless SQL pool does all the work behind the scenes: it pulls the data, applies the metadata layer, and returns the result to me.

See, this is magic. Actually it is not magic; it is the lakehouse concept, where we use our data lake but at the same time want our data to behave like a data warehouse. When we combine a data warehouse with a data lake, it becomes a lakehouse. Understood? If not, just rewatch this part, because it is really important.
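To make the idea concrete, here is a rough sketch of what running a SELECT directly on CSV files in the data lake can look like from a serverless SQL pool. The storage account, container, and folder in the URL are placeholders (assumptions), and the options shown are one possible choice, not necessarily what this project uses.

```sql
-- Sketch: query CSV files that stay in the data lake; nothing is copied into the database.
-- <storage-account>, <container> and <folder> are placeholders to replace with your own values.
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<folder>/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',   -- HEADER_ROW needs parser version 2.0
        HEADER_ROW = TRUE         -- take column names from the first line of each file
     ) AS csv_data;
```

The files never move: the pool reads them at query time and returns an ordinary tabular result.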
Just to recap: our data resides in the data lake. Why do we want it to reside in the data lake? Because we want to save cost. Storing data in a data lake is very cheap compared to storing it in databases, and data is growing rapidly, even exponentially, so we do not want to spend much money on storing it in databases.

This is the revolution in data engineering, and that's why every company is trying to implement the lakehouse concept. But it is really difficult to implement, which is why I'm sharing these insights with you, so that you can become an efficient developer and help organizations build efficient lakehouse solutions. Trust me, in the upcoming videos I'll be sharing much more about the lakehouse; this is just an introduction.

So that was the lakehouse concept and the serverless pool. Because it is serverless, it scales up and down automatically, it supports the lakehouse concept, and it does not physically store data in the database. I hope you understood this concept, so let's finally pick serverless.

I wanted to explain this in detail because this channel is not just for showing drag-and-drop steps. It is here to provide real education and real knowledge, so that you can become an outlier in the crowd, become an efficient developer, and succeed in your career. Everyone knows the basics, like what Synapse Analytics is and how to create a table, but this kind of knowledge is limited to few people, and I want you all to be knowledgeable enough to answer any question in this area.

That was all about the lakehouse. Now it's time to pick serverless and finally create the database. I hope you liked this concept, so let's start.
Oops, I need to click on the plus button and SQL database again. This is serverless, and I will name it aw database. Then click on Create. Simple. It will create the database for us. Now if you go to the Use database drop-down, you should see aw database. See, it is there. We will select our aw database, so whatever we do from now on will stay in this particular database. Simple. I know you love this concept.
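If you prefer the command mentioned earlier over the UI, the same result can be sketched in a SQL script. The name aw_database is an assumed spelling of the database name used in the video; run each statement on its own against the built-in serverless pool.

```sql
-- Create the serverless database (the name is an assumption; use your own).
CREATE DATABASE aw_database;

-- Switch the script to the new database instead of master.
USE aw_database;
```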
Now it's time to actually pick up the data stored in the silver layer. How can we do that? We have a powerful function called OPENROWSET. It helps us apply the abstraction layer on the data residing in the data lake. Let me show you how to pull the data stored in the silver layer; then we will create views on top of it and use those views to feed Power BI.
Now it's time to write the OPENROWSET function in our SQL script, but first there is one small task. As you remember, we assigned the Storage Blob Data Contributor role to our Synapse workspace. You need to assign the same role once more, this time not to the Synapse workspace but to yourself. Whenever we query data residing in the data lake, we ourselves also need permission to access that data. The steps are the same, and you have already done them twice, so it should be a matter of seconds.

Go to the Home tab, open your storage account, and click on Access control (IAM). Click Add, then Add role assignment, and search for Storage Blob Data Contributor (Contributor, not the scanner role). This time we do not pick managed identity, because we are not a managed identity; we are a user. So pick User, click Select members, select your email ID, and that's it: Review + assign.

It will add the role. Again, wait at least 10 to 15 minutes, because it sometimes takes time for a role assignment to reach the user or the managed identity. Whenever you see an error saying that the data cannot be listed, it means you just need to wait and refresh, and it will work. So don't worry about that.
Let's jump back to our Synapse workspace. Now we are good to use the OPENROWSET function. Are you excited? Let's use it.

How does this function work? Write SELECT * FROM OPENROWSET and hit Enter. Then you need to provide two parameters. The first is the location from which we read the data; it is called BULK, followed by the URL in single quotes.

How do we get the URL? It's very easy. Go to your storage account, then to your containers. I want to read the very first one, which is calendar, so open it. Inside is your file with a long name starting with part-00; that is the default name given when the Parquet file was written, so do not worry about it. The rest of the files are just there for confirmation; the main file is this one. Click on the three dots, click Properties, and there is your URL. Copy it and paste it into the query.

Here's the catch: when we read data with Synapse, we do not need the full URL. What do I mean by full URL? We do not need to mention the file name, and that is the main reason we created a dedicated folder for each data set. So remove the file name and keep the location up to calendar. That's it; then close the single quote. Put a comma, and then define the second parameter, FORMAT, which as we all know is Parquet.
Simple. Now here is another catch. As you can see, this location uses blob in the URL, but we created a data lake, not plain blob storage. The storage account shows the blob endpoint by default, which is why you see blob here. We need to replace it with dfs, in lowercase. Just remember this and do not make a mistake; that's why I'm repeating it: replace blob with dfs. dfs is the endpoint that Data Lake Storage Gen2 uses (the name refers to a distributed file system).

Finally we need to give the result an alias. You can name it anything: query1, query2, whatever you like. When I run this, I am actually reading data that resides in the data lake, but I get it back in tabular format, just as I would when querying a SQL table. That is the magic of this powerful OPENROWSET feature. Really cool. Let me hit Run for you.
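Putting the steps above together, the query looks roughly like this. The storage account, container, and folder names are placeholders (assumptions), since the exact URL comes from your own storage account's Properties pane.

```sql
-- Sketch of the query described above.
-- Note the dfs endpoint (not blob) and the folder path without the part-... file name.
SELECT *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<calendar-folder>/',
        FORMAT = 'PARQUET'
     ) AS query1;   -- any alias works, but one is required
```

Pointing BULK at the folder rather than at a single file means the query keeps working if the data is rewritten and the part file gets a new name.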
Okay, now I should see the data. Perfect. As I told you, you can query data residing in the data lake and get the result back in tabular format. That is the abstraction layer this function has created. Wow. This is the lakehouse, or you can call it a logical data warehouse; there are many names, and the popular one is lakehouse.

As you can see, we can export this result as well. You will have seen that managers always demand Excel files, so just export the result as CSV, JSON, or XML as per the requirement. One more thing: just as we had charts in Databricks, we have charts available here too. Obviously this one is not a meaningful chart, but you can edit it from here and choose a line chart, a bar chart, anything. So you can create charts in Synapse Analytics as well. I know you are falling in love with Synapse by now. Synapse is really cool; I love Synapse.

This is what is really revolutionary, and it is why we prefer storing our data in the data lake: we can still query it. So why waste money storing the data in databases? It doesn't make sense.
So that was all about the calendar data. How will we report this data in Power BI? Let me show you. This is one data set, and we will create views. You will be aware of views; they are the same as views in SQL. A view stores the query, and whenever we query the view, it runs that query against the data itself. We will build views on top of this query, store those views in the gold layer, and the gold layer will be used in Power BI.

First let me create my gold layer. I have a database, and I want to keep all my views in the gold layer, so for that I will create a schema. I will first rename my SQL script to create schema, and then write the code: CREATE SCHEMA followed by the schema name, let's say gold, ending with a semicolon. Let me run it. Simple.
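The schema statement described above is a one-liner. This sketch assumes the script is connected to your own database rather than master.

```sql
-- Create the schema that will hold the gold-layer views.
-- Run it as its own batch while connected to your database, not master.
CREATE SCHEMA gold;
```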
Now let's create another script: click on the three dots, choose New SQL script, and name it create views gold, because this is the gold layer.

How can we create a view? The syntax is exactly the same as in SQL. We say CREATE VIEW, then the view name, which starts with the schema name, gold, then a dot, then calendar. After the view name we write AS. (Oops, I hit Shift+Enter by mistake instead of Enter; just ignore that.) After AS comes our query, which was the SELECT * FROM OPENROWSET statement. Let me rewrite it for you: SELECT * FROM OPENROWSET, then BULK with the location, which I copy and paste, obviously removing the file name, and then FORMAT equals Parquet. Perfect. And we need to give it an alias again; any short name will do.

When I run this command, it will create a view in my aw database, inside the gold schema. Then I can query this view with a SELECT statement, or if I want to build reports on top of it I can connect it to Power BI, which we'll see in a few more minutes. Just remember one thing: whenever you run a query that builds something, always make doubly sure you are in the right database, because the default is master and you need to pick your own, which is aw database. Remember this, then run it.
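The view described above can be sketched as follows. The URL parts in angle brackets are placeholders (assumptions) to replace with your own storage account, container, and calendar folder.

```sql
-- Sketch: wrap the OPENROWSET query in a view inside the gold schema.
-- CREATE VIEW has to be the only statement in its batch, so run it on its own.
CREATE VIEW gold.calendar
AS
SELECT *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<calendar-folder>/',
        FORMAT = 'PARQUET'
     ) AS query1;
```

The view stores only the query text; the Parquet files stay in the silver layer and are read each time the view is queried.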
Okay, perfect, our first view is completed. Let me add a comment, because I'm planning to upload this script as well so that you can refer to it. To add a comment, type a double dash; I will write create view calendar, and below it a line of dashes as a separator. Simple.

So our first view is done. Similarly, we need to create views for all the other data that is already in the silver layer. The process is the same; we just need to change the location. How? Only the folder name at the end changes, because the rest of the location stays the same up to that point. We will create a view for each of the other folders, the other tables. So create all the other views with me and let's complete it together. Let's do it.
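As an illustration of the pattern, here is what one more view might look like, with the comment header described above. Only the view name and the last folder in the path change; the folder placeholder is an assumption, since the actual folder names depend on how your silver layer was written.

```sql
------------------------
-- CREATE VIEW CUSTOMERS
------------------------
-- Same query as the calendar view; only the view name and the last folder differ.
-- Select this statement and run it on its own, since CREATE VIEW needs its own batch.
CREATE VIEW gold.customers
AS
SELECT *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<customers-folder>/',
        FORMAT = 'PARQUET'
     ) AS query1;
```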
So I have finally created all the views, and I know you have created them too. I didn't do much: I copy-pasted the code and changed the location, specifically the last folder name, and obviously the view name as well.

Now that the views exist, first of all save your work, because I don't want you to redo it. Click on Publish all; it will publish and save all your work. Once it is published, I want to show you what you have achieved. Click on the three dots and create a new script. Now if we want to query the data, we do not need OPENROWSET or any location. Let me show you. First pick the right database, then simply write SELECT * FROM gold, which is our schema name, dot any view. Let's pick customers. Run it.

Now you will see all the data, and this time you haven't used OPENROWSET or any location. Why? Because you have created views in the gold layer. Now you can access the data, your managers can access the data, your stakeholders can access the data, and data analysts can access this data.
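The query described above is just a plain SELECT against the view. This sketch assumes the gold.customers view exists and that the script is connected to your own database.

```sql
-- No OPENROWSET and no storage URL needed any more: the view hides both.
SELECT *
FROM gold.customers;
```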
So we have finally created views in our gold layer. After the views, it's time to create external tables, because after all, tables are what we normally create inside databases. The difference between external tables and managed tables is very simple; let me quickly explain.

Let's say this is your external table and this is your managed table. With an external table, we keep the data ourselves: the files stay in our own storage and the table only points to them. With a managed table, we do not keep the data ourselves: if you create a managed table in Databricks or a similar environment, the platform stores and manages the data for you. That is not the scenario here; with the serverless pool in Synapse we only create external tables. So for our purposes, forget about managed tables. We will be creating external tables. Simple.
So to create external tables in Synapse Analytics we have to follow three steps.
What are those steps? Let me tell you. First of all we need to create something
called a credential, then we need to create an external data source, then we
will create an external file format. Just three steps.

Basically, when we create the credential, we tell Synapse Analytics to pick the
data using managed identity. As we just mentioned, we are allowing Synapse to
use the data stored in the data lake using managed identity, so this is a kind
of credential. There are so many ways to pick the data, right? Like SAS tokens,
access keys and managed identity. So we just define: pick the data using managed
identity. Simple.

Now what is this external data source? You just saw that whenever we want to
pick the data we need to mention the URL, the full URL. When we do not want to
put the URL again and again, we create an external data source. In that scenario
we usually keep the URL till container level. Let's say I will create an
external source for my silver container, so I will just keep the URL till the
silver container, and the rest of the URL I can just put in the location. It
will save me a lot of time and lots of effort. Very good.

Then what is this external file format? Basically we have so many file formats
available, like JSON, CSV, Parquet, so many file formats. So we define whether
we have data stored in Parquet file format, CSV file format or JSON file format.
For that purpose we create an external file format. Again, you do not need to
learn the code for it, because we have the code in the documentation, and truly
speaking, once you practice more and more, you will just remember the code in
your head. Otherwise, do not worry, just copy the code from the documentation. I
will show you both the ways, because I remember that code now and I can show you
the documentation as well.
So for that, let's quickly create a new script. Just go here on the plus sign,
click on plus, and here we will write "external table". Basically we will create
just one external table, because we just want to show you how you can create an
external table, and then we will be creating some visualizations based on this
external table in Power BI. So it will be an end-to-end learning for you: how to
establish the connection and then build some visualizations. We will cover
Power BI only briefly; you will see how to build a connection using SQL
endpoints. What is a SQL endpoint? Just hold on, I'm coming to that point. So
first of all we will write: create external table. Okay, simple.
So, as I just mentioned, we first need to create the credential, but there's a
prerequisite for it. As you can see, I am using the AW database, so first we
need to create a master key for this database. If you are familiar with SQL, if
you have already worked with MS SQL Server, you would know that we need to
create a master key. For that you just need to write CREATE MASTER KEY
ENCRYPTION BY PASSWORD equals to, and you just need to pick any password. Just
make sure that you are adding one capital letter and one special character.

And you can see how to get this code for the master key: just go on Google and
type "create master key SQL". You will get the Microsoft Learn page, and here
you can see the whole syntax. Just copy it and paste it here. You do not need to
remember it, because that's something that admins do, but you as a data engineer
should know this as well. So just remove these square brackets, and in password
do not put this password; you need to create your own password.

Once you have set your password we are good to go; just select this and click on
run. Obviously I have already run this, because I have set my password and I
cannot show you. So just create your password, make sure you are adding special
characters, and make your password a little bit complicated. Do not worry, you
do not need to remember this password because it will not be asked. It is just
the database master key. So let's remove it.
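A minimal sketch of the master key statement described above. The password shown is only a placeholder; pick your own strong value and keep it out of any script you share.

```sql
-- Run once, with the target database selected (not master).
-- '<YourStrongPassword!1>' is a placeholder, not a real password.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<YourStrongPassword!1>';
```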
And once it is done, we are good to go with our external data source,
credential, file format, everything. We'll be building everything from scratch,
do not worry at all. So as per the steps we first need to create the credential.
Okay, so first we will write CREATE, not EXTERNAL, it's called DATABASE SCOPED
CREDENTIAL. What is it? Basically, let's say we are using Synapse Analytics to
pull the data, or to read the data, or to write the data to the data lake: it
needs some kind of credential. There are so many credentials available, and one
of them is managed identity, and we have used managed identity, so we just need
to tell it that we have used managed identity. Simple. Let me quickly create
one, let me say cred_ansh. Simple. And then we just need to write WITH IDENTITY
equals 'Managed Identity'. Simple.

Now you will ask me from where you can get this code. Don't worry, you just need
to go on Google and type "credential Synapse". Just click on this link and then
you will see the documentation. I think this is not the one; let me write
"database scoped credential". Yeah, just click on this first link, and here you
go. Here you can see the syntax; you can copy it from here as well and paste it
here, it's up to you. And obviously, once you use this kind of code again and
again, you will just remember it, so you do not need to worry about that. So
simply we need to run it. Okay, this is done. So we have successfully created
the database scoped credential.
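The credential step can be sketched as follows. The name `cred_ansh` is a best reading of the captions and should be treated as an assumption; any valid identifier works, as long as the same name is used later in the external data sources.

```sql
-- Tells the SQL pool to authenticate to storage with the workspace's managed identity.
-- Requires a database master key to exist first.
CREATE DATABASE SCOPED CREDENTIAL cred_ansh
WITH IDENTITY = 'Managed Identity';
```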
Now the second step is creating the external data source. For that, first of all
we will create the silver data source. I just mentioned that the external data
source is for the URLs that we are using: we are pointing our Synapse external
data source to a container, so we create a new data source for each container.
So we have to create two external data sources, one for silver, because we will
read the data from silver, and one for gold, because we need to push the data to
gold. Simple.

And the code is really simple, let me show you. Just write CREATE EXTERNAL DATA
SOURCE, and I will call it source_silver. Simple, right? Then I can just say
WITH LOCATION. Now I just need to pick the location till container level, and I
will simply go to my view script because I already have the URL there. See, I
just need to copy it till the silver container level, that's it, and provide it
here. So now we are pointing our data source, which is known as source_silver,
to this particular container, so we do not need to write this URL again and
again. This is a kind of file system that you can relate with Databricks, or you
can say this is a kind of variable that will save us a lot of time. It's really
cool, I love using data sources.

Okay, we have assigned the URL, so now this data source will be going to the
silver container, but it needs some credential. That's why we created this
credential, and now we will say: hey source_silver, carry this credential, which
is known as cred_ansh, with you whenever you go to take the data from the data
lake. So we will hand over this credential to our external data source. This is
a kind of real-life analogy. I love explaining concepts using real-life
examples, so that you can remember them for a longer period of time. So we will
simply say CREDENTIAL equals cred_ansh. Simple. It is done, we simply need to
run it. Okay, it is successful.

I will simply copy this code because I need to create one for gold as well, so I
will say source_gold and I will just change the container name. Obviously the
credential will be the same, because the managed identity is supporting the
whole Synapse workspace, right? So simply run this as well. Perfect.
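A hedged sketch of the two data sources described above. `<storage-account>` is a placeholder for your own storage account, and the object names (`source_silver`, `source_gold`, `cred_ansh`) follow the spoken names, with the exact spelling assumed.

```sql
-- Data source for reading from the silver container (URL stops at container level).
CREATE EXTERNAL DATA SOURCE source_silver
WITH (
    LOCATION = 'https://<storage-account>.dfs.core.windows.net/silver',
    CREDENTIAL = cred_ansh
);

-- Same pattern for the gold container, which the external table will write to.
CREATE EXTERNAL DATA SOURCE source_gold
WITH (
    LOCATION = 'https://<storage-account>.dfs.core.windows.net/gold',
    CREDENTIAL = cred_ansh
);
```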
Now it's time to create the external file format. Why do we create an external
file format? As we just mentioned, we have so many file formats, so we just
define: hey bro, the data that is stored in the folder is in Parquet file
format. Simple. So we will say CREATE EXTERNAL FILE FORMAT and I will call it
format_parquet. Simple. Then again the same thing, WITH, and then we need to
type FORMAT_TYPE equals to PARQUET. And then we also need to add a data
compression, because when we add data compression it supports, you can say,
better read performance.

How can we get that code? Simply go on Google, same thing, just write "file
format", or let me add "external" as well: "external file format". Click on the
very first link, then you will see this documentation. Here just drag it down,
drag it down, and then you will see something called, uh, I think we just need
to go up. Just search for "compression". Yeah, here we have DATA_COMPRESSION, so
it has given us the code for all the data compressions. For Parquet we have two
options, but I personally like using this one because it is recommended by
Microsoft as well: the Snappy codec. So we just need to put here
DATA_COMPRESSION equals to this. Once it is done we are going to run this code
as well. Oh, I didn't close it, sorry. Just run it. Okay, perfect.
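The file format step can be sketched like this. The name `format_parquet` follows the spoken name (spelling assumed); the Snappy codec string is the one listed for Parquet in the CREATE EXTERNAL FILE FORMAT documentation.

```sql
-- Declares that files read or written through this format are Snappy-compressed Parquet.
CREATE EXTERNAL FILE FORMAT format_parquet
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```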
Now finally, after doing all the three steps, we are good to create the external
table. But before that I would like to show you what will be happening behind
the scenes, because this can be asked in your interviews, plus this is a kind of
deep knowledge that not everyone will have, so it can present you as an outlier
in the crowd, and that's what we want, right? The world is really competitive
and you need to be competitive enough to beat them all.

So the thing is, let's say we have this container, this is our silver container,
and this is our gold container. Perfect. We have data stored in this silver
container and we have already created a view on top of it. So what will be
happening in the external table? First of all we will be using CETAS. Don't
worry, I will tell you what that is; it is a really important concept in the
world of data, and in Synapse obviously. So what we'll be doing: we will use
this view, and we will create an external table which will push this data to the
gold layer. How? I will show you the code. It will push this data to the gold
layer using that code, then it will create an external table on top of that
data. This is the architecture.

And how will we achieve this? Using something called CETAS. What is CETAS? The
full form of this is CREATE EXTERNAL TABLE AS SELECT. I think you must be aware
of this pattern, because it is available in SQL as well, like the regular or
traditional SQL databases that we use, where we use subqueries. So what we'll be
doing: we will build a table using a SELECT statement, and we already have the
view built, so we do not need to write out the subquery; we can simply write
SELECT * FROM the view. I know now you can relate it with the SQL concepts.
That's why we say SQL is the backbone for any data professional. And do not
worry, I will show you everything from scratch, so if you cannot relate it now,
you will in just a few minutes. Let me show you.
So first of all I will create a nice clean comment for you so that you can refer
to this script, because I'm planning to upload the script so that you can refer
to it; it's just for your help. So just write "create external table", known as,
let's say, extsales. Right, external sales. Simple.

So the code is: we will simply write CREATE EXTERNAL TABLE, the table name will
be extsales, and I will save it in the gold schema. Simple. Then we need to
write WITH, and then we need to give three things. First thing is LOCATION,
because as we just mentioned in the architecture, it needs to store some data in
a location, so it needs the location URL. But we have created the external data
source, so we do not need to put the whole URL; we will just put the folder
name, and the folder name will be extsales. Simple. Then, obviously, since we
haven't put the complete URL in the location, we need to add the external data
source. Right now I know you can relate all the steps, why we did three steps
before creating the external table: because those are required as parameters of
the external table. So the external data source name was source_gold, right?
Yes, source_gold. Then we need to put FILE_FORMAT, and the file format was
format_parquet, right?

So this is my external table. Is the work done? No, because it will create the
external table, but how will it create it? What is the query? What is the AS
statement? Let me tell you, the AS statement is SELECT * FROM gold.sales. What
is it? It is a view that we created, remember? Here, in the create views gold
script, you will see sales here. See. So what exactly are we doing? We are using
this query, this one, see, and instead of writing out this query, we already
have the view, so we can directly use the view. That is the power of views, and
that's why I always say try to create views, because you never know where you
will be using those views. So it is really important to use views. Being a
developer you should know why we are using this.
So just run this command and it will create an external table for you. You do
not need to put any column definitions or their data types; they will be
automatically picked up by this command, because you have stated everything in
the view, where you SELECT * from the OPENROWSET function. Just click on run. It
will take just a few seconds and it will be done. Yes, bro, you have
successfully completed your external table. So I would love to query this:
SELECT * FROM gold.extsales. Let me query this and I should see the data. Yes,
we have all the data.
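Putting the pieces together, the CETAS statement and the check query described above can be sketched as below. The object names (`gold.extsales`, `source_gold`, `format_parquet`, `gold.sales`) follow the spoken names with the exact spelling assumed, and the statement presumes the credential, data source, file format and view already exist.

```sql
-- CETAS: runs the SELECT, writes the result as Parquet files into the
-- 'extsales' folder of the gold container, and registers an external table
-- over those files. Column names and types are inferred from the SELECT.
CREATE EXTERNAL TABLE gold.extsales
WITH (
    LOCATION = 'extsales',          -- folder inside the data source's container
    DATA_SOURCE = source_gold,      -- supplies the container URL and credential
    FILE_FORMAT = format_parquet    -- Snappy-compressed Parquet
)
AS
SELECT *
FROM gold.sales;                    -- the view created earlier

-- Verify: this now reads the files written to the gold container.
SELECT *
FROM gold.extsales;
```

Note that the target folder is expected to be empty or absent when the statement first runs; rerunning the same CETAS against a folder that already holds files typically fails until the table is dropped and the files are removed.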
Now you would ask me: hey bro, I was seeing the same output using views and I'm
seeing the same output using the table as well, so what's the difference, and
why did we create the external table? Do you want to know why? Because in views
we didn't store the data; it was just a query. We know that when we create views
we actually store the query, not the result, not the data. But in the case of
the external table we have the data. You want to know where? In our data lake,
in our gold layer, in our folder. So we can refer to it in the future; we have
the power to retain the data. That is the difference between an external table
and a view.

And now I will take you to my storage account and show you that we have the data
in Parquet format. So this is my storage account, and you simply need to go to
your gold container, this one. Simply click on it and see the magic: you have
the data. Yes, it has migrated the data to the gold layer using CETAS. CETAS is
a really, really powerful command. When I open it, you will see the data stored
in the Parquet format. This is our data file. You did it, you finally did it.
file you did it you finally did it and now it's time to cover one more
now it's time to cover one more important step which is establishing a
important step which is establishing a connection between synapse and powerbi
connection between synapse and powerbi now most of you will be thinking why we
now most of you will be thinking why we need to cover powerbi bro we are not
need to cover powerbi bro we are not covering powerbi basically we are
covering powerbi basically we are covering only one aspect of it because
covering only one aspect of it because when you will be delivering this data
when you will be delivering this data you should be the owner of data to
you should be the owner of data to distribute it you are responsible to
distribute it you are responsible to establish connection between synapse
establish connection between synapse workspace and powerbi obviously data
workspace and powerbi obviously data analyst can also build it but you should
analyst can also build it but you should assess them if they need any help
assess them if they need any help because you are the owner of this data
because you are the owner of this data you need to efficiently distribute it
you need to efficiently distribute it being a data engineer is not just about
being a data engineer is not just about building pipelines it's about serving
building pipelines it's about serving the stakeholders serving your Downstream
the stakeholders serving your Downstream serving your manager serving your all
serving your manager serving your all the entities which are using that data
the entities which are using that data so we will be covering powerbi just from
so we will be covering powerbi just from that perspective that we want to
that perspective that we want to establish a connection that's it that's
establish a connection that's it that's it we will not be building you can say
it we will not be building you can say relationships visualization obviously I
relationships visualization obviously I will just show you by creating one to
will just show you by creating one to two visualization that's it but nothing
two visualization that's it but nothing more than that
Nothing more than that. So now it's time to cover Power BI. You simply need to go
on Google and just type "Power BI download". Simple. Click on the first link,
click on the download button, then just tick all the boxes and click on this
download button. Once you download it, you will be landing on the Power BI
Desktop application. So without wasting any time, let's build that connectivity.
And you will want to know which technology we will be using. I will show you once
we land in Power BI Desktop: we will be using SQL endpoints. Do not worry, we
will cover what the SQL endpoint is and from where we can get it. Do not worry at
all, I will tell you each and everything in just a few seconds. Let's see.
Welcome to Power BI. So this is your Power BI Desktop, this is the homepage. This
is, you can say, the blank report of Power BI Desktop. So it's time to complete
the final phase of the project: establishing the connectivity so that we can
deliver our data, and then we as data engineers are free. So the thing is, let's
say this is our Synapse workspace, right, and this is our Power BI. If we want
to establish a connection between Synapse and Power BI, we need something called
an endpoint. Yes, we need SQL endpoints. So what is that? Basically, for the SQL
database that we have within the Synapse workspace, we have something called
SQL endpoints. I will show you from where you can get it. We need to copy that,
and we need to establish the connection with Power BI using that endpoint. Once
we put in that endpoint, we can see our database, we can see our external
tables, views, everything, and then we can just pull that data into Power BI
Desktop and build the reports. Simple.
So let me show you how you can get that. Go on Google and just open your Azure
portal, and click on your Synapse workspace. Once you click on that, you will see
this area, right? And here, if you focus, you will see something called the
serverless SQL endpoint. There is something called the dedicated SQL endpoint as
well, but we should not use that, because we didn't go for dedicated SQL. So now
it's time to copy this serverless SQL endpoint, because we used serverless SQL,
right? I will just go here and click on "Copy to clipboard". Once it is copied,
just go back to your Power BI Desktop.
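To make the difference between the two endpoints concrete, here is a small sketch in Python. It assumes the usual Azure Synapse naming pattern, where the serverless SQL endpoint is the workspace name plus `-ondemand` and the dedicated one has no suffix. The workspace name `my-synapse-ws` is a made-up placeholder, so always copy the real value from the portal as shown in the video.

```python
def synapse_sql_endpoints(workspace_name: str) -> dict:
    """Return the conventional SQL endpoint host names for a Synapse workspace.

    Assumption: the workspace follows the standard naming pattern. The
    authoritative values are the ones shown on the workspace overview page.
    """
    name = workspace_name.strip().lower()
    if not name:
        raise ValueError("workspace_name must not be empty")
    return {
        # Serverless SQL pool: the one used in this project.
        "serverless": f"{name}-ondemand.sql.azuresynapse.net",
        # Dedicated SQL pool: not used here, shown only for comparison.
        "dedicated": f"{name}.sql.azuresynapse.net",
    }


# "my-synapse-ws" is a hypothetical workspace name used for illustration.
endpoints = synapse_sql_endpoints("my-synapse-ws")
print(endpoints["serverless"])
```

The `-ondemand` part is a quick way to check that you copied the serverless endpoint and not the dedicated one.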
So now it's time to actually pull the data. How can we do that? If you look at
your toolbar, just click on "Get data", and then click on the "More" button at
the bottom, because we need to find the connector that is meant for Azure. As you
can see, we have connectors for Azure; click on it. Then just pick Synapse
Analytics, because we want to establish a connection with our Synapse Analytics
workspace. Click on "Connect", and then you just need to put the SQL endpoint
here in this Server box. Database is optional. Then click on OK and just wait
for a few seconds, and here you go: you are seeing your database.
Okay, let me just tell you: sometimes it asks for credentials as well. If you
are logging in to Power BI for the first time, then when you click on OK it will
ask you to provide the credentials. It will look like this, let me just draw it
for you. See, like this: on the left-hand side you will have some options,
Windows, Database and something like that. So just pick the Database
credentials. And now you will ask me, what should I put in username and
password? Just go back in time to when you created your Synapse Analytics
workspace: you entered an admin username and password, right? I put my admin
username and then my password there. So you just need to put your admin
username and password. That's it.
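The same pieces of information (server, optional database, admin username and password) are what any SQL client needs, not only Power BI. As a rough sketch, this is how they could be assembled into an ODBC connection string in Python. The driver name, the server, the database name `my_database` and the username `sqladminuser` are all assumptions for illustration, not values from the video.

```python
def build_odbc_connection_string(server, database, username, password,
                                 driver="ODBC Driver 18 for SQL Server"):
    """Assemble an ODBC connection string that uses SQL (database) authentication.

    Note: a password containing ';' or braces needs ODBC escaping, which this
    sketch does not handle.
    """
    parts = [
        f"Driver={{{driver}}}",        # renders as Driver={ODBC Driver 18 for SQL Server}
        f"Server=tcp:{server},1433",   # the serverless SQL endpoint on the default SQL port
        f"Database={database}",
        f"Uid={username}",             # the SQL admin username set at workspace creation
        f"Pwd={password}",
        "Encrypt=yes",
        "TrustServerCertificate=no",
    ]
    return ";".join(parts) + ";"


# All values below are hypothetical placeholders.
conn_str = build_odbc_connection_string(
    server="my-synapse-ws-ondemand.sql.azuresynapse.net",
    database="my_database",
    username="sqladminuser",
    password="<your-password>",
)
print(conn_str)
```

With the `pyodbc` package and the ODBC driver installed, passing this string to `pyodbc.connect(conn_str)` would open a connection using the same credentials as the Database option in Power BI.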
Once you click on that, you will be landing on this page, where you can see your
database, that is the aw database, right? Here you can see we have views, we
have external tables. So let me just click on this table that we created, the
gold external sales table, and let me click on "Load", because I want to load
only this external table. So now it will be loading, creating the connection,
and everything will be running in parallel behind the scenes. We do not need to
do anything.
Congratulations! I'm not kidding, I'm not kidding, bro, this was not easy. So now
you can see that our gold external sales table is here in front of us on the
screen. We have all the columns here, and this data does not come from a file on
our machine. No, it has been pulled directly from Synapse, from Azure, from the
data lake. So as we just mentioned, we will not be covering the visualizations
in depth, because obviously you can build your own dashboard based on the
requirements, based on the KPIs, based on the business areas, based on the
manager's requirements. Everything matters here when we build visualizations.
So we will not be covering this in depth, but just for your sake, just for
your, you can say, experience, I would love to draw one or two visualizations.
So let me tell you that I want to aggregate data, okay? If I want to just see the
trend, I can pick this line chart. I just clicked on this area and then on this
button, the line chart. Just increase the size, right? And then, if we want to
add the data on the x-axis and y-axis, you can see the fields here: X-axis,
Y-axis, Secondary y-axis, Legend. So I will put order date in my x-axis. Okay, I
have put order date in my x-axis. Then I will put order number in my y-axis.
Simple. So this is the kind of trend that I can see right now, and this trend I
am getting directly on top of the data residing in the data lake. So this is my
visualization one, in which I just want to see the trend.
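What the line chart does behind the scenes is a simple aggregation. The sketch below shows the idea in plain Python, assuming the visual counts order numbers per order date (the video does not state which summarization is applied). The rows and the column names `order_date` and `order_number` are invented for illustration.

```python
from collections import Counter

# Hypothetical sample rows standing in for the gold external sales table.
sample_rows = [
    {"order_date": "2017-01-01", "order_number": "SO1001"},
    {"order_date": "2017-01-01", "order_number": "SO1002"},
    {"order_date": "2017-01-02", "order_number": "SO1003"},
]

# Count how many order numbers fall on each order date (y-axis per x-axis value).
orders_per_day = Counter(row["order_date"] for row in sample_rows)

# Sort by date so the points appear in chronological order, like the line chart.
trend = sorted(orders_per_day.items())

for day, order_count in trend:
    print(day, order_count)
```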
Okay, now let's try to create another one. So this is the kind of visualization
tool that you can access. People do prefer using Tableau as well. People prefer
using, uh, what was that, I think it's known as Google Data Studio. I'm not sure,
because I have just used Power BI, and these are the topmost tools available in
the market right now. So this time, let's say I just want to create the total
number of orders. I can pick a visualization called Card, this one. Just click
on it and it will give you a card. Let's say I want to build KPIs like the total
number of orders that I have. I will simply click on it, and here you can see
the fields. I will simply put customer key. Let's calculate the total number of
customers. See, we have 56,000 customer IDs. Wow!
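One thing to keep in mind with a KPI like this: counting the customer key column in a sales table counts rows, so a customer with several orders is counted several times. The video does not say which summarization the card uses, so the sketch below only illustrates the difference between a plain count and a distinct count, using made-up customer keys.

```python
# Hypothetical customer keys as they might appear in a sales table,
# where the same customer can show up on many order lines.
customer_keys = [11000, 11001, 11000, 11002, 11001, 11000]

# Plain count: one per sales row.
row_count = len(customer_keys)

# Distinct count: one per unique customer.
distinct_customers = len(set(customer_keys))

print("rows:", row_count)
print("distinct customers:", distinct_customers)
```

If the number on the card should mean unique customers, a distinct count is the summarization to pick for that field.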
And we can obviously modify it. If you want to modify it, you can see a brush
here, this one. This is the brush that you can use. See, this one. Simply click
on it and then you will see the kind of options that you can leverage while
creating filters and anything else. And then you have format options here: size,
style, title. So let's say I just want to give a title. I can turn it on and add
any title, let's say "Total Customers". You can bold it, you can center it, you
can increase the size as well, you can change the font as well. You can do so
many things. So this is our visualization that you can see.
And let me just create one more visualization, that's it, and then we are good
to complete this course. So let me create an area chart. I think this is the
area chart, yeah, this one. I want to create this, and this time I want to show
data by stock date on the x-axis and my customers on the y-axis. So this is it,
this is it. This is the kind of dashboard that we have created. Do not judge us,
because we created it in just a few seconds: we just dragged and dropped and
built this kind of dashboard. And obviously, if you want to play with Power BI,
just play with it. You have so many functions: you can change the color, you can
change the area color, you can add legends, everything. There are so many
options, you can do wonders with Power BI. So it's up to you.
So now it's time to quickly review the progress that we have made so far. Okay,
so congratulations to everyone who has completed this project. Let me just give
you a brief of what we have done. First of all, this was our phase one, where we
loaded the data into the bronze layer. Then this was our phase two, where we
picked the data from the bronze layer, applied crazy transformations and
analysis, and pushed the data to the silver layer. Then we pulled the data from
silver, pushed it to gold, and established the connection with Power BI as well.
So that's all for this project. Just tell me in the comments how this project
was, and if you genuinely loved it, tell me that as well, because I'll be
creating more and more Azure data engineering projects and end-to-end projects
for you, and you can expect crazy stuff in the future. And let me give you a
hint: most of the crazy Azure projects are already lined up. Yes, I have already
uploaded them on YouTube; they are scheduled and are coming your way in the near
future. Lots of tutorials are on the way. Just hit the Subscribe button to
motivate me, and I will be helping you to achieve your goals, achieve your
dreams, and obviously for free, because I want to provide you the best quality
of education, the best quality of, you can say, data engineering skills, for
free. You do not need to pay anything. See you soon, bye-bye.