This content is an interview transcript where a candidate, Sep, discusses their experience in data engineering, focusing on medallion architecture, SQL, Python, and data lake concepts.
Hi Sundi, my name is DJ and today I'll
be conducting your interview. Please let
me know about your experiences and tell
me something about yourself.
>> Hi, hi Raj. Thank you very much for giving me the opportunity for this interview. My name is Sep and I have been working in the data engineering domain for the past 2 years. I have been working with multiple technologies like PySpark, I have mostly worked with AWS cloud technologies, and I have worked on a project which follows the medallion architecture. It is a batch data processing pipeline that I have worked on, and that is my overall experience.
>> Okay. Okay. Good. So I'll start with your first question now. As you have mentioned the medallion architecture, I'd like to know: what is the purpose of the bronze, silver and gold layers in the medallion architecture? Could you please explain?
>> Yes, sure. In the bronze, silver and gold layers we keep the data. The data which we capture from the sources, we put into the bronze layer, and that data is completely raw; there are no data quality checks performed on it. We extract the data and put it onto the bronze layer, like a folder on S3. Later we perform some data quality checks, and after performing them we write the same data to the silver layer. While writing data to the silver layer, we might also go for the data modeling part. So the data which is present in the silver layer is going to follow some data quality rules as well as have its own model. In the gold layer, what we do is keep analysis-ready data. In the gold layer we have the summary tables as well as the aggregations, so that data analysts and data scientists can directly read from those tables and perform their own analysis. So from my point of view, this is the purpose of the bronze, silver and gold layers in the medallion architecture.
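[A minimal PySpark sketch of the flow described above, assuming hypothetical S3 paths and column names; the actual project code is not shown in the transcript.]

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: land the raw extract as-is (no quality checks)
raw = spark.read.json("s3://my-bucket/landing/orders/2024-01-01/")   # hypothetical path
raw.write.mode("append").parquet("s3://my-bucket/bronze/orders/")

# Silver: apply data quality rules and light modeling
bronze = spark.read.parquet("s3://my-bucket/bronze/orders/")
silver = (bronze
          .dropDuplicates(["order_id"])                 # hypothetical key
          .filter(F.col("order_amount").isNotNull()))
silver.write.mode("overwrite").parquet("s3://my-bucket/silver/orders/")

# Gold: analysis-ready summary table for analysts / data scientists
gold = silver.groupBy("country").agg(F.sum("order_amount").alias("total_revenue"))
gold.write.mode("overwrite").parquet("s3://my-bucket/gold/revenue_by_country/")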
>> Mhm. Okay. I'd like to know more about the bronze layer. So tell me, why do we keep the raw data in the bronze layer as immutable?
>> If we extract the data and keep it in the bronze layer, the benefit is that we don't have to extract it from the sources again and again. Let's say there is an issue in the pipeline and we don't have the raw data stored in some layer; we might have to extract it from the source again, and the challenge with that is that if the data has been deleted from the source, we will not be able to extract it. If we have the data in the bronze layer and the pipeline crashes, we can at any time refer to the data in the bronze layer and replay the pipeline. That is the reason we keep data in the bronze layer, and we normally keep it immutable; we don't modify it.
>> Okay. Okay. Fine. We'll move to the next question. So tell me, how do you handle duplicate records and partial files when multiple files arrive for the same day?
>> If multiple files arrive for the same date, the metadata of each file is going to be different. Based on the metadata, like the last modified timestamp as well as the file name, we can judge that for a certain date we have received two files. Then we also need to check for record-level duplicates. In PySpark we have fillna and dropna, so we can fill null values with some dummy values, and with the help of dropna it is possible to drop those rows. So in PySpark there are transformations readily available to deal with record-level duplicates. If it is a file-level duplicate we will check it through the file metadata, and if it is a record-level duplicate then we are going to use those transformations.
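[A sketch of how the record-level handling might look in PySpark, assuming a hypothetical DataFrame with an order_id key and a status column; dropDuplicates is the transformation commonly used alongside the fillna/dropna calls mentioned above.]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedup-demo").getOrCreate()

df = spark.read.parquet("s3://my-bucket/bronze/orders/2024-01-01/")   # hypothetical path

deduped = (df
           .fillna({"status": "UNKNOWN"})      # fill nulls with a dummy value
           .dropna(subset=["order_id"])        # drop rows missing the key
           .dropDuplicates(["order_id"]))      # keep one record per key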
>> Okay. So previously you mentioned that in your project you worked on the bronze, silver and gold layers. I'd like to know what kind of transformations you performed while moving from the bronze layer to silver.
>> Sure. Sure. The data used to be kept in the bronze layer and that was the completely raw data. What we used to do is convert that raw data into data which data analysts and data scientists can analyze and come up with some insights. We had to perform data cleaning, data enrichment, as well as data modeling on the data present in the bronze layer, and then we used to keep that data in the silver layer. On the data in the silver layer we used to perform the aggregation logic, and that data used to be kept in the gold layer. While performing data cleaning we heavily used deduplication and the dropna transformation. While checking the data quality rules, we used filter, and we also used case expressions, so that if there were any condition-based changes we wanted to make to the data, we could apply them. Talking about the enrichment part, of course we went with selectExpr in PySpark, and while we were calculating the aggregates we went with the group by transformation; we also went for the window operations. In the case of the data modeling part we had to heavily use the join operations. So these are the common operations which we used in our project.
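[A compact sketch of the kinds of bronze-to-silver and silver-to-gold transformations listed above, with hypothetical paths and column names.]

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("bronze-to-gold").getOrCreate()
bronze = spark.read.parquet("s3://my-bucket/bronze/orders/")          # hypothetical path

# Cleaning and data quality rules: deduplication, dropna, filter, case expressions
silver = (bronze
          .dropDuplicates(["order_id"])
          .dropna(subset=["customer_id"])
          .filter(F.col("order_amount") > 0)
          .withColumn("order_size",
                      F.when(F.col("order_amount") > 1000, "LARGE").otherwise("SMALL"))
          # Enrichment via selectExpr
          .selectExpr("order_id", "customer_id", "country",
                      "order_amount", "order_size",
                      "order_amount * 0.18 as tax_amount"))

# Modeling: join with a dimension table
customers = spark.read.parquet("s3://my-bucket/silver/customers/")
modeled = silver.join(customers, on="customer_id", how="left")

# Gold: aggregations and a window operation
w = Window.partitionBy("country").orderBy(F.col("order_amount").desc())
ranked = modeled.withColumn("rank_in_country", F.row_number().over(w))
revenue_by_country = modeled.groupBy("country").agg(F.sum("order_amount").alias("total_revenue"))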
>> Okay, cool. And in all of these actions, how did you handle schema evolution during ingestion?
>> Yes, schema evolution was a challenge for us as well, because when the schema changes at the source level, the pipeline needs to be modified or updated alongside it, and if these schema changes were made without our knowledge they could break our pipeline; these events have occurred in the past. So what we did was define the schema initially, and after that, whenever the data arrives, there is a way in Spark using which we can infer the schema of the file or the content we are able to see. So we used to infer the schema and match it against the schema we had defined, the schema we were expecting, and if there was any mismatch we used to raise flags and then deal with the data. Anyway, it was a strict policy that if there is any change in the schema, the data engineering team needs to be notified so that the pipelines can be updated. Regardless, we always got the updates, and as soon as the schema changed or evolved, we used to modify the pipelines. That's why the volume of errors associated with schema evolution was low.
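[A rough sketch of the kind of schema check described, assuming a hypothetical expected schema; the exact mechanism used in the project is not shown.]

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-check").getOrCreate()

# Schema we defined up front (hypothetical)
expected = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("order_amount", DoubleType()),
])

# Let Spark infer the schema of the incoming file
incoming = (spark.read.option("inferSchema", "true").option("header", "true")
            .csv("s3://my-bucket/landing/orders/"))                   # hypothetical path

# Compare inferred vs expected and raise a flag on mismatch
if set(incoming.schema.fieldNames()) != set(expected.fieldNames()):
    raise ValueError(f"Schema mismatch: {incoming.schema.fieldNames()} vs {expected.fieldNames()}")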
>> Okay. Okay. Fine. Moving ahead, I'd like to know how you would design a multi-region or cross-cloud medallion architecture. Now that we have spoken about the bronze and silver layers and how to move ahead with them, I'd like to know how you would design a multi-region or cross-cloud medallion architecture.
>> Okay. At my current experience level I didn't have any opportunity to work on designing the medallion architecture; that job is majorly taken care of by the architects who designed the project. My job is mostly around the PySpark part: if there are any changes, tickets get raised, those tickets are allocated to us, and we resolve them. These tickets are mostly around the processing part, and there might be some tickets around the orchestration part as well. But when it comes to architectural-level stuff, I'm not currently at a level where I have explored these things yet.
>> Mhm, okay, that's understandable. It's okay, not an issue. Okay. I've understood a little bit about how you've been working on your projects. The next section that I'd like to speak to you about is SQL. Okay. And we'll include some machine test questions as well for you to explain. Okay.
>> So tell me about inner join, left join and full join, and do explain with examples.
>> Okay. So is it fine if I share my screen so I'll be able to explain much better?
>> Yeah, certainly. Go ahead.
>> Is it visible now?
>> Yeah. Yeah, it is.
>> Yeah, go ahead.
>> So let me start with the inner join. In the case of inner join, if we have two tables, say a table called T1 and a table called T2, we have to pick some joining key based on which these two tables can be joined, and we mention the joining condition as well. Whenever the joining condition is true, meaning a join key that exists in T1 also exists in T2, those two records are going to get joined. So all the matching records from T1 and T2 will be there in the output in the case of inner join. In the case of left join, the records from the left table will be joined with the right table, and if there is no matching record then in front of that record we will see null values. So in the case of left join we will get all the records of the left table, but we might not get all the records from the right table; we will just get the matching pairs, and where the records from the right table are not matched we'll just get null values in place of the right table's columns. In the case of right join, the scenario is exactly the reverse: all records from the right table are going to be visible along with their respective matching records, but for the records of the right table that have no matching record in the left table, we'll see null values. And in the case of full join, we get to see all the records for all keys, and for the keys that don't have matching records in either the left or the right data frame, we see null values. So these are the join types, I guess. If you want anything else or you want me to dive deeper, please let me know.
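[A small illustration of the join types described above, using hypothetical tables t1 and t2 joined on an id column.]

-- Inner join: only matching keys from both tables
SELECT t1.id, t1.name, t2.amount
FROM t1
INNER JOIN t2 ON t1.id = t2.id;

-- Left join: all rows from t1, NULLs where t2 has no match
SELECT t1.id, t1.name, t2.amount
FROM t1
LEFT JOIN t2 ON t1.id = t2.id;

-- Full join: all keys from both sides, NULLs on whichever side is missing
SELECT t1.id, t1.name, t2.amount
FROM t1
FULL OUTER JOIN t2 ON t1.id = t2.id;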
>> Okay. I'd like to know about ROW_NUMBER. Write an SQL query to remove duplicates using ROW_NUMBER.
>> Okay. Okay. This query will remove the duplicates. In this query, of course, we are deleting the records from the users table, and this is the syntax for it.
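[The query itself is not captured in the transcript; a typical version against the users table mentioned above, assuming an email column defines a duplicate and id is the primary key, might look like this (PostgreSQL syntax).]

WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM ranked WHERE rn > 1);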
>> Mhm, okay. Okay. Fine. Okay. Write an SQL query to list customers who did not place any orders.
>> Okay. So for the customers who have not placed any orders, I will consider two tables: one is the customers table and the other one is going to be the orders table. Let me just write the SQL query for it. I think this query is going to resolve our issue. I have considered the customers table as well as the orders table and used a left join to figure it out.
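[A typical form of the left-join query described, assuming customers(customer_id, name) and orders(order_id, customer_id).]

SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;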
>> Okay. Okay. What is the difference
between group by and partition by?
>> Okay. In the case of group by we are performing aggregations. Is it fine if I share my screen to explain this topic?
>> Yeah, no problem, you can use your screen whenever it is comfortable for you to explain your answers.
>> Sure, thank you.
So when we talk about the group by operation: let us say I have multiple records with different country codes, like India, then US, then again India, again US, and so on. The group by operation is going to group all the rows based on the column we are grouping by. If I'm performing group by using the country code column, it will just create groups of all rows associated with the respective country: India and all records of India are going to be in one group, US and all records for US are going to be in another group, and the same is the case for any other countries. Normally group by is a two-step operation: first we go for group by, and later we go for aggregation. Let's imagine I'd like to get the average revenue earned, the total count of records, or the sum of revenue; these kinds of use cases are possible through the group by operation. Talking about partition by, the partition by operation is different. The logic is similar, but it is not preparing collapsed groups; it is just partitioning the records, so all records of India are going to be, let's say, in a single partition, and all records of US might be in another partition. So the respective countries will be stored in respective logical partitions, and these are going to be the full rows, not the groups we get in the group by operation. Partition by is mostly useful in the case of window operations, where we can perform window functions like row number, rank and dense rank on top of these partitions. So according to me, this is the difference between group by and partition by.
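[A short illustration of the difference, assuming a hypothetical sales(country_code, revenue) table: GROUP BY collapses rows per country, while PARTITION BY keeps every row and adds a windowed value alongside it.]

-- GROUP BY: one aggregated row per country
SELECT country_code, SUM(revenue) AS total_revenue
FROM sales
GROUP BY country_code;

-- PARTITION BY: every row is kept, with a window function computed per country
SELECT country_code,
       revenue,
       ROW_NUMBER() OVER (PARTITION BY country_code ORDER BY revenue DESC) AS rn
FROM sales;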
>> Okay. The next question is: how would you find the second highest distinct salary from the employees table?
>> Sure. Sure. Let me just write an SQL query for it. Okay. So this is the query using which we can find the second highest salary.
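[The query is not captured in the transcript; two common ways to do it against an employees(salary) table are shown below.]

SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- Or with DISTINCT and OFFSET:
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;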
>> Okay. Okay. Mhm. Find the top three highest paid employees in each department.
>> Sure. Okay. So this SQL query is going to return the top three highest paid employees in each department.
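[Again, the query itself is not shown; a typical version with DENSE_RANK over an assumed employees(emp_name, department, salary) table might be:]

WITH ranked AS (
    SELECT emp_name,
           department,
           salary,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT emp_name, department, salary
FROM ranked
WHERE rnk <= 3;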
>> Okay.
Okay. What is the difference between
where and having?
>> Where and having? Both are going to be checking conditions, so they work on the boolean data type. However, the where clause runs before the aggregation. Let's say in a certain query we are working with group by and aggregation: the where clause is going to get executed before the group by clause, and the having clause is executed after the aggregation has been performed. So this is the major difference between where and having.
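[A quick illustration using the same hypothetical sales table: WHERE filters rows before grouping, HAVING filters the aggregated groups.]

SELECT country_code, SUM(revenue) AS total_revenue
FROM sales
WHERE revenue > 0                      -- row-level filter, applied before GROUP BY
GROUP BY country_code
HAVING SUM(revenue) > 100000;          -- group-level filter, applied after aggregation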
>> Mhm, okay. Okay. Fine.
Tell me the difference between delete,
truncate and drop.
>> Sure. Sure. Delete is there if I would like to perform some record-level deletes or some condition-based deletes from a certain table; I can use delete for that. If I go for truncate, in that case all data of the table is going to get truncated, or we can say deleted. In both delete as well as truncate, the table structure is not going to get eliminated, so the table entry is going to be there in the database. But if we go for a drop operation, in that case the data as well as the metadata get deleted. So this is the major difference between these three.
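[A minimal illustration with a hypothetical orders table.]

DELETE FROM orders WHERE order_date < '2023-01-01';  -- removes matching rows only
TRUNCATE TABLE orders;                                -- removes all rows, keeps the table definition
DROP TABLE orders;                                    -- removes the data and the table definition itself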
>> Okay. Okay. Cool. We'll move on to the Python section now and we'll see your skills in Python and how proficient you are.
>> So we'll start with how to find the duplicate elements in a list. For example, given a list of integers, you'll have to find the duplicate values; you can assume whatever numbers you like in the list.
>> Sure. Sure. So I have assumed these numbers, like 1, 2, 3 and so on, and I have written this Python code, which is going to remove the duplicates from this list.
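[The snippet is not captured in the transcript; a small sketch with an assumed list, showing both the duplicate values and the de-duplicated result described above.]

nums = [1, 2, 3, 2, 1, 4]          # assumed example list

seen = set()
duplicates = set()
for n in nums:
    if n in seen:
        duplicates.add(n)
    seen.add(n)

print(duplicates)        # {1, 2}
print(list(set(nums)))   # duplicates removed (order not guaranteed)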
>> Mhm. Okay. But how would you remove the duplicate values from a list while keeping the original order? How would you do that?
>> Sure. Sure. Let me write Python code for it. I'm considering the exact same list as in the previous example. If we work with this code snippet, it will also preserve the order: as we are iterating through the list elements and appending only the elements we haven't seen yet, it preserves the order of the list.
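[A sketch of the order-preserving de-duplication described, using the same assumed list.]

nums = [1, 2, 3, 2, 1, 4]

seen = set()
unique_ordered = []
for n in nums:
    if n not in seen:
        seen.add(n)
        unique_ordered.append(n)

print(unique_ordered)   # [1, 2, 3, 4]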
>> Mhm.
Okay. So let's suppose there are two lists and they have some common elements between them. How would you find these common elements?
>> Sure. Let me write a code snippet for it. In this example I have considered these two lists, A and B, and in order to get the common elements I have just used sets, which also removes the duplicates, and after that I have converted the result back to a list; it is going to give me the common elements between both the lists.
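[A sketch of the set-based approach described, with assumed lists a and b.]

a = [1, 2, 3, 4]
b = [3, 4, 5, 6]

common = list(set(a) & set(b))   # set intersection, converted back to a list
print(common)                    # [3, 4]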
>> Okay. And how would you count the frequency of elements in a list using a dictionary? We just saw your use of sets, so I'd like to know how you would use a dictionary to find the frequency of elements.
>> Sure. Let me write a code snippet for it.
>> Mhm.
>> So this code snippet is going to give me the frequency. In this case, I'm iterating through the list and adding one for each element as we see it. If there is no entry yet, it starts from zero, and after that it incrementally adds to the value whenever it sees a duplicate record, based on which we can get the frequency.
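[A sketch of the dictionary-based frequency count described, using dict.get with a default of zero.]

nums = [1, 2, 3, 2, 1, 1]

freq = {}
for n in nums:
    freq[n] = freq.get(n, 0) + 1   # start from 0 if unseen, then increment

print(freq)   # {1: 3, 2: 2, 3: 1}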
>> Okay. Okay. And how would you sort a dictionary based on values in ascending order?
>> Okay. So this is the code snippet which is going to solve the concern we are facing.
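[The snippet is not visible in the transcript; based on the explanation that follows, where the dictionary is called scores and one entry maps a key "a" to the value 80, it was presumably something like this (the other entries are assumed).]

scores = {"a": 80, "b": 65, "c": 90}   # "a": 80 is mentioned; the rest is assumed

sorted_scores = dict(sorted(scores.items(), key=lambda x: x[1]))
print(sorted_scores)   # {'b': 65, 'a': 80, 'c': 90}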
>> Can you please explain this? How have
you used this lambda function?
>> Sure. Sure. See, in this case what's happening is we have defined a dictionary which currently has three elements, so these are three key-value pairs. What we are doing is calling the items function on the scores dictionary, and from the scores dictionary I'm going to get a list of tuples containing these elements. Then we are running a lambda function, and this lambda function is going to be applied to each and every element. Here I'm saying x of one: if this is the element, then in the case of scores dot items, the 'a' is going to be at index zero and 80 is going to be at index one. So the lambda function is being passed to the sorted function, and it will sort the elements by the value which is present at index one. That is what is happening over here, and whatever we get, we are converting it to a dictionary and printing the sorted dictionary.
>> Mhm. Okay. Now that we're done with the dictionary, I'd like to know the difference between list and tuple. At the same time I want to know the difference between set and list. Do you understand?
>> Yeah. Yeah. Sure. So let me explain the difference between all four data structures which you have mentioned. First of all, all of these data structures are collections, collections of elements. A list is a collection of elements which might be of the same or different data types, and it is mutable, which means we can add, delete and modify elements in the list. A tuple is very similar to a list, but it is immutable, so we can't make any modifications to a tuple. Talking about sets, in the case of sets we don't have the concept of indexing and slicing; a set is used to perform set operations like union, intersection and set difference, and these operations are applicable to elements which belong to the set data type. When it comes to the dictionary, a dictionary is also mutable, where we can add and remove data, but it handles key-value pair data, so we have keys and values in the dictionary, and we can also have nested keys and values. So that is the main difference between all four data structures which you have mentioned.
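[A tiny sketch of the mutability and structure differences just described, with assumed example values.]

my_list = [1, 2, 3]          # mutable, ordered, allows duplicates
my_list.append(4)

my_tuple = (1, 2, 3)         # immutable: my_tuple.append(4) would raise AttributeError

my_set = {1, 2, 3}           # no indexing/slicing; supports union, intersection, difference
print(my_set | {3, 4}, my_set & {3, 4})

my_dict = {"a": 1, "b": 2}   # mutable key-value pairs
my_dict["c"] = 3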
>> Okay. But then how will you differentiate between a dictionary and a set?
>> Sure. A dictionary and a set both use curly braces to represent themselves, but in a set you get individual elements and in a dictionary we have key-value pairs. They differ based on the kind of data they hold.
>> Okay. Okay. Okay. So tell me, have you worked with data lakes? Do you have experience with data lakes?
>> Yes, yes, I have worked with data lakes, and in our project we have configured the data lake using S3 and the Glue catalog.
>> Okay. Yeah, it's the same thing I was actually going to ask you: what kind of object storage, like S3 or ADLS, and why is it preferred for data lakes?
>> Sure. Because when it comes to data lakes, we are not looking for full-time running servers which are going to handle our data like a proper data warehouse. In the case of data lakes, we can put data onto services like S3 or ADLS; these are object stores. And in the case of, let's say, S3 on AWS, the storage cost is not going to be too high if you compare it with a full-time data warehouse like Redshift. In the case of Redshift we have to maintain the servers; in the case of S3 we just have to pay for the storage, and while configuring the data lake all we need is the metadata, which can be kept in AWS Glue, so that we can get a data-warehouse-like environment on the data which is currently present on S3. So we can directly perform queries on the data which is on S3 if Glue is involved, in the case of AWS, and in the case of, let's say, Databricks we have the Unity Catalog for a similar feature.
>> Okay. So would you say what you just described is the purpose of the Glue catalog, or let's say a metastore, or would you like to define it further or explain?
>> Sure. Sure. The Glue catalog is a place where we can define the metadata, and based on the metadata we can perform queries. If I'm defining a table, then the table name, the table properties, the columns and the data types of the columns all count as metadata, and the actual data resides on S3. So whenever I'm connecting for the sake of queries, I can connect through Athena, a Python script, or even my application to the Glue catalog and just go for the SQL queries. So the purpose of the Glue catalog works both ways, actually: to access the data in an SQL fashion, and to maintain the metadata and the metadata versions. That is also one of the purposes of AWS Glue, that is, the AWS Glue catalog.
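[A rough illustration of the idea: an Athena/Hive-style external table whose definition (metadata) lives in the Glue catalog while the data stays on S3. The table name, columns and location are assumptions.]

CREATE EXTERNAL TABLE orders (
    order_id     string,
    customer_id  string,
    order_amount double
)
STORED AS PARQUET
LOCATION 's3://my-bucket/silver/orders/';

-- Queried directly through Athena using the Glue catalog metadata
SELECT customer_id, SUM(order_amount) AS total
FROM orders
GROUP BY customer_id;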
>> Okay. So, okay, fine. So tell me, what is partitioning and why is it needed?
>> Partitioning is a feature which is going to reduce the amount of data we are trying to process. Is it fine if I share the screen? I'll explain this part.
>> Go ahead. Please share your screen.
>> Let's consider we are not using partitioning and we have the records of different countries, like India, US, China, and then again some records from India, some from US, some from China, and then I run a query where I would like to find the maximum revenue for just India. In that case I'll have to process all of the data from start to end; if this is a data set of 10 terabytes, I'll have to go and process all 10 terabytes of data. But if we define partitions, the data is going to be stored in the respective folders: all data for India is going to be in the India partition, or you can call it a folder, all data for US is going to be in the US partition, and the data for China is going to be in the China partition. So if in the where clause I have mentioned India, the query will just get into the India partition and process that data. It will save us the trouble of processing the data for US as well as China. This is one of the benefits of partitioning, and it is a very powerful feature if used in an accurate way.
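[A sketch of partition pruning with a hypothetical partitioned table; only the India partition is scanned by the second query.]

CREATE EXTERNAL TABLE sales (
    order_id string,
    revenue  double
)
PARTITIONED BY (country string)
STORED AS PARQUET
LOCATION 's3://my-bucket/silver/sales/';

-- (partitions registered in the catalog, e.g. via MSCK REPAIR TABLE sales)

-- The WHERE clause on the partition column lets the engine read only .../country=India/
SELECT MAX(revenue)
FROM sales
WHERE country = 'India';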
>> Okay. Okay. Good. So moving on, I'd like to know what Hudi, Delta and Iceberg are, and when do you use upserts?
>> Sure. Sure. Hudi, Delta, or we can say Iceberg: Hudi and Iceberg have been added so that we can go for OLTP-like row-level inserts, updates and deletes on the data which is present on an object store like S3, and we can also go for the upsert operation, where upsert simply means update plus insert. If we don't have Hudi or Iceberg in the picture, or to be precise if we don't have Hudi in the picture, the upsert operation is going to get complicated because we'll have to manually write the logic for the upsert. Hudi also maintains the metadata, so the upsert operation becomes easier if we are using Hudi.
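[One common way an upsert is expressed on such table formats, for example with Delta Lake or Iceberg in Spark SQL, is a MERGE statement; the table and column names here are assumptions.]

MERGE INTO orders t
USING orders_updates s
ON t.order_id = s.order_id
WHEN MATCHED THEN
    UPDATE SET t.order_amount = s.order_amount, t.status = s.status
WHEN NOT MATCHED THEN
    INSERT (order_id, order_amount, status)
    VALUES (s.order_id, s.order_amount, s.status);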
>> Okay. And how do you implement cost optimization, or choose between Redshift, Snowflake or BigQuery?
>> The thing is, I just have two years of experience, so I'm not into cost optimization, and I'm not into the selection of the tools either. All I do is try to implement what has been assigned to me. But I do have a brief idea of Redshift, Snowflake and BigQuery. If we are going with the AWS ecosystem and we want a full-time data warehouse, then we can go for Redshift. Let's say I want a data warehouse with all the features and integrations, and I'd like to have it open to other platforms and features as well; then we can go for Snowflake. And if I'm going with Google's tech stack, or the Google Cloud stack, then I can go for BigQuery. I might be wrong on this because I have just a high-level idea of these things.
>> Okay. Okay. Are you familiar with Airflow? You're comfortable?
>> Yes, I'm quite comfortable.
>> Okay. So tell me, what is a DAG in Airflow?
>> A DAG is a directed acyclic graph. We have used Airflow in our project. There are multiple operations which we perform in our project: we are ingesting data, then we are performing data cleaning and data quality checks, later we are writing data to the data warehouse, and then we are validating that data to check whether everything is right or not. So there is a sequence of operations which we have to perform repeatedly, and we can automate this with the help of Airflow. We can define each and every step in Airflow as a task and choose the sequence of them, and the sequence of these tasks is nothing but the DAG. It is written in the Python language and we can submit it, and Airflow can execute these DAGs. So Airflow is not the one actually performing any heavy lifting; Airflow is just acting as an orchestrator. It is just triggering the activities which we have to perform.
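[A minimal sketch of such a DAG, with hypothetical task names mirroring the steps mentioned above.]

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():      print("ingesting data")
def clean():       print("cleaning and data quality checks")
def load():        print("writing to the warehouse")
def validate():    print("validating the load")

with DAG(
    dag_id="batch_pipeline",             # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_ingest   = PythonOperator(task_id="ingest",   python_callable=ingest)
    t_clean    = PythonOperator(task_id="clean",    python_callable=clean)
    t_load     = PythonOperator(task_id="load",     python_callable=load)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)

    t_ingest >> t_clean >> t_load >> t_validate   # the task sequence is the DAG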
>> Okay. So then what would be the difference between a task and a task instance?
>> Sure. Let's imagine I have defined a task which says print hello. The definition of the task is one thing, and let's say that task gets executed once; I will say that is one task instance. If that task executes twice, I will call it two task instances. So the definition of the task is called the task, and the number of times the task gets executed, along with the time at which it gets executed, are what we call task instances.
>> Okay. And what happens when a task gets stuck in the queued state?
>> In the case of Airflow, if a task is stuck in the queued state there might be multiple reasons. The very first reason could be that the scheduler is down, which is why the tasks are not being launched. Another thing is the availability of workers: if the workers are not available, then also the task might stay in the queued state. Another thing is if we have configured a pool and all slots of that pool are blocked, so there is no free slot for our task to get launched; in that case as well the task might get stuck in the queued state. And then there is the kind of executor we are using: there are different types of executors which we can use with Airflow, like the Celery executor, the local executor and the sequential executor, and if there are too many tasks for the executor to handle, it will be quite difficult to launch our task right away when there are many waiting tasks. So the issue might lie in the executor, the scheduler, the worker configuration or the pool configuration. I will check these four things, and based on that I will get to the appropriate resolution.
>> Mhm, okay. But how will you use retries then? How will you use retries in Airflow?
>> Sure. Sure. In Airflow, for a certain task, or for the tasks, we can define the retries and the retry duration: if a certain task fails, should we retry, or should we simply let the whole thing fail? So there are two things to configure for retries: the retry count and the retry duration, that is, how many times we have to retry a certain task, and after how long. If a task fails, we don't go for an immediate retry, because even with an immediate retry we might not have dealt with the actual issue. So we can define the retry duration and the retry count for a task in Airflow at the task level, when we define it with the help of some operator.
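[A sketch of how retries are typically set on a task at the operator level; the numbers and task name are assumptions.]

from datetime import timedelta
from airflow.operators.python import PythonOperator

# defined inside a DAG block, as in the earlier sketch
validate = PythonOperator(
    task_id="validate_load",
    python_callable=lambda: print("validating"),
    retries=3,                            # retry count
    retry_delay=timedelta(minutes=5),     # retry duration: wait before retrying
)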
>> Okay. Tell me, have you had any experience designing a CI/CD pipeline for deploying an Airflow DAG?
>> Actually, I'm quite interested in CI/CD and I'm trying to pick up many things, but currently it doesn't fall under my responsibility; it is taken care of by the senior data engineers.
>> Mhm.
>> Okay. Okay. Fine. No problem. Okay. Let's keep this interview moving and go to the next question. So let's say one source, for example an FTP source, is delayed but the RDBMS is on time. Do you continue your pipeline, or how would you react?
>> Sure. If I have multiple sources for my pipeline and one of the sources is getting delayed, in that case I will first check whether both sources are required in the later stages of the pipeline. If the data coming from both of the sources is absolutely required for performing the operations, which means the sources are dependent on each other when we move into the silver or the gold layer, then I will have to wait for the arrival of data from the other source. But if these two sources are not dependent on each other, I can let the pipeline proceed further.
>> Mhm. Mhm. Okay. And describe to me a situation where your pipeline failed and how you fixed it. What kind of steps or actions did you take to fix your pipeline?
>> Yes. Yes. Our pipeline has crashed several times, and there were a few reasons behind it. One of the reasons was an out-of-memory error because we had misconfigured the resources. We diagnosed the issue, a root cause analysis was done, we changed the configuration of our Spark jobs, and eventually that error got resolved. Another instance where the pipeline broke was because of the schema we received: we were not notified about the change in schema, the newly updated schema was not going well with our current pipeline, and that is why the pipeline broke. We diagnosed that issue, marked those files as corrupt files, and took out what we needed for our pipeline from that data; later we evolved our pipeline to consider the schema changes which had happened. And the third instance when the pipeline broke was because of corrupt data which we received from the FTP source: we used to decompress the data, but the data was not in a proper format, so there was an issue when the pipeline initially executed. That issue was again diagnosed and also resolved. But yes, of course, the pipeline has broken several times. I might not be able to tell you all the instances, but these two to three instances were quite major and stuck in my mind.
>> Okay. Okay. Fine. Thank you, Sep. So your interview was fine. We'll proceed further with your processing and get back to you shortly. In the meantime, if you'd like to ask any questions, please go ahead.
>> Yes, I have just one question: if I ever get an opportunity to work with you, what kind of tech stack and project architecture might I be working on?
>> Okay. So the tech stack that we are currently working on is the Azure data engineering tech stack, and the architecture which we use in our projects is the lakehouse architecture. If you get selected, you'll be working on it.
>> Yeah. Yeah. I'm looking forward to this opportunity.
>> Certainly. Okay. Fine. Thank you, Sep.