This content details the process of designing a scalable audio transcription system, covering functional and non-functional requirements, system architecture, and key considerations like auto-scaling and monitoring, as presented in a senior software engineer interview.
Hey there, this is Akshin Milan, and welcome back to a new video. In this video I'll walk you through a problem statement that was given to me during an interview for a senior software engineer position. I can't share the exact company name, but it was a London-based company and a remote opportunity. As soon as I joined the meeting, there were two interviewers; they asked me to share my screen and posted this problem statement in the meeting chat: "Design a system that allows users to upload audio files and receive transcriptions. The system should handle 100,000 audio files per day with an average file size of 20 MB and duration of 10 minutes. Users should receive a notification when the transcription is complete." So it was clear that this was not just a back-end concepts interview. My expertise for that role was in Python, and I realized this was a system design interview and I had to design the system, because the company was an AI data processing company. This is a good problem statement: I have to design a speech-to-text transcription system, meaning users will upload audio files and should get transcriptions of those files back. For your information, this was round three of the whole interview process for this position.
Now that I had the statement, I realized that in this system I might be working with load balancers for handling load, because the number of audio files is large, and I was pretty sure that during the interview they would ask me to handle spikes, where this 100,000 might become 300,000 at spike times. I also knew I would have to do some back-of-the-envelope calculations, like what cost I would be bearing for the whole system per month, or how much total storage I would be handling in one year or in five years. So I was clear I had to handle all of this, but I did not start putting questions in front of the interviewer right away. I went into a flow, and I'll tell you how I navigated through the interview. The first thing I told him was a clarification question: "Okay, I've read the statement. We will be building a speech-to-text transcription system, and I see that on a normal day we are going to handle 100,000 audio files, one file is 20 MB, and the duration is 10 minutes. To go one step further, these are the functional requirements I can think of, and I have some follow-up questions I would like you to clarify for me." That was my opening. I then used this same platform, the Eraser app, and just as I'm writing now, I was writing during that interview as well.
So what I did was write a heading, "Functional requirements." First question: which languages are we going to support? Is it just English, or English, Spanish, and German? This question is important because you're going to use an AI model, either an LLM or your own fine-tuned model. We won't go into that depth here, but the language matters because if you're using an open-source model, you have to be sure it supports those languages. He told me that English and Spanish are the two languages we are going to support. As soon as he told me that, I mentioned some models to him. You should have a bird's-eye view of a lot of technologies; I had already worked with LLMs, so I was able to give him a suggestion: if English and Spanish are the languages, we can go ahead with the Whisper model.
At this point I wanted to know whether he was okay with using OpenAI's model or whether he would ask me to fine-tune one. He was okay, because this whole interview was focused on the back end and system design rather than on fine-tuning and model training, so we went ahead with Whisper. Once the languages were fixed, I gave him the functional requirements. First: users should be able to upload audio files. This requirement tells us there must be an API endpoint and a UI where users can actually upload their audio files. The next functional requirement: the system should be able to transcribe the audio to text. And the next one: users should get notifications, which means if my audio transcription fails I should get an email or a push notification, and if it succeeds I should get an email saying the transcription is ready.
Perfect. Once these requirements were clear, I went ahead with the non-functional requirements. First, scale: 100k files per day, with spikes up to 300k. Then latency: how much time should a transcription take? I went ahead with a number and said 5 minutes should be sufficient for transcribing any audio file, and he was okay with that. High availability: my system should be 99.99% available. High availability as a requirement means that if my load increases I should have load balancers in place; I did not say this at that point, I just kept these pointers ready in my mind for when I would design the system. Then I asked him about the budget: what monthly budget are you thinking of to support this whole system? He told me $50,000 per month. Strictly speaking this is not a non-functional requirement, but it feeds into the budget estimation.
Some interviews are more focused on budgeting; if you're going for solutions architect or principal architect kinds of roles, you also have to worry about budgets. One more non-functional requirement was high accuracy: the transcriptions should not be buggy. 95% accuracy is good, which means that if there are 100 words in the generated corpus, at least 95 should be correct. At this point the interviewer nodded and I was sure these requirements were sufficient. Next I went ahead with some calculations so that we would know how much storage and bandwidth we need: the back-of-the-envelope estimations.
So: 100k files at 20 MB per file. You have to convert these numbers up: 100,000 times 20 MB is 2,000,000 MB, which is 2,000 GB, or about 2 TB of data per day. Multiply by 30 and that is roughly 60 TB per month. Now, you might not know the cost of S3 storage, but you can ask the interviewer or estimate it at around $0.02 per GB per month; multiply that by the monthly storage, and in my estimate it came out to around $4,100 per month for S3 bucket storage. When you have given a lot of system design interviews you remember these values automatically, like how to convert GB to TB and the rough S3 costs; otherwise you can simply ask the interviewer. Then comes the processing cost: if you go ahead with OpenAI Whisper, Google Speech-to-Text, or Amazon Transcribe, the approximate cost is $0.02 per minute; this figure was also given to me by my interviewer. We have 100k files and the question says 10 minutes per file, which comes out to a million minutes per day, and then you multiply that by the cost per minute of whichever transcription service or model you choose.
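If you want to sanity-check these numbers quickly, here is a minimal sketch of the same back-of-the-envelope math in Python. The unit prices are deliberately left as parameters because they are assumptions; plug in whatever figures your interviewer gives you.

```python
# Back-of-the-envelope estimates using the figures from the problem statement.
# Unit prices are placeholders; use the numbers your interviewer gives you.

FILES_PER_DAY = 100_000
FILE_SIZE_MB = 20
FILE_DURATION_MIN = 10

storage_gb_per_day = FILES_PER_DAY * FILE_SIZE_MB / 1_000       # 2,000 GB, about 2 TB per day
storage_gb_per_month = storage_gb_per_day * 30                  # about 60,000 GB = 60 TB per month
transcribe_minutes_per_day = FILES_PER_DAY * FILE_DURATION_MIN  # 1,000,000 minutes per day

def monthly_storage_cost(price_per_gb_month: float) -> float:
    """Monthly bill for one month of new uploads at the given price per GB-month."""
    return storage_gb_per_month * price_per_gb_month

def monthly_transcription_cost(price_per_minute: float) -> float:
    """Monthly transcription bill at the given price per audio minute."""
    return transcribe_minutes_per_day * 30 * price_per_minute

print(f"{storage_gb_per_month:,.0f} GB of new audio per month")
print(f"{transcribe_minutes_per_day:,} minutes of audio to transcribe per day")
```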
As soon as I reached here, my interviewer asked me to stop and go ahead with the actual system design, because we only had one hour. You might not be stopped, though, so I'm going to put all the other costs, like network cost and worker cost, in the description so you can do the same calculation; at the end you just add up all these costs, if your interviewer asks you to, and check whether the total fits in the budget. Great. Now let's go ahead with the very first diagram I drew on screen, a very high-level architecture diagram, and I'll give the same explanation I was giving to my interviewer. First of all, for uploading the audio file we need to provide a client application to the user, a mobile app or a web app, where the user can upload the audio files. So I drew a rectangle, called it the client app, and this application is only responsible for taking the audio file and uploading it. Next, we need an API server. The first thing I told him: the user could upload the file directly to a server, and the server could be responsible for going to Whisper or a similar system that transcribes and gives the transcription back. But that system is synchronous: the user has to keep the application open for those 5 or 10 minutes while the transcription is happening, and if the user closes the application or the tab in between, the connection is lost, the transcription is never received, and resources are wasted. That is a synchronous system. So I asked: do we want a synchronous or an asynchronous system? He said we need an async system. In that case, the user does not upload the entire file to the server. Instead, I'm going to have a server over here, which is my API server.
You can also have a load balancer, which I added at the end when he asked me to handle scale, but for now let's say we only have this API server, and it is connected to S3. In S3 we have a concept called pre-signed URLs. The client app just calls a very simple API on the API server: "Hey, a user with this user ID is trying to upload a file, we need a pre-signed URL." The API server goes to S3 and asks it to create a pre-signed URL, which it then hands back to the client app, and using that URL the client app can upload the file directly to S3. The API server won't be busy during that time. So: the API server gets a pre-signed URL, valid for, let's say, one hour or one day, gives it to the client app, and the client app uploads the audio file directly to the S3 bucket. With this approach the API server is not tied up while the audio file is being uploaded. Here we're talking about one audio file, but we have 100,000 audio files; just imagine 100,000 files being uploaded through the API server at the same time and the load that server would be under. That's why this approach is good, and it's a very simple one.
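As a minimal sketch of that flow, here is how the API server might issue a pre-signed upload URL with boto3. The bucket name, key layout, and expiry are assumptions for illustration, not fixed parts of the design.

```python
import uuid
import boto3

s3 = boto3.client("s3")
AUDIO_BUCKET = "audio-uploads"  # placeholder bucket name

def create_upload_url(user_id: str) -> dict:
    """Issue a pre-signed PUT URL so the client uploads straight to S3."""
    job_id = str(uuid.uuid4())
    key = f"{user_id}/{job_id}.mp3"          # user_id / file_id layout described above
    url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": AUDIO_BUCKET, "Key": key},
        ExpiresIn=3600,                      # URL valid for one hour
    )
    # The API server would also record a pending job row in the database here.
    return {"job_id": job_id, "upload_url": url}
```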
Okay, perfect. Now, as soon as the upload is done, the S3 bucket receives the final object and the audio file is sitting in S3. We are actually going to have two S3 buckets, so I'll create another S3 bucket here which we will use later; don't worry about that yet. The first S3 bucket stores all the audio files, with a key structure like user ID, then file ID, for the actual MP4/MP3/WAV file. The second S3 bucket will be responsible for storing the final transcripts; let's not worry about that for now. As soon as the S3 bucket receives the final object, on the client side we can show a message: "Hey, your file is uploaded. Wait 5 or 10 minutes and you'll get an email when your audio transcription is ready." Perfect.
Now, what the S3 bucket does is create an event; we are going to use an event-driven architecture. But one thing we missed: when the client app asked the API server for a pre-signed URL, the API server should also create an entry in our main database. So we are going to have one more component here, RDS, which is the main SQL database of our system. When the client app asks the API server for a pre-signed URL, the API server returns the pre-signed URL but also creates an entry in RDS: this is the user ID, this is the timestamp at which the user asked for the pre-signed URL (that is, started the process), this is the job ID, and this is the status of the job. At this point the status will be pending (it could also be "uploading", but let's go with pending, the simple one). So at this point the status of our job is pending.
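To make that concrete, here is a minimal sketch of the jobs table and the pending entry. I'm using SQLite only so the snippet is self-contained; in the real design this would live in RDS (for example Postgres), and the table and column names are my own illustration.

```python
import sqlite3
import time

# Illustrative schema only; the real system would use RDS rather than a local file.
db = sqlite3.connect("jobs.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS jobs (
        job_id         TEXT PRIMARY KEY,
        user_id        TEXT NOT NULL,
        created_at     INTEGER NOT NULL,   -- when the pre-signed URL was issued
        status         TEXT NOT NULL,      -- pending | processing | success | failed
        transcript_key TEXT                -- S3 key of the finished transcript, filled in later
    )
""")

def create_pending_job(job_id: str, user_id: str) -> None:
    """Record the job as pending at the moment the pre-signed URL is handed out."""
    db.execute(
        "INSERT INTO jobs (job_id, user_id, created_at, status) VALUES (?, ?, ?, 'pending')",
        (job_id, user_id, int(time.time())),
    )
    db.commit()
```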
Now let's come back to the process. At this point the S3 bucket has received the file completely, and it has to notify someone that the file is here and needs to be processed. One option is that the S3 bucket notifies the API server itself: the file is ready, go and process it. But then the API server would have to download the file, and again the API server would be busy and take a lot of load, so we do not notify the API server. Instead we use a queue, and this is the place where the interviewer judges which system components you choose for handling a scale like 100,000 files. So we use a queue: search for SQS, Amazon's Simple Queue Service, and we have a queue over here. The S3 bucket sends an event to this queue, and the event contains the information about that job, including the job ID, so that we can later update the status of the job. The job ID is appended to SQS. This queue might have, say, 100 jobs in it; that number is called the queue depth, and the queue depth will be important for scaling our system up and down. We'll come to that later, but for now assume the S3 bucket has notified the queue and we have an event, a job, inside SQS.
Now, who is going to process and complete these jobs? We will have a fleet of EC2 instances called workers. Notice that our main API server is not responsible for downloading the file, processing it, or sending notifications; the API server is just a central unit that orchestrates a few things. Instead we have a fleet of EC2 instances, three of them or, say, a hundred, depending on scaling up and down; we'll come to that, but let's go with three workers for now. So we have worker one, two, and three, and these EC2 workers continuously pull jobs from the SQS queue.
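A worker's main loop might look roughly like this. It is a sketch assuming boto3 and a hypothetical process_job helper that downloads the audio, runs the transcription, and uploads the result; the queue URL and timeouts are placeholders.

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-2.amazonaws.com/123456789012/transcription-jobs"  # placeholder

def worker_loop():
    """Each EC2 worker runs a loop like this, pulling jobs off the queue."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,        # long polling instead of busy-waiting
            VisibilityTimeout=900,     # hide the message while we transcribe (~15 min)
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])  # contains the job_id and the audio's S3 key
            try:
                process_job(job)           # hypothetical: download audio, transcribe, upload transcript
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            except Exception:
                # On failure we simply do not delete the message: it becomes visible
                # again after the visibility timeout and another worker can retry it.
                pass
```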
Okay, now the question is how to handle failures. Let's say this EC2 instance picked up a job with job ID A, tried to process the file, and the job failed somehow. It is not going to discard that job in one go; it puts the job back on the queue, and maybe next time some other worker picks it up. But there's a concept called exponential backoff: say we tried to process a file and it failed; the next attempt happens after 5 minutes, the one after that after 25 minutes, the one after that after 125 minutes, so there is an exponentially growing gap between retries. We also often use a separate queue for handling all the failure cases, called the dead letter queue, so this is our DLQ. It's very important to mention this in your system design interview, to show that you have thought of it and know the concept. If the job passes, we proceed; if it keeps failing, the job goes to the DLQ, and in between we apply exponential backoff. Perfect. So let's connect this worker fleet to the DLQ as well.
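Here is a tiny sketch of that retry logic with the 5/25/125-minute backoff and a three-retry cap. The requeue and send_to_dlq callables are placeholders for pushing a delayed message back to SQS or over to the dead letter queue.

```python
# Exponential backoff as described above: retry after 5, 25, then 125 minutes.
BASE_DELAY_MIN = 5
MAX_RETRIES = 3

def retry_delay_minutes(attempt: int) -> int:
    """Delay before retry number `attempt` (1-based): 5, 25, 125 minutes."""
    return BASE_DELAY_MIN ** attempt

def handle_failure(job: dict, requeue, send_to_dlq) -> None:
    """`requeue` and `send_to_dlq` are placeholder callables supplied by the worker."""
    attempt = job.get("attempt", 0) + 1
    if attempt > MAX_RETRIES:
        send_to_dlq(job)  # give up: mark the job failed and notify the user
    else:
        job["attempt"] = attempt
        requeue(job, delay_minutes=retry_delay_minutes(attempt))
```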
So now we have the concept of the queue; instead of SQS we could also go with Kafka. We have a fleet of workers, we have a DLQ, and we know about exponential backoff. These are concepts that are very important to mention in this kind of system design. Now, when do we update the job status? If the job fails once, okay, no problem, we retry it; the status stays as it is. But one thing I missed: as soon as a worker picks up a job, it also updates the status to processing. So now the status of the job is processing, not pending or uploading. If the job fails three times, say we have a maximum retry count of three, then the job status becomes failed and we send the user an email saying the file is corrupt or we cannot process it at this time. If it succeeds, we set the status to success, and then the concept of notifications comes into play: we have to notify some unit of the system so that that unit notifies the user via email. And before we close this whole loop and go ahead with notifications, you need to know that these workers, which are responsible for processing the file and producing a transcription, will have an LLM or a model already loaded. This is a question that came to me: how are you going to ensure that these workers do not take a lot of time loading the models? The answer is that you preload the models, so that every time a new job comes in, the worker already has the model loaded in memory. All of these workers have the model preloaded, and as soon as a new job comes in they use that preloaded model for the transcription.
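For example, with the open-source Whisper package the worker would load the model once at process startup and reuse it for every job. This is a minimal sketch, assuming the `whisper` library and a placeholder model size.

```python
import whisper

# Load the model once when the worker process starts, not once per job.
# "base" is a placeholder; a larger checkpoint may be needed for good accuracy.
MODEL = whisper.load_model("base")

def transcribe_file(local_audio_path: str) -> str:
    """Reuses the preloaded model, so each job only pays for inference."""
    result = MODEL.transcribe(local_audio_path)
    return result["text"]
```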
Now, once the transcription is done, a few things happen. First, as we already saw, the status is moved from processing to success. Second, the job has to be deleted from SQS so that it is not processed again. Then there are two more things. One: we have another S3 bucket, so I'll copy this one, and this S3 bucket is for storing the transcription files, which are plain text; the first S3 bucket was for storing the WAV and MP4 files, and this one is for storing the text files. The worker uploads the text file to this bucket, and similarly all the workers upload their transcripts there. Two: we have another queue, so there are now three queues in the system, and this queue is for notifications.
Okay, so what have we done? We have updated the job status in the main database, and we have uploaded the transcription to the transcripts S3 bucket; maybe that S3 location is also saved on the main job record, so we know where the final transcription lives. The final thing is to put a message onto this notifications queue. The message contains the job ID and maybe the status in the metadata: this is a success job, or this is a failed job. Similarly, if the job fails even after the three retries, it also goes to the notifications queue with a failed status in the metadata. Now this queue is also attached to a worker, a worker dedicated to handling notifications. So we have another EC2 instance, maybe just one, that is continuously pulling this notifications queue, and this notification worker is just there for sending emails or push notifications to our client. So we send a notification back to the client: "Hey, the file you submitted at this time is now ready for downloading."
The email includes the link, and that link is a download URL from S3 which can be used for downloading within a particular time frame. The download URL has a time to live, an expiry: if the user downloads the file within that window, perfect; otherwise the link stops working. That's why sometimes you get an email and, after a week, you cannot download the file anymore, because the URL has expired. And yes, you can have a worker or a system that refreshes that URL as well.
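As a rough sketch of what the notification worker does, here it generates a pre-signed GET URL with an expiry and emails it via SES. The bucket name, sender address, and one-week TTL are assumptions for illustration.

```python
import boto3

s3 = boto3.client("s3")
ses = boto3.client("ses")
TRANSCRIPT_BUCKET = "transcripts"  # placeholder bucket name

def notify_success(user_email: str, transcript_key: str) -> None:
    """Send the 'your transcription is ready' email with an expiring download link."""
    download_url = s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": TRANSCRIPT_BUCKET, "Key": transcript_key},
        ExpiresIn=7 * 24 * 3600,           # link valid for one week, then it expires
    )
    ses.send_email(
        Source="no-reply@example.com",     # placeholder sender address
        Destination={"ToAddresses": [user_email]},
        Message={
            "Subject": {"Data": "Your transcription is ready"},
            "Body": {"Text": {"Data": f"Download it here: {download_url}"}},
        },
    )
```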
So far, so good; we have closed the loop. Just to revise before we come to the follow-up questions that came to me after I presented this high-level architecture diagram:
The client app wants to upload a file and get the transcription. It goes to the API server and asks for a pre-signed URL for the S3 bucket so that it can upload the file directly to S3. The API server goes to S3, gets a pre-signed URL, and gives it to the client; along with that, the API server creates an entry in the SQL database with the user ID, timestamp, job ID, and a status of pending. Once the file has been fully uploaded, the S3 bucket sends an event to SQS with the job ID. The worker fleet is polling SQS and picking up jobs. If a job fails, it goes through retries and on to the DLQ, the dead letter queue; if it still fails after the maximum of three retries, we send the job to the notifications queue, the notification worker picks it up, sees that it is a failed job, and emails the client that their file cannot be processed. If the worker successfully completes the transcription with the preloaded model, it saves the transcript in the transcripts S3 bucket and updates the status to success (for failures the status is updated too). Then it sends a message to the notifications queue, and the notification worker picks it up and sends a success email to the client app. So that is how the system looks. Now, the first question that came to me: why did you use a queue, and what advantages does the queue give this whole design? Okay. And then the next question comes from the interviewer.
How are you going to handle autoscaling? How is your system going to handle autoscaling? I think this was the question that got me selected, because this is something you need to know when your design has queues and workers, and it is a very basic concept. Autoscaling should happen based on some number: if that number crosses an upper limit, the system should automatically scale up; if it drops below a lower limit, the system should automatically scale down. What is that number? It comes from the queue depth. Queue depth is the number of jobs currently sitting in your queue. And there is one more metric: queue depth per worker, which is the queue depth divided by the number of workers. If this metric crosses the upper limit, we need to scale up, meaning increase the number of workers; if it drops below the lower limit, we need to decrease the number of active workers, meaning scale the system down. This is something you have to mention. Now, an example. Let's say the upper limit is 500: if my queue depth per worker goes above 500, we need to scale up. Say you have 10,000 messages in the queue and just 10 workers in your fleet; the queue depth per worker is 1,000, which means 1,000 jobs would have to be handled by a single worker, which is too much, so you scale up because it crosses 500. Usually 500 is a good threshold for jobs like this, assuming something like 30 seconds of processing time per job, and the interviewer agreed with me. So when the queue depth per worker is 1,000, you scale up, and the strategy is: increase the number of workers by 20% or by 10, whichever is greater (one number is a percentage, the other is absolute). With 10 workers, 20% of 10 is 2; you compare 2 and 10, take the greater, and add 10 workers, so your total is now 20. Say the spike continues and the queue depth per worker goes above 500 again: 20% of 20 is 4, versus 10, so you add 10 and have 30 workers. Again: 20% of 30 is 6, versus 10, so you add 10 and have 40. Again: 20% of 40 is 8, versus 10, so you add 10 and have 50. Now 20% of 50 is 10, the same as the absolute 10, so you add 10 and have 60 workers. At 60 workers, 20% of 60 is 12, which is greater than 10, so you do not go to 70, you add 12 and have 72 workers. So when the load keeps increasing in such quantities, the percentage starts winning over the absolute number. That is how scaling up works. Scaling down is similar: say the lower limit is 200; if your queue depth per worker goes below 200, you scale down by removing 10 workers or 20%, whichever is greater. If you have 50 workers, 20% of 50 is 10, the same as 10, so you go from 50 to 40 workers. Again: 20% of 40 is 8, versus 10, so you remove 10 and are down to 30. And that is how the number of workers comes back down.
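The whole policy fits in a few lines. This is a sketch of the scale-up and scale-down decision exactly as walked through above, with the limits and step sizes from the example; in practice these would be tunable parameters.

```python
# Queue-depth-per-worker autoscaling policy as walked through above.
UPPER_LIMIT = 500   # scale up above this many queued jobs per worker
LOWER_LIMIT = 200   # scale down below this
STEP_ABS = 10       # absolute step
STEP_PCT = 0.20     # percentage step; use whichever of the two is greater

def desired_worker_count(queue_depth: int, workers: int) -> int:
    """Return the new fleet size given the current queue depth and worker count."""
    per_worker = queue_depth / max(workers, 1)
    step = max(STEP_ABS, round(STEP_PCT * workers))
    if per_worker > UPPER_LIMIT:
        return workers + step
    if per_worker < LOWER_LIMIT:
        return max(workers - step, 1)   # never scale the fleet to zero
    return workers

# Example from the walkthrough: 10,000 queued jobs with 10 workers is 1,000 per worker,
# so successive scale-ups take the fleet 10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 72.
```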
Perfect. This is the kind of example you should give the interviewer so he understands that you know this strategy when you have queues and workers in your system and how the system has to scale up and down. With this you can also say that autoscaling is one of the reasons you went ahead with a queue. Another reason is that the queue acts as a buffer. If a spike is coming and the load is increasing, and there were no queue, only an API server, that API server would crash, because so much load would land on a single server. But the queue acts as a buffer: jobs keep getting appended to it, and while the spike builds from 100k to 200k to 300k, the queue accommodates more and more jobs and the system does not crash. Without the queue, with just an API server, the system would go down. Next is alerting. We
did not spend a lot of time on alerting, but my interviewer did ask me which metrics I would track; monitoring and alerting come under one umbrella. I keep these metrics ready for system designs like this: business metrics, system metrics, and infra metrics. The business metrics are the ones the business team will track and analyze. Jobs created per minute: is the traffic growing? Jobs completed per minute: is the throughput keeping up? Jobs failed per minute: are we having quality issues? P50, P95, and P99 latency. Transcription confidence scores: when you upload the transcription .txt to the transcripts bucket, you also upload a confidence-scores file so the machine learning team can take that scoring data and improve the system if they are training their own models; a confidence score tells you the actual word, the predicted word, the delta, and the confidence, and that data also needs to be stored and analyzed. And cost per transcription: how much you are paying for a single transcription, which you have to optimize. System metrics: SQS queue depth, so that autoscaling strategies can be built on it; worker count, how many workers are active in a day on average and whether the autoscaling is even working; worker CPU utilization; database query latency; database connection count; API error rate; API latency. Then infra metrics: EC2 instance health and how long instances were down, S3 operation latency, network throughput, and disk utilization. You need to track all of these, or at least keep them in mind and mention them when you are in such interviews.
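If you want to show how those numbers actually get collected, a minimal sketch is to push custom metrics to CloudWatch from the workers or a small periodic job; the namespace and metric names here are my own illustration, and alarms and dashboards would sit on top of them.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_job_metrics(queue_depth: int, worker_count: int, jobs_failed: int) -> None:
    """Publish a few of the metrics listed above as custom CloudWatch metrics."""
    cloudwatch.put_metric_data(
        Namespace="Transcription",  # placeholder namespace
        MetricData=[
            {"MetricName": "QueueDepth", "Value": queue_depth, "Unit": "Count"},
            {"MetricName": "ActiveWorkers", "Value": worker_count, "Unit": "Count"},
            {"MetricName": "JobsFailedPerMinute", "Value": jobs_failed, "Unit": "Count"},
        ],
    )
```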
One question, which was the final one for me: what if someone uploads a 2-hour-long podcast for transcription? In that case a single file is too long and too heavy, and one EC2 instance might fail on it, so you also need a chunking strategy in place. One worker can be responsible for chunking, or you can chunk at upload time, where a 2-hour podcast is divided into 10-minute audio chunks; all of those chunks are handled by different workers, and at the end the transcripts are stitched back together and that single transcription file is stored in the transcription S3 bucket. So a chunking strategy is the answer.
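The splitting step could look roughly like this. It is a sketch using pydub (which needs ffmpeg installed) with a fixed 10-minute chunk length; in the real system each chunk path would become its own queue job, and the per-chunk transcripts would be concatenated in order.

```python
from pydub import AudioSegment  # assumes ffmpeg is available on the worker

CHUNK_MINUTES = 10

def split_audio(path: str) -> list[str]:
    """Split a long recording (e.g. a 2-hour podcast) into 10-minute chunks."""
    audio = AudioSegment.from_file(path)
    chunk_ms = CHUNK_MINUTES * 60 * 1000
    chunk_paths = []
    for i, start in enumerate(range(0, len(audio), chunk_ms)):
        chunk_path = f"{path}.part{i:03d}.mp3"
        audio[start:start + chunk_ms].export(chunk_path, format="mp3")
        chunk_paths.append(chunk_path)
    return chunk_paths
```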
One more thing, the final one: what if someone is a big content creator, an influencer kind of person, who has published a podcast in our system, and a million people are trying to access that transcript because they want to read the captions for that podcast? For that case you use a CDN: AWS CloudFront. That way the transcription files are not served by S3 or by your own system directly for every request; they are served by CloudFront. There is a single GET request to S3, and after that the transcription file is cached at the CDN. So at read time, when users are trying to fetch the transcription file, you have CloudFront in place so you do not have to go to the S3 bucket and query it every time; you just fetch the file via the CDN.
That's it; this is how your system design might look. The interviewer might also ask how you will handle multiple API servers, and in that case the answer is a load balancer: AWS ELB or an ALB, an application load balancer (I think that is the load balancer logo, yes). So you can have an application load balancer in front of several API servers, and the load balancer decides which request goes where; there are various strategies for that, rate limiting and so on, but let's not go into those here. So this is how your system design might look. The diagram is ugly, but this is how it looks. I hope you learned something new in this video. Till the next video, keep coding.