This content details the process of designing a scalable audio transcription system, covering functional and non-functional requirements, system architecture, and key considerations like auto-scaling and monitoring, as presented in a senior software engineer interview.
Hey there, this is Akshin Milan, and welcome back to a new video. In this video I'll walk you through a problem statement that was given to me during an interview for a senior software engineer position. I can't share the exact company name, but it was a London-based company and a remote opportunity. As soon as I joined the meeting, there were two interviewers; they asked me to share my screen and posted this problem statement in the meeting chat: "Design a system that allows users to upload audio files and receive transcriptions. The system should handle 100,000 audio files per day with an average file size of 20 MB and duration of 10 minutes. Users should receive a notification when the transcription is complete." So it was clear that this was not just a back-end concepts interview. My expertise for that role was in Python, and I realized this was a system design interview and I had to design the system, because the company was an AI data processing company. This is a good problem statement: I have to design a speech-to-text transcription system, meaning users will upload audio files and should get transcriptions of those files back. For your information, this was round three of the whole interview process for this position.
Now that I had the statement, I realized that in this system I might be working with load balancers for handling load, because the number of audio files is large, and I was pretty sure that during the interview they would ask me to handle spikes, where this 100,000 might become 300,000 at spike times. I also knew I would have to do some back-of-the-envelope calculations, like what cost I would be bearing for the whole system per month, or how much total storage I would be handling in one year or in five years. So I was clear I had to handle all of this, but I did not start putting questions in front of the interviewer right away. I went into a flow, and I'll tell you how I navigated through the interview. The first thing I told him was a clarification question: "Okay, I've read the statement. We will be building a speech-to-text transcription system, and I see that on a normal day we are going to handle 100,000 audio files, one file is 20 MB, and the duration is 10 minutes. To go one step further, these are the functional requirements I can think of, and I have some follow-up questions I would like you to clarify for me." That was my opening. I then used this same platform, the Eraser app, and just as I'm writing now, I was writing during that interview as well.
So what I did was write a heading, "Functional requirements." First question: which languages are we going to support? Is it just English, or English, Spanish, and German? This question is important because you're going to use an AI model, either an LLM or your own fine-tuned model. We won't go into that depth here, but the language matters because if you're using an open-source model, you have to be sure it supports those languages. He told me that English and Spanish are the two languages we are going to support. As soon as he told me that, I mentioned some models to him. You should have a bird's-eye view of a lot of technologies; I had already worked with LLMs, so I was able to give him a suggestion: if English and Spanish are the languages, we can go ahead with the Whisper model.
At this point I wanted to know whether he was okay with using OpenAI's model or whether he would ask me to fine-tune one. He was okay, because this whole interview was focused on the back end and system design rather than on fine-tuning and model training, so we went ahead with Whisper. Once the languages were fixed, I gave him the functional requirements. First: users should be able to upload audio files. This requirement tells us there must be an API endpoint and a UI where users can actually upload their audio files. The next functional requirement: the system should be able to transcribe the audio to text. And the next one: users should get notifications, which means if my audio transcription fails I should get an email or a push notification, and if it succeeds I should get an email saying the transcription is ready.
Perfect. Once these requirements were clear, I went ahead with the non-functional requirements. First, scale: 100k files per day, with spikes up to 300k. Then latency: how much time should a transcription take? I went ahead with a number and said 5 minutes should be sufficient for transcribing any audio file, and he was okay with that. High availability: my system should be 99.99% available. High availability as a requirement means that if my load increases I should have load balancers in place; I did not say this at that point, I just kept these pointers ready in my mind for when I would design the system. Then I asked him about the budget: what monthly budget are you thinking of to support this whole system? He told me $50,000 per month. Strictly speaking this is not a non-functional requirement, but it feeds into the budget estimation.
Some interviews are more focused on budgeting; if you're going for solutions architect or principal architect kinds of roles, you also have to worry about budgets. One more non-functional requirement was high accuracy: the transcriptions should not be buggy. 95% accuracy is good, which means that if there are 100 words in the generated corpus, at least 95 should be correct. At this point the interviewer nodded and I was sure these requirements were sufficient. Next I went ahead with some calculations so that we would know how much storage and bandwidth we need: the back-of-the-envelope estimations.
So: 100k files at 20 MB per file. You have to convert these numbers up: 100,000 times 20 MB is 2,000,000 MB, which is 2,000 GB, or about 2 TB of data per day. Multiply by 30 and that is roughly 60 TB per month. Now, you might not know the cost of S3 storage, but you can ask the interviewer or estimate it at around $0.02 per GB per month; multiply that by the monthly storage, and in my estimate it came out to around $4,100 per month for S3 bucket storage. When you have given a lot of system design interviews you remember these values automatically, like how to convert GB to TB and the rough S3 costs; otherwise you can simply ask the interviewer. Then comes the processing cost: if you go ahead with OpenAI Whisper, Google Speech-to-Text, or Amazon Transcribe, the approximate cost is $0.02 per minute; this figure was also given to me by my interviewer. We have 100k files and the question says 10 minutes per file, which comes out to a million minutes per day, and then you multiply that by the cost per minute of whichever transcription service or model you choose.
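If you want to sanity-check these numbers quickly, here is a minimal sketch of the same back-of-the-envelope math in Python. The unit prices are deliberately left as parameters because they are assumptions; plug in whatever figures your interviewer gives you.

```python
# Back-of-the-envelope estimates using the figures from the problem statement.
# Unit prices are placeholders; use the numbers your interviewer gives you.

FILES_PER_DAY = 100_000
FILE_SIZE_MB = 20
FILE_DURATION_MIN = 10

storage_gb_per_day = FILES_PER_DAY * FILE_SIZE_MB / 1_000       # 2,000 GB, about 2 TB per day
storage_gb_per_month = storage_gb_per_day * 30                  # about 60,000 GB = 60 TB per month
transcribe_minutes_per_day = FILES_PER_DAY * FILE_DURATION_MIN  # 1,000,000 minutes per day

def monthly_storage_cost(price_per_gb_month: float) -> float:
    """Monthly bill for one month of new uploads at the given price per GB-month."""
    return storage_gb_per_month * price_per_gb_month

def monthly_transcription_cost(price_per_minute: float) -> float:
    """Monthly transcription bill at the given price per audio minute."""
    return transcribe_minutes_per_day * 30 * price_per_minute

print(f"{storage_gb_per_month:,.0f} GB of new audio per month")
print(f"{transcribe_minutes_per_day:,} minutes of audio to transcribe per day")
```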
As soon as I reached here, my interviewer asked me to stop and go ahead with the actual system design, because we only had one hour. You might not be stopped, though, so I'm going to put all the other costs, like network cost and worker cost, in the description so you can do the same calculation; at the end you just add up all these costs, if your interviewer asks you to, and check whether the total fits in the budget. Great. Now let's go ahead with the very first diagram I drew on screen, a very high-level architecture diagram, and I'll give the same explanation I was giving to my interviewer. First of all, for uploading the audio file we need to provide a client application to the user, a mobile app or a web app, where the user can upload the audio files. So I drew a rectangle, called it the client app, and this application is only responsible for taking the audio file and uploading it. Next, we need an API server. The first thing I told him: the user could upload the file directly to a server, and the server could be responsible for going to Whisper or a similar system that transcribes and gives the transcription back. But that system is synchronous: the user has to keep the application open for those 5 or 10 minutes while the transcription is happening, and if the user closes the application or the tab in between, the connection is lost, the transcription is never received, and resources are wasted. That is a synchronous system. So I asked: do we want a synchronous or an asynchronous system? He said we need an async system. In that case, the user does not upload the entire file to the server. Instead, I'm going to have a server over here, which is my API server.
You can also have a load balancer, which I added at the end when he asked me to handle scale, but for now let's say we only have this API server, and it is connected to S3. In S3 we have a concept called pre-signed URLs. The client app just calls a very simple API on the API server: "Hey, a user with this user ID is trying to upload a file, we need a pre-signed URL." The API server goes to S3 and asks it to create a pre-signed URL, which it then hands back to the client app, and using that URL the client app can upload the file directly to S3. The API server won't be busy during that time. So: the API server gets a pre-signed URL, valid for, let's say, one hour or one day, gives it to the client app, and the client app uploads the audio file directly to the S3 bucket. With this approach the API server is not tied up while the audio file is being uploaded. Here we're talking about one audio file, but we have 100,000 audio files; just imagine 100,000 files being uploaded through the API server at the same time and the load that server would be under. That's why this approach is good, and it's a very simple one.
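As a minimal sketch of that flow, here is how the API server might issue a pre-signed upload URL with boto3. The bucket name, key layout, and expiry are assumptions for illustration, not fixed parts of the design.

```python
import uuid
import boto3

s3 = boto3.client("s3")
AUDIO_BUCKET = "audio-uploads"  # placeholder bucket name

def create_upload_url(user_id: str) -> dict:
    """Issue a pre-signed PUT URL so the client uploads straight to S3."""
    job_id = str(uuid.uuid4())
    key = f"{user_id}/{job_id}.mp3"          # user_id / file_id layout described above
    url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": AUDIO_BUCKET, "Key": key},
        ExpiresIn=3600,                      # URL valid for one hour
    )
    # The API server would also record a pending job row in the database here.
    return {"job_id": job_id, "upload_url": url}
```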
Okay, perfect. Now, as soon as the upload is done, the S3 bucket receives the final object and the audio file is sitting in S3. We are actually going to have two S3 buckets, so I'll create another S3 bucket here which we will use later; don't worry about that yet. The first S3 bucket stores all the audio files, with a key structure like user ID, then file ID, for the actual MP4/MP3/WAV file. The second S3 bucket will be responsible for storing the final transcripts; let's not worry about that for now. As soon as the S3 bucket receives the final object, on the client side we can show a message: "Hey, your file is uploaded. Wait 5 or 10 minutes and you'll get an email when your audio transcription is ready." Perfect.
Now, what the S3 bucket does is create an event; we are going to use an event-driven architecture. But one thing we missed: when the client app asked the API server for a pre-signed URL, the API server should also create an entry in our main database. So we are going to have one more component here, RDS, which is the main SQL database of our system. When the client app asks the API server for a pre-signed URL, the API server returns the pre-signed URL but also creates an entry in RDS: this is the user ID, this is the timestamp at which the user asked for the pre-signed URL (that is, started the process), this is the job ID, and this is the status of the job. At this point the status will be pending (it could also be "uploading", but let's go with pending, the simple one). So at this point the status of our job is pending.
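To make that concrete, here is a minimal sketch of the jobs table and the pending entry. I'm using SQLite only so the snippet is self-contained; in the real design this would live in RDS (for example Postgres), and the table and column names are my own illustration.

```python
import sqlite3
import time

# Illustrative schema only; the real system would use RDS rather than a local file.
db = sqlite3.connect("jobs.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS jobs (
        job_id         TEXT PRIMARY KEY,
        user_id        TEXT NOT NULL,
        created_at     INTEGER NOT NULL,   -- when the pre-signed URL was issued
        status         TEXT NOT NULL,      -- pending | processing | success | failed
        transcript_key TEXT                -- S3 key of the finished transcript, filled in later
    )
""")

def create_pending_job(job_id: str, user_id: str) -> None:
    """Record the job as pending at the moment the pre-signed URL is handed out."""
    db.execute(
        "INSERT INTO jobs (job_id, user_id, created_at, status) VALUES (?, ?, ?, 'pending')",
        (job_id, user_id, int(time.time())),
    )
    db.commit()
```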
Now let's come back to the process. At this point the S3 bucket has received the file completely, and it has to notify someone that the file is here and needs to be processed. One option is that the S3 bucket notifies the API server itself: the file is ready, go and process it. But then the API server would have to download the file, and again the API server would be busy and take a lot of load, so we do not notify the API server. Instead we use a queue, and this is the place where the interviewer judges which system components you choose for handling a scale like 100,000 files. So we use a queue: search for SQS, Amazon's Simple Queue Service, and we have a queue over here. The S3 bucket sends an event to this queue, and the event contains the information about that job, including the job ID, so that we can later update the status of the job. The job ID is appended to SQS. This queue might have, say, 100 jobs in it; that number is called the queue depth, and the queue depth will be important for scaling our system up and down. We'll come to that later, but for now assume the S3 bucket has notified the queue and we have an event, a job, inside SQS.
Now, who is going to process and complete these jobs? We will have a fleet of EC2 instances called workers. Notice that our main API server is not responsible for downloading the file, processing it, or sending notifications; the API server is just a central unit that orchestrates a few things. Instead we have a fleet of EC2 instances, three of them or, say, a hundred, depending on scaling up and down; we'll come to that, but let's go with three workers for now. So we have worker one, two, and three, and these EC2 workers continuously pull jobs from the SQS queue.
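A worker's main loop might look roughly like this. It is a sketch assuming boto3 and a hypothetical process_job helper that downloads the audio, runs the transcription, and uploads the result; the queue URL and timeouts are placeholders.

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-2.amazonaws.com/123456789012/transcription-jobs"  # placeholder

def worker_loop():
    """Each EC2 worker runs a loop like this, pulling jobs off the queue."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,        # long polling instead of busy-waiting
            VisibilityTimeout=900,     # hide the message while we transcribe (~15 min)
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])  # contains the job_id and the audio's S3 key
            try:
                process_job(job)           # hypothetical: download audio, transcribe, upload transcript
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            except Exception:
                # On failure we simply do not delete the message: it becomes visible
                # again after the visibility timeout and another worker can retry it.
                pass
```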
Okay, now the question is how to handle failures. Let's say this EC2 instance picked up a job with job ID A, tried to process the file, and the job failed somehow. It is not going to discard that job in one go; it puts the job back on the queue, and maybe next time some other worker picks it up. But there's a concept called exponential backoff: say we tried to process a file and it failed; the next attempt happens after 5 minutes, the one after that after 25 minutes, the one after that after 125 minutes, so there is an exponentially growing gap between retries. We also often use a separate queue for handling all the failure cases, called the dead letter queue, so this is our DLQ. It's very important to mention this in your system design interview, to show that you have thought of it and know the concept. If the job passes, we proceed; if it keeps failing, the job goes to the DLQ, and in between we apply exponential backoff. Perfect. So let's connect this worker fleet to the DLQ as well.
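Here is a tiny sketch of that retry logic with the 5/25/125-minute backoff and a three-retry cap. The requeue and send_to_dlq callables are placeholders for pushing a delayed message back to SQS or over to the dead letter queue.

```python
# Exponential backoff as described above: retry after 5, 25, then 125 minutes.
BASE_DELAY_MIN = 5
MAX_RETRIES = 3

def retry_delay_minutes(attempt: int) -> int:
    """Delay before retry number `attempt` (1-based): 5, 25, 125 minutes."""
    return BASE_DELAY_MIN ** attempt

def handle_failure(job: dict, requeue, send_to_dlq) -> None:
    """`requeue` and `send_to_dlq` are placeholder callables supplied by the worker."""
    attempt = job.get("attempt", 0) + 1
    if attempt > MAX_RETRIES:
        send_to_dlq(job)  # give up: mark the job failed and notify the user
    else:
        job["attempt"] = attempt
        requeue(job, delay_minutes=retry_delay_minutes(attempt))
```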
So now we have the concept of the queue; instead of SQS we could also go with Kafka. We have a fleet of workers, we have a DLQ, and we know about exponential backoff. These are concepts that are very important to mention in this kind of system design. Now, when do we update the job status? If the job fails once, okay, no problem, we retry it; the status stays as it is. But one thing I missed: as soon as a worker picks up a job, it also updates the status to processing. So now the status of the job is processing, not pending or uploading. If the job fails three times, say we have a maximum retry count of three, then the job status becomes failed and we send the user an email saying the file is corrupt or we cannot process it at this time. If it succeeds, we set the status to success, and then the concept of notifications comes into play: we have to notify some unit of the system so that that unit notifies the user via email. And before we close this whole loop and go ahead with notifications, you need to know that these workers, which are responsible for processing the file and producing a transcription, will have an LLM or a model already loaded. This is a question that came to me: how are you going to ensure that these workers do not take a lot of time loading the models? The answer is that you preload the models, so that every time a new job comes in, the worker already has the model loaded in memory. All of these workers have the model preloaded, and as soon as a new job comes in they use that preloaded model for the transcription.
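For example, with the open-source Whisper package the worker would load the model once at process startup and reuse it for every job. This is a minimal sketch, assuming the `whisper` library and a placeholder model size.

```python
import whisper

# Load the model once when the worker process starts, not once per job.
# "base" is a placeholder; a larger checkpoint may be needed for good accuracy.
MODEL = whisper.load_model("base")

def transcribe_file(local_audio_path: str) -> str:
    """Reuses the preloaded model, so each job only pays for inference."""
    result = MODEL.transcribe(local_audio_path)
    return result["text"]
```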
Now, once the transcription is done, a few things happen. First, as we already saw, the status is moved from processing to success. Second, the job has to be deleted from SQS so that it is not processed again. Then there are two more things. One: we have another S3 bucket, so I'll copy this one, and this S3 bucket is for storing the transcription files, which are plain text; the first S3 bucket was for storing the WAV and MP4 files, and this one is for storing the text files. The worker uploads the text file to this bucket, and similarly all the workers upload their transcripts there. Two: we have another queue, so there are now three queues in the system, and this queue is for notifications.
Okay, so what have we done? We have updated the job status in the main database, and we have uploaded the transcription to the transcripts S3 bucket; maybe that S3 location is also saved on the main job record, so we know where the final transcription lives. The final thing is to put a message onto this notifications queue. The message contains the job ID and maybe the status in the metadata: this is a success job, or this is a failed job. Similarly, if the job fails even after the three retries, it also goes to the notifications queue with a failed status in the metadata. Now this queue is also attached to a worker, a worker dedicated to handling notifications. So we have another EC2 instance, maybe just one, that is continuously pulling this notifications queue, and this notification worker is just there for sending emails or push notifications to our client. So we send a notification back to the client: "Hey, the file you submitted at this time is now ready for downloading."
The email includes the link, and that link is a download URL from S3 which can be used for downloading within a particular time frame. The download URL has a time to live, an expiry: if the user downloads the file within that window, perfect; otherwise the link stops working. That's why sometimes you get an email and, after a week, you cannot download the file anymore, because the URL has expired. And yes, you can have a worker or a system that refreshes that URL as well.
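As a rough sketch of what the notification worker does, here it generates a pre-signed GET URL with an expiry and emails it via SES. The bucket name, sender address, and one-week TTL are assumptions for illustration.

```python
import boto3

s3 = boto3.client("s3")
ses = boto3.client("ses")
TRANSCRIPT_BUCKET = "transcripts"  # placeholder bucket name

def notify_success(user_email: str, transcript_key: str) -> None:
    """Send the 'your transcription is ready' email with an expiring download link."""
    download_url = s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": TRANSCRIPT_BUCKET, "Key": transcript_key},
        ExpiresIn=7 * 24 * 3600,           # link valid for one week, then it expires
    )
    ses.send_email(
        Source="no-reply@example.com",     # placeholder sender address
        Destination={"ToAddresses": [user_email]},
        Message={
            "Subject": {"Data": "Your transcription is ready"},
            "Body": {"Text": {"Data": f"Download it here: {download_url}"}},
        },
    )
```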
So far, so good; we have closed the loop. Just to revise before we come to the follow-up questions that came to me after I presented this high-level architecture diagram:
The client app wants to upload a file and get the transcription. It goes to the API server and asks for a pre-signed URL for the S3 bucket so that it can upload the file directly to S3. The API server goes to S3, gets a pre-signed URL, and gives it to the client; along with that, the API server creates an entry in the SQL database with the user ID, timestamp, job ID, and a status of pending. Once the file has been fully uploaded, the S3 bucket sends an event to SQS with the job ID. The worker fleet is polling SQS and picking up jobs. If a job fails, it goes through retries and on to the DLQ, the dead letter queue; if it still fails after the maximum of three retries, we send the job to the notifications queue, the notification worker picks it up, sees that it is a failed job, and emails the client that their file cannot be processed. If the worker successfully completes the transcription with the preloaded model, it saves the transcript in the transcripts S3 bucket and updates the status to success (for failures the status is updated too). Then it sends a message to the notifications queue, and the notification worker picks it up and sends a success email to the client app. So that is how the system looks. Now, the first question that came to me: why did you use a queue, and what advantages does the queue give this whole design? Okay. And then the next question comes from the interviewer.
How are you going to handle autoscaling? How is your system going to handle autoscaling? I think this was the question that got me selected, because this is something you need to know when your design has queues and workers, and it is a very basic concept. Autoscaling should happen based on some number: if that number crosses an upper limit, the system should automatically scale up; if it drops below a lower limit, the system should automatically scale down. What is that number? It comes from the queue depth. Queue depth is the number of jobs currently sitting in your queue. And there is one more metric: queue depth per worker, which is the queue depth divided by the number of workers. If this metric crosses the upper limit, we need to scale up, meaning increase the number of workers; if it drops below the lower limit, we need to decrease the number of active workers, meaning scale the system down. This is something you have to mention. Now, an example. Let's say the upper limit is 500: if my queue depth per worker goes above 500, we need to scale up. Say you have 10,000 messages in the queue and just 10 workers in your fleet; the queue depth per worker is 1,000, which means 1,000 jobs would have to be handled by a single worker, which is too much, so you scale up because it crosses 500. Usually 500 is a good threshold for jobs like this, assuming something like 30 seconds of processing time per job, and the interviewer agreed with me. So when the queue depth per worker is 1,000, you scale up, and the strategy is: increase the number of workers by 20% or by 10, whichever is greater (one number is a percentage, the other is absolute). With 10 workers, 20% of 10 is 2; you compare 2 and 10, take the greater, and add 10 workers, so your total is now 20. Say the spike continues and the queue depth per worker goes above 500 again: 20% of 20 is 4, versus 10, so you add 10 and have 30 workers. Again: 20% of 30 is 6, versus 10, so you add 10 and have 40. Again: 20% of 40 is 8, versus 10, so you add 10 and have 50. Now 20% of 50 is 10, the same as the absolute 10, so you add 10 and have 60 workers. At 60 workers, 20% of 60 is 12, which is greater than 10, so you do not go to 70, you add 12 and have 72 workers. So when the load keeps increasing in such quantities, the percentage starts winning over the absolute number. That is how scaling up works. Scaling down is similar: say the lower limit is 200; if your queue depth per worker goes below 200, you scale down by removing 10 workers or 20%, whichever is greater. If you have 50 workers, 20% of 50 is 10, the same as 10, so you go from 50 to 40 workers. Again: 20% of 40 is 8, versus 10, so you remove 10 and are down to 30. And that is how the number of workers comes back down.
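The whole policy fits in a few lines. This is a sketch of the scale-up and scale-down decision exactly as walked through above, with the limits and step sizes from the example; in practice these would be tunable parameters.

```python
# Queue-depth-per-worker autoscaling policy as walked through above.
UPPER_LIMIT = 500   # scale up above this many queued jobs per worker
LOWER_LIMIT = 200   # scale down below this
STEP_ABS = 10       # absolute step
STEP_PCT = 0.20     # percentage step; use whichever of the two is greater

def desired_worker_count(queue_depth: int, workers: int) -> int:
    """Return the new fleet size given the current queue depth and worker count."""
    per_worker = queue_depth / max(workers, 1)
    step = max(STEP_ABS, round(STEP_PCT * workers))
    if per_worker > UPPER_LIMIT:
        return workers + step
    if per_worker < LOWER_LIMIT:
        return max(workers - step, 1)   # never scale the fleet to zero
    return workers

# Example from the walkthrough: 10,000 queued jobs with 10 workers is 1,000 per worker,
# so successive scale-ups take the fleet 10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 72.
```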
Perfect. This is the kind of example you should give the interviewer so he understands that you know this strategy when you have queues and workers in your system and how the system has to scale up and down. With this you can also say that autoscaling is one of the reasons you went ahead with a queue. Another reason is that the queue acts as a buffer. If a spike is coming and the load is increasing, and there were no queue, only an API server, that API server would crash, because so much load would land on a single server. But the queue acts as a buffer: jobs keep getting appended to it, and while the spike builds from 100k to 200k to 300k, the queue accommodates more and more jobs and the system does not crash. Without the queue, with just an API server, the system would go down. Next is alerting. We
did not spend a lot of time on alerting, but my interviewer did ask me which metrics I would track; monitoring and alerting come under one umbrella. I keep these metrics ready for system designs like this: business metrics, system metrics, and infra metrics. The business metrics are the ones the business team will track and analyze. Jobs created per minute: is the traffic growing? Jobs completed per minute: is the throughput keeping up? Jobs failed per minute: are we having quality issues? P50, P95, and P99 latency. Transcription confidence scores: when you upload the transcription .txt to the transcripts bucket, you also upload a confidence-scores file so the machine learning team can take that scoring data and improve the system if they are training their own models; a confidence score tells you the actual word, the predicted word, the delta, and the confidence, and that data also needs to be stored and analyzed. And cost per transcription: how much you are paying for a single transcription, which you have to optimize. System metrics: SQS queue depth, so that autoscaling strategies can be built on it; worker count, how many workers are active in a day on average and whether the autoscaling is even working; worker CPU utilization; database query latency; database connection count; API error rate; API latency. Then infra metrics: EC2 instance health and how long instances were down, S3 operation latency, network throughput, and disk utilization. You need to track all of these, or at least keep them in mind and mention them when you are in such interviews.
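If you want to show how those numbers actually get collected, a minimal sketch is to push custom metrics to CloudWatch from the workers or a small periodic job; the namespace and metric names here are my own illustration, and alarms and dashboards would sit on top of them.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_job_metrics(queue_depth: int, worker_count: int, jobs_failed: int) -> None:
    """Publish a few of the metrics listed above as custom CloudWatch metrics."""
    cloudwatch.put_metric_data(
        Namespace="Transcription",  # placeholder namespace
        MetricData=[
            {"MetricName": "QueueDepth", "Value": queue_depth, "Unit": "Count"},
            {"MetricName": "ActiveWorkers", "Value": worker_count, "Unit": "Count"},
            {"MetricName": "JobsFailedPerMinute", "Value": jobs_failed, "Unit": "Count"},
        ],
    )
```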
One question, which was the final one for me: what if someone uploads a 2-hour-long podcast for transcription? In that case a single file is too long and too heavy, and one EC2 instance might fail on it, so you also need a chunking strategy in place. One worker can be responsible for chunking, or you can chunk at upload time, where a 2-hour podcast is divided into 10-minute audio chunks; all of those chunks are handled by different workers, and at the end the transcripts are stitched back together and that single transcription file is stored in the transcription S3 bucket. So a chunking strategy is the answer.
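The splitting step could look roughly like this. It is a sketch using pydub (which needs ffmpeg installed) with a fixed 10-minute chunk length; in the real system each chunk path would become its own queue job, and the per-chunk transcripts would be concatenated in order.

```python
from pydub import AudioSegment  # assumes ffmpeg is available on the worker

CHUNK_MINUTES = 10

def split_audio(path: str) -> list[str]:
    """Split a long recording (e.g. a 2-hour podcast) into 10-minute chunks."""
    audio = AudioSegment.from_file(path)
    chunk_ms = CHUNK_MINUTES * 60 * 1000
    chunk_paths = []
    for i, start in enumerate(range(0, len(audio), chunk_ms)):
        chunk_path = f"{path}.part{i:03d}.mp3"
        audio[start:start + chunk_ms].export(chunk_path, format="mp3")
        chunk_paths.append(chunk_path)
    return chunk_paths
```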
One more thing, the final one: what if someone is a big content creator, an influencer kind of person, who has published a podcast in our system, and a million people are trying to access that transcript because they want to read the captions for that podcast? For that case you use a CDN: AWS CloudFront. That way the transcription files are not served by S3 or by your own system directly for every request; they are served by CloudFront. There is a single GET request to S3, and after that the transcription file is cached at the CDN. So at read time, when users are trying to fetch the transcription file, you have CloudFront in place so you do not have to go to the S3 bucket and query it every time; you just fetch the file via the CDN.
That's it; this is how your system design might look. The interviewer might also ask how you will handle multiple API servers, and in that case the answer is a load balancer: AWS ELB or an ALB, an application load balancer (I think that is the load balancer logo, yes). So you can have an application load balancer in front of several API servers, and the load balancer decides which request goes where; there are various strategies for that, rate limiting and so on, but let's not go into those here. So this is how your system design might look. The diagram is ugly, but this is how it looks. I hope you learned something new in this video. Till the next video, keep coding.