This video serves as a comprehensive, albeit high-level, introduction to systems design concepts, aimed at individuals new to the field or preparing for system design interviews. It breaks down complex topics into digestible sections, covering fundamental database concepts, various storage solutions, distributed system patterns, and common interview problem archetypes.
Hey everyone, and welcome back to the channel. So today, albeit the fact that I don't yet have 50,000 subscribers, consider this sort of a de facto 50,000-subscribers special. The gist here is that I want to make an introductory video for those who really aren't sure where to start when it comes to systems design. On this channel in the past I've been especially critical of people who post too much redundant stuff, and I guess I am kind of doing that, but I understand that there are a lot of people who come to YouTube, or to the internet in general, looking for a starting point when they first hear about these interviews and start preparing for them. I was in that position a few years ago, and I think this would have been really useful for me to have. There are a few of these guides out there on GitHub, but some of them miss essential topics. Also, towards the end of this video, I'm going to try to break down the common patterns across systems design interview problems, so that you can recognize them the same way you would with something like a LeetCode problem, and then apply the patterns you've seen in the past to come up with an ideal solution. So again, thank you all, I really appreciate it. I don't know how long this video is going to take, but hopefully not too long, because I want to be able to clickbait it and say that it's only going to be an hour. So without further ado, let's go ahead and get started. All righty, so you
may think to yourself, "Oh Jordan, where's the iPad today? You've got to be able to draw out examples." Well, frankly, there are 80 slides in the slideshow, and if I were to draw all of those out, this video would never get made. So I'm going to pay homage to my original style of making videos via the old consultant-style PowerPoint, and if you hear my keyboard clicking away, it's not because I'm looking up anime girls on the other monitors; it is simply because I am switching the slides. Also, I'm sorry, I'm probably not going to look at the camera for most of this, because I'm going to be doing a lot of reading off the slides, because I simply can't remember everything that I've talked about. Let's go ahead and get started. Cool, so the first
thing we're going to quickly go through is actually giving a systems design interview. It's your first time, you're up there: what are you going to say? Pretty simple: figure out the requirements of the problem. Now, the requirements are obviously going to be very broad. If I tell you to build Facebook, you're not going to build every feature; you're going to build out a subset of the features. Typically they'll want you to focus on one particular feature, because it is challenging to build and challenging to talk about. So for something like Facebook, they might say "build me Facebook News Feed." They're not going to say "build me Facebook, build me Facebook Messenger, build me Instagram Reels, build me Instagram Threads," on and on. Identify with your interviewer which parts of the problem are important, and go from there.

Next, they may want you to do some back-of-the-envelope calculations. These are the type of thing where you don't always necessarily have to provide them, and if you do, they can be pretty crude, but you're ultimately trying to get an estimate of, say, the number of reads relative to the number of writes, which path you want to optimize for (read-optimized or write-optimized), and, if there's a ton of storage, whether you're ultimately going to need to do things like partitioning or replication. These are all terms that we're going to discuss later in this video.

Next is going to be API design. If you're talking about something like a Meta product interview, it might be a little more important here, but for the most part, just give the general API of all the endpoints that you might have in your system. And then finally, we'll jump into the architectural design.
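To make the back-of-the-envelope step concrete, here's a toy estimate in Python. Every number here is invented purely for illustration; in an interview you'd plug in whatever assumptions you and the interviewer agree on.

```python
# Toy back-of-the-envelope estimate for a feed-like product.
# All inputs below are made-up assumptions, not real figures.
DAU = 100_000_000          # assumed daily active users
reads_per_user = 50        # assumed feed reads per user per day
writes_per_user = 5        # assumed posts/likes per user per day
SECONDS_PER_DAY = 86_400

read_qps = DAU * reads_per_user / SECONDS_PER_DAY
write_qps = DAU * writes_per_user / SECONDS_PER_DAY
read_write_ratio = read_qps / write_qps     # 10:1 suggests a read-optimized path

post_size_bytes = 1_000    # assumed average write payload
daily_storage_gb = DAU * writes_per_user * post_size_bytes / 1e9
```

A 10:1 read/write ratio and hundreds of gigabytes per day would be the kind of crude signal that tells you to optimize the read path and plan for partitioning.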
Cool, so let's quickly look at the table of contents for this video. I'm going to put timestamps for all of these: a lot of basic concepts first, and then, if you're already familiar with those, a bunch of interview patterns. The reason people are always asking me to make more systems design interview problem videos, and I'm like "no, I don't want to," is that I think they're mostly redundant. Probably 90% of the suggestions I get fall into one of the seven or eight buckets that I cover, and I think just being able to identify which bucket a problem is in is probably 90% of the challenge. So we're going to talk about those later in the video, but for now, let's get started with some concepts. Cool. The
first and probably most important thing we have to know about for any system design problem is the nuances of how a database works. Databases are just durable systems that let us store application state. They do so by storing data on a hard drive, and because a hard drive has a mechanical arm seeking across a spinning disk, we always want to keep the related data that we're reading and writing together, as close to one another as possible. What this really looks like: say we want to read all keys from A through C. If all of our keys are sorted alphabetically, that's going to be a much faster read than if our data were just scattered randomly on disk. That's the premise behind something like a database index.

Now, if we don't have an index, the way we'd really optimize writes is just by putting them in a write-ahead log. A write-ahead log is just a sequential log of data on disk. The reason we'd do this is because, like I mentioned, sequential operations on a hard drive are always faster than random ones, so if we just want to optimize for writes, and there's nothing we can really do to optimize for reads, let's just make every write sequential. What does that actually look like? Say x = 5 is the first entry in our write-ahead log. (I can see that my camera is tweaking out right now and I didn't realize that, but because I'm one-take gang, I'm not re-recording this.) Okay, x = 5 over here. Say I want to make the next write: y = 10. Then all of a sudden, z = 22. So we're writing sequentially through this log. Then finally, say I want to rewrite the key x: x just gets written again at the end, because why would we go all the way back to the beginning of our write-ahead log? That wouldn't be a sequential write; we'll just append it. So then you say, okay, how do we do reads? We just read from the bottom of the write-ahead log and find the latest value for the key. The problem is that this is a sequential scan: if there are n entries in the log, this is an O(n) read. So what do we want to do to improve that? We want to add something called indexes to our database, where an index basically allows us to read very quickly when we're searching for the value of a particular key.
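The append-only log described above can be sketched in a few lines. This is a toy in-memory version, with a Python list standing in for the sequential file on disk:

```python
# Minimal append-only write-ahead log: writes always append (sequential I/O),
# reads scan newest-to-oldest to find the latest value for a key -- O(n).
class WriteAheadLog:
    def __init__(self):
        self.entries = []               # stands in for a sequential file on disk

    def write(self, key, value):
        self.entries.append((key, value))   # never seek back, just append

    def read(self, key):
        # O(n) scan from the end: the last entry for a key is its latest value
        for k, v in reversed(self.entries):
            if k == key:
                return v
        return None

log = WriteAheadLog()
log.write("x", 5)
log.write("y", 10)
log.write("z", 22)
log.write("x", 7)       # rewriting x just appends a newer entry
```

Here `log.read("x")` returns 7, the latest append, which is exactly why the read path needs an index to avoid the scan.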
Quick fundamentals: there are a few different types of indexes, so let's discuss them. The first one is hash indexes. With hash indexes, we keep all of the keys in memory, and we probably keep their corresponding addresses on disk in memory with them, so that we don't take up too much space in memory; the values themselves can then be wherever on disk. As a result of using a hashmap, if you're familiar with your data structures, you're getting O(1) reads and O(1) writes, which is great. At the same time, your key set needs to fit in memory, and we have poor support for range queries, the reason being that the rows for keys x, y, and z are not necessarily going to be ordered on disk, so our data might not be contiguous.
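A minimal sketch of that hash-index idea: the in-memory map stores only each key's byte offset, and the values live in an append-only file (here a `BytesIO` standing in for disk). The record format is just an assumption for illustration.

```python
import io
import json

data_file = io.BytesIO()        # stands in for the on-disk data file
index = {}                      # in-memory hashmap: key -> byte offset of latest value

def put(key, value):
    data_file.seek(0, 2)                    # always append at the end of the file
    offset = data_file.tell()
    data_file.write((json.dumps({key: value}) + "\n").encode())
    index[key] = offset                     # O(1): update the in-memory map

def get(key):
    if key not in index:
        return None
    data_file.seek(index[key])              # one seek straight to the value: O(1)
    return json.loads(data_file.readline())[key]

put("x", 1)
put("y", 2)
put("x", 3)     # later write wins: the index now points at the newer offset
```

Note that the offsets for "x" and "y" have no relationship to key order, which is exactly why range queries over a hash index are poor.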
The next type of index we have is the B+ tree. Basically, we organize a tree of all of our keys on disk, and we use a bunch of pointers to figure out where a key lies. As you can see in the diagram on my left, if I'm searching for keys between 1 and 11, I take the pointer that sits between the 1 and 11 keys, and then, if I'm looking for all the keys between 1 and 5, I go left again to the disk page that holds the data for keys 1, 2, 3, and 4. So we have a bunch of pointers on disk and we organize our data in a tree. What's really nice about this approach is that the data for related keys, say 1, 2, 3, and 4, sits together: when you sort the keys, they're next to one another, and their data is also next to one another. That enables something called range queries, which is a very common use of databases. Maybe I want all the values for keys 1 through 4: instead of having to read a bunch of different places on disk, all of those values are next to one another, and it should be a fairly fast read because of the tree structure. Again, think back to your data structures: since all of our reads and all of our writes go through that tree, we're getting logarithmic time complexities. Cool.
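This isn't a real B+ tree, but the payoff it gives you, sorted keys plus a fast range scan, can be sketched with a sorted list and binary search:

```python
import bisect

# Sorted keys stand in for B+ tree leaf pages: logically adjacent keys
# sit next to each other, so a range scan is one contiguous sweep.
keys = [1, 2, 3, 4, 7, 9, 11]
values = {k: f"row-{k}" for k in keys}      # hypothetical row payloads

def range_query(lo, hi):
    # O(log n) binary search to find the start, then a sequential walk
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return [values[k] for k in keys[start:end]]
```

`range_query(1, 4)` touches one contiguous slice instead of four scattered disk locations, which is the whole point of keeping sorted data together.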
The third and last type of index we'll cover is LSM trees with SSTables. For something like this, all of our writes first go to an in-memory binary search tree, and then, once that tree gets a little too big in memory, we take it and flush it out to disk as a sorted file. This means we can have multiple different sorted files containing our key-value pairs, and we can still have a tree with data in it. Writes are fairly fast because they first go into this in-memory LSM tree. When we actually want to read data, we first search the LSM tree, and if the key-value pair isn't in there, we check every single SSTable from newest to oldest. In the background, you can also optimize this process a lot by merging together a bunch of the SSTable files, in a process known as compaction, and there are a bunch of other data structures you can add, like sparse indexes or Bloom filters, to make this faster, but as an overview, I don't think it's too important to cover that right now. Cool.
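The write and read paths just described can be sketched like this (the memtable size limit and the use of plain dicts for SSTables are simplifications for illustration):

```python
# Toy LSM tree: writes go to an in-memory memtable; when it grows too big,
# it is flushed to an immutable sorted file (an "SSTable"). Reads check the
# memtable first, then the SSTables newest-to-oldest.
MEMTABLE_LIMIT = 2       # absurdly small, just to force flushes in the demo

memtable = {}            # stands in for the in-memory balanced tree
sstables = []            # flushed, sorted snapshots; newest is last

def put(key, value):
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(dict(sorted(memtable.items())))   # flush as a sorted file
        memtable.clear()

def get(key):
    if key in memtable:                   # 1) in-memory tree first
        return memtable[key]
    for table in reversed(sstables):      # 2) then SSTables, newest first
        if key in table:
            return table[key]
    return None

put("a", 1); put("b", 2)     # triggers flush #1
put("a", 9); put("c", 3)     # triggers flush #2, with a newer value for "a"
```

Because reads walk SSTables newest-first, the newer `a = 9` shadows the older `a = 1`; compaction would eventually merge the two files and drop the stale entry.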
The next thing to talk about here is transactions. Transactions are a term you may have heard a lot about, but the main gist here is that databases are multi-threaded programs, so you can't necessarily rely on just being able to make writes all the time without having to worry about race conditions. We often hear the term ACID to describe transactions. I'm not going to go through the full acronym, but it really means that the writes you're making to your table are atomic, meaning that if I'm writing to multiple rows at once, they're all either going to succeed or all going to fail, and also serializable, meaning that we effectively aren't going to have any concurrency bugs to worry about: it's going to look as if one thread were making all of these writes.

So in reality, how do we implement atomicity? Basically, every database has something called a write-ahead log. We discussed the write-ahead log earlier: you literally take every single write that you're making to the database and put it into a sequential log on disk. When we start a write, we put it in the write-ahead log; when we complete a write, we record that that particular entry is now committed. This means that if the database fails before the entry is committed, it can start back up, read from the write-ahead log, see that it had an uncommitted write, and apply it again.

Serializability, on the other hand, is a little harder to implement, so let's talk about that now. There are a few different ways of ensuring that a database is serializable. One way is literally just running it on a single thread; I don't want to go into too many specific database examples in this video, but something like VoltDB actually does this. We've also got pessimistic concurrency control, where we basically use a bunch of locks to stop conflicting transactions from messing with one another. If I'm reading a bunch of data and then making a write based on the values I read, we need to make sure I hold locks on all of the rows I could have read, so that anyone else trying to modify them cannot actually proceed. On the contrary, we also have optimistic concurrency control. This is a little different in the sense that rather than having transactions directly block one another, they can all proceed until they're right about to commit, and then they check whether the premise (the predicate, whatever they read) has since changed. If it has, the transaction is no longer valid and needs to abort. Something like optimistic concurrency control is really nice when we don't have a lot of contending transactions that all want to grab the same locks, because now there's no overhead to grabbing the locks whatsoever. Pessimistic concurrency control, on the other hand, is going to be better when we have a lot of contending transactions, because then we're just blocking them, rather than running the whole transaction and then aborting right before it completes. Cool.
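One common way to implement the optimistic approach is to stamp each row with a version number and abort the commit if the version changed since it was read. This is a minimal single-key sketch, not how any particular database does it:

```python
# Toy optimistic concurrency control: each row carries a version number.
# A transaction remembers the version it read; at commit time the write
# only succeeds if that version is unchanged, otherwise the caller retries.
store = {"x": {"value": 100, "version": 1}}

def read(key):
    row = store[key]
    return row["value"], row["version"]

def commit(key, new_value, version_read):
    row = store[key]
    if row["version"] != version_read:
        return False                  # premise changed underneath us: abort
    row["value"] = new_value
    row["version"] += 1
    return True

value, v = read("x")                                   # we read (100, version 1)
store["x"]["value"], store["x"]["version"] = 150, 2    # a concurrent writer sneaks in
aborted = not commit("x", value + 10, v)               # our commit must abort
```

Notice no locks are taken anywhere: the cost moves from blocking up front to occasionally aborting and retrying at the end, which is exactly the trade-off described above.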
Another question from here: okay, we have this concept of database transactions, but does every database support them? The answer is actually no. Typically most SQL databases will, because they tend to be a little bit older, and even within SQL databases it's not necessarily the case, but if you're talking about something like MySQL or Postgres, you can definitely use transactions. That being said, one trend in the last 10 or 15 years has been for NoSQL databases to come around and stop supporting transactions. The reason: transactions are expensive, so if you don't necessarily need them for your particular use case, it may actually be faster to just drop the ability to support them, and then the database can run a little faster. Transactions are really, really useful if you have to reason about concurrency, because they make things a lot simpler to think about, but at the same time they're going to make reads and writes slower: we're either potentially aborting a lot of writes, or we're potentially adding a lot of locks to our system. Cool. Another piece that I want to
talk about when it comes to databases is row-based versus columnar storage. Like I mentioned, the really nice thing about something like an index is that it keeps related data together on disk. Beyond keeping multiple keys together, think about just one row: if I'm a user on Facebook and I want to access my own profile, it would probably be good if all the fields related to my profile were next to one another on disk, because when I do a read, I'm going to access all of them at the same time. However, for something like analytical queries, this makes a lot less sense. Typically, for a big analytical query that a data scientist might run, you've got a lot of different columns in a massive table and you really only care about one or two of them; you'd otherwise be reading probably the whole table just to perform some sort of aggregation or grouping on that one column. For this type of use case, something like a column store is going to support you better. In a column store, rather than storing every row together on disk, you take the values within a particular column and store those together on disk. The nice things about this are, number one, you only have to read a subset of the data if you're only interested in a couple of columns at a time, which speeds things up, and number two, you get all of these nice compression benefits as a result of the fact that all of the data within a column is really similar to one another. Cool.
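Here's a small sketch of the same (made-up) table in both layouts, plus a run-length encoder to show why similar neighboring values compress well:

```python
# The same hypothetical table, row-based vs. columnar.
rows = [
    {"user": "ana", "age": 34, "country": "US"},
    {"user": "bo",  "age": 28, "country": "DE"},
    {"user": "cy",  "age": 41, "country": "US"},
]

# Columnar layout: each column's values stored contiguously.
columns = {
    "user":    [r["user"] for r in rows],
    "age":     [r["age"] for r in rows],
    "country": [r["country"] for r in rows],
}

# An analytical aggregate only needs to touch one column's data here,
# instead of scanning every field of every row.
avg_age = sum(columns["age"]) / len(columns["age"])

def run_length_encode(values):
    # Similar neighboring values (common within a column) collapse into
    # (value, count) pairs -- one source of columnar compression wins.
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out
```

A sorted or low-cardinality column like `country` run-length encodes far better than a mixed row layout ever could.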
Another piece to talk about is data serialization frameworks. Technically this could be part of the database chapter, but I'm going to make it its own. Very simple: if we have a bunch of data and we need to represent it on disk or over the network, basically as ones and zeros, how do we want to do so? Well, sometimes you have no choice: if you're supporting a flexible schema, maybe you have to use something like JSON. I can send any arbitrary object with JSON, because it's literally just the field name in quotes followed by the value, and that's all really nice. However, if you're using strongly typed schemas, you can actually introduce some other technologies here. One really good example would be something like protocol buffers, out of Google, where you create a predefined schema such that you can use objects that fit this schema across multiple different pieces of code; you can send them over the network, and you can store them in a database. And then on the wire, as binary, you don't have to include all of the field names in the message, making it quite a bit smaller. If I know that a field name corresponds to tag number 1, as you can see in my diagram in the bottom left, I basically only have to include a 1 to say "hey, this is the name field," followed by the actual value of the name field, as opposed to spelling the field name out. That saves you a lot of space. Cool.
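To see why tags beat field names, here's a deliberately simplified tag/length/value encoding. This is NOT the real protobuf wire format (protobuf uses varints and wire types); it only illustrates the idea of a pre-agreed schema replacing field names with small tags:

```python
import json
import struct

# Simplified schema-based encoding (not real protobuf): both sides agree
# on the schema up front, so each field is identified by a one-byte tag
# instead of its full name in every message.
SCHEMA = {1: "name", 2: "age"}          # tag -> field name

def encode(msg):
    out = b""
    for tag, field in SCHEMA.items():
        payload = str(msg[field]).encode()
        out += struct.pack("BB", tag, len(payload)) + payload   # tag, length, bytes
    return out

def decode(data):
    msg, i = {}, 0
    while i < len(data):
        tag, length = struct.unpack_from("BB", data, i)
        msg[SCHEMA[tag]] = data[i + 2 : i + 2 + length].decode()
        i += 2 + length
    return msg

person = {"name": "Jordan", "age": 29}
binary = encode(person)                     # 12 bytes
as_json = json.dumps(person).encode()       # 29 bytes, field names repeated per message
```

Everything round-trips as strings here because this sketch carries no type information; a real schema language would encode each field with its declared type.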
We've made it through databases and we've made it through data serialization; let's go ahead and talk about replication. Basically, if we have a database, it's going to fail, especially if we have a lot of them. Hardware failures are common; things just break, especially in big data centers. This happens all the time, so we want to be able to do two things: one, protect against database failures, and two, make our databases as fast as possible. What can we actually do? We do something called replication, which is really just keeping multiple copies of the data. Replication is going to allow us to potentially withstand hardware failures (this changes depending on how we do the replication) and also improve our database performance (same deal: it depends on the type of replication you do). The idea is that you've got some sort of source database, maybe one database that receives a write, and it sends that data via something called a replication log to the other databases. The replication log is very similar to the write-ahead log: it basically just has all of the incremental operations being performed on the database. x = 5; nope, never mind, now x = 10; nope, never mind, now x = 15. You can put those all in a replication log and then apply them downstream on replicas. Cool.
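That leader-to-follower flow can be sketched as a follower replaying the leader's operation log from wherever it last left off (a toy model, ignoring networking and failure handling entirely):

```python
# Toy replication log: the leader records each operation in order, and a
# follower catches up by applying the log from its last-applied position.
leader_log = [("set", "x", 5), ("set", "x", 10), ("set", "x", 15)]

class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0          # index into the leader's log

    def sync(self, log):
        # apply every operation we haven't seen yet, in log order
        for op, key, value in log[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(log)

f = Follower()
f.sync(leader_log)                    # follower replays x=5, x=10, x=15
leader_log.append(("set", "y", 7))    # leader keeps accepting writes
f.sync(leader_log)                    # follower catches up on the next sync
```

The gap between the leader appending an entry and the follower's next `sync` is precisely the replication lag that the consistency discussion below is about.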
A pretty big distinction in replication is synchronous versus asynchronous replication. Synchronous replication, and sometimes you'll hear this called strong consistency, means that on a write, all of our replicas have to process that write before the writer actually sees that their write succeeded. That means that when I write, every single one of the replicas is now up to date, which is great, because you don't risk any reads of stale data. On the other hand, with asynchronous replication, you write to one particular replica, and at some point in the future it sends that write to the other replicas. The problem is that if someone reads from one of the other replicas in the meantime, there's no guarantee that the data is going to be there. That's known as eventual consistency: you potentially have to deal with stale data, and depending on your application, sometimes that's totally fine. Cool. So let's go through the actual
types of replication. In all of this I'm kind of assuming we mean asynchronous replication, because the truth is that synchronous replication is not very viable: if any of those replicas goes down, all of a sudden we can't make any writes. With single-leader replication, we've got one leader database that accepts all the writes, and then reads can be served from any of the follower databases. This is really simple; I'd think of it as the default type of replication. But it's got a couple of cons. For one, write throughput is limited to the leader replica: writes can only go to one place. Number two, if the leader fails, we need to pick a new leader, and that failover leads to some downtime during which we can't submit any writes. And number three, when a leader does go down, because we're doing asynchronous replication, it's possible that the follower that becomes the next leader doesn't actually have all of the data the old leader had, and then we might lose a couple of our writes. Okay, the next thing we're going to do
is talk about multi-leader replication. This one's going to get a little more complicated, so bear with me. The idea is that for a given key, you can have multiple leaders that are able to serve writes for it, and all of those leaders can serve reads too. This is really nice because now we can write to multiple places: we get increased write throughput. On the contrary, we now have write conflicts: if I write x = 10 to one of these leaders and x = 20 to another, what's the actual correct value of x? Kind of hard to say, isn't it? Eventually these leaders are going to synchronize with one another; you don't just write to them and have each one hold its own data forever without ever merging. Eventually they'll talk to one another and try to figure out, "hey, for this key x, what's the actual proper value?"

So there are a few different ways to decide what the proper value is. Number one is last-write-wins: give every write a timestamp, and then you're good to go, because timestamps will always choose one of the writes to win, which is really nice. On the other hand, timestamps in distributed computers are pretty much arbitrary. I mean, they're not arbitrary, but ultimately you can't trust timestamps to be perfect, because the way actual clocks in a computer work, they can drift by a couple of milliseconds at a time. As a result, there's no guarantee that the write you pick is actually the one that occurred later, and even if it is, when one write wins, you just lose any record of the other write ever happening and forget about it. So even though this is really simple to implement, it does have some downsides. What are some other ways we can deal with this? Number two is going to be
storing siblings. Now, again, if you don't fully understand this example, no worries; I talk about it a lot more on this channel (that's why I'm making this video, so you can go watch my other videos). The gist: two writes are concurrent when they don't know about one another and go to different leaders. The problem is that you don't just want to throw away one of the concurrent writes; ideally, you want to keep that information so you can at least use it somewhere later down the line. We never want to lose information, especially if a user has written it and believes it's stored. So the idea is: if we can detect two concurrent writes, we want to store both of them, and then someone who reads the data later can see both writes, figure out a way to merge them, and write back to the database so that there's only one value again.

Let's look at the following situation. We've got two leaders; both have x = 0, and no one has written to them yet. Say I write to the leader on the left, which we'll call leader A, so now x = 10. We use something called version vectors to determine when we have a concurrent write. A version vector is very simple: it's literally just a very small array of how many writes have been processed by each leader, which allows us to determine when two writes are concurrent. The way it works is that every time I write to a leader, it increments its own entry in its own version vector. So when I write x = 10 to leader A, A bumps its index for leader A. Then say we have some sort of synchronization step, where leader A tells leader B, "here's what I think the value of x is." When leader B receives leader A's value, it also merges leader A's version vector into its own local version vector, and when merging version vectors, we keep the highest number for each replica. So leader B sees, "oh, I have an entry for A coming in that's higher than anything I've seen; let me update my version vector, and my value, so that I'm up to date."

Now say someone makes a write to leader B. Leader B bumps its local version number: B has now seen one write, and the value corresponding to that is x = 20. Say, however, that in the meantime, leader A gets another write, so leader A bumps its own version vector entry to 2. Notice, if we look at the state on both leaders, x = 30 versus x = 20: these two writes are concurrent, because the person who wrote x = 20 is not aware of the person who wrote x = 30. They should be stored as siblings in the database and handled later. The way we know they should be stored as siblings is by looking at the version vectors: on leader A, the A entry is greater than the A entry on leader B, while on leader A, the B entry is less than the B entry on leader B. Because we have these interleaved version vectors, one write is concurrent with the other. If one of the version vectors were greater than or equal to the other at all of the elements, those writes would not be concurrent; one would be more recent than the other. So say they sync up: since our writes are concurrent, A knows it needs to store both, and when B eventually synchronizes back to A, we merge the version vectors, but A says, "okay, I need to store both of these values." That way, someone else can read the data later, see "ooh, it's 30 and 20, what should I write? Maybe I'll write 25." Cool. So again, if that's not clear, no worries; it's not an easy concept, and debatably I shouldn't even have covered it here, but we did it. Let's continue.
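The version-vector machinery from that walkthrough fits in a few functions. This sketch replays the leader A / leader B example (vectors are dicts keyed by leader name, an assumption of this sketch):

```python
# Toy version vectors for detecting concurrent writes across leaders.
def merge(vv_a, vv_b):
    # keep the highest counter seen for each leader
    return {leader: max(vv_a.get(leader, 0), vv_b.get(leader, 0))
            for leader in set(vv_a) | set(vv_b)}

def dominates(vv_a, vv_b):
    # a dominates b if a has seen at least everything b has seen
    return all(vv_a.get(leader, 0) >= count for leader, count in vv_b.items())

def concurrent(vv_a, vv_b):
    # neither write knows about the other -> store both as siblings
    return not dominates(vv_a, vv_b) and not dominates(vv_b, vv_a)

# Replaying the example from above:
a = {"A": 1}                          # leader A takes the write x = 10
b = merge({"B": 0}, a)                # leader B syncs from A ...
b = {**b, "B": b.get("B", 0) + 1}     # ... then takes x = 20 -> {"A": 1, "B": 1}
a = {**a, "A": a["A"] + 1}            # meanwhile A takes x = 30 -> {"A": 2}
```

`concurrent(a, b)` is true here because each vector is ahead of the other in one slot, the "interleaved" case that forces the database to keep both x = 30 and x = 20 as siblings.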
Automatic resolution. The third option for how we deal with multiple leaders and conflicting writes is to automatically resolve the conflicts on the database itself. Rather than having a user read the stored siblings and then write a merged value back, the database itself is actually capable of merging those two values. If you think about it, the version vector we saw on the prior slide is, to some extent, a good example of this for building something like a counter, because it tells us how many total writes we've had on each leader, and it's an eventually consistent way of doing so. You can also perform automatic resolution for things like set objects or sequence objects, but of course, these can't get too complicated, because they have to be built into the database beforehand. It's nice when a database provider gives you this, but if you're really trying to do some complicated stuff, you're probably not able to use it, and then you're going to have to store siblings.
cool Okay so we've gone through um
multileader replication uh let's go
ahead and now go through leader list
replication basically the idea here is
that rather than basically um writing to
just one place at a time and reading
from one place at a time now we're
actually going to write and read from
many places at once so the idea here is
we don't have a leader uh because we're
writing to multiple places and so that
means there's no big fail over if a
leader goes down right we're always
available we can always write somewhere
which is really really nice and then I
guess the the problem or the con of this
approach is that we have longer tail
latencies if I need to wait for three
nodes to respond to my right before my
right is considered committed then what
that means is that I'm basically
bottlenecked by the slowest of those
three nodes and the same thing applies
for reads so the question is well how
many replicas do we actually have to
write to and read from for one of these
operations to be valid so this is where
we introduce the concept of something
called quorums so let's say we have n
replicas in our setup and for us to
consider a right to be valid it has to
First make it to W replicas and for us
to have all of the data that we need to
actually make a successful read we need
to read from our replicas a quorum is
when that W + r parameter is greater
than the end nodes and so if this occurs
then we have something known as a quorum
so basically uh as you can see in the
image on the bottom left if we have
three replicas in our cluster where if
we set W equal to two meaning we have to
write to two replicas every single time
we try to write at least two and if we
set r equal to two meaning that we have
to read from two replicas every single
time that we tried to read again at
least two we know that uh those reads
and writes will overlap by at least one
replica on each set and so one of them
in theory should have the up-to-date data
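just to make that condition concrete here's a tiny Python sketch — assuming for illustration that each replica tags its value with a version number, which is a simplification of the timestamp and version vector discussion:

```python
# Sketch of the quorum condition and a quorum read; names are made up.

def is_quorum(n, w, r):
    # reads and writes are guaranteed to overlap on at least one
    # replica exactly when W + R > N
    return w + r > n

def quorum_read(replies):
    # replies: (version, value) pairs from R replicas; the highest
    # version is the most up-to-date value we can see
    return max(replies)[1]

assert is_quorum(3, 2, 2)        # the classic N=3, W=2, R=2 setup
assert not is_quorum(3, 1, 1)    # W=1, R=1 can miss the latest write
assert quorum_read([(4, "old"), (7, "new")]) == "new"
```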
from there it's just a question of okay
well now we have multiple different
values on our reads how do we actually
determine which one is the correct value
well we have kind of discussed this
already we can use something like a time
stamp we can use something like a
version Vector to try and get a better
sense of which write is the actual most
up-to-date one and that way a reader can
see the up-to-date
data cool we finally made it through
replication because I think frankly
that's going to be the most complicated
part of all of this by a long shot let's
go ahead and talk about sharding so the
idea here for sharding is that when we
have a database and there's too much
load on it uh we're eventually going to
have to split it out right that's kind
of the theme of distributed systems uh
you need to split this work between
multiple different computers otherwise
it's just not going to work properly
Facebook can't be run on one computer
that's not going to fly so the main kind
of idea when it comes to sharding or
partitioning whatever you want to call
it is that when we ultimately make our
database shards we want the majority of
our read queries and the majority of our
write queries to only have to interact
with one Shard at a time if they have to
interact with say 10 different
shards then basically an operation
cannot be considered completed until the
slowest of those 10 is done responding
and so keeping all of our data accesses
and mutations to one place is the way to
keep things as fast as possible we also
want to ensure that we don't have
something called hotspots where hotspots
are shards that have a disproportionate
amount of either the data or the number
of requests going to them so how can we
go ahead and do
that well the question I guess is how
should we actually Shard out our data if
we have a database and we've got a bunch
of key value pairs how do we want to go
ahead and partition them well one thing
we can do is just partition by the
range of keys and you can see this image
below if we have a bunch of different
names we can just partition them by key
range so this is really nice if we want
to be able to support range queries
still right because it means that
similar keys to one another are going to
be on the same Shard and we don't have
to make a big distributed query that
being said it also means that we're more
prone to hotspots especially for names
for example there are a lot more names
that start with a or maybe J than there
are that start with Z so ultimately you
have to be very careful here what else
could we do well we could do something
called partitioning by the hash of the
key so for a given key basically all
you have to do is take the hash of it
and then shard it based on that hash
so why do we actually go ahead and do
something like this well the idea is
that a hash function should in theory
evenly distribute uh a range of keys and
as a result um we should get better
balancing at the same time the downside
of this is that you know if you
want to query all of the names that
start with a those hashes are all going
to be completely different and now we
can't easily run a range query because
we're going to have to go ahead and
access a bunch of different database
shards so it's not super easy to do this
but nonetheless often times it is going
to be the correct approach if you don't
need these big range queries
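here's a minimal Python sketch of key-hash partitioning — NUM_SHARDS and shard_for are hypothetical names, and note the modulo here is only okay while the shard count never changes, which is the caveat discussed next:

```python
# Sketch of hash partitioning with a fixed shard count.
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # hash the key so that similar keys spread evenly across shards
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# adjacent keys land on unrelated shards, which is exactly why a range
# query like "all names starting with A" now has to hit every shard
a, z = shard_for("Aaron"), shard_for("Zoe")
```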
cool so one thing we don't want to do in
this particular situation is we don't
want to take the hash of a key and take
it modulo the number of shards and then
send it to the shard that it ends up at
the reason for this is that let's say I
add one more Shard to our cluster um now
the majority of our keys are going to
have to be shuffled to different shards
right because that's how modulo works if
you think of you know I have 12 keys and
right now all of them can be in one of
three nodes uh based on modulo 3 and
then I now make it modulo 4 a majority
of those keys are moving to a different
node and that's not going to be very
good so what do we do to mitigate this
there's a separate algorithm called
consistent hashing uh the main idea
behind consistent hashing is that you
basically organize uh kind of the hash
range so basically every possible hash
value of a key on some ring and then
when you find a key you take the hash of
it you walk clockwise around the ring
until you find uh a machine's location
on the ring or a shard's location on the
ring and that is basically going to tell
you what Shard a given key belongs to
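that clockwise walk is easy to sketch in Python with a sorted list — this is a bare-bones ring with no virtual nodes, and all of the names here are made up:

```python
# Bare-bones consistent hash ring; bisect does the "walk clockwise".
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # place every node at its hash position on the ring
        self.points = sorted((h(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        # walk clockwise from the key's hash to the next node,
        # wrapping around to the start of the ring if needed
        i = bisect.bisect(self.points, (h(key),))
        return self.points[i % len(self.points)][1]

    def add(self, node):
        # only keys between the new node and its predecessor move
        bisect.insort(self.points, (h(node), node))

ring = Ring(["m1", "m2", "m3"])
before = {k: ring.node_for(k) for k in map(str, range(100))}
ring.add("m4")
after = {k: ring.node_for(k) for k in map(str, range(100))}
moved = sum(before[k] != after[k] for k in before)
# only a fraction of keys move, versus most of them with hash mod N
```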
the reason we go through the trouble of
doing all of this is that it leads
to less rebalancing when we actually add
or remove a machine from the cluster
let's imagine that we added a machine at
that vertical blue line on the circle so
right at the 12:00 position basically
what that's really going to be doing is
just taking a subset of the keys that
would have gone to Machine 2 and now
they go to that vertical line new node
and so what this does is it just
minimizes the amount of nodes that have
to actually send their keys around again
this is an oversimplification of things
in reality there's a little bit more
work done to ensure that all of the nodes
do have a relatively even balance of the
keys but this is just going to allow us
to you know dynamically change our
cluster size without having to send Keys
all over the place all the
time okay another thing that I want to
touch upon is local secondary indexes so
we know what an index is Right an index
is basically just sorting data in a
database by a particular key so that
when we say hey find me all of the rows
that have this particular key that's
really fast simple enough we also know
what secondary indexes are it's the same
thing on a database but just on a
different field right you know if I have
a bunch of different um users in a table
maybe I want an index on user ID and
then maybe I also want an index on their
age because maybe I want to be able to
search for all people whose age are 24
something like that that being said this
gets a little bit more complicated when
you're doing charting so the idea here
is basically that um a local secondary
index is the same exact thing as just
you know adding an index per shard
so it means that rather than uh having
an entire view of all of the rows in the
database your local secondary index only
applies to the rows that are on that
particular Shard so this is nice in the
sense that it doesn't really slow down
writes very much right writes are still
going to go to uh one individual shard
but at the same time if you want to
actually read on that key uh let's say I
want to find all of the people maybe our
primary index is based on a user ID but
then I want a secondary index on uh the
actual age of every single user if I
want to find all of the users who
are 24 now I have to read from every
single shard and use their local
secondary index and so that's going to
be a lot slower for
reads what can we do in contrast well
there's something known as Global
secondary indexes the reason they're
called Global is because they basically
index the entire uh Global table and
then they themselves are partitioned
according to that key rather than the
primary key so uh obviously this is
going to make our reads a lot faster
because um you know now ideally we would
be able to have um everything go to just
one particular Shard when performing our
reads so we're not making a kind of
scatter gather query there but the cons
of this approach are that now every
single write has to go to multiple
machines because in all likelihood um
the machine that I write to for our
primary index is probably going to be a
different machine than the one that is
holding our Row for its Global secondary
indexes and so another uh challenge here
is that we actually have to keep these
two indexes in sync with one another so
what are our options option number one
is something called two-phase commit so
we haven't introduced this concept yet
but the idea here is basically well
two-phase commit is a way of making
writes go through on multiple different
machines at the same time and ensuring
that they all apply them it can't ever
work perfectly because at the end of
the day um you know the network is
completely asynchronous and there's no
way to guarantee that two different
writes absolutely go through but this is
pretty much the best we can do so
basically we've got some node called the
coordinator and all the coordinator is
going to do is it's going to reach out
to all of the nodes that it wants to
write to and in the first phase it's
going to say hey prepare to accept some
write once all the nodes say okay I'm
prepared to do so it's going to reach
back out and say okay you can now commit
the write so simple enough uh once that
prepare phase is completed basically all
of these nodes that are prepared to
accept the write can't really do
anything else until they accept the
write uh they've pretty much guaranteed
to the coordinator node that they will
never reject the write in this situation
and so uh that is known as the commit
point we basically can't go back from
there so that's one option but this is
expensive right it requires multiple
network round trips uh we have to
request a prepare we have to get our
prepare back we have to commit we have
to hear that we're done that is going to
be a lot of work what can we do instead
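for reference here's a toy Python sketch of that prepare-then-commit flow — the class names are made up, and a real coordinator also logs its decisions to disk so it can recover from crashes, which this leaves out:

```python
# Toy two-phase commit: prepare on every participant, and only commit
# once all of them have voted yes.

class Participant:
    def __init__(self):
        self.prepared = None
        self.committed = None

    def prepare(self, value):
        # promise never to reject this write once we've said yes
        self.prepared = value
        return True

    def commit(self):
        self.committed = self.prepared

def two_phase_commit(participants, value):
    # phase 1: everyone must vote yes before we pass the commit point
    if not all(p.prepare(value) for p in participants):
        return False
    # phase 2: past the commit point, everyone applies the write
    for p in participants:
        p.commit()
    return True

nodes = [Participant(), Participant()]
ok = two_phase_commit(nodes, "row-42")
```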
well if we're okay with having our
indexes not perfectly in sync but you
know they'll eventually be in sync with
one another we can update our secondary
indexes asynchronously and so one option
that we actually have here is we can use
eventual consistency uh by basically
saying okay first I'll update my primary
index and then whatever the node is that
received the primary uh write we'll go
ahead and forward that write to um
basically the shard that is uh going to
be updated for the secondary index so
for example let's say I write to Shard a
because um my user ID is number one and
then my age is 100 so that's a really
high age maybe that puts me on Shard 10
uh shard one is going to forward that
write over to shard 10 and shard 10 will
apply it there's no guarantee on when
that gets done we know it'll happen
eventually but the point is uh if I now
read from my secondary index it may be a
few seconds before I see that update
actually propagated cool uh another piece to
actually talk about here is SQL versus
NoSQL so uh again I don't really like
when people are like oh NoSQL is more
scalable but the reason that sometimes
we'll say it
is effectively that NoSQL basically
just drops a lot of features that SQL
offers in order to be more scalable uh
when it comes to big data and so really
what it means is it's not necessarily
that NoSQL databases are inherently
more scalable it's just that NoSQL
databases support more scalable use
cases because they don't really allow
you to do all of these cross-shard writes
these cross-shard reads they really
encourage you generally speaking to try
and keep all of your operations within
a single shard so that you don't have to do
something like a two-phase commit and
that's going to make your life a lot
easier for example on Cassandra you have
this concept of a partitioning key uh
and the idea there is just like hey you
know all of your writes have to be on a
shard with the same partitioning key and
that way you're never doing something
like a cross-shard two-phase commit
okay so I think we've made it out
of database hell let's go ahead and
start talking about some other stuff
number one is going to be batch
processing so basically batch processing
is just when we run a big batch job or a
massive computation that's distributed
over a massive data set and so typically
uh these are run offline which means
they just have some sort of trigger and
in addition uh it's not easy to write
these right or like if you were to write
one of these from scratch you would have
to account for the possibility that one
of your worker nodes dies you would have
to account for the possibility that you
know one of your worker nodes is doing a
lot more computation than the other and
also you would want to make sure that
you're avoiding any concurrency bugs
associated with doing distributed
computation that's where batch
processing Frameworks come in they
basically handle all of this for you uh
maybe you specify how you want to do
something like sharding but uh
ultimately they are going to either
checkpoint state or you know be able to
restart jobs as necessary in order to
ensure that they get done even in the
face of machine failures and uh yeah
they're very useful so if you hear about
things like map reduce or spark that's
what we're talking about uh there are
reasons why spark is better than map
reduce uh but nonetheless that's not the
focus of this video this is just very
high level cool the next thing that we're
going to talk about is stream processing
so stream processing messages are
basically um generated by something
known as a producer so a producer is any
sort of server they're eventually going
to be consumed by something known as a
consumer and then uh basically we use
some sort of additional server that is
responsible for routing messages from
producers to Consumers that is known as
a message broker and so what are
basically the use cases of stream
processing well let's say we have a
bunch of application events we don't
want to process them synchronously
because maybe they would take too long
to process and it would lead to request
timeouts or something like that uh that
would be a really good example um if we
have multiple different data streams
that we want to join together maybe we
would do so in our consumer by briefly
caching some of the events waiting for
the others and then joining them
together and then also time-based
aggregation as well I want to find all
the events that have come together
within the last hour and maybe I want to
take the average of some number uh or
some counter that they have this is a
really good way of doing
so cool so as far as stream processing
goes I mentioned that we have these
Brokers and there are basically two
types of them number one is what's known
as the inmemory message broker so the
idea here is that we have a queue uh we
keep it in memory and then we remove
those messages when a consumer
successfully reports that we've
processed them so basically one of the
unique things that in memory message
Brokers can do is round robin message
delivery so what this really means is
that that queue that single individual
queue can deliver messages to multiple
different consumers at once now the nice
thing about that is that it allows us to
get a lot of throughput uh because now
we have multiple consumers who can be
handling messages at a time some of the
cons are that we aren't necessarily
processing all of these messages in
order uh when a message is sent to a
consumer it's briefly removed from the
topic uh and you know kind of put in
this pending-acknowledgment area or
this staging area and then once it is
ultimately acknowledged it gets deleted
after some time out maybe it'll get
added back to the queue because maybe
the consumer failed or something like
that but the idea is we don't
necessarily process these messages in
order and number two and more
importantly is that if we are too slow
to consume these messages and too many
of them build up we may run out of
memory on the broker and then things are
going to
break cool the next piece of this is
going to be the log based message broker
so basically log based message broker is
something like Kafka if you ever heard
about that but the idea is that all we
do is instead of keeping all these
messages in memory and some sort of
queue we actually persist them all to
what is effectively a write ahead log on
disk and so that way if I'm a consumer
and I'm reading from this broker uh the
broker will actually keep track of the
last offset of the message that I've
last read and so that way within one
single partition of one of these Brokers
I'm just reading all of the messages
sequentially from start to finish and
and so a nice thing about this is that
we get replayability the reason is our
messages are not actually deleted upon
consumption so what's really nice about
that is you know maybe for whatever
reason we want to go back and check the
value of our messages we can actually
read through this broker and do so again
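here's a toy Python sketch of one partition of a log-based broker with per-consumer offsets and replay — all of the names here are made up for illustration:

```python
# Messages are appended to a log and never deleted on read, so each
# consumer just tracks an offset and can rewind it to replay.

class LogPartition:
    def __init__(self):
        self.log = []        # the append-only log on disk
        self.offsets = {}    # consumer id -> next offset to read

    def produce(self, message):
        self.log.append(message)

    def consume(self, consumer, max_messages=10):
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:start + max_messages]
        # advance this consumer's offset; the messages themselves stay
        self.offsets[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, offset=0):
        # replay from an older offset
        self.offsets[consumer] = offset

p = LogPartition()
for m in ["a", "b", "c"]:
    p.produce(m)
first = p.consume("c1")     # reads a, b, c and moves the offset
p.rewind("c1")
replayed = p.consume("c1")  # the same messages are still there
```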
so again we're not doing this kind of
round-robining here of one partition to
multiple consumers where each consumer
handles a subset of the messages within a
partition every consumer handles every
single message and so that is very much
worth noting cool as far as actual uh stream
processing consumers go this is starting
to be a lot more popular is to have
stateful consumer Frameworks so uh I
personally get on quite a bit in my
videos for using these however I think
they're very useful and I will explain
right now so again when you're consuming
from these message Brokers it is very
frequently the case that you need to
build up some sort of state right if I'm
doing time window aggregation it means I
need to cache all of the messages that
I've seen in the last hour or something
like that and so what would really
really stink is if your consumer happens
to fail and then all of those messages
get lost in theory depending on the type
of message broker you may be able to
replay those messages and build up your
state again and that's a really nice
ability to have but nonetheless that can
be very expensive if it's going to be
the case that uh you know you've missed
a ton of messages in the meantime and
now you're replaying a bunch of stuff and
you're behind so what these actual uh
consumers will do in addition to
supporting all of these aggregations
and joins and things like that is
that they will occasionally take their
state and they will checkpoint it right
so they'll take a state that they
currently have they'll take the
corresponding offset in a log based
message broker that they've read from
and they'll put it in some other form of
durable storage so maybe they'll say
okay at offset message uh number 20 I
see that currently I've got 10 messages
stored in my cache and you know maybe
here's what they are so that way if we
fail and we start back up we can read
from that snapshot get the exact state
that we had before and start again by
reading at the log based message broker
from that offset rather than having to
read completely from the beginning and
having to reestablish all of our state
so this is the type of thing that can
save you quite a bit of
time cool uh before we go ahead and get
into the actual patterns of um systems
design interview problems let's go ahead
and talk about some other types of
storage so these are just like a few I
guess this is kind of like a potpourri of
just random database systems that you'll
often have to use during your interviews
and uh yeah they're important to know if
you don't so number one is going to be a
search Index this is useful when you
want to perform something known as a
full text search it is not trivial to
search for text content so imagine you
know we have a bunch of tweets we want
to find all the tweets that say the word
Jordan is handsome so the way that you
do that is you create something known as
an inverted index and of course this has
to scale over many many different
computers but the idea is basically for
every single word in that tweet you go
ahead and extract them all and then you
take all of the Tweet IDs and say okay
well for word Jordan here are all of the
Tweet IDs that contain it and so that
way I can quickly see okay if I'm
searching for the term uh Jordan is
handsome let me find the entry for
Jordan is and handsome and find all the
Tweet IDs that are uh you know
containing all three of those pretty
simple way to do it number two is going
to be distributed file systems so this
is something like Hadoop but the idea
here is it's basically the file system
on your computer scaled over multiple
nodes I guess the nice thing about this
is that typically you can actually run
computations on these files because
ultimately the files themselves are
stored on the computers though you know
it does really ultimately depend on the
actual implementation of the file system
so if you are going to run something
like a batch processing job typically
you'll store all of that data in
something like a distributed file system
and then do so from
there next we have graph databases so I
guess the idea here is sometimes you do
need to be able to store a graph on disk
because you want it to be durable you
want it to be accessible for many
different servers and uh this is not a
trivial problem so one possible
implementation is called a non-native
graph database and this would be like
using a SQL database to do it where you
basically represent all of the edges of
the graph like a many-to-many
relationship so you have one table where
it's like graph node ID 1 graph node ID
2 and then you have many different rows
of that and so that way you can quickly
figure out for every single node in your
graph all of its edges the way that you
you make that read relatively fast is
you add an index on that many to many
table uh based on the Node ID the
problem with that is that as the graph
gets bigger uh there are going to be more
rows in that edges table and if there
are more rows in the edges table it
means that it's going to take us longer
to read from it because indexes
ultimately operate in logarithmic time
so even though it's not a full uh like
linear growth in terms of how slow it's
getting it is still getting slower as
the graph uh grows sometimes we don't
like that thus leading to Native graph
databases native graph databases are
basically just using pointers on disk
right so rather than uh you know trying
to get good data locality they've just
said screw it we're going to use
pointers on disk now maybe this is good
for graph database tasks But ultimately
they're still kind of slow because using
pointers on disk means you're doing a lot
of random reads and doing a lot of
random reads on disk is fairly slow so
maybe not uh the best solution but if
you have to have a graph database
perhaps this is the best way to do
it cool uh next we'll talk about the
object store I guess the the kind of
prime example of this one would be
something like Amazon S3 uh the gist is
this is kind of uh not going to allow
you to do compute uh locally on the data
that you're storing you basically just
dump everything into a bucket and then
you can go ahead and load it elsewhere
and do compute there but you do have to
read it first because of the fact that
this is storage that is kind of scaled
up without compute it's generally cheaper to
store your stuff in and so these are
becoming really popular storage
solutions these days and a lot of times for
batch jobs people will basically take
data out of something like S3 load it
into a distributed file system and then
run the batch job uh there because
perhaps it's
cheaper cool uh next we have time series
databases so these are what they sound
like uh really good for time series data
when your time series data is basically
split up by two axes one is like what is
producing the time series data and the
other is uh the actual time range of it
the reason it's fast is because one
we're using column oriented storage
which we discussed earlier uh a lot of
this data is super similar to one
another so we can compress it really
nicely and two is that you know we can
just be good about how we cache things
right like if I'm accessing a certain
time range uh I know that I'm probably
going to access the next time range
after that and so we can be smart with
that we know it's going to be for the
same sensor and also the majority of
inserts to this database are just going
to be sequential they're kept in the
same order that they're added and so uh
you don't have to do anything too cheeky
with writes you can basically use
something like a write-ahead log for all of them
okay another piece of this to discuss
are something known as geospatial
indexes basically these are really
useful for finding all of the points
within a certain distance of another
point so basically Geographic data is
represented by a lat long combo and
because these are two completely
distinct fields from one another there's
not really a good way to build a
database index that is going to allow me
to find all of the points within say a
circular radius from me indexes are
really good for finding all latitudes
within a certain range or all longitudes
within a certain range but if you look
at the diagram that I made on the bottom
left of the screen if I'm trying to find
all the points near the white dot I can
basically find really easily both of
those rectangular areas that are in the
black so you know the all the points in
a Latitude near me and all the points in
a longitude near me however uh that's
going to be a lot of points right
because the globe is big and this spans
a lot of area and then eventually you're
going to have to take all of those
individual points and check whether
they're actually close enough to the
white dot so that's too expensive
instead what we basically need to do is
figure out a way to convert our latitude
and longitude into a sort of one-dimensional
key so that that way um anyone with very
similar one-dimensional Keys is actually
close to one another on our map and this
is kind of the philosophy behind
something known as like a geohash or a
quad tree and uh you can see that I've
kind of annotated this structure on the
left here so on the left we've got this
big box with a bunch of smaller little
boxes in it and the way it starts is
okay our largest box has you know no
geohash whatsoever if you take one box
and then you go to the top left of it we
basically take the existing ID and we
add 0 0 to it if we go to the top right
we add 0 1 bottom left 1 0 uh bottom
right 1 one so if we look at Point C you
can see that the top right uh top
quarter box is 0 1 and then uh the actual
geohash of the box that point C is in is
0 1 0 0 because it is the top left box
of the top right box and so by creating
Keys like this what it means is that if
we actually go ahead and organize all of
the points on disk and sort their keys
lexicographically what it means is that
we can basically do a very quick binary
search to find one particular key value
and then all of the other points that
are nearest to it in the database are
going to have very similar keys and so
we basically take advantage of range
queries here to go ahead and figure out
all of the close points to another
point finally uh I believe finally we've
got coordination services so basically
the idea here is that occasionally it's
not acceptable to lose our rights uh
under any circumstances right and this
is really going to be true for our
application metadata who's the leader
database what partitions live where do I
have a lock or is this other server
holding some sort of lock uh even if the
database holding that information goes
down we don't want to lose that that
would be very bad for our application
so the way that we
basically go ahead and ensure that this
doesn't happen is we use something
called a coordination service and the
reason this is good for a task like this
is that they use an algorithm known as
distributed consensus so basically the
idea here is that despite these
algorithms allowing us to you know make
sure we never lose data even in the face
of Hardware faults um they're fairly
slow to write to and read from and as a
result we really only want to be using
them for important application
state so just to demonstrate this
basically you've got a client that
proposes writes to a leader the leader
then proposes writes to a bunch of
followers and the leader is doing so via
two-phase commits which we discussed
earlier the idea here is that a write is
considered successful if a majority of
the replicas in the cluster are actually
saying okay this worked during the
two-phase commit so when the leader is
sending writes to the followers if two
out of those four followers say yes I'm
able to apply this it can go ahead and
continue what this allows it to do is it
basically means that if the leader is to
go down one of the followers uh that
actually has all of the writes can go
ahead and take over as the next leader
and it of course is going to need yet
again a majority of approvals to do so
and it's kind of like the Quorum logic
from earlier right if you're always
making sure that you're writing to and
reading from a majority you can always
be sure that your data is up to date and
by using the two-phase commit we avoid
any sort of race conditions associated
with that
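the majority rule itself is a one-liner — here's a sketch in Python where the leader counts itself as one acknowledgment, with illustrative names:

```python
# A write commits only when more than half of the cluster has
# acknowledged it, so any two majorities overlap on at least one node
# and a committed write can never be lost.

def committed(acks: int, cluster_size: int) -> bool:
    # acks includes the leader's own acknowledgment
    return acks > cluster_size // 2

# 5-node cluster: leader plus 2 of the 4 followers is a majority of 3
assert committed(3, 5)
assert not committed(2, 5)
```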
okay ah totally forgot about this caching
section let me go ahead and bang that
out real quick before I take a break
because I'm starting to get sweaty here
basically the idea is that reading from
disk is slow disk is bad let's use memory
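as a concrete example here's a minimal in-memory cache with least recently used eviction using Python's OrderedDict — the class shape and names are just illustrative:

```python
# Minimal LRU cache sketch: the OrderedDict keeps keys in usage order,
# so the least recently used entry is always at the front.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touch "a" so "b" becomes the least recent
cache.put("c", 3)   # evicts "b"
```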
uh caches get filled up really easily so
how do we actually evict data when
they're full typically we'll use
something like a least recently used
policy so lru eviction and then
similarly to databases caches can and
should be replicated and they should be
sharded so there are three different
types of caching that we'll cover in
this before we quickly cover cdns uh and
let's go ahead and go through them
number one is known as the write-through
cache if I'm an application basically
all I do is write to my cache and my
database at the same time depending on
your implementation of this you may use
a two-phase commit which means writes
are going to be pretty darn slow at the
same time it means that your cache and
your database are going to be kept up to
date which is very
nice number two is the write around
cache in the write around cache
basically all you're going to do is uh
you write first to the database and then
all reads you're going to attempt to
make from the cache and if the data that
you want to read from the cache is not
in there the cache is going to go ahead
and pull it from the database this is
nice in the sense that it's a faster
write uh the database is still the
source of Truth which is good because
the database should be your source of
Truth and then
finally uh the kind of con of this
approach is that cache misses are
expensive right when you try to read
from a cache and it doesn't have the data
now you still have to go to the database
and you basically did two reads for one
so that's uh
inconvenient then finally we are going
to go ahead and have the write-back cache
so in the write-back cache basically an
application is going to make multiple
writes directly to the cache and at some
point the cache is asynchronously going
to sync up with the database so this is
really nice because you're just
basically only writing to memory that's
super fast at the same time if I'm a
different application and I'm reading
from the database I'm not going to see
the writes that are in the cache that's
bad okay a final piece of kind of
caching information we'll talk about is
content delivery networks these are
basically geographically distributed
caches for static content like pictures
or videos or anything like that if you
have stuff in S3 that you want to store
or basically serve to multiple users
around the world sometimes it makes
sense to do so from a
CDN okay load balancing I think I'm
getting there but I actually just can't
even remember my own table of contents
which is why I'm still going off this
basically in any large application we
have to be able to do something called
horizontally scaling which is when you
take one server and you make a bunch of
them because ultimately that one server
is going to get too many requests so you
know if I have like a sign-in service
maybe I need 100 signin services and a
load balancer is the piece of technology
that's going to take me a user and tell
me which sign on service I'm actually
using so basically they can operate on
multiple different policies one example
would be round robin we basically just
randomly choose uh or you know you go on
a loop and you choose the next one that
didn't previously serve the request
another would be using consistent
hashing like we spoke about before
sometimes it's really really useful when
the same request or similar request are
being routed to the same exact server
because maybe that server has some
cached information that's going to make
that request a lot faster uh we can do
this hashing on things like your IP
address uh so any like networking
related information known as layer 4
load balancing or application uh
specific information where you'd
actually have to read the packet itself
uh and that would be called layer 7even load
load
balancing cool as far as fault tolerance
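Here's a rough sketch of those two policies. The modulo-hash routing below is a stand-in for real consistent hashing (which also handles servers joining and leaving gracefully); the MD5 choice is arbitrary:

```python
import hashlib

class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0

    def round_robin(self):
        # Loop through the servers, picking the one after the previous pick.
        server = self.servers[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.servers)
        return server

    def hash_route(self, client_ip):
        # Route the same client to the same server (sticky, cache-friendly).
        # Real consistent hashing also survives server churn without
        # reshuffling most keys; this simple modulo version does not.
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]
```

The payoff of `hash_route` is exactly the caching point above: repeat requests from one client keep landing where their cached state lives.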
Cool. As far as fault tolerance goes for load balancers, because they too can become single points of failure and go down, there are a couple of ways that you can run them. One is the active-active configuration, where you're running multiple different load balancers at once. We'd probably use some coordination service, so that clients can read from the coordination service, figure out which load balancers they have access to, and then randomly route a request to one of them. Number two is the active-passive configuration, where you're doing something similar: every load balancer is registered with a coordination service, but only one of them is actively routing requests. If that one goes down, the coordination service is going to notice via some sort of heartbeating with it, and then the passive one is going to say, okay, I'm the active one now, all requests can go to me.
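A toy model of that active-passive setup might look like this; timestamps are passed in explicitly instead of using a real clock, and the whole `Coordinator` class is a made-up stand-in for something like ZooKeeper:

```python
class Coordinator:
    """Toy coordination service for active-passive load balancers."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}   # balancer name -> last heartbeat time
        self.active = None

    def register(self, name, now):
        self.last_heartbeat[name] = now
        if self.active is None:
            self.active = name     # first to register becomes active

    def heartbeat(self, name, now):
        self.last_heartbeat[name] = now

    def route_target(self, now):
        # If the active balancer missed its heartbeat window, promote
        # the first passive one that is still heartbeating.
        if now - self.last_heartbeat[self.active] > self.timeout:
            for name, ts in self.last_heartbeat.items():
                if name != self.active and now - ts <= self.timeout:
                    self.active = name
                    break
        return self.active
```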
All right, my throat was starting to hurt and I was starting to get a little bit sweaty, but let's go ahead and finish this thing off. Here we go: systems design interview patterns. Like I mentioned, a lot of these problems can really just be broken down into common themes, so let's go ahead and break them down.
Number one, which I see a lot, is contending updates. This is basically when you have a bunch of writes to the same key, so many that it's not simple to just do them on one node, because you're going to have too much lock contention. Examples here would be building out a distributed counter, or maybe dealing with order inventory on something like Amazon for a popular product. The naive solution is that you just do everything in the same database and deal with the contention, which might not be acceptable. What would be better is perhaps using multiple database leaders, and then using something like that automatic merging logic that we discussed before to eventually make them consistent; this works really nicely for something like a distributed counter or a set. Or you could use stream processing: you put everything in something like a Kafka queue, you handle it downstream with a consumer, maybe handling messages in mini batches if we want to use Spark Streaming instead, and then you make writes to the database such that you're applying multiple updates per write. Fewer writes means less contention.
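The multi-leader idea for a distributed counter is often sketched as a sharded counter: split the hot key into several sub-counters so concurrent writers rarely collide, and sum them on read. A minimal single-process version, just to show the shape:

```python
import random

class ShardedCounter:
    """Spread increments for one hot key across N shards to cut contention."""

    def __init__(self, num_shards):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Each writer picks a random shard, so concurrent writers
        # usually touch different slots instead of one contended row.
        self.shards[random.randrange(len(self.shards))] += amount

    def value(self):
        # Reads pay a small extra cost: sum every shard.
        return sum(self.shards)
```

In a real system each shard would be a row or a replica on a different node, and the sum would be the eventual merge step.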
Next, we have derived data. Let's say we have two data sets that we want to keep in sync with one another. We discussed a couple of examples of this already, one being the global secondary index. Or, if we have another data set which is being updated by one particularly slow database, but we want to make a faster read-only view on the data, this would also be another good use case. For something here, what we could do naively is a two-phase commit, and this may be necessary if we need to keep them perfectly in sync. However, what might be better is something like change data capture. Change data capture is pretty simple: it's something that works with a lot of different databases, and what it does is take all of the updates to a particular database, pipe them into a log-based message broker, and then a consumer of that broker can do any transformations that are necessary and pipe all of the changes from there into another database. So this is going to allow us to be eventually consistent. I'm sorry, as you guys can see, my throat is dying. So we've got some sort of source database, it has all of its transaction logs, it puts those into our change data capture framework, which is really just stream processing, and then they get synced into our target database.
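That CDC pipeline can be sketched as a tiny consumer loop; the event shape below is invented for the example and doesn't match any particular CDC tool's wire format:

```python
def apply_change_events(events, target_db, transform=None):
    """Consume change-data-capture events in log order and sync a target.

    Each event is a dict like {"op": "put"/"delete", "key": ..., "value": ...};
    this shape is made up for the sketch.
    """
    for event in events:
        if event["op"] == "put":
            value = event["value"]
            if transform:
                value = transform(value)   # optional enrichment step
            target_db[event["key"]] = value
        elif event["op"] == "delete":
            target_db.pop(event["key"], None)
    return target_db
```

Because the events arrive in log order, replaying them always converges the target to the source's state, which is what makes the two databases eventually consistent.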
Next, we've got fanout. Fanout is one that you should know really, really well, because it comes up in probably 50% of these problems. The idea here is that, rather than do a really expensive read query, because we have a lot of read-optimized applications in real life, what we want to do is more work on the write path: basically take the data that we're about to write and deliver it directly to all of the users who are interested. Examples of this include push notifications, a social media news feed, something like figuring out who our mutual friends are every single time that we make a new friend connection, and also stock price delivery, where we've got many interested parties when the price of, say, a Google stock goes up or down by a dollar.

So one option is we can deliver all this data to all interested users synchronously, by doing it in some sort of API: maybe we hit an endpoint, we go hit 100 other servers to deliver the message there, and then we report back to the user. But the majority of the time this is going to time out, because there are too many places to reach out to, so we probably have to do it asynchronously in the background. What a surprise, we're using stream processing again. You guys rag on me for using Flink all the time, except in literally the majority of these problems it makes the most sense to use stream processing, and when you're using stream processing you're often better off using a stream processing framework for better fault tolerance. Anyways, the idea here is you want to put your message in a log-based message broker, you have some sort of consumer that receives it, it figures out where the message needs to be sent, and then it does so accordingly. What does that look like? Well, basically, like I said, a user posts something, we've got some sort of broker that has all the tweets, and right after that broker someone is consuming all those messages and delivering them to some sort of cache for every interested user.

Now, one caveat that we'll often have with this problem is the popular message problem. So let's imagine I'm Jordan has no life, very famous YouTuber, and I've got many, many Twitter followers; I think I have maybe 50 at most, but okay, let's imagine I'm Ronaldo and I have a million Twitter followers. The idea is that you can't really deliver a post to a million Twitter caches, so for messages like that, the stream consumer is going to notice how many places it has to deliver to and is instead going to opt to just deliver the message to some sort of popular message cache. This way we ultimately build out a hybrid approach: all of the messages that are not super popular get sent to all of the individual users that are interested in them, and for popular ones, those same individual users can occasionally pull from the popular cache and generate the necessary state there.
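That hybrid fanout can be sketched like this; the popularity threshold and all of the data structures are made up for the example:

```python
POPULAR_THRESHOLD = 3  # made-up cutoff for the sketch

def fan_out_post(author, post, followers, feed_caches, popular_cache):
    """Deliver a post to every follower's feed cache (fanout on write),
    or, for popular authors, to a shared popular-message cache that
    followers pull from on read."""
    audience = followers[author]
    if len(audience) <= POPULAR_THRESHOLD:
        for user in audience:
            feed_caches.setdefault(user, []).append(post)
    else:
        popular_cache.setdefault(author, []).append(post)

def read_feed(user, follows, feed_caches, popular_cache):
    # Hybrid read: precomputed feed entries, plus a pull from the
    # popular cache for any big accounts this user follows.
    feed = list(feed_caches.get(user, []))
    for author in follows[user]:
        feed.extend(popular_cache.get(author, []))
    return feed
```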
there cool another pattern that we see come up all the time is proximity search
come up all the time is proximity search basically this is find all of the close
basically this is find all of the close points in a database to you we already
points in a database to you we already discussed this right you want to be
discussed this right you want to be using a geospatial index um examples
using a geospatial index um examples where this comes up is Uber Yelp Airbnb
where this comes up is Uber Yelp Airbnb find my friends I've seen literally
find my friends I've seen literally every variation of this problem on
every variation of this problem on YouTube because people just want to
YouTube because people just want to steal views the idea is that you can
steal views the idea is that you can build indexes on latitude or longitude
build indexes on latitude or longitude but like we already mentioned that's a
but like we already mentioned that's a problem that's not going to work so what
problem that's not going to work so what you use instead is a geospatial index
you use instead is a geospatial index now of course there's a ton of data in
now of course there's a ton of data in these database tables and you need to do
these database tables and you need to do some sort of sharding how do we Shard
some sort of sharding how do we Shard well typically you want to Shard kind of
well typically you want to Shard kind of based on that geospatial bounding box
based on that geospatial bounding box right that being said not all geospatial
right that being said not all geospatial bounding boxes are equal there's a lot
bounding boxes are equal there's a lot more data points in some place like New
more data points in some place like New York than there might be
York than there might be somewhere like Wyoming and so you have
somewhere like Wyoming and so you have to account for that uh ultimately what
to account for that uh ultimately what you'll do is you'll probably make a
you'll do is you'll probably make a bunch of very very small partitions and
bunch of very very small partitions and then potentially dynamically move those
then potentially dynamically move those partitions to equal out the load over
partitions to equal out the load over time as you get a better sense of where
time as you get a better sense of where the load is actually going to and coming
the load is actually going to and coming from next we have job scheduling uh this
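One simple way to picture geospatial bucketing is a fixed grid: map each point to a cell, and search that cell plus its neighbors. Real systems use geohashes or quadtrees that subdivide dense areas (the New York vs. Wyoming problem), but the fixed grid shows the core idea:

```python
def grid_cell(lat, lon, cell_degrees=1.0):
    """Map a point to a coarse grid cell; points in the same cell can
    live on the same partition. Real systems use geohashes or quadtrees,
    which refine dense cells; this fixed grid is just for illustration."""
    return (int(lat // cell_degrees), int(lon // cell_degrees))

def nearby_candidates(lat, lon, points, cell_degrees=1.0):
    # Check the point's own cell plus its 8 neighbors, so matches near
    # a cell border aren't missed.
    home = grid_cell(lat, lon, cell_degrees)
    cells = {(home[0] + dx, home[1] + dy)
             for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
    return [p for p in points
            if grid_cell(p[0], p[1], cell_degrees) in cells]
```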
Next, we have job scheduling. This one comes up a lot in the literal build-a-job-scheduler problem, and it also comes up in problems like Netflix and YouTube, because when you upload a video you have to enqueue a bunch of jobs to encode it in many different resolutions, and you probably want to be doing all of those offline rather than synchronously; in that way it's kind of also a fanout problem, but same idea. Basically, the gist here is we've got a bunch of agnostic tasks that we need to get run; we don't care where they run, it just needs to happen.

So our two options are as follows. We have two types of brokers, and we know we want to do this asynchronously. If we use a log-based message broker, what we would probably end up doing is making a bunch of partitions of it, and then you would have a consumer reading from each partition, i.e. one of the workers in our cluster that runs the jobs. The problem with that is that messages in a log-based message broker are read in order, so if there's one particularly slow job in a partition of the message broker, we are not going to be able to run all of the jobs behind it; they're just going to get completely bottlenecked, and it could be the case that the other workers have nothing to do in their particular partition of the log-based message broker. So a better approach is to actually use an in-memory message broker here, and use that to basically round-robin all of the jobs from worker to worker; a worker can then pull a job off the queue the moment that it becomes available.
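The worker-pull model with an in-memory broker can be sketched as a plain FIFO queue that any idle worker pulls from, which is what avoids the head-of-line blocking described above:

```python
from collections import deque

class JobQueue:
    """Toy in-memory broker: any idle worker pulls the next job, so one
    slow job never blocks the jobs behind it the way a slow message in a
    per-worker log partition would."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

    def pull(self):
        # A worker takes the oldest waiting job the moment it's free.
        return self.queue.popleft() if self.queue else None
```

The trade-off, compared to a log-based broker, is weaker durability: if this process dies, the queued jobs go with it.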
One more pattern to discuss here is aggregation. This is basically: okay, we've got a bunch of messages being sent out by millions of different servers, and they're probably going to be aggregated on some sort of key, or maybe joined with other messages. How do we want to do this? Well, example situations are metrics or logs: if I'm publishing a bunch of metrics, but the metrics for a particular key can come from many different servers, maybe I want to aggregate them and make sure they're in the same file; same thing for logs. If we have multiple components of a particular job that are all being run, and once they're done they need to be aggregated, this is another example. Data enrichment, where I'm publishing a bunch of messages and they need to be joined with a bunch of other published messages, is an example too.

One option is you just put everything in a database, corresponding to the server it's coming from, and then you run a batch job and do all the joins you need. That does work; wow, I'm really losing my voice; that does work, however, it's pretty naive in the sense that it's going to take you a lot longer to actually see the results that you want, and you're going to do a lot of work in order to get that data. Number two is you can do stream processing. If you're smart about how you actually partition this data, and you partition your log-based message brokers on the actual key that you want to do the aggregation on, then you can send those messages to the proper broker, they can be consumed by the proper stream processing instance, perhaps Flink, perhaps Spark Streaming, anything like that, and then from there you can export them to wherever they need to go.
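The keyed-aggregation step that a stream processor does per partition can be sketched as a simple per-key accumulator; since the broker is partitioned on the key, one consumer owns each key and no cross-server merge is needed afterwards:

```python
from collections import defaultdict

def aggregate_metrics(messages):
    """Aggregate a stream of (key, value) metric messages per key, the
    way a stream-processing instance reading one partition would."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for key, value in messages:
        bucket = totals[key]
        bucket["count"] += 1
        bucket["sum"] += value
    return dict(totals)
```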
Okay, another pattern to talk about is idempotence. Basically, you have some task, you're running it, and you don't want to see the output of it more than once. Some good examples of this would be: if we're running a job in a job scheduler, how do we ensure that the output of that job is only reflected once in our system? Number two: let's imagine I'm Amazon and I'm sending out a bunch of confirmation emails to users when they make an order; how do I make sure that doesn't happen multiple times? Systems can fail, they come back up, they may retry their jobs, and we don't want to see the result more than once. Sometimes in this situation you have to do a two-phase commit; like, if I'm not the person who controls the email service, and I'm not controlling a user's inbox, it's very hard for me to actually stop myself from sending a second email. That being said, one way that you can potentially do this is using something known as an idempotency key. The computer that is going to run this job, and may accidentally run it again, can use a unique ID associated with the job and check whether it has seen that ID before. That might mean putting the key in a database and reading from the database, or, depending on where you're going to run the job, if you know that you're always running it on a certain system, maybe you can cache it locally.

Another thing to note here: in addition to idempotency keys, you also have this concept called fencing tokens. Fencing tokens are similar to idempotency keys, but they're actually on the receiving side of the message. So rather than being on the side of the computer doing the job, it's the side that is receiving the output of the job, and if it sees that same unique key associated with the job, it can reject it and say, hey, I've already seen this before, I don't want to consider this output.
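A minimal idempotency-key check on the sender side might look like this, with a plain set standing in for the database of seen keys:

```python
def send_email_once(order_id, send_fn, seen_keys):
    """Use the order id as an idempotency key: skip the send if this key
    has already been processed. `seen_keys` stands in for the database
    (or local cache) where processed keys are recorded."""
    if order_id in seen_keys:
        return False          # duplicate: a retry already handled this
    send_fn(order_id)
    seen_keys.add(order_id)   # record only after the send succeeds
    return True
```

Note this isn't bulletproof: if the process crashes between the send and recording the key, a retry still duplicates the email, which is part of why the receiving side may also check a fencing token.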
Okay, I think the last pattern that we've got here to discuss is going to be durable data. Again, we've already kind of discussed this with coordination services: you have a bunch of data that can't be lost once it's written, and a good example of this would be something like financial transactions. One option is you have a bunch of database replicas and you replicate to all of them synchronously. That'll work, but it means that if any of them go down you can't submit any writes; your system is not fault tolerant. So really our only solution here is to use, yet again, some sort of distributed consensus algorithm. Consensus is going to allow us to build a distributed log, and as a result we can basically build a database view on top of that distributed log. Even better than that, however, would be coupling this with something like change data capture, because the log itself is really nice in the sense that it ensures all of our data is persisted even in the face of hardware failures; however, what's nice about change data capture is that for something like a financial transaction, where maybe you're writing all of your data into a ledger, it would be a lot better if an individual user can really quickly find all of their own transactions. And so, rather than just reading through the entire ledger and finding all the transactions that apply for a given user, what we can do is make a user-specific index, and we can do so with change data capture; we already discussed how that works.
Okay guys, that's all of it. My voice is gone, I feel like, but yeah, I hope you enjoyed the video. I understand that this is really high-level and kind of handwavy for a lot of the topics that we discussed here, and my goal is not to go super in-depth on these topics; that's why I have literally 200-plus videos on this channel, to go ahead and do all of that. So I do hope that if this kind of interested you, or you were like, I didn't really understand one particular topic Jordan was talking about, that you'll actually go and find the videos on my channel that relate to it, because all of these are explained in a lot more depth elsewhere on here. Besides that, again, thank you guys so much for 50,000 subscribers. This channel has supplied my life with a purpose that I didn't really know I could have, and I've derived a much bigger sense of meaning than I thought I could through just learning a lot about computer science. Not only do I feel like I'm getting to learn a ton, and that's really rewarding, but hearing you guys' success stories has made all of this more than worth it for me. And yeah, I'm just shocked and appalled to be here; I'm truly humbled by all of you. That's it, this is amazing, this is great, I'm going to keep doing it, I'm having a lot of fun with this channel and I don't plan on stopping. Have a great day, guys.