This video serves as a comprehensive, albeit high-level, introduction to systems design concepts, aimed at individuals new to the field or preparing for system design interviews. It breaks down complex topics into digestible sections, covering fundamental database concepts, various storage solutions, distributed system patterns, and common interview problem archetypes.
Hey everyone, and welcome back to the channel. So today, albeit the fact that I don't yet have 50,000 subscribers, consider this sort of a de facto 50,000-subscribers special. The gist here is that I want to make an introductory video for those who really aren't sure where to start when it comes to systems design. On this channel in the past I've been especially critical of people who post too much redundant stuff, and I guess I am kind of doing that, but I understand that there are a lot of people who come to YouTube, or to the internet in general, looking for a starting point when they first hear about these interviews and start preparing for them. I was in that position a few years ago, and I think this would have been really useful for me to have. There are a few of these guides out there on GitHub, but some of them miss essential topics. Also, towards the end of this video, I'm going to try to break down the common patterns across systems design interview problems, so that you can recognize them the same way you would with something like a LeetCode problem, and then apply the patterns you've seen in the past to come up with an ideal solution. So again, thank you all, I really appreciate it. I don't know how long this video is going to take, but hopefully not too long, because I want to be able to clickbait it and say that it's only going to be an hour. So without further ado, let's go ahead and get started. All righty, so you
may think to yourself, "Oh Jordan, where's the iPad today? You've got to be able to draw out examples." Well, frankly, there are 80 slides in the slideshow, and if I were to draw all of those out, this video would never get made. So I'm going to pay homage to my original style of making videos via the old consultant-style PowerPoint, and if you hear my keyboard clicking away, it's not because I'm looking up anime girls on the other monitors; it is simply because I am switching the slides. Also, I'm sorry, I'm probably not going to look at the camera for most of this, because I'm going to be doing a lot of reading off the slides, because I simply can't remember everything that I've talked about. Let's go ahead and get started. Cool, so the first
thing we're going to quickly go through is actually giving a systems design interview. It's your first time, you're up there: what are you going to say? Pretty simple: figure out the requirements of the problem. Now, the requirements are obviously going to be very broad. If I tell you to build Facebook, you're not going to build every feature; you're going to build out a subset of the features. Typically they'll want you to focus on one particular feature, because it is challenging to build and challenging to talk about. So for something like Facebook, they might say "build me Facebook News Feed." They're not going to say "build me Facebook, build me Facebook Messenger, build me Instagram Reels, build me Instagram Threads," on and on. Identify with your interviewer which parts of the problem are important, and go from there.

Next, they may want you to do some back-of-the-envelope calculations. These are the type of thing where you don't always necessarily have to provide them, and if you do, they can be pretty crude, but you're ultimately trying to get an estimate of, say, the number of reads relative to the number of writes, which path you want to optimize for (read-optimized or write-optimized), and, if there's a ton of storage, whether you're ultimately going to need to do things like partitioning or replication. These are all terms that we're going to discuss later in this video.

Next is going to be API design. If you're talking about something like a Meta product interview, it might be a little more important here, but for the most part, just give the general API of all the endpoints that you might have in your system. And then finally, we'll jump into the architectural design.
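To make the back-of-the-envelope step concrete, here's a toy estimate in Python. Every number here is invented purely for illustration; in an interview you'd plug in whatever assumptions you and the interviewer agree on.

```python
# Toy back-of-the-envelope estimate for a feed-like product.
# All inputs below are made-up assumptions, not real figures.
DAU = 100_000_000          # assumed daily active users
reads_per_user = 50        # assumed feed reads per user per day
writes_per_user = 5        # assumed posts/likes per user per day
SECONDS_PER_DAY = 86_400

read_qps = DAU * reads_per_user / SECONDS_PER_DAY
write_qps = DAU * writes_per_user / SECONDS_PER_DAY
read_write_ratio = read_qps / write_qps     # 10:1 suggests a read-optimized path

post_size_bytes = 1_000    # assumed average write payload
daily_storage_gb = DAU * writes_per_user * post_size_bytes / 1e9
```

A 10:1 read/write ratio and hundreds of gigabytes per day would be the kind of crude signal that tells you to optimize the read path and plan for partitioning.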
Cool, so let's quickly look at the table of contents for this video. I'm going to put timestamps for all of these: a lot of basic concepts first, and then, if you're already familiar with those, a bunch of interview patterns. The reason people are always asking me to make more systems design interview problem videos, and I'm like "no, I don't want to," is that I think they're mostly redundant. Probably 90% of the suggestions I get fall into one of the seven or eight buckets that I cover, and I think just being able to identify which bucket a problem is in is probably 90% of the challenge. So we're going to talk about those later in the video, but for now, let's get started with some concepts. Cool. The
first and probably most important thing we have to know about for any system design problem is the nuances of how a database works. Databases are just durable systems that let us store application state. They do so by storing data on a hard drive, and because a hard drive has a mechanical arm seeking across a spinning disk, we always want to keep the related data that we're reading and writing together, as close to one another as possible. What this really looks like: say we want to read all keys from A through C. If all of our keys are sorted alphabetically, that's going to be a much faster read than if our data were just scattered randomly on disk. That's the premise behind something like a database index.

Now, if we don't have an index, the way we'd really optimize writes is just by putting them in a write-ahead log. A write-ahead log is just a sequential log of data on disk. The reason we'd do this is because, like I mentioned, sequential operations on a hard drive are always faster than random ones, so if we just want to optimize for writes, and there's nothing we can really do to optimize for reads, let's just make every write sequential. What does that actually look like? Say x = 5 is the first entry in our write-ahead log. (I can see that my camera is tweaking out right now and I didn't realize that, but because I'm one-take gang, I'm not re-recording this.) Okay, x = 5 over here. Say I want to make the next write: y = 10. Then all of a sudden, z = 22. So we're writing sequentially through this log. Then finally, say I want to rewrite the key x: x just gets written again at the end, because why would we go all the way back to the beginning of our write-ahead log? That wouldn't be a sequential write; we'll just append it. So then you say, okay, how do we do reads? We just read from the bottom of the write-ahead log and find the latest value for the key. The problem is that this is a sequential scan: if there are n entries in the log, this is an O(n) read. So what do we want to do to improve that? We want to add something called indexes to our database, where an index basically allows us to read very quickly when we're searching for the value of a particular key.
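The append-only log described above can be sketched in a few lines. This is a toy in-memory version, with a Python list standing in for the sequential file on disk:

```python
# Minimal append-only write-ahead log: writes always append (sequential I/O),
# reads scan newest-to-oldest to find the latest value for a key -- O(n).
class WriteAheadLog:
    def __init__(self):
        self.entries = []               # stands in for a sequential file on disk

    def write(self, key, value):
        self.entries.append((key, value))   # never seek back, just append

    def read(self, key):
        # O(n) scan from the end: the last entry for a key is its latest value
        for k, v in reversed(self.entries):
            if k == key:
                return v
        return None

log = WriteAheadLog()
log.write("x", 5)
log.write("y", 10)
log.write("z", 22)
log.write("x", 7)       # rewriting x just appends a newer entry
```

Here `log.read("x")` returns 7, the latest append, which is exactly why the read path needs an index to avoid the scan.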
Quick fundamentals: there are a few different types of indexes, so let's discuss them. The first one is hash indexes. With hash indexes, we keep all of the keys in memory, and we probably keep their corresponding addresses on disk in memory with them, so that we don't take up too much space in memory; the values themselves can then be wherever on disk. As a result of using a hashmap, if you're familiar with your data structures, you're getting O(1) reads and O(1) writes, which is great. At the same time, your key set needs to fit in memory, and we have poor support for range queries, the reason being that the rows for keys x, y, and z are not necessarily going to be ordered on disk, so our data might not be contiguous.
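A minimal sketch of that hash-index idea: the in-memory map stores only each key's byte offset, and the values live in an append-only file (here a `BytesIO` standing in for disk). The record format is just an assumption for illustration.

```python
import io
import json

data_file = io.BytesIO()        # stands in for the on-disk data file
index = {}                      # in-memory hashmap: key -> byte offset of latest value

def put(key, value):
    data_file.seek(0, 2)                    # always append at the end of the file
    offset = data_file.tell()
    data_file.write((json.dumps({key: value}) + "\n").encode())
    index[key] = offset                     # O(1): update the in-memory map

def get(key):
    if key not in index:
        return None
    data_file.seek(index[key])              # one seek straight to the value: O(1)
    return json.loads(data_file.readline())[key]

put("x", 1)
put("y", 2)
put("x", 3)     # later write wins: the index now points at the newer offset
```

Note that the offsets for "x" and "y" have no relationship to key order, which is exactly why range queries over a hash index are poor.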
The next type of index we have is the B+ tree. Basically, we organize a tree of all of our keys on disk, and we use a bunch of pointers to figure out where a key lies. As you can see in the diagram on my left, if I'm searching for keys between 1 and 11, I take the pointer that sits between the 1 and 11 keys, and then, if I'm looking for all the keys between 1 and 5, I go left again to the disk page that holds the data for keys 1, 2, 3, and 4. So we have a bunch of pointers on disk and we organize our data in a tree. What's really nice about this approach is that the data for related keys, say 1, 2, 3, and 4, sits together: when you sort the keys, they're next to one another, and their data is also next to one another. That enables something called range queries, which is a very common use of databases. Maybe I want all the values for keys 1 through 4: instead of having to read a bunch of different places on disk, all of those values are next to one another, and it should be a fairly fast read because of the tree structure. Again, think back to your data structures: since all of our reads and all of our writes go through that tree, we're getting logarithmic time complexities. Cool.
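This isn't a real B+ tree, but the payoff it gives you, sorted keys plus a fast range scan, can be sketched with a sorted list and binary search:

```python
import bisect

# Sorted keys stand in for B+ tree leaf pages: logically adjacent keys
# sit next to each other, so a range scan is one contiguous sweep.
keys = [1, 2, 3, 4, 7, 9, 11]
values = {k: f"row-{k}" for k in keys}      # hypothetical row payloads

def range_query(lo, hi):
    # O(log n) binary search to find the start, then a sequential walk
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return [values[k] for k in keys[start:end]]
```

`range_query(1, 4)` touches one contiguous slice instead of four scattered disk locations, which is the whole point of keeping sorted data together.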
The third and last type of index we'll cover is LSM trees with SSTables. For something like this, all of our writes first go to an in-memory binary search tree, and then, once that tree gets a little too big in memory, we take it and flush it out to disk as a sorted file. This means we can have multiple different sorted files containing our key-value pairs, and we can still have a tree with data in it. Writes are fairly fast because they first go into this in-memory LSM tree. When we actually want to read data, we first search the LSM tree, and if the key-value pair isn't in there, we check every single SSTable from newest to oldest. In the background, you can also optimize this process a lot by merging together a bunch of the SSTable files, in a process known as compaction, and there are a bunch of other data structures you can add, like sparse indexes or Bloom filters, to make this faster, but as an overview, I don't think it's too important to cover that right now. Cool.
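The write and read paths just described can be sketched like this (the memtable size limit and the use of plain dicts for SSTables are simplifications for illustration):

```python
# Toy LSM tree: writes go to an in-memory memtable; when it grows too big,
# it is flushed to an immutable sorted file (an "SSTable"). Reads check the
# memtable first, then the SSTables newest-to-oldest.
MEMTABLE_LIMIT = 2       # absurdly small, just to force flushes in the demo

memtable = {}            # stands in for the in-memory balanced tree
sstables = []            # flushed, sorted snapshots; newest is last

def put(key, value):
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(dict(sorted(memtable.items())))   # flush as a sorted file
        memtable.clear()

def get(key):
    if key in memtable:                   # 1) in-memory tree first
        return memtable[key]
    for table in reversed(sstables):      # 2) then SSTables, newest first
        if key in table:
            return table[key]
    return None

put("a", 1); put("b", 2)     # triggers flush #1
put("a", 9); put("c", 3)     # triggers flush #2, with a newer value for "a"
```

Because reads walk SSTables newest-first, the newer `a = 9` shadows the older `a = 1`; compaction would eventually merge the two files and drop the stale entry.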
The next thing to talk about here is transactions. Transactions are a term you may have heard a lot about, but the main gist here is that databases are multi-threaded programs, so you can't necessarily rely on just being able to make writes all the time without having to worry about race conditions. We often hear the term ACID to describe transactions. I'm not going to go through the full acronym, but it really means that the writes you're making to your table are atomic, meaning that if I'm writing to multiple rows at once, they're all either going to succeed or all going to fail, and also serializable, meaning that we effectively aren't going to have any concurrency bugs to worry about: it's going to look as if one thread were making all of these writes.

So in reality, how do we implement atomicity? Basically, every database has something called a write-ahead log. We discussed the write-ahead log earlier: you literally take every single write that you're making to the database and put it into a sequential log on disk. When we start a write, we put it in the write-ahead log; when we complete a write, we record that that particular entry is now committed. This means that if the database fails before the entry is committed, it can start back up, read from the write-ahead log, see that it had an uncommitted write, and apply it again.

Serializability, on the other hand, is a little harder to implement, so let's talk about that now. There are a few different ways of ensuring that a database is serializable. One way is literally just running it on a single thread; I don't want to go into too many specific database examples in this video, but something like VoltDB actually does this. We've also got pessimistic concurrency control, where we basically use a bunch of locks to stop conflicting transactions from messing with one another. If I'm reading a bunch of data and then making a write based on the values I read, we need to make sure I hold locks on all of the rows I could have read, so that anyone else trying to modify them cannot actually proceed. On the contrary, we also have optimistic concurrency control. This is a little different in the sense that rather than having transactions directly block one another, they can all proceed until they're right about to commit, and then they check whether the premise (the predicate, whatever they read) has since changed. If it has, the transaction is no longer valid and needs to abort. Something like optimistic concurrency control is really nice when we don't have a lot of contending transactions that all want to grab the same locks, because now there's no overhead to grabbing the locks whatsoever. Pessimistic concurrency control, on the other hand, is going to be better when we have a lot of contending transactions, because then we're just blocking them, rather than running the whole transaction and then aborting right before it completes. Cool.
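One common way to implement the optimistic approach is to stamp each row with a version number and abort the commit if the version changed since it was read. This is a minimal single-key sketch, not how any particular database does it:

```python
# Toy optimistic concurrency control: each row carries a version number.
# A transaction remembers the version it read; at commit time the write
# only succeeds if that version is unchanged, otherwise the caller retries.
store = {"x": {"value": 100, "version": 1}}

def read(key):
    row = store[key]
    return row["value"], row["version"]

def commit(key, new_value, version_read):
    row = store[key]
    if row["version"] != version_read:
        return False                  # premise changed underneath us: abort
    row["value"] = new_value
    row["version"] += 1
    return True

value, v = read("x")                                   # we read (100, version 1)
store["x"]["value"], store["x"]["version"] = 150, 2    # a concurrent writer sneaks in
aborted = not commit("x", value + 10, v)               # our commit must abort
```

Notice no locks are taken anywhere: the cost moves from blocking up front to occasionally aborting and retrying at the end, which is exactly the trade-off described above.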
Another question from here: okay, we have this concept of database transactions, but does every database support them? The answer is actually no. Typically most SQL databases will, because they tend to be a little bit older, and even within SQL databases it's not necessarily the case, but if you're talking about something like MySQL or Postgres, you can definitely use transactions. That being said, one trend in the last 10 or 15 years has been for NoSQL databases to come around and stop supporting transactions. The reason: transactions are expensive, so if you don't necessarily need them for your particular use case, it may actually be faster to just drop the ability to support them, and then the database can run a little faster. Transactions are really, really useful if you have to reason about concurrency, because they make things a lot simpler to think about, but at the same time they're going to make reads and writes slower: we're either potentially aborting a lot of writes, or we're potentially adding a lot of locks to our system. Cool. Another piece that I want to
talk about when it comes to databases is row-based versus columnar storage. Like I mentioned, the really nice thing about something like an index is that it keeps related data together on disk. Beyond keeping multiple keys together, think about just one row: if I'm a user on Facebook and I want to access my own profile, it would probably be good if all the fields related to my profile were next to one another on disk, because when I do a read, I'm going to access all of them at the same time. However, for something like analytical queries, this makes a lot less sense. Typically, for a big analytical query that a data scientist might run, you've got a lot of different columns in a massive table and you really only care about one or two of them; you'd otherwise be reading probably the whole table just to perform some sort of aggregation or grouping on that one column. For this type of use case, something like a column store is going to support you better. In a column store, rather than storing every row together on disk, you take the values within a particular column and store those together on disk. The nice things about this are, number one, you only have to read a subset of the data if you're only interested in a couple of columns at a time, which speeds things up, and number two, you get all of these nice compression benefits as a result of the fact that all of the data within a column is really similar to one another. Cool.
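Here's a small sketch of the same (made-up) table in both layouts, plus a run-length encoder to show why similar neighboring values compress well:

```python
# The same hypothetical table, row-based vs. columnar.
rows = [
    {"user": "ana", "age": 34, "country": "US"},
    {"user": "bo",  "age": 28, "country": "DE"},
    {"user": "cy",  "age": 41, "country": "US"},
]

# Columnar layout: each column's values stored contiguously.
columns = {
    "user":    [r["user"] for r in rows],
    "age":     [r["age"] for r in rows],
    "country": [r["country"] for r in rows],
}

# An analytical aggregate only needs to touch one column's data here,
# instead of scanning every field of every row.
avg_age = sum(columns["age"]) / len(columns["age"])

def run_length_encode(values):
    # Similar neighboring values (common within a column) collapse into
    # (value, count) pairs -- one source of columnar compression wins.
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out
```

A sorted or low-cardinality column like `country` run-length encodes far better than a mixed row layout ever could.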
Another piece to talk about is data serialization frameworks. Technically this could be part of the database chapter, but I'm going to make it its own. Very simple: if we have a bunch of data and we need to represent it on disk or over the network, basically as ones and zeros, how do we want to do so? Well, sometimes you have no choice: if you're supporting a flexible schema, maybe you have to use something like JSON. I can send any arbitrary object with JSON, because it's literally just the field name in quotes followed by the value, and that's all really nice. However, if you're using strongly typed schemas, you can actually introduce some other technologies here. One really good example would be something like protocol buffers, out of Google, where you create a predefined schema such that you can use objects that fit this schema across multiple different pieces of code; you can send them over the network, and you can store them in a database. And then on the wire, as binary, you don't have to include all of the field names in the message, making it quite a bit smaller. If I know that a field name corresponds to tag number 1, as you can see in my diagram in the bottom left, I basically only have to include a 1 to say "hey, this is the name field," followed by the actual value of the name field, as opposed to spelling the field name out. That saves you a lot of space. Cool.
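To see why tags beat field names, here's a deliberately simplified tag/length/value encoding. This is NOT the real protobuf wire format (protobuf uses varints and wire types); it only illustrates the idea of a pre-agreed schema replacing field names with small tags:

```python
import json
import struct

# Simplified schema-based encoding (not real protobuf): both sides agree
# on the schema up front, so each field is identified by a one-byte tag
# instead of its full name in every message.
SCHEMA = {1: "name", 2: "age"}          # tag -> field name

def encode(msg):
    out = b""
    for tag, field in SCHEMA.items():
        payload = str(msg[field]).encode()
        out += struct.pack("BB", tag, len(payload)) + payload   # tag, length, bytes
    return out

def decode(data):
    msg, i = {}, 0
    while i < len(data):
        tag, length = struct.unpack_from("BB", data, i)
        msg[SCHEMA[tag]] = data[i + 2 : i + 2 + length].decode()
        i += 2 + length
    return msg

person = {"name": "Jordan", "age": 29}
binary = encode(person)                     # 12 bytes
as_json = json.dumps(person).encode()       # 29 bytes, field names repeated per message
```

Everything round-trips as strings here because this sketch carries no type information; a real schema language would encode each field with its declared type.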
We've made it through databases and we've made it through data serialization; let's go ahead and talk about replication. Basically, if we have a database, it's going to fail, especially if we have a lot of them. Hardware failures are common; things just break, especially in big data centers. This happens all the time, so we want to be able to do two things: one, protect against database failures, and two, make our databases as fast as possible. What can we actually do? We do something called replication, which is really just keeping multiple copies of the data. Replication is going to allow us to potentially withstand hardware failures (this changes depending on how we do the replication) and also improve our database performance (same deal: it depends on the type of replication you do). The idea is that you've got some sort of source database, maybe one database that receives a write, and it sends that data via something called a replication log to the other databases. The replication log is very similar to the write-ahead log: it basically just has all of the incremental operations being performed on the database. x = 5; nope, never mind, now x = 10; nope, never mind, now x = 15. You can put those all in a replication log and then apply them downstream on replicas. Cool.
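That leader-to-follower flow can be sketched as a follower replaying the leader's operation log from wherever it last left off (a toy model, ignoring networking and failure handling entirely):

```python
# Toy replication log: the leader records each operation in order, and a
# follower catches up by applying the log from its last-applied position.
leader_log = [("set", "x", 5), ("set", "x", 10), ("set", "x", 15)]

class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0          # index into the leader's log

    def sync(self, log):
        # apply every operation we haven't seen yet, in log order
        for op, key, value in log[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(log)

f = Follower()
f.sync(leader_log)                    # follower replays x=5, x=10, x=15
leader_log.append(("set", "y", 7))    # leader keeps accepting writes
f.sync(leader_log)                    # follower catches up on the next sync
```

The gap between the leader appending an entry and the follower's next `sync` is precisely the replication lag that the consistency discussion below is about.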
A pretty big distinction in replication is synchronous versus asynchronous replication. Synchronous replication, and sometimes you'll hear this called strong consistency, means that on a write, all of our replicas have to process that write before the writer actually sees that their write succeeded. That means that when I write, every single one of the replicas is now up to date, which is great, because you don't risk any reads of stale data. On the other hand, with asynchronous replication, you write to one particular replica, and at some point in the future it sends that write to the other replicas. The problem is that if someone reads from one of the other replicas in the meantime, there's no guarantee that the data is going to be there. That's known as eventual consistency: you potentially have to deal with stale data, and depending on your application, sometimes that's totally fine. Cool. So let's go through the actual
types of replication. In all of this I'm kind of assuming we mean asynchronous replication, because the truth is that synchronous replication is not very viable: if any of those replicas goes down, all of a sudden we can't make any writes. With single-leader replication, we've got one leader database that accepts all the writes, and then reads can be served from any of the follower databases. This is really simple; I'd think of it as the default type of replication. But it's got a couple of cons. For one, write throughput is limited to the leader replica: writes can only go to one place. Number two, if the leader fails, we need to pick a new leader, and that failover leads to some downtime during which we can't submit any writes. And number three, when a leader does go down, because we're doing asynchronous replication, it's possible that the follower that becomes the next leader doesn't actually have all of the data the old leader had, and then we might lose a couple of our writes. Okay, the next thing we're going to do
is talk about multi-leader replication. This one's going to get a little more complicated, so bear with me. The idea is that for a given key, you can have multiple leaders that are able to serve writes for it, and all of those leaders can serve reads too. This is really nice because now we can write to multiple places: we get increased write throughput. On the contrary, we now have write conflicts: if I write x = 10 to one of these leaders and x = 20 to another, what's the actual correct value of x? Kind of hard to say, isn't it? Eventually these leaders are going to synchronize with one another; you don't just write to them and have each one hold its own data forever without ever merging. Eventually they'll talk to one another and try to figure out, "hey, for this key x, what's the actual proper value?"

So there are a few different ways to decide what the proper value is. Number one is last-write-wins: give every write a timestamp, and then you're good to go, because timestamps will always choose one of the writes to win, which is really nice. On the other hand, timestamps in distributed computers are pretty much arbitrary. I mean, they're not arbitrary, but ultimately you can't trust timestamps to be perfect, because the way actual clocks in a computer work, they can drift by a couple of milliseconds at a time. As a result, there's no guarantee that the write you pick is actually the one that occurred later, and even if it is, when one write wins, you just lose any record of the other write ever happening and forget about it. So even though this is really simple to implement, it does have some downsides. What are some other ways we can deal with this? Number two is going to be
storing siblings. Now, again, if you don't fully understand this example, no worries; I talk about it a lot more on this channel (that's why I'm making this video, so you can go watch my other videos). The gist: two writes are concurrent when they don't know about one another and go to different leaders. The problem is that you don't just want to throw away one of the concurrent writes; ideally, you want to keep that information so you can at least use it somewhere later down the line. We never want to lose information, especially if a user has written it and believes it's stored. So the idea is: if we can detect two concurrent writes, we want to store both of them, and then someone who reads the data later can see both writes, figure out a way to merge them, and write back to the database so that there's only one value again.

Let's look at the following situation. We've got two leaders; both have x = 0, and no one has written to them yet. Say I write to the leader on the left, which we'll call leader A, so now x = 10. We use something called version vectors to determine when we have a concurrent write. A version vector is very simple: it's literally just a very small array of how many writes have been processed by each leader, which allows us to determine when two writes are concurrent. The way it works is that every time I write to a leader, it increments its own entry in its own version vector. So when I write x = 10 to leader A, A bumps its index for leader A. Then say we have some sort of synchronization step, where leader A tells leader B, "here's what I think the value of x is." When leader B receives leader A's value, it also merges leader A's version vector into its own local version vector, and when merging version vectors, we keep the highest number for each replica. So leader B sees, "oh, I have an entry for A coming in that's higher than anything I've seen; let me update my version vector, and my value, so that I'm up to date."

Now say someone makes a write to leader B. Leader B bumps its local version number: B has now seen one write, and the value corresponding to that is x = 20. Say, however, that in the meantime, leader A gets another write, so leader A bumps its own version vector entry to 2. Notice, if we look at the state on both leaders, x = 30 versus x = 20: these two writes are concurrent, because the person who wrote x = 20 is not aware of the person who wrote x = 30. They should be stored as siblings in the database and handled later. The way we know they should be stored as siblings is by looking at the version vectors: on leader A, the A entry is greater than the A entry on leader B, while on leader A, the B entry is less than the B entry on leader B. Because we have these interleaved version vectors, one write is concurrent with the other. If one of the version vectors were greater than or equal to the other at all of the elements, those writes would not be concurrent; one would be more recent than the other. So say they sync up: since our writes are concurrent, A knows it needs to store both, and when B eventually synchronizes back to A, we merge the version vectors, but A says, "okay, I need to store both of these values." That way, someone else can read the data later, see "ooh, it's 30 and 20, what should I write? Maybe I'll write 25." Cool. So again, if that's not clear, no worries; it's not an easy concept, and debatably I shouldn't even have covered it here, but we did it. Let's continue.
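The version-vector machinery from that walkthrough fits in a few functions. This sketch replays the leader A / leader B example (vectors are dicts keyed by leader name, an assumption of this sketch):

```python
# Toy version vectors for detecting concurrent writes across leaders.
def merge(vv_a, vv_b):
    # keep the highest counter seen for each leader
    return {leader: max(vv_a.get(leader, 0), vv_b.get(leader, 0))
            for leader in set(vv_a) | set(vv_b)}

def dominates(vv_a, vv_b):
    # a dominates b if a has seen at least everything b has seen
    return all(vv_a.get(leader, 0) >= count for leader, count in vv_b.items())

def concurrent(vv_a, vv_b):
    # neither write knows about the other -> store both as siblings
    return not dominates(vv_a, vv_b) and not dominates(vv_b, vv_a)

# Replaying the example from above:
a = {"A": 1}                          # leader A takes the write x = 10
b = merge({"B": 0}, a)                # leader B syncs from A ...
b = {**b, "B": b.get("B", 0) + 1}     # ... then takes x = 20 -> {"A": 1, "B": 1}
a = {**a, "A": a["A"] + 1}            # meanwhile A takes x = 30 -> {"A": 2}
```

`concurrent(a, b)` is true here because each vector is ahead of the other in one slot, the "interleaved" case that forces the database to keep both x = 30 and x = 20 as siblings.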
Automatic resolution. The third option for how we deal with multiple leaders and conflicting writes is to automatically resolve the conflicts on the database itself. Rather than having a user read the stored siblings and then write a merged value back, the database itself is actually capable of merging those two values. If you think about it, the version vector we saw on the prior slide is, to some extent, a good example of this for building something like a counter, because it tells us how many total writes we've had on each leader, and it's an eventually consistent way of doing so. You can also perform automatic resolution for things like set objects or sequence objects, but of course, these can't get too complicated, because they have to be built into the database beforehand. It's nice when a database provider gives you this, but if you're really trying to do some complicated stuff, you're probably not able to use it, and then you're going to have to store siblings.
cool Okay so we've gone through um
multileader replication uh let's go
ahead and now go through leader list
replication basically the idea here is
that rather than basically um writing to
just one place at a time and reading
from one place at a time now we're
actually going to write and read from
many places at once so the idea here is
we don't have a leader uh because we're
writing to multiple places and so that
means there's no big fail over if a
leader goes down right we're always
available we can always write somewhere
which is really really nice and then I
guess the the problem or the con of this
approach is that we have longer tail
latencies if I need to wait for three
nodes to respond to my right before my
right is considered committed then what
that means is that I'm basically
bottlenecked by the slowest of those
three nodes and the same thing applies
for reads so the question is well how
many replicas do we actually have to
write to and read from for one of these
operations to be valid so this is where
we introduce the concept of something
called quorums so let's say we have n
replicas in our setup and for us to
consider a right to be valid it has to
First make it to W replicas and for us
to have all of the data that we need to
actually make a successful read we need
to read from our replicas a quorum is
when that W + r parameter is greater
than the end nodes and so if this occurs
then we have something known as a quorum
so basically uh as you can see in the
image on the bottom left if we have
three replicas in our cluster where if
we set W equal to two meaning we have to
write to two replicas every single time
we try to write at least two and if we
set r equal to two meaning that we have
to read from two replicas every single
time that we tried to read again at
least two we know that uh those reads
and writes will overlap by at least one
replica on each set and so one of them
in theory should have the up-to-date data
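just to make that condition concrete here's a tiny Python sketch — assuming for illustration that each replica tags its value with a version number, which is a simplification of the timestamp and version vector discussion:

```python
# Sketch of the quorum condition and a quorum read; names are made up.

def is_quorum(n, w, r):
    # reads and writes are guaranteed to overlap on at least one
    # replica exactly when W + R > N
    return w + r > n

def quorum_read(replies):
    # replies: (version, value) pairs from R replicas; the highest
    # version is the most up-to-date value we can see
    return max(replies)[1]

assert is_quorum(3, 2, 2)        # the classic N=3, W=2, R=2 setup
assert not is_quorum(3, 1, 1)    # W=1, R=1 can miss the latest write
assert quorum_read([(4, "old"), (7, "new")]) == "new"
```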
from there it's just a question of okay
well now we have multiple different
values on our reads how do we actually
determine which one is the correct value
well we have kind of discussed this
already we can use something like a time
stamp we can use something like a
version Vector to try and get a better
sense of which write is the actual most
up-to-date one and that way a reader can
see the up-to-date
data cool we finally made it through
replication because I think frankly
that's going to be the most complicated
part of all of this by a long shot let's
go ahead and talk about sharding so the
idea here for sharding is that when we
have a database and there's too much
load on it uh we're eventually going to
have to split it out right that's kind
of the theme of distributed systems uh
you need to split this work between
multiple different computers otherwise
it's just not going to work properly
Facebook can't be run on one computer
that's not going to fly so the main kind
of idea when it comes to sharding or
partitioning whatever you want to call
it is that when we ultimately make our
database shards we want the majority of
our read queries and the majority of our
write queries to only have to interact
with one Shard at a time if they have to
interact with say 10 different
shards then basically an operation
cannot be considered completed until the
slowest of those 10 is done responding
and so keeping all of our data accesses
and mutations to one place is the way to
keep things as fast as possible we also
want to ensure that we don't have
something called hotspots where hotspots
are shards that have a disproportionate
amount of either the data or the number
of requests going to them so how can we
go ahead and do
that well the question I guess is how
should we actually Shard out our data if
we have a database and we've got a bunch
of key value pairs how do we want to go
ahead and partition them well one thing
we can do is just partition by the
range of keys and you can see this image
below if we have a bunch of different
names we can just partition them by key
range so this is really nice if we want
to be able to support range queries
still right because it means that
similar keys to one another are going to
be on the same Shard and we don't have
to make a big distributed query that
being said it also means that we're more
prone to hotspots especially for names
for example there are a lot more names
that start with a or maybe J than there
are that start with Z so ultimately you
have to be very careful here what else
could we do well we could do something
called partitioning by the hash of the
key so for a given key basically all
you have to do is take the hash of it
and then shard it based on that hash
so why do we actually go ahead and do
something like this well the idea is
that a hash function should in theory
evenly distribute uh a range of keys and
as a result um we should get better
balancing at the same time the downside
of this is that you know if you
want to query all of the names that
start with a those hashes are all going
to be completely different and now we
can't easily run a range query because
we're going to have to go ahead and
access a bunch of different database
shards so it's not super easy to do this
but nonetheless often times it is going
to be the correct approach if you don't
need these big range queries
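here's a minimal Python sketch of key-hash partitioning — NUM_SHARDS and shard_for are hypothetical names, and note the modulo here is only okay while the shard count never changes, which is the caveat discussed next:

```python
# Sketch of hash partitioning with a fixed shard count.
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # hash the key so that similar keys spread evenly across shards
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# adjacent keys land on unrelated shards, which is exactly why a range
# query like "all names starting with A" now has to hit every shard
a, z = shard_for("Aaron"), shard_for("Zoe")
```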
cool so one thing we don't want to do in
this particular situation is we don't
want to take the hash of a key and take
it modulo the number of shards and then
send it to the shard that it ends up at
the reason for this is that let's say I
add one more Shard to our cluster um now
the majority of our keys are going to
have to be shuffled to different shards
right because that's how modulo works if
you think of you know I have 12 keys and
right now all of them can be in one of
three nodes uh based on modulo 3 and
then I now make it modulo 4 a majority
of those keys are moving to a different
node and that's not going to be very
good so what do we do to mitigate this
there's a separate algorithm called
consistent hashing uh the main idea
behind consistent hashing is that you
basically organize uh kind of the hash
range so basically every possible hash
value of a key on some ring and then
when you find a key you take the hash of
it you walk clockwise around the ring
until you find uh a machine's location
on the ring or a shard's location on the
ring and that is basically going to tell
you what Shard a given key belongs to
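that clockwise walk is easy to sketch in Python with a sorted list — this is a bare-bones ring with no virtual nodes, and all of the names here are made up:

```python
# Bare-bones consistent hash ring; bisect does the "walk clockwise".
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # place every node at its hash position on the ring
        self.points = sorted((h(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        # walk clockwise from the key's hash to the next node,
        # wrapping around to the start of the ring if needed
        i = bisect.bisect(self.points, (h(key),))
        return self.points[i % len(self.points)][1]

    def add(self, node):
        # only keys between the new node and its predecessor move
        bisect.insort(self.points, (h(node), node))

ring = Ring(["m1", "m2", "m3"])
before = {k: ring.node_for(k) for k in map(str, range(100))}
ring.add("m4")
after = {k: ring.node_for(k) for k in map(str, range(100))}
moved = sum(before[k] != after[k] for k in before)
# only a fraction of keys move, versus most of them with hash mod N
```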
the reason we go through the trouble of
doing all of this is that it leads
to less rebalancing when we actually add
or remove a machine from the cluster
let's imagine that we added a machine at
that vertical blue line on the circle so
right at the 12:00 position basically
what that's really going to be doing is
just taking a subset of the keys that
would have gone to Machine 2 and now
they go to that vertical line new node
and so what this does is it just
minimizes the amount of nodes that have
to actually send their keys around again
this is an oversimplification of things
in reality there's a little bit more
work done to ensure that all of the nodes
do have a relatively even balance of the
keys but this is just going to allow us
to you know dynamically change our
cluster size without having to send Keys
all over the place all the
time okay another thing that I want to
touch upon is local secondary indexes so
we know what an index is Right an index
is basically just sorting data in a
database by a particular key so that
when we say hey find me all of the rows
that have this particular key that's
really fast simple enough we also know
what secondary indexes are it's the same
thing on a database but just on a
different field right you know if I have
a bunch of different um users in a table
maybe I want an index on user ID and
then maybe I also want an index on their
age because maybe I want to be able to
search for all people whose age are 24
something like that that being said this
gets a little bit more complicated when
you're doing charting so the idea here
is basically that um a local secondary
index is the same exact thing as just
you know adding an index per shard
so it means that rather than uh having
an entire view of all of the rows in the
database your local secondary index only
applies to the rows that are on that
particular Shard so this is nice in the
sense that it doesn't really slow down
writes very much right writes are still
going to go to uh one individual shard
but at the same time if you want to
actually read on that key uh let's say I
want to find all of the people maybe our
primary index is based on a user ID but
then I want a secondary index on uh the
actual age of every single user if I
want to find all of the users who
are 24 now I have to read from every
single shard and use their local
secondary index and so that's going to
be a lot slower for
reads what can we do in contrast well
there's something known as Global
secondary indexes the reason they're
called Global is because they basically
index the entire uh Global table and
then they themselves are partitioned
according to that key rather than the
primary key so uh obviously this is
going to make our reads a lot faster
because um you know now ideally we would
be able to have um everything go to just
one particular Shard when performing our
reads so we're not making a kind of
scatter gather query there but the cons
of this approach are that now every
single write has to go to multiple
machines because in all likelihood um
the machine that I write to for our
primary index is probably going to be a
different machine than the one that is
holding our Row for its Global secondary
indexes and so another uh challenge here
is that we actually have to keep these
two indexes in sync with one another so
what are our options option number one
is something called two-phase commit so
we haven't introduced this concept yet
but the idea here is basically well
two-phase commit is a way of making
writes go through on multiple different
machines at the same time and ensuring
that they all apply them it can't ever
work perfectly because at the end of
the day um you know the network is
completely asynchronous and there's no
way to guarantee that two different
writes absolutely go through but this is
pretty much the best we can do so
basically we've got some node called the
coordinator and all the coordinator is
going to do is it's going to reach out
to all of the nodes that it wants to
write to and in the first phase it's
going to say hey prepare to accept some
write once all the nodes say okay I'm
prepared to do so it's going to reach
back out and say okay you can now commit
the write so simple enough uh once that
prepare phase is completed basically all
of these nodes that are prepared to
accept the write can't really do
anything else until they accept the
write uh they've pretty much guaranteed
to the coordinator node that they will
never reject the write in this situation
and so uh that is known as the commit
point we basically can't go back from
there so that's one option but this is
expensive right it requires multiple
network round trips uh we have to
request a prepare we have to get our
prepare back we have to commit we have
to hear that we're done that is going to
be a lot of work what can we do instead
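for reference here's a toy Python sketch of that prepare-then-commit flow — the class names are made up, and a real coordinator also logs its decisions to disk so it can recover from crashes, which this leaves out:

```python
# Toy two-phase commit: prepare on every participant, and only commit
# once all of them have voted yes.

class Participant:
    def __init__(self):
        self.prepared = None
        self.committed = None

    def prepare(self, value):
        # promise never to reject this write once we've said yes
        self.prepared = value
        return True

    def commit(self):
        self.committed = self.prepared

def two_phase_commit(participants, value):
    # phase 1: everyone must vote yes before we pass the commit point
    if not all(p.prepare(value) for p in participants):
        return False
    # phase 2: past the commit point, everyone applies the write
    for p in participants:
        p.commit()
    return True

nodes = [Participant(), Participant()]
ok = two_phase_commit(nodes, "row-42")
```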
well if we're okay with having our
indexes not perfectly in sync but you
know they'll eventually be in sync with
one another we can update our secondary
indexes asynchronously and so one option
that we actually have here is we can use
eventual consistency uh by basically
saying okay first I'll update my primary
index and then whatever the node is that
received the primary uh write we'll go
ahead and forward that write to um
basically the shard that is uh going to
be updated for the secondary index so
for example let's say I write to Shard a
because um my user ID is number one and
then my age is 100 so that's a really
high age maybe that puts me on Shard 10
uh shard one is going to forward that
write over to shard 10 and shard 10 will
apply it there's no guarantee on when
that gets done we know it'll happen
eventually but the point is uh if I now
read from my secondary index it may be a
few seconds before I see that update
actually propagated cool uh another piece to
actually talk about here is SQL versus
NoSQL so uh again I don't really like
when people are like oh NoSQL is more
scalable but the reason that sometimes
we'll say it
is effectively that NoSQL basically
just drops a lot of features that SQL
offers in order to be more scalable uh
when it comes to big data and so really
what it means is it's not necessarily
that NoSQL databases are inherently
more scalable it's just that NoSQL
databases support more scalable use
cases because they don't really allow
you to do all of these cross-shard writes
these cross-shard reads they really
encourage you generally speaking to try
and keep all of your operations within
a single shard so that you don't have to do
something like a two-phase commit and
that's going to make your life a lot
easier for example on Cassandra you have
this concept of a partitioning key uh
and the idea there is just like hey you
know all of your writes have to be on a
shard with the same partitioning key and
that way you're never doing something
like a cross-shard two-phase commit
okay so I think we've made it out
of database hell let's go ahead and
start talking about some other stuff
number one is going to be batch
processing so basically batch processing
is just when we run a big batch job or a
massive computation that's distributed
over a massive data set and so typically
uh these are run offline which means
they just have some sort of trigger and
in addition uh it's not easy to write
these right or like if you were to write
one of these from scratch you would have
to account for the possibility that one
of your worker nodes dies you would have
to account for the possibility that you
know one of your worker nodes is doing a
lot more computation than the other and
also you would want to make sure that
you're avoiding any concurrency bugs
associated with doing distributed
computation that's where batch
processing Frameworks come in they
basically handle all of this for you uh
maybe you specify how you want to do
something like sharding but uh
ultimately they are going to either
checkpoint state or you know be able to
restart jobs as necessary in order to
ensure that they get done even in the
face of machine failures and uh yeah
they're very useful so if you hear about
things like map reduce or spark that's
what we're talking about uh there are
reasons why spark is better than map
reduce uh but nonetheless that's not the
focus of this video this is just very
high level cool the next thing that we're
going to talk about is stream processing
so stream processing messages are
basically um generated by something
known as a producer so a producer is any
sort of server they're eventually going
to be consumed by something known as a
consumer and then uh basically we use
some sort of additional server that is
responsible for routing messages from
producers to Consumers that is known as
a message broker and so what are
basically the use cases of stream
processing well let's say we have a
bunch of application events we don't
want to process them synchronously
because maybe they would take too long
to process and it would lead to request
timeouts or something like that uh that
would be a really good example um if we
have multiple different data streams
that we want to join together maybe we
would do so in our consumer by briefly
caching some of the events waiting for
the others and then joining them
together and then also time-based
aggregation as well I want to find all
the events that have come together
within the last hour and maybe I want to
take the average of some number uh or
some counter that they have this is a
really good way of doing
so cool so as far as stream processing
goes I mentioned that we have these
Brokers and there are basically two
types of them number one is what's known
as the inmemory message broker so the
idea here is that we have a queue uh we
keep it in memory and then we remove
those messages when a consumer
successfully reports that we've
processed them so basically one of the
unique things that in memory message
Brokers can do is round robin message
delivery so what this really means is
that that queue that single individual
queue can deliver messages to multiple
different consumers at once now the nice
thing about that is that it allows us to
get a lot of throughput uh because now
we have multiple consumers who can be
handling messages at a time some of the
cons are that we aren't necessarily
processing all of these messages in
order uh when a message is sent to a
consumer it's briefly removed from the
topic uh and you know kind of put in
this pending-acknowledgment area or
this staging area and then once it is
ultimately acknowledged it gets deleted
after some time out maybe it'll get
added back to the queue because maybe
the consumer failed or something like
that but the idea is we don't
necessarily process these messages in
order and number two and more
importantly is that if we are too slow
to consume these messages and too many
of them build up we may run out of
memory on the broker and then things are
going to
break cool the next piece of this is
going to be the log based message broker
so basically log based message broker is
something like Kafka if you ever heard
about that but the idea is that all we
do is instead of keeping all these
messages in memory and some sort of
queue we actually persist them all to
what is effectively a write ahead log on
disk and so that way if I'm a consumer
and I'm reading from this broker uh the
broker will actually keep track of the
last offset of the message that I've
last read and so that way within one
single partition of one of these Brokers
I'm just reading all of the messages
sequentially from start to finish and
and so a nice thing about this is that
we get replayability the reason is our
messages are not actually deleted upon
consumption so what's really nice about
that is you know maybe for whatever
reason we want to go back and check the
value of our messages we can actually
read through this broker and do so again
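here's a toy Python sketch of one partition of a log-based broker with per-consumer offsets and replay — all of the names here are made up for illustration:

```python
# Messages are appended to a log and never deleted on read, so each
# consumer just tracks an offset and can rewind it to replay.

class LogPartition:
    def __init__(self):
        self.log = []        # the append-only log on disk
        self.offsets = {}    # consumer id -> next offset to read

    def produce(self, message):
        self.log.append(message)

    def consume(self, consumer, max_messages=10):
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:start + max_messages]
        # advance this consumer's offset; the messages themselves stay
        self.offsets[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, offset=0):
        # replay from an older offset
        self.offsets[consumer] = offset

p = LogPartition()
for m in ["a", "b", "c"]:
    p.produce(m)
first = p.consume("c1")     # reads a, b, c and moves the offset
p.rewind("c1")
replayed = p.consume("c1")  # the same messages are still there
```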
so again we're not doing this kind of
round-robining here of one partition to
multiple consumers where each consumer
handles a subset of the messages within a
partition every consumer handles every
single message and so that is very much
worth noting cool as far as actual uh stream
processing consumers go this is starting
to be a lot more popular is to have
stateful consumer Frameworks so uh I
personally get on quite a bit in my
videos for using these however I think
they're very useful and I will explain
right now so again when you're consuming
from these message Brokers it is very
frequently the case that you need to
build up some sort of state right if I'm
doing time window aggregation it means I
need to cache all of the messages that
I've seen in the last hour or something
like that and so what would really
really stink is if your consumer happens
to fail and then all of those messages
get lost in theory depending on the type
of message broker you may be able to
replay those messages and build up your
state again and that's a really nice
ability to have but nonetheless that can
be very expensive if it's going to be
the case that uh you know you've missed
a ton of messages in the meantime and
now you're replaying a bunch of stuff and
you're behind so what these actual uh
consumers will do in addition to
supporting all of these aggregations
and joins and things like that is
that they will occasionally take their
state and they will checkpoint it right
so they'll take a state that they
currently have they'll take the
corresponding offset in a log based
message broker that they've read from
and they'll put it in some other form of
durable storage so maybe they'll say
okay at offset message uh number 20 I
see that currently I've got 10 messages
stored in my cache and you know maybe
here's what they are so that way if we
fail and we start back up we can read
from that snapshot get the exact state
that we had before and start again by
reading at the log based message broker
from that offset rather than having to
read completely from the beginning and
having to reestablish all of our state
so this is the type of thing that can
save you quite a bit of
time cool uh before we go ahead and get
into the actual patterns of um systems
design interview problems let's go ahead
and talk about some other types of
storage so these are just like a few I
guess this is kind of like a potpourri of
just random database systems that you'll
often have to use during your interviews
and uh yeah they're important to know if
you don't so number one is going to be a
search Index this is useful when you
want to perform something known as a
full text search it is not trivial to
search for text content so imagine you
know we have a bunch of tweets we want
to find all the tweets that say the word
Jordan is handsome so the way that you
do that is you create something known as
an inverted index and of course this has
to scale over many many different
computers but the idea is basically for
every single word in that tweet you go
ahead and extract them all and then you
take all of the Tweet IDs and say okay
well for word Jordan here are all of the
Tweet IDs that contain it and so that
way I can quickly see okay if I'm
searching for the term uh Jordan is
handsome let me find the entry for
Jordan is and handsome and find all the
Tweet IDs that are uh you know
containing all three of those pretty
simple way to do it number two is going
to be distributed file systems so this
is something like Hadoop but the idea
here is it's basically the file system
on your computer scaled over multiple
nodes I guess the nice thing about this
is that typically you can actually run
computations on these files because
ultimately the files themselves are
stored on the computers though you know
it does really ultimately depend on the
actual implementation of the file system
so if you are going to run something
like a batch processing job typically
you'll store all of that data in
something like a distributed file system
and then do so from
there next we have graph databases so I
guess the idea here is sometimes you do
need to be able to store a graph on disk
because you want it to be durable you
want it to be accessible for many
different servers and uh this is not a
trivial problem so one possible
implementation is called a non-native
graph database and this would be like
using a SQL database to do it where you
basically represent all of the edges of
the graph like a many-to-many
relationship so you have one table where
it's like graph node ID 1 graph node ID
2 and then you have many different rows
of that and so that way you can quickly
figure out for every single node in your
graph all of its edges the way that you
you make that read relatively fast is
you add an index on that many to many
table uh based on the Node ID the
problem with that is that as the graph
gets bigger uh there are going to be more
rows in that edges table and if there
are more rows in the edges table it
means that it's going to take us longer
to read from it because indexes
ultimately operate in logarithmic time
so even though it's not a full uh like
linear growth in terms of how slow it's
getting it is still getting slower as
the graph uh grows sometimes we don't
like that thus leading to Native graph
databases native graph databases are
basically just using pointers on disk
right so rather than uh you know trying
to get good data locality they've just
said screw it we're going to use
pointers on disk now maybe this is good
for graph database tasks But ultimately
they're still kind of slow because using
pointers on disk means you're doing a lot
of random reads and doing a lot of
random reads on disk is fairly slow so
maybe not uh the best solution but if
you have to have a graph database
perhaps this is the best way to do
it cool uh next we'll talk about the
object store I guess the the kind of
prime example of this one would be
something like Amazon S3 uh the gist is
this is kind of uh not going to allow
you to do compute uh locally on the data
that you're storing you basically just
dump everything into a bucket and then
you can go ahead and load it elsewhere
and do compute there but you do have to
read it first because of the fact that
this is storage that is kind of scaled
up without compute it's generally cheaper to
store your stuff in and so these are
becoming really popular storage
solutions these days and a lot of times for
batch jobs people will basically take
data out of something like S3 load it
into a distributed file system and then
run the batch job uh there because
perhaps it's
cheaper cool uh next we have time series
databases so these are what they sound
like uh really good for time series data
when your time series data is basically
split up by two axes one is like what is
producing the time series data and the
other is uh the actual time range of it
the reason it's fast is because one
we're using column oriented storage
which we discussed earlier uh a lot of
this data is super similar to one
another so we can compress it really
nicely and two is that you know we can
just be good about how we cache things
right like if I'm accessing a certain
time range uh I know that I'm probably
going to access the next time range
after that and so we can be smart with
that we know it's going to be for the
same sensor and also the majority of
inserts to this database are just going
to be sequential they're kept in the
same order that they're added and so uh
you don't have to do anything too cheeky
with writes you can basically use
something like a write-ahead log for all of them
okay another piece of this to discuss
are something known as geospatial
indexes basically these are really
useful for finding all of the points
within a certain distance of another
point so basically Geographic data is
represented by a lat long combo and
because these are two completely
distinct fields from one another there's
not really a good way to build a
database index that is going to allow me
to find all of the points within say a
circular radius from me indexes are
really good for finding all latitudes
within a certain range or all longitudes
within a certain range but if you look
at the diagram that I made on the bottom
left of the screen if I'm trying to find
all the points near the white dot I can
basically find really easily both of
those rectangular areas that are in the
black so you know the all the points in
a Latitude near me and all the points in
a longitude near me however uh that's
going to be a lot of points right
because the globe is big and this spans
a lot of area and then eventually you're
going to have to take all of those
individual points and check whether
they're actually close enough to the
white dot so that's too expensive
instead what we basically need to do is
figure out a way to convert our latitude
and longitude into a sort of one-dimensional
key so that that way um anyone with very
similar one-dimensional Keys is actually
close to one another on our map and this
is kind of the philosophy behind
something known as like a geohash or a
quad tree and uh you can see that I've
kind of annotated this structure on the
left here so on the left we've got this
big box with a bunch of smaller little
boxes in it and the way it starts is
okay our largest box has you know no
geohash whatsoever if you take one box
and then you go to the top left of it we
basically take the existing ID and we
add 0 0 to it if we go to the top right
we add 0 1 bottom left 1 0 uh bottom
right 1 one so if we look at Point C you
can see that the top right uh top
quarter box is 0 1 and then uh the actual
geohash of the box that point C is in is
0 1 0 0 because it is the top left box
of the top right box and so by creating
Keys like this what it means is that if
we actually go ahead and organize all of
the points on disk and sort their keys
lexicographically what it means is that
we can basically do a very quick binary
search to find one particular key value
and then all of the other points that
are nearest to it in the database are
going to have very similar keys and so
we basically take advantage of range
queries here to go ahead and figure out
all of the close points to another
point finally uh I believe finally we've
got coordination services so basically
the idea here is that occasionally it's
not acceptable to lose our rights uh
under any circumstances right and this
is really going to be true for our
application metadata who's the leader
database what partitions live where do I
have a lock or is this other server
holding some sort of lock uh even if the
database holding that information goes
down we don't want to lose that that
would be very bad for our application
so the way that we
basically go ahead and ensure that this
doesn't happen is we use something
called a coordination service and the
reason this is good for a task like this
is that they use an algorithm known as
distributed consensus so basically the
idea here is that despite these
algorithms allowing us to you know make
sure we never lose data even in the face
of Hardware faults um they're fairly
slow to write to and read from and as a
result we really only want to be using
them for important application
state so just to demonstrate this
basically you've got a client that
proposes writes to a leader the leader
then proposes writes to a bunch of
followers and the leader is doing so via
two-phase commits which we discussed
earlier the idea here is that a write is
considered successful if a majority of
the replicas in the cluster are actually
saying okay this worked during the
two-phase commit so when the leader is
sending writes to the followers if two
out of those four followers say yes I'm
able to apply this it can go ahead and
continue what this allows it to do is it
basically means that if the leader is to
go down one of the followers uh that
actually has all of the writes can go
ahead and take over as the next leader
and it of course is going to need yet
again a majority of approvals to do so
and it's kind of like the Quorum logic
from earlier right if you're always
making sure that you're writing to and
reading from a majority you can always
be sure that your data is up to date and
by using the two-phase commit we avoid
any sort of race conditions associated
with that
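the majority rule itself is a one-liner — here's a sketch in Python where the leader counts itself as one acknowledgment, with illustrative names:

```python
# A write commits only when more than half of the cluster has
# acknowledged it, so any two majorities overlap on at least one node
# and a committed write can never be lost.

def committed(acks: int, cluster_size: int) -> bool:
    # acks includes the leader's own acknowledgment
    return acks > cluster_size // 2

# 5-node cluster: leader plus 2 of the 4 followers is a majority of 3
assert committed(3, 5)
assert not committed(2, 5)
```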
okay ah totally forgot about this caching
section let me go ahead and bang that
out real quick before I take a break
because I'm starting to get sweaty here
basically the idea is that reading from
disk is slow disk is bad let's use memory
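as a concrete example here's a minimal in-memory cache with least recently used eviction using Python's OrderedDict — the class shape and names are just illustrative:

```python
# Minimal LRU cache sketch: the OrderedDict keeps keys in usage order,
# so the least recently used entry is always at the front.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touch "a" so "b" becomes the least recent
cache.put("c", 3)   # evicts "b"
```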
uh caches get filled up really easily so
how do we actually evict data when
they're full typically we'll use
something like a least recently used
policy so lru eviction and then
similarly to databases caches can and
should be replicated and they should be
sharded so there are three different
types of caching that we'll cover in
this before we quickly cover cdns uh and
let's go ahead and go through them
number one is known as the write-through
cache if I'm an application basically
all I do is write to my cache and my
database at the same time depending on
your implementation of this you may use
a two-phase commit which means writes
are going to be pretty darn slow at the
same time it means that your cache and
your database are going to be kept up to
date which is very
nice number two is the write around
cache in the write around cache
basically all you're going to do is uh
you write first to the database and then
all reads you're going to attempt to
make from the cache and if the data that
you want to read from the cache is not
in there the cache is going to go ahead
and pull it from the database this is
nice in the sense that it's a faster
write uh the database is still the
source of Truth which is good because
the database should be your source of
Truth and then
finally uh the kind of con of this
approach is that cache misses are
expensive right when you try to read
from a cache and it doesn't have the data
now you still have to go to the database
and you basically did two reads for one
so that's uh
inconvenient then finally we are going
to go ahead and have the write-back cache
so in the write-back cache basically an
application is going to make multiple
writes directly to the cache and at some
point the cache is asynchronously going
to sync up with the database so this is
really nice because you're just
basically only writing to memory that's
super fast at the same time if I'm a
different application and I'm reading
from the database I'm not going to see
the writes that are in the cache that's
bad okay a final piece of kind of
caching information we'll talk about is
content delivery networks these are
basically geographically distributed
caches for static content like pictures
or videos or anything like that if you
have stuff in S3 that you want to store
or basically serve to multiple users
around the world sometimes it makes
sense to do so from a
CDN okay load balancing I think I'm
getting there but I actually just can't
even remember my own table of contents
which is why I'm still going off this
basically in any large application we
have to be able to do something called
horizontally scaling which is when you
take one server and you make a bunch of
them because ultimately that one server
is going to get too many requests so you
know if I have like a sign-in service
maybe I need 100 signin services and a
load balancer is the piece of technology
that's going to take me a user and tell
me which sign on service I'm actually
using so basically they can operate on
multiple different policies one example
would be round robin we basically just
randomly choose uh or you know you go on
a loop and you choose the next one that
didn't previously serve the request
another would be using consistent
hashing like we spoke about before
sometimes it's really really useful when
the same request or similar request are
being routed to the same exact server
because maybe that server has some
cached information that's going to make
that request a lot faster uh we can do
this hashing on things like your IP
address uh so any like networking
related information known as layer 4
load balancing or application uh
specific information where you'd
actually have to read the packet itself
uh and that would be called layer 7even load
load
balancing cool as far as fault tolerance
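Here's a rough sketch of those two policies. The modulo-hash routing below is a stand-in for real consistent hashing (which also handles servers joining and leaving gracefully); the MD5 choice is arbitrary:

```python
import hashlib

class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0

    def round_robin(self):
        # Loop through the servers, picking the one after the previous pick.
        server = self.servers[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.servers)
        return server

    def hash_route(self, client_ip):
        # Route the same client to the same server (sticky, cache-friendly).
        # Real consistent hashing also survives server churn without
        # reshuffling most keys; this simple modulo version does not.
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]
```

The payoff of `hash_route` is exactly the caching point above: repeat requests from one client keep landing where their cached state lives.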
Cool. As far as fault tolerance goes for load balancers, because they too can become single points of failure and go down, there are a couple of ways that you can run them. One is the active-active configuration, where you're running multiple different load balancers at once. We'd probably use some coordination service, so that clients can read from the coordination service, figure out which load balancers they have access to, and then randomly route a request to one of them. Number two is the active-passive configuration, where you're doing something similar: every load balancer is registered with a coordination service, but only one of them is actively routing requests. If that one goes down, the coordination service is going to notice via some sort of heartbeating with it, and then the passive one is going to say, okay, I'm the active one now, all requests can go to me.
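A toy model of that active-passive setup might look like this; timestamps are passed in explicitly instead of using a real clock, and the whole `Coordinator` class is a made-up stand-in for something like ZooKeeper:

```python
class Coordinator:
    """Toy coordination service for active-passive load balancers."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}   # balancer name -> last heartbeat time
        self.active = None

    def register(self, name, now):
        self.last_heartbeat[name] = now
        if self.active is None:
            self.active = name     # first to register becomes active

    def heartbeat(self, name, now):
        self.last_heartbeat[name] = now

    def route_target(self, now):
        # If the active balancer missed its heartbeat window, promote
        # the first passive one that is still heartbeating.
        if now - self.last_heartbeat[self.active] > self.timeout:
            for name, ts in self.last_heartbeat.items():
                if name != self.active and now - ts <= self.timeout:
                    self.active = name
                    break
        return self.active
```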
All right, my throat was starting to hurt and I was starting to get a little bit sweaty, but let's go ahead and finish this thing off. Here we go: systems design interview patterns. Like I mentioned, a lot of these problems can really just be broken down into common themes, so let's go ahead and break them down.
Number one, which I see a lot, is contending updates. This is basically when you have a bunch of writes to the same key, so many that it's not simple to just do them on one node, because you're going to have too much lock contention. Examples here would be building out a distributed counter, or maybe dealing with order inventory on something like Amazon for a popular product. The naive solution is that you just do everything in the same database and deal with the contention, which might not be acceptable. What would be better is perhaps using multiple database leaders, and then using something like that automatic merging logic that we discussed before to eventually make them consistent; this works really nicely for something like a distributed counter or a set. Or you could use stream processing: you put everything in something like a Kafka queue, you handle it downstream with a consumer, maybe handling messages in mini batches if we want to use Spark Streaming instead, and then you make writes to the database such that you're applying multiple updates per write. Fewer writes means less contention.
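The multi-leader idea for a distributed counter is often sketched as a sharded counter: split the hot key into several sub-counters so concurrent writers rarely collide, and sum them on read. A minimal single-process version, just to show the shape:

```python
import random

class ShardedCounter:
    """Spread increments for one hot key across N shards to cut contention."""

    def __init__(self, num_shards):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Each writer picks a random shard, so concurrent writers
        # usually touch different slots instead of one contended row.
        self.shards[random.randrange(len(self.shards))] += amount

    def value(self):
        # Reads pay a small extra cost: sum every shard.
        return sum(self.shards)
```

In a real system each shard would be a row or a replica on a different node, and the sum would be the eventual merge step.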
Next, we have derived data. Let's say we have two data sets that we want to keep in sync with one another. We discussed a couple of examples of this already, one being the global secondary index. Or, if we have another data set which is being updated by one particularly slow database, but we want to make a faster read-only view on the data, this would also be another good use case. For something here, what we could do naively is a two-phase commit, and this may be necessary if we need to keep them perfectly in sync. However, what might be better is something like change data capture. Change data capture is pretty simple: it's something that works with a lot of different databases, and what it does is take all of the updates to a particular database, pipe them into a log-based message broker, and then a consumer of that broker can do any transformations that are necessary and pipe all of the changes from there into another database. So this is going to allow us to be eventually consistent. I'm sorry, as you guys can see, my throat is dying. So we've got some sort of source database, it has all of its transaction logs, it puts those into our change data capture framework, which is really just stream processing, and then they get synced into our target database.
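That CDC pipeline can be sketched as a tiny consumer loop; the event shape below is invented for the example and doesn't match any particular CDC tool's wire format:

```python
def apply_change_events(events, target_db, transform=None):
    """Consume change-data-capture events in log order and sync a target.

    Each event is a dict like {"op": "put"/"delete", "key": ..., "value": ...};
    this shape is made up for the sketch.
    """
    for event in events:
        if event["op"] == "put":
            value = event["value"]
            if transform:
                value = transform(value)   # optional enrichment step
            target_db[event["key"]] = value
        elif event["op"] == "delete":
            target_db.pop(event["key"], None)
    return target_db
```

Because the events arrive in log order, replaying them always converges the target to the source's state, which is what makes the two databases eventually consistent.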
Next, we've got fanout. Fanout is one that you should know really, really well, because it comes up in probably 50% of these problems. The idea here is that, rather than do a really expensive read query, because we have a lot of read-optimized applications in real life, what we want to do is more work on the write path: basically take the data that we're about to write and deliver it directly to all of the users who are interested. Examples of this include push notifications, a social media news feed, something like figuring out who our mutual friends are every single time that we make a new friend connection, and also stock price delivery, where we've got many interested parties when the price of, say, a Google stock goes up or down by a dollar.

So one option is we can deliver all this data to all interested users synchronously, by doing it in some sort of API: maybe we hit an endpoint, we go hit 100 other servers to deliver the message there, and then we report back to the user. But the majority of the time this is going to time out, because there are too many places to reach out to, so we probably have to do it asynchronously in the background. What a surprise, we're using stream processing again. You guys rag on me for using Flink all the time, except in literally the majority of these problems it makes the most sense to use stream processing, and when you're using stream processing you're often better off using a stream processing framework for better fault tolerance. Anyways, the idea here is you want to put your message in a log-based message broker, you have some sort of consumer that receives it, it figures out where the message needs to be sent, and then it does so accordingly. What does that look like? Well, basically, like I said, a user posts something, we've got some sort of broker that has all the tweets, and right after that broker someone is consuming all those messages and delivering them to some sort of cache for every interested user.

Now, one caveat that we'll often have with this problem is the popular message problem. So let's imagine I'm Jordan has no life, very famous YouTuber, and I've got many, many Twitter followers; I think I have maybe 50 at most, but okay, let's imagine I'm Ronaldo and I have a million Twitter followers. The idea is that you can't really deliver a post to a million Twitter caches, so for messages like that, the stream consumer is going to notice how many places it has to deliver to and is instead going to opt to just deliver the message to some sort of popular message cache. This way we ultimately build out a hybrid approach: all of the messages that are not super popular get sent to all of the individual users that are interested in them, and for popular ones, those same individual users can occasionally pull from the popular cache and generate the necessary state there.
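That hybrid fanout can be sketched like this; the popularity threshold and all of the data structures are made up for the example:

```python
POPULAR_THRESHOLD = 3  # made-up cutoff for the sketch

def fan_out_post(author, post, followers, feed_caches, popular_cache):
    """Deliver a post to every follower's feed cache (fanout on write),
    or, for popular authors, to a shared popular-message cache that
    followers pull from on read."""
    audience = followers[author]
    if len(audience) <= POPULAR_THRESHOLD:
        for user in audience:
            feed_caches.setdefault(user, []).append(post)
    else:
        popular_cache.setdefault(author, []).append(post)

def read_feed(user, follows, feed_caches, popular_cache):
    # Hybrid read: precomputed feed entries, plus a pull from the
    # popular cache for any big accounts this user follows.
    feed = list(feed_caches.get(user, []))
    for author in follows[user]:
        feed.extend(popular_cache.get(author, []))
    return feed
```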
there cool another pattern that we see come up all the time is proximity search
come up all the time is proximity search basically this is find all of the close
basically this is find all of the close points in a database to you we already
points in a database to you we already discussed this right you want to be
discussed this right you want to be using a geospatial index um examples
using a geospatial index um examples where this comes up is Uber Yelp Airbnb
where this comes up is Uber Yelp Airbnb find my friends I've seen literally
find my friends I've seen literally every variation of this problem on
every variation of this problem on YouTube because people just want to
YouTube because people just want to steal views the idea is that you can
steal views the idea is that you can build indexes on latitude or longitude
build indexes on latitude or longitude but like we already mentioned that's a
but like we already mentioned that's a problem that's not going to work so what
problem that's not going to work so what you use instead is a geospatial index
you use instead is a geospatial index now of course there's a ton of data in
now of course there's a ton of data in these database tables and you need to do
these database tables and you need to do some sort of sharding how do we Shard
some sort of sharding how do we Shard well typically you want to Shard kind of
well typically you want to Shard kind of based on that geospatial bounding box
based on that geospatial bounding box right that being said not all geospatial
right that being said not all geospatial bounding boxes are equal there's a lot
bounding boxes are equal there's a lot more data points in some place like New
more data points in some place like New York than there might be
York than there might be somewhere like Wyoming and so you have
somewhere like Wyoming and so you have to account for that uh ultimately what
to account for that uh ultimately what you'll do is you'll probably make a
you'll do is you'll probably make a bunch of very very small partitions and
bunch of very very small partitions and then potentially dynamically move those
then potentially dynamically move those partitions to equal out the load over
partitions to equal out the load over time as you get a better sense of where
time as you get a better sense of where the load is actually going to and coming
the load is actually going to and coming from next we have job scheduling uh this
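One simple way to picture geospatial bucketing is a fixed grid: map each point to a cell, and search that cell plus its neighbors. Real systems use geohashes or quadtrees that subdivide dense areas (the New York vs. Wyoming problem), but the fixed grid shows the core idea:

```python
def grid_cell(lat, lon, cell_degrees=1.0):
    """Map a point to a coarse grid cell; points in the same cell can
    live on the same partition. Real systems use geohashes or quadtrees,
    which refine dense cells; this fixed grid is just for illustration."""
    return (int(lat // cell_degrees), int(lon // cell_degrees))

def nearby_candidates(lat, lon, points, cell_degrees=1.0):
    # Check the point's own cell plus its 8 neighbors, so matches near
    # a cell border aren't missed.
    home = grid_cell(lat, lon, cell_degrees)
    cells = {(home[0] + dx, home[1] + dy)
             for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
    return [p for p in points
            if grid_cell(p[0], p[1], cell_degrees) in cells]
```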
Next, we have job scheduling. This one comes up a lot in the literal build-a-job-scheduler problem, and it also comes up in problems like Netflix and YouTube, because when you upload a video you have to enqueue a bunch of jobs to encode it in many different resolutions, and you probably want to be doing all of those offline rather than synchronously; in that way it's kind of also a fanout problem, but same idea. Basically, the gist here is we've got a bunch of agnostic tasks that we need to get run; we don't care where they run, it just needs to happen.

So our two options are as follows. We have two types of brokers, and we know we want to do this asynchronously. If we use a log-based message broker, what we would probably end up doing is making a bunch of partitions of it, and then you would have a consumer reading from each partition, i.e. one of the workers in our cluster that runs the jobs. The problem with that is that messages in a log-based message broker are read in order, so if there's one particularly slow job in a partition of the message broker, we are not going to be able to run all of the jobs behind it; they're just going to get completely bottlenecked, and it could be the case that the other workers have nothing to do in their particular partition of the log-based message broker. So a better approach is to actually use an in-memory message broker here, and use that to basically round-robin all of the jobs from worker to worker; a worker can then pull a job off the queue the moment that it becomes available.
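The worker-pull model with an in-memory broker can be sketched as a plain FIFO queue that any idle worker pulls from, which is what avoids the head-of-line blocking described above:

```python
from collections import deque

class JobQueue:
    """Toy in-memory broker: any idle worker pulls the next job, so one
    slow job never blocks the jobs behind it the way a slow message in a
    per-worker log partition would."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

    def pull(self):
        # A worker takes the oldest waiting job the moment it's free.
        return self.queue.popleft() if self.queue else None
```

The trade-off, compared to a log-based broker, is weaker durability: if this process dies, the queued jobs go with it.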
One more pattern to discuss here is aggregation. This is basically: okay, we've got a bunch of messages being sent out by millions of different servers, and they're probably going to be aggregated on some sort of key, or maybe joined with other messages. How do we want to do this? Well, example situations are metrics or logs: if I'm publishing a bunch of metrics, but the metrics for a particular key can come from many different servers, maybe I want to aggregate them and make sure they're in the same file; same thing for logs. If we have multiple components of a particular job that are all being run, and once they're done they need to be aggregated, this is another example. Data enrichment, where I'm publishing a bunch of messages and they need to be joined with a bunch of other published messages, is an example too.

One option is you just put everything in a database, corresponding to the server it's coming from, and then you run a batch job and do all the joins you need. That does work; wow, I'm really losing my voice; that does work, however, it's pretty naive in the sense that it's going to take you a lot longer to actually see the results that you want, and you're going to do a lot of work in order to get that data. Number two is you can do stream processing. If you're smart about how you actually partition this data, and you partition your log-based message brokers on the actual key that you want to do the aggregation on, then you can send those messages to the proper broker, they can be consumed by the proper stream processing instance, perhaps Flink, perhaps Spark Streaming, anything like that, and then from there you can export them to wherever they need to go.
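The keyed-aggregation step that a stream processor does per partition can be sketched as a simple per-key accumulator; since the broker is partitioned on the key, one consumer owns each key and no cross-server merge is needed afterwards:

```python
from collections import defaultdict

def aggregate_metrics(messages):
    """Aggregate a stream of (key, value) metric messages per key, the
    way a stream-processing instance reading one partition would."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for key, value in messages:
        bucket = totals[key]
        bucket["count"] += 1
        bucket["sum"] += value
    return dict(totals)
```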
Okay, another pattern to talk about is idempotence. Basically, you have some task, you're running it, and you don't want to see the output of it more than once. Some good examples of this would be: if we're running a job in a job scheduler, how do we ensure that the output of that job is only reflected once in our system? Number two: let's imagine I'm Amazon and I'm sending out a bunch of confirmation emails to users when they make an order; how do I make sure that doesn't happen multiple times? Systems can fail, they come back up, they may retry their jobs, and we don't want to see the result more than once. Sometimes in this situation you have to do a two-phase commit; like, if I'm not the person who controls the email service, and I'm not controlling a user's inbox, it's very hard for me to actually stop myself from sending a second email. That being said, one way that you can potentially do this is using something known as an idempotency key. The computer that is going to run this job, and may accidentally run it again, can use a unique ID associated with the job and check whether it has seen that ID before. That might mean putting the key in a database and reading from the database, or, depending on where you're going to run the job, if you know that you're always running it on a certain system, maybe you can cache it locally.

Another thing to note here: in addition to idempotency keys, you also have this concept called fencing tokens. Fencing tokens are similar to idempotency keys, but they're actually on the receiving side of the message. So rather than being on the side of the computer doing the job, it's the side that is receiving the output of the job, and if it sees that same unique key associated with the job, it can reject it and say, hey, I've already seen this before, I don't want to consider this output.
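A minimal idempotency-key check on the sender side might look like this, with a plain set standing in for the database of seen keys:

```python
def send_email_once(order_id, send_fn, seen_keys):
    """Use the order id as an idempotency key: skip the send if this key
    has already been processed. `seen_keys` stands in for the database
    (or local cache) where processed keys are recorded."""
    if order_id in seen_keys:
        return False          # duplicate: a retry already handled this
    send_fn(order_id)
    seen_keys.add(order_id)   # record only after the send succeeds
    return True
```

Note this isn't bulletproof: if the process crashes between the send and recording the key, a retry still duplicates the email, which is part of why the receiving side may also check a fencing token.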
Okay, I think the last pattern that we've got here to discuss is going to be durable data. Again, we've already kind of discussed this with coordination services: you have a bunch of data that can't be lost once it's written, and a good example of this would be something like financial transactions. One option is you have a bunch of database replicas and you replicate to all of them synchronously. That'll work, but it means that if any of them go down you can't submit any writes; your system is not fault tolerant. So really our only solution here is to use, yet again, some sort of distributed consensus algorithm. Consensus is going to allow us to build a distributed log, and as a result we can basically build a database view on top of that distributed log. Even better than that, however, would be coupling this with something like change data capture, because the log itself is really nice in the sense that it ensures all of our data is persisted even in the face of hardware failures; however, what's nice about change data capture is that for something like a financial transaction, where maybe you're writing all of your data into a ledger, it would be a lot better if an individual user can really quickly find all of their own transactions. And so, rather than just reading through the entire ledger and finding all the transactions that apply for a given user, what we can do is make a user-specific index, and we can do so with change data capture; we already discussed how that works.
Okay guys, that's all of it. My voice is gone, I feel like, but yeah, I hope you enjoyed the video. I understand that this is really high-level and kind of handwavy for a lot of the topics that we discussed here, and my goal is not to go super in-depth on these topics; that's why I have literally 200-plus videos on this channel, to go ahead and do all of that. So I do hope that if this kind of interested you, or you were like, I didn't really understand one particular topic Jordan was talking about, that you'll actually go and find the videos on my channel that relate to it, because all of these are explained in a lot more depth elsewhere on here. Besides that, again, thank you guys so much for 50,000 subscribers. This channel has supplied my life with a purpose that I didn't really know I could have, and I've derived a much bigger sense of meaning than I thought I could through just learning a lot about computer science. Not only do I feel like I'm getting to learn a ton, and that's really rewarding, but hearing you guys' success stories has made all of this more than worth it for me. And yeah, I'm just shocked and appalled to be here; I'm truly humbled by all of you. That's it, this is amazing, this is great, I'm going to keep doing it, I'm having a lot of fun with this channel and I don't plan on stopping. Have a great day, guys.