This podcast episode discusses the first part of Martin Kleppmann's book "Designing Data-Intensive Applications," highlighting its foundational concepts of reliability, scalability, and maintainability in software systems. The hosts emphasize the book's value in providing a deep understanding of trade-offs and fundamental principles, even as the tech landscape evolves.
There's a lot of value in depth of
knowledge and knowing a particular area
of your field really well. There is also
tremendous value as a software engineer
in your breadth of knowledge and just
Hey there, you're listening to Book
Overflows, the podcast for software
engineers by software engineers, where
every week we read one of the best
technical books in the world in an
effort to improve our craft. I'm Carter
Morgan and I'm joined here as always by
my co-host Nathan Tops. How you doing Nathan?
>> Doing great. Hey everybody.
>> Well, thanks for uh tuning in everyone.
As always, like, comment, subscribe if
you're on the YouTube video, if you're
on the Spotify video, anywhere that
really like to comment or like or
subscribe. Uh share the podcast with
your friends and co-workers. And uh you
know, really helps the podcast. And you
can also book time with us on the Leland
platform if you'd like to uh get some
one-on-one career coaching with Nathan
or I. And you can also join our Discord.
We started the book overflow discord.
There's a link to it in the comments
right now. That's been actually really
fun to have lots of fans trickle in and
to start more of a conversation.
>> Yeah, we've got I think 19 seats left
for the what I'm calling the alpha
testers role. That will be uh you'll be
enshrined for all of eternity. Um as one
of the first one of the first 100 uh
folks on the discord. I think we'll also
do like a beta testers. So that'll be
like the second tier of uh early
adopters. Um, after that I have no idea
what we're going to do, but we'll have
some fun perks for for joining the the
server early. So, come come hang out.
>> Yeah, this is a pretext to the custom
crypto coin Nathan and I are going to
launch and then do a rug pull on all of
you. So,
>> yeah, stay tuned for that. [laughter]
>> Does the rug pull work if you announce
it in advance,
>> you know? Um, I think that the uh I
think the SEC really appreciates it. You
know, [laughter]
>> My brain is too college-football-poisoned. SEC? That seems like Southeastern Conference to me. What are you talking about? And then I'm like, oh, I I know
what you're talking about. Um, well,
we're not here to joke about the
Securities and Exchange Commission. We
are here to cover what is easily the
most requested book of all time on this
podcast, and that is Designing Data-Intensive Applications by Martin Kleppmann.
We're excited to tackle this. We had
been holding off on this for a while
because there's a uh there's a part two,
not a part two, a second edition coming
out. Um and this book is old. What'd you say, Nathan? 2017 is when
>> I think it came out. Yeah, it came out
in 2017.
>> Yeah. So, um we've been hoping to do the
second edition, but it kept getting
delayed and delayed and you know, we
thought it's time. We got to read this.
We gotta tackle it. And we're really
excited to do it. So, this is a a first
for Book Overflow. This is going to be
our first four-parter. So,
>> yeah,
>> tune in over the next four episodes
while we cover designing data intensive
applications. And you know, was uh
funny, Nathan, you shared on the
Discord. I thought this was really fun
trip down memory lane. You shared the
original Reddit post I made on the
Georgia Tech subreddit looking for a
co-host for a a maniac's idea of a
podcast to read a new software
engineering book each week. And yeah, I
I thought that was fun.
>> Yeah. Yeah. So, that's some of the some
perks and the extras of being on the
Discord is you get first of all, you can
ask us questions and we're, you know,
small enough of a podcast that we'll
actually answer right now.
>> Um, but secondly, yeah, there are little Easter eggs and uh yeah, I was
just I was searching for I I like to
search periodically if anybody ever
>> mentions the podcast or has questions
and things and uh I was just on Reddit
looking and I was like, "Oh, yeah, look
at that. That's the original the
original post that Carter made. That's
pretty cool." Well, I was looking at it
and I was trying to see like, well, how
how far is the podcast strayed from the
original vision? And the answer is not
very far. It's actually pretty close.
But I I was bringing that up because one
thing I I mentioned on that is I said,
look, this isn't a book report. It's
we're not going to do a faithful
dissection or retelling of everything we
read. Instead, I wanted it to feel like
two co-workers chatting over lunch. Um,
and we're just going to talk about kind
of the most interesting ideas. And uh
I'm just doing that as a disclaimer, because if you're tuning in to these episodes for Designing Data-Intensive Applications thinking this is the authoritative retelling, that "if I listen to this podcast, I won't need to read the book" — that's never been the aim of the
podcast. We are we could not possibly do
this book justice over the course of
four episodes. You'd probably need 50
episodes to fully discuss everything in
this book. We're just going to talk
about uh the meat of it, at least uh
what we interpret as the meat, the most
interesting stuff. And we're going to
try to do it justice because this is a
legendary book and we're so excited to
talk about it. I guess I'll introduce
the book for anyone and the author for
anyone who's not as familiar with
designing data intensive applications.
It's written by Martin Kleppmann. Uh the author introduction is: Martin Kleppmann is an associate professor at the
University of Cambridge where he works
on distributed systems and local first
collaboration software. Before academia he was in the trenches: he co-founded Rapportive, which was acquired by LinkedIn
in 2012 where he worked on large scale
data infrastructure. He's also one of
the people behind Automerge, an open
source library for building
collaborative applications.
The book introduction is data is at the
center of many challenges in system
design today. Difficult issues need to
be figured out such as scalability,
consistency, reliability, efficiency,
and maintainability. In addition, we
have an overwhelming variety of tools
including relational databases, NoSQL
data stores, stream or batch processors,
and message brokers. What are the right
choices for your application? How do you
make sense of all these buzzwords? In
this practical and comprehensive guide,
author Martin Kleppmann helps you navigate
this diverse landscape by examining the
pros and cons of various technologies
for processing and storing data.
Software keeps changing, but the
fundamental principles remain the same.
With this book, software engineers and
architects will learn how to apply those
ideas in practice and how to make full
use of data in modern applications. And
you're going to hear over this podcast I'm going to pronounce "data" both ways interchangeably uh without any reason.
So uh buckle up. Nathan, we have just
finished part one of Designing Data-Intensive Applications. This book is
divided into several parts. I don't know
how many parts to be totally honest. Um
and we chose part one because it fit our
cadence. Is it four parts?
>> I think it's actually in four parts, though I don't think all the parts are the same.
>> Some of them I think are going to be a bigger stretch than others, and we'll see what happens.
>> Part one was almost exactly a quarter of
the book so it worked out great. So we
read part one. Nathan, give me your your
takes on uh part one.
>> Yeah. So um I wrote that this was this
was like it was almost like Martin Kleppmann
was writing a letter to me of like
things I need to think about. Uh this is
about the right level of depth and
breadth that I like in a book like this.
Um cuz he spends a lot of time being
like, "Okay, well, we can't go deep into
the inner workings of a B-tree, but
he'll give you enough of a, oh, here's
how it operates. Here's the
efficiencies. Here's the inefficiencies.
Here's the trade-offs." I love the
structure of this book. I I felt that
the pacing was really easy for me to
wrap my head around because there's a lot of trust I had for Kleppmann, and that he would
declare a through line. Then he would
give these like examples of why
something exists, the trade-offs, what
people were trying to do with something,
um where it kind of fell short or where
some strong suit is. And then he would
kind of weave this in. I don't know. It
just it felt very narrative and that's
not common especially for something
that's this ambitious. It's a very
ambitious book. I think it easily could
have fallen on its face and it doesn't
and I understand why people have asked
us to to read this book. Number one,
it's a very large book and you're like
should I even read this book? Um but the
other part is that like does it still
hold up? Right? It's written in 2017. Um
is it still worth reading? And uh I I
will I will give you a little spoiler
which is like yes it is with a big
asterisk that says but there are some
missing pieces. The world has moved on
uh in some of these debates and some of
these arguments. And I'm actually like
I'm actually pretty excited to explore
this today uh with you Carter and so I'd
love to hear your general thoughts.
>> Yeah I completely agree on the pacing
and that's something I hadn't even
thought about until you mentioned it but
yeah it is fantastically written. Um, we
have read books in the past where it's a
little like, okay, like you're you're
going too deep or you're spending too
much time on this or you're we're in
chapter nine now, but you're mentioning
stuff that was like in chapter 3, like
we didn't read chapter 3. Um, no problem
here at all with any of that. Uh,
I mean, I'll say with this, I am
listening to the audiobook of this. I
believe you're doing the audiobook as
well, Nathan.
>> Yeah. So yes, and I'll tell you, I'm using the Alex Hormozi method, which
I know that that might make some
people's eyes roll because he's like a
big, you know, sales and marketing kind
of person.
>> He actually recommends, if you're going to listen to the audiobook, to also have the physical book and do both at the same time. I did not do that.
>> Interesting.
>> Um because he says it helps with
retention, but um I primarily listen to
audiobook. I think I listened to like I
listened to like a three and a half hour
chunk of it in one sitting because I had
to drive from San Jose down to Oh, where
I live these days in Costa Rica. Um so I
had a big chunk of it by myself with
just driving, a very monotonous drive.
Um beautiful but monotonous and um so
yeah, I've listened to it in big chunks
like that. Uh, but I went back because I
will tell you if you don't read the
audio book or I mean sorry read the uh
Kindle book or have the physical copy
there's a lot of stuff to miss. It's
there's very like deep graphical
representations of some of this stuff, >> right?
>> Um I did well with the audio book though
because a lot of this is review for me
like I I knew a lot of these concepts.
I've run into to managing this as a
platform engineer and an architect. Um,
if you're reading, if you're hearing
some of these concepts for the first
time, I think the audio book would be
incredibly difficult. Um, I'll just put
it that way. So, what was your
experience with the audio book?
>> Well, so I'll say that that some of this
I don't think there's been a single
concept I've heard that that I'm hearing
about for the very first time. Like talk
about like LSM-trees and B-trees and like
okay, I've heard about that. We talk
about like different data schemas like Avro. I'm like,
okay, okay, I got that right. um the
whole first section about like uh
defining reliable, scalable,
maintainable applications, like that
very much felt like review to me as
well. Um
>> but I think you're right. I have not
been able to read any of the actual book
for this. I I really like what you were
saying, like have them both at at the
same time. I've been listening to this I
I bike to and from work and so I've been
listening to this on my uh my bike
commute. Um, and there are times where I
can sort of tell like, ah, dang it.
Like, I wish I could pause and and
look at that section again, but you
know, we've got to record an episode
every week, and so I haven't been able
to go back and read uh revisit as much
as I'd like. But here's what I'll say.
We've talked about this on the podcast
before. There's a lot of value in depth
of knowledge and knowing a particular
area of your field really well. There is
also tremendous value as a software
engineer in your breadth of knowledge
and just being exposed to all of these
concepts. Even if that exposure is just
you saying, "I've heard that once
before. I'm aware that this exists." Um
I I've been doing a lot of uh
interviews. I mentioned we've been
trying to hire for a position, and we've brought a lot of candidates in
and my company keeps a pretty high
talent bar and we're uh we would rather
reject candidates than hire one we think
uh won't really level us up as an
engineering organization. Um, and I'm
just really shocked with a lot of candidates,
maybe not shocked, but like you can tell
that a lot of candidates kind of get
into these interviews, they're like,
especially system designs, they're like,
I didn't know I was supposed to know any
of this, right? Or they don't even know
where to begin.
>> Um, because their their life is just
open up the codebase,
make your changes, and then submit the
PR and that's it. And so I think for
engineers like that, listening to this
audio book, even if a lot of it is a
little like in one ear and out the
other, just
>> being made aware that there's a whole
world out there and
so many of these concept like just
having any inkling of understanding of
what's going on at the lowest level of
your uh data because that's what this is
all about. Part one is called
foundations of data systems.
>> Yeah. Um, it's really really helpful.
And so I I'm not like starting from
zero. I'm I've been familiar with a lot
of these concepts, but I wish I could
say I'm an expert after listening to
this audio book, you know, but but
obviously I'm not.
>> And I think this is an important point,
which is that um sometimes I'll do a
couple things. First of all, yes,
listening to audiobooks is its own skill
and talent. I think all of us can have
our minds drift. We can do this when
we're reading with our eyes, too. Right.
>> Right. Um, and with reading with your
eyes, you just kind of go back and go,
"Oh, you know what? I didn't really
comprehend those last two paragraphs that I looked at." With audiobooks, you have to be pretty aggressive, if you have the opportunity, to like rewind or hit a
bookmark. Sometimes I'll kind of do
those things. Not always easy to do,
especially if you're riding a bike or driving,
>> things like that. What I will say is
that especially with this book coming
away with hey, I didn't fully grasp
everything in this one section,
but I do remember that there was a
trade-off when it came to
uh high transaction count on disks. And
so you just kind of make that mental
note and just be like, hey, if this ever
comes back up in the future, right,
>> I'll just know before I like weigh in
one way or the other, I hear somebody
talking about it, I'll go back and read
DDIA. I'll go look at that section. I'll
go deep dive into understanding like
Why? What are the characteristics of fault
tolerance for this one technology that
we're looking at and does that hit the
risk profile that's okay for us? Right?
Like this really this book is really
kind of this higher order thinking of
like every one of these technologies has
a strong suit and a weakness. Um and you
need to understand the business problem
you're trying to solve or the reality of
the hardware available to you or where
you think you're going to go with
scaling. And you you need to think about
this and say, "Okay, well actually this
puts an undue risk on our business
because if this really weird edge case
of fault tolerance comes out and it
corrupts our data, we could lose
millions of dollars, right?" And that's
the thing that nobody can just solve for
you, right? when when you're kind of
looking into these types of problems and
this is what I love about this book is
it kind of gives you a sampling of like
all these little real world things like
oh, this team from Facebook said that the schema evolution thing didn't work very well and so they came up with this
approach and you're like oh that's very
clever you know I understand why they
made that decision so that's I don't
know I I I know we're going to fanboy
out and and talk about these these parts
of the book but Um, yeah, it's
>> I hope we can get Kleppmann on. I don't know
if he does a lot of media or anything.
Maybe he he would because he he wants to
promote the second edition.
>> Um, right.
>> But we're devoting four episodes,
Martin, to discussing your book. So,
we'd love for you to come on.
>> Hey, and and I'll and I want to put this
up here. If the second edition does come
out this year, because it has been put
off a little bit. It was supposed to
come out last year. If it does come out
this year, I think it would be really
cool to do a follow-up episode because
>> Oh, I think so. Yeah. This is such an
interesting thing and I think it would
be really cool for our audience to know,
well, okay, I had first edition sitting
on my shelf for five years and I never
got around to reading it. Should I go
buy the second edition? And we could
weigh in. We can be like, oh yeah,
you're really missing out on, you know,
uh, vector databases or whatever he's
going to cover in the new version. Uh,
hey gang, quick announcement. I'm
excited to share that I'm relaunching
Rojo Roboto, my platform engineering
consulting practice. If you need help
shipping faster with better
infrastructure, DevOps, engineering
enablement efforts, let's talk. Book
Overflow listeners get 10% off their
first engagement. You just go to rojoroboto.com/bookoverflow. That's r-o-j-o-r-o-b-o-t-o dot com slash bookoverflow.
Okay, great. Now, back to the episode.
So, we'll see. Well, this this first
part might be the one most immune to
second edition changes because it's all
about the foundations. I really enjoyed
chapter one, um, which says, look,
>> the whole point of kind of understanding any of this is that you want to build reliable, scalable, and maintainable applications. And so he devotes a little
bit of time talking about um okay, if
we're going to talk about reliable,
scalable, maintainable, let's define
that. Let's define what reliability,
scalability, and maintainability is. As
far as reliability goes, I thought
something very interesting he points
out. He says faults do not equal
failures. He says a failure is when your
service stops working for the end user,
but a fault is something that could
potentially lead to a failure. And our
job is to design fault tolerant systems.
Um, you know, so a fault could be
something physical. I mean, you know, a
solar flare, we joke about solar flares
all the time at work. We're a startup now, so we can't always dig in, but when I was at the cloud provider, if we had some deviation in the system that caused P99 latency to increase for 5 minutes or whatever, right? We would devote a significant amount of time to understanding what that was. We're getting much better at my current job at devoting time to that, but it's like 45 minutes. And if after 45 minutes we
try to figure out I guess here's what
I'll say. We try to figure out
why the system reacted in the way it did,
>> but sometimes we can't devote nearly as
much time to figuring out what caused it
to begin with. So what we'll be like,
okay, we were hit with a bunch of
queries, right? Um, we can't spend a ton
of time figuring out what exactly were
those queries or why they why we saw an
increase at this moment, but we can
figure out why the system when subjected
to those queries was performing poorly.
But anyhow, and so at a certain point
when we can't when when we can't devote
time to figuring out the the why or like
what caused it, we'll we'll say like
it's a solar flare. Like that's probably
what happened. some solar flare hit the
system and you know um but the whole
point is that like you should be
designing systems that are fault
tolerant to reduce failure. So you
should be designing things that are you
know tolerant to solar flares. Um you
know malformed data in your database
could be considered a fault and if your
whole system blows up when it encounters
malformed data that's a problem. Um,
this is best exemplified by Netflix, the
whole chaos engineering approach, right?
Let's purposefully kill servers um and
see how the system responds. Um, I I
would love to work at a place that was
much more aggressive about things like
that. Um, but I've just never been able
to work at somewhere that is is running
those sorts of simulations um, at least
with any sort of frequency. Um, so I
mean I thought those were interesting
points about reliability.
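Kleppmann's fault-versus-failure distinction is easy to sketch in code. Here's a minimal, hypothetical Python example (the record format and the `id` field are made up for illustration): a malformed record is treated as a fault to be tolerated, rather than being allowed to escalate into a failure that stops the service for the end user.

```python
import json

def process_records(lines):
    """Process newline-delimited JSON records, tolerating malformed input.

    A malformed record is a *fault*: we count it and move on.
    Letting the exception propagate would turn it into a *failure*:
    the whole service stops working for the end user.
    """
    results, faults = [], 0
    for line in lines:
        try:
            record = json.loads(line)
            results.append(record["id"])  # hypothetical field
        except (json.JSONDecodeError, KeyError):
            faults += 1  # tolerated fault, not a failure
    return results, faults

ids, faults = process_records(['{"id": 1}', 'not json', '{"id": 2}'])
# ids == [1, 2], faults == 1: the bad record did not take the system down
```

The same idea scales up from a try/except to chaos-engineering-style fault injection: deliberately introduce the fault and verify the system degrades instead of failing.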
Yeah. And this is this gets back to um
something I I really appreciate about um
various software communities. So how you
handle faults, how you handle and do
fault tolerance really is cultural,
right? It it really has to do with the
domain of the problem you're solving.
For instance, um I'll bring up Go, which
again is a I'm I'm a fanboy, but in Go
errors are just values. There is no sort
of like try catch. there is no sort of
like exception flow control. Uh there is
a panic that is like sort of
existential. So I guess technically
there is some of that but you don't use
that in your normal day-to-day life. One
of the things that's interesting about
this is that you say okay well errors
will happen and it makes you kind of
upfront think about well how am I error
handling? Am I just wrapping it and
shooting it up the stack uh and letting
somebody else upstream deal with it? Or
is it like the John Ousterhout method where you say, you know, you handle errors out of existence? Uh, good fault-tolerant systems figure
out what that right contract is. Can I
build something that is resilient in
that the errors actually aren't
something that goes to the end user?
It's just some part of the fault
tolerant system that that goes into
place. Again, obviously there's
trade-offs. There's there's things but
um these are fun problems to think about
and I think this first section of the
book also talks about things like well
there's hardware faults there's software
errors there's human errors right like
if you have some manual part of your
system and uh I've seen this time and
time again with startups that I work
with where you know you start somebody
is the deploy manager and they SSH into some machine and you know trigger some
magic job and they try to get somebody
else in the company to do that and they
don't realize that Like step three is an
undocumented step, right? Step one and
two everybody gets it. Step three is
this like kind of weird edge case that
you know Sally always checked and nobody
realized that Sally checked it and
that's why her deploys were always
perfect. Um and so like there's all
these ways that faults uh the sort of
failures and fault tolerance and these
things can be built into a system. And
uh yeah, you have to again this is what
I love about this book. you have to
think. And the whole point is
>> he'll give you these little thought experiments and then be like, oh, okay. The other section in this first chapter, on scalability, I thought was really interesting, where he's
basically like talking about Twitter um
and he was like oh yeah there's this and
this is a classic I would say this is
like a classic systems problem that will
come up from time to time in your interviews
>> um you say how do you aggregate a
feed to everyone, right? Um, and I and I
loved it because, you know, it was like,
okay, well, I do some query and I um I
look for everyone who I follow and then
I go and try to grab the latest piece of
information and then I like shove all
these together in a timeline and then
display that timeline to the end user.
>> Um, the problem is, this is like a one
to many relationship. This is like very
expensive like join operation that
happens in a database. And of course
he's using this as sort of a layup to
talk about all these other topics in the
book. Um and he realized that actually
there's a much better way to do this by
inverting this. And when someone posts
they post rarely but they read a lot.
Right? So you post rarely. >> 12,000
>> tweets a second write operations 300,000
read operations a second. So, so for
most people who don't tweet that often
and don't have that many followers, it
makes way more sense for them to have
this sort of like filtered timeline view
where you kind of have this thing and it
makes it much more much easier for this
to come in. Except
there's this other problem which is what
happens when you're a super super
popular person like you're Elon Musk on
Twitter or you're you know some other
like you know millions and millions and
millions of followers. uh when you post
that actually breaks the opposite
direction because all of a sudden your
one tweet will update millions of
people's sort of cache timeline. Um and
so they actually had the pendulum swing
back to the original pattern for that
for a special subset and just like a
certain amount of follower base. And it
was just like this it was kind of cool
to see like okay well here's this
engineering problem here's how we fixed
it. Oh but actually there's this edge
case that actually is so existential
that we have to go back and fix it a
different way. And then we have this
hybrid approach and um I think everybody
who's seen cool software sees these
things in reality right there's this
some sort of like weird you look at it
you're like why is it this way and
you're like oh well actually it makes
sense given the constraints that I have
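The hybrid Twitter approach described above can be sketched as a toy model. This is not Twitter's actual implementation; the data structures, function names, and the tiny celebrity threshold are all illustrative assumptions. Ordinary users fan out on write to their followers' cached timelines, while high-follower accounts are stored once and merged in at read time.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # illustrative; the real cutoff would be far higher

followers = defaultdict(set)          # user -> set of followers
following = defaultdict(set)          # user -> accounts they follow
timelines = defaultdict(list)         # user -> precomputed (cached) timeline
celebrity_tweets = defaultdict(list)  # tweets merged in at read time

def follow(follower, followee):
    followers[followee].add(follower)
    following[follower].add(followee)

def post(user, text):
    if len(followers[user]) >= CELEBRITY_THRESHOLD:
        # Popular account: fanning out to millions of cached timelines
        # is too expensive, so store once and merge at read time.
        celebrity_tweets[user].append(text)
    else:
        # Ordinary account: fan out on write, since writes (~12k/s)
        # are rare relative to reads (~300k/s).
        for f in followers[user]:
            timelines[f].append(text)

def read_timeline(user):
    merged = list(timelines[user])              # cheap cached part
    for followee in following[user]:
        merged.extend(celebrity_tweets[followee])  # read-time merge
    return merged

follow("alice", "bob")    # bob: 1 follower -> fan-out on write
follow("alice", "celeb")
follow("carol", "celeb")  # celeb: 2 followers -> read-time merge
post("bob", "hi from bob")
post("celeb", "hi from celeb")
print(read_timeline("alice"))  # ['hi from bob', 'hi from celeb']
```

The design choice is exactly the trade-off the hosts describe: pay the cost at write time for the common case, and fall back to the expensive read-time join only for the small set of accounts where write-time fan-out would explode.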
>> when he talks about scalability he says
like it's not binary like system a
scales or it doesn't scale. Systems scale in different ways, right? And writing a write-heavy service is very [clears throat] different from writing a read-heavy service, right? And also in
terms of like scalability like I don't
know stick up a server and all it has is
a health check endpoint, right? And then gate
that behind you know like put that on
Kubernetes like boom it scales you can
probably handle a million requests per
second right if if you do that um but
it's uh there there's not much to it um
he talks a lot about performance metrics
uh percentiles over averages you know so
you got your P50. And you know what's so dumb about this is that I have
I I know about P95, P99 latency, right?
And um if you're familiar with the
podcast, you know that Nathan mocked me
at my current place because we did not
have like OpenTelemetry set up or any sort
of metrics. You're like, what are you
doing? Like how do you know if the
system is stable? And so I got that all
set up. It looks great now. In fact, uh
just recently I got also our MongoDB
drivers exporting automated metrics
which has been great because we've been
doing the whole month of January, my
project has been just uh performance
improvements for the site and um and we
had we had this like weird library that
was serving as like the application
bottleneck and so we got that removed
which is great. So that's no longer the
bottleneck but it's so funny because now
the bottleneck has moved to MongoDB and
so we are
>> Isn't that funny?
>> I know right. And so it's cool that like
this this former library which had been
such a pain for us. We finally solved
that. But now I'm doing all this [ __ ]
stuff. So I had to get these [ __ ]
driver metrics automatically exporting.
But anyhow, so yeah, I I I have P95 P99
latency set up. But I was monitoring
average latency and then this book
pointed out P50 latency, median latency.
I'm like, oh, why do we not have a
[laughter] dashboard? You know, why
don't we have a panel for that on our
dashboard? So I got that set up and I
was like, hey, like our P50 latency is a
lot better than our average latency. But
in in part because our P99 is just too
high. Um, but this is something I've
been thinking about all month because
again my whole job this month has just
been getting uh latency down on the
site. Amazon determined, this is what
Martin says in the book, Amazon
determined that every 100 milliseconds
of increase in latency decreased sales by 1%.
>> That's nuts.
>> Insane, right? That's 100 milliseconds.
One-tenth of a second leads to a 1%
decrease in sales. Um, and so it's
interesting because like I'm jealous of
a company like Amazon who could afford
to run those sorts of experiments and
and come up with that definitively. I
don't have any proof. I don't think I'll
have any proof once this is all said and
done that like oh by decreasing our
latency we you know saw this increase in
this metric for the business or
whatever. But I keep thinking about
that. I'm like I bet that holds true. I
remember when we read In the Plex, the
early Google engineers very strongly
believed in the power of low latency um
how it grows the business.
>> This was a really interesting one too. So I was working at a bigger startup, for me, about 300 ICs. We were seriously funded and we had
some mergers and acquisitions. There's a
company that we had acquired out of
Spain actually and I remember that one
of the weird problems that we had it
didn't show up in the dashboards
and this is the worst case scenario. Um
we had all this like tracking metric
stuff. So the headers were actually
pretty full. Like it was way too full.
And what would happen was if the headers
got too full, some of the um edge
routers would actually like either
completely block or or cut out some of
the uh the headers that were annotated
in. And this would actually
disproportionately affect people that
were part of the loyalty and rewards
program. So like the most valuable
customer, the people who spent enough to
actually like want to be, you know,
getting the loyalty and rewards were the
ones who were getting the worst
performance or the worst uh user
feedback. And so we ended up having to
make dashboards specifically for loyalty
rewards. Uh it was it was a very
interesting like metrics challenge to
kind of be like, how do we... And actually I just had this conversation with another um client I was talking to. If you run into a problem where you can't ask the question of your data, that's a really good indicator that you
need you know to beef up your metrics in
some way like sometimes there are these
questions you just you have a question
you're like I don't actually don't know
how we measure that you know um
>> That's one. Also, processes that scare you, right? So these are the kind of couple things that come up, and this actually I think gets us into the next part, where he talks about maintainability.
>> I just want to say one more thing about scalability, um, which is what he says about your percentiles, um
>> Oh yeah, so, you know, right, so P99
means one out of every 100 requests,
and he talks about P99.9, which is one out of
every thousand requests. He says that big
companies have determined that P99.9 is
about as far as you want to go. He says
when you get to P99.99, one out of every
10,000 requests, you know,
that's where you're getting
into things like solar flares, or you just
can't even really tell what's
causing these things to be extra long.
But he says that you might think P99.9
is excessive, one out of every 1,000
requests. But the point he makes, and I
believe this came from Amazon as well,
is that your customers with the most
data tend to be your most valuable
customers. Yeah. And those tend to be
the sorts of requests that start showing
up in P99.9. And so you might think one
out of every thousand, we can afford to
lose it. And to be totally honest, at my
company right now, I'm not even touching
P99.9. I'm focused entirely on P99.
We're just starting out, doing baby
steps, right? But that is something to
think about. Those tail end requests
aren't necessarily random. They might be
associated with your most valuable
customers. And so
>> that's interesting.
>> You're taking care of them. But that
does bring us, oh, sorry. Go ahead.
>> Yeah. Yeah. No, I've actually never
measured P99.9 either. No,
>> at the cloud provider.
>> Oh, okay. Yeah, that makes sense. Well,
and I've never worked at a company
that's that large. So I think P99 was
actually us holding ourselves to a
really high bar
>> Because again, right, what you're doing
with this, and I think for the uninitiated,
um, P99 is looking for outliers. It's
saying, hey, if you look at the
average latency of something, you're
going to have this number that's
hopefully in a pretty cozy spot.
>> Um, or you'll see something that looks
really bad. So there's two things
that'll happen. Number one, it'll either
lie to you and make things look better
than they really are, hiding the edge
cases, or you'll get some number for
your average that looks really high, but
really what's happening is that the outlier
just happens to be a super crazy outlier.
This is kind of like if you look at
median versus mean income in the United States,
>> right? The haves
versus the have-nots. The people who
make really, really high incomes
will completely distort what the average
income is for the American household.
And so that's why we
>> Like the average wealth for
millennials, it was looking a lot
higher than it should have because of Zuckerberg.
>> Yeah. Exactly.
>> Him alone was distorting that so much.
>> Right. Right. There's this joke
that if you're at a cocktail party
and, you know, a billionaire walks
into the room, yeah, the mean
income goes up by hundreds of thousands
of dollars. Right.
>> It's Wyoming, I think, or
Montana, I forget. But it's the only
state in the union where the average
income for black Americans is higher
than the average income for white
Americans, because Kanye West lives there
and there are so few black people in
that particular state. So anyhow, yeah,
median versus mean. Lots of fun.
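The mean-versus-percentile point can be made concrete in a few lines. A toy sketch (the numbers are invented, not from the book): a thousand request latencies where ten are pathologically slow. The mean looks alarming, P50 and P99 look healthy, and only P99.9 surfaces the tail.

```python
# Toy latency data: 990 normal requests, 10 pathological ones.
latencies = [100] * 990 + [10_000] * 10  # milliseconds

mean = sum(latencies) / len(latencies)   # the "billionaire at the party"

def percentile(values, p):
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p/100 * n) without math
    return ordered[int(rank) - 1]

print(mean)                         # 199.0  -- twice the typical latency
print(percentile(latencies, 50))    # 100    -- median: the typical request
print(percentile(latencies, 99))    # 100    -- P99 still looks fine here
print(percentile(latencies, 99.9))  # 10000  -- only P99.9 catches the tail
```

Ten slow requests out of a thousand are enough to double the mean, which is exactly the distortion the hosts describe with median versus mean income.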
>> Exactly. And so, with statistics, I
will say, um, and this comes up
time and time again, having at least
a freshman-level college
understanding of statistics is actually
very important in our jobs. And I would
say that the longer you put it off, the
worse it will be, because if you
want to start asking interesting
questions
and solving interesting problems in your
company, you're going to have to get
pretty good with how statistics work and
how statistics can lie to us. Is the data
actually giving us something
that's meaningful? Um, and it's
non-trivial. And I will tell you
that I've walked into environments where
they are willing to lie to themselves
with statistics so that it looks good to
a manager, but they're not actually
solving the real problem, because they
aren't putting the right statistical
rigor in place. And sometimes you kind
of have to break it, like, hey, this
isn't actually how we should be
measuring ourselves. And so,
yeah, I think you can't meet your goals
with scalability unless you're measuring
things properly. And I think that's what
this section really drove
home, you know.
>> Well, and the thing with P99, you say,
"Well, it's one out of every 100." But
think about it. It's not crazy that each
page on your website would make four
requests to your backend, right? And so
that means if someone clicks around 20
pages over the course of a session,
odds are they'll encounter one of those
1-in-100 requests. And if your
median latency is 300 milliseconds but
your P99 is six seconds, that means at some
point during your user's experience,
they're going to encounter a page that
takes six seconds to load, right? Some
feature is going to take six seconds to
load. So, you know, worth considering.
Go ahead.
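That 20-pages-times-4-requests point can be checked with a quick probability sketch (assuming requests are independent, which is a simplification):

```python
# If 1 in 100 requests lands in the P99 tail, how likely is a user to
# hit at least one slow request over a 20-page session with 4 backend
# requests per page?
pages = 20
requests_per_page = 4
n = pages * requests_per_page          # 80 requests in the session

p_tail = 0.01                          # chance any one request is P99-slow
p_at_least_one = 1 - (1 - p_tail) ** n

print(f"{p_at_least_one:.0%}")         # 55% -- more likely than not
```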
>> Yeah. No, that was the thing.
Time and time again
I've found that it's, oh, there's some
cache invalidation path, you know,
cache poisoning or something that
>> comes up, or we find that there's some
usage of a web page that ends up being
this really expensive SQL query or
something. It's a really good
investigative tool, truly.
>> And then there's maintainability, which
we can just touch on. Maintainability is
a lot more about the
architecture of the code itself, although
he does talk about things like
monitoring, automation, documentation.
I mean, this is where
something like Fundamentals of Software
Architecture or A Philosophy of Software
Design is going to talk a lot more about
this. Um, and I mean, again, you could
devote a whole series of episodes to
just maintainability, especially in the
age of AI, because, you know, there's a lot
>> right
>> I was reading, what is it, I'd never
heard of this open source project
before. It's called tldraw.
>> Oh yeah.
>> Were you seeing this? Um,
>> There's some YouTubers that I like
a lot, and I've actually
been using tldraw for
some diagramming that I've been doing
for client work. So,
>> well, they just announced that they are
automatically closing all new pull
requests on the open source project because
>> there's so much AI-generated code. Um, yuck.
>> And I think there are just users who
are trying to, because that's been
something people have said forever, like,
"Oh, commit to open source. If you're
having trouble finding a job, go commit
to open source." And so I'm sure there
are some programs out there, maybe just
users themselves, you know, being savvy,
that are opening up as many pull
requests on open source projects as
possible. And so, you know, we're just
in an age, very strange, right? We're in
an age where you can generate code much
faster than you can review it. Um,
>> I think that's funny, kind of like how
I've been talking about at my job, like,
okay, we removed this application-level
bottleneck, but now we have come to a
new bottleneck, right? Which is [ __ ]
We're kind of at that point as a field,
right? We're like, okay, the
bottleneck for a long time had been
writing code.
>> So, you remove that, the code generation
is faster these days, but what's
the new bottleneck, right? And I think
there are a lot of bottlenecks, right?
>> It's so true, too. And this
actually gets all the way back to
some of the conversations we had
about Fundamentals of Software Architecture,
>> where we talked about connascence, and that
when you find a bottleneck you're actually
finding an accidental connascence of process.
>> Oh yeah.
>> Right.
Because what you don't realize is, oh,
there's a dependency graph here. Once I
cleared up this weird little thing, I
realized that actually, and so it is
interesting, like, how do I decouple
systems? How do I, um, or
sometimes the fact is that that
just is the bottleneck, and now we have
one variable set to deal with, like, oh,
MongoDB needs more resources, or we need
to structure our database a certain way.
Maybe that's the sort of ending point.
But sometimes you don't know how to
tune a system until you've uncovered
some of the blockages of process
that are in place.
>> Well, uh, so that's a lot about
scalability, reliability, and maintainability.
>> I will bring up one thing with
maintainability. It's going to come up,
and so I'm just going to give a layup.
He kind of mentions evolvability, um,
so we'll get through chapter 2 and
chapter 3 and chapter 4; that's
where we'll be for this episode.
Chapter 4 is actually my favorite
chapter. It has to do with
encoding and decoding, serializing and
deserializing data. And a big theme that
comes up with this, because this ends up
being a footgun for a lot of
organizations, is the evolvability of
schemas, the evolvability of a contract
and an API and these other things. And
so, again, I appreciate this book
because he gives himself a layup. He's
like, "Okay, here are
the foundational principles with which
we're going to address everything else
in this book." Um, and he keeps
coming back. And so I love that he'll
bring up scalability, bring up
maintainability, and bring up
reliability when we're talking about
trade-offs, for the rest
of the text.
>> Chapter 2 is all about data models and
query languages. This is one where I
started to feel a little more like maybe
he was doing his due diligence to kind
of cover everything. But he gets
into a lot of things where I'm
like, I don't know if I'm ever going to
use this, right? Like, I don't know if
I'm ever going to use CODASYL, right?
And then he talks about all these
different kinds of databases. Um, and
I think we're seeing these days, like,
yeah, I don't really see a lot of
arguments for, like, MySQL over Postgres,
for example, right? Like, Postgres is kind
of eating the world. Um, but he does
talk about the convergence. This is
the chapter where he lays out, like,
okay, SQL versus NoSQL, right? And NoSQL's
kind of evolved to mean "not only SQL." Um,
and I think it's really important to
have an understanding of how these data
models work. I mean, we are actually
struggling with that at work. Um, we use
[ __ ] as our database, and kind of the
early engineers who built the product, a
lot of them aren't with the company
anymore, just treated [ __ ] like,
yeah, you know, it's a database, like,
let's just query whatever we want. And
then as we've been digging in and making
these performance improvements, we're
like, you know, some of these [ __ ]
queries are really, really slow. Um, and
it's because [ __ ] isn't like a
magic sack you can just pull things out
of, right? Like, it has an actual
physical structure underneath in how it
organizes all this data. And so some
things are really fast, like anything
that has an index on it, right? [ __ ]
can query that pretty quickly. If it
doesn't have an index, all of a sudden
you're talking about doing an item-by-
item scan of the entire
>> collection, right? And so there's a lot
of value, whatever database you choose,
in having a basic understanding of how the
data is organized underneath and what
queries you're going to make against it,
because that really influences
the amount of processing, or the
processing time, it takes to return that
data.
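The "index probe versus item-by-item scan" distinction can be modeled with plain dictionaries and lists; this is a hypothetical in-memory sketch, not any particular database's implementation:

```python
# A collection of 100,000 "documents".
documents = [{"_id": i, "email": f"user{i}@example.com"} for i in range(100_000)]

# Building an "index" on email costs one pass up front...
email_index = {doc["email"]: doc for doc in documents}

def find_indexed(email):
    # ...after which each lookup is a single hash probe.
    return email_index.get(email)

def find_scan(email):
    # Without an index, the engine has no choice but to touch
    # every document in the collection in the worst case.
    for doc in documents:
        if doc["email"] == email:
            return doc
    return None

# Same answer, wildly different amount of work.
assert find_indexed("user99999@example.com") == find_scan("user99999@example.com")
```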
>> Document stores are really cool, and
again, the book does a great
job; it's probably the best explanation
I've had on why I'd pick a document
store. And when we're talking about
document stores, we're really talking
about, you know, MongoDB, we're talking
about DynamoDB, which I don't think he
brings up in the book. But there's also,
you know, a bunch of these
where what you have is basically
some sort of a primary index,
you know, a key-value kind of store of a
bunch of tree structures, right? Like,
that's really what a document database
is. And if you need to start doing
relational queries across it, you're
going to be in a lot of pain, because
what relational databases do quite well,
which is denormalize, I'm sorry,
normalize the data, where you can
actually abstract out all these
pieces, and you put all these joins
together and you can get a correct
representation of the entire system, um,
in a really nice,
clean way, um, is really slow for the
type of thing where maybe what I really
care about is finding some chunk
of data and then displaying this tree
structure to the end user. That's
where, you know, [ __ ] really shines, and I
loved that this thing really just
gives us this breakdown of one
versus the other. It is funny, too, that
the world has shifted. There's this
whole movement, I think since this
book came out, called the "just use
Postgres" movement, right? A lot of people
are just like, just use Postgres, like,
you don't need all these crazy databases.
Not always true, and I think for your
personal projects on the weekend it
probably is true, uh, especially because
Postgres has blurred the line, and all
the major SQL databases have this now,
but Postgres has a JSON data type, so
you can actually do relational data plus
tree-structure document store inside of
the same database, and that kind of blurs
the lines, like
>> Mhm.
>> if you have relational-ish data with
documents that you want, um, you
have all these options that you
didn't have before.
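That "relational data plus a JSON document column" idea can be sketched without a Postgres server. Postgres would use a `jsonb` column and the `->>` operator; as a runnable stand-in, this uses SQLite's JSON functions (a substitution for illustration, not what the hosts ran):

```python
import sqlite3

# A normal relational table whose "profile" column holds a JSON document,
# queryable in place -- the blurred line between relational and document.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)")
con.execute(
    "INSERT INTO users VALUES (1, 'ada', ?)",
    ('{"theme": "dark", "tags": ["admin"]}',),
)

# Relational filter on id, document-style extraction from the JSON blob.
row = con.execute(
    "SELECT name, json_extract(profile, '$.theme') FROM users WHERE id = 1"
).fetchone()
print(row)  # ('ada', 'dark')
```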
>> And this is something Alex Xu, in his
System Design Interview book, talks about. He
talks about doing back-of-the-
envelope math, and I wish I saw more
candidates doing that, because a lot of
candidates will be like, well, we've
got to use [ __ ] we've got to go NoSQL
because, you know, it scales, it's web
scale. We talk about that all the time at
work. [ __ ] it's web scale. Say we don't
know anything about [ __ ] but we know
that it's web scale. Um, and so
a lot of candidates will be like,
well, we've got to go NoSQL because it's
got to scale. But it's one of those
things where it's like, okay, let's say
you're doing a tweet, right? And so let's
say a tweet is 140 characters. And so
is one character a byte?
>> I think so.
>> Um, typically. If it's UTF-8, not
necessarily. They call
those runes in some languages, and they can be longer.
If it's ASCII, then yes, it's one byte per
character.
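To pin down the characters-versus-bytes aside: in ASCII one character is one byte, but UTF-8 encodes a code point (what Go calls a rune) in one to four bytes:

```python
# UTF-8 byte lengths per code point: ASCII stays at 1 byte, so the
# "140 characters = 140 bytes" estimate is a lower bound for real text.
for ch in ["a", "é", "€", "🐦"]:
    print(ch, len(ch.encode("utf-8")))
# a 1, é 2, € 3, 🐦 4
```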
>> So a tweet can be 140 characters, you
know, and so that's 140 bytes. And so,
you know, then how many tweets can you
fit in a megabyte? Uh, oh, wait. Sorry.
No, no. You can fit about 7 tweets in a
kilobyte, which means you can fit
7,000 tweets in a megabyte, which
means you can fit, I mean, 7 million
tweets in a gigabyte, right?
>> If you're kind of saying, like,
well, we've got to do [ __ ] because [ __ ] scales,
then it's kind of like, well, wait a
minute, what if I did Postgres? I mean,
it costs like five
bucks a month to run a 10-gigabyte
Postgres database on AWS, right? And so
all of a sudden that's 70 million tweets
I can store in my Postgres
database. And then you kind of
ask yourself, like, okay,
especially starting out, how fast
are we going to get to 70 million
tweets? And that's why in an interview
you need to be asking those sorts of
questions, like how many writes do we
expect per day, right? You know, what
kind of growth are we expecting to
see? But if you're doing that
back-of-the-envelope math and
actually estimating, okay, how much
data are we actually going to have,
then it opens up some of those options
for you, rather than just jumping
immediately to, well, we need NoSQL
because NoSQL scales. One framing that
I thought was interesting is, so I think
a lot of people will reach for something
like Firebase or something like
>> uh, MongoDB, because they want to
>> I like to call it kicking
the can down the road on thinking about schemas,
>> right? So it's like, oh man, you know,
the shape of our data is
changing so quickly, it just really
would be annoying
to have to nail a schema down and do all
these schema migrations, and I've had a
bad experience with this in the past, and,
like, whatever, it's just a bunch of JSON
blobs. I get that for quick iteration.
Um, but it runs into a lot of problems.
And again, I've never seen the framing
like this, and maybe it's just because I'm
a dunce, but he calls this
schema-on-write versus schema-on-read.
And I think this is again a good example
of the trade-off. So schema-on-write is
what relational databases are. You are
declaring your column types, your names.
You really have to understand the schema
for your system to interact with the
database, right? There are all these
expectations. It also means it's in your
face when you need to add a column or
make some change or do some other thing.
You have to think about the shape of
this data before you get started. Versus
schema-on-read, uh, and schema-on-read
is the document structure.
>> This is actually a lot like how APIs
work, right? We might
have a written promise that data from
an API endpoint from Stripe is going to
be a certain way, but it can change over
time. We might get some new columns, we
might get some new changes, and your code
is written in a way that says, okay,
I'm going to parse this out and I will
validate the data and I'll make sure it
fits the right data type, and I do that
at read. When I read the data, I'm
inserting it into whatever structures
I want. And I just hadn't thought
about how, depending on the shape of the
work that you're doing, this is the
trade-off. You may have super
flexible schemas where maybe only three
fields you really care about and
everything else is just kind of
nice to have, and a super locked-
down relational schema-on-write
structure really is in the way. Where, you
know, maybe I care about the user ID and
the number of times they've logged in
and what groups they're in, or something,
right? And then everything else, like the
tags that they've given themselves and
all these other extra
annotations, if it's there, that's great,
and if it's not, I don't care. And, you
know, well, okay, well, document structure
would be great for that. You know, you
don't have to normalize and, you know,
do all this crazy stuff; you can
just kind of let this data
sort. You basically get a data cache
optimized for certain types of queries,
and you can think of documents being
that way, right? You're not having to
ask a question across all of these
documents. Um,
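A minimal schema-on-read sketch of the pattern described above: the store holds free-form JSON blobs, and structure is imposed only at read time, only for the fields we care about. The field names here are hypothetical, not from the book.

```python
import json
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str
    login_count: int = 0
    groups: list = field(default_factory=list)

def read_user(blob: str) -> User:
    """Schema-on-read: validate and type the fields we need; tolerate
    and ignore everything else (tags, annotations, ...)."""
    doc = json.loads(blob)
    return User(
        user_id=str(doc["user_id"]),                 # required
        login_count=int(doc.get("login_count", 0)),  # optional, defaulted
        groups=list(doc.get("groups", [])),          # optional, defaulted
    )

raw = '{"user_id": "u42", "login_count": 3, "groups": ["beta"], "tags": ["x"]}'
print(read_user(raw))  # the extra "tags" field is simply ignored
```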
>> Right. But I like that framing from
you, like sometimes you are just kind of
kicking the can down the road. And we've
seen that, and again,
we've entered a world where
generating code is a bit of a commodity
these days. And so I've seen a lot of
posts on the ExperiencedDevs subreddit
lately, and I really
feel for these people. These are people,
like Uncle Bob talks about, the flow
state, right? Like, these are people whose
favorite thing about
programming was that. They just loved getting in
the flow state and just generating
[clears throat] lots and lots of code.
And for better or for worse, that
doesn't exist anymore, or it's not
nearly as
valuable as it used to be. And so me,
I'm enjoying this new era of large
language models, because I've never
really derived a lot of great
enjoyment out of the actual act of
writing code. I derive a lot of
enjoyment out of making things and
seeing results, right? And coming up
with solutions. And I really like that
once I've honed in on a
solution, like, okay, this is what we
want to do, then I submit the prompt to
Claude Code and it takes care of what I
want. Uh, this is so interesting, and I
think, so I was actually
literally last night having a
conversation with a buddy of mine.
You know, there's
this sort of divide where one camp says
all the large language
models are terrible, and, you know, what
are we doing to the world, and all this stuff,
>> and who would use them, and you're a fool.
And then there's this other camp that
seems like the large language
model maxis, right? Like they're
>> relishing in the fact that
everyone's going to lose their jobs and
that you can replace Facebook with
one prompt on a weekend or something, and
you're like, both of y'all are nuts.
That's how I kind of look at it.
[laughter] But my buddy, who's kind
of in this Linux world, he started in Linux
systems, he's a phenomenal
programmer, he's worked at a bunch of
cool companies, he's kind of taken this
pragmatic middle spot. He loves
actually writing code. He loves
writing Rust. I consider him a deeply
thoughtful programmer, someone who
actually does enjoy the flow state, and he
is also really enjoying using large
language models. He actually just
introduced me to a tool called Happy,
which is like a wrapper around Claude
Code or Codex that lets you
run it where you can access
it from your mobile device. So you
can let it do its agentic stuff, and it's
all personal and private, because he's, you
know, very privacy-oriented. And he
actually brings this up in a really
interesting way that I hadn't thought
about. His analogy was, it's like
when the CNC machine was invented. Um, the
industry around machinists freaked
out, right? Because before CNC machines,
right, and these are the ones that are
milling out this stuff and can do this
very technical work
>> in an automated way, um, machinists were
kind of repulsed by it, because they
were just like, ugh, no. I mean, there's
a craftsmanship, there's a tooling,
there's a way that we do this. They enjoyed
it; machinists were very well compensated.
And yet CNC did two things. Number one,
it allowed people to get custom-machined
stuff at a much lower barrier. Like, if
you could buy a CNC machine and
spend a few weekends learning
it, you could be good enough. It's
not going to be as good as a machinist.
>> But a machinist with a CNC mill
>> is next-level excellent, right?
They know the excellent craftsmanship
that goes into finishing these parts, and
they might CNC it out and then go back
with their machining tools and make it
perfect, right? They may go do their
extra stuff. And I'd never thought about
this idea that precision plus scale
is what these large language models are
unlocking for professionals like us,
where there are certain parts
of my job that literally had such a
cognitive load to them that I just
didn't even know how to get started. Um,
>> and then letting the CNC mill go off and
do a thing, and then me going off to work
on some other stuff, and then coming back
as the machinist to clean it up and do
some stuff that the CNC mill can't do on
its own. I think this is actually a
really good analogy. It's the best I've
heard from anybody. Um, and
it's cool, too, because, like, Carter, you
and I, I think, I'm not quite where my
friend the Rust programmer is;
he's the one who gets into deep flow
state and really cares about, you know,
low-level data structures.
>> You're very product-focused. You
love the output,
>> the intersection of humans and this,
>> and I'm somewhere in between. Like, I
actually like the producty stuff,
>> but I also love getting into flow state.
Um, and I feel like, for all three of us,
it's rare for me to see you
and him and me all kind of having these
aha moments that there's a there there, right?
>> There really is something
interesting and new, and you ignore the
hype people and ignore the doomsayers.
Like, just a little PSA in the
middle of this episode:
there's something there. We're not all
going to lose our jobs. Our jobs are
changing, though, and the expectations are
going to shift. And
if you have the right
attitude about this, I think that
there's something here that is
deeply rewarding, deeply
satisfying. Um, and it actually makes
reading books like DDIA
>> Yeah.
>> really important. Like, if you want to be
the machinist plus the CNC mill, you've got
to read books like DDIA, right? You have
to read Domain-Driven, uh, sorry, Designing
>> Data-Intensive
>> Applications.
>> The domain-driven one is also another
one we're going to read down the road.
But, uh, yeah, that's
>> Yeah, I'm with you. The little PSA in
the middle of the episode, but I just
think, uh, yeah, like, I don't know. I
tend to think of myself as an
optimist but also a realist. I tend to
have a pretty clear-eyed view; I hope I'm not
lying to myself about too many things,
right? And so I am very aware of
the current state of LLMs. I'm very
aware of what my job constitutes, and
I've mentioned this on the podcast
several times: I just don't see
my job disappearing anytime soon, right?
Like, I am not incredibly concerned. The
job has changed, 100%. I look at our
junior engineers, and I'm just like, you
graduated into a completely different
world than I graduated into, right? Um,
but again, reading a
book like this just confirms to me that
this is all really important knowledge.
This is something you absolutely have to
have to succeed in this world, even in a
large language model dominated world.
Chapter three,
>> I don't know. Maybe. So, I'm only making
fun of chapter 3 because before we
started the podcast, I was like, Nathan,
I'm gonna be honest, this chapter kind
of went over my head a little bit,
right? Like, this is the one
where, if I had the book in
front of me, I would have gone back; with
the audiobook, I would
have paused to do it. So we have now
reached the point of the podcast where
Nathan explains chapter 3.
>> Oh my gosh.
>> And to you the audience.
>> So there's a famous story from a few years
back about the guy who wrote Homebrew,
you know, "the missing
package manager for macOS."
>> Uh, he interviewed at Google and he ended
up getting
>> Oh, was it Apple? I think it was Apple.
Yeah, that's what
>> Oh, well, maybe. I thought it was Google.
But anyway, there's a mythology. So, one
of the FAANG companies, we'll just say
that. Um, but he interviewed and he
failed it. He couldn't, uh, you know,
construct a B-tree on a
whiteboard. Uh, and he's just like, I've got
hundreds of thousands of people using my
software on a daily basis,
>> right?
>> writing valuable stuff. And yeah, sure,
I can't do this sort of, you know,
coding ritual thing that you've
asked me to do. Uh, and I kind of feel
that way about this chapter. So, like,
this is really important stuff if you're
in this domain, right? If you are having
to solve or understand why we're using
B-trees versus LSM trees, or why we
use this sorting algorithm, or why
LevelDB, you know, picked this, these
kinds of things are
really important when you happen to be
in that space. Most of us are not. Most of
us should really lean on the sane
defaults, right? Like, if you're toying
with which index type you're using for
your column in your database, I
would say you're most likely
overengineering it. And I'll walk
that back a step, though. I was actually
doing some Postgres stuff, and I
realized that, given the
shape of the data, the default indexing
algorithm was not correct for what I
needed, and I actually needed this other
thing, because the way that I was doing
queries meant it was
like an O(1) relationship for the
type of thing that I was doing. And,
luckily, with knowledge like this, I
was able to understand the trade-off.
Um, this chapter is definitely one that
takes close consideration. If you're
listening to the audiobook, you're
probably going to have to listen to
sections a few times. I would highly
recommend going back and reading the
physical copy if you can, because
there are diagrams that are really
important. Um, but this really digs into
things like hash indexes, um, you know,
and also just how
some of these algorithms actually work
on hardware. And I think this is
something that we don't think about a
lot, even in graduate algorithms, right?
You take graduate algorithms and you
learn about how dynamic programming
works, but they don't get into, like, oh,
and actually this
algorithm is really great because of the
way that sequential writes on disk work.
And you can say, okay,
I'm going to take this chunk of memory,
and because I know that
LSM trees work like this, there are these
chunks of contiguous memory
allocation, and because of the way I
write these sections on disk, because
they're append-only logs or whatever, I can
write this and know that the fault-
tolerance characteristics of it are
excellent, you know, if I get
interruptions.
And so it ties into things like
write-ahead logs
versus these append-only records, and
how these things work. And I liked this
section because I hadn't thought about,
like, oh, if you have a write-ahead
log, you're actually writing this data
two times, right? Versus if you have
this thing that writes to disk the
first time, you only write it one time.
And sometimes it makes sense for this to
be a write-ahead log, and sometimes it
makes sense to just write to disk,
um, because of how disk access works
with what you have going on. And, um,
if this is making your eyes glaze over,
it's okay. [laughter]
This chapter really gets
into the weeds of things
like write-ahead logs and crash recovery, or
why certain technologies
were picked for
certain applications.
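The chapter's simplest storage engine, an append-only log plus an in-memory hash index mapping each key to a byte offset, can be sketched in a few lines (an in-memory stand-in here, with a bytearray in place of a real file, not a real storage engine):

```python
class LogStore:
    """Toy log-structured store: writes append sequentially; reads do one
    'seek'. Overwrites leave old versions behind in the log (real engines
    compact them away)."""

    def __init__(self):
        self.log = bytearray()  # stand-in for the append-only file on disk
        self.index = {}         # key -> (offset, length) of the latest value

    def put(self, key: str, value: str):
        record = value.encode("utf-8")
        self.index[key] = (len(self.log), len(record))
        self.log.extend(record)          # sequential append, never in-place

    def get(self, key: str) -> str:
        offset, length = self.index[key]
        return self.log[offset:offset + length].decode("utf-8")

db = LogStore()
db.put("k1", "v1")
db.put("k1", "v2")    # an "update" is just another append; the index moves
print(db.get("k1"))   # v2
print(len(db.log))    # 4 -- both versions still live in the log
```

The sequential-append discipline is what makes writes fast on disk and crash recovery simple: anything after the last complete record can be discarded.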
>> Um yeah,
>> Was there anything that stood out to
you? I guess maybe we
could hop into more concrete examples of
stuff. There were some things, like he
talked about data warehousing, right?
And he talked about ETL,
extract, transform, load.
>> That's a section I loved, actually.
>> Yeah. Yeah. I did a little bit of
work with that in my last
job. But again, there is so much breadth as far
as what you can work on. Like, every
now and again I'll see a Reddit
post where someone mentions they work on
embedded systems, and I'm like, I forgot
that was a thing. I forgot that
there were software engineers actually
doing that sort of work. And that's
probably work I will never do, you know,
throughout my
entire career.
>> They're doing the Lord's work because
that is some deep focus. [laughter]
>> They're they're cut from a different
cloth because I I um Yeah, actually this
would be a good one whether we got into
too much of the details or not. OLTP
versus OLAB. I think this is uh even if
you don't know a ton about databases,
especially SQL type databases, this is a
really important concept because I think
everyone runs into this at some point.
Uh if you're dealing with SQL, um which
is the shape of the questions you're
asking your database, right? Um there is
uh OOLTPs are the transactional
databases. That is what you think of as
like a web app database, right? have a
bunch of users that maybe go to the
website and log in and are doing stuff
and most of those transactions are tied
to just their user behavior. Right? If I
have a shopping cart, it's my shopping
cart, my credit card history, my uh you
know, shipping addresses, the things
that I've purchased and orders and
stuff. It's a bunch of transactions.
Maybe I have millions of customers, but
I'm not going across and asking
questions of, you know, a bunch of other
user data for that, right? On
Amazon.com, everything in there is
either product inventory or your stuff,
like your history, right? Um, OLAP is
the other big one here, which is that, okay,
let's say I'm an analyst at amazon.com
and I want to know what the total sales
out of the United States were during the
month of May 2024, right? Um, that's a
really big, juicy query that you're
asking, and it's going to affect a ton
of rows. And if you ask that of your
OLTP, your transactional database, that
database is not optimized for that type
of workload. And you could actually take
the whole thing down if you ask big
enough juicy queries.
>> Um,
>> and this is a good example, right, of,
like, again, I listened to this chapter
on the audiobook, to be honest. A lot of
it went over my head, but things like
OLTP and OLAP, right, and like ETL.
Like, ETL at this last job I had, right?
>> I didn't even know what that was until
I showed up, and then it's like, ETL. So
I would have just been a little bit
ahead of the curve if I had read this
book. And just like OLTP versus OLAP,
again, a lot of this went over my head,
but I have a ChatGPT window pulled up
right now, and I just wanted to give
some context for the audience. But
what's great is that as I'm about to
give you guys some context on what OLTP
versus OLAP is, in my mind I'm like,
"Oh, yeah, yeah, yeah. I read this. I
remember this part from the book,
right?" Um, and so OLTP is online
transaction processing. And that's
exactly what you're talking about,
Nathan, right? This is like user signup,
user updates, profiles, payments
processed. And what does ChatGPT list as
key characteristics? It's optimized for
writes and fast, small queries. It has
highly normalized data, ACID
transactions, low latency. OLAP is
online analytical processing. So this is
going to be used for your business
intelligence dashboards, your weekly
metric reports, uh, trend analysis. And
what are its key characteristics? Again,
this is completely different from OLTP.
It's optimized for reads and large
aggregations. It's often denormalized.
It has columnar storage, right? It
handles very large data sets, gigabytes
to petabytes. And so,
>> again, I listened to this. I could not
have told you that until Nathan started
talking about it and I pulled up this
window. But I do have just a little bit
of foundation. So I started looking at
the chat, and I'm like, "Okay, yeah,
yeah. All those things Martin Kleppmann
talked about, I'm remembering them now."
So again, even if you're listening to
this like I am, on a bike commute,
right? I think there is some really
solid foundational work it's doing in
your brain, even if you're not coming
away with a really detailed
understanding.
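To make the contrast concrete, here's a minimal sketch in Python using SQLite; the `orders` table and its data are made up for illustration. An OLTP-style query touches one user's rows, while an OLAP-style query aggregates across every row.

```python
import sqlite3

# Toy orders table standing in for a web app's transactional data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "US", 20.0), (1, "US", 5.0), (2, "US", 12.5), (3, "CA", 7.0)],
)

# OLTP shape: a small, targeted query tied to one user's behavior.
my_cart = conn.execute(
    "SELECT amount FROM orders WHERE user_id = ?", (1,)
).fetchall()

# OLAP shape: a big aggregation across all rows (total US sales).
total_us = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE country = 'US'"
).fetchone()[0]

print(my_cart)   # just user 1's rows
print(total_us)  # 37.5
```

Same table, two very different question shapes: the first is cheap no matter how many customers exist; the second reads every matching row.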
>> Yeah.
Yeah.
>> Oh, go ahead.
>> Yeah. No, and I will tell you, and
we'll kind of talk about this towards
the end because we're getting close, the
world's moved on as well. Like, this is
a really good thing to think about, and
understanding the concept of a columnar
database is important. It's literally
storing the data by column. Um, and
typically it's because these are
append-only databases, where for a row
you go across a bunch of the column
files and you're just appending the
extra data to the very end. Um, but for,
I think it's Spanner, which is Google's
technology, if you ask it for a set of
data across, let's say, 10 columns, and
you tell it to limit 100, a lot of
people think that that'll reduce the
cost of the query, but it actually
doesn't. Because it's a columnar
database, it's actually accessing the
entire column. So you have billions of
rows in that column. Um, limit 100 is
like a syntactic thing, but it doesn't
actually save you money. And I actually
saw this in an organization where we
were doing really expensive queries on
really, really large data sets, and they
were acting like it was a transactional
database, and I was like, yeah, we're
not using this the right way. And it was
a knowledge gap. They just didn't
understand what the difference in these
technologies was.
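A toy model of why a LIMIT doesn't shrink a columnar scan. This is a simplified sketch, not how any particular engine is implemented: the point is that the engine reads the whole column off disk, and the limit is applied only afterwards.

```python
# Column-oriented toy store: each column is one contiguous array ("file").
columns = {
    "user_id": list(range(1_000)),
    "amount": [float(i % 10) for i in range(1_000)],
}

cells_read = 0

def scan_column(name):
    # Model: the whole column file is read off disk; cost accrues per cell.
    global cells_read
    for value in columns[name]:
        cells_read += 1
        yield value

# "SELECT amount ... LIMIT 100": the limit trims the result set only
# after the full column has been scanned in this model.
first_100 = [v for v in scan_column("amount")][:100]

print(len(first_100))  # 100 rows returned
print(cells_read)      # but 1000 cells were still read
```

That gap between rows returned and cells read is why the query bill doesn't go down.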
Um, and so if you're in this world, I
would highly recommend spending some
extra time. Um, this is a really juicy
chapter. I think I'll kind of close out
this section, too, which is this. Um,
we're in a new world. So, like, there
are new databases that kind of blur this
line, um, databases like ClickHouse,
databases like CockroachDB, um, where
there's even a term that's coming up
called hybrid transactional/analytical
processing, HTAP I think is what it is,
where there are databases now that allow
you to actually ask both types of
questions. They kind of handle both
things, so that it reduces cognitive
load, and it kind of just magically will
do either analytics-optimized queries or
transactional queries. And so, um, it's
probably books like DDIA that inspired
folks to think about other ways that we
could structure stuff. Um, but it would
be surprising if there weren't amazing
developments in how data-intensive
applications were engineered from 2017
to 2026, right? Um, I mean, it's been
almost 10 years since this book was
written.
>> I know, right? And again, it's just
kind of crazy to think that was 10 years
ago. I mean, a lot of the stuff, it is
impressive how timeless this book is,
but you're right. I mean, it's only been
10 years, and still there's a lot of,
uh, if not things that have become
obsolete in this book, there's a lot of
missing context as far as what has been
developed since then.
>> I want to devote time to chapter 4,
because you mentioned in particular that
you really enjoyed chapter 4. What about
chapter 4 stood out to you? This is
encoding and evolution.
>> So this gets partly into, like, um,
software architecture and platform
engineering. So these are near and dear
to my heart, and it happens that
everybody ends up running into these
problems, which is, how do I evolve my
system over time? How do I make changes
in a way that does not introduce, um,
errors and irreversible changes that can
cause major problems? Also, um, he
really kind of gets into this idea of,
like, okay, can I do rolling upgrades?
Are they backwards compatible? He also
spends a decent amount of time thinking
not just about backwards compatibility,
which is actually, of the two that we're
about to talk about, the easier of the
problems. Can I upgrade my system so
that an old version of the data schema
is still compatible? Like, I access
something from a backup or from an old
part of a database and it's structured
slightly differently than new data. The
other one is called forward
compatibility, and that's actually the
harder one, which is, can I write my
code in a way that it's actually
tolerant of changes that I can't imagine
to the shape of that data in the future?
That's actually a much harder problem.
Um, so old code reads new data, that's
kind of how he explains it, versus new
code reads old data.
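In code, the two directions can be sketched like this (a hypothetical `User` record, not an example from the book): backward compatibility is new code filling in defaults for fields old data lacks; forward compatibility is old code tolerating fields it doesn't know about.

```python
import json

def parse_user(raw: str) -> dict:
    # A reader that only knows about "name" and "email".
    data = json.loads(raw)
    return {
        "name": data["name"],
        # Backward compatibility: default a field that old records may lack.
        "email": data.get("email", "unknown"),
        # Forward compatibility: extra fields written by newer code
        # (e.g. "avatar_url") are simply ignored rather than rejected.
    }

old_record = '{"name": "Ada"}'  # written by old code
new_record = '{"name": "Ada", "email": "a@x.io", "avatar_url": "..."}'  # newer code

assert parse_user(old_record) == {"name": "Ada", "email": "unknown"}
assert parse_user(new_record) == {"name": "Ada", "email": "a@x.io"}
```

The forward case is the harder discipline in practice: every reader in the system has to be written this tolerantly before writers can safely evolve.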
>> And a resilient, um, data
infrastructure should be able to do
both, and should also be able to handle
when it won't do one or the other. Maybe
we do, for whatever reason, have to make
an incompatible change. Um, and a lot of
this goes into, so, in Go we always call
it marshalling and unmarshalling, but
most people call it encoding and
decoding, or serialization and
deserialization. Which is, you have some
shape of data, and it needs to be
written into some format that can be
written to disk, which means it's a
string of bytes, some kind of bytes,
right? That could be clear text like
JSON, or it could be some highly
optimized byte-encoded format, some
binary format. And of course, tons of
people have tried to solve this problem
tons of ways. You've probably been in
organizations where, like, I remember a
data science team that used Pickle a
lot. That's the Python-native way, and
it lets you do things like
>> encapsulate the inner workings of a
function into the pickling format,
>> which can actually be super dangerous.
But the problem we ran into is that
Pickle was tied to the particular
version of Python you were on. So if
you're on version
>> that
>> Yes. Exactly. And so if you're on
version 3.7 and then you go to 3.9,
well, you now have an incompatibility,
and the pickling format itself didn't
give you a good, clean way of doing
forward and backwards compatibility. And
so, uh, you either have to re-encode
everything every time you're planning to
do an upgrade, um, or you pick what he
advocates in the book, which is some
sort of data encoding format that is
agnostic of the programming language
under the hood.
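A small illustration of the trade-off, standard library only (the record itself is made up): `pickle` produces Python-specific bytes, and its default protocol has changed across interpreter versions (protocol 5 arrived in Python 3.8), while JSON bytes can be read by any language.

```python
import json
import pickle

record = {"user_id": 42, "name": "Ada"}

# Python-only encoding: the default pickle protocol depends on the
# interpreter version, so older readers may not understand bytes
# written by newer writers.
py_bytes = pickle.dumps(record)

# Language-agnostic encoding: any JSON library in any language reads this.
json_bytes = json.dumps(record).encode("utf-8")

assert pickle.loads(py_bytes) == record
assert json.loads(json_bytes) == record
print(json_bytes)  # b'{"user_id": 42, "name": "Ada"}'
```

Both round-trip inside one Python process; only one of them survives a change of language, and it degrades more gracefully across versions.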
>> It's like Steve Flanders and
OpenTelemetry, or Mastering
OpenTelemetry. Like, his whole book is
avoid vendor lock-in, avoid vendor
lock-in, use OpenTelemetry, avoid vendor
lock-in. It's not necessarily vendor
lock-in here, but it's a similar thing,
right? If you choose a data encoding
format like, again, Pickle, which is
married to a particular implementation
of a particular programming language,
well, now you're super locked in, right?
And so with something like JSON, which
is language agnostic, you can evolve
your system more flexibly, so long as
you maintain those API contracts. But
another point he makes is that your data
will live far longer than your code
will.
Yeah.
>> And so picking the right way to, uh,
one, the right data structure, but two,
how you're encoding and transporting
that data, um, is more important than
the programming language. Now, this is
something, I don't know if he was just
trying to do his due diligence here, I
don't know if the world has evolved
significantly since 2017, but he's kind
of throwing out all of these different
options for encoding and transporting
your data, whereas I feel like today the
answer is JSON. Like, he's even talking
about XML as a viable alternative. I
don't really see that these days.
>> Well, it isn't. Yeah, it is funny,
because he does give this whole section
on, like, oh, SOAP is still around, and
you're like, it basically doesn't exist
anymore. Except, I guarantee you,
somewhere... I know that for the longest
time, Mechanical Turk over at AWS was
famously still a SOAP client, because it
was such an old part of the system. I'm
sure it's different now. But, um, one
thing that I thought was interesting,
and I don't remember him mentioning it
in the book, for data science
especially, and this is another thing
that didn't really exist then: we don't
really use data warehouses like we used
to, and now they're called data lakes.
This is what Snowflake and all these
other organizations do, and you
literally use blob storage with files.
Like, you can use CSV, you can use JSON.
A lot of people use columnar-oriented
structures like Parquet. If you're in
data science, you're probably using
Parquet or some similar optimized data
structure. And I do think that, while
maybe some of the things he's talking
about in here are a bit dated, it makes
sense, in the sense that, um, you know,
for instance, analytics data is
typically very sparse. There's a lot of
repetition in a particular column,
because maybe, you know, there are lots
of zeros and lots of 100s or whatever,
and I can compact that down and store it
efficiently. So when I query a terabyte
worth of data, I can do this in an
efficient way. Um, and so, yeah, it's
really interesting to think about why I
would want to encode something that's
not just JSON.
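One reason a columnar format beats plain JSON for this kind of data is that a repetitive column compresses extremely well. A minimal run-length-encoding sketch (Parquet's actual encodings are more sophisticated, but this is the core idea):

```python
from itertools import groupby

# A sparse analytics column: mostly zeros, occasional bursts of values.
column = [0] * 900 + [100] * 50 + [0] * 50

# Run-length encode: store (value, run_length) pairs instead of every cell.
encoded = [(value, len(list(run))) for value, run in groupby(column)]

# Decoding expands the runs back into the original cells.
decoded = [v for v, n in encoded for _ in range(n)]

print(encoded)       # [(0, 900), (100, 50), (0, 50)]
print(len(column))   # 1000 cells...
print(len(encoded))  # ...stored as 3 runs
assert decoded == column
```

A thousand cells collapse to three runs; JSON would spell out every one of them.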
>> Right. Right. Maybe I want to
encode... Um, I think it was, which one
was it? It was one I was not aware of.
It came out of Facebook, but I'm trying
to remember what it was called. Maybe it
was Thrift.
>> Which was the one?
>> Thrift.
>> Yeah, which was the one that had the
ability to say there was, like, a writer
schema and a reader schema. Um, and it
basically could map... I can't remember
which one it was now, but it was kind of
cool, and it was one that was made for
schema evolution. Um, and basically, if
you've changed the shape of the schema
from one to the other, this tool could
reconcile the mappings between the two
and then find the most compatible
version. Anyway, it was kind of some
cool stuff where I'm like, "Oh, that's a
really clever way of handling that." And
again, this is one of the reasons I love
this chapter: if you're doing stuff
that's API-heavy, REST-heavy, gRPC-heavy
type stuff, um, all of the demons that
you've run into, all of the nice design
decisions, like, how did we get here?
It's in this chapter. So, yeah.
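The writer's-schema/reader's-schema idea can be sketched like this. (For what it's worth, this resolution model is the one Avro uses; the sketch below is a hypothetical simplification that matches fields by name only, whereas real systems also handle types, aliases, and promotion rules.)

```python
def resolve(record: dict, writer_schema: dict, reader_schema: dict) -> dict:
    # Each schema maps field name -> default value for that field.
    # Decode with the writer's schema, map into the reader's schema:
    out = {}
    for field, default in reader_schema.items():
        if field in writer_schema:
            out[field] = record[field]  # field the writer also knew about
        else:
            out[field] = default        # field new to the reader: use default
    # Fields only the writer knows about are silently dropped.
    return out

writer = {"name": None, "age": None}         # schema the data was written with
reader = {"name": None, "email": "unknown"}  # schema the current code expects

old_data = {"name": "Ada", "age": 36}
print(resolve(old_data, writer, reader))  # {'name': 'Ada', 'email': 'unknown'}
```

Because the reconciliation happens at read time, writers and readers can upgrade independently, which is exactly the schema-evolution property being described.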
>> Right. Right. Well, and this would
have benefited me at my last job. We
were using a lot of gRPC and Protobuf,
and I was kind of like, yeah, this is
stupid, why don't we just do it using
HTTP and REST and JSON? But learning
more from this chapter, I'm like, okay,
I'm starting to see why some of those
design decisions were made. It was, you
know, a little faster, a little slimmer,
and we were handling lots and lots of
data. So maybe that was the best
decision. Um,
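To see the "a little faster, a little slimmer" point, compare a text encoding with a fixed binary layout using only the standard library. This hand-rolled `struct` format is just an illustration; Protobuf adds schemas, field tags, and varints on top of the same basic idea.

```python
import json
import struct

user_id, score = 123456, 98.5

# Text encoding: human-readable and self-describing, but larger.
as_json = json.dumps({"user_id": user_id, "score": score}).encode("utf-8")

# Binary encoding: a fixed little-endian layout (4-byte int + 8-byte float).
as_binary = struct.pack("<id", user_id, score)

print(len(as_json))    # 34 bytes
print(len(as_binary))  # 12 bytes

# Decoding the binary form requires knowing the layout (the "schema").
uid, sc = struct.unpack("<id", as_binary)
assert (uid, sc) == (user_id, score)
```

The binary form is a third of the size, at the cost of needing the schema out-of-band, which is the trade gRPC/Protobuf makes at scale.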
Well, this podcast, the time we record
is now limited by when I have to leave
for work. So we are [laughter] we're
wrapping up here. And you're seeing from
this episode, right, we could devote
four episodes to just part one here. Um,
really, this is a fantastic book. I've
been enjoying it immensely. I'm very
excited to finish it. Um, we like to do
our hot takes. I don't have a ton of hot
takes, um, aside from, you know, the
book's a little outdated. I guess my hot
take would be: you cannot read this book
and be like, this is going to expose me
to all these different ways to work with
my data, and all of them are equally
valid, and so in any project I choose
from now on, I need to have this big
checklist of, like, am I going to use
Thrift? Am I going to use Protobuf? Am I
going to use JSON? Right? No. Your
answer most of the time is going to be
JSON, HTTP, REST, right? Um,
>> but you may wind up in these edge
cases, and if you wind up in these edge
cases, having this knowledge of all
these other options can be very, very
valuable. You've got to know when to
break these out.
>> Yeah. And I will say that increasingly
the type of data-sciency work has
diverged from what people building web
apps do. I've been lucky enough to work
on that data side of things, and their
tools are starting to look less and less
like the web apps that we're dealing
with as well. And, um, yeah, I think a
couple hot takes. Uh, he spends a
section talking about graph databases,
and I kind of remember in 2017 everybody
was, like, super excited about graph
databases,
>> right?
>> They still don't feel like they've had
their moment in the sun. I don't know if
they will have their moment in the sun.
>> They're incredibly useful for Facebook.
>> Yeah, graph databases are cool, but I
haven't seen one used in some way where
I'm like, "Wow." So that's one thing.
The other one is, um, oh, maybe this is
just me, but I think a Maintainability
for Data-Intensive Applications would be
a great book. [laughter] I think you
could go off and just talk about
maintainability of all of these things,
and not even have the other subject
matter, and I think that would be an
amazing book for folks.
>> Yeah.
>> Well, Nathan, what are you going to do
differently in your career because
you've read part one?
you've read part one? >> So I love evolvable evolvable systems
>> So I love evolvable evolvable systems design. uh in this book touched on some
design. uh in this book touched on some patterns on schema evolution that I
patterns on schema evolution that I hadn't thought about. So like I do think
hadn't thought about. So like I do think a lot about how do we have a two-way
a lot about how do we have a two-way street sort of maintainable schema
street sort of maintainable schema migrations. Um I'm going to go back and
migrations. Um I'm going to go back and spend some time with some ideas that
spend some time with some ideas that were in chapter 4 and also see what uh
were in chapter 4 and also see what uh technologies have come out since 2017
technologies have come out since 2017 because I have a feeling that there's
because I have a feeling that there's probably some stuff I could learn about
probably some stuff I could learn about that's modern. Yeah.
that's modern. Yeah. >> As far as me I forgot to fill out this
>> As far as me I forgot to fill out this section of our notes and so I'm gonna do
section of our notes and so I'm gonna do it differently in my career. I'm just
it differently in my career. I'm just gonna keep reading. I'm gonna keep
gonna keep reading. I'm gonna keep reading this book.
reading this book. >> That is my commitment to everyone. I'm
>> That is my commitment to everyone. I'm going to finish designing data intensive
going to finish designing data intensive applications. And I feel like you should
applications. And I feel like you should get put on like a leaderboard or
get put on like a leaderboard or something. If you everyone talks about
something. If you everyone talks about design data intensive applications, I
design data intensive applications, I want a badge that says I read
want a badge that says I read >> I actually read it. Yeah, that's great.
>> I actually read it. Yeah, that's great. >> Um, we should make a t-shirt and sell
>> Um, we should make a t-shirt and sell it. We don't have like our store, but
it. We don't have like our store, but like like I read designing that
like like I read designing that intensive applications and all I got was
intensive applications and all I got was this lousy t-shirt. That's what we
this lousy t-shirt. That's what we should do.
should do. >> That would be great. Signed by. Yeah.
>> That would be great. Signed by. Yeah. Yeah. [laughter] Who would you recommend
Yeah. [laughter] Who would you recommend the book to, Nathan?
the book to, Nathan? >> So, um, this is for software engineers
>> So, um, this is for software engineers who are deeply curious about systems
who are deeply curious about systems architecture and want to grow in their
architecture and want to grow in their understanding and the trade-offs. Um, I
understanding and the trade-offs. Um, I think that, and again, I can only speak
think that, and again, I can only speak for part one, haven't read the rest of
for part one, haven't read the rest of it, but um, this is not a tutorial book.
it, but um, this is not a tutorial book. This is not going to sit here and like
This is not going to sit here and like tell you how to build all this stuff.
tell you how to build all this stuff. This is really about systems thinking
This is really about systems thinking and the trade-offs. Um, so if that kind
and the trade-offs. Um, so if that kind of thing sounds deeply rewarding, if you
of thing sounds deeply rewarding, if you want to get to that next level,
want to get to that next level, especially if you want to be staff or
especially if you want to be staff or some sort of engineering leadership,
some sort of engineering leadership, this is a really important book for that
this is a really important book for that kind of trajectory.
kind of trajectory. >> Yeah, I I think you you have to have
>> Yeah, I I think you you have to have your feet under you a bit. And this
your feet under you a bit. And this isn't the perfect analysis here, but I
isn't the perfect analysis here, but I would say first read the DevOps
would say first read the DevOps handbook. And if while reading the
handbook. And if while reading the DevOps handbook, you're a
DevOps handbook, you're a [clears throat] little like, okay, yeah,
[clears throat] little like, okay, yeah, I'm familiar with a lot of these
I'm familiar with a lot of these concepts. This all makes sense to me.
concepts. This all makes sense to me. you you'll learn a lot of new things
you you'll learn a lot of new things reading the DevOps handbook. But if you
reading the DevOps handbook. But if you kind of read that and are like, got it
kind of read that and are like, got it that this lines up with kind of my
that this lines up with kind of my experience and what I've done, then I
experience and what I've done, then I would say, okay, now redesigning data
would say, okay, now redesigning data inensive applications. It's not, again,
inensive applications. It's not, again, that's not a perfect comparison, but I
that's not a perfect comparison, but I just think I would not recommend this to
just think I would not recommend this to anyone who can't at least explain to me
anyone who can't at least explain to me in good detail how their application is
in good detail how their application is built, how it's deployed, how it's
built, how it's deployed, how it's monitored. Um, you know, how it's a
monitored. Um, you know, how it's a basic understanding like scalability.
basic understanding like scalability. Um,
Um, >> yep.
>> So, get that first. But if you have a good understanding of how all that works, hey, this is kind of the next level.
>> Yeah, I like that idea. The DevOps Handbook and Fundamentals of Software Architecture. I'd say if you read those two and you're like, "I'm hungry, I want more,"
>> then this is the obvious next step, like DDIA.
>> Those two I would recommend to any sort of eager, ambitious junior engineer: you know, some of it might be over your head, but it's great for building that breadth and understanding what's going on. But I'd say read those two first.
>> Yes. Then start tackling this.
>> Absolutely. 100%.
>> Great. Well, hey, we're so excited. This is going to be great. We're going to cover the rest of this book across the next three episodes. Thanks for tuning in, everyone. You can always contact us at contact@bookoverflow.io. You can find us on Twitter at @bookoverflowpod. I'm on Twitter at Carter Morgan. Nathan and his consulting business rojo are at rojo.com, and his newsletter is at rojo.com/newsletter.
And if you like... this is funny. I do a second podcast with my brother, a theme park podcast called Please Remain Heated. My brother is a professional YouTuber; he's got like 130,000 subscribers as a full-timer. Anyhow, a lot of our audience there is his super fans, so they're very interested in him. Not that they don't like me. I'm saying, if you're listening to this and you're a theme park person, you've got to show up in the comments, right? You've got to let people know that there's at least one Carter Morgan super fan listening to the podcast just because you like me. So, you know, make me look good for my brother, guys. And I'll have you know, on my other podcast, Please Remain Heated, I always close out saying, "If you're an aspiring software engineer, check out Book Overflow." So, you know,
>> We're gonna get the most minimal overlap.
>> This is an O'Reilly book, so I didn't even think about it, but maybe we can, uh, I don't know... Come on over and join us on the Discord, and maybe we'll have something related to this there; maybe we'll do a book giveaway with O'Reilly. We're still working out the kinks.
>> I can't promise anything, but if you join us on the Discord and ask how to get a free book, or if you post this on LinkedIn and tag us and tag the episode, we'll do our best to take care of you, and we'll figure out by next week exactly what we can offer.
>> Yeah, exactly. We're amateurs when it comes to this stuff. Okay.
>> I know, right? We're pretty good software engineers; as far as running a podcasting business goes, we are learning every episode. That was a ton of fun. Thanks, folks. We'll see you next week for, roughly, part two of Designing Data-Intensive Applications.