This podcast episode discusses the first part of Martin Kleppmann's book "Designing Data-Intensive Applications," highlighting its foundational concepts of reliability, scalability, and maintainability in software systems. The hosts emphasize the book's value in providing a deep understanding of trade-offs and fundamental principles, even as the tech landscape evolves.
There's a lot of value in depth of
knowledge and knowing a particular area
of your field really well. There is also
tremendous value as a software engineer
in your breadth of knowledge and just
Hey there, you're listening to Book
Overflows, the podcast for software
engineers by software engineers, where
every week we read one of the best
technical books in the world in an
effort to improve our craft. I'm Carter
Morgan and I'm joined here as always by
my co-host Nathan Tops. How you doing Nathan?
>> Doing great. Hey everybody.
>> Well, thanks for uh tuning in everyone.
As always, like, comment, subscribe if
you're on the YouTube video, if you're
on the Spotify video, anywhere that
really like to comment or like or
subscribe. Uh share the podcast with
your friends and co-workers. And uh you
know, really helps the podcast. And you
can also book time with us on the Leland
platform if you'd like to uh get some
one-on-one career coaching with Nathan
or I. And you can also join our Discord.
We started the book overflow discord.
There's a link to it in the comments
right now. That's been actually really
fun to have lots of fans trickle in and
to start more of a conversation.
>> Yeah, we've got I think 19 seats left
for the what I'm calling the alpha
testers role. That will be uh you'll be
enshrined for all of eternity. Um as one
of the first one of the first 100 uh
folks on the discord. I think we'll also
do like a beta testers. So that'll be
like the second tier of uh early
adopters. Um, after that I have no idea
what we're going to do, but we'll have
some fun perks for for joining the the
server early. So, come come hang out.
>> Yeah, this is a pretext to the custom
crypto coin Nathan and I are going to
launch and then do a rug pull on all of
you. So,
>> yeah, stay tuned for that. [laughter]
>> Does the rug pull work if you announce
it in advance,
>> you know? Um, I think that the uh I
think the SEC really appreciates it. You
know, [laughter]
>> My brain is too college-football-poisoned. SEC? That seems like Southeastern Conference to me. What are you talking about? And then I'm like, oh, I I know
what you're talking about. Um, well,
we're not here to joke about the
Securities and Exchange Commission. We
are here to cover what is easily the
most requested book of all time on this
podcast, and that is Designing Data-Intensive Applications by Martin Kleppmann.
We're excited to tackle this. We had
been holding off on this for a while
because there's a uh there's a part two,
not a part two, a second edition coming
out. Um and this book is old. What'd you say, Nathan? 2017 is when
>> I think it came out. Yeah, it came out
in 2017.
>> Yeah. So, um we've been hoping to do the
second edition, but it kept getting
delayed and delayed and you know, we
thought it's time. We got to read this.
We gotta tackle it. And we're really
excited to do it. So, this is a a first
for Book Overflow. This is going to be
our first four-parter. So,
>> yeah,
>> tune in over the next four episodes
while we cover designing data intensive
applications. And you know, was uh
funny, Nathan, you shared on the
Discord. I thought this was really fun
trip down memory lane. You shared the
original Reddit post I made on the
Georgia Tech subreddit looking for a
co-host for a a maniac's idea of a
podcast to read a new software
engineering book each week. And yeah, I
I thought that was fun.
>> Yeah. Yeah. So, that's some of the some
perks and the extras of being on the
Discord is you get first of all, you can
ask us questions and we're, you know,
small enough of a podcast that we'll
actually answer right now.
>> Um, but secondly, yeah, there are little Easter eggs and uh yeah, I was
just I was searching for I I like to
search periodically if anybody ever
>> mentions the podcast or has questions
and things and uh I was just on Reddit
looking and I was like, "Oh, yeah, look
at that. That's the original the
original post that Carter made. That's
pretty cool." Well, I was looking at it
and I was trying to see like, well, how
how far is the podcast strayed from the
original vision? And the answer is not
very far. It's actually pretty close.
But I I was bringing that up because one
thing I I mentioned on that is I said,
look, this isn't a book report. It's
we're not going to do a faithful
dissection or retelling of everything we
read. Instead, I wanted it to feel like
two co-workers chatting over lunch. Um,
and we're just going to talk about kind
of the most interesting ideas. And uh
I'm just doing that as a disclaimer, because if you're tuning in to these episodes for Designing Data-Intensive Applications thinking this is the authoritative retelling, that "if I listen to this podcast, I won't need to read the book" — that's never been the aim of the
podcast. We are we could not possibly do
this book justice over the course of
four episodes. You'd probably need 50
episodes to fully discuss everything in
this book. We're just going to talk
about uh the meat of it, at least uh
what we interpret as the meat, the most
interesting stuff. And we're going to
try to do it justice because this is a
legendary book and we're so excited to
talk about it. I guess I'll introduce
the book for anyone and the author for
anyone who's not as familiar with
designing data intensive applications.
It's written by Martin Kleppmann. Uh the author introduction is: Martin Kleppmann is an associate professor at the
University of Cambridge where he works
on distributed systems and local first
collaboration software. Before academia he was in the trenches: he co-founded Rapportive, which was acquired by LinkedIn
in 2012 where he worked on large scale
data infrastructure. He's also one of
the people behind Automerge, an open
source library for building
collaborative applications.
The book introduction is data is at the
center of many challenges in system
design today. Difficult issues need to
be figured out such as scalability,
consistency, reliability, efficiency,
and maintainability. In addition, we
have an overwhelming variety of tools
including relational databases, NoSQL
data stores, stream or batch processors,
and message brokers. What are the right
choices for your application? How do you
make sense of all these buzzwords? In
this practical and comprehensive guide,
author Martin Kleppmann helps you navigate
this diverse landscape by examining the
pros and cons of various technologies
for processing and storing data.
Software keeps changing, but the
fundamental principles remain the same.
With this book, software engineers and
architects will learn how to apply those
ideas in practice and how to make full
use of data in modern applications. And
you're going to hear over this podcast I'm going to pronounce "data" both ways interchangeably uh without any reason.
So uh buckle up. Nathan, we have just
finished part one of Designing Data-Intensive Applications. This book is
divided into several parts. I don't know
how many parts to be totally honest. Um
and we chose part one because it fit our
cadence. Is it four parts?
>> I think it's actually in four parts, though I don't think all the parts are the same.
>> Some of them I think are going to be a bigger stretch than others, and we'll see what happens.
>> Part one was almost exactly a quarter of
the book so it worked out great. So we
read part one. Nathan, give me your your
takes on uh part one.
>> Yeah. So um I wrote that this was this
was like it was almost like Martin Kleppmann
was writing a letter to me of like
things I need to think about. Uh this is
about the right level of depth and
breadth that I like in a book like this.
Um cuz he spends a lot of time being
like, "Okay, well, we can't go deep into
the inner workings of a B-tree, but
he'll give you enough of a, oh, here's
how it operates. Here's the
efficiencies. Here's the inefficiencies.
Here's the trade-offs." I love the
structure of this book. I I felt that
the pacing was really easy for me to
wrap my head around because there's a lot of trust I had for Kleppmann, and that he would
declare a through line. Then he would
give these like examples of why
something exists, the trade-offs, what
people were trying to do with something,
um where it kind of fell short or where
some strong suit is. And then he would
kind of weave this in. I don't know. It
just it felt very narrative and that's
not common especially for something
that's this ambitious. It's a very
ambitious book. I think it easily could
have fallen on its face and it doesn't
and I understand why people have asked
us to to read this book. Number one,
it's a very large book and you're like
should I even read this book? Um but the
other part is that like does it still
hold up? Right? It's written in 2017. Um
is it still worth reading? And uh I I
will I will give you a little spoiler
which is like yes it is with a big
asterisk that says but there are some
missing pieces. The world has moved on
uh in some of these debates and some of
these arguments. And I'm actually like
I'm actually pretty excited to explore
this today uh with you Carter and so I'd
love to hear your general thoughts.
>> Yeah I completely agree on the pacing
and that's something I hadn't even
thought about until you mentioned it but
yeah it is fantastically written. Um, we
have read books in the past where it's a
little like, okay, like you're you're
going too deep or you're spending too
much time on this or you're we're in
chapter nine now, but you're mentioning
stuff that was like in chapter 3, like
we didn't read chapter 3. Um, no problem
here at all with any of that. Uh,
I mean, I'll say with this, I am
listening to the audiobook of this. I
believe you're doing the audiobook as
well, Nathan.
>> Yeah. So yes, and I'll tell you, I'm using the Alex Hormozi method, which
I know that that might make some
people's eyes roll because he's like a
big, you know, sales and marketing kind
of person.
>> He actually recommends, if you're going to listen to the audiobook, to also have the physical book and do both at the same time. I did not do that.
>> Interesting.
>> Um because he says it helps with
retention, but um I primarily listen to
audiobook. I think I listened to like I
listened to like a three and a half hour
chunk of it in one sitting because I had
to drive from San Jose down to Oh, where
I live these days in Costa Rica. Um so I
had a big chunk of it by myself with
just driving, a very monotonous drive.
Um beautiful but monotonous and um so
yeah, I've listened to it in big chunks
like that. Uh, but I went back because I
will tell you if you don't read the
audio book or I mean sorry read the uh
Kindle book or have the physical copy
there's a lot of stuff to miss. It's
there's very like deep graphical
representations of some of this stuff, >> right?
>> Um I did well with the audio book though
because a lot of this is review for me
like I I knew a lot of these concepts.
I've run into to managing this as a
platform engineer and an architect. Um,
if you're reading, if you're hearing
some of these concepts for the first
time, I think the audio book would be
incredibly difficult. Um, I'll just put
it that way. So, what was your
experience with the audio book?
>> Well, so I'll say that that some of this
I don't think there's been a single
concept I've heard that that I'm hearing
about for the very first time. Like talk
about like LSM-trees and B-trees and like
okay, I've heard about that. We talk
about like different data schemas like Avro. I'm like,
okay, okay, I got that right. um the
whole first section about like uh
defining reliable, scalable,
maintainable applications, like that
very much felt like review to me as
well. Um
>> but I think you're right. I have not
been able to read any of the actual book
for this. I I really like what you were
saying, like have them both at at the
same time. I've been listening to this I
I bike to and from work and so I've been
listening to this on my uh my bike
commute. Um, and there are times where I
can sort of tell like, ah, dang it.
Like, I wish I could pause and and
look at that section again, but you
know, we've got to record an episode
every week, and so I haven't been able
to go back and read uh revisit as much
as I'd like. But here's what I'll say.
We've talked about this on the podcast
before. There's a lot of value in depth
of knowledge and knowing a particular
area of your field really well. There is
also tremendous value as a software
engineer in your breadth of knowledge
and just being exposed to all of these
concepts. Even if that exposure is just
you saying, "I've heard that once
before. I'm aware that this exists." Um
I I've been doing a lot of uh
interviews. I mentioned we've been
trying to hire for a position, and we've brought a lot of candidates in
and my company keeps a pretty high
talent bar and we're uh we would rather
reject candidates than hire one we think
uh won't really level us up as an
engineering organization. Um, and I'm
just really shocked with a lot of candidates,
maybe not shocked, but like you can tell
that a lot of candidates kind of get
into these interviews, they're like,
especially system designs, they're like,
I didn't know I was supposed to know any
of this, right? Or they don't even know
where to begin.
>> Um, because their their life is just
open up the codebase,
make your changes, and then submit the
PR and that's it. And so I think for
engineers like that, listening to this
audio book, even if a lot of it is a
little like in one ear and out the
other, just
>> being made aware that there's a whole
world out there and
so many of these concept like just
having any inkling of understanding of
what's going on at the lowest level of
your uh data because that's what this is
all about. Part one is called
foundations of data systems.
>> Yeah. Um, it's really really helpful.
And so I I'm not like starting from
zero. I'm I've been familiar with a lot
of these concepts, but I wish I could
say I'm an expert after listening to
this audio book, you know, but but
obviously I'm not.
>> And I think this is an important point,
which is that um sometimes I'll do a
couple things. First of all, yes,
listening to audiobooks is its own skill
and talent. I think all of us can have
our minds drift. We can do this when
we're reading with our eyes, too. Right.
>> Right. Um, and with reading with your
eyes, you just kind of go back and go,
"Oh, you know what? I didn't really
comprehend those last two paragraphs that I looked at." With audiobooks, you have to be pretty aggressive, if you have the opportunity, to like rewind or hit a
bookmark. Sometimes I'll kind of do
those things. Not always easy to do,
especially if you're riding a bike or driving,
>> things like that. What I will say is
that especially with this book coming
away with hey, I didn't fully grasp
everything in this one section,
but I do remember that there was a
trade-off when it came to
uh high transaction count on disks. And
so you just kind of make that mental
note and just be like, hey, if this ever
comes back up in the future, right,
>> I'll just know before I like weigh in
one way or the other, I hear somebody
talking about it, I'll go back and read
DDIA. I'll go look at that section. I'll
go deep dive into understanding like
Why? What are the characteristics of fault
tolerance for this one technology that
we're looking at and does that hit the
risk profile that's okay for us? Right?
Like this really this book is really
kind of this higher order thinking of
like every one of these technologies has
a strong suit and a weakness. Um and you
need to understand the business problem
you're trying to solve or the reality of
the hardware available to you or where
you think you're going to go with
scaling. And you you need to think about
this and say, "Okay, well actually this
puts an undue risk on our business
because if this really weird edge case
of fault tolerance comes out and it
corrupts our data, we could lose
millions of dollars, right?" And that's
the thing that nobody can just solve for
you, right? when when you're kind of
looking into these types of problems and
this is what I love about this book is
it kind of gives you a sampling of like
all these little real world things like
oh, this team from Facebook said that the schema evolution thing didn't work very well and so they came up with this
approach and you're like oh that's very
clever you know I understand why they
made that decision so that's I don't
know I I I know we're going to fanboy
out and and talk about these these parts
of the book but Um, yeah, it's
>> I hope we can get Kleppmann on. I don't know
if he does a lot of media or anything.
Maybe he he would because he he wants to
promote the second edition.
>> Um, right.
>> But we're devoting four episodes,
Martin, to discussing your book. So,
we'd love for you to come on.
>> Hey, and and I'll and I want to put this
up here. If the second edition does come
out this year, because it has been put
off a little bit. It was supposed to
come out last year. If it does come out
this year, I think it would be really
cool to do a follow-up episode because
>> Oh, I think so. Yeah. This is such an
interesting thing and I think it would
be really cool for our audience to know,
well, okay, I had first edition sitting
on my shelf for five years and I never
got around to reading it. Should I go
buy the second edition? And we could
weigh in. We can be like, oh yeah,
you're really missing out on, you know,
uh, vector databases or whatever he's
going to cover in the new version. Uh,
hey gang, quick announcement. I'm
excited to share that I'm relaunching
Rojo Roboto, my platform engineering
consulting practice. If you need help
shipping faster with better
infrastructure, DevOps, engineering
enablement efforts, let's talk. Book
Overflow listeners get 10% off their
first engagement. You just go to rojoroboto.com/bookoverflow. That's r-o-j-o-r-o-b-o-t-o dot com slash bookoverflow.
Okay, great. Now, back to the episode.
So, we'll see. Well, this this first
part might be the one most immune to
second edition changes because it's all
about the foundations. I really enjoyed
chapter one, um, which says, look,
>> the whole point of kind of understanding any of this is that you want to build reliable, scalable, and maintainable applications. And so he devotes a little
bit of time talking about um okay, if
we're going to talk about reliable,
scalable, maintainable, let's define
that. Let's define what reliability,
scalability, and maintainability is. As
far as reliability goes, I thought
something very interesting he points
out. He says faults do not equal
failures. He says a failure is when your
service stops working for the end user,
but a fault is something that could
potentially lead to a failure. And our
job is to design fault tolerant systems.
Um, you know, so a fault could be
something physical. I mean, you know, a
solar flare, we joke about solar flares
all the time at work. We're a startup now, so we can't always dig in, but when I was at the cloud provider, if we had some deviation in the system that caused P99 latency to increase for 5 minutes or whatever, right? We would devote a significant amount of time to understanding what that was. We're getting much better at my current job at devoting time to that, but it's like 45 minutes. And if after 45 minutes we
try to figure out I guess here's what
I'll say. We try to figure out
why the system reacted in the way it did,
>> but sometimes we can't devote nearly as
much time to figuring out what caused it
to begin with. So what we'll be like,
okay, we were hit with a bunch of
queries, right? Um, we can't spend a ton
of time figuring out what exactly were
those queries or why they why we saw an
increase at this moment, but we can
figure out why the system when subjected
to those queries was performing poorly.
But anyhow, and so at a certain point
when we can't when when we can't devote
time to figuring out the the why or like
what caused it, we'll we'll say like
it's a solar flare. Like that's probably
what happened. some solar flare hit the
system and you know um but the whole
point is that like you should be
designing systems that are fault
tolerant to reduce failure. So you
should be designing things that are you
know tolerant to solar flares. Um you
know malformed data in your database
could be considered a fault and if your
whole system blows up when it encounters
malformed data that's a problem. Um,
this is best exemplified by Netflix, the
whole chaos engineering approach, right?
Let's purposefully kill servers um and
see how the system responds. Um, I I
would love to work at a place that was
much more aggressive about things like
that. Um, but I've just never been able
to work at somewhere that is is running
those sorts of simulations um, at least
with any sort of frequency. Um, so I
mean I thought those were interesting
points about reliability.
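Kleppmann's fault-versus-failure distinction is easy to sketch in code. Here's a minimal, hypothetical Python example (the record format and the `id` field are made up for illustration): a malformed record is treated as a fault to be tolerated, rather than being allowed to escalate into a failure that stops the service for the end user.

```python
import json

def process_records(lines):
    """Process newline-delimited JSON records, tolerating malformed input.

    A malformed record is a *fault*: we count it and move on.
    Letting the exception propagate would turn it into a *failure*:
    the whole service stops working for the end user.
    """
    results, faults = [], 0
    for line in lines:
        try:
            record = json.loads(line)
            results.append(record["id"])  # hypothetical field
        except (json.JSONDecodeError, KeyError):
            faults += 1  # tolerated fault, not a failure
    return results, faults

ids, faults = process_records(['{"id": 1}', 'not json', '{"id": 2}'])
# ids == [1, 2], faults == 1: the bad record did not take the system down
```

The same idea scales up from a try/except to chaos-engineering-style fault injection: deliberately introduce the fault and verify the system degrades instead of failing.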
Yeah. And this is this gets back to um
something I I really appreciate about um
various software communities. So how you
handle faults, how you handle and do
fault tolerance really is cultural,
right? It it really has to do with the
domain of the problem you're solving.
For instance, um I'll bring up Go, which
again is a I'm I'm a fanboy, but in Go
errors are just values. There is no sort
of like try catch. there is no sort of
like exception flow control. Uh there is
a panic that is like sort of
existential. So I guess technically
there is some of that but you don't use
that in your normal day-to-day life. One
of the things that's interesting about
this is that you say okay well errors
will happen and it makes you kind of
upfront think about well how am I error
handling? Am I just wrapping it and
shooting it up the stack uh and letting
somebody else upstream deal with it? Or
is it like the John Ousterhout method where you say, you know, you handle errors out of existence? Uh, good fault-tolerant systems figure
out what that right contract is. Can I
build something that is resilient in
that the errors actually aren't
something that goes to the end user?
It's just some part of the fault
tolerant system that that goes into
place. Again, obviously there's
trade-offs. There's there's things but
um these are fun problems to think about
and I think this first section of the
book also talks about things like well
there's hardware faults there's software
errors there's human errors right like
if you have some manual part of your
system and uh I've seen this time and
time again with startups that I work
with where you know you start somebody
is the deploy manager and they SSH into some machine and you know trigger some
magic job and they try to get somebody
else in the company to do that and they
don't realize that Like step three is an
undocumented step, right? Step one and
two everybody gets it. Step three is
this like kind of weird edge case that
you know Sally always checked and nobody
realized that Sally checked it and
that's why her deploys were always
perfect. Um and so like there's all
these ways that faults uh the sort of
failures and fault tolerance and these
things can be built into a system. And
uh yeah, you have to again this is what
I love about this book. you have to
think. And the whole point is
>> he'll give you these little thought experiments and then be like, oh, okay. The other section in this first chapter, on scalability, I thought was really interesting, where he's
basically like talking about Twitter um
and he was like oh yeah there's this and
this is a classic I would say this is
like a classic systems problem that will
come up from time to time in your interviews
>> um you say how do you aggregate a
feed to everyone, right? Um, and I and I
loved it because, you know, it was like,
okay, well, I do some query and I um I
look for everyone who I follow and then
I go and try to grab the latest piece of
information and then I like shove all
these together in a timeline and then
display that timeline to the end user.
>> Um, the problem is, this is like a one
to many relationship. This is like very
expensive like join operation that
happens in a database. And of course
he's using this as sort of a layup to
talk about all these other topics in the
book. Um and he realized that actually
there's a much better way to do this by
inverting this. And when someone posts
they post rarely but they read a lot.
Right? So you post rarely. >> 12,000
>> tweets a second write operations 300,000
read operations a second. So, so for
most people who don't tweet that often
and don't have that many followers, it
makes way more sense for them to have
this sort of like filtered timeline view
where you kind of have this thing and it
makes it much more much easier for this
to come in. Except
there's this other problem which is what
happens when you're a super super
popular person like you're Elon Musk on
Twitter or you're you know some other
like you know millions and millions and
millions of followers. uh when you post
that actually breaks the opposite
direction because all of a sudden your
one tweet will update millions of
people's sort of cache timeline. Um and
so they actually had the pendulum swing
back to the original pattern for that
for a special subset and just like a
certain amount of follower base. And it
was just like this it was kind of cool
to see like okay well here's this
engineering problem here's how we fixed
it. Oh but actually there's this edge
case that actually is so existential
that we have to go back and fix it a
different way. And then we have this
hybrid approach and um I think everybody
who's seen cool software sees these
things in reality right there's this
some sort of like weird you look at it
you're like why is it this way and
you're like oh well actually it makes
sense given the constraints that I have
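The hybrid Twitter approach described above can be sketched as a toy model. This is not Twitter's actual implementation; the data structures, function names, and the tiny celebrity threshold are all illustrative assumptions. Ordinary users fan out on write to their followers' cached timelines, while high-follower accounts are stored once and merged in at read time.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # illustrative; the real cutoff would be far higher

followers = defaultdict(set)          # user -> set of followers
following = defaultdict(set)          # user -> accounts they follow
timelines = defaultdict(list)         # user -> precomputed (cached) timeline
celebrity_tweets = defaultdict(list)  # tweets merged in at read time

def follow(follower, followee):
    followers[followee].add(follower)
    following[follower].add(followee)

def post(user, text):
    if len(followers[user]) >= CELEBRITY_THRESHOLD:
        # Popular account: fanning out to millions of cached timelines
        # is too expensive, so store once and merge at read time.
        celebrity_tweets[user].append(text)
    else:
        # Ordinary account: fan out on write, since writes (~12k/s)
        # are rare relative to reads (~300k/s).
        for f in followers[user]:
            timelines[f].append(text)

def read_timeline(user):
    merged = list(timelines[user])              # cheap cached part
    for followee in following[user]:
        merged.extend(celebrity_tweets[followee])  # read-time merge
    return merged

follow("alice", "bob")    # bob: 1 follower -> fan-out on write
follow("alice", "celeb")
follow("carol", "celeb")  # celeb: 2 followers -> read-time merge
post("bob", "hi from bob")
post("celeb", "hi from celeb")
print(read_timeline("alice"))  # ['hi from bob', 'hi from celeb']
```

The design choice is exactly the trade-off the hosts describe: pay the cost at write time for the common case, and fall back to the expensive read-time join only for the small set of accounts where write-time fan-out would explode.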
>> when he talks about scalability he says
like it's not binary like system a
scales or it doesn't scale. Systems scale in different ways, right? And writing a write-heavy service is very [clears throat] different from writing a read-heavy service, right? And also in
terms of like scalability like I don't
know stick up a server and all it has is
a health check endpoint, right? And then gate
that behind you know like put that on
Kubernetes like boom it scales you can
probably handle a million requests per
second right if if you do that um but
it's uh there there's not much to it um
he talks a lot about performance metrics
uh percentiles over averages you know so
you got your P50. And you know what's so dumb about this is that I have
I I know about P95, P99 latency, right?
And um if you're familiar with the
podcast, you know that Nathan mocked me
at my current place because we did not
have like OpenTelemetry set up or any sort
of metrics. You're like, what are you
doing? Like how do you know if the
system is stable? And so I got that all
set up. It looks great now. In fact, uh
just recently I got also our MongoDB
drivers exporting automated metrics
which has been great because we've been
doing the whole month of January, my
project has been just uh performance
improvements for the site and um and we
had we had this like weird library that
was serving as like the application
bottleneck and so we got that removed
which is great. So that's no longer the
bottleneck but it's so funny because now
the bottleneck has moved to MongoDB and
so we are
>> Isn't that funny?
>> I know right. And so it's cool that like
this this former library which had been
such a pain for us. We finally solved
that. But now I'm doing all this [ __ ]
stuff. So I had to get these [ __ ]
driver metrics automatically exporting.
But anyhow, so yeah, I I I have P95 P99
latency set up. But I was monitoring
average latency and then this book
pointed out P50 latency, median latency.
I'm like, oh, why do we not have a
[laughter] dashboard? You know, why
don't we have a panel for that on our
dashboard? So I got that set up and I
was like, hey, like our P50 latency is a
lot better than our average latency. But
in in part because our P99 is just too
high. Um, but this is something I've
been thinking about all month because
again my whole job this month has just
been getting uh latency down on the
site. Amazon determined, this is what
Martin says in the book, Amazon
determined that every 100 milliseconds
of increase in latency decreased sales by 1%.
>> That's nuts.
>> Insane, right? That's 100 milliseconds.
One-tenth of a second leads to a 1%
decrease in sales. Um, and so it's
interesting because like I'm jealous of
a company like Amazon who could afford
to run those sorts of experiments and
and come up with that definitively. I
don't have any proof. I don't think I'll
have any proof once this is all said and
done that like oh by decreasing our
latency we you know saw this increase in
this metric for the business or
whatever. But I keep thinking about
that. I'm like I bet that holds true. I
remember when we read In the Plex, the
early Google engineers very strongly
believed in the power of low latency um
how it grows the business.
>> This was a really interesting one too. So I was working at a bigger startup, for me, about 300 ICs. We were seriously funded and we had
some mergers and acquisitions. There's a
company that we had acquired out of
Spain actually and I remember that one
of the weird problems that we had it
didn't show up in the dashboards
and this is the worst case scenario. Um
we had all this like tracking metric
stuff. So the headers were actually
pretty full. Like it was way too full.
And what would happen was if the headers
got too full, some of the um edge
routers would actually like either
completely block or or cut out some of
the uh the headers that were annotated
in. And this would actually
disproportionately affect people that
were part of the loyalty and rewards
program. So like the most valuable
customer, the people who spent enough to
actually like want to be, you know,
getting the loyalty and rewards were the
ones who were getting the worst
performance or the worst uh user
feedback. And so we ended up having to
make dashboards specifically for loyalty
rewards. Uh it was it was a very
interesting like metrics challenge to
kind of be like, how do we... And actually I just had this conversation with another um client I was talking to. If you run into a problem where you can't ask the question of your data, that's a really good indicator that you
need you know to beef up your metrics in
some way like sometimes there are these
questions you just you have a question
you're like I don't actually don't know
how we measure that you know um
>> That's one. Also, processes that scare you, right? So these are the kind of couple things that come up, and this actually I think gets us into the next part, where he talks about maintainability.
>> I just want to say one more thing about scalability, um, which is what he says about your percentiles, um
>> Oh yeah, so, you know, right, so P99
means one out of every 100 requests,
and he talks about P99.9, which is one out of
every thousand requests. He says that big
companies have determined that P99.9 is
about as far as you want to go. He says
when you get to P99.99, one out of every
10,000 requests, you know,
that's where you're getting
into things like solar flares, or you just
can't even really tell what's
causing these things to be extra long.
But he says that you might think P99.9
is excessive, one out of every 1,000
requests. But the point he makes, and I
believe this came from Amazon as well,
is that your customers with the most
data tend to be your most valuable
customers. Yeah. And those tend to be
the sorts of requests that start showing
up in P99.9. And so you might think one
out of every thousand, we can afford to
lose it. And to be totally honest, at my
company right now, I'm not even touching
P99.9. I'm focused entirely on P99.
We're just starting out, doing baby
steps, right? But that is something to
think about. Those tail end requests
aren't necessarily random. They might be
associated with your most valuable
customers. And so
>> that's interesting.
>> You're taking care of them. But that
does bring us, oh, sorry. Go ahead.
>> Yeah. Yeah. No, I've actually never
measured P99.9 either. No,
>> at the cloud provider.
>> Oh, okay. Yeah, that makes sense. Well,
and I've never worked at a company
that's that large. So I think P99 was
actually us holding ourselves to a
really high bar
>> Because again, right, what you're doing
with this, and I think for the uninitiated,
um, P99 is looking for outliers. It's
saying, hey, if you look at the
average latency of something, you're
going to have this number that's
hopefully in a pretty cozy spot.
>> Um, or you'll see something that looks
really bad. So there's two things
that'll happen. Number one, it'll either
lie to you and make things look better
than they really are, hiding the edge
cases, or you'll get some number for
your average that looks really high, but
really what's happening is that the outlier
just happens to be a super crazy outlier.
This is kind of like if you look at
median versus mean income in the United States,
>> right? The haves
versus the have-nots. The people who
make really, really high incomes
will completely distort what the average
income is for the American household.
And so that's why we
>> Like the average wealth for
millennials, it was looking a lot
higher than it should have because of Zuckerberg.
>> Yeah. Exactly.
>> Him alone was distorting that so much.
>> Right. Right. There's this joke
that if you're at a cocktail party
and, you know, a billionaire walks
into the room, yeah, the mean
income goes up by hundreds of thousands
of dollars. Right.
>> It's Wyoming, I think, or
Montana, I forget. But it's the only
state in the union where the average
income for black Americans is higher
than the average income for white
Americans, because Kanye West lives there
and there are so few black people in
that particular state. So anyhow, yeah,
median versus mean. Lots of fun.
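The mean-versus-percentile point can be made concrete in a few lines. A toy sketch (the numbers are invented, not from the book): a thousand request latencies where ten are pathologically slow. The mean looks alarming, P50 and P99 look healthy, and only P99.9 surfaces the tail.

```python
# Toy latency data: 990 normal requests, 10 pathological ones.
latencies = [100] * 990 + [10_000] * 10  # milliseconds

mean = sum(latencies) / len(latencies)   # the "billionaire at the party"

def percentile(values, p):
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p/100 * n) without math
    return ordered[int(rank) - 1]

print(mean)                         # 199.0  -- twice the typical latency
print(percentile(latencies, 50))    # 100    -- median: the typical request
print(percentile(latencies, 99))    # 100    -- P99 still looks fine here
print(percentile(latencies, 99.9))  # 10000  -- only P99.9 catches the tail
```

Ten slow requests out of a thousand are enough to double the mean, which is exactly the distortion the hosts describe with median versus mean income.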
>> Exactly. And so, with statistics, I
will say, um, and this comes up
time and time again, having at least
a freshman-level college
understanding of statistics is actually
very important in our jobs. And I would
say that the longer you put it off, the
worse it will be, because if you
want to start asking interesting
questions
and solving interesting problems in your
company, you're going to have to get
pretty good with how statistics work and
how statistics can lie to us. Is the data
actually giving us something
that's meaningful? Um, and it's
non-trivial. And I will tell you
that I've walked into environments where
they are willing to lie to themselves
with statistics so that it looks good to
a manager, but they're not actually
solving the real problem, because they
aren't putting the right statistical
rigor in place. And sometimes you kind
of have to break it, like, hey, this
isn't actually how we should be
measuring ourselves. And so,
yeah, I think you can't meet your goals
with scalability unless you're measuring
things properly. And I think that's what
this section really drove
home, you know.
>> Well, and the thing with P99, you say,
"Well, it's one out of every 100." But
think about it. It's not crazy that each
page on your website would make four
requests to your backend, right? And so
that means if someone clicks around 20
pages over the course of a session,
odds are they'll encounter one of those
1-in-100 requests. And if your
median latency is 300 milliseconds but
your P99 is six seconds, that means at some
point during your user's experience,
they're going to encounter a page that
takes six seconds to load, right? Some
feature is going to take six seconds to
load. So, you know, worth considering.
Go ahead.
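That 20-pages-times-4-requests point can be checked with a quick probability sketch (assuming requests are independent, which is a simplification):

```python
# If 1 in 100 requests lands in the P99 tail, how likely is a user to
# hit at least one slow request over a 20-page session with 4 backend
# requests per page?
pages = 20
requests_per_page = 4
n = pages * requests_per_page          # 80 requests in the session

p_tail = 0.01                          # chance any one request is P99-slow
p_at_least_one = 1 - (1 - p_tail) ** n

print(f"{p_at_least_one:.0%}")         # 55% -- more likely than not
```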
>> Yeah. No, that was the thing.
Time and time again
I've found that it's, oh, there's some
cache invalidation path, you know,
cache poisoning or something that
>> comes up, or we find that there's some
usage of a web page that ends up being
this really expensive SQL query or
something. It's a really good
investigative tool, truly.
>> And then there's maintainability, which
we can just touch on. Maintainability is
a lot more about the
architecture of the code itself, although
he does talk about things like
monitoring, automation, documentation.
I mean, this is where
something like Fundamentals of Software
Architecture or A Philosophy of Software
Design is going to talk a lot more about
this. Um, and I mean, again, you could
devote a whole series of episodes to
just maintainability, especially in the
age of AI, because, you know, there's a lot
>> right
>> I was reading, what is it, I'd never
heard of this open source project
before. It's called tldraw.
>> Oh yeah.
>> Were you seeing this? Um,
>> There's some YouTubers that I like
a lot, and I've actually
been using tldraw for
some diagramming that I've been doing
for client work. So,
>> well, they just announced that they are
automatically closing all new pull
requests on the open source project because
>> there's so much AI-generated code. Um, yuck.
>> And I think there are just users who
are trying to, because that's been
something people have said forever, like,
"Oh, commit to open source. If you're
having trouble finding a job, go commit
to open source." And so I'm sure there
are some programs out there, maybe just
users themselves, you know, being savvy,
that are opening up as many pull
requests on open source projects as
possible. And so, you know, we're just
in an age, very strange, right? We're in
an age where you can generate code much
faster than you can review it. Um,
>> I think that's funny, kind of like how
I've been talking about at my job, like,
okay, we removed this application-level
bottleneck, but now we have come to a
new bottleneck, right? Which is [ __ ]
We're kind of at that point as a field,
right? We're like, okay, the
bottleneck for a long time had been
writing code.
>> So, you remove that, the code generation
is faster these days, but what's
the new bottleneck, right? And I think
there are a lot of bottlenecks, right?
>> It's so true, too. And this
actually gets all the way back to
some of the conversations we had
about Fundamentals of Software Architecture,
>> where we talked about connascence, and that
when you find a bottleneck you're actually
finding an accidental connascence of process.
>> Oh yeah.
>> Right.
Because what you don't realize is, oh,
there's a dependency graph here. Once I
cleared up this weird little thing, I
realized that actually, and so it is
interesting, like, how do I decouple
systems? How do I, um, or
sometimes the fact is that that
just is the bottleneck, and now we have
one variable set to deal with, like, oh,
MongoDB needs more resources, or we need
to structure our database a certain way.
Maybe that's the sort of ending point.
But sometimes you don't know how to
tune a system until you've uncovered
some of the blockages of process
that are in place.
>> Well, uh, so that's a lot about
scalability, reliability, and maintainability.
>> I will bring up one thing with
maintainability. It's going to come up,
and so I'm just going to give a layup.
He kind of mentions evolvability, um,
so we'll get through chapter 2 and
chapter 3 and chapter 4; that's
where we'll be for this episode.
Chapter 4 is actually my favorite
chapter. It has to do with
encoding and decoding, serializing and
deserializing data. And a big theme that
comes up with this, because this ends up
being a footgun for a lot of
organizations, is the evolvability of
schemas, the evolvability of a contract
and an API and these other things. And
so, again, I appreciate this book
because he gives himself a layup. He's
like, "Okay, here are
the foundational principles with which
we're going to address everything else
in this book." Um, and he keeps
coming back. And so I love that he'll
bring up scalability, bring up
maintainability, and bring up
reliability when we're talking about
trade-offs, for the rest
of the text.
>> Chapter 2 is all about data models and
query languages. This is one where I
started to feel a little more like maybe
he was doing his due diligence to kind
of cover everything. But he gets
into a lot of things where I'm
like, I don't know if I'm ever going to
use this, right? Like, I don't know if
I'm ever going to use CODASYL, right?
And then he talks about all these
different kinds of databases. Um, and
I think we're seeing these days, like,
yeah, I don't really see a lot of
arguments for, like, MySQL over Postgres,
for example, right? Like, Postgres is kind
of eating the world. Um, but he does
talk about the convergence. This is
the chapter where he lays out, like,
okay, SQL versus NoSQL, right? And NoSQL's
kind of evolved to mean "not only SQL." Um,
and I think it's really important to
have an understanding of how these data
models work. I mean, we are actually
struggling with that at work. Um, we use
[ __ ] as our database, and kind of the
early engineers who built the product, a
lot of them aren't with the company
anymore, just treated [ __ ] like,
yeah, you know, it's a database, like,
let's just query whatever we want. And
then as we've been digging in and making
these performance improvements, we're
like, you know, some of these [ __ ]
queries are really, really slow. Um, and
it's because [ __ ] isn't like a
magic sack you can just pull things out
of, right? Like, it has an actual
physical structure underneath in how it
organizes all this data. And so some
things are really fast, like anything
that has an index on it, right? [ __ ]
can query that pretty quickly. If it
doesn't have an index, all of a sudden
you're talking about doing an item-by-
item scan of the entire
>> collection, right? And so there's a lot
of value, whatever database you choose,
in having a basic understanding of how the
data is organized underneath and what
queries you're going to make against it,
because that really influences
the amount of processing, or the
processing time, it takes to return that
data.
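The "index probe versus item-by-item scan" distinction can be modeled with plain dictionaries and lists; this is a hypothetical in-memory sketch, not any particular database's implementation:

```python
# A collection of 100,000 "documents".
documents = [{"_id": i, "email": f"user{i}@example.com"} for i in range(100_000)]

# Building an "index" on email costs one pass up front...
email_index = {doc["email"]: doc for doc in documents}

def find_indexed(email):
    # ...after which each lookup is a single hash probe.
    return email_index.get(email)

def find_scan(email):
    # Without an index, the engine has no choice but to touch
    # every document in the collection in the worst case.
    for doc in documents:
        if doc["email"] == email:
            return doc
    return None

# Same answer, wildly different amount of work.
assert find_indexed("user99999@example.com") == find_scan("user99999@example.com")
```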
>> Document stores are really cool, and
again, the book does a great
job; it's probably the best explanation
I've had on why I'd pick a document
store. And when we're talking about
document stores, we're really talking
about, you know, MongoDB, we're talking
about DynamoDB, which I don't think he
brings up in the book. But there's also,
you know, a bunch of these
where what you have is basically
some sort of a primary index,
you know, a key-value kind of store of a
bunch of tree structures, right? Like,
that's really what a document database
is. And if you need to start doing
relational queries across it, you're
going to be in a lot of pain, because
what relational databases do quite well,
which is denormalize, I'm sorry,
normalize the data, where you can
actually abstract out all these
pieces, and you put all these joins
together and you can get a correct
representation of the entire system, um,
in a really nice,
clean way, um, is really slow for the
type of thing where maybe what I really
care about is finding some chunk
of data and then displaying this tree
structure to the end user. That's
where, you know, [ __ ] really shines, and I
loved that this thing really just
gives us this breakdown of one
versus the other. It is funny, too, that
the world has shifted. There's this
whole movement, I think since this
book came out, called the "just use
Postgres" movement, right? A lot of people
are just like, just use Postgres, like,
you don't need all these crazy databases.
Not always true, and I think for your
personal projects on the weekend it
probably is true, uh, especially because
Postgres has blurred the line, and all
the major SQL databases have this now,
but Postgres has a JSON data type, so
you can actually do relational data plus
tree-structure document store inside of
the same database, and that kind of blurs
the lines, like
>> Mhm.
>> if you have relational-ish data with
documents that you want, um, you
have all these options that you
didn't have before.
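That "relational data plus a JSON document column" idea can be sketched without a Postgres server. Postgres would use a `jsonb` column and the `->>` operator; as a runnable stand-in, this uses SQLite's JSON functions (a substitution for illustration, not what the hosts ran):

```python
import sqlite3

# A normal relational table whose "profile" column holds a JSON document,
# queryable in place -- the blurred line between relational and document.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)")
con.execute(
    "INSERT INTO users VALUES (1, 'ada', ?)",
    ('{"theme": "dark", "tags": ["admin"]}',),
)

# Relational filter on id, document-style extraction from the JSON blob.
row = con.execute(
    "SELECT name, json_extract(profile, '$.theme') FROM users WHERE id = 1"
).fetchone()
print(row)  # ('ada', 'dark')
```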
>> And this is something Alex Xu, in his
System Design Interview book, talks about. He
talks about doing back-of-the-
envelope math, and I wish I saw more
candidates doing that, because a lot of
candidates will be like, well, we've
got to use [ __ ] we've got to go NoSQL
because, you know, it scales, it's web
scale. We talk about that all the time at
work. [ __ ] it's web scale. Say we don't
know anything about [ __ ] but we know
that it's web scale. Um, and so
a lot of candidates will be like,
well, we've got to go NoSQL because it's
got to scale. But it's one of those
things where it's like, okay, let's say
you're doing a tweet, right? And so let's
say a tweet is 140 characters. And so
is one character a byte?
>> I think so.
>> Um, typically. If it's UTF-8, not
necessarily. They call
those runes in some languages, and they can be longer.
If it's ASCII, then yes, it's one byte per
character.
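To pin down the characters-versus-bytes aside: in ASCII one character is one byte, but UTF-8 encodes a code point (what Go calls a rune) in one to four bytes:

```python
# UTF-8 byte lengths per code point: ASCII stays at 1 byte, so the
# "140 characters = 140 bytes" estimate is a lower bound for real text.
for ch in ["a", "é", "€", "🐦"]:
    print(ch, len(ch.encode("utf-8")))
# a 1, é 2, € 3, 🐦 4
```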
>> So a tweet can be 140 characters, you
know, and so that's 140 bytes. And so,
you know, then how many tweets can you
fit in a megabyte? Uh, oh, wait. Sorry.
No, no. You can fit about 7 tweets in a
kilobyte, which means you can fit
7,000 tweets in a megabyte, which
means you can fit, I mean, 7 million
tweets in a gigabyte, right?
>> If you're kind of saying, like,
well, we've got to do [ __ ] because [ __ ] scales,
then it's kind of like, well, wait a
minute, what if I did Postgres? I mean,
it costs like five
bucks a month to run a 10-gigabyte
Postgres database on AWS, right? And so
all of a sudden that's 70 million tweets
I can store in my Postgres
database. And then you kind of
ask yourself, like, okay,
especially starting out, how fast
are we going to get to 70 million
tweets? And that's why in an interview
you need to be asking those sorts of
questions, like how many writes do we
expect per day, right? You know, what
kind of growth are we expecting to
see? But if you're doing that
back-of-the-envelope math and
actually estimating, okay, how much
data are we actually going to have,
then it opens up some of those options
for you, rather than just jumping
immediately to, well, we need NoSQL
because NoSQL scales. One framing that
I thought was interesting is, so I think
a lot of people will reach for something
like Firebase or something like
>> uh, MongoDB, because they want to
>> I like to call it kicking
the can down the road on thinking about schemas,
>> right? So it's like, oh man, you know,
the shape of our data is
changing so quickly, it just really
would be annoying
to have to nail a schema down and do all
these schema migrations, and I've had a
bad experience with this in the past, and,
like, whatever, it's just a bunch of JSON
blobs. I get that for quick iteration.
Um, but it runs into a lot of problems.
And again, I've never seen the framing
like this, and maybe it's just because I'm
a dunce, but he calls this
schema-on-write versus schema-on-read.
And I think this is again a good example
of the trade-off. So schema-on-write is
what relational databases are. You are
declaring your column types, your names.
You really have to understand the schema
for your system to interact with the
database, right? There are all these
expectations. It also means it's in your
face when you need to add a column or
make some change or do some other thing.
You have to think about the shape of
this data before you get started. Versus
schema-on-read, uh, and schema-on-read
is the document structure.
>> This is actually a lot like how APIs
work, right? We might
have a written promise that data from
an API endpoint from Stripe is going to
be a certain way, but it can change over
time. We might get some new columns, we
might get some new changes, and your code
is written in a way that says, okay,
I'm going to parse this out and I will
validate the data and I'll make sure it
fits the right data type, and I do that
at read. When I read the data, I'm
inserting it into whatever structures
I want. And I just hadn't thought
about how, depending on the shape of the
work that you're doing, this is the
trade-off. You may have super
flexible schemas where maybe only three
fields you really care about and
everything else is just kind of
nice to have, and a super locked-
down relational schema-on-write
structure really is in the way. Where, you
know, maybe I care about the user ID and
the number of times they've logged in
and what groups they're in, or something,
right? And then everything else, like the
tags that they've given themselves and
all these other extra
annotations, if it's there, that's great,
and if it's not, I don't care. And, you
know, well, okay, well, document structure
would be great for that. You know, you
don't have to normalize and, you know,
do all this crazy stuff; you can
just kind of let this data
sort. You basically get a data cache
optimized for certain types of queries,
and you can think of documents being
that way, right? You're not having to
ask a question across all of these
documents. Um,
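A minimal schema-on-read sketch of the pattern described above: the store holds free-form JSON blobs, and structure is imposed only at read time, only for the fields we care about. The field names here are hypothetical, not from the book.

```python
import json
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str
    login_count: int = 0
    groups: list = field(default_factory=list)

def read_user(blob: str) -> User:
    """Schema-on-read: validate and type the fields we need; tolerate
    and ignore everything else (tags, annotations, ...)."""
    doc = json.loads(blob)
    return User(
        user_id=str(doc["user_id"]),                 # required
        login_count=int(doc.get("login_count", 0)),  # optional, defaulted
        groups=list(doc.get("groups", [])),          # optional, defaulted
    )

raw = '{"user_id": "u42", "login_count": 3, "groups": ["beta"], "tags": ["x"]}'
print(read_user(raw))  # the extra "tags" field is simply ignored
```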
>> Right. But I like that framing from
you, like sometimes you are just kind of
kicking the can down the road. And we've
seen that, and again,
we've entered a world where
generating code is a bit of a commodity
these days. And so I've seen a lot of
posts on the ExperiencedDevs subreddit
lately, and I really
feel for these people. These are people,
like Uncle Bob talks about, the flow
state, right? Like, these are people whose
favorite thing about
programming was that. They just loved getting in
the flow state and just generating
[clears throat] lots and lots of code.
And for better or for worse, that
doesn't exist anymore, or it's not
nearly as
valuable as it used to be. And so me,
I'm enjoying this new era of large
language models, because I've never
really derived a lot of great
enjoyment out of the actual act of
writing code. I derive a lot of
enjoyment out of making things and
seeing results, right? And coming up
with solutions. And I really like that
once I've honed in on a
solution, like, okay, this is what we
want to do, then I submit the prompt to
Claude Code and it takes care of what I
want. Uh, this is so interesting, and I
think, so I was actually
literally last night having a
conversation with a buddy of mine.
You know, there's
this sort of divide where one camp says
all the large language
models are terrible, and, you know, what
are we doing to the world, and all this stuff,
>> and who would use them, and you're a fool.
And then there's this other camp that
seems like the large language
model maxis, right? Like they're
>> relishing in the fact that
everyone's going to lose their jobs and
that you can replace Facebook with
one prompt on a weekend or something, and
you're like, both of y'all are nuts.
That's how I kind of look at it.
[laughter] But my buddy, who's kind
of in this Linux world, he started in Linux
systems, he's a phenomenal
programmer, he's worked at a bunch of
cool companies, he's kind of taken this
pragmatic middle spot. He loves
actually writing code. He loves
writing Rust. I consider him a deeply
thoughtful programmer, someone who
actually does enjoy the flow state, and he
is also really enjoying using large
language models. He actually just
introduced me to a tool called Happy,
which is like a wrapper around Claude
Code or Codex that lets you
run it where you can access
it from your mobile device. So you
can let it do its agentic stuff, and it's
all personal and private, because he's, you
know, very privacy-oriented. And he
actually brings this up in a really
interesting way that I hadn't thought
about. His analogy was, it's like
when the CNC machine was invented. Um, the
industry around machinists freaked
out, right? Because before CNC machines,
right, and these are the ones that are
milling out this stuff and can do this
very technical work
>> in an automated way, um, machinists were
kind of repulsed by it, because they
were just like, ugh, no. I mean, there's
a craftsmanship, there's a tooling,
there's a way that we do this. They enjoyed
it; machinists were very well compensated.
And yet CNC did two things. Number one,
it allowed people to get custom-machined
stuff at a much lower barrier. Like, if
you could buy a CNC machine and
spend a few weekends learning
it, you could be good enough. It's
not going to be as good as a machinist.
>> But a machinist with a CNC mill
>> is next-level excellent, right?
They know the excellent craftsmanship
that goes into finishing these parts, and
they might CNC it out and then go back
with their machining tools and make it
perfect, right? They may go do their
extra stuff. And I'd never thought about
this idea that precision plus scale
is what these large language models are
unlocking for professionals like us,
where there are certain parts
of my job that literally had such a
cognitive load to them that I just
didn't even know how to get started. Um,
>> and then letting the CNC mill go off and
do a thing, and then me going off to work
on some other stuff, and then coming back
as the machinist to clean it up and do
some stuff that the CNC mill can't do on
its own. I think this is actually a
really good analogy. It's the best I've
heard from anybody. Um, and
it's cool, too, because, like, Carter, you
and I, I think, I'm not quite where my
friend the Rust programmer is;
he's the one who gets into deep flow
state and really cares about, you know,
low-level data structures.
>> You're very product-focused. You
love the output,
>> the intersection of humans and this,
>> and I'm somewhere in between. Like, I
actually like the producty stuff,
>> but I also love getting into flow state.
Um, and I feel like, for all three of us,
it's rare for me to see you
and him and me all kind of having these
aha moments that there's a there there, right?
>> There really is something
interesting and new, and you ignore the
hype people and ignore the doomsayers.
Like, just a little PSA in the
middle of this episode:
there's something there. We're not all
going to lose our jobs. Our jobs are
changing, though, and the expectations are
going to shift. And
if you have the right
attitude about this, I think that
there's something here that is
deeply rewarding, deeply
satisfying. Um, and it actually makes
reading books like DDIA
>> Yeah.
>> really important. Like, if you want to be
the machinist plus the CNC mill, you've got
to read books like DDIA, right? You have
to read Domain-Driven, uh, sorry, Designing
>> Data-Intensive
>> Applications.
>> The domain-driven one is also another
one we're going to read down the road.
But, uh, yeah, that's
>> Yeah, I'm with you. The little PSA in
the middle of the episode, but I just
think, uh, yeah, like, I don't know. I
tend to think of myself as an
optimist but also a realist. I tend to
have a pretty clear-eyed view; I hope I'm not
lying to myself about too many things,
right? And so I am very aware of
the current state of LLMs. I'm very
aware of what my job constitutes, and
I've mentioned this on the podcast
several times: I just don't see
my job disappearing anytime soon, right?
Like, I am not incredibly concerned. The
job has changed, 100%. I look at our
junior engineers, and I'm just like, you
graduated into a completely different
world than I graduated into, right? Um,
but again, reading a
book like this just confirms to me that
this is all really important knowledge.
This is something you absolutely have to
have to succeed in this world, even in a
large language model dominated world.
Chapter three,
>> I don't know. Maybe. So, I'm only making
fun of chapter 3 because before we
started the podcast, I was like, Nathan,
I'm gonna be honest, this chapter kind
of went over my head a little bit,
right? Like, this is the one
where, if I had the book in
front of me, I would have gone back; with
the audiobook, I would
have paused to do it. So we have now
reached the point of the podcast where
Nathan explains chapter 3.
>> Oh my gosh.
>> And to you the audience.
>> So there's a famous story from a few years
back about the guy who wrote Homebrew,
you know, "the missing
package manager for macOS."
>> Uh, he interviewed at Google and he ended
up getting
>> Oh, was it Apple? I think it was Apple.
Yeah, that's what
>> Oh, well, maybe. I thought it was Google.
But anyway, there's a mythology. So, one
of the FAANG companies, we'll just say
that. Um, but he interviewed and he
failed it. He couldn't, uh, you know,
construct a B-tree on a
whiteboard. Uh, and he's just like, I've got
hundreds of thousands of people using my
software on a daily basis,
>> right?
>> writing valuable stuff. And yeah, sure,
I can't do this sort of, you know,
coding ritual thing that you've
asked me to do. Uh, and I kind of feel
that way about this chapter. So, like,
this is really important stuff if you're
in this domain, right? If you are having
to solve or understand why we're using
B-trees versus LSM trees, or why we
use this sorting algorithm, or why
LevelDB, you know, picked this, these
kinds of things are
really important when you happen to be
in that space. Most of us are not. Most of
us should really lean on the sane
defaults, right? Like, if you're toying
with which index type you're using for
your column in your database, I
would say you're most likely
overengineering it. And I'll walk
that back a step, though. I was actually
doing some Postgres stuff, and I
realized that, given the
shape of the data, the default indexing
algorithm was not correct for what I
needed, and I actually needed this other
thing, because the way that I was doing
queries meant it was
like an O(1) relationship for the
type of thing that I was doing. And,
luckily, with knowledge like this, I
was able to understand the trade-off.
Um, this chapter is definitely one that
takes close consideration. If you're
listening to the audiobook, you're
probably going to have to listen to
sections a few times. I would highly
recommend going back and reading the
physical copy if you can, because
there are diagrams that are really
important. Um, but this really digs into
things like hash indexes, um, you know,
and also just how
some of these algorithms actually work
on hardware. And I think this is
something that we don't think about a
lot, even in graduate algorithms, right?
You take graduate algorithms and you
learn about how dynamic programming
works, but they don't get into, like, oh,
and actually this
algorithm is really great because of the
way that sequential writes on disk work.
And you can say, okay,
I'm going to take this chunk of memory,
and because I know that
LSM trees work like this, there are these
chunks of contiguous memory
allocation, and because of the way I
write these sections on disk, because
they're append-only logs or whatever, I can
write this and know that the fault-
tolerance characteristics of it are
excellent, you know, if I get
interruptions.
And so it ties into things like
write-ahead logs
versus these append-only records, and
how these things work. And I liked this
section because I hadn't thought about,
like, oh, if you have a write-ahead
log, you're actually writing this data
two times, right? Versus if you have
this thing that writes to disk the
first time, you only write it one time.
And sometimes it makes sense for this to
be a write-ahead log, and sometimes it
makes sense to just write to disk,
um, because of how disk access works
with what you have going on. And, um,
if this is making your eyes glaze over,
it's okay. [laughter]
This chapter really gets
into the weeds of things
like write-ahead logs and crash recovery, or
why certain technologies
were picked for
certain applications.
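The chapter's simplest storage engine, an append-only log plus an in-memory hash index mapping each key to a byte offset, can be sketched in a few lines (an in-memory stand-in here, with a bytearray in place of a real file, not a real storage engine):

```python
class LogStore:
    """Toy log-structured store: writes append sequentially; reads do one
    'seek'. Overwrites leave old versions behind in the log (real engines
    compact them away)."""

    def __init__(self):
        self.log = bytearray()  # stand-in for the append-only file on disk
        self.index = {}         # key -> (offset, length) of the latest value

    def put(self, key: str, value: str):
        record = value.encode("utf-8")
        self.index[key] = (len(self.log), len(record))
        self.log.extend(record)          # sequential append, never in-place

    def get(self, key: str) -> str:
        offset, length = self.index[key]
        return self.log[offset:offset + length].decode("utf-8")

db = LogStore()
db.put("k1", "v1")
db.put("k1", "v2")    # an "update" is just another append; the index moves
print(db.get("k1"))   # v2
print(len(db.log))    # 4 -- both versions still live in the log
```

The sequential-append discipline is what makes writes fast on disk and crash recovery simple: anything after the last complete record can be discarded.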
>> Um yeah,
>> Was there anything that stood out to
you? I guess maybe we
could hop into more concrete examples of
stuff. There were some things, like he
talked about data warehousing, right?
And he talked about ETL,
extract, transform, load.
>> That's a section I loved, actually.
>> Yeah. Yeah. I did a little bit of
work with that in my last
job. But again, there is so much breadth as far
as what you can work on. Like, every
now and again I'll see a Reddit
post where someone mentions they work on
embedded systems, and I'm like, I forgot
that was a thing. I forgot that
there were software engineers actually
doing that sort of work. And that's
probably work I will never do, you know,
throughout my
entire career.
>> They're doing the Lord's work because
that is some deep focus. [laughter]
>> They're they're cut from a different
cloth because I I um Yeah, actually this
would be a good one whether we got into
too much of the details or not. OLTP
versus OLAB. I think this is uh even if
you don't know a ton about databases,
especially SQL type databases, this is a
really important concept because I think
everyone runs into this at some point.
Uh if you're dealing with SQL, um which
is the shape of the questions you're
asking your database, right? Um there is
uh OOLTPs are the transactional
databases. That is what you think of as
like a web app database, right? have a
bunch of users that maybe go to the
website and log in and are doing stuff
and most of those transactions are tied
to just their user behavior. Right? If I
have a shopping cart, it's my shopping
cart, my credit card history, my uh you
know, shipping addresses, the things
that I've purchased and orders and
stuff. It's a bunch of transactions.
Maybe I have millions of customers, but
I'm not going across and asking
questions of, you know, a bunch of other
user data for that, right? On
Amazon.com, everything in there is
either product inventory or your stuff,
like your history, right? Um, OLAP is
the other big one here, which is that, okay,
let's say I'm an analyst at amazon.com
and I want to know what the total sales
out of the United States were during the
month of May 2024, right? Um, that's a
really big, juicy query that you're
asking, and it's going to affect a ton
of rows. And if you ask that of your
OLTP, your transactional database, that
database is not optimized for that type
of workload. And you could actually take
the whole thing down if you ask big
enough juicy queries.
>> Um,
>> and this is a good example, right, of,
like, again, I listened to this chapter
on the audiobook, to be honest. A lot of
it went over my head, but things like
OLTP and OLAP, right, and like ETL.
Like, ETL at this last job I had, right?
>> I didn't even know what that was until
I showed up, and then it's like, ETL. So
I would have just been a little bit
ahead of the curve if I had read this
book. And just like OLTP versus OLAP,
again, a lot of this went over my head,
but I have a ChatGPT window pulled up
right now, and I just wanted to give
some context for the audience. But
what's great is that as I'm about to
give you guys some context on what OLTP
versus OLAP is, in my mind I'm like,
"Oh, yeah, yeah, yeah. I read this. I
remember this part from the book,
right?" Um, and so OLTP is online
transaction processing. And that's
exactly what you're talking about,
Nathan, right? This is like user signup,
user updates, profiles, payments
processed. And what does ChatGPT list as
key characteristics? It's optimized for
writes and fast, small queries. It has
highly normalized data, ACID
transactions, low latency. OLAP is
online analytical processing. So this is
going to be used for your business
intelligence dashboards, your weekly
metric reports, uh, trend analysis. And
what are its key characteristics? Again,
this is completely different from OLTP.
It's optimized for reads and large
aggregations. It's often denormalized.
It has columnar storage, right? It
handles very large data sets, gigabytes
to petabytes. And so,
>> again, I listened to this. I could not
have told you that until Nathan started
talking about it and I pulled up this
window. But I do have just a little bit
of foundation. So I started looking at
the chat, and I'm like, "Okay, yeah,
yeah. All those things Martin Kleppmann
talked about, I'm remembering them now."
So again, even if you're listening to
this like I am, on a bike commute,
right? I think there is some really
solid foundational work it's doing in
your brain, even if you're not coming
away with a really detailed
understanding.
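To make the contrast concrete, here's a minimal sketch in Python using SQLite; the `orders` table and its data are made up for illustration. An OLTP-style query touches one user's rows, while an OLAP-style query aggregates across every row.

```python
import sqlite3

# Toy orders table standing in for a web app's transactional data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "US", 20.0), (1, "US", 5.0), (2, "US", 12.5), (3, "CA", 7.0)],
)

# OLTP shape: a small, targeted query tied to one user's behavior.
my_cart = conn.execute(
    "SELECT amount FROM orders WHERE user_id = ?", (1,)
).fetchall()

# OLAP shape: a big aggregation across all rows (total US sales).
total_us = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE country = 'US'"
).fetchone()[0]

print(my_cart)   # just user 1's rows
print(total_us)  # 37.5
```

Same table, two very different question shapes: the first is cheap no matter how many customers exist; the second reads every matching row.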
>> Yeah.
Yeah.
>> Oh, go ahead.
>> Yeah. No, and I will tell you, and
we'll kind of talk about this towards
the end because we're getting close, the
world's moved on as well. Like, this is
a really good thing to think about, and
understanding the concept of a columnar
database is important. It's literally
storing the data by column. Um, and
typically it's because these are
append-only databases, where for a row
you go across a bunch of the column
files and you're just appending the
extra data to the very end. Um, but for,
I think it's Spanner, which is Google's
technology, if you ask it for a set of
data across, let's say, 10 columns, and
you tell it to limit 100, a lot of
people think that that'll reduce the
cost of the query, but it actually
doesn't. Because it's a columnar
database, it's actually accessing the
entire column. So you have billions of
rows in that column. Um, limit 100 is
like a syntactic thing, but it doesn't
actually save you money. And I actually
saw this in an organization where we
were doing really expensive queries on
really, really large data sets, and they
were acting like it was a transactional
database, and I was like, yeah, we're
not using this the right way. And it was
a knowledge gap. They just didn't
understand what the difference in these
technologies was.
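A toy model of why a LIMIT doesn't shrink a columnar scan. This is a simplified sketch, not how any particular engine is implemented: the point is that the engine reads the whole column off disk, and the limit is applied only afterwards.

```python
# Column-oriented toy store: each column is one contiguous array ("file").
columns = {
    "user_id": list(range(1_000)),
    "amount": [float(i % 10) for i in range(1_000)],
}

cells_read = 0

def scan_column(name):
    # Model: the whole column file is read off disk; cost accrues per cell.
    global cells_read
    for value in columns[name]:
        cells_read += 1
        yield value

# "SELECT amount ... LIMIT 100": the limit trims the result set only
# after the full column has been scanned in this model.
first_100 = [v for v in scan_column("amount")][:100]

print(len(first_100))  # 100 rows returned
print(cells_read)      # but 1000 cells were still read
```

That gap between rows returned and cells read is why the query bill doesn't go down.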
Um, and so if you're in this world, I
would highly recommend spending some
extra time. Um, this is a really juicy
chapter. I think I'll kind of close out
this section, too, which is this. Um,
we're in a new world. So, like, there
are new databases that kind of blur this
line, um, databases like ClickHouse,
databases like CockroachDB, um, where
there's even a term that's coming up
called hybrid transactional/analytical
processing, HTAP I think is what it is,
where there are databases now that allow
you to actually ask both types of
questions. They kind of handle both
things, so that it reduces cognitive
load, and it kind of just magically will
do either analytics-optimized queries or
transactional queries. And so, um, it's
probably books like DDIA that inspired
folks to think about other ways that we
could structure stuff. Um, but it would
be surprising if there weren't amazing
developments in how data-intensive
applications were engineered from 2017
to 2026, right? Um, I mean, it's been
almost 10 years since this book was
written.
>> I know, right? And again, it's just
kind of crazy to think that was 10 years
ago. I mean, a lot of the stuff, it is
impressive how timeless this book is,
but you're right. I mean, it's only been
10 years, and still there's a lot of,
uh, if not things that have become
obsolete in this book, there's a lot of
missing context as far as what has been
developed since then.
>> I want to devote time to chapter 4,
because you mentioned in particular that
you really enjoyed chapter 4. What about
chapter 4 stood out to you? This is
encoding and evolution.
>> So this gets partly into, like, um,
software architecture and platform
engineering. So these are near and dear
to my heart, and it happens that
everybody ends up running into these
problems, which is, how do I evolve my
system over time? How do I make changes
in a way that does not introduce, um,
errors and irreversible changes that can
cause major problems? Also, um, he
really kind of gets into this idea of,
like, okay, can I do rolling upgrades?
Are they backwards compatible? He also
spends a decent amount of time thinking
not just about backwards compatibility,
which is actually, of the two that we're
about to talk about, the easier of the
problems. Can I upgrade my system so
that an old version of the data schema
is still compatible? Like, I access
something from a backup or from an old
part of a database and it's structured
slightly differently than new data. The
other one is called forward
compatibility, and that's actually the
harder one, which is, can I write my
code in a way that it's actually
tolerant of changes that I can't imagine
to the shape of that data in the future?
That's actually a much harder problem.
Um, so old code reads new data, that's
kind of how he explains it, versus new
code reads old data.
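In code, the two directions can be sketched like this (a hypothetical `User` record, not an example from the book): backward compatibility is new code filling in defaults for fields old data lacks; forward compatibility is old code tolerating fields it doesn't know about.

```python
import json

def parse_user(raw: str) -> dict:
    # A reader that only knows about "name" and "email".
    data = json.loads(raw)
    return {
        "name": data["name"],
        # Backward compatibility: default a field that old records may lack.
        "email": data.get("email", "unknown"),
        # Forward compatibility: extra fields written by newer code
        # (e.g. "avatar_url") are simply ignored rather than rejected.
    }

old_record = '{"name": "Ada"}'  # written by old code
new_record = '{"name": "Ada", "email": "a@x.io", "avatar_url": "..."}'  # newer code

assert parse_user(old_record) == {"name": "Ada", "email": "unknown"}
assert parse_user(new_record) == {"name": "Ada", "email": "a@x.io"}
```

The forward case is the harder discipline in practice: every reader in the system has to be written this tolerantly before writers can safely evolve.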
>> And a resilient, um, data
infrastructure should be able to do
both, and should also be able to handle
when it won't do one or the other. Maybe
we do, for whatever reason, have to make
an incompatible change. Um, and a lot of
this goes into, so, in Go we always call
it marshalling and unmarshalling, but
most people call it encoding and
decoding, or serialization and
deserialization. Which is, you have some
shape of data, and it needs to be
written into some format that can be
written to disk, which means it's a
string of bytes, some kind of bytes,
right? That could be clear text like
JSON, or it could be some highly
optimized byte-encoded format, some
binary format. And of course, tons of
people have tried to solve this problem
tons of ways. You've probably been in
organizations where, like, I remember a
data science team that used Pickle a
lot. That's the Python-native way, and
it lets you do things like
>> encapsulate the inner workings of a
function into the pickling format,
>> which can actually be super dangerous.
But the problem we ran into is that
Pickle was tied to the particular
version of Python you were on. So if
you're on version
>> that
>> Yes. Exactly. And so if you're on
version 3.7 and then you go to 3.9,
well, you now have an incompatibility,
and the pickling format itself didn't
give you a good, clean way of doing
forward and backwards compatibility. And
so, uh, you either have to re-encode
everything every time you're planning to
do an upgrade, um, or you pick what he
advocates in the book, which is some
sort of data encoding format that is
agnostic of the programming language
under the hood.
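A small illustration of the trade-off, standard library only (the record itself is made up): `pickle` produces Python-specific bytes, and its default protocol has changed across interpreter versions (protocol 5 arrived in Python 3.8), while JSON bytes can be read by any language.

```python
import json
import pickle

record = {"user_id": 42, "name": "Ada"}

# Python-only encoding: the default pickle protocol depends on the
# interpreter version, so older readers may not understand bytes
# written by newer writers.
py_bytes = pickle.dumps(record)

# Language-agnostic encoding: any JSON library in any language reads this.
json_bytes = json.dumps(record).encode("utf-8")

assert pickle.loads(py_bytes) == record
assert json.loads(json_bytes) == record
print(json_bytes)  # b'{"user_id": 42, "name": "Ada"}'
```

Both round-trip inside one Python process; only one of them survives a change of language, and it degrades more gracefully across versions.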
>> It's like Steve Flanders and
OpenTelemetry, or Mastering
OpenTelemetry. Like, his whole book is
avoid vendor lock-in, avoid vendor
lock-in, use OpenTelemetry, avoid vendor
lock-in. It's not necessarily vendor
lock-in here, but it's a similar thing,
right? If you choose a data encoding
format like, again, Pickle, which is
married to a particular implementation
of a particular programming language,
well, now you're super locked in, right?
And so with something like JSON, which
is language agnostic, you can evolve
your system more flexibly, so long as
you maintain those API contracts. But
another point he makes is that your data
will live far longer than your code
will.
Yeah.
>> And so picking the right way to, uh,
one, the right data structure, but two,
how you're encoding and transporting
that data, um, is more important than
the programming language. Now, this is
something, I don't know if he was just
trying to do his due diligence here, I
don't know if the world has evolved
significantly since 2017, but he's kind
of throwing out all of these different
options for encoding and transporting
your data, whereas I feel like today the
answer is JSON. Like, he's even talking
about XML as a viable alternative. I
don't really see that these days.
>> Well, it isn't. Yeah, it is funny,
because he does give this whole section
on, like, oh, SOAP is still around, and
you're like, it basically doesn't exist
anymore. Except, I guarantee you,
somewhere... I know that for the longest
time, Mechanical Turk over at AWS was
famously still a SOAP client, because it
was such an old part of the system. I'm
sure it's different now. But, um, one
thing that I thought was interesting,
and I don't remember him mentioning it
in the book, for data science
especially, and this is another thing
that didn't really exist then: we don't
really use data warehouses like we used
to, and now they're called data lakes.
This is what Snowflake and all these
other organizations do, and you
literally use blob storage with files.
Like, you can use CSV, you can use JSON.
A lot of people use columnar-oriented
structures like Parquet. If you're in
data science, you're probably using
Parquet or some similar optimized data
structure. And I do think that, while
maybe some of the things he's talking
about in here are a bit dated, it makes
sense, in the sense that, um, you know,
for instance, analytics data is
typically very sparse. There's a lot of
repetition in a particular column,
because maybe, you know, there are lots
of zeros and lots of 100s or whatever,
and I can compact that down and store it
efficiently. So when I query a terabyte
worth of data, I can do this in an
efficient way. Um, and so, yeah, it's
really interesting to think about why I
would want to encode something that's
not just JSON.
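One reason a columnar format beats plain JSON for this kind of data is that a repetitive column compresses extremely well. A minimal run-length-encoding sketch (Parquet's actual encodings are more sophisticated, but this is the core idea):

```python
from itertools import groupby

# A sparse analytics column: mostly zeros, occasional bursts of values.
column = [0] * 900 + [100] * 50 + [0] * 50

# Run-length encode: store (value, run_length) pairs instead of every cell.
encoded = [(value, len(list(run))) for value, run in groupby(column)]

# Decoding expands the runs back into the original cells.
decoded = [v for v, n in encoded for _ in range(n)]

print(encoded)       # [(0, 900), (100, 50), (0, 50)]
print(len(column))   # 1000 cells...
print(len(encoded))  # ...stored as 3 runs
assert decoded == column
```

A thousand cells collapse to three runs; JSON would spell out every one of them.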
>> Right. Right. Maybe I want to
encode... Um, I think it was, which one
was it? It was one I was not aware of.
It came out of Facebook, but I'm trying
to remember what it was called. Maybe it
was Thrift.
>> Which was the one?
>> Thrift.
>> Yeah, which was the one that had the
ability to say there was, like, a writer
schema and a reader schema. Um, and it
basically could map... I can't remember
which one it was now, but it was kind of
cool, and it was one that was made for
schema evolution. Um, and basically, if
you've changed the shape of the schema
from one to the other, this tool could
reconcile the mappings between the two
and then find the most compatible
version. Anyway, it was kind of some
cool stuff where I'm like, "Oh, that's a
really clever way of handling that." And
again, this is one of the reasons I love
this chapter: if you're doing stuff
that's API-heavy, REST-heavy, gRPC-heavy
type stuff, um, all of the demons that
you've run into, all of the nice design
decisions, like, how did we get here?
It's in this chapter. So, yeah.
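The writer's-schema/reader's-schema idea can be sketched like this. (For what it's worth, this resolution model is the one Avro uses; the sketch below is a hypothetical simplification that matches fields by name only, whereas real systems also handle types, aliases, and promotion rules.)

```python
def resolve(record: dict, writer_schema: dict, reader_schema: dict) -> dict:
    # Each schema maps field name -> default value for that field.
    # Decode with the writer's schema, map into the reader's schema:
    out = {}
    for field, default in reader_schema.items():
        if field in writer_schema:
            out[field] = record[field]  # field the writer also knew about
        else:
            out[field] = default        # field new to the reader: use default
    # Fields only the writer knows about are silently dropped.
    return out

writer = {"name": None, "age": None}         # schema the data was written with
reader = {"name": None, "email": "unknown"}  # schema the current code expects

old_data = {"name": "Ada", "age": 36}
print(resolve(old_data, writer, reader))  # {'name': 'Ada', 'email': 'unknown'}
```

Because the reconciliation happens at read time, writers and readers can upgrade independently, which is exactly the schema-evolution property being described.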
>> Right. Right. Well, and this would
have benefited me at my last job. We
were using a lot of gRPC and Protobuf,
and I was kind of like, yeah, this is
stupid, why don't we just do it using
HTTP and REST and JSON? But learning
more from this chapter, I'm like, okay,
I'm starting to see why some of those
design decisions were made. It was, you
know, a little faster, a little slimmer,
and we were handling lots and lots of
data. So maybe that was the best
decision. Um,
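To see the "a little faster, a little slimmer" point, compare a text encoding with a fixed binary layout using only the standard library. This hand-rolled `struct` format is just an illustration; Protobuf adds schemas, field tags, and varints on top of the same basic idea.

```python
import json
import struct

user_id, score = 123456, 98.5

# Text encoding: human-readable and self-describing, but larger.
as_json = json.dumps({"user_id": user_id, "score": score}).encode("utf-8")

# Binary encoding: a fixed little-endian layout (4-byte int + 8-byte float).
as_binary = struct.pack("<id", user_id, score)

print(len(as_json))    # 34 bytes
print(len(as_binary))  # 12 bytes

# Decoding the binary form requires knowing the layout (the "schema").
uid, sc = struct.unpack("<id", as_binary)
assert (uid, sc) == (user_id, score)
```

The binary form is a third of the size, at the cost of needing the schema out-of-band, which is the trade gRPC/Protobuf makes at scale.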
Well, this podcast, the time we record
is now limited by when I have to leave
for work. So we are [laughter] we're
wrapping up here. And you're seeing from
this episode, right, we could devote
four episodes to just part one here. Um,
really, this is a fantastic book. I've
been enjoying it immensely. I'm very
excited to finish it. Um, we like to do
our hot takes. I don't have a ton of hot
takes, um, aside from, you know, the
book's a little outdated. I guess my hot
take would be: you cannot read this book
and be like, this is going to expose me
to all these different ways to work with
my data, and all of them are equally
valid, and so in any project I choose
from now on, I need to have this big
checklist of, like, am I going to use
Thrift? Am I going to use Protobuf? Am I
going to use JSON? Right? No. Your
answer most of the time is going to be
JSON, HTTP, REST, right? Um,
>> but you may wind up in these edge
cases, and if you wind up in these edge
cases, having this knowledge of all
these other options can be very, very
valuable. You've got to know when to
break these out.
>> Yeah. And I will say that increasingly
the type of data-sciency work has
diverged from what people building web
apps do. I've been lucky enough to work
on that data side of things, and their
tools are starting to look less and less
like the web apps that we're dealing
with as well. And, um, yeah, I think a
couple hot takes. Uh, he spends a
section talking about graph databases,
and I kind of remember in 2017 everybody
was, like, super excited about graph
databases,
>> right?
>> They still don't feel like they've had
their moment in the sun. I don't know if
they will have their moment in the sun.
>> They're incredibly useful for Facebook.
>> Yeah, graph databases are cool, but I
haven't seen one used in some way where
I'm like, "Wow." So that's one thing.
The other one is, um, oh, maybe this is
just me, but I think a Maintainability
for Data-Intensive Applications would be
a great book. [laughter] I think you
could go off and just talk about
maintainability of all of these things,
and not even have the other subject
matter, and I think that would be an
amazing book for folks.
>> Yeah.
>> Well, Nathan, what are you going to do
differently in your career because
you've read part one?
you've read part one? >> So I love evolvable evolvable systems
>> So I love evolvable evolvable systems design. uh in this book touched on some
design. uh in this book touched on some patterns on schema evolution that I
patterns on schema evolution that I hadn't thought about. So like I do think
hadn't thought about. So like I do think a lot about how do we have a two-way
a lot about how do we have a two-way street sort of maintainable schema
street sort of maintainable schema migrations. Um I'm going to go back and
migrations. Um I'm going to go back and spend some time with some ideas that
spend some time with some ideas that were in chapter 4 and also see what uh
were in chapter 4 and also see what uh technologies have come out since 2017
technologies have come out since 2017 because I have a feeling that there's
because I have a feeling that there's probably some stuff I could learn about
probably some stuff I could learn about that's modern. Yeah.
that's modern. Yeah. >> As far as me I forgot to fill out this
>> As far as me I forgot to fill out this section of our notes and so I'm gonna do
section of our notes and so I'm gonna do it differently in my career. I'm just
it differently in my career. I'm just gonna keep reading. I'm gonna keep
gonna keep reading. I'm gonna keep reading this book.
reading this book. >> That is my commitment to everyone. I'm
>> That is my commitment to everyone. I'm going to finish designing data intensive
going to finish designing data intensive applications. And I feel like you should
applications. And I feel like you should get put on like a leaderboard or
get put on like a leaderboard or something. If you everyone talks about
something. If you everyone talks about design data intensive applications, I
design data intensive applications, I want a badge that says I read
want a badge that says I read >> I actually read it. Yeah, that's great.
>> I actually read it. Yeah, that's great. >> Um, we should make a t-shirt and sell
>> Um, we should make a t-shirt and sell it. We don't have like our store, but
it. We don't have like our store, but like like I read designing that
like like I read designing that intensive applications and all I got was
intensive applications and all I got was this lousy t-shirt. That's what we
this lousy t-shirt. That's what we should do.
should do. >> That would be great. Signed by. Yeah.
>> That would be great. Signed by. Yeah. Yeah. [laughter] Who would you recommend
Yeah. [laughter] Who would you recommend the book to, Nathan?
the book to, Nathan? >> So, um, this is for software engineers
>> So, um, this is for software engineers who are deeply curious about systems
who are deeply curious about systems architecture and want to grow in their
architecture and want to grow in their understanding and the trade-offs. Um, I
understanding and the trade-offs. Um, I think that, and again, I can only speak
think that, and again, I can only speak for part one, haven't read the rest of
for part one, haven't read the rest of it, but um, this is not a tutorial book.
it, but um, this is not a tutorial book. This is not going to sit here and like
This is not going to sit here and like tell you how to build all this stuff.
tell you how to build all this stuff. This is really about systems thinking
This is really about systems thinking and the trade-offs. Um, so if that kind
and the trade-offs. Um, so if that kind of thing sounds deeply rewarding, if you
of thing sounds deeply rewarding, if you want to get to that next level,
want to get to that next level, especially if you want to be staff or
especially if you want to be staff or some sort of engineering leadership,
some sort of engineering leadership, this is a really important book for that
this is a really important book for that kind of trajectory.
kind of trajectory. >> Yeah, I I think you you have to have
>> Yeah, I I think you you have to have your feet under you a bit. And this
your feet under you a bit. And this isn't the perfect analysis here, but I
isn't the perfect analysis here, but I would say first read the DevOps
would say first read the DevOps handbook. And if while reading the
handbook. And if while reading the DevOps handbook, you're a
DevOps handbook, you're a [clears throat] little like, okay, yeah,
[clears throat] little like, okay, yeah, I'm familiar with a lot of these
I'm familiar with a lot of these concepts. This all makes sense to me.
concepts. This all makes sense to me. you you'll learn a lot of new things
you you'll learn a lot of new things reading the DevOps handbook. But if you
reading the DevOps handbook. But if you kind of read that and are like, got it
kind of read that and are like, got it that this lines up with kind of my
that this lines up with kind of my experience and what I've done, then I
experience and what I've done, then I would say, okay, now redesigning data
would say, okay, now redesigning data inensive applications. It's not, again,
inensive applications. It's not, again, that's not a perfect comparison, but I
that's not a perfect comparison, but I just think I would not recommend this to
just think I would not recommend this to anyone who can't at least explain to me
anyone who can't at least explain to me in good detail how their application is
in good detail how their application is built, how it's deployed, how it's
built, how it's deployed, how it's monitored. Um, you know, how it's a
monitored. Um, you know, how it's a basic understanding like scalability.
basic understanding like scalability. Um,
Um, >> yep.
>> So, get that first. But if you have a good understanding of how all that works, hey, this is kind of the next level.
>> Yeah, I like that idea. The DevOps Handbook and Fundamentals of Software Architecture. I'd say if you read those two and you're like, "I'm hungry, I want more,"
>> then this is the obvious next step, like DDIA.
>> Those two I would recommend to any sort of eager, ambitious junior engineer: you know, some of it might be over your head, but it's great for building that breadth and understanding what's going on. But I'd say read those two first.
>> Yes. Then start tackling this.
>> Absolutely. 100%.
>> Great. Well, hey, we're so excited. This is going to be great. We're going to cover the rest of this book across the next three episodes. Thanks for tuning in, everyone. You can always contact us at contact@bookoverflow.io. You can find us on Twitter at @bookoverflowpod. I'm on Twitter at Carter Morgan. Nathan and his consulting business rojo are at rojo.com, and his newsletter is at rojo.com/newsletter.
And if you like... this is funny. I do a second podcast with my brother, a theme park podcast called Please Remain Heated. My brother is a professional YouTuber; he's got like 130,000 subscribers as a full-timer. Anyhow, a lot of our audience there is his super fans, so they're very interested in him. Not that they don't like me. I'm saying, if you're listening to this and you're a theme park person, you've got to show up in the comments, right? You've got to let people know that there's at least one Carter Morgan super fan listening to the podcast just because you like me. So, you know, make me look good for my brother, guys. And I'll have you know, on my other podcast, Please Remain Heated, I always close out saying, "If you're an aspiring software engineer, check out Book Overflow." So, you know,
>> We're gonna get the most minimal overlap.
>> This is an O'Reilly book, so I didn't even think about it, but maybe we can, uh, I don't know... Come on over and join us on the Discord, and maybe we'll have something related to this there; maybe we'll do a book giveaway with O'Reilly. We're still working out the kinks.
>> I can't promise anything, but if you join us on the Discord and ask how to get a free book, or if you post this on LinkedIn and tag us and tag the episode, we'll do our best to take care of you, and we'll figure out by next week exactly what we can offer.
>> Yeah, exactly. We're amateurs when it comes to this stuff. Okay.
>> I know, right? We're pretty good software engineers; as far as running a podcasting business goes, we are learning every episode. That was a ton of fun. Thanks, folks. We'll see you next week for, roughly, part two of Designing Data-Intensive Applications.