Core Theme
A junior engineer's experience highlights that while clean code is important, it is insufficient for production readiness. True system success relies on a holistic approach encompassing infrastructure, observability, and an operational culture that prioritizes resilience over mere code elegance.
I watched a senior engineer with 8 years
of experience build really good code,
good patterns, solid test coverage,
looked great on paper. Then I watched it
crash and burn in production at one-tenth
the scale of the legacy code it was
supposed to replace. I was the mid-level
on the team who saw the problems early,
tried to raise concerns three different
times, and nobody listened. This is how I
learned that beautiful code is
completely worthless. It's 2018. I'm
working with a mid-sized e-commerce
company. They do about 50 million in
annual recurring revenue. Seven backend
engineers, three maintaining the legacy
PHP system, four of us building the
future. I was on the future team. Well,
management brings in Steven, 8 years of
experience, who recently gave a talk called
"Why Code Quality Is Your Competitive
Advantage." I'm only 2 years in. I'm
excited to learn from someone like this.
First team meeting, Steven's on the
whiteboard for hours showing
architecture, repository patterns, event
sourcing, everything you'd need,
all the clean code book stuff. And I'm
thinking, this is really impressive.
This is how you're supposed to build
systems. But there's something in how he
talks about it. This confidence that
borders on dismissiveness. He waves his
hand at the legacy system, saying, "This
is exactly why we need this migration to
happen now. Clean code is what matters."
I'm cautiously optimistic. The patterns
make sense, but something feels off
about ignoring everything else. What I
didn't realize was that Steven was
optimizing for the wrong backend
pillar, and I was about to watch it play
out in slow motion. Steven's first PR
comes in. The code is clean. It's nicely
structured. It has proper dependency
injection. It follows everything you'd
want it to follow. It got a few
comments, mostly positive. Two
raised caching, but Steven
replied that we didn't
want too much premature optimization.
Then Steven assigns a few of us, me and a couple
of other people, to study the legacy PHP
system, to understand why
we're migrating. I open it. The legacy
system is really messy. One of the files
has 4,847
lines. It has no classes, SQL
concatenation, magic numbers, pure
spaghetti. Steven also looked at it, and
in standup, he mentioned that it was
worse than he thought. But I was curious,
because it was currently running properly.
Of course, the code quality could be
better, but it worked. So I dug
deeper and I started to find things
buried in the configuration files and
scattered through the deployment
scripts: the whole hidden infrastructure
of this legacy application. It had
three-tier caching with a 91% cache hit
rate. It also had 47 database indexes
on the orders table alone, partitioning
by month, three read replicas, and
then I found custom metrics for clients,
240 to be exact, tracked across every
step of checkout, error rates by payment
method, latency percentiles, and
real-time dashboards. It had everything
you needed in case a disaster happened.
So, I went back and looked at the Black
Friday 2017 data, because it was the year
before: 85,000 concurrent
users, 77,000 orders, a 246-millisecond
average response time, and 99.4% uptime. I'm
staring at these numbers thinking like,
"Wow, this is really impressive." So, I
grab coffee with Mike. He's the legacy
team lead. 15 years at the company. I
show him the numbers. I remember telling
him, "Mike, the code's a mess, but these
numbers are really good. How did you do
it?" I still remember it today because it
felt funny: him leaning back,
taking a sip of his
coffee, and saying, "Code quality is
just one dimension, Eric.
Infrastructure, database design,
observability, those matter, too. Maybe
more when you're under load than you
think." That still sticks with me today.
Code's just one dimension. We give so
much credit and so much effort toward
code, but it's only one dimension. It's
only one pillar of building software.
So, I bring it up in our planning. I
said, "Hey, Steven, I
looked at the Black Friday metrics and
they handled 85,000 concurrent users.
Should we be looking at more caching and
observability to match what they
were doing in production today?" And I
remember Steven barely even
looking at me. "That's just compensating
for bad code, Eric. Clean architecture
doesn't need all that infrastructure.
Trust me." And I'm thinking, well, the
data doesn't lie, right? But he has 8
years. I have 2 years. Maybe I'm missing
something. This was the first time I
raised a concern and I was shut down
almost instantly. Over the next 2
months, we start building out Steven's
vision. The code is very clean. I'm
going to give Steven all the credit for
that. It has clean structures,
the patterns are great,
everything's testable. It's very
satisfying. But that question keeps
nagging me: will it
scale? While Steven's perfecting the
code, infrastructure is delegated to a
DevOps engineer who's never scaled
before. That DevOps engineer
knew what they were doing, but not
when it came to scaling for
users. And on top of it, we barely used
caching in staging for this new system.
Our cache hit rate was only 23%; the
legacy app's was 91%. And for observability,
we had basic logs, but no custom metrics, no query
visibility. So, I do something I've
never done before. I start documenting
everything. I create a document called
"Production Readiness Concerns," or
something like that, and I list them
out: our cache hit rate is 23%, the
legacy system's was 91%. We have no
query monitoring; the production system
has a ton of it. We have no
sustained load testing. We're missing
composite indexes. I spent a little bit
of time adding links to all the legacy
metrics. And I added some suggestions
and I sent it over to Steven. His
response was what you'd expect, again:
"Thanks for thinking about
this, Eric, but we don't want to
overengineer the first MVP. We'll
address issues if they show up in beta."
But beta was October. Black Friday is
November. So, let's go ahead and fast
forward to the week before beta. We
load test 1,000 concurrent users. It works.
No issues at all. And you know why?
Because we're testing with 1,000 users,
not the 85,000 users that was on Black
Friday last year. And I bring this up
again that we need to test with more
people. Steven looks at me and the rest
of the team. "Eric, I appreciate you
being thorough, but we can't let
perfection be the enemy of good enough."
Which is something I believe in. I
always talk about good enough, but good
enough means good enough for what you're
trying to do, not just shipping code.
The other engineers nod. It sounds
reasonable when someone says it, whether
they're right in the context or not.
Everyone's confident, so I
stay quiet. Two times now. Two times
I've raised concerns. Two times I was
told not to worry. I didn't know it yet,
but I was running out of chances. Fast
forward to the beta launch day. We
flipped the feature flag:
new checkout enabled for 5% of the
users. I'm at my desk with
everybody else watching. We only have
three metrics. That's all we have. CPU
15%, looks good. Memory 38%, okay. Error
rate 0%, exactly what we want. But
I still can't shake the thought of the
240 custom metrics the other system
had that we're not looking at. Were we
looking at connection pool
utilization? No. The cache hit rate? No.
Query performance? No. After
launch, the first order is complete. It
was fast, 142 milliseconds. But 30
minutes later, we got a customer support
message in the #incidents Slack channel:
"Getting reports of slow checkout. Is
something wrong?" I remember Steven and
me looking at the dashboard. We look at
the numbers. He types in Slack,
"Probably just perception. Numbers
look fine." But then it happens again
and again. And then we finally get the
dreaded database error:
connection pool exhausted. I refresh the
dashboard. Error rate is climbing. Support
tickets start coming in. And here's the
thing: we couldn't see any of the issues,
because the only metrics behind that
dashboard were CPU and
memory. We couldn't see the queries or
the database connections. Our beautiful,
beautiful repository pattern was lazy
loading everything. We had to roll back
our modern application to the legacy
app. The rollback takes 47 minutes. At
the time, it was the longest 47 minutes
of my career. And here's the thing, it
wasn't even Black Friday. It wasn't even
the big day of the 85,000 concurrent
users that I had in the back of my mind.
And this was just beta. We probably had
a little over a thousand users. We load
tested at about the maximum we
thought would happen that day, and we
were still that close to crashing. And that's
not even close to the 85,000 we had on
Black Friday the year before. So, I take
a breath. I think there are four pillars
to a production system. We nailed one,
code quality, and even that not all
that well. We ignored the other three.
Let's go over each pillar. The first pillar
is code quality. Yes, our code was
clean, beautiful, even. And that does
matter. Clean code is easier to
maintain, easier to onboard to,
easier to reason about. Those are real
benefits of having clean code. But clean
code doesn't automatically mean scalable
code. Doesn't automatically mean you're
building the right thing for the
product. Let me show you what I mean.
Our repository pattern was a textbook
example of good design. It had single
responsibility, dependency injection. It
was testable. Everything the clean code
book tells you to do. Here's what it
looked like conceptually. We had a
product repository that handles
products. We had a category repository
that handled categories. We had a user
repository which you probably can guess
what it did. It handled users. They were
separated. They were clean. They were
easy to test. But this is what happened.
We called the get product details for a
single product page. First, we fetch the
product. One query makes sense. Then the
product object has a category property.
We access it. We lazy load the second
query. That category has a parent
category. We access it. Third query. The
product has reviews. Now, we can say it
has like 10 reviews for this example,
but if it had 10 reviews, that's 10 more
queries. We were falling into the N+1
trap. In the legacy system, the ugly
4,847-line file had two queries: one
with a massive join that got the
product, its category, and the parent
category, and one that batch-fetched
all the reviews. Is it ugly?
Absolutely. Can you maintain it easily?
Probably not. But can you see every
query? Does it have the metrics to get
you by? Yes. Our clean abstractions made
the expensive operations invisible. And
here's the thing that really bothers me.
We never would have caught this in code
review. Look at it from a code
review perspective. Is the code modular?
Yes, check. Is it testable? Yes. Does it
follow SOLID principles? Yes. Approved,
ship it. No one was looking at the
queries, because the queries were hidden
behind the abstraction. Nobody asked how
many queries this generates, because you can't see
it. The abstraction hid it from us.
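The trap is easy to reproduce once you make the queries visible. Here's a minimal sketch of it; the table names, schema, and query counter are all illustrative, not the real system's. Lazy loading turns one product page into fourteen queries, while the legacy-style join-plus-batch approach does the same work in two:

```python
# Hypothetical sketch of the N+1 trap a repository pattern can hide.
# Schema and names are illustrative, not the real system's.
import sqlite3

class CountingConnection:
    """Wraps a sqlite3 connection and counts every query executed."""
    def __init__(self, conn):
        self.conn = conn
        self.queries = 0
    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER, body TEXT);
    INSERT INTO categories VALUES (1, 'Electronics', NULL), (2, 'Phones', 1);
    INSERT INTO products VALUES (1, 'Phone X', 2);
""")
conn.executemany("INSERT INTO reviews (product_id, body) VALUES (1, ?)",
                 [(f"review {i}",) for i in range(10)])
db = CountingConnection(conn)

# Lazy-loading style: every attribute access becomes its own query.
product = db.execute("SELECT * FROM products WHERE id = 1").fetchone()
category = db.execute("SELECT * FROM categories WHERE id = ?", (product[2],)).fetchone()
parent = db.execute("SELECT * FROM categories WHERE id = ?", (category[2],)).fetchone()
review_ids = [r[0] for r in db.execute("SELECT id FROM reviews WHERE product_id = 1")]
for rid in review_ids:  # one query per review: the N+1 trap
    db.execute("SELECT * FROM reviews WHERE id = ?", (rid,))
lazy_queries = db.queries  # 1 product + 1 category + 1 parent + 1 id list + 10 reviews = 14

# Legacy style: one big join plus one batch fetch.
db.queries = 0
db.execute("""
    SELECT p.*, c.name, pc.name
    FROM products p
    JOIN categories c ON c.id = p.category_id
    LEFT JOIN categories pc ON pc.id = c.parent_id
    WHERE p.id = 1
""").fetchone()
db.execute("SELECT * FROM reviews WHERE product_id = 1").fetchall()
batched_queries = db.queries  # 2

print(lazy_queries, batched_queries)  # 14 2
```

A query counter like this in an integration test would have caught the regression that code review, by design, couldn't see.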
Steven was working on the queries. We
just called them through our repository.
And if you ask, "Eric, why didn't you
catch it when you were testing?" Well,
that's because we were using mocks.
We genuinely never tested the one thing
that matters, and that's behavior under
real production load. And it's not just
queries, it's the memory, too. Now,
pillar two is the infrastructure and
scaling strategy. This is where we
really, really failed as a team. And I
own some of this, too. I saw the
infrastructure gap, and I didn't push
hard enough. The legacy system had
three-tier caching: application, Redis,
and CDN. It had database partitioning by
month, three read replicas, custom
connection pooling. We built our code first and
delegated infrastructure to a DevOps
engineer who'd never scaled an
e-commerce store before. That should
have been a red flag in itself, but we
just kept moving forward. We had minimal
caching, 23%, one database instance.
Steven said something in planning
that I can't get out of my mind:
that clean architecture won't
need all that infrastructure. I
mentioned it earlier in the story, and
that's simply not how it works.
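The gap between a 23% and a 91% cache hit rate is easy to put numbers on. Here's a back-of-envelope sketch; the latencies (2 ms for a cache hit, 80 ms for a database round trip) are assumed figures for illustration, not measurements from the story:

```python
# Back-of-envelope effective read latency with caching.
# CACHE_MS and DB_MS are assumptions for illustration;
# the 23% and 91% hit rates are from the story.
CACHE_MS = 2.0   # assumed latency of a cache hit
DB_MS = 80.0     # assumed latency of a database round trip

def effective_latency_ms(hit_rate):
    # A hit costs the cache lookup; a miss costs the lookup plus the DB trip.
    return hit_rate * CACHE_MS + (1 - hit_rate) * (CACHE_MS + DB_MS)

new_system = effective_latency_ms(0.23)   # ~63.6 ms average per read
legacy = effective_latency_ms(0.91)       # ~9.2 ms average per read
print(f"23% hit rate: {new_system:.1f} ms, 91% hit rate: {legacy:.1f} ms")
```

Under these assumptions the average read is roughly seven times slower at a 23% hit rate, and every one of those misses lands on the single database instance.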
Infrastructure doesn't compensate for
bad code; infrastructure is the
foundation for the code. Under load,
infrastructure beats whatever you're
trying to do in your architecture almost
every single time. Because when you have
85,000 concurrent users, even the most
perfect optimized query takes time. Even
if you get the query down to 5
milliseconds, multiply that by 85,000,
you need more infrastructure. There's a
reason all these giant tech companies
have crazy infrastructures. You need
caching to handle repeated requests. You
need connection pooling to manage
database connections. You need read
replicas to distribute load. This isn't
optional. This is literally the
baseline. So here's the question you
need to ask yourself every time you
build a system. How and when will you
need to scale to 10 times the
traffic? And if the answer is soon, can
your database handle it? Can your cache
handle it? Can your network handle it?
If you can't answer those questions,
your system is not ready to be scaled.
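That "5 milliseconds times 85,000" point can be made concrete with rough capacity math. The queries-per-user rate and the pool size below are assumptions for illustration, not numbers from the story:

```python
# Rough capacity math for the "5 ms query at 85,000 users" point.
# The per-user query rate and pool size are assumptions for illustration.
CONCURRENT_USERS = 85_000
QUERIES_PER_USER_PER_SEC = 0.5   # assumed: one query every 2 seconds per user
QUERY_MS = 5.0                   # the "perfectly optimized" query

total_qps = CONCURRENT_USERS * QUERIES_PER_USER_PER_SEC   # 42,500 qps
qps_per_connection = 1000.0 / QUERY_MS                    # 200 qps per connection
connections_needed = total_qps / qps_per_connection       # 212.5 connections

print(f"{total_qps:.0f} qps needs about {connections_needed:.0f} busy DB connections")
# A default pool of, say, 20 connections saturates at 4,000 qps,
# roughly a tenth of this load. That's "connection pool exhausted".
```

Even with a perfect 5 ms query, the arithmetic demands pooling, replicas, and caching long before Black Friday traffic arrives.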
If your system's not ready to be
scaled because you don't have the
users yet, that's fine. But if you have the
users, you need to figure it out. Now,
pillar three is observability. This one
hurts the most because we could have
caught this. The legacy system had 240
metrics, like I mentioned earlier. We
built three. When things started going
wrong, we were completely blind. We
could see error rates climbing, but we
couldn't see why. We couldn't see which
queries were slow, which endpoints were
struggling, where the bottlenecks were.
Simply put, you can't fix what you can't
see. Observability isn't just logging.
Logs tell you what happened. That's
valuable, but metrics tell you why.
Here's what we should have implemented
from day one: query performance metrics,
resource utilization metrics, and
business metrics on how everything
was actually working.
The legacy system had all of this.
That's how Mike and his team knew they
could handle Black Friday. They watched
the metrics climb and knew exactly
where their ceilings were and how far
the metrics could go. We didn't
know our ceiling until we hit our head
on it. And here's the thing, it's not
about building a monitoring system from
scratch. You can use tools that already
exist: Datadog, New
Relic, it doesn't matter. Just pick one.
Instrument your code. Expose the metrics
that matter. Observability is the
difference between responding to a
fire and preventing one. And as a
backend engineer, you always want to
prevent it. Now, pillar four is the
operational culture. This is the part
that nobody teaches you, and it's by far
one of the most important ones. Culture.
The culture you're in kills more systems
than bad code ever will. And here's what
I mean. I raised concerns three times.
One, in planning when I showed the Black
Friday metrics and asked about caching.
Two, in the code review, I documented
production readiness concerns. Three,
before beta, I questioned the load test
scope. Each time I was told the same
thing, don't overengineer. We'll fix
issues when they come up. They were
real "trust me, bro" moments. Those
kinds of messages shut down discussion,
because Steven had 8 years of experience
and I only had two. His confidence was
treated as expertise. But confidence and
expertise aren't the same thing. Steven
was confident in clean code because he
had seen it work. But he'd never
scaled an application to 85,000
concurrent users. And neither have I.
Neither had anyone on our modern team.
The only person was Mike. And we weren't
even talking to Mike. We had a culture
that rewarded elegance over resilience
that treated production readiness as a
nice to have instead of a requirement.
And that culture always comes from the
top. When leadership rewards beautiful
demos over boring reliability, you get
what we got. Real engineering isn't just
about writing code. It's about designing
systems that can survive contact with
reality. If you've ever been on a team
where clean code got prioritized over
operational reality, let me know in the
comments. And if you want me to break
down each one of those
pillars a little more, let me know.
I'll make videos on them. Thanks for watching.