0:02 90% of Claude Code was written by Claude
0:04 Code. Codex is releasing features
0:07 entirely written by Codex. And yet most
0:10 developers using AI empirically get
0:12 slower, at least at first. The gap
0:13 between these two facts is where the
0:15 future of software lives. Imagine
0:18 hearing this at work. Code must not be
0:20 written by humans. Code must not be even
0:23 reviewed by humans. Those are the first
0:24 two principles of a real production
software team called StrongDM and their
0:30 software factory. They're just three
0:32 engineers. No one writes code. No one
0:35 reviews code. The system is a set of AI
0:37 agents orchestrated by markdown
0:39 specification files. The system is
0:41 designed to take a specification, build
0:43 the software, test the software against
0:45 real behavior scenarios, and
0:48 independently ship it. All the humans do
0:50 is write the specs and evaluate the
0:53 outcomes. The machines do absolutely
0:55 everything in between. As I was saying,
0:58 meanwhile, 90%. And yes, it's true. Over
1:00 at Anthropic, 90% of Claude Code's
1:02 codebase was written by Claude Code
1:04 itself. Boris Cherny, who leads the
1:06 Claude Code project at Anthropic, hasn't
1:08 personally written code in months. And
1:10 Anthropic's leadership is now estimating
1:12 that functionally 100%, the entirety of
1:14 code produced at the company, is AI
1:17 generated. And yet at the same time, in
1:19 the same industry, with us here on the
1:22 same planet, a rigorous 2025 randomized
1:24 controlled trial by METR found that
1:27 experienced open-source developers using
1:32 AI tools took 19% longer to complete
1:34 tasks than developers working without
1:36 them. There is a mystery here. They're
1:38 not going faster, they're going slower.
1:40 And here's the part that should really
1:42 unsettle you. Those developers are bad
1:45 at estimation. They believed AI had made
1:48 them 24% faster. They were wrong not
1:50 just about the direction but about the
1:53 magnitude of the change. Just a few teams
2:02 around tech are running truly lights out
2:04 software factories. The rest of the
2:06 industry tends to get measurably slower
2:08 while convincing themselves and everyone
2:09 around them with press releases that
2:11 they're speeding up. The distance
2:14 between these two realities is the most
2:16 important gap in tech right now and
2:19 almost nobody is talking honestly about
2:21 it and what it takes to cross it. That
2:22 is what this video is about. Dan
2:25 Shapiro, the CEO over at Glowforge and
2:26 the veteran of multiple companies built
2:28 on the boundary between software and
2:30 physical products, just published a
2:32 framework earlier this year in 2026 that
2:35 maps where the industry stands. He calls
2:37 it the five levels of vibe coding. And
2:38 the name is deliberately informal
2:40 because the underlying reality is what
2:43 matters. Level zero is what he calls
2:45 spicy autocomplete. You type the code,
2:48 the AI suggests the next line. You
2:50 accept or reject. This is GitHub Copilot
2:52 in its original format. Just a faster
2:54 tab key. The human is really writing the
2:56 software here. And the AI is just
2:57 reducing the keystrokes and the effort
3:00 your fingers have. Level one is coding
3:02 intern. You hand the AI a discrete, well
3:05 scoped task, like write this function,
3:15 build this component, or refactor this
3:17 module. You then review, as the human,
3:19 everything that comes back. The AI
3:21 handles the tasks. The human handles the
3:22 architecture, the judgment and the
3:24 integration. Do you see the pattern
3:25 here? Do you see how the human is
3:27 stepping back more and more through
3:29 these levels? Let's keep going. Level
3:31 two is the junior developer. The AI
3:33 handles multifile changes. It can
3:35 navigate a codebase. It can understand
3:36 dependencies. It can build features that
3:39 span modules. You're reviewing more
3:41 complicated output, but you as a human
3:42 are still reading all of the code.
3:45 Shapiro estimates that 90% of developers
3:48 who say they are AI native are operating
3:49 at this level. And I think from what
3:51 I've seen, he's right. Software
3:53 developers who operate here think
3:55 they're farther along than they are.
3:57 Let's move on. Level three, the
3:59 developer is now the manager. This is
4:01 where the relationship starts to flip.
4:02 This is where it gets interesting.
4:04 You're now not writing code and having
4:06 the AI help. You're simply directing the
4:08 AI and you're reviewing what it
4:11 produces. Your day is reading,
4:12 approving, and rejecting, but at the
4:17 feature level, at the PR level. The
4:18 model is doing the implementation. The
4:21 model is submitting PRs for your review.
4:23 You have to have the judgment. Almost
4:26 everybody tops out here right now. Most
4:27 developers, Shapiro says, hit that
4:30 ceiling at level three because they are
4:33 struggling with the psychological
4:35 difficulty of letting go of the code.
4:37 But there are more levels. And this is
4:39 where it gets spicy and exciting. Level
4:41 four is the developer as the product
4:44 manager. You write a specification, you
4:46 leave, you come back hours later and
4:48 check whether the tests pass. You're not
4:50 really reading the code anymore. You're
4:52 just evaluating the outcomes. The code
4:54 is a black box. You care whether it
4:56 works, but because you have written your
4:59 eval so completely, you don't have to
5:01 worry too much about how it's written if
5:03 it passes. This requires a level of
5:06 trust both in the system and in your
5:08 ability to write specs. And that quality
5:10 of spec writing almost nobody has
5:13 developed well yet. Level five, the dark
5:16 factory. This is effectively a black box
5:18 that turns specs into software. It is
5:20 where the industry is going. No human
5:23 writes the code. No human even reviews
5:26 the code. The factory runs autonomously
5:29 with the lights off. Specification goes
5:32 in, working software comes out. And you
5:34 know, Shapiro is correct. Almost nobody
5:36 on the planet operates at this level.
5:38 The rest of the industry is mostly
5:40 between level one and level three, and
5:42 most of them are treating AI kind of
5:44 like a junior developer. I like this
5:46 framework because it gives us really
5:48 honest language for a conversation
5:50 that's been drowning in hype. When a
5:52 vendor tells you their tool writes code
5:55 for you, they often mean level one. When
5:57 a startup says they're doing agentic
5:59 software development, they often mean
6:01 level two or three. But when StrongDM
6:03 says their code must not be written by
6:06 humans, they really do mean level five,
6:08 the dark factory, and they actually
6:11 operate there. The gap between marketing
6:13 language and operating reality is
6:16 enormous. And collapsing that gap into
6:18 what is actually going on on the ground
6:21 requires changes that go way beyond
6:24 picking a better AI tool. So many people
6:26 look at this problem and think this is a
6:28 tool problem. It's not a tool problem.
6:31 It's a people problem. So what does
6:34 level five software development actually
6:37 look like? I think StrongDM's software
6:38 factory is the most thoroughly
6:40 documented example of level five in
6:42 production. Simon Willison, one of the
6:44 most careful and credible observers in
6:46 the developer tooling space, calls
6:49 StrongDM's software factory, quote, "The
6:51 most ambitious form of AI assisted
6:53 software development that I've seen
6:55 yet." The details are really worth
6:57 digging into here because they reveal
6:59 what it looks like to run a dark factory
7:02 for software on today's agents. And as
7:05 we have this discussion, I want you to
7:07 keep in mind that for most of us
7:09 listening, we are getting to time
7:12 travel. We are seeing how a bold vision
7:14 for the future can be translated into
7:16 reality with today's agents and today's
7:19 agent harnesses. It is only going to get
7:22 easier as we go into 2026 which is one
7:25 of the reasons I think this is going to
7:27 be a massive center of gravity for
7:29 future agentic software development
7:31 practices. We are all going to level
7:34 five. So what does StrongDM do? The
7:36 team is three people. Justin McCarthy,
7:39 CTO, Jay Taylor, and Nan Chowan. They've
7:41 been running the factory since July of
7:44 last year, actually. And the inflection
7:46 point they identify is Claude 3.5
7:49 Sonnet, which shipped actually in the
7:52 fall of 2024. That's when long horizon
7:54 agentic coding started compounding
7:56 correctness more than compounding
7:58 errors. Give them credit for thinking
8:00 ahead. Almost no one was thinking in
8:03 terms of dark factories that far back.
8:06 But they found that 3.5 Sonnet could
8:09 sustain coherent work across sessions
8:11 long enough that the output was reliable
8:14 and it wasn't just a flash in the pan.
8:16 It wasn't just demo worthy and so they
8:18 built around it. The factory runs on an
8:19 open-source coding agent called
8:22 Attractor. The repo is just three
8:24 markdown specification files and that's
8:27 it. That's the agent. The specifications
8:29 describe what the software should do.
8:31 The agent reads them. It writes the code
8:33 and it tests it. And here's where it
8:35 gets really interesting and where most
8:37 people's mental model really starts to
8:40 break down. StrongDM doesn't actually
8:42 use traditional software tests. They use
8:44 what they call scenarios. And the
8:46 distinction is important. Tests
8:48 typically live inside the codebase. The
8:50 AI agent can read them, which means the
8:53 AI agent can intentionally or not
8:55 optimize for passing the tests rather
8:58 than building correct software. It's the
9:00 same problem as teaching to the test in
9:02 education. You can get perfect scores
9:04 and shallow understanding. Scenarios are
9:06 different. Scenarios live outside the
9:08 codebase. They're behavioral
9:10 specifications that describe what the
9:12 software should do from an external
9:15 perspective, stored separately so the
9:16 agent cannot see them during
9:19 development. They function as a holdout
9:21 set. The same concept that machine
9:23 learning practitioners use to prevent
9:25 overfitting. The agent builds the
9:27 software and the scenarios evaluate
9:30 whether the software actually works. The
9:32 agent never sees the evaluation
9:34 criteria. It can't game the system. This
9:36 is really a new idea in software
9:38 development and I don't see it
9:40 implemented very frequently yet. But it
9:42 solves a problem that nobody was
9:44 thinking about when all the code was
9:46 written by humans. When humans write
9:48 code, we don't tend to worry about the
9:50 developer gaming their own test suite
9:52 unless incentives are really, really
9:54 skewed at that organization and then you
9:57 have bigger problems. When AI writes the
10:00 code, optimizing for test passage is the
10:02 default behavior unless you deliberately
10:04 architect around it. And it's one of the
10:07 most important differences to really
10:09 understand as you start to think about
10:11 AI as a code builder. StrongDM
10:14 architected around that with external
10:16 scenarios. The other major piece of the
10:18 architecture is what StrongDM calls
10:21 their digital twin universe. Behavioral
10:24 clones of every external service the
10:26 software interacts with: a simulated
10:29 Okta, a simulated Jira, a simulated
10:31 Slack, Google Docs, Google Drive, Google
10:34 Sheets. The AI agents develop against
10:36 these digital twins, which means they
10:38 can run full integration testing
10:41 scenarios without ever touching real
10:44 production systems, real APIs, or real
10:46 data. It's a complete simulated
10:48 environment purpose-built for autonomous
10:50 software development. And the output is
10:53 real. CXDB, their AI context store, has
10:55 16,000 lines of Rust, nine and a half
10:58 thousand lines of Go, and 700 lines of
11:00 TypeScript. It's shipped, it's in
11:01 production, it works, it's real
11:03 software, and it's built by agents end
11:04 to end. And then the metric that tells
11:07 you how seriously they take it. They say
11:10 if you haven't spent $1,000 per human
11:12 engineer per day, your software factory has room
11:15 for improvement. I think they're right.
11:17 That's not a joke. $1,000 per engineer
11:20 per day enables AI agents to run at a
11:23 volume that makes the cost of compute
11:25 meaningful if you are giving them a
11:27 mission to build software that has real
11:30 scale and real utility in production use
11:32 cases and it's often still cheaper than
11:34 the humans they're replacing. Let's hop
11:36 over and look at what the hyperscalers
11:39 are doing. The self-referential loop has
11:41 taken hold at both Anthropic and OpenAI,
11:43 and it's stranger than the hype
11:46 might make it sound. Codex 5.3 is the
11:47 first frontier AI model that was
11:50 instrumental in creating itself. And
11:51 that's not a metaphor. Earlier builds of
11:53 Codex would analyze training logs,
11:55 would flag failing tests, and might
11:58 suggest fixes to training scripts. But
12:01 this model shipped as a direct product
12:04 of its own predecessors coding labor.
12:07 OpenAI reported a 25% speed improvement
12:11 and 93% fewer wasted tokens in the
12:14 effort to build Codex 5.3. And those
12:16 improvements came in part from the model
12:19 identifying its own inefficiencies
12:21 during the build process. Isn't that
12:22 wild? Claude Code is doing something
12:25 similar. 90% of the code in Claude Code,
12:27 including the tool itself, was built by
12:29 Claude Code, and that number is rapidly
12:31 converging toward 100%.
12:34 Boris Cherny isn't joking when he talks
12:35 about not writing code in the last few
12:37 months. He's simply saying his role has
12:40 shifted to specification, to direction,
12:43 to judgment. Anthropic estimates that all
12:45 of the company is moving to entirely AI
12:48 generated code about now. Everyone at
12:51 Anthropic is architecting and the
12:52 machines are implementing. And the
12:55 downstream numbers tell the same story.
12:57 When I made a video on Cowork and
12:59 talked about how it was written in 10
13:02 days by four engineers, what I want you
13:04 to remember is it wasn't just four
13:06 engineers hyper-typing so that they could
13:08 get that out super fast and write every
13:11 line by hand. No, no, no. They were
13:14 directing machines to build the code for
13:16 Cowork. And that's why it was so fast.
13:19 4% of public commits on GitHub are now
13:21 directly authored by Claude Code, a
13:23 number that Anthropic thinks will exceed
13:25 20% by the end of this year. I think
13:27 they're probably right. Claude Code by
13:30 itself has hit a billion dollar run rate
13:33 just 6 months since launch. This is all
13:36 real today in February of 2026. The
13:38 tools are building themselves. They're
13:40 improving themselves. They're
13:42 enabling us to go faster at improving
13:44 themselves and that means the next
13:46 generation is going to be faster and
13:47 better than it would have been otherwise
13:49 and we're going to keep compounding. The
13:53 feedback loop on AI has closed and the
13:55 question is not whether we're going to
13:57 start using AI to improve AI. The
13:59 question is how fast that loop is going
14:02 to accelerate and what it means for the
14:04 40 or 50 million of us around the world
14:05 who currently build software for a
14:08 living. This is true for vendors as much
14:10 as it's true for software developers.
14:11 And I don't think we talk about that
14:13 enough because the gap between what's
14:15 possible at the frontier in February of
14:18 2026 and what tends to happen in
14:20 practice and what vendors want to sell
14:23 has never been wider. That METR study, a
14:24 randomized controlled trial, by the way,
14:27 not a survey, found that open source
14:29 developers using AI coding tools
14:32 completed their tasks 19% slower. We
14:33 talked about that, right? The
14:34 researchers controlled for task
14:36 difficulty. They controlled for
14:38 developer experience. They controlled
14:40 even for tool familiarity and none of it
14:42 mattered. AI made even experienced
14:45 developers slower. Why, in a world where
14:48 Cowork can ship that fast? Why? Because
14:50 the workflow disruption outweighed the
14:53 generation speed. Developers spent time
14:56 evaluating AI suggestions, correcting
14:58 almost right code, context switching
15:00 between their own mental model and the
15:02 model's output, and debugging really
15:04 subtle errors introduced by generated
15:06 code that looked correct but weren't.
15:09 46% of developers in broader surveys say
15:11 they don't fully trust AI generated
15:13 code. These guys aren't Luddites, right?
15:15 This is experienced engineers running
15:18 into a consistent problem. The AI is
15:19 fast, but it lacks the
15:22 reliability to be trusted without what they
15:25 view as vital human review. And this
15:28 irony is the J curve that adoption
15:30 researchers keep identifying. When you
15:33 bolt an AI coding assistant onto an
15:36 existing workflow, productivity dips
15:38 before it gets better. It goes down like
15:40 the bottom of a J. Sometimes for a
15:42 while, sometimes for months. And the dip
15:44 happens because the tool changes the
15:46 workflow, but the workflow has not been
15:49 redesigned around the tool explicitly.
15:51 And so you're kind of running a new
15:54 engine on an old transmission. The gears
15:55 are going to grind. Most organizations
15:57 are sitting in the bottom of that J
15:59 curve right now. And many of them are
16:02 interpreting the dip as evidence that AI
16:04 tools don't work, that the vendors did
16:06 not tell them the truth. What is really
16:08 evidence that their workflows haven't
16:11 adapted gets read as evidence that AI is
16:13 hype and not real. I think GitHub
16:15 Copilot might be the clearest
16:17 illustration of this. It has 20 million
16:20 users, 42% market share among AI coding
16:22 tools, apparently. Uh, and lab studies
16:25 show 55% faster code completion on
16:28 isolated tasks. I'm sure that makes the
16:30 people driving GitHub Copilot happy in
16:32 their slide decks. But in production,
16:35 the story is much more complicated.
16:36 There are larger pull requests. There
16:38 are higher review costs. There are more
16:40 security vulnerabilities introduced by
16:43 generated code. And developers are
16:44 wrestling with how to do it well. One
16:46 senior engineer put it really sharply.
16:49 Copilot makes writing code cheaper but
16:51 owning it more expensive. And that is
16:52 actually a very common sentiment I've
16:54 heard across a lot of engineers in the
16:56 industry, not just for Copilot but for
16:58 AI generated code in general. The
17:00 organizations that are seeing
17:02 significant, call it 25 to 30% or more,
17:05 productivity gains with AI are not the
17:08 ones that just installed Copilot, had a
17:10 one-day seminar, and called it done.
17:12 They're the ones that thought carefully,
17:14 went back to the whiteboard, and
17:16 redesigned their entire development
17:19 workflow around AI capabilities:
17:20 changing how they write their specs,
17:22 changing how they review their code,
17:24 changing what they expect from junior
17:26 versus senior engineers, changing their
17:28 CI/CD pipelines to catch the new
17:30 category of errors that AI generated
17:33 code introduces. End-to-end process
17:35 transformation. It's not about tool
17:37 adoption. And end-to-end transformation
17:40 is hard. Sometimes it's politically
17:42 contentious. It's expensive. It's slow
17:44 and most companies don't have the
17:46 stomach for it. Which is why most
17:48 companies are stuck at the bottom of the
17:50 J curve. Which is why the gap between
17:53 frontier teams and everyone else is not
17:55 just widening, it's accelerating
17:57 rapidly. Because those teams on the edge
17:59 that are running dark factories, they
18:01 are positioned to gain the most as
18:05 tools like Opus 4.6 and Codex 5.3
18:08 enable widespread agentic powers for
18:10 every software engineer on the planet.
18:12 95% of those software engineers don't
18:14 know what to do with that. It's the ones
18:15 that are actually operating at level
18:18 four, level five that truly get the
18:20 multiplicative value of these tools. So
18:22 if this is a politically contentious
18:24 problem, if this is not just a tool
18:26 problem but a people problem, we need to
18:29 look at the nature of our software
18:31 organizations. Most software
18:33 organizations were designed to
18:36 facilitate people building software.
18:38 Every process, every ceremony, every
18:41 role. They exist because humans building
18:44 software in teams need coordination
18:46 structures. Stand-up meetings exist
18:47 because developers working on the same
18:50 codebase, they've got to synchronize every
18:52 single day. Sprint planning exists
18:54 because humans can only hold a certain
18:56 number of tasks in working memory and
18:58 then they need a regular cadence to
19:00 reprioritize. Code review exists because
19:02 humans make mistakes that other humans
19:05 can catch. QA teams exist because the
19:07 people who build software, they can't
19:09 evaluate it objectively. You get the
19:12 idea. Every one of these structures is a
19:14 response to a human limitation. And when
19:16 the human is no longer the one writing
19:19 the code, the structures aren't
19:22 necessary anymore, they're friction. So what does
19:24 sprint planning look like when the
19:26 implementation happens in hours, not
19:28 weeks? What does code review look like
19:31 when no human wrote the code and no
19:34 human can really review the diff that AI
19:35 produced in 20 minutes because it's
19:37 going to produce another one in 20 more
19:39 minutes? So what does a QA team do when
19:42 the AI already tested against scenarios
19:43 it was never shown? StrongDM's
19:46 three-person team doesn't have sprints.
19:48 They don't have standups. They don't
19:50 have a Jira board. They write specs and
19:53 they evaluate outcomes. That is it.
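That spec-in, outcomes-out loop, with the scenarios held outside the codebase as a holdout set, can be sketched roughly like this. This is my own toy illustration, not StrongDM's actual Attractor code; every name in it is hypothetical, and the "agent" is faked for the sake of a runnable example:

```python
# Toy sketch of a dark-factory loop (all names hypothetical).
# The agent sees only the spec; behavioral scenarios live outside
# the repo and act as a holdout set the agent can never read.
from dataclasses import dataclass
from typing import Callable

Software = Callable[[int], int]

@dataclass
class Scenario:
    name: str
    check: Callable[[Software], bool]  # external, behavior-level check

def build_from_spec(spec: str) -> Software:
    # Stand-in for the coding agent. A real factory would run an LLM
    # agent here; we pretend it built a doubling function.
    return lambda x: x * 2

def run_factory(spec: str, holdout: list[Scenario]) -> bool:
    software = build_from_spec(spec)                # build: spec only
    return all(s.check(software) for s in holdout)  # evaluate: humans' holdout

# Humans write the spec and the scenarios; machines do everything in between.
holdout = [
    Scenario("doubles positives", lambda f: f(3) == 6),
    Scenario("doubles zero", lambda f: f(0) == 0),
]
print(run_factory("Double the input integer.", holdout))  # True
```

Because the scenarios never enter the agent's context, passing them indicates the behavior is right rather than that the tests were gamed.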
19:55 The entire coordination layer that
19:57 constitutes the operating system of a
19:59 modern software organization, the layer
20:02 that most managers spend 60% of their
20:05 time maintaining, is just deleted. It
20:07 does not exist. Not because it was
20:09 eliminated as a cost-saving measure, but
20:12 because it no longer serves a purpose.
20:13 This is the structural shift that's
20:16 harder to see than the tech shift, and
20:18 it might matter more. The question is
20:19 becoming what happens to the
20:21 organizational structures that were
20:24 built for a world where humans write
20:26 code? What happens to the engineering
20:28 manager whose primary value is
20:31 coordination? What happens to the scrum
20:32 master, the release manager, the
20:34 technical program manager whose job is
20:38 to make sure a dozen teams ship on time?
20:39 Look, those roles don't disappear
20:42 overnight, but the center of gravity is
20:44 shifting. The engineering manager's
20:48 value is moving from "coordinate the team
20:50 building the feature" to "define the
20:52 specification clearly enough that agents
20:54 build the feature." The program manager's
20:57 value is moving from "track dependencies
20:59 between human teams" to "architect the
21:01 pipeline of specs that flow through the
21:03 factory." The skills that matter are
21:06 shifting very rapidly from coordination
21:08 to articulation. From making sure people
21:10 are rowing in the same direction to
21:12 making sure the direction is described
21:14 precisely enough that machines can go do
21:16 it. And oh, by the way, for engineering
21:18 managers, there's an extra challenge.
21:20 How do you coach your engineers to do
21:22 the same thing? It's a people challenge.
21:24 If you think this is a trivial shift,
21:26 you have never tried to write a
21:28 specification detailed enough for an AI
21:30 agent to implement it correctly without
21:32 human intervention. And you've certainly
21:34 never sat down and tried to coach an
21:35 engineer to do the same. It is a
21:38 different skill. It requires the kind of
21:40 rigorous systems thinking that most
21:42 organizations have never needed from
21:44 most of their people because the humans
21:45 on the other end of the spec could fill
21:48 in the gaps with judgment, with context,
21:49 with a slack message that says, "Did you
21:52 mean X or Y?" The machines don't have
21:54 that layer of human context. They build
21:56 what you described. If what you
21:58 described was ambiguous, you're going to
22:00 get software that fills in the gaps with
22:02 software guesses, not customer-centric
22:04 guesses. The bottleneck has moved from
22:07 implementation speed to spec quality.
22:10 And spec quality is a function of how
22:12 deeply you understand the system, your
22:15 customer, and your problem. That kind of
22:17 understanding has always been the
22:19 scarcest resource in software
22:20 engineering. The dark factory doesn't
22:22 reduce the demand for that. It just
22:25 makes the demand an absolute law. It
22:28 becomes the only thing that matters.
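To make the ambiguity point concrete, here is a toy example of my own (not from the video): two implementations that both satisfy a vague spec, "discount orders over $100 by 10%", and the kind of external scenario that separates the customer's intent from a plausible guess.

```python
# Vague spec: "discount orders over $100 by 10%".
# Does "over" include exactly $100? Two defensible readings:

def discount_a(total: float) -> float:
    # Reading 1: strictly greater than 100
    return round(total * 0.9, 2) if total > 100 else total

def discount_b(total: float) -> float:
    # Reading 2: 100 or more
    return round(total * 0.9, 2) if total >= 100 else total

# Both pass the cases the spec obviously implies...
assert discount_a(150.0) == discount_b(150.0) == 135.0
assert discount_a(50.0) == discount_b(50.0) == 50.0

# ...but an external scenario that captures intent ("a $100 order
# pays full price") tells them apart:
assert discount_a(100.0) == 100.0  # matches the customer's intent
assert discount_b(100.0) == 90.0   # a software guess, wrong here
```

An agent handed only the vague sentence has no way to know which reading was meant; the precision has to live in the spec or in the scenarios.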
22:30 Now, let's be honest. Everything that I
22:32 have just talked about assumes you're
22:34 building from scratch. Most of the
22:36 software economy is not built from
22:39 scratch. The vast majority of enterprise
22:41 software is brownfield. It's existing
22:43 systems. It's accumulated over years,
22:45 over decades. It's running in
22:47 production, serving real users, carrying
22:50 real revenue. CRUD applications that
22:52 process business transactions. Monoliths
22:54 that have grown organically through 15
22:56 years of feature additions. CI/CD
22:58 pipelines tuned to the quirks of a
23:00 specific codebase and a specific team's
23:02 workflow. Config management that exists
23:04 in the heads of the three people who've
23:05 been at the company long enough to
23:07 remember why that one environment
23:09 variable is set to that one value. You
23:11 know who you are. You cannot dark
23:13 factory your way through a legacy
23:15 system. You cannot just pretend that you
23:17 can bolt that on. It doesn't work that
23:19 way. The specification for that does not
23:22 exist. The tests, if there are any, cover
23:24 30% of your existing codebase, and the
23:26 other 70% runs on institutional
23:29 knowledge and tribal lore and someone
23:31 who shows up once a week in a polo shirt
23:33 and knows where all the skeletons are
23:35 buried in the code. The system is the
23:38 specification. It's the only complete
23:40 description of what the software does
23:42 because no one ever wrote down the
23:44 thousand implicit decisions that
23:47 accumulated over a decade or more of
23:49 patches, of hot fixes, of temporary
23:51 workarounds that of course became
23:54 permanent. This is the truth about the
23:57 interstitial states that lie along this
23:59 continuum toward more autonomous
24:01 software development. For most
24:04 organizations, the path is not to start
24:06 with "deploy an agent that writes code."
24:08 It starts with "let's develop a
24:11 specification for what your
24:14 existing software actually does."
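One concrete way to start that work is characterization (sometimes called golden-master) testing: record what the running system does today and treat the recording as the first draft of the spec. A minimal sketch of my own, with a hypothetical legacy_price routine standing in for the real system:

```python
import json

def legacy_price(qty: int) -> float:
    # Stand-in for a legacy routine whose rules were never written down.
    price = qty * 9.99
    if qty >= 10:
        price *= 0.95  # an undocumented bulk discount someone added years ago
    return round(price, 2)

def record_characterization(inputs: list[int]) -> str:
    # Capture current behavior as data: this JSON *is* the draft spec,
    # stored outside the codebase as a future holdout set.
    return json.dumps({str(q): legacy_price(q) for q in inputs})

def replay(spec_json: str, candidate) -> bool:
    # Any rewrite (human- or agent-built) must reproduce recorded behavior.
    spec = json.loads(spec_json)
    return all(candidate(int(q)) == expected for q, expected in spec.items())

spec = record_characterization([1, 5, 10, 20])
print(replay(spec, legacy_price))  # True: the system matches its own recording
```

The recording is only as good as the inputs you sample, which is exactly where the edge-case knowledge of the people who maintain the system comes in.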
24:17 And that specification work, that reverse
24:19 engineering of the implicit knowledge
24:22 embedded in a running system, is very
24:25 difficult, and it's deeply human work. It
24:27 requires the engineer who knows why the
24:29 billing module has the one edge case for
24:31 Canadian customers. It requires the
24:34 architect who remembers which
24:36 microservice it was that got carved out of the
24:38 monolith under duress during the 2021
24:39 outage and has been maintained that way
24:41 ever since. It requires the product
24:44 person who can explain what the software
24:46 actually does for real users versus what
24:49 the PRD says it does. Domain expertise,
24:51 ruthless honesty, customer
24:54 understanding, systems thinking: exactly
24:57 the human capabilities that matter even
25:00 more in the dark factory era, not less.
25:02 Look, the migration path is different
25:04 for every business, but it starts to
25:07 look something like this. First, you use
25:09 your AI as much as you can at say level
25:11 two or level three to accelerate the
25:14 work your developers are already doing,
25:16 writing new features, fixing bugs,
25:18 refactoring modules. This is where most
25:20 organizations are at now, and it's where
25:23 the J-curve productivity dip
25:26 happened. You should expect that.
25:29 Second, you start using AI to document
25:32 what your system really does, generating
25:34 specs directly from the code, building
25:36 scenario suites that capture real
25:38 existing behavior, creating the holdout
25:40 sets that a future dark factory will
25:43 need. Third, you redesign your CI/CD
25:45 pipeline to handle AI-generated code at
25:47 volume: different testing strategies,
25:49 different review processes, different
25:53 deployment gates. Fourth, you begin
25:55 to shift new development to level
25:57 four or five autonomous agent patterns
26:00 while maintaining the legacy system in
26:02 parallel. That path takes time. Anyone
26:04 telling you otherwise is selling you
26:06 something. The organizations that will
26:08 get there the fastest aren't necessarily
26:10 the ones that bought the fanciest vendor
26:13 tools. They're the ones who can write
26:15 the best and most honest specs about
26:17 their code, who have the deepest domain
26:19 understanding, who have the discipline
26:21 to invest in the boring, unglamorous
26:24 work of documenting what their systems
26:26 really do, and in supporting
26:29 their people to scale up in the ways
26:31 that will support this new dark factory
26:33 era. I cannot give you a clear timeline
26:36 here. For some organizations, this is
26:38 looking like a multi-year transition,
26:39 and I don't want to hide the ball on
26:41 that. Some are going faster and it's
26:43 looking like multi-month. It will depend,
26:45 frankly, on the stomach you have for
26:47 organizational pain. And that brings me
26:49 to the talent reckoning. Junior
26:52 developer employment is dropping 9 to
26:55 10% within six quarters of widespread AI
26:56 coding tool adoption, according to a
26:59 2025 Harvard study. Anyone out there at
27:00 the start of their career is nodding
27:01 along and saying it's actually worse
27:04 than that. In the UK, graduate tech
27:08 roles fell 46% in 2024 with a further
27:11 53% drop projected by 2026. In the US,
27:13 junior developer job postings have
27:16 declined by 67%.
27:18 Simply put, the junior developer
27:20 pipeline is starting to collapse, and
27:22 the implications go far beyond the
27:24 people who cannot find entry-level jobs,
27:26 although that is bad enough and it's a
27:28 real issue. The career ladder in
27:30 software engineering has always worked
27:34 like this. Juniors learn by doing. They
27:35 write simple features. They fix small
27:38 bugs. They absorb the codebase through
27:40 immersion. Seniors review the work and
27:42 mentor them and catch their mistakes.
27:44 Over five to seven years, a junior becomes
27:47 a senior through accumulated experience.
27:50 The system is frankly an apprenticeship
27:52 model wearing enterprise clothing. AI
27:54 breaks that model at the bottom. If AI
27:56 handles the simple features and the
27:58 small bug fixes, the work that juniors
28:01 lean on, where do the juniors learn? If
28:03 AI reviews code faster and more
28:05 thoroughly than a senior engineer doing
28:07 a PR review, where does the mentorship
28:09 start to happen? The career ladder is
28:11 getting hollowed out from underneath.
28:13 Seniors at the top, AI at the bottom,
28:14 and a thinning middle where learning
28:16 used to happen. So, the pipeline is
28:19 starting to break. And yet, we need more
28:21 excellent engineers than we have ever
28:24 needed before, not fewer engineers. I've
28:26 said this before. I do not believe in
28:28 the death of software engineering. We
28:31 need better engineers. The bar is rising
28:34 and it's rising toward exactly the
28:36 skills that have always been the hardest
28:38 to develop and the hardest to hire for.
28:41 The junior of 2026 needs the systems
28:43 design understanding that was expected
28:46 of a mid-level engineer in 2020. Not
28:48 because the entry-level work necessarily
28:50 got harder, but because the entry-level
28:53 work got automated and the remaining
28:55 work requires deeper judgment. And you
28:57 don't need someone who can write a CRUD
28:58 endpoint anymore. Right? The AI will
29:00 handle that in a few minutes. You need
29:01 someone who can look at a system
29:04 architecture and identify where it will
29:06 break under load, where the security
29:08 model has gaps, where the user
29:09 experience falls apart at the edge
29:11 cases, and where the business logic
29:13 encodes assumptions that are about to
29:15 become wrong. And if you think as a
29:17 junior that you can use AI to patch
29:19 those gaps, I've got news for you. The
29:22 seniors are using AI to do that and they
29:24 have the intuition over the top. So you
29:26 need systems thinking, you need customer
29:28 intuition. You need the ability to hold
29:31 a whole product in your head and reason
29:33 about how those pieces interact. You
29:34 need the ability to write a
29:36 specification clearly enough that an
29:38 autonomous agent can implement it
29:40 correctly, which requires understanding
29:42 the problem deeply enough to anticipate
29:45 the questions the agent does not know to
29:47 ask. Those skills have always separated
29:49 really great engineers from merely
29:51 adequate ones. The difference now is
29:53 that adequate is no longer a viable
29:56 career position regardless of seniority
29:58 because adequate is what the models do.
30:00 Anthropic's hiring has already shifted.
30:02 OpenAI's hiring has already shifted.
30:04 Hiring is shifting across the industry
30:06 and it's shifting toward generalists
30:08 over specialists. People who can think
30:11 across domains rather than people who
30:13 are expert in one really narrow tech
30:14 stack. The logic is super
30:16 straightforward, right? When the AI
30:19 handles the implementation, the human's
30:21 value is in understanding the problem
30:22 space broadly enough to direct
30:25 implementation correctly. A specialist
30:26 who knows everything about Kubernetes
30:28 but can't reason about the product
30:30 implications of an architectural
30:33 decision is way way less valuable than a
30:35 generalist who understands the systems,
30:36 the users, and the business constraints
30:39 even if they can't hand-configure a pod.
30:41 Some orgs are moving toward what amounts
30:43 to a medical residency model for their
30:45 junior engineers. Simulated environments
30:47 where early career developers learn by
30:49 working alongside AI systems, reviewing
30:51 AI output, and developing judgment about
30:53 what's correct and what's subtly wrong.
30:56 It is not the same
30:58 thing as learning by writing code from
31:00 scratch. I don't want to pretend it is,
31:02 but it might be better training for a
31:04 world where the job is directing and
31:06 evaluating AI output rather than
31:08 producing code from a blank editor. I
31:10 will also call out, as I've called out
31:12 before, there are organizations
31:15 preferentially hiring juniors right now,
31:17 despite the pipeline collapsing,
31:20 precisely because the juniors they are
31:22 looking for provide an AI native
31:24 injection of fresh blood into an
31:27 engineering org where most of the
31:29 developers started their careers long
31:32 before ChatGPT launched in 2022. In
31:34 that world, having people who are AI
31:36 native from the get-go can be a huge
31:38 accelerating factor. And that points to
31:40 one of the things that is a plus for
31:43 juniors coming in. Lean into the AI if
31:45 you're a junior. Lean into your
31:48 generalist capabilities. Lean into how
31:50 quickly you can learn. Show that you can
31:53 pick up a problem set and solve it in a
31:56 few minutes with AI across a really wide
31:58 range of use cases. Gartner is
32:00 projecting that 80% of software
32:02 engineers will need to upskill in
32:05 AI-assisted dev tools by 2027. They're
32:09 estimating wrong; it's going to be 100%. The number
32:11 is not the point. The question isn't
32:13 whether the skills need to change. We
32:15 all know they will. It's whether we in
32:18 the industry can develop the training
32:20 infrastructure quickly enough to keep
32:22 pace with the capability change. Because
32:24 I've got to be honest with you, if
32:27 you're a software engineer and the last
32:30 model you touched was released in
32:33 January of 2026, you are out of date.
32:35 You need a February model. And that is
32:36 going to keep being true all the way
32:38 through this year and into next year.
32:40 And whether the organizations that
32:43 depend on software can tolerate a period
32:45 where the talent pipeline is being built
32:48 and rebuilt like this on a monthly basis
32:51 is a big question because you have to
32:54 invest in your people more to get them
32:56 through this period of transition. So
32:58 what does the shape of a new org look
33:01 like when we look at AI native startups?
33:02 How are they different from these
33:05 traditional orgs? Cursor, the AI-native
33:07 code editor, is past half a billion
33:09 dollars in annual recurring revenue and
33:12 it has, at last count, a few dozen
33:14 employees. It's operating at
33:16 roughly three and a half million in
33:18 revenue per employee in a world where
33:22 the average SaaS company is generating
33:25 $600,000 per employee. Midjourney is
33:26 similar: generating half a billion in
33:28 revenue with around a hundred
33:31 people, a little more depending on
33:32 who's counting.
33:34 Lovable is well into the
33:37 multiple hundreds of millions in ARR in
33:39 just a few months and their team is
33:42 scaling, but far behind the
33:43 revenue growth they're
33:45 experiencing. They are also seeing that
33:47 multi-million dollar revenue per
33:50 employee world. The top 10 AI native
33:52 startups are averaging three and change
33:55 million in revenue per employee which is
33:57 between five and six times the SaaS
34:00 average. This is happening enough that
34:02 it is not an outlier. This is the
34:05 template for an AI native org. So what
34:07 does that org look like? If you have 15
34:08 people generating a hundred
34:10 million a year, which we've seen in
34:12 multiple cases in 2025, what does that
34:14 look like? It does not look like a
34:16 traditional software company. It does
34:18 not have a traditional engineering team,
34:20 a traditional product team, a QA team, a
34:23 DevOps team. It looks like a small group
34:26 of people who are exceptionally good at
34:28 understanding what users need, who are
34:30 exceptional at translating that into
34:32 clear spec, and who are directing AI
34:34 systems that handle that implementation.
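The per-employee math behind those figures can be sketched directly. A quick illustration, noting that the headcounts and ARR numbers here are rough public estimates from the talk, not audited figures:

```python
# Revenue-per-employee sketch. The ARR and headcount values below are
# illustrative assumptions matching the talk's rough numbers, not exact data.

SAAS_AVG_PER_EMPLOYEE = 600_000  # average SaaS revenue per employee cited above

def revenue_per_employee(arr: float, headcount: int) -> float:
    """Annual recurring revenue divided by total headcount."""
    return arr / headcount

# Hypothetical AI-native org: $500M ARR with roughly 140 people.
per_head = revenue_per_employee(500_000_000, 140)
multiple = per_head / SAAS_AVG_PER_EMPLOYEE

print(f"${per_head / 1e6:.1f}M per employee, {multiple:.1f}x the SaaS average")
# -> $3.6M per employee, 6.0x the SaaS average
```

The point of the division is the multiple, not the exact headcount: even at double the assumed staff, the ratio stays well above the traditional SaaS baseline.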
34:37 The org chart is flattening radically.
34:39 The layers of coordination that exist to
34:41 manage hundreds of engineers building a
34:43 product can be deleted when the
34:45 engineering is done by agents. The
34:47 middle management layer is going to
34:48 either evolve into something
34:50 fundamentally different at these big
34:52 companies or it's going to cease to
34:55 exist entirely. The only people who
34:58 remain are the ones whose judgment
35:00 cannot be automated. The ones who know
35:02 what to build for whom and why, and who
35:06 have excellent AI sense. Sort of like
35:08 horse sense: a rider has a feel for
35:10 the horse and can direct it where
35:11 they want to go.
35:13 You'll need people who have that sense
35:15 with artificial intelligence. And yes,
35:18 it is a learned skill. The restructuring
35:20 that is going to happen as more and more
35:23 companies move toward that cursor model
35:25 of operating, even if they never
35:27 completely get there, that restructuring
35:30 is real. It's going to happen. It's
35:32 going to be very painful for specific
35:34 people in specific roles: the middle
35:36 management layer, the junior developer
35:38 whose entry-level work is getting
35:40 automated first, the QA engineers who
35:43 just run manual test passes, the release
35:46 manager whose entire value is just
35:49 coordination. Those kinds of roles are
35:51 going to have to transform or they're
35:53 just going to disappear. And for people
35:57 in those roles, you need to find ways to
36:02 move toward developing with AI and
36:04 rewriting your entire workflow around
36:07 agents.
36:08 That is going to look different
36:10 depending on your stack, your manager's
36:13 budget for token spend, and your
36:16 appetite to learn. But you need to lean
36:18 that way as quickly as you can for your
36:21 own career's sake. I want to leave you
36:24 with one thing that gets lost in every
36:27 conversation about AI and jobs. We have
36:30 never found a ceiling on the demand for
36:32 software and we have never found a
36:34 ceiling on the demand for intelligence.
36:36 Every time the cost of computing has
36:40 dropped, from mainframes to PCs, from PCs
36:43 to cloud, from cloud to serverless, the
36:44 total amount of software the world
36:48 produced did not stay flat. It exploded.
36:50 New categories of software that were
36:52 economically impossible at the old cost
36:54 structure became viable and then
36:56 ubiquitous and then essential. The cloud
36:58 didn't just make existing software
37:01 cheaper to run. It created SAS, mobile
37:03 apps, streaming, real-time analytics,
37:05 and a hundred other categories that
37:07 could not exist when you had to buy a
37:09 rack of servers to ship something. I
37:12 think the same dynamic applies now and
37:15 it applies at a scale that dwarfs every
37:17 previous transition. Every company in
37:20 every industry needs software. Most of
37:22 them, like a regional hospital or a
37:24 mid-market manufacturer or a family
37:26 logistics company, can't afford to
37:28 build what they need at current labor
37:30 costs. A custom inventory system
37:32 traditionally could cost a half a
37:34 million or more and take over a year. A
37:36 patient portal integration might cost a
37:38 third of a million. You get the idea.
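Those figures make the economics easy to sketch. A back-of-the-envelope calculation, where the traditional costs are the talk's rough estimates and the tenfold reduction is the talk's "order of magnitude or more" claim treated as an assumed factor, not a measured one:

```python
# Back-of-the-envelope cost sketch. Both the traditional price tags and the
# reduction factor are assumptions taken from the talk's rough figures.

traditional_costs = {
    "custom inventory system": 500_000,
    "patient portal integration": 333_000,
}

COST_DROP_FACTOR = 10  # "an order of magnitude or more", assumed as 10x

for project, cost in traditional_costs.items():
    ai_era_cost = cost / COST_DROP_FACTOR
    print(f"{project}: ${cost:,} -> roughly ${ai_era_cost:,.0f}")
```

At a tenth of the old price, projects that were out of reach for a mid-market buyer start landing inside ordinary departmental budgets, which is the mechanism behind the demand expansion described next.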
37:40 These companies tend to make do with
37:43 spreadsheets today. But we are dropping
37:46 the cost of software production by an
37:48 order of magnitude or more. And now that
37:52 unmet need is becoming addressable. Not
37:55 theoretically, right now. You can serve markets
37:57 that traditional software companies
38:00 could never afford to enter. The total
38:02 addressable market for software is
38:05 exploding. Now this can sound like a
38:06 very comfortable rebuttal to people
38:08 struggling with the pain of jobs
38:10 disappearing. It is not the same thing.
38:12 Just saying the market is getting bigger
38:15 doesn't fix it. But it is a structural
38:17 observation about what happens as
38:20 intelligence gets cheaper. The demand is
38:23 going to go up, not down. We watched
38:25 this happen with compute, with storage,
38:27 with bandwidth, with every resource
38:29 that's ever gotten dramatically cheaper.
38:32 Demand has never saturated. The
38:34 constraint has always moved to the next
38:35 bottleneck. And in this case, the
38:37 bottleneck is judgment: knowing what to build and
38:40 for whom. The people who thrive in this
38:42 world are going to be the ones who were
38:44 always the hardest to replace. The ones
38:47 who understand customers deeply, who
38:49 think in systems, who can hold ambiguity
38:52 and make decisions under uncertainty,
38:54 who can articulate what needs to exist
38:56 before it exists at all. The dark
38:58 factory does not replace those people.
39:00 It amplifies them. It
39:02 turns a great product thinker with five
39:05 engineers into a great product thinker
39:07 with unlimited engineering capacity. The
39:10 constraint moves from can we build it to
39:12 should we build it and should we build
39:14 it has always been the harder and more
39:16 interesting question. I don't have a
39:18 silver bullet to magically resolve this
39:20 but I have to tell you that we must
39:22 confront the tension or we are being
39:26 dishonest. The dark factory is real. It
39:29 is not hype. It actually works. A small
39:30 number of teams around the world are
39:33 producing software without any humans
39:35 writing or reviewing code. They are
39:39 shipping production code that
39:41 improves with every single model
39:43 generation. The tools are building
39:46 themselves. The feedback loop is closed.
39:48 And those teams are going faster and
39:51 faster and faster and faster. And yet
39:52 most companies aren't there. They're
39:54 stuck at level two. They're getting
39:56 measurably slower with AI tools they
39:58 believe are making them faster. They're
40:01 wrong, and they're running organizational structures
40:03 designed for a world where humans do all
40:06 of the implementation work. Both of
40:08 these things are true at the same time.
40:10 The frontier is farther ahead than
40:13 almost anyone wants to admit and the
40:15 middle is farther behind than the
40:17 frontier teams like to talk about. The
40:20 distance between them isn't a technology
40:23 gap. It's a people gap. It's a culture
40:25 gap. It's an organizational gap. It's a
40:29 willingness to change gap that no tool
40:31 and no vendor can close. The enterprises
40:34 that get across this distance are not
40:37 the ones that buy the best coding tool.
40:39 They're the ones that do the very hard,
40:41 very slow, very unglamorous work of
40:44 documenting what their systems do, of
40:45 rebuilding their org charts and their
40:48 people around the skill of judgment
40:50 instead of the skill of coordination.
40:52 And they are organizations who invest in
40:55 the kind of talent that understands
40:58 systems and customers deeply enough to
41:00 direct machines to build anything that
41:02 should be built. And those orgs need to
41:04 be honest enough with themselves to
41:06 admit that this change will not happen
41:08 as fast as they want it to because
41:11 people change slowly. The dark factory
41:14 does not need more engineers, but it
41:16 desperately needs better ones. And
41:18 better means something different than it
41:20 did a few years ago. It means people who
41:22 can think clearly about what should
41:24 exist, describe it precisely enough that
41:26 machines can build it and who can
41:29 evaluate whether what got built actually
41:32 serves the real humans it was built for.
41:34 This has always been the hard part of
41:36 software engineering. We just used to
41:39 let the implementation complexity hide
41:41 how few people were actually good at it.
41:43 The machines have now stripped away that
41:45 camouflage, and we're all about to find
41:48 out how good we are at building
41:50 software. I hope this video has helped
41:52 you make sense of the enormous gap
41:54 between the dark factories of automated
41:57 software production and the way most of
41:59 us are building software today. Best of
42:01 luck navigating that transition. I wrote
42:04 up a ton of exercises and a ton of
42:06 resources over on the Substack if you'd
42:07 like to dig in further. This tends to be
42:09 something where people want to learn
42:10 more, so I wanted to give you as much as
42:13 I could. Have fun, enjoy, and I'll see