The way we interact with AI, termed "prompting," has fundamentally shifted due to the advent of autonomous agents. Traditional chat-based prompting is becoming obsolete, necessitating a new, multi-layered skill set to effectively leverage AI for complex, long-running tasks.
If you're prompting like it's last
month, you're already too late. And I'm not just saying that for clickbait. If
you haven't updated how you think about
prompting since January 2026, you're
already behind. Opus 4.6, Gemini 3.1 Pro, and GPT-5.3 Codex have all shipped
in the past few weeks with autonomous
agent capabilities that make the
chat-based prompting most people are
practicing functionally obsolete for
serious work. These models don't just
answer better. They work autonomously
for a long time, for hours, for days
against specs without really checking
in. That changes what good at prompting
means on a fundamental level. And it's
time to revisit how we think about
prompting as a result. Not because
prompting stopped mattering. It actually
matters more than ever, but because the
word prompting is now hiding four
completely different skill sets, and
most people are only practicing one of
them. And the gap between the people who
see all four of them and the people who
don't is already 10x and widening. In
this piece, I'm going to lay out what
those four skills are, why the
distinction matters now, and exactly how
to build the skills you're missing. This
builds on my earlier work on intent
engineering, but it goes way beyond it
to lay out a full framework for how to
think about prompting post February
2026. Intent engineering is just one
layer in a larger stack. This is the
full stack for prompting post February,
post these new autonomous models. First,
what changed? The prompting skill that
mattered since 2024 has been really
conversational. You sit in a chat
window, you type a request, you read the
output, you iterate, you get better at
phrasing things, you provide examples,
you structure instructions. If you're
good at that, and if you've been
following this video series, you
probably are. You've been building real
skills. They work. You're faster than
you were a year ago. But that
fundamental chat-based skill has a
ceiling. And in early 2026, a lot of
people are hitting it because the models
have stopped really being chat partners
and started being workers. Workers that
run for a long time. I'm not kidding
when I say days and sometimes weeks. And
the thing about a worker that runs for a
long time is that everything you relied
on in a conversation, like your ability
to catch mistakes in real time, your
ability to provide missing context
accurately when the model asks, your
ability to course correct when things
drift. All of that must be encoded
before the agent starts. Not during the
course of a conversation, but at the
top. This is a fundamentally different
skill. It's not a harder version of the
same skill. It's actually different.
I've talked before about the importance
of thinking about prompting even in the
chat window as providing the relevant
context for the LLM to provide you an
accurate response. But this goes way
beyond that. If you're giving an agent a long-running task, which is where most
of AI is going, even if you're not a
coder, then you have to think not about
how do I build for a chat response, but
how do I build for economically real
work that this agent will do and provide
the agent the relevant context for that.
And this shift is happening really
quickly. Between October of 2025 and
January of 2026, in just 3 months, the
longest autonomous cloud code sessions
nearly doubled, and they've doubled
again since then. Agents are running by the hundreds and thousands in production systems at major companies.
And this is just from publicly available
information. We have Telus reporting
that they have 13,000 custom AI
solutions internally. We have Zapier
reporting that they have over 800 agents
internally. Look, whenever they release
a press release, you got to assume that
company feels behind and is releasing
something to help them feel better about
themselves. The companies that are
really serious about AI don't feel the
need for press releases and have an
order of magnitude or more agents. So,
this is not about a world that is
coming. This is about a world that has
landed. But this still might not feel
concrete enough for you. So, let me make the gap concrete with a random Tuesday. Let's say two people are sitting down with the same model on a Tuesday morning in February of 2026. Same subscription, same context window. The only difference is that one of them is using 2025 prompting skills and one of them is using 2026 prompting skills.
So the 2025 person types a request and
they're asking for a PowerPoint deck,
right? They get back something that's
about 80% correct. Maybe there's some
formatting issues. Maybe the font has
some collisions in it. Maybe there's
some styling issues. They spend about 40
minutes cleaning it up, but they're
pretty happy because this deck would
have taken two or three hours. That's a
2025 prompting skill application. It would have been good in 2025.
Person B sits down with 2026 prompting
skills. They write a structured
specification in 11 minutes. They take
longer to prompt. Then they hand it off
to the same chatbot, but they're
thinking of it and using it as an
autonomous agent. They go to make
coffee. They come back to a completed
PowerPoint that hits every quality bar
defined up front. And they're able to do
this for five other decks before lunch.
In other words, they are now doing a
week's worth of work in a morning
easily. Same model, same Tuesday, 10x
gap. And if you want to replicate this,
you can replicate this experiment
directly in Claude Opus 4.6 in Cowork mode, which is available on Windows and Mac. And you can see exactly
how this plays out. And this did not
happen because the person with 2026
prompting skills is smarter or because
she's more technical. It's because she's
practicing a different skill than person
one and person one doesn't know that
kind of prompting skill exists. I think
it's worth paying attention to Shopify
CEO Tobi Lütke here. Unlike most CEOs,
Toby is a technical guy and he does not
just dig into AI from a LinkedIn
perspective. He has a folder of prompts
that he runs against every new model
release and he really deeply thinks
about how new model releases change his
workflow. He uses the term context
engineering because he believes the
fundamental skill that we're all facing
is the ability to state a problem with
enough context in a way that without any
additional pieces of information, the
task becomes plausibly solvable. I think
that's a really elegant way to describe
what person B did in the example I just
showed you. Person B put all of the
information that the model needed to
build a deck in one task very clearly
defined and the model could just go to
work. This isn't about clever prompt
tricks. This isn't about magical words
that an AI can use to produce a better
output. It's about a communication
discipline. Can you state a problem so
completely with so much relevant
surrounding information that a capable
system can solve it without going out
and fetching more context? Can you make
your request as self-contained as
possible? This is a really big deal
because it demands a much higher bar for
communication from us humans than we're
used to. And that's something that Toby
called out when he reflected on the
impact of AI on his own leadership
style. One of the things he mentioned is
that by being forced to provide AI with
complete context, he is now better at
communicating as a CEO. His emails are
tighter, his memos are better, his
decision-making frameworks are stronger,
and Toby has gone farther than most
people thinking about the implications
of context engineering. I think one of
his most provocative assessments is that
a lot of what people in big companies
call politics is actually bad context
engineering for humans. What he suggests
is that essentially good context
engineering would surface disagreements
about assumptions that are never
surfaced explicitly but play out as
politics and grudges in large companies.
And he says that happens because humans
tend to be sloppy communicators who rely
on shared context that doesn't actually
exist. I think that's a really
interesting thesis. I think one of the
implications of getting this February
2026 prompting lesson deeply ingrained
in ourselves is that our communication
human-to-human is likely to improve and
our organizations are likely to have
cleaner decisioning and cleaner
communication even between humans as a
result. So I bet you're wondering what
are these four disciplines? What does it
mean when prompting becomes multiple
skills that we need to learn? Well, I
don't want to hide the ball here. Here
is the framework that I would lay out
that describes what prompting should be
in February 2026. And I've built it to
be future-proof. If we look at the direction agents are going and how they're developing this year, these four disciplines are going to matter even as agents continue to scale.
This represents a significant update on
how I've taught prompting before 2026
because the way we prompted before 2026
was helpful as a foundation. You're not
losing something by having learned it,
but it's not enough as agents get more
capable. And I think we're due for a
reset. So, fundamentally, prompting is
the broad skill of providing input to AI
systems so that they can do useful work.
Prompting has diverged into four
distinct disciplines, but it's not
taught that way. And so, I'm laying this
out for the first time here. Each of
these disciplines is operating at a
different altitude and time horizon. And
you need to understand them all to
prompt well. Just because they're not often framed this way, just because they're not often taught this way, doesn't mean that good prompters don't intuitively know this. What I'm taking
is intuitive knowledge that I see in
excellent prompters and boiling it down
into four key disciplines that you can
practice and learn from. And these build
on each other, and I'm presenting them in order. If you skip one, you're creating the kind of failures we tend to see at scale in the enterprise, but you're creating them for yourself in your own prompting. And I'll
kind of get at what that means and
you'll get the idea. So discipline one
is prompt craft. This is the original
skill. This is the skill I have taught
and many others have taught for the last
year or two. It's synchronous. It's
sessionbased and it's an individual
skill. You sit in front of a chat
window. You write an instruction. You
evaluate the output. Then you iterate.
The skill here is knowing how to
structure a query. And I've talked a lot
about this in the past. I'll rehash it
briefly here. You must have clear
instructions. You must include relevant
examples and counterexamples. You need
to include appropriate guard rails. You
need to include an explicit output
format. And you should be very clear
about how you resolve ambiguity and conflicts, so the model doesn't have to make it up on the fly.
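If it helps to see those elements in one place, here's a minimal sketch of a structured prompt template. The section names and the example task are mine, not from any vendor's documentation; the point is just that each element gets an explicit slot instead of living in your head.

```python
# A minimal structured-prompt template. The sections mirror the elements above:
# instructions, examples and counterexamples, guardrails, an explicit output
# format, and rules for resolving ambiguity. Purely illustrative.

def build_prompt(task: str) -> str:
    sections = {
        "Instructions": task,
        "Examples": "- Good: 'Revenue grew 12% quarter over quarter, driven by enterprise renewals.'",
        "Counterexamples": "- Bad: 'Revenue went up a lot.' (vague, no number, no driver)",
        "Guardrails": "- Use only the figures provided below. Do not estimate missing numbers.",
        "Output format": "- Three bullet points, each under 25 words.",
        "Ambiguity": "- If two figures conflict, use the most recent one and flag the conflict.",
    }
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections.items())

print(build_prompt("Summarize the attached Q3 revenue table for an executive update."))
```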
This is what Anthropic's prompt engineering documentation covers. OpenAI talks
about this. Google talks about this.
It's on a thousand blog posts and
LinkedIn courses. Prompt craft has not
become irrelevant. Don't hear that. It's
just become table stakes. It's sort of
the way knowing how to type with 10
fingers was once a professional
differentiator and now it's just table
stakes. It's just assumed. If you can't
write a clear, well-structured prompt
in 2026, you're the person in 1998 who
couldn't send an email. Is it important
to be able to do it? Yes. Is it going to
differentiate you in the workforce? No,
not really. The key shift is that
prompt craft was the whole game when AI
interactions were synchronous and
sessionbased. You wrote something, you
got something back, and you refined it
in real time. As a human interacting
with that model, you were acting as the
intent layer, as the context layer, and
as the quality layer so that long-running
tasks could get done. You did all the
breaking out of those tasks. That model
of prompting broke the moment agents
started running for hours without
checking in. Discipline 2 we've been
talking about for a few months now. It's
called context engineering. Anthropic
published the foundational piece on this
back in September of 2025, but there's a
lot of other good pieces out there as
well. I've written a fair bit on it. I
define context engineering as the set of
strategies for curating and maintaining
the optimal set of tokens during an LLM
task. And that's not just me defining
that. That's a pretty commonly held
definition. LangChain's Harrison Chase was even blunter about what context
engineering is during a recent Sequoia
Capital interview when he said
everything is context engineering. It
actually describes everything we've done
at LangChain without knowing the term
existed. So that's actually somewhat
dangerous because context engineering is
only one of four levels and people have
misunderstood it to mean everything. And
one of the things I'm trying to get us
to move toward is a world where we
understand context engineering as a
specific skill where we're providing
relevant tokens to the LLM for
inference. And yes, it is certainly
foundational. It is certainly
significant. It is where the industry's
attention is focused today. It is the
shift from crafting a single instruction
to curating the entire information
environment an agent operates within: all of the system prompts, all of the
tool definitions, all of the retrieved
documents, all of the message history,
all of the memory systems, the MCP
connections. The prompt you write might
be 200 tokens. The context window it
lands in might be a million. Your 200
tokens are 0.02% of what the model sees. The other 99.98%? That's context engineering. This is the
discipline that produces CLAUDE.md files,
agent specifications, rag pipeline
design, memory architectures. It's the
discipline that determines whether a
coding agent understands your project's
conventions, whether a research agent
has access to the right documents,
whether a customer service agent can
retrieve relevant account history.
Anthropic's engineering team identified the core challenge precisely: LLMs degrade as you give them more
information. That's correct. They do.
And the point therefore is to include
relevant tokens because the issue is not
that they can't hold the tokens. It's
that retrieval quality does drop as
context grows. The practical implication
is that people who are 10x more
effective with AI than their peers are
not writing 10x better prompts. They're
building 10x better context
infrastructure. Their agents start each
session with the right project files,
the right conventions, the right
constraints already loaded. The prompt
itself can be relatively simple because
the context does the heavy lifting. I
have seen this for myself as I've built
out my own context engineering
scaffolding. And if you're wondering how
to do it for yourself, I'm putting a
guide together with this video over on the Substack, and I think it'll be helpful.
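In the meantime, here's a rough sketch of the kind of scaffolding I mean. The file names and the assembly order are assumptions on my part, not a standard; the idea is simply that the stable project context lives in files you curate once, and your actual prompt is the small part on top.

```python
# A sketch of per-session context assembly: curated project files get loaded
# first, and the per-task prompt is appended at the end. The file names
# (project_overview.md, conventions.md, constraints.md) are just examples.
from pathlib import Path

CONTEXT_FILES = ["project_overview.md", "conventions.md", "constraints.md"]

def assemble_context(task_prompt: str, context_dir: str = "context") -> str:
    parts = []
    for name in CONTEXT_FILES:
        path = Path(context_dir) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    parts.append(f"## Task\n{task_prompt}")
    return "\n\n".join(parts)

# The 200-token request rides on top of the curated context, not instead of it.
print(assemble_context("Draft the Q3 board update using the metrics in project_overview.md."))
```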
Discipline number three. This one we don't talk a lot about. I think it
is where we're going as these agents
start to do much longer autonomous
running tasks. I wrote about this one at
length in a prior piece. I did a video
on it. I'm going to be brief here. I'm
going to contextualize where it sits in
the stack. Context engineering tells
agents what to know. Intent engineering
tells agents what to want. It's the
practice of encoding organizational
purpose, your goals, your values, your
trade-off hierarchies, your decision
boundaries into infrastructure that
agents can act against.
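To make "encoding intent into infrastructure" a little more concrete, here's one rough sketch. Every goal, threshold, and trigger in it is invented for illustration; the point is that trade-offs and escalation boundaries get written down somewhere an agent can check them, instead of living in someone's head.

```python
# An illustrative intent document: goals, an explicit trade-off hierarchy,
# and escalation boundaries an agent can check before acting autonomously.
# All values are made up for the example.
INTENT = {
    "primary_goal": "customer satisfaction (CSAT >= 4.5)",
    "secondary_goal": "resolution time under 10 minutes",
    "tradeoff_order": ["customer_satisfaction", "resolution_time", "cost_per_ticket"],
    "escalate_if": [
        "customer requests a human",
        "refund amount exceeds $500",
        "satisfaction signal drops on two consecutive turns",
    ],
}

def must_escalate(event: str) -> bool:
    # Agents decide autonomously only inside these boundaries; everything else escalates.
    return event in INTENT["escalate_if"]

print(must_escalate("refund amount exceeds $500"))  # True
```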
Klarna's story is the proof case I talked about. Their AI
agent resolved 2.3 million customer
conversations in the first month, but it
optimized for the wrong thing. It
slashed resolution times, but it didn't
optimize for customer satisfaction. And
as a result, Klarna got into big trouble
and had to rehire a bunch of human
agents and is still dealing with the
customer trust aftermath. So, intent
engineering sits above context
engineering the way strategy sits above
tactics. You can have perfect context
and terrible intent alignment. You
cannot have good intent alignment
without good context, though, because
the agent needs information to act on
the intent. So these disciplines, again, are cumulative. Another thing I want
you to notice is that failure as we
progress up this hierarchy is getting
more and more serious. When you as an
individual sit down and you screw up a
prompt, it might waste your morning at
worst. When you as a human being sit
down and screw up context engineering or
intent engineering, you are screwing up
for the entire team, your entire org,
your entire company. The stakes get
higher. And because the stakes get
higher, our attention to detail matters
and the value of the work we do
increases commensurately. What I am
talking about when I talk about context
engineering and intent engineering can
be a full-time role at a big company.
And if it's not, it is a high-stakes
human skill that has a lot of
transferable value. Level four is
specification engineering. We're just
starting to talk about this now, even
though the best practitioners are
already doing it. Specification
engineering is the practice of writing
documents across your organization that
autonomous agents can execute against
over extended time horizons without
human intervention. This is a level
above everything I've described because
all of the first three levels focused on
how you prepare work directly for an
agent. Specification engineering is
really about thinking about the entire informational corpus in your organization as agent-fungible, agent-readable. Everything
you write has to be something the agent
can access and do something with. It's
not really about prompting per se. It's
not about an individual agent's context
window. It's not even about the intent
you've given agents. Specifications
are complete. They're structured.
They're internally consistent
descriptions of what an output should be
for a given task. They look at how
quality is measured. Specifications are
a mindset you bring to your documents
that allows you to apply agents across
large swaths of your current company's
context with the confidence that what
the agent reads is going to be relevant.
I think an interesting example from
Anthropic actually comes from the team's
struggles with the Opus 4.5 agent which
is one generation ago now. They were
trying to build a production quality web
app. But if you give the agent only a
high-level prompt like "build a clone of claude.ai," the agent tries to do too much
at once, runs out of context
mid-implementation, and leaves the next
session guessing at what happened. The
fix, it turned out, was not a better
model. It was specification engineering.
It was a pattern you could specify, where an initializer agent sets up the environment. A progress log documents
what's been done and a coding agent then
makes incremental progress against a
structured plan every session. The
specification became the scaffolding
that let multiple agents produce
coherent output over days.
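Here's a rough sketch of that pattern, with made-up file names; this isn't Anthropic's actual harness code, just the shape of the idea. Each session reads what's been done, makes one increment against the spec, and writes its progress back so the next session isn't guessing.

```python
# Sketch of the multi-session pattern: a persistent spec, a progress log that
# survives between sessions, and one incremental step per session.
# File names and structure are illustrative assumptions, not a published harness.
import json
from pathlib import Path

SPEC = Path("spec.md")            # the stable specification, written up front
PROGRESS = Path("progress.json")  # what prior sessions have completed

def run_session(pick_next_increment):
    done = json.loads(PROGRESS.read_text()) if PROGRESS.exists() else []
    spec = SPEC.read_text()
    task = pick_next_increment(spec, done)           # the agent chooses one increment
    done.append(task)
    PROGRESS.write_text(json.dumps(done, indent=2))  # the next session starts from here
    return task
```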
So the shift from prompt to specification mirrors a
transition that happened in human
engineering decades ago. When you're
building something small, verbal
instructions and conversations work
really well. When you're building
something large enough to require a team
or span multiple sessions, you need
blueprints. Anthropic needed blueprints
in the Opus 4.5 example. And even though
we've now moved to Opus 4.6, the need
for specification engineering has not
gone down, it's gone up because Opus 4.6
can do even more work. That's true for
Codex 5.3. It's true for Gemini 3.1 Pro
as well. The smarter models get, the
better you need to get at specification
engineering. Which is why I deliberately
started this section zooming out and
saying the entire org's document corpus
should be viewed as a form of
specification engineering. And yes, this
is a fractal insight. You can also think
about specification engineering for your
individual agent task where you think
about what is the log that the agent
has. How do we assign tasks across this
agent build? How do we make sure the
agent has a clearly specified
requirements list to work from? But all
of that gets way way easier to put
together if you think of your entire
organizational document corpus as
specifications that are agent readable.
Your corporate strategy is a
specification. Your product strategy is
a specification. Your OKRs are a
specification. Everything ends up being
a specification that your agent can use.
And that's different from context
engineering because the art of context
engineering is really about shaping the
context window in a way that's relevant
for the agent. Right? If you look at
these four levels, the prompt is you and
the agent and you're working on crafting
clear instructions. The context window
is how do you shape relevant tokens.
Intent engineering is how do you
communicate goals and objectives to the
agent that allow the agent to work
autonomously for long periods of time in
a direction consistent with company
strategy. Specification engineering is
how do you think about your entire
corporate document structure, the
knowledge, the context that makes the
corporation work as a form of
specification. And yes, for individual
agent runs, how do you refine that specification? If you give the agent a good specification, it is going to start to
keep the context window cleaner than
before. And so these start to interplay,
right? If you write good spec, if you
have a good task log, if the agent
understands what the spec is from the
broader organizational context, they're
less likely to go off the rails because
of intent engineering conflicts. They're
less likely to bloat out with bad
context. And so all of these start to
interplay, but the highest level is to
think about specification as the way
your organization does business. You
specify the outputs you want. The agent
does the work. The outputs are produced.
That is the highest level description of
what business is going to look like in
the next couple of years and it starts
with understanding how to specify. This
is where Anthropic's best practices documentation for Claude Code becomes
really revealing. The recommended
workflow for complex features is
relatively simple. Interview me in
detail. Ask about technical
implementation, UI/UX, edge cases,
concerns, and trade-offs. Don't ask
obvious questions. Dig into the hard
parts. The agent then writes the spec
with the human. I think that that is an
artifact of this moment in time. And I
think we will get to a point where the
agent will only be asking us for places
where the broader specification corpus
is in conflict or ambiguous and we have
to talk about what it means to us for
this task to be accomplished well, because the entire organizational infrastructure is going to be agent-readable. So
the practical skill going forward is not
writing code. It's not crafting prompts.
It's the ability to describe an outcome
with enough precision and completeness
that an autonomous system can execute
against it for days or weeks. I hope
you're seeing here how that is a
fundamentally different skill from
writing a good prompt in a chat window.
And the people who are excellent at one
of these layers are not automatically
excellent at all of them. Context
engineers spend a lot of time thinking
about how to compress tokens and get
good tokens into context windows and how
to keep bad tokens out. That is a
different mindset than thinking about
your information environment as agent-translatable, agent-readable, agent-fungible. We have to have all of these
skills in order to effectively bring AI
into the enterprise or even into a small
business in 2026. And I will tell you
one-person businesses have the greatest
advantage right now because if you are a
one-person business and you can just convert your Notion to be agent-readable, you're off to the races today.
There's no gigantic effort required to
make all of your SharePoint agent
readable. It's simple. It's easy. You
just get it into Notion and you're done.
This comes back to the core idea that in
2026, speed is going to matter because
agents are going to keep getting better
quickly. What we have now as days and
weeks is going to become weeks and
months by the end of the year. And the
corresponding impact of getting
specification engineering correct is
going to be even higher. The
corresponding impact of getting all four
levels translated into specific roles,
people who are responsible, DRIs, teams
who handle this, that's going to be even
more valuable in 2026. And so what I
mean by that is that if you are at a
large company, you should have people
who are doing context engineering and
that's all they're doing. You should
have people who are doing specification
engineering and thinking about how
agents can read the enterprise. You
should have people who are thinking
about intent engineering and how you
translate goals into a set of objectives
that an agent can read and value and a
set of verifiable guardrails the agent
can follow. Look, the mental model most
people carry is that prompting is good
instructions for the AI and that fails
for a very specific reason. And I hope
you're seeing it here. That entire model
assumes synchronous interaction. In the
synchronous AI human partnership model,
you're always there at the computer. You
see the output in real time. You correct
mistakes right away. You provide
additional context when the model asks
or when you notice it going off track.
Long-running agents break every single
assumption in that model. So if you've
relied on the assumptions of synchronous
prompting, you have a structural
vulnerability in the way you think about
AI. You need to start thinking about AI
as if your real time oversight is
embedded in the specification before the
agent begins to work. The planner-worker
architecture that's dominating
production agent deployments really
reflects this reality. A capable model
plans the work, decomposes it into
subtasks, defines the acceptance
criteria, and assigns work out to models
and then cheaper, faster models do the
work. The planning phase, you could call
it the specification phase, determines
the quality ceiling. The planning phase is basically taking your specification and expanding it, enriching it, breaking it out, and planning against it.
That's what determines the quality of
the work of the overall system.
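If you want a feel for the shape of that architecture, here's a stripped-down sketch. The model names and the plan format are placeholders I made up, and call_model stands in for whatever client you actually use; the structure is the point: one capable model produces the plan and acceptance criteria, cheaper models execute, and the plan caps the quality.

```python
# A stripped-down planner/worker loop. Model names are placeholders, not
# recommendations, and call_model must be wired to your own provider.
def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError("wire this to your provider's API")

def run(spec: str) -> list[str]:
    # The capable model turns the spec into subtasks, each with acceptance criteria.
    plan = call_model(
        "planner-large",
        f"Break this spec into subtasks, one per line, each with its own acceptance criteria:\n{spec}",
    )
    results = []
    for subtask in plan.splitlines():
        if not subtask.strip():
            continue
        # Cheaper, faster models do the execution; quality is capped by the plan.
        results.append(call_model("worker-small", subtask))
    return results
```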
Execution without that specification
step produces broken work and it
requires extensive human rework to be of value at all. So the shift from fixing
it in real time, which is what we do
with a lot of prompting in the chat
window to we must get the spec right up
front changes your bottleneck skill.
Right? Real time prompting rewards
verbal fluency. It rewards quick
iteration. It rewards a good eye for
output quality. Specification
engineering rewards completeness of
thinking, anticipation of edge cases,
clear articulation of acceptance
criteria, and the ability to decompose
really complicated outcomes into
independently executable components.
Different people are good at these
things in different ways. Some people
are going to be naturally exceptional at
synchronous prompting and they're going
to struggle with specification work. And
some people will be mediocre at
chatbased interaction, but they might
actually be excellent spec engineers. My
challenge to you is that you don't
tolerate whatever your natural
propensity is as your ceiling. Think of
this as a learnable skill and go after
it. Now, you might ask, if we're going
after it, what are the foundational
elements to learn? Specification is
ironically very vague. Nate, please tell
me what you mean. I want to suggest to
you that in 2026 we can define the
primitives that go into good
specifications in ways that are useful
for us to learn. And I'm going to go
ahead and define them right here. I
think these are the foundation that we
need to learn if we want to get better
at specifying and we want to get better
at the prompting skills that will matter
in 2026 and beyond. Primitive number
one: self-contained problem statements. This is Toby's insight, but it's not only his. Think back to what
Toby said when he talked about the idea
that we have to give the model
everything it needs to do the work. Can
you state a problem with enough context
that the task is plausibly solvable
without the agent going out and getting
more information? The discipline of
self-containment forces you to be clear.
It surfaces hidden assumptions. It makes
you articulate constraints you normally
leave implicit because you trust the
human on the other end to fill in the
gaps. AI doesn't fill in gaps reliably.
It fills them with statistical
plausibility, and that's a polite way of
saying it guesses in ways that are often
subtly wrong. So if you're trying to
train this primitive, I would say take a
request you would normally make
conversationally, like update the
dashboard to show the Q3 numbers and
rewrite that as if the person receiving
it has never seen your dashboard,
doesn't know what Q3 means in your org
context, doesn't know what database to
query, and has no access to any
information other than what you include.
That is the level of self-containment
you should be challenging yourself with
if you want to get better at this primitive.
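To make that exercise concrete, here's the kind of before-and-after I have in mind. The dashboard, tables, and dates are all invented; what matters is that the second version carries everything the reader, human or agent, needs.

```python
# Before and after for the same request. Every specific here (table names,
# columns, dates, the dashboard itself) is invented for illustration.
CONVERSATIONAL = "Update the dashboard to show the Q3 numbers."

SELF_CONTAINED = """Update the 'Revenue Overview' dashboard to show Q3 (July 1 through September 30).
Data source: the monthly_revenue table in the analytics warehouse; use the
recognized_revenue column, not bookings. Keep the existing three charts,
change only their date range, and keep the y-axis in USD thousands.
Done means: all three charts show July through September, totals match the
finance close spreadsheet to within rounding, and nothing else changed."""
```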
Primitive two: learn about acceptance criteria. If you can't
describe what done looks like, an agent
can't know when to stop or more
precisely, it will stop at whatever
point its internal heuristics say the
task is complete, which may bear no
relationship to what you needed. This is
why the 80% problem is a big issue for
agent system design. Let's say the specification said build a login page when it should have said build a login page that handles email/password, social OAuth via Google and GitHub, progressive disclosure of 2FA, session persistence for 30 days, and rate limiting after five failed attempts. If
you're training on this primitive, you
want to get to the latter, not the
former. For every task you delegate,
write three sentences that an
independent observer could use to verify
the output without asking you any
questions whatsoever. If you can't write
those sentences down, you probably do
not understand the task well enough to
give it to an agent. I have had that
happen where I've been in a conversation
with an AI agent and I realize I don't
know enough to delegate the task and I
have to come back later. That's okay.
It's good to realize that before you assign the work.
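Before moving on, here's what those observer-checkable sentences might look like for the login page example above; the specifics are inherited from that invented example, but notice that each one can be verified without asking me anything.

```python
# Acceptance criteria an independent observer could verify without questions.
# The details come from the invented login-page example above.
ACCEPTANCE_CRITERIA = [
    "A new user can sign in with email/password or with Google or GitHub OAuth.",
    "A session created today is still valid 29 days later and invalid after 30.",
    "The sixth failed login attempt in a minute is rejected with a rate-limit error.",
]

def is_done(results: dict[str, bool]) -> bool:
    # "Done" means every criterion independently checks out,
    # not that the output merely looks reasonable.
    return all(results.get(criterion, False) for criterion in ACCEPTANCE_CRITERIA)
```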
Primitive three is constraint architecture: what the agent
has to do, what the agent cannot do,
what the agent should prefer when
multiple valid approaches exist, what
the agent should escalate rather than
decide autonomously. These four
categories, the musts, the must-nots,
the preferences, and the escalation
triggers form the constraint
architecture that turns a loose
specification into a very reliable one.
The CLAUDE.md pattern that's emerging in
the coding community is a practical
implementation of constraint
architecture. The best CLAUDE.md files are
not long lists of rules. They're
concise. They're extremely high signal
constraint documents. Use these build
commands. Follow these code conventions.
Run these tests before marking a task
complete. Never modify these files
without explicit instructions. The
community consensus is very strongly that every line in the file needs to earn its place. If you ask would removing
this line cause the AI to make mistakes
and the answer is no it really wouldn't
then kill the line. So if you want to train this primitive, before delegating a task you should write down what a
smart well-intentioned person might do
that would technically satisfy the
request but produce the wrong outcome.
Those failure modes end up being your
constraint architecture. Encode them.
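If you want a skeleton for that encoding, here's one way to lay it out. The entries are examples I made up for a generic project, not a recommended ruleset; the structure is the point: musts, must-nots, preferences, and escalation triggers, with every line earning its place.

```python
# Constraint architecture as a small, high-signal structure. Every entry here
# is illustrative, not a recommended ruleset.
CONSTRAINTS = {
    "must": [
        "Run the test suite before marking any task complete.",
        "Follow the existing naming conventions in the module you touch.",
    ],
    "must_not": [
        "Never modify files under config/production/ without explicit instruction.",
    ],
    "prefer": [
        "When two valid approaches exist, prefer the one with fewer new dependencies.",
    ],
    "escalate": [
        "Any change that alters a public API or a customer-facing price.",
    ],
}
```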
Primitive four is decomposition. Large
tasks need to be broken into components
that can be executed independently,
tested independently, and integrated
predictably. This is software
engineering's oldest lesson, modularity, now being applied to AI task delegation. Anthropic's long-running agent
harness splits every complex project
into an environment setup phase, a
progress documentation phase, and an
incremental coding session, each
independently verifiable. You get
similar task decomposition automatically
inside Codex. A marketing content audit
requires the same decomposition as a
coding task. So I'm not just talking to
engineers here. You would have to
decompose your marketing content into
quality scoring, gap analysis,
recommendation generation, etc. If you
want to train on the primitive of task
decomposition, take any project that you
would estimate at a few days of work and
decompose it into subtasks that each take less than 2 hours for you to do, have clear input/output boundaries, and can be verified independently of the
other tasks. That is the granularity at
which agents work best and it's the
granularity at which specification
engineering tends to operate. Now, in
2026, you do not have to pre-specify all
of those 2-hour tasks when you are
writing a prompt, but you do have to
understand what all of those tasks are.
And you have to understand how to
describe for a planner agent what done
looks like and what decomposable
pieces look like in such a way that the
planner agent can reliably break the
work into those 50 or 60 subtasks. In
other words, your job increasingly is
not to manually write the subtasks for
the agent. Your job is to provide the
break patterns that a planner agent can
use to break up larger work in a
reliable executable fashion. That's a
level of abstraction even above
decomposition and that is a lot of where
we're going as agents start to run.
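Here's a rough sketch of what handing break patterns to a planner might look like. The wording and the numbers are mine and entirely arbitrary, but it shows the level of abstraction: you describe how the work should be cut, not the fifty subtasks themselves.

```python
# Break patterns handed to a planner agent, rather than hand-written subtasks.
# The constraints and wording are illustrative, not a standard.
BREAK_PATTERNS = {
    "max_subtask_hours": 2,
    "each_subtask_needs": [
        "a named input and a named output artifact",
        "a verification step that does not depend on any other subtask",
    ],
    "done_looks_like": "every content page scored, gaps listed, one recommendation per gap",
}

def planner_prompt(project: str) -> str:
    rules = "\n".join(f"- {rule}" for rule in BREAK_PATTERNS["each_subtask_needs"])
    return (
        f"Decompose this project into subtasks of at most "
        f"{BREAK_PATTERNS['max_subtask_hours']} hours each.\n"
        f"Each subtask needs:\n{rules}\n"
        f"Done looks like: {BREAK_PATTERNS['done_looks_like']}\n\n"
        f"Project: {project}"
    )
```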
Primitive 5 is evaluation design. This
is critical to do not just in an
individual level but at an org level.
Organizations need to think about every
level of AI deployment in terms of eval.
How do you know the output is good? Not does it look reasonable, which is how most people evaluate AI output, but can you prove, measurably and consistently, that it is good? If prompt craft is the art
of the input, evaluation design is the
art of knowing whether that input
worked. And in a world where agents can
run for a really long time, eval design
is the only thing standing between AI-generated output you can't use and AI-generated output you can really use as-is.
If you want to train this primitive for
every recurring AI task in your world,
build an eval: three to five test cases with known-good outputs, and run
them periodically, especially after
model updates. This is going to catch a
regression. It'll build your intuition
for where models fail. It will create
institutional knowledge about what good
looks like for your specific use cases,
your team, you, your org. You need to be doing this systematically.
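Here's about the smallest eval harness that still counts. The cases and the substring checks are invented, and real evals usually check more than keywords, but even this much will catch a regression after a model update.

```python
# A minimal recurring eval: a few cases with known-good expectations, run
# periodically and especially after model updates. The cases are invented,
# and run_model must be wired to whatever agent or model you actually use.
def run_model(prompt: str) -> str:
    raise NotImplementedError("wire this to the model or agent you actually use")

CASES = [
    {"prompt": "Summarize: revenue rose 12% QoQ on enterprise renewals.",
     "must_contain": ["12%", "enterprise"]},
    {"prompt": "Summarize: churn fell from 3.1% to 2.4% after the onboarding change.",
     "must_contain": ["2.4%", "onboarding"]},
]

def run_evals() -> list[str]:
    failures = []
    for case in CASES:
        output = run_model(case["prompt"])
        missing = [m for m in case["must_contain"] if m not in output]
        if missing:
            failures.append(f"{case['prompt'][:40]}... is missing {missing}")
    return failures  # an empty list means the known-good bar still holds
```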
If you're listening to all this and you're
wondering where to start, I gave you
those four layers for a reason in order.
Start by closing the prompt craft gap.
Most people are worse at basic prompting
than they think. You should be rereading
prompting documentation. I've written a
bunch on the Substack. I'm writing more
for this piece. You should do
interactive tutorials. You can head over
to AI Cred and see where you are on
prompting. You should be building a
folder of tasks that you do regularly,
writing your best prompt against each
one and saving the outputs as your
baseline and then revisiting them over
time. Take prompt craft seriously.
Second, once you feel like you start to
have a handle on that, start to build
your personal context layer. You should
be writing a CLAUDE.md equivalent for
your work. I don't care if you use
claude. You still need to have an idea
of your goals, your constraints, your
communication preferences, your quality
standards, the institutional context
that a new team member would need 6
months to absorb, written down. Start AI
sessions by loading this context. The
difference in output quality should be
immediate and obvious. Third, start to get into intent infrastructure. This is an
organizational layer. So if you manage
people or systems, you can start
encoding the decision frameworks your
team uses implicitly. If you are an
individual contributor, you should be
encoding the decision frameworks you
understand and trying to be a champion
that pushes for this at the
organizational level. A lot of teams
like to talk about adopting AI. Well, talk about it in terms of building intent infrastructure instead. Talk about what
good enough looks like for each category
of work together. Talk about what gets
escalated by AI versus what AI can
decide. Write it down, structure it,
make it available to agents. And then
practice specification engineering. Take
a real project, not a toy problem. Write
a spec for it before touching AI. Talk
about acceptance criteria, constraint
architectures, decomposition, etc. Hand
that spec to an agent and see what comes
back. And oh yes, from an org
perspective, start to think about every
doc you touch as a spec that the agent
will need to read and operate against.
Your org is a system of business
processes, even if you're a team of one.
And those business processes should be
agent readable and they should be
speckable. This has some of the
downstream implications that Toby talked
about where he said a lot of
organizational politics is just bad
context engineering at the org level. If
we practice better specification
engineering for our documents, we will
expose a lot of the implicit assumptions
that we end up being political about
inside orgs and we will start to make
those agent readable and they will start
to become fungible and we will start to
have fewer issues. Practicing
specification engineering is a way for
us to clearly describe intent at
organizational scale and clearly
translate that intent in a way that
agents can read it. And yes, it nests
down to individual agent runs and it
ladders up to the full organizational
context. And that's why it's the last
and most difficult skill to learn. The
progression here from prompt craft all
the way up to spec engineering is not a
ladder where you can abandon lower
rungs. It's a stack where each layer
makes the layers above it possible. You
cannot write good spec if you can't
write good prompts. You can't build
effective agent systems if you don't
understand context engineering. You
can't align agent behavior with
organizational goals without
understanding how intent works and how
that plays into context engineering.
They all go together. There's a final
dimension to this that goes beyond AI,
and I want to spend some time on it. I
hinted at it when I talked about the
idea that Toby found that he
communicated better when he got better
at prompting. The best human managers
that I've worked with already operate
with that degree of clarity. They give
complete context when they delegate.
They specify acceptance criteria to
their team members. They articulate
constraints. They're effectively
following the four disciplines of AI
input with their people. And that makes
for effective leadership. What's
happening right now, I think, if we step
back, is that AI is enforcing a
communication discipline that the best
leaders have always practiced
intuitively and now everyone needs it in
order to be effective. You cannot just
rely on shared context with the machine.
You cannot just assume that AI will
know. And that is something that is a
gift to us because so many of our
colleagues don't know what we mean
either. How many times have you sat in a
meeting where someone is referring to a
document and you don't know what that
document is and you're afraid to ask?
That is a wonderful example of the kind
of poor communication quality that goes
into human meetings. This is not a
framing you'll see in a lot of how to
prompt courses. I think it should be.
The skill of providing highquality input
to intelligent systems turns out to be a
skill that's translatable for AIs and
for humans. It turns out to be a
fundamental skill of the agent age that
benefits us as humans and how we work
together. I think the people who develop
these skills, this collection of skills
around prompting for 2026 are going to
end up being the leaders who will run
organizations where agents and humans
both perform at their ceilings. And the
people who don't, the people who are
stuck in 2025 prompting skills are going
to wonder why their AI investments keep
producing partial value. And meanwhile,
their human teams keep having alignment
issues. The prompt by itself is dead.
The specification, the context, the
organizational intent. That is where the
value in prompting is moving,
because agents are starting to work for
longer and longer periods and look in a
lot of ways like junior employees. The specification done right turns out to be just what clear thinking has always looked like, made explicit, because machines don't let us be lazy about it. And I'm really excited for the way that
kind of communication clarity can clean
up our organizations and our human-to-human communication as well. Good luck
with prompting the humans and agents in
your life. Cheers. And yes, there's lots
more on this on the Substack. I think
this is one that needs a really complete
guide. So, I wrote up a lot of extra stuff for this so that you can dive in.