This content introduces a novel approach to building complex software projects using AI agents, overcoming the limitations of context windows and manual orchestration by employing a two-phase agent system with automated testing and progress tracking.
Hey there. This is not the Claude
website. This is actually a one-to-one
clone that I built with over 200 unique
features. And I didn't write a single
line of code myself. An AI agent built
this entire thing while I was sleeping.
So, here's the problem. When you're
building a project this big, we're
talking full conversations,
projects, artifacts, file uploads, all
of it, you hit the wall pretty quickly.
The context window fills up and the
agent loses track of what it was doing.
And if you've tried to build anything
this substantial with coding agents
before, you know exactly what I'm
talking about. And compacting the
conversation is just not good enough.
The workaround that a lot of people use
is to manually orchestrate everything.
You would create an implementation plan
using your agent, maybe store that plan
somewhere in your project folder. You
could even use something like Spec Kit
and BMAD to do this. And you then get the
agent to implement these features one by
one. You then clear the conversation
after each session and ask the agent to
implement the next feature. Rinse and
repeat. This works, but it's exhausting,
especially for larger projects. You're
effectively babysitting the agent the
entire time. What I'm about to show you
is completely different. You give your
requirements once and an initialization
agent will break everything down into a
detailed feature list. And then coding
agents take over implementing one
feature at a time, testing, committing
the changes, clearing the context
window, and picking up the next feature
automatically. This even does regression
testing before moving on to the next
feature. This ran for hours while I did
absolutely nothing. And by the end of
the process, we had a fully functional
clone of the Claude website. In this
video, I'll show you exactly how to set
this up yourself. I've really simplified
the process so you don't have to be a
developer to follow along. And as an
added bonus, I'll show you how to
integrate with n8n to get real-time
updates as your agent is making
progress. In this instance, the agent
sent me notifications to Telegram every
time it completed a new feature. This is
all based on an article written by
Anthropic about an effective harness for
long-running agents. This is a brilliant
article and I actually recommend you
read it. It's all about getting agents
to perform tasks that would take a lot
of time and context. As AI agents become
more capable, developers are relying on
these agents to implement way more
complex tasks. And these tasks can take
hours if not days to implement. So the
challenge when you're using something like Spec Kit and BMAD, or even just the planning mode in your IDE, is that agents have to work in sessions, because the context window will fill up as the agent works through the solution. At some point the quality is going to decrease, and you might have to compact the session, which summarizes the conversation and drops a lot of important context. The Anthropic article compares this to software engineers working in shifts, where each new engineer arrives with no memory of what happened in the previous shift. That is exactly the problem here. Even
if you clear the context and ask the
agent to implement the next feature, it
has no idea what's been implemented
already. So what this project proposes
is that we use a two-phase solution where we can use something like the Claude Agent SDK to plan and implement the solution
in two phases. First, we'll have an
initialization agent which will
basically take in your prompt and create
a feature list from that and it will
also set up the basic project structure.
Once that's done, the framework will use
coding agents to implement the features
one task at a time. So, these agents
will make incremental progress in every
session. Now, they don't mention it
here, but something I really like about
their solution is when the coding agent
starts a session, it will pick two
features that have already been
implemented at random and do regression
testing on them and then fix any issues
before moving on to the next feature.
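The loop described above can be sketched in a few lines of Python. This is not the real Claude Agent SDK harness; every function, field name, and selection rule here is a stand-in to illustrate the flow:

```python
import random

# Minimal sketch of the two-phase harness loop described above. All names and
# fields are illustrative stand-ins, not the actual SDK code.
def run_harness(features):
    """Implement one feature per fresh session until every feature passes."""
    while any(not f["passes"] for f in features):
        done = [f for f in features if f["passes"]]
        # Regression check: re-verify two random already-implemented features
        for f in random.sample(done, k=min(2, len(done))):
            pass  # stand-in for re-running that feature's tests and fixing bugs
        # Pick the highest-priority feature that has not been implemented yet
        nxt = min((f for f in features if not f["passes"]),
                  key=lambda f: f["priority"])
        nxt["passes"] = True  # stand-in for: implement, test, commit, clear context
    return features

feats = [{"id": i, "priority": i, "passes": False} for i in range(3)]
print(all(f["passes"] for f in run_harness(feats)))  # True
```

The key point is that each pass through the loop is a fresh session: the state lives in the feature list, not in the agent's context window.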
So, you can definitely read through this
article, but what I do want to focus on
is their quick start where they give you
access to an example project that
implements all of this. Now, the setup
process is not too complicated, so you
can definitely try this yourself, but
I'm actually going to show you an even
easier way to get going. In the
description, you'll find a link to this
repository. I simply took their project
and modified it slightly, so it's a bit
easier to work with. So really all you
have to do is click on Code, and you can either download this as a ZIP file or, if you've got Git installed, simply copy this link and clone the repository. If you downloaded the ZIP, extract its contents. Then open the folder in a code editor. I'm using Cursor, but you can use VS Code or whatever editor
you want. Now the project is really
straightforward. There's a bunch of Python files, like the agent, the autonomous agent demo, and the client file. These basically use the Agent SDK to set up this entire
project. Now, one file you might want to
go through is the readme file. This is
where I give you detailed instructions
on how to set everything up. So, there
are a few dependencies that we have to
install and it also shows you how to set
up any environment variables and finally
how to start this project. But we'll go
through all of that in detail. Now,
since this project uses Python, I do
recommend setting up a virtual
environment. If you're new to Python,
this is really easy to set up. Let's
create a new terminal window. And in the
terminal, let's create the virtual environment. On macOS and Linux the command is python3, but on Windows it's just python, followed by -m venv venv.
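Put together, the creation and activation commands look like this (assuming Python 3 is installed):

```shell
# Create a virtual environment named "venv" in the project folder
python3 -m venv venv       # macOS / Linux
# python -m venv venv      # Windows

# Activate it
source venv/bin/activate   # macOS / Linux
# venv\Scripts\activate    # Windows
```
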
So, it looks something like this. This
will create a new virtual environment
within this folder. Now, we have to
activate this virtual environment. On
Linux and Mac, it's this command. Or if
you're using Windows like I am, the
command looks something like this. So,
then press enter. And if everything was
done correctly, you should see the
virtual environment name over here. So,
why do we need a virtual environment?
Well, we're going to install a whole
bunch of Python dependencies. And by
using a virtual environment, those
dependencies will only be installed in
this project. So it's only scoped to this
project. If you don't activate the
virtual environment, everything will
still work. But all of these
dependencies will be installed globally
on your machine, which could affect
other projects or scripts on your
machine. So really, this is not a lot of
effort. Just activate your virtual
environment. So let's install our Python
dependencies by running pip install -r requirements.txt. Now again, all of this is in that readme file. Cool. We've
now installed the project dependencies.
Now, this framework uses the Anthropic models for the initialization agent and the coding agent. This also means we have to provide an Anthropic API key. And if you're using the quick start from Anthropic, they only allow you to use the API key, which can actually be really, really expensive. But I'm going
to show you a way cheaper solution.
First, let's rename this .env.example file to .env. Now in
this file you have a choice of two
variables. We can either provide the
Anthropic API key, which is the default, or we can use our Claude Code OAuth token. So if you're already using Claude Code and you've got a Claude subscription, you can simply piggyback on your subscription. And trust me, this agent
uses a lot of tokens and it runs for
hours. So, in my opinion, using the
Anthropic API key is simply not an
option. So, if you've got the basic $20 Claude subscription, you can run this
process for hours and for days and for
weeks without ever going over that
subscription cost. So, I'm actually
going to comment out this anthropic API
key and I'm going to use my Claude code
subscription instead. Now, I had no idea that you could use the Claude Code OAuth token in the Agent SDK. So, I do want to give a shout out to a friend of the channel, Web Dev Cody. He worked with me on Discord to get all of this working, and he's got some brilliant content on agentic coding. Cody also has a fantastic course on learning how to use agentic coding to build full-stack applications. So, definitely go to agenticjumpstart.com and tell him Leon sent you. I'm not getting paid for this at all. He's a good friend of the channel and I highly recommend checking his stuff out. So just run the command
claude setup-token. You will be asked to
authorize this token. So just click on
authorize. You can now close the browser
window. Then in the terminal you can
simply copy the token and add it to the
env file. Now before we move off the
file, you will also notice this optional variable for the progress n8n webhook. So if you want, you can uncomment this variable and provide a link to your n8n instance. So as the agent is making
progress, it will send some valuable
status updates to this endpoint and then
you can do whatever you want with it.
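So your .env ends up looking something like this. Note that the variable names below are my best guess at the repo's layout, so check the repo's own .env.example for the exact names:

```shell
# .env — variable names are illustrative; see .env.example in the repo
# Option 1: pay-per-token API key (can get expensive for long runs)
# ANTHROPIC_API_KEY=sk-ant-...

# Option 2: piggyback on your Claude subscription (token from `claude setup-token`)
CLAUDE_CODE_OAUTH_TOKEN=<paste your token here>

# Optional: n8n webhook that receives progress updates
# N8N_WEBHOOK_URL=https://your-n8n-instance/webhook/autocoder
```
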
You could email the results to yourself.
You could send updates to Telegram,
whatever. I'll simply leave this
commented out for now. Now we can
finally test this application. Now this
prompts folder is really important. This
contains three files. The app spec, which is critical. This app spec file actually drives the entire solution, and this is
something you have to provide. So this
is where you can explain what the
project is about. So you've got this overview section, the tech stack for the front end, the back end, and the communication layer. We can also specify prerequisites
and of course all the core features. And
this is a massive list of features. Now
don't worry, you don't have to type all
of this stuff out by hand. You can of
course just simply give this file to an agent and say, hey, here's an example app spec file, replace all of this with my app's requirements. And of
course on my channel we have a look at
very cool ways to simplify this even
further. I'll show you in a second. Now
we also have this coding prompt file and
this will be used by the coding agent.
The same with the initializer prompt.
Now you don't really have to modify
these files. I personally made quite a
few changes to these files in this
project because I actually used this
extensively in the last week and I felt
that the anthropic demo actually still
had a few gaps in it. As an example, I
noticed that the coding agent would
create the app with a whole bunch of
pages and these pages would show
results, but those results were all
hardcoded mock data. And when the agent
did testing, it looked at the page and
it simply said, "Oh, it looks like
everything is working. The page is
showing up and I can see a bunch of
values. But at no point did it consider
that this might be mock data and that
mock data needs to be replaced with real
time data. So I added a lot of steps in
these prompts to force the agents to ensure that the data they are looking at is actually real. Now the only thing you
might want to change yourself in this
initialization prompt is this section
where it says you need to create a
feature list with 200 detailed test
cases. Now, this really depends on your
application. If you're building a simple
to-do list app that only you will use,
then you definitely don't need 200
features, right? Or if you're building
something massive like an enterprise
scale application, you might want to
bump this up to 500 features. Now,
again, I'm giving you a really simple
way to automate all of this. So, instead
of trying to type out all of this
manually, I added a custom prompt to this .claude folder: the create-spec file. Now, this is a really detailed
prompt, but this is going to help the
agent populate all of the stuff for you.
So, let's open up our terminal. I'm
actually just going to open up another
session and I'm going to start Claude Code.
So, all we have to do is run the custom command /create-spec. Right? So,
the agent's going to ask us a few
questions like what do you want to call
this project in your own words? What are
you building? And who will use it? Just
you or others too? So this will tell the
agent whether or not user authentication
is required. Help me build an
application that I can use to come up
with unique YouTube titles. So I will
provide the topic and idea of the video.
And this app will then call OpenRouter
to generate unique YouTube ideas. And
what I also want is for a second agent
to review the titles to give feedback to
the first agent. And then that agent
needs to rewrite the titles until we get
really good high clickthrough rate title
ideas. Only I will use this application
and no one else. We can just call this
TitleSmith. I don't know, something
like that. So let's simply run this. And
I'm currently in editing mode. It really
doesn't matter. If you want you can just
go into planning mode to make sure the
agent won't accidentally make any
changes. So this custom prompt will force Claude Code to ask you clarifying questions, and I really love this. So you
can choose between quick mode and
detailed mode. In quick mode, we can
describe the app at a high level without
really providing any details on the
technical architecture. This could be
ideal for vibe coders or for someone
that really doesn't understand this tech
stack. Or if you really want to dive
into the weeds of how everything should
work, you can go into detailed mode.
I'll just go with quick mode. So how
complex is your application? So simple,
medium or complex. By the way, this will
determine how many features we will add
to this initialization prompt. So this
value over here. But as you can see, I'm
really trying to abstract all of that
away. So let's just say simple. Any
technology preferences or should I
choose sensible defaults? I'll just go
with defaults. Right. The agent is
asking us a few more questions like how
do we envision the output to work and
the generation process. I'm actually
just going to say you choose. Of course,
in your application, you probably want
to be a bit more involved in this, but
for tutorial sake, let's just get the
agent to decide. And cool. So, this app
spec file was updated. The project name
is now titlesmith with a proper
overview. And our agent now populated the tech stack. So it covers the front end, back end, the prerequisites,
security and access control, and of
course all of these key features. And
looking at the initializer prompt, our agent decided to create 150 unique test cases. So now that we have our app spec, we can finally go ahead and
implement this solution. And for this,
let's go back to that Python
environment. Now to start this process,
we have to run the following command. In
fact, let's go to the readme file under
quick start. We can simply copy this
command and let's paste it into the
terminal. Now all we have to change is
the name of the project folder. So I'll just call this titlesmith. And that's
really it. Let's run this. The
initializer agent is now running. And
this is going to create a subfolder. So
if we go to the generations folder, we
can now see a subfolder called
titlesmith. And the initializer agent is
now doing a lot of work. It's going to
create a feature list file. And by the
way, this can take a few minutes to
complete. These feature list files are
massive. It will then also set up the
basic project structure, right? Our
initializer just created this feature
list file. So, let's have a quick look
at it. This file is massive. And for a
small app like this, this file is
already 1,922
lines long. Each and every feature
contains a description on what it is, as
well as all of the steps needed to
implement this feature. And each feature
also contains a property called passes
which is false by default. So as the
agent works through this list, it will
implement a change, test it, and then
set passes to true. It will then move on
to the next feature. What's really cool
is that these coding agents have
instructions to retrieve two features that have already been implemented, at random, and then do regression testing on
those features and fix any bugs. So this
means that if any feature actually broke
one of the existing features, the agent
will automatically pick up these issues and address them. Besides the feature list, this initializer agent will also
set up all of the project dependencies.
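To make the feature list concrete, here is roughly what one entry looks like, sketched in Python. The field names are assumptions based on what's described in the video, so open the generated file to see the real schema:

```python
# One illustrative entry from the generated feature list (schema assumed)
feature = {
    "id": 17,
    "priority": 1,
    "description": "User can submit a video topic and receive title ideas",
    "steps": [
        "Add a form with a topic input field",
        "Send the topic to the backend endpoint",
        "Render the generated titles in the UI",
    ],
    "passes": False,  # the coding agent flips this to True once the feature's tests pass
}

# The coding agent's selection rule: features where passes is still False
features = [feature]
remaining = [f for f in features if not f["passes"]]
print(len(remaining))  # 1
```

Because progress lives in this file rather than in the conversation, any fresh session can pick up exactly where the last one stopped.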
So it will create the project structure
and install any dependencies. All right,
so the initialization agent has now set
up the project and the feature list file
and now it's updating this claude progress text file. This file is really useful
for keeping track of the current
progress. Now, this is really where the
fun begins. The agent SDK is now going
to use the coding agent to implement all
of these features. And honestly, you can
now step back and let the agent do its
thing. This coding agent will now have a
look at the feature list and retrieve
any features that have not yet been
implemented. So, any feature where
passes equals false. It will then look
at the highest priority feature and
implement that first. It will also do
regression testing on any features that
have already been implemented. Now,
there are a few things that I do want to
mention about the coding agent. First,
if we go to this autonomous agent demo
file and we scroll down, we can see that
we're currently using Opus to implement this project. By default, the Anthropic demo actually uses Sonnet. So, if you prefer to use Sonnet, you can simply comment out this line and save the file. But honestly, I just prefer Opus.
Then the second thing is if we go to
this client file we can see all the MCP
servers and tools that are available to
this agent. So if we go down to this
Claude SDK client section, here we can see all the MCP servers. The Anthropic demo actually uses Puppeteer for end-to-end testing, but I did a side-by-side comparison and Playwright is way faster. I'm not sure why they decided on Puppeteer. Maybe you can tell me in the comments. But honestly, Playwright was just so much faster. And you might be wondering, well, what are Puppeteer and Playwright used for? This coding agent really likes to do end-to-end testing.
It does this by opening the browser
window. Then it takes a screenshot of
the browser window and it uses the
agent's vision to analyze the image and
it will then determine if there's any UI
issues, etc. Now, I find that process to be really slow. So I'm actually running Playwright in headless mode. The agent will still be able to see all the elements by just looking at the HTML code. But if for some reason you want the agent to use the browser, you can simply comment out the first line and add back the second line. So this will run the Playwright MCP server where it will actually use the browser window.
And I'm just providing a viewport size.
So the screenshots are not too big. Now,
this process can run for hours, days, or
even weeks. It really depends on how
large and complex your project is. Now,
I personally wanted some way of
receiving updates every time the agent
makes progress. I don't want to go and
babysit my monitor and see what's going
on. So, this is totally optional, but if
you want to receive notifications, I've
actually integrated n8n into this workflow. So in the .env file there's this progress n8n webhook URL
variable. I'm actually going to comment
this out and I'm going to stop this
process just for now so that I can
actually show you how to implement this.
By the way, you can stop and resume this
workflow at any time. You just press
Ctrl C to stop the process. And as you
can see here, to resume, simply run the
same command again. So we'll restart it
in a second. I'm just going to save this
env file. And now all we have to do is
provide this n8n webhook URL. Again,
this is totally optional. You're more
than welcome to let this process run in
the background, but I personally want to
receive notifications. So, of course,
the first thing you need to do is open
up n8n and create a new workflow. If you don't yet have an n8n instance, then what you can simply do is use the link in the description to go to this page.
Hostinger is without a doubt the cheapest way to host an n8n instance. So what you can do is choose a plan; the KVM 1 plan is only $5 per month. I'll go with the KVM 2 plan. Select your application as n8n, and then under the
discount code you can enter the code
Leon and this will give you an
additional 10% off. You don't have to go
with 24 months either, of course. You can just go month-to-month or maybe a 12-month
period. Then simply continue with the
checkout process. Then after setting
your root password, Hostinger will build your n8n instance and you'll have access
to this dashboard. All you really need
is to click on manage app and you will now have access to your very own n8n instance. How awesome is that? Cool.
Let's create our workflow. I'll just
give it a name like autocoder
notifications. Then let's add our
trigger node. And for this we need the
webhook trigger. Let's change the
method from get to post. Let's give it a
path name like autocoder.
And that's actually it. What you can do
then is grab your production URL. Let's
just copy this and let's add that to
this variable. And the last thing we
have to do in n8n is to simply save this
workflow and let's activate it as well.
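Once the workflow is active, the agent will POST JSON progress updates to that webhook. The payload is roughly this shape; the field names below are illustrative, so check a real n8n execution for the exact keys:

```python
import json

# Illustrative progress payload (field names assumed; inspect a real n8n
# execution to see the exact schema the agent sends)
payload = {
    "event": "feature_complete",
    "tests_passing": 42,
    "tests_total": 150,
    "percent_complete": round(42 / 150 * 100, 1),
    "completed_features": [
        "Topic input form",
        "OpenRouter title generation",
    ],
}

# The agent sends it as the JSON body of a POST request,
# e.g. requests.post(webhook_url, json=payload)
print(json.dumps(payload, indent=2))
```
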
So let's restart this process. Now
thankfully it won't run the
initialization agent again as it's
already run. The coding agent will
simply pick up from where it left off.
And as this agent is working through
these changes, I can already see that
n8n was triggered. So if I go to
executions, I can see one execution
executed already. This is everything our
autonomous agent just sent to n8n. So it
includes this body property which
includes the name of the event, how many
tests are passing, how many there are in
total, the percentage completed, as well
as a list of completed tasks. And now of
course then you can use that information
to send emails or WhatsApp messages or Telegram messages to yourself. The sky really is the limit. So I decided to send Telegram messages and I just sent
like the project name, the tests
completed and whatever else. And that
resulted in something that looks like
this. So it's got the project name, the
list of tests that were completed, the
total tests, etc. And this way I could
get notifications to my phone every time
something was implemented. If you are
curious to see how I implemented that
Telegram integration, then you can
download it from my community which I'll
link to in the description of this
video. I hope you found this video
useful. If you did, hit the like button
and subscribe to my channel for more
Claude Code and Agentic Coding content.
Thank you for watching. I'll see you in the next one.