0:02 Ralph Wiggum is absolutely blowing up. We
0:04 made a video about it last year and
0:06 since then it's all anyone is talking
0:08 about on Twitter. Matt Pocock has made
0:10 loads of videos on it. Ryan Carson has
0:12 written a very popular article on it and
0:14 Raz Mike has built on it with his Ralphy
0:16 bash script. But is everyone doing it
0:19 wrong? The creator has already said that
0:21 some implementations are incorrect. So
0:23 what's the correct way to do it? And why
0:25 is Ralph currently the best way to build
0:28 software with AI? Hit subscribe and
0:29 let's get into it.
0:32 The Ralph loop was created by Geoffrey
0:34 Huntley and written about way back in
0:36 June last year. It is essentially a bash
0:38 loop that gives an AI agent the exact
0:41 same prompt over and over again. But
0:43 it's genius on so many levels because it
0:46 lets the AI agent work in its smartest
0:48 mode, which is the mode where it has as
0:50 little context as possible. Take a look
0:53 at this. So let's imagine this is the
0:55 total context window for an agent. From
0:58 zero to about 30% is what we'll call the
1:00 smart zone, which is where the agent
1:03 performs the best. From about 30 to 60%,
1:05 it still performs really well. And from
1:08 60% onwards, so 60, 70, 80, 90, that's
1:10 when it starts to degrade. We'll call it
1:12 the dumb zone. Now, these numbers aren't
1:13 set in stone and could be different per
1:15 model. So the smart zone for a certain
1:18 model could be 40 or 50%, but beyond
1:20 80% of the context window is usually
1:22 where the dumbness begins. So for Claude
1:25 Sonnet or Opus, the typical context
1:28 window is 200,000 tokens. So we can say
1:30 the first 60k is the smart zone. The next
1:32 60k is still okay, but not as good as the
1:35 first 60k tokens. And then the last 80k,
1:37 it doesn't seem to perform as well. Now
1:39 this is my personal experience with this
1:40 model. You might have had other
1:43 experiences. And the reason for this is
1:44 because the model itself is what we call
1:47 autoregressive, meaning it has to look
1:49 at the previous tokens to predict the
1:50 next one. And if you have loads and
1:52 loads of tokens, it has to go through a
1:54 lot of them to find out the important
1:57 bits that are relevant to the next task
1:58 at hand. Now, let's focus on the first
2:01 30%. Even before you write your first
2:02 prompt, there are some things that get
2:03 added to the context window
2:05 automatically. First is the system
2:07 prompt and then the system tools. These
2:10 on a typical Claude model take 8.3% and
2:13 1.4% of the context, so almost 10
2:15 points of that 30% zone. And then if you have skills
2:17 that can get added and also if you have
2:19 custom MCP tools. Finally, if you have
2:21 an agents.md file, that gets added, too.
2:23 And the same goes for any of these
2:25 things: the larger the agents.md
2:27 file, the more tokens it will take up.
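As a rough illustration, you can ballpark a file's token cost from its size with the common rule of thumb of about four characters per token. The file name, contents, and heuristic below are all assumptions for the sketch, not an exact tokenizer count:

```shell
# Ballpark the token cost of an agents.md file with the rough
# ~4-characters-per-token heuristic (not an exact tokenizer count).
# The sample file is created here purely for illustration.
printf 'Always run the tests before committing.\n' > agents.md
chars=$(wc -c < agents.md)
tokens=$((chars / 4))
echo "agents.md is $chars chars, roughly $tokens tokens"
```

Swap in your real agents.md, skills, and tool descriptions to get a feel for how much of the smart zone is spent before your first prompt even arrives.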
2:28 And this is all even before you've added
2:31 your own prompt. So, in general, it's
2:33 best to keep this section as small as
2:35 possible. So, have fewer tools, fewer
2:37 skills, and less in your agents.md
2:40 file so that the model is working at its
2:42 most optimum context. And to get an idea
2:44 of exactly how much 60k is, if we were
2:46 to get the whole script of Star Wars:
2:49 A New Hope, that is about 54,000 tokens
2:51 in GPT-5's tokenizer. So roughly this amount. Now you
2:54 may be wondering what about compaction?
2:55 Can that help with this whole process?
2:57 And we'll talk about that a bit later.
2:59 But now let's move on to exactly how
3:01 Ralph can help with this. So the benefit
3:04 of Ralph is that you focus on one goal
3:06 per context window. So the whole 200k
3:08 context window, we can dedicate that to
3:10 one goal or one task. And the way we do
3:12 that is we write a prompt that will
3:14 first say inspect the plan.md file. This
3:16 contains the tasks to be done. So
3:18 something like create the front end,
3:20 create the back end, do the database and
3:21 so on. That is a very high level
3:23 example. Of course, you'd go way more
3:24 detailed if you were doing Ralph and
3:26 more granular, but we'll stick with that
3:28 example for now. So this prompt will
3:30 tell the agent to pick the most
3:32 important task, then make those changes.
3:35 After making those changes, run the
3:37 tests, then commit and even push those
3:38 changes. And once you're done and
3:41 the tests have passed, then
3:43 tick the task as done in the plan.md
3:45 file and do that again. So the agent
3:46 will keep looking for the most important
3:49 task to do until it's completed all the
3:51 tasks. Now, actually, let me take that
3:53 back because you could keep having the
3:56 Ralph loop go over and over again even
3:58 if it has completed all the tasks. And
4:00 the benefit of that is that it may even
4:02 find things to fix or find features to
4:05 add that don't exist in the plan.md
4:07 file. But if it is going off the
4:09 rails, the benefit of having Ralph is
4:10 that you can stop the whole process
4:12 whenever you want, adjust the prompt.md
4:14 file and then run the whole process
4:16 again. And Ralph makes this so simple
4:18 because this whole process is executed
4:21 in one single bash while loop. So here
4:22 it just cats the prompt.md file, so it
4:24 prints it to the agent, and then runs
4:26 claude in yolo mode. Of course, the flag
4:28 isn't yolo, it's
4:30 --dangerously-skip-permissions, but for the sake of space,
4:32 I've kept it short. And what makes Ralph
4:34 special is that it's outside of the
4:36 model's control. So the model can't
4:38 control when to stop Ralph. It will just
4:41 keep going. And that way you can ensure
4:43 that when a new task runs or when a new
4:45 prompt is triggered, the context, so
4:47 here is pretty much where it is when you
4:49 first open the agent. So this is fresh.
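That loop can be sketched in a few lines of bash. To be clear about what's assumed here: the canonical form is simply while :; do cat prompt.md | claude -p --dangerously-skip-permissions; done, but since that runs forever and needs the claude CLI installed, this sketch swaps in a stub agent that ticks one plan.md task per call so it runs and terminates anywhere. File names and task wording are illustrative:

```shell
# Sketch of the Ralph loop. The canonical form is simply:
#   while :; do cat prompt.md | claude -p --dangerously-skip-permissions; done
# A stub agent stands in for claude here so the demo runs and terminates;
# it ticks one plan.md task per call. Real runs keep looping, and every
# iteration starts the agent with a completely fresh context window.

printf -- '- [ ] create the front end\n- [ ] create the back end\n- [ ] set up the database\n' > plan.md
echo 'Read plan.md, pick the most important unticked task, do it, test it, commit it, tick it off.' > prompt.md

agent() {
  # Stub: mark the first unticked task as done (GNU sed).
  sed -i '0,/- \[ \]/s//- [x]/' plan.md
}

while grep -qF -- '- [ ]' plan.md; do
  agent < prompt.md  # real Ralph: cat prompt.md | claude -p --dangerously-skip-permissions
done
cat plan.md
```

The key property survives in the sketch: the loop lives in bash, outside the model's control, so the agent can never decide to stop it on its own.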
4:51 It doesn't have any compaction. It
4:53 doesn't have anything added. So each new
4:55 task gets the most amount of context and
4:58 uses the model in its smartest, most
5:00 optimal context window stage. Basically,
5:02 compaction is where the agent will look
5:04 at all the tokens that have been written
5:06 in the context window and pick out the
5:09 most important bits for the next prompt.
5:11 So, it will pick what it thinks is most
5:14 important, but it doesn't know what is
5:15 actually most important. Therefore,
5:17 compaction might lose some critical
5:19 information and make your project not
5:22 work as expected. Anyway, now that we've
5:24 seen the canonical Ralph loop
5:26 implementation from the creator, this
5:29 helps us see why other implementations
5:31 are different. Let's take a look at the
5:33 anthropic implementation, which uses a
5:36 slash command to run Ralph inside of
5:39 Claude Code, has max iterations, and a
5:41 completion promise. So, the problem with
5:43 this specific Ralph Wiggum plug-in is the
5:45 fact that it compacts the information
5:47 when it's moving on to the next task.
5:49 So, if it finishes one task and reruns
5:51 the prompt instead of completely
5:53 resetting the context window, it
5:55 compacts what was previously done,
5:57 therefore could lose some vital
5:59 information. There's also the slight
6:01 issue of having max iterations and a
6:03 completion promise because sometimes
6:06 it's nice to just let Ralph keep going.
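For contrast, a max-iterations setup like the one in that plug-in boils down to replacing the unconditional while with a bounded for loop. The sketch below uses a counting stub in place of a real claude call; the command, file name, and iteration count are all illustrative:

```shell
# Bounded Ralph variant: stops after MAX_ITERATIONS even if work remains,
# unlike the canonical loop, which only stops when you kill it.
# A counting stub stands in for the real agent so this runs anywhere.
MAX_ITERATIONS=3
echo 'Pick the most important task in plan.md and do it.' > prompt.md
runs=0
agent() { runs=$((runs + 1)); }  # real use: claude -p --dangerously-skip-permissions
for i in $(seq 1 "$MAX_ITERATIONS"); do
  agent < prompt.md
done
echo "stopped after $runs iterations"
```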
6:08 It can find very interesting things to
6:09 fix that you wouldn't have thought of
6:12 before. And if you watch it, acting as a
6:14 human on the loop, you may see patterns,
6:16 good or bad, from a specific model that
6:18 you can tweak and enhance in your
6:20 original prompt. If we take a look at
6:22 Ryan Carson's approach to the Ralph
6:23 loop, we can see here that it's not
6:26 quite canonical simply because on each
6:28 loop, it has the possibility of
6:30 adjusting or adding to the agents.md
6:32 file. Now, depending on the system
6:34 prompt or any user prompts you've added
6:36 to the model, in my experience, by
6:38 default, models can be very wordy. And
6:40 so, if on each iteration you're adding
6:43 to the agents file, which gets added to
6:45 the context at the beginning of each
6:46 user prompt, then you're just adding
6:48 more tokens into the context window,
6:50 pushing the model into a place where it
6:52 could potentially give you dumb results.
6:54 But the fact that people are making
6:56 their own scripts from the basic Ralph
6:58 loop bash script is a testament to how
7:01 simple and easy it is to understand. And
7:03 although there is a canonical way of
7:05 doing Ralph, I think it's okay for
7:07 developers, teams, and companies to
7:10 tweak it to their specific use case. For
7:12 example, I love the fact that in Raz
7:14 Mike's Ralphy script, there's a way to
7:16 run parallel Ralphs and also the fact
7:18 that you can use the agent browser tool
7:20 from Vercel to do browser testing. I also
7:22 love the fact that in Matt Pocock's
7:24 version of Ralph, he adds tasks or
7:26 things to do as GitHub issues and the
7:28 Ralph loop will pick the most important
7:30 one, work on it, and mark it as done
7:31 when it's complete before working on the
7:33 next one, which I think is really
7:35 clever. I think the power and simplicity
7:37 of Ralph means that it's going to stick
7:39 around for a very long time, and you
7:41 also may see a lot of iterations and
7:43 improvements from it. I really like the
7:46 way Geoffrey is taking this with his Loom
7:48 and Weaver project where he wants to
7:49 create a way to make software
7:52 autonomously and correctly. But with all
7:54 these Ralphs autonomously creating new
7:56 software, you need a way to search for
7:58 errors and make sure they get fixed.
7:59 This is where Better Stack comes in
8:02 because not only can it ingest logs and
8:03 filter out errors from them, but it can
8:05 also handle error tracking on the front
8:07 end. So with this MCP server, you can
8:10 ask an agent to specifically pick out
8:12 errors from the front end or back end
8:13 instead of reading through the whole
8:16 log, which in turn reduces the context
8:18 window. So go and check out Better Stack
8:19 and let me know what you think in the comments.