Frontier AI agents leverage several key components and protocols to achieve advanced capabilities in task planning and code generation with minimal human intervention. Understanding these elements is crucial for comprehending how agentic AI functions.
Frontier AI agents, they're pretty capable.
They're really good at planning out tasks and writing code with minimal human involvement,
but there are a handful of specific pieces under the hood that enable this.
So let's cover five of those pieces, the five terms you need to know about agentic AI,
and let's start with the stuff inside the agent that shapes how it behaves.
Inside an agent, of course, there is a model, a large language model.
That's what's doing the actual text generation and the reasoning, and by itself,
well, it's just a conversational partner.
What turns it into an agent is the instruction layer that's wrapped around the model.
So that brings us to term number one that you need to know, and that is agents.md.
So what's that?
Well, .md, that's markdown, so it's just a text file.
It sits at the root of a project, and whenever the agent starts work in that project,
it reads whatever is in that agents.md file.
Now the file tells the agent things like which commands to run for tests
or which coding conventions this code base uses.
So we can really think of this as being kind of like a README file,
but it's a README written specifically for agents.
It tells the agent things like which setup commands to use,
any code style rules, or maybe how a PR title should be formatted.
So the agent executes the commands it finds in agents.md when they're contextually relevant.
So if the file says run pnpm test before committing, well, then the
agent will run pnpm test before it does a commit.
And agents.md files can also be nested, meaning there can be multiple of them.
So maybe we have one at the root and then several
other ones for sub-projects, each with its own set of rules.
And files that are closer to the working directory
override the earlier ones because they are read later.
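The nesting rule just described can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `collect_agents_files` and `build_instructions` are made up for this example, and real agents may resolve or merge these files differently. The sketch walks from the project root down to the working directory and reads each agents.md in that order, so the nearest file comes last.

```python
from pathlib import Path
import tempfile


def collect_agents_files(project_root: Path, working_dir: Path) -> list:
    """Return agents.md files from project_root down to working_dir.

    Files closer to working_dir come later in the list, so their
    instructions are read last. (Hypothetical helper, not a real agent API.)
    """
    project_root = project_root.resolve()
    working_dir = working_dir.resolve()
    # Walk upward from the working directory until we reach the root.
    chain = [working_dir]
    while chain[-1] != project_root:
        parent = chain[-1].parent
        if parent == chain[-1]:
            raise ValueError("working_dir is not inside project_root")
        chain.append(parent)
    # Reverse so the root comes first and the nearest directory comes last.
    found = []
    for directory in reversed(chain):
        candidate = directory / "agents.md"
        if candidate.is_file():
            found.append(candidate)
    return found


def build_instructions(files: list) -> str:
    """Concatenate instruction files in order, nearest file last."""
    return "\n\n".join(f.read_text(encoding="utf-8") for f in files)


# Demo on a throwaway directory tree with a root file and a nested file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sub = root / "packages" / "web"
    sub.mkdir(parents=True)
    (root / "agents.md").write_text("Run pnpm test before committing.")
    (sub / "agents.md").write_text("In this package, use 2-space indentation.")
    files = collect_agents_files(root, sub)
    demo_order = [f.relative_to(root.resolve()).as_posix() for f in files]
    demo_text = build_instructions(files)
```

Here `demo_order` lists the root file first and the `packages/web` file second, which is what lets the nested file have the final say.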
Now agents.md was introduced by OpenAI and later contributed to the Agentic AI
Foundation (AAIF), which runs under the Linux Foundation.
Now a quick wrinkle worth mentioning here: some agents use a different
file name from agents.md. Claude, for example, does this.
Claude's file is actually called CLAUDE.md, because of course it is, so
it's a different name but more or less the same idea.
So agents.md is read by an agent every time it starts work in a given project.
But what about knowledge that the agent only needs sometimes and that isn't necessarily project specific?
So let's say the agent needs to know how to build a PowerPoint deck.
Well, loading all of that context every single time the agent starts,
that would just really clog up the context window for no real reason
if the task at hand has nothing to do with PowerPoint slides.
So that brings us to term number two, and term number two is
the agent skill. So what's that? Well, an agent skill is
a folder, and inside that folder is a file called SKILL.md.
So .md again, that's more markdown. Also in that
folder are whatever scripts or resources the task needs, and inside SKILL.md
is some metadata, including a description.
And that tells the agent something like, invoke me when the user wants to X.
So X could be when the user wants to make a PowerPoint.
And if the user's request matches that description, the agent pulls the skill in.
If it doesn't match, well, the skill is just gonna kind of sit
there out of the way, not taking up any context.
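As a rough sketch of that load-on-match behavior: the Python below parses the metadata at the top of a skill file and only selects a skill when the request overlaps with its description. Everything here is an assumption for illustration. The skill names, the sample text, and the helpers `parse_skill_md` and `select_skills` are hypothetical, and the keyword overlap is a crude stand-in for the judgment a real agent's model would make.

```python
import re
from dataclasses import dataclass

# Words too common to count as a match (illustrative list only).
STOPWORDS = {"a", "the", "to", "in", "or", "for", "when", "user", "wants", "invoke"}


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full instructions; only needed once the skill is selected


def parse_skill_md(text: str) -> Skill:
    """Split a skill file into metadata and body.

    Handles only flat `key: value` lines between two `---` markers.
    """
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], body.strip())


def keywords(text: str) -> set:
    """Lowercase word set with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def select_skills(request: str, skills: list) -> list:
    """Naive stand-in for the model's decision: any shared keyword matches."""
    wanted = keywords(request)
    return [s for s in skills if wanted & keywords(s.description)]


PPTX = """---
name: pptx
description: Invoke when the user wants to create or edit a PowerPoint presentation.
---
Step-by-step instructions for building slides would go here.
"""

PDF = """---
name: pdf-forms
description: Invoke when the user wants to fill in a PDF form.
---
Instructions for filling PDF forms would go here.
"""

skills = [parse_skill_md(PPTX), parse_skill_md(PDF)]
selected = select_skills("Make a PowerPoint deck for the Q3 review", skills)
unrelated = select_skills("Fix the failing unit test", skills)
```

In this toy run only the `pptx` skill is selected for the PowerPoint request, and nothing is selected for the unrelated request, so neither skill body would be pulled into context there.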
Agent skills are another open standard and they're supported by multiple agent platforms.
So agents.md tells the agent how a specific project works,
and an agent skill tells the agent how to do a specific kind of task.
All right, so that's two of our five terms down.
The agent now knows what to do, but doing things also means reaching outside the box,
as in outside the AI agent itself.
So that's where we're going to go next.
So agents need to reach all kinds of external things like APIs or databases
or developer tools or SaaS platforms, you name it.
And the challenge here is that every one of those targets might have its own interface.
So without some kind of standard, every AI agent would need a custom
connector for every external thing it might touch, which would be a mess.
So that brings us to term number three: MCP, the Model Context Protocol.
Now MCP is an open protocol for connecting AI applications to tools, data sources,
and workflows, and it comes with something called an MCP server.
Now an MCP server wraps up a tool or a data source in a standard interface
and any agent that can speak MCP can now talk to that tool.
So let's say an agent needs to pull data from something in Notion.
So we've got Notion here, or maybe it needs to go to a Stripe payment link, whatever the backend is.
Well, the agent speaks MCP to the server, and it's
the server that handles the underlying API for, in this case,
Notion. Now, MCP started at Anthropic and is now governed under the AAIF,
again at the Linux Foundation.
And it has broad industry support.
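To make the "agent speaks MCP, server handles the API" split a bit more concrete, here is a minimal in-process sketch. MCP messages are JSON-RPC 2.0, and `tools/list` and `tools/call` are the method names for discovering and invoking tools. Everything else is assumed for illustration: the tool name `search_pages`, the `fake_notion_search` function, and the `handle` dispatcher are hypothetical, the tool listing is trimmed down, and a real session would also start with an initialization handshake over a real transport.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request string."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def fake_notion_search(query: str) -> str:
    # Stand-in for the backend API call a real server would make.
    return f"1 page found for {query!r}"


# Hypothetical tool registry on the server side.
TOOLS = {"search_pages": fake_notion_search}


def handle(raw: str) -> str:
    """Toy server: dispatch a request and wrap the result in a JSON-RPC reply."""
    request = json.loads(raw)
    if request["method"] == "tools/list":
        # Simplified: real listings also describe each tool's inputs.
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif request["method"] == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]](**params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


# Agent side: discover tools, then call one by name with arguments.
list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_pages", "arguments": {"query": "Q3 roadmap"}},
)
listing = json.loads(handle(list_tools))
response = json.loads(handle(call_tool))
```

The agent side never touches the backend directly. It only names a tool and passes arguments, and swapping Notion for another backend would change `TOOLS`, not the agent.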
So that covers agents talking to tools and data.
What about agents talking to other agents?
Well, time for term number four.
That is A2A.
Otherwise known as agent to agent.
So A2A is an open protocol for agent to agent communication.
So let's kind of think of a scenario for using this.
Let's say we've got a procurement agent here and that handles vendor contracts.
And then maybe we've also got a finance agent over here and that approves spend.
And yeah, I know, financial processing stuff, try to
contain your excitement. But the procurement agent needs to negotiate a contract,
and then it needs to hand off to the finance agent for approval.
Without A2A, these two agents would need some form of custom integration
or they wouldn't really coordinate very well. But with A2A, each agent publishes something called an
agent card. And that's just basically a description of what the agent does and how to talk to it.
And other agents can read that card and then figure out how to delegate work.
The procurement agent in this case is going to find the finance agent's agent card,
read it, and then hand off the contract.
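The procurement-to-finance handoff can be sketched as a lookup over agent cards. This is a simplified illustration, not the A2A wire format: the cards below are plain dictionaries with a reduced set of fields, the URLs and skill ids are made up, and `find_agent` and `hand_off` are hypothetical helpers that make no network calls.

```python
from typing import Optional

# Simplified, hypothetical agent cards: what each agent does and where to reach it.
finance_card = {
    "name": "Finance Agent",
    "description": "Approves spend against budget.",
    "url": "https://finance.example.com/a2a",  # made-up endpoint
    "skills": [
        {"id": "approve-spend", "description": "Approve or reject a proposed spend."}
    ],
}

legal_card = {
    "name": "Legal Agent",
    "description": "Reviews contract terms.",
    "url": "https://legal.example.com/a2a",  # made-up endpoint
    "skills": [
        {"id": "review-terms", "description": "Flag risky contract clauses."}
    ],
}


def find_agent(skill_id: str, cards: list) -> Optional[dict]:
    """Return the first card that advertises the requested skill, if any."""
    for card in cards:
        if any(skill["id"] == skill_id for skill in card["skills"]):
            return card
    return None


def hand_off(contract: dict, card: dict) -> dict:
    """Describe where the work would be sent; nothing is actually transmitted."""
    return {"to": card["url"], "agent": card["name"], "payload": contract}


# Procurement side: negotiate, then look up who can approve the spend.
contract = {"vendor": "Acme Supplies", "amount_usd": 12000}
target = find_agent("approve-spend", [legal_card, finance_card])
handoff = hand_off(contract, target)
nobody = find_agent("translate-document", [legal_card, finance_card])
```

The point of the card is that the procurement agent needs no finance-specific integration code. It only needs to read a description and an address.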
So that's A2A, and this A2A standard comes from Google.
It's now also an open standard under, you guessed it, the Linux Foundation.
So MCP is how agents talk to tools and data, and A2A is how agents talk to each other.
All right, so how are we doing here?
Now the agent knows what to do, and it knows how to reach outside of its borders.
What else?
Well, sometimes one agent just isn't enough.
Maybe the task is too big for one context window.
So say the agent's reviewing a code base with thousands of files. Loading every file,
that would blow out the context on its own.
Or maybe the work is embarrassingly parallel,
like you've got to run a check on 20 different functions and each check is independent.
You could do those one at a time, but that's slow.
Doing them all at once could be up to 20 times faster.
So, term number five that you need to know.
It's subagents, which means using and spawning multiple agents.
So a subagent is a child agent that the main agent spawns to do a specific piece of work,
and each subagent runs in its own fresh context window.
It does its job and it returns a result when it is done.
So this main agent here, it could spawn a subagent and give it some work to do.
Let's say go read 500 files,
and then just kind of hand back to the main agent a summary of those files.
So that would keep the main agent's context window pretty clean.
And we could have lots of agents in parallel. Maybe we've got like 20
agents here running in parallel, handling 20 independent checks at the same time.
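A minimal sketch of that fan-out, using Python's standard thread pool. The `run_subagent` function is a hypothetical stand-in: a real subagent would call a model and tools, and the actual speedup depends on how independent and how I/O-bound the checks are.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Stand-in for a child agent: starts with only its task, returns a short result."""
    context = [task]  # fresh context: nothing inherited from the parent
    # A real subagent would do model calls and tool calls here.
    context.append(f"checked {task}")
    return f"{task}: ok"


def run_checks_in_parallel(function_names: list) -> list:
    """Spawn one subagent per independent check and gather the results."""
    with ThreadPoolExecutor(max_workers=len(function_names)) as pool:
        # map() returns results in input order even though calls overlap.
        return list(pool.map(run_subagent, function_names))


functions = [f"func_{i}" for i in range(20)]
results = run_checks_in_parallel(functions)
```

Each call only sees its own task string, which mirrors the idea that the 20 checks share nothing and can safely run at the same time.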
Now, subagents are a little bit different from the other
four terms, because subagents are a common pattern in modern
agent systems but they don't really have a formal standard document behind them.
But the concept shows up almost identically everywhere.
I mean the very basic idea is you have this big parent agent here.
That parent agent spawns one or more child agents.
The child gets its own fresh context.
The child does whatever work it was told to do,
and then it returns a result and the parent carries on
with its context intact.
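The parent and child pattern can be sketched with a toy `Agent` class where the context is just a list. This is an illustration under assumptions, not any framework's API: the class, its `work` and `delegate` methods, and the file names are hypothetical. It reuses the earlier example of a subagent reading 500 files and handing back a summary.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    context: list = field(default_factory=list)

    def work(self, task: str, files: list) -> str:
        """Child side: load everything into its own context, return only a summary."""
        self.context.append(task)
        self.context.extend(f"<contents of {path}>" for path in files)
        return f"summary of {len(files)} files"

    def delegate(self, task: str, files: list) -> "Agent":
        """Parent side: spawn a child with an empty context, keep only its result."""
        child = Agent(name=f"{self.name}.child")
        summary = child.work(task, files)
        self.context.append(summary)
        return child


parent = Agent("main")
parent.context.append("user: review the codebase")
child = parent.delegate("read these files", [f"src/file_{i}.py" for i in range(500)])
```

After the call, the child's context holds 501 entries while the parent's holds just two: the original request and the one-line summary.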
So there we've got five terms.
We've got agents.md and agent skills, which live inside the agent and shape how it behaves.
We've got MCP and we've got A2A.
That's how the agent reaches outwards to tools and to other agents.
And we've got subagents.
That's how the agent handles the work that doesn't fit into one context.
That's what a frontier AI agent actually looks like
under the hood today.