I Built the Ultimate Browser Agent with No Code (n8n + Airtop) | Nate Herk | AI Automation | YouTubeToText
Summary
Core Theme
This content demonstrates an AI agent capable of performing human-like actions within a web browser, using natural language commands to navigate, search, and extract information, powered by the Airtop platform.
This AI agent can use a computer just
like a human would. As you can see, if
we take a look at its browser tools, it
can start up a browser. It can click,
type, query, load windows, and also end
those sessions. So, the whole point of
this agent is to take web actions with
completely natural language. Let's hop
into a quick demo so I can show you guys
what I mean by that, and then we'll
break down how this agent works. So,
what I'm going to do here is ask this
agent to find posts on X about Google
Veo 3. So, I wanted to move over our agent
because what it's going to do real quick
is it's going to use its start browser
tool to spin up a browser in Airtop. And
when it does that, it's going to give us
a link in our Slack where we can
actually click onto that and we can
watch the agent move around in a
browser. So, it's going to be really
cool. So, now that that browser has
started, we got the link and it's going
to load up that session and then we're
going to be able to watch the agent
actually do things. So, right here you
can see that it's using its type tool.
So, it's probably going to move its
mouse all the way up here, click on the
search bar, and then type in Google Veo 3,
and then it will hit enter. So, we can
search for Google
Veo 3. As you can see, there goes the
mouse. We're going to click onto it.
It's typing Google Veo 3, and bam, now we
are searching X. And if you guys will
notice, I'm actually logged in down here
in the bottom left. You can tell I'm
logged into one of my Twitter accounts.
So, that means that even if you need to
be on a site where you have
authentication, you can still automate
that with Airtop. And already the
browser is being ended right now because
we already got our results back. Okay,
so let's see what our agent comes up
with after it was able to search X. So
here's what the agent came up with based
on search results from X. Here's what I
found about Google Veo 3. Found latest
features. Veo 3 now includes native audio
generation capabilities. We found
availability and reception. It's
released in 73 countries through the
Gemini app. And finally, we found
technical notes. It builds upon previous
Veo 2 capabilities. All right, so hopefully
that gives you guys a taste of what a
browser agent can do. We're going to
break down how this build works. I know
it may seem a little complicated, but
trust me, it's really easy. All we have
to do is use natural language. So, let's
break it down. But before we do that,
real quick, if you want to download this
template for free so you can follow
along with the rest of the video, all
you have to do is join my free Skool
community. The link for that is down in
the description. You will have to come
in here, search for the title of this
video, or you can click on YouTube
resources and find the post associated
with this video. Once you click on that
post, you'll see a JSON file right here
to download which once you download it,
you can import that into n8n and you'll
have this exact template right here. And
then all you have to do is plug in a few
credentials and you'll be good to go.
All right, so let me explain how this
works at a high level. We have an AI
agent that has the ability to start up a
browser. Once it starts up that browser,
it can then click in it, type in it,
look at it, load different windows, and
then it has to end the session at the
end. So, the way that it's actually able
to spin up these browsers is through
something called Airtop, which are these
nodes right here. And Airtop lets you
automate the web with just your words.
You can go ahead and sign up for free
using the link in the description. And
then you can also get 50% off for 3
months using code Nate half off, which
will also be in the description. And the
cool thing about Airtop is if you click
on automations and you go down here to
n8n, there are so many Airtop templates
already out there for doing things like
commenting on X posts, building a list
of profiles, extracting information from
LinkedIn profiles. There's so many
different use cases for browser
automation. Anyways, let's start to
break down this workflow a little bit.
First thing I'm going to do is just
click into the system prompt and show
you guys how simple this is. So, the
overview is that you are a smart
advanced web agent and you have tools
that let you control a remote web
browser. Your primary goal is to fulfill
the human's request and you're able to
do that by using the different tools
that you have. So then the next thing I
did was I gave it its tools. So it has
the ability to start a browser which I
said always begin with this tool. It
returns a session ID and a window ID
which are required for the other tools
because other tools like click or type
have to know okay which actual browser
am I clicking or typing in. You also
have the tool to load a URL which loads
a website in the browser window. You
have a tool called query which basically
lets you scan the browser and you can
extract information like where's the
search bar or what's the menu options or
you know any information that you need
to retrieve like our X posts. You have a
tool called click which you use to click
on certain elements like buttons. You
have the tool called type which lets you
type into visible input fields like
search bars and it also automatically
hits enter after you're done typing. You
have a tool to end the session which
must be used every single time before
you respond to the human because we want
to turn off that browser session in
Airtop. And then we gave it access to
the think tool which lets it think and
reflect on what it's done and what it
still needs to do which I found to be
really helpful for browser automation.
And then I gave it a few tips that are
important like always think step by
step. Use the query whenever you're
unsure of what's on the screen and
you'll never need to log in. All right.
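The tool list from that system prompt can be written down as a small registry. This is only a rough sketch in Python, not the n8n template itself: the names `ToolSpec`, `TOOLS`, and `check_call_order` are made up for illustration, and which tools need the window ID (beyond what the prompt states) is an assumption. It also assumes that think calls do not count toward the "start first, end last" rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str
    needs_session: bool  # True if the tool needs the session ID from start_browser
    needs_window: bool   # True if the tool also needs the window ID


# Hypothetical registry mirroring the tools described in the system prompt.
TOOLS = {
    spec.name: spec
    for spec in [
        ToolSpec("start_browser", "Start a remote browser; returns session and window IDs", False, False),
        ToolSpec("load_url", "Load a website in the browser window", True, True),
        ToolSpec("query", "Extract information from the current page", True, True),
        ToolSpec("click", "Click an element such as a button or link", True, True),
        ToolSpec("type", "Type into a visible input field, then press Enter", True, True),
        ToolSpec("end_session", "Shut down the browser session", True, False),
        ToolSpec("think", "Reflect on what has been done and what is left", False, False),
    ]
}


def check_call_order(calls: list[str]) -> list[str]:
    """Return the prompt rules that a sequence of tool names breaks."""
    problems = []
    unknown = [c for c in calls if c not in TOOLS]
    if unknown:
        problems.append(f"unknown tools: {unknown}")
    # Assumption: think calls are ignored when checking the ordering rules.
    browser_calls = [c for c in calls if c != "think"]
    if not browser_calls or browser_calls[0] != "start_browser":
        problems.append("must begin with start_browser")
    if not browser_calls or browser_calls[-1] != "end_session":
        problems.append("must finish with end_session before replying")
    return problems
```

A check like this is handy when reading agent logs, because a run that skips end_session leaves a browser session open.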
So let's just start off by looking at
that start browser tool. So you guys
will notice that this is not an Airtop
node. This is an n8n workflow tool that
we configured ourselves. So I'm just
going to scroll over here and this
is the actual start browser tool. So let
me just drag this over here so we're
looking at everything in one clean
screen. So when the browser agent calls
this tool to start a browser, it
actually goes through this flow to start
that browser. So what I'm going to do
real quick is pull up the execution from
that live demo we just did and we'll see
how it started up the browser. So here's
that data right here. It passed over
twitter.com so we could go to X and
scrape it. And then also passed over my
profile name which I set up in Airtop.
I'll show you guys that later. But
that's how you can access something like
X or Instagram or whatever you want to
go through that requires a login because
you can set up that login once in Airtop
and then your agent in n8n can access
that and do whatever you want. Anyways,
from there it's going to do two main
things which is set up a session and set
up a window. And once we've done that,
we get back both a session ID and a
window ID. And right now I'm in my
Airtop dashboard. If I click on browser
sessions, you can see all the previous
sessions that I've had with different
remote browsers. What we also get is a
live URL. So this live view URL lets us
click on it and we can go watch the
browser agent actually interact with
stuff. So that's what you guys saw me
click on earlier in Slack. So all I'm
doing here is I'm sending myself that
URL. And then the last thing we have to
do is we have to give back the session
and window ID to the main agent. So we
do that with the set node where we
basically say okay here's the session ID
and here's the window ID. All right so
while editing this video I just realized
there was something that I wanted to
touch on in a little bit more detail
which is basically like what is going on
in the transfer of data between this
tool to start the browser and then down
here this actual workflow that starts
the browser and sends us back that
session and window ID. So what this tool
does is it lets us call on a different
n8n workflow as a tool. And typically we
would have something like this in a
separate workflow that we would call on
and we would do that by choosing from
list and then we could just choose a
workflow that we had done from list like
let's say I had a tool right here to
grab profiles. I could hook it up and
then now data is being sent to that
tool. So if you want to call on a tool
that lives in the same workflow like we
see here all you have to do is you have
to do it by ID. So in your URL up here I
think it's cut off for you guys. It'll
be workflow slash and then a bunch of
like capital letters, lowercase letters
and numbers. You will basically just
copy that and then you'll come back into
this tool, change this to by ID, and
then you'll basically just paste that
right in there. And then you have the
workflow inputs and profile name to
actually map. So, just wanted to clear
that up. Hopefully, it's not too
confusing. You have the option to keep
it in this workflow or you can move it
into a separate workflow and call it
from a list. But there will also be a
setup guide kind of right over here that
explains everything you need to do to
connect yourself to this agent when you
download the template. So let's get back
to the
video. Okay, cool. So that's the start
browser tool. I'm going to go back into
the execution which was our actual
browser actions and we're back. So like
I said, browser agent calls this tool.
It starts the browser. We get session ID
and window ID. And now the agent's able
to figure out what do I have to do next.
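The start browser flow recapped here (create a session, open a window, send the live view link, hand both IDs back to the agent) might look roughly like this in Python. Everything in it is a stand-in: `BrowserBackend`, `FakeBackend`, and the `live.example.invalid` URL are invented so the sketch runs offline, and none of it is Airtop's real API or the actual n8n nodes.

```python
import itertools
from typing import Callable, Optional, Protocol


class BrowserBackend(Protocol):
    """Hypothetical interface for whatever service hosts the remote browser."""

    def create_session(self, profile_name: Optional[str]) -> str: ...

    def create_window(self, session_id: str, url: str) -> tuple[str, str]: ...


class FakeBackend:
    """In-memory stand-in so the sketch runs without any network access."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def create_session(self, profile_name: Optional[str]) -> str:
        return f"session-{next(self._ids)}"

    def create_window(self, session_id: str, url: str) -> tuple[str, str]:
        window_id = f"window-{next(self._ids)}"
        # Placeholder link; a real service would return its own live view URL.
        live_view_url = f"https://live.example.invalid/{session_id}/{window_id}"
        return window_id, live_view_url


def start_browser(
    backend: BrowserBackend,
    url: str,
    profile_name: Optional[str] = None,
    notify: Callable[[str], object] = print,
) -> dict:
    """Create a session and a window, share the live link, return both IDs."""
    session_id = backend.create_session(profile_name)
    window_id, live_view_url = backend.create_window(session_id, url)
    notify(f"Live view: {live_view_url}")  # in the video this goes to Slack
    # Only the two IDs go back to the agent; every later tool call needs them.
    return {"session_id": session_id, "window_id": window_id}
```

Calling `start_browser(FakeBackend(), "https://twitter.com", "my-profile")` prints the placeholder live link and returns the two IDs, which matches the role the set node plays at the end of the sub-workflow.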
So in this example, what it did next was
it typed. So it called its type tool.
And what happens here is really, really
simple. First of all, it sends over
session ID and window ID, which are
right up here. And this just says, okay,
browser agent, this is the session and
window that you're going to actually
type in. And then we have to tell it
basically, what are you going to type
and where do you type? So, as you can
see right here, the element description
is the search box that says search X and
the text is Google Veo 3. And that's
exactly what we saw it do in that little
live window. And then this type tool
responds to the main agent and says type
operation completed. And then the agent
realized, okay, now that we've made that
search, I have to query the page to see
what those results look like. So I
called on this query tool. We do three
things. Once again, we're sending over
session ID, window ID, and then we're
sending over a prompt, which is what do
you actually extract from this page? And
the prompt was what are the search
results showing about Google Veo 3? Please
summarize the most relevant points and
information. And then on the right hand
side, this tool responds back to the
main agent and says the search results
for Google Veo 3. It highlights several key
points and then it lists off all
those points. And then finally, before
it actually responds to the human with
those answers, it just has to end the
session. And that's really simple
because all we have to send over is a
session ID. All right, so that was one
quick example. Let's just go through
another example and we can see if it
uses any different tools and how that
all works. All right, so this time we're
saying find me good deals on laptops
from Best Buy. So, what I'm going to do
once again is just kind of move this guy
over so we can pull up our Slack so we
can watch it happen live again. It's
starting up that browser, which is
actually calling that workflow below us
that we just looked at. And what it's
going to do is send us over that live
link so that we can watch the agent take
action. Okay, so our browser session has
been started. We got a link and now
we're going to load that up. We can see
the agent right now is typing. So, we'll
see if it's typing for laptops or
whatever it does. It basically has full
autonomy to use whatever tool it needs
to do to get the job done.
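Since the agent picks its own tools, something has to route each tool call to the right handler with the right inputs. Here is a minimal Python sketch of that routing, assuming a made-up call format (`{"tool": ..., "args": ...}`) and made-up field names like `element_description`. The fields follow what the transcript describes for the type, query, and end session tools; the ones for click and load URL are guesses.

```python
# Hypothetical required inputs per tool; not taken from the template itself.
REQUIRED_ARGS = {
    "type": {"session_id", "window_id", "element_description", "text"},
    "query": {"session_id", "window_id", "prompt"},
    "click": {"session_id", "window_id", "element_description"},
    "load_url": {"session_id", "window_id", "url"},
    "end_session": {"session_id"},
}


def dispatch(tool_call: dict, handlers: dict) -> str:
    """Validate a tool call and run its handler, returning text for the agent."""
    name = tool_call["tool"]
    args = tool_call.get("args", {})
    if name not in REQUIRED_ARGS:
        return f"error: unknown tool {name!r}"
    missing = REQUIRED_ARGS[name] - set(args)
    if missing:
        return f"error: {name} is missing {sorted(missing)}"
    # Handlers receive the validated arguments as keyword arguments.
    return handlers[name](**args)


def fake_type(session_id: str, window_id: str, element_description: str, text: str) -> str:
    """Stand-in handler that only reports success."""
    return "type operation completed"


if __name__ == "__main__":
    call = {
        "tool": "type",
        "args": {
            "session_id": "s1",
            "window_id": "w1",
            "element_description": "the search bar",
            "text": "laptop deals",
        },
    }
    print(dispatch(call, {"type": fake_type}))
```

Errors come back as plain text rather than exceptions so the agent can read them and decide which tool to try next.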
Okay, so we can see the mouse just
clicked on the search bar. It typed in
laptop deals and then it went ahead and
hit enter. Okay, so the live view is a
little bit behind, but it loaded up a
new page and now we can see the agent's
using its click tool. So we'll have to
take a look and see what it actually did
and it's actually using the load URL
tool. So we've got a couple new tools to
look at. So it looks like it pulled up a
page of 858 laptops and now it's querying
it once again. Okay, there we go. So
it's ending the session. The browser has
been disconnected. So, let's close out
and let's come over here and see what
the agent came up with. All right, there
we go. So, I found several good laptop
deals currently available at Best Buy.
Here are the most notable ones. So,
we've got a Microsoft Surface laptop,
latest model, which is 200 bucks off.
It's highly rated. It has 16 gigs of
memory, 512 gigs of storage, blah blah
blah. We have a Samsung Galaxy that is
142 bucks off. And we also have a Lenovo
Yoga, which is starting open box, 270
bucks. So, cool. Now, what we're going
to do is I want to look through the
actual agent logs so that we can see
what it did and why. So, I'm just going
to click on the agent. I'm going to
click on logs and let's take a look. So,
the first thing that it did was
obviously it started up the browser.
Then it called its type tool and it says
to type for laptop deals. So, that's
exactly what we saw happen. Then it used
its query tool and the prompt was what
are some of the best laptop deals
currently shown? Please list the prices.
And then the query tool responded and
said unable to satisfy object. The
webpage content provided does not
contain any specific information about
laptop deals, prices or descriptions. So
then what it did is it decided to click
and it decided to click on the link that
says laptops or laptop deals. So super
cool. So it basically clicked on that
link and it loaded up this new URL as
you can see this one right here about
laptops and then we go to the query tool
once again to ask that same prompt, which
is what are notable deals, and then once we
actually get that answer back right here
from that query tool it knows to go
ahead and end the session. So, let's
take a quick look at what happens in the
click and the load URL tool since we
haven't looked at those yet. Real quick,
off the bat, there is a native
integration click tool right here for
airtop. But what happened when I was
using that is I kept getting these like
timeouts. So, I was able to use the HTTP
request to have more control over it.
And now I don't run into that issue with
timeouts. So, in the click one, we're
sending over session ID, window ID, and
then we're just saying what to click on.
So, right here, it says link that says
laptops or laptop deals. Oh, okay. And
one thing I actually missed is that it
didn't work. So that's why it had to go
load a URL because right here we can see
that it actually failed to click and
probably because there wasn't actually a
button that said one of these things. So
then it said, "Okay, that didn't work.
I'm going to just go load my own URL."
bestbuy.com/site/laptop/computers blah
blah blah. And then because that worked,
the agent was able to query through it.
And then of course before it responded
to us, it went ahead and ended that
session. Okay. So what's special about
browser automation is that it basically
spins up a remote browser. So, it looks
a lot more like a human's actually
controlling stuff because it moves the
mouse and it does all this stuff. And
so, what I showed you guys with the demo
with X is that I was already logged into
my account. So, what you can do is when
you come into Airtop, you'll click on
profiles and then in an actual profile,
which I just created, when you connect,
it basically will let you enter in a
website and you can log in with your
credentials and it saves them to that
profile. So, if you came in here and
saved your credentials for Amazon, X,
Instagram, LinkedIn, whatever you wanted
to, you could then connect and do things
in a browser through an Airtop agent. I
would just strongly encourage you though
to look into the policies for the
different websites and providers because
some of them are really strict on even
browser automation bots. So, just be
careful with that. But once you
basically create all of those different
credentials, this is the name of your
profile and that's what I'm passing over
when I start up the browser. So in here
when you see I'm starting up the browser
and I'm passing over a URL which is what
URL to load in and also a profile name
which in this case was Nate Herk because
that's the one that I made right over
here. But you don't necessarily need a
profile name. You only need to provide
this profile name if you want to access
a site that has credentials or has to
log in. So just as one final example
just to show you guys, I'm going to
completely remove the profile and
there's going to be nothing being passed
over and I'll show you that we're still
able to browser automate. So, like this
previous example right here with Best
Buy, we didn't have to log in to do so.
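The "profile name is optional" point is easy to picture as the input passed to the start browser workflow. A tiny Python sketch, assuming hypothetical field names `url` and `profile_name` (the template's real input names may differ):

```python
from typing import Optional


def build_start_input(url: str, profile_name: Optional[str] = None) -> dict:
    """Build the start-browser input, adding a profile only when one is given."""
    payload = {"url": url}
    if profile_name:  # skips both None and an empty string
        payload["profile_name"] = profile_name
    return payload


if __name__ == "__main__":
    # A site that needs a saved login versus one that does not.
    print(build_start_input("https://twitter.com", "my-profile"))
    print(build_start_input("https://www.bestbuy.com"))
```

Leaving the key out entirely, rather than sending an empty value, keeps the no-login case identical to a plain browser start.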
But just to show you guys that I'm not
lying right here, we're just going to
say find a Yeti water bottle on Google.
It's going to go ahead and start a
browser with no profile, no credentials
saved. The browser just got started. So,
let's load up that window. We can see
that it's going to type. So, it's going
to type in this Google search bar, Yeti water
bottle. There we go. Yeti water bottle.
Hit search. And now, it's going to use
its query tool to scan these results.
There we go. It's ending the session
because we found everything we needed
to. And then we will just basically see
that it worked. And keep in mind, I'm
not passing over any sort of profile
name. So, you only need to do that if
you're accessing a site where you
actually need to log in. Okay. So,
here's our results. They range from
sizes 18 oz to 64 oz. Prices ranging
from $20 to 65 bucks. Here are some
places you can get them. Blah blah blah.
All right. So, some final things to
think about real quick. The reason I'm
using an HTTP request rather than the
native Airtop node for the click tool, I
was running into some issues where it
was timing out. So, the click tool does
work, but for certain requests, it was
just timing out on me. So, not sure if
you guys will have a similar experience
or not, but that's why I switched it
out. And I'm sure you guys are probably
wondering about like different use cases
for this because I know it seems just
kind of fun and flashy, but there
definitely are good use cases for this.
Like I said, I'd encourage you to go to
Airtop and check out the n8n templates.
There's a ton of cool stuff like ICP
scoring, LinkedIn person ICP scoring,
monitoring competitor price changes,
monitoring job changes, all that kind of
stuff. And remember, you can use code
Nate half off for 50% off, 3 months.
Okay. And finally, when you get in here
and you need to set this up for
yourself, what do you actually do? First
of all, you'll have to connect a chat
model. So, I'm using OpenRouter and I'm
using Claude 3.5 Sonnet. I tested out
GPT-4.1. I tested out some other ones. I
definitely found that 3.5 Sonnet or the
higher power Claude models worked the
best. So that's why I went with that
here. I also gave it access to a think
tool which is pretty cool because
sometimes it would get stuck and it
would come here and think about it.
Obviously it's better when it doesn't
have to use it but on some more complex
things it may have to use a think tool.
So it's good to have that there. And
then of course you'll have to come in
there and plug in your own Airtop API
key which is super simple because of a
native node. You'd come in here create
new credential and just plug in an API
key. You could get that from Airtop. If
you go to your dashboard and you go to
API keys right here and all you'd have
to do is click on create new key and
copy and paste it in. And finally, I'm
sure you guys are wondering a little bit
about the plans. So, you can start for
free and get 5,000 credits which will
last you quite a while. If I go to my
billing, you can see that I've done
about 2 hours of automation with this
just testing out and I just finally
passed over the 5,000 free limit. So,
if you go in here, you can get 30,000
credits for $15 if you use my code, you
can get 100,000 credits for only 45
bucks if you use my code. And the other
thing here is you have different amounts
of simultaneous browsers. So you know
how I said you start up a session and
then the agent uses its tools and then
it ends the session. How that works is
back here you have browser sessions and
this keeps track of all of the different
sessions you have active. So if you're
on the free plan you can only do one
active at a time. If you're on the
starter plan you can have three running
at a time but that's it. And that's why
you have to make sure at the end that
you actually terminate them using this
tool right here, end session. Otherwise
they'll just keep being active forever.
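Outside of n8n, the same "always end the session" rule is usually enforced with a try/finally, so the browser is closed even when a tool call fails. A small Python sketch of that idea follows. `FakeSessions` is an invented in-memory stand-in with a plan limit, not Airtop's API, and the limit of one simply mirrors the free plan mentioned above.

```python
from contextlib import contextmanager


class FakeSessions:
    """Stand-in session tracker that enforces a limit on active sessions."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.active: set[str] = set()
        self._counter = 0

    def start(self) -> str:
        if len(self.active) >= self.max_active:
            raise RuntimeError("session limit reached")
        self._counter += 1
        session_id = f"session-{self._counter}"
        self.active.add(session_id)
        return session_id

    def end(self, session_id: str) -> None:
        self.active.discard(session_id)


@contextmanager
def browser_session(sessions: FakeSessions):
    """Start a session and guarantee it is ended, even if the body raises."""
    session_id = sessions.start()
    try:
        yield session_id
    finally:
        sessions.end(session_id)


if __name__ == "__main__":
    sessions = FakeSessions(max_active=1)
    with browser_session(sessions) as session_id:
        print("working in", session_id)
    print("active after exit:", sessions.active)
```

With a limit of one, a session that was never ended would block the next start, which is exactly the situation the end session tool is there to prevent.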
Well, I'm sure they'll time out
eventually, but they'll stay active for
a while. And so, really, you should get
in here and you should customize the
prompt a little bit and play with the
different tools to fit your use case.
But this is a really good skeleton and a
really good place to get you started
rather than starting from scratch. And
you can download this entire template
for free by joining my free Skool
community. Link for that is down in the
description. When you download that,
there'll be a setup guide somewhere
right over here that will basically tell
you what you need to plug in and all of
the different API keys that you need to
connect. You'll also need to, of course,
connect something like Slack if you want
to grab those live URLs. But anyways,
that's going to do it for this video. If
you guys enjoyed this style video, then
definitely check out my paid community.
The link for that is down in the
description as well. It's a great
community filled with like-minded people
building with n8n every day, sharing
their challenges, sharing their wins,
and it's a really cool place to be.
We've got two full courses, one called
Agent Zero on the foundations of AI
automation, and then a course called 10
hours to 10 seconds where you learn how
to identify, design, and build
time-saving automations. So, I'd love to
see you guys in those calls or in the
community. But if you enjoyed the video
and you learned something new, please
give it a like. Definitely helps me out
a ton and I'll see you guys in the next