0:02 This AI agent can use a computer just
0:03 like a human would. As you can see, if
0:05 we take a look at its browser tools, it
0:07 can start up a browser. It can click,
0:10 type, query, load windows, and also end
0:12 those sessions. So, the whole point of
0:14 this agent is to take web actions with
0:15 completely natural language. Let's hop
0:17 into a quick demo so I can show you guys
0:18 what I mean by that, and then we'll
0:20 break down how this agent works. So,
0:21 what I'm going to do here is ask this
0:23 agent to find posts on X about Google
0:25 V3. So, I wanted to move over our agent
0:26 because what it's going to do real quick
0:28 is it's going to use its start browser
0:31 tool to spin up a browser in airtop. And
0:32 when it does that, it's going to give us
0:34 a link in our Slack where we can
0:35 actually click onto that and we can
0:37 watch the agent move around in a
0:38 browser. So, it's going to be really
0:39 cool. So, now that that browser has
0:41 started, we got the link and it's going
0:42 to load up that session and then we're
0:43 going to be able to watch the agent
0:45 actually do things. So, right here you
0:47 can see that it's using its type tool.
0:48 So, it's probably going to move its
0:50 mouse all the way up here, click on the
0:53 search bar, and then type in Google V3,
0:54 and then it will hit enter. So, we can
0:56 search for Google
0:58 V3. As you can see, there goes the
1:00 mouse. We're going to click onto it.
1:02 It's typing Google V3, and bam, now we
1:04 are searching X. And if you guys will
1:06 notice, I'm actually logged in down here
1:08 in the bottom left. You can tell I'm
1:09 logged into one of my Twitter accounts.
1:11 So, that means that even if you need to
1:13 be on a site where you have
1:14 authentication, you can still automate
1:17 that with AirTop. And already the
1:18 browser is being ended right now because
1:21 we already got our results back. Okay,
1:22 so let's see what our agent comes up
1:24 with after it was able to search X. So
1:26 here's what the agent came up with based
1:28 on search results from X. Here's what I
1:30 found about Google V3. Found latest
1:32 features. V3 now includes native audio
1:34 generation capabilities. We found
1:36 availability and reception. It's
1:37 released in 73 countries through the
1:39 Gemini app. And finally, we found
1:41 technical notes. It builds upon previous
1:43 V2 capabilities. All right, so hopefully
1:45 that gives you guys a taste of what a
1:46 browser agent can do, we're going to
1:48 break down how this build works. I know
1:50 it may seem a little complicated, but
1:52 trust me, it's really easy. All we have
1:53 to do is use natural language. So, let's
1:55 break it down. But before we do that,
1:56 real quick, if you want to download this
1:58 template for free so you can follow
1:59 along with the rest of the video, all
2:00 you have to do is join my free school
2:02 community. The link for that is down in
2:03 the description. You will have to come
2:05 in here, search for the title of this
2:06 video, or you can click on YouTube
2:08 resources and find the post associated
2:11 with this video. Once you click on that
2:12 post, you'll see a JSON file right here
2:14 to download which once you download it,
2:16 you can import that into NIN and you'll
2:18 have this exact template right here. And
2:20 then all you have to do is plug in a few
2:21 credentials and you'll be good to go.
2:22 All right, so let me explain how this
2:24 works at a high level. We have an AI
2:26 agent that has the ability to start up a
2:28 browser. Once it starts up that browser,
2:31 it can then click in it, type in it,
2:33 look at it, load different windows, and
2:34 then it has to end the session at the
2:36 end. So, the way that it's actually able
2:38 to spin up these browsers is through
2:39 something called Airtop, which are these
2:41 nodes right here. And Airtop lets you
2:43 automate the web with just your words.
2:45 You can go ahead and sign up for free
2:46 using the link in the description. And
2:48 then you can also get 50% off for 3
2:50 months using code Nate half off, which
2:52 will also be in the description. And the
2:54 cool thing about AirTop is if you click
2:55 on automations and you go down here to
2:58 Nitn, there are so many Airtop templates
3:00 already out there for doing things like
3:02 commenting on X posts, building a list
3:05 of profiles, extracting information from
3:06 LinkedIn profiles. There's so many
3:08 different use cases for browser
3:10 automation. Anyways, let's start to
3:11 break down this workflow a little bit.
3:12 First thing I'm going to do is just
3:14 click into the system prompt and show
3:16 you guys how simple this is. So, the
3:17 overview is that you are a smart
3:19 advanced web agent and you have tools
3:20 that let you control a remote web
3:22 browser. Your primary goal is to fulfill
3:24 the human's request and you're able to
3:26 do that by using the different tools
3:28 that you have. So then the next thing I
3:30 did was I gave it its tools. So it has
3:32 the ability to start a browser which I
3:34 said always begin with this tool. It
3:36 returns a session ID and a window ID
3:38 which are required for the other tools
3:40 because other tools like click or type
3:42 have to know okay which actual browser
3:44 am I clicking or typing in. You also
3:46 have the tool to load a URL which loads
3:48 a website in the browser window. You
3:49 have a tool called query which basically
3:51 lets you scan the browser and you can
3:53 extract information like where's the
3:55 search bar or what's the menu options or
3:57 you know any information that you need
3:59 to retrieve like our xosts. You have a
4:01 tool called click which you use to click
4:03 on certain elements like buttons. You
4:05 have the tool called type which lets you
4:07 type into visible input fields like
4:09 search bars and it also automatically
4:11 hits enter after you're done typing. You
4:12 have a tool to end the session which
4:15 must be used every single time before
4:16 you respond to the human because we want
4:18 to turn off that browser session in
4:20 airtop. And then we gave it access to
4:22 the think tool which lets it think and
4:23 reflect on what it's done and what it
4:25 still needs to do which I found be
4:28 really helpful for browser automation.
4:29 And then I gave it a few tips that are
4:31 important like always think step by
4:32 step. Use the query whenever you're
4:34 unsure of what's on the screen and
4:36 you'll never need to log in. All right.
4:37 So let's just start off by looking at
4:39 that start browser tool. So you guys
4:41 will notice that this is not an airtop
4:43 node. This is an end workflow tool that
4:45 we configured ourselves. So I'm just
4:48 going to scroll over over here and this
4:50 is the actual start browser tool. So let
4:51 me just drag this over here so we're
4:53 looking at everything in one clean
4:56 screen. So when the browser agent calls
4:58 this tool to start a browser, it
4:59 actually goes through this flow to start
5:01 that browser. So what I'm going to do
5:02 real quick is pull up the execution from
5:05 that live demo we just did and we'll see
5:07 how it started up the browser. So here's
5:09 that data right here. It passed over
5:11 twitter.com so we could go to X and
5:13 scrape it. And then also passed over my
5:15 profile name which I set up in airtop.
5:16 I'll show you guys that later. But
5:18 that's how you can access something like
5:20 X or Instagram or whatever you want to
5:23 go through that requires a login because
5:24 you can set up that login once in airtop
5:27 and then your agent and end can access
5:29 that and do whatever you want. Anyways,
5:30 from there it's going to do two main
5:32 things which is set up a session and set
5:34 up a window. And once we've done that,
5:36 we get back both a session ID and a
5:38 window ID. And right now I'm in my
5:40 AirTopp dashboard. If I click on browser
5:42 sessions, you can see all the previous
5:43 sessions that I've had with different
5:46 remote browsers. What we also get is a
5:49 live URL. So this live view URL lets us
5:51 click on it and we can go watch the
5:53 browser agent actually interact with
5:55 stuff. So that's what you guys saw me
5:56 click on earlier in Slack. So all I'm
5:59 doing here is I'm sending myself that
6:02 URL. And then the last thing we have to
6:04 do is we have to give back the session
6:06 and window ID to the main agent. So we
6:07 do that with the set node where we
6:09 basically say okay here's the session ID
6:11 and here's the window ID. All right so
6:12 while editing this video I just realized
6:13 there was something that I wanted to
6:15 touch on in a little bit more detail
6:17 which is basically like what is going on
6:19 in the transfer of data between this
6:21 tool to start the browser and then down
6:22 here this actual workflow that starts
6:24 the browser and sends us back that
6:26 session and window ID. So what this tool
6:28 does is it lets us call on a different
6:31 NN workflow as a tool. And typically we
6:33 would have something like this in a
6:35 separate workflow that we would call on
6:37 and we would do that by choosing from
6:39 list and then we could just choose a
6:40 workflow that we had done from list like
6:42 let's say I had a tool right here to
6:44 grab profiles. I could hook it up and
6:46 then now data is being sent to that
6:48 tool. So if you want to call on a tool
6:49 that lives in the same workflow like we
6:51 see here all you have to do is you have
6:54 to do it by ID. So in your URL up here I
6:56 think it's cut off for you guys. It'll
6:58 be workflow slash and then a bunch of
7:00 like capital letters, lowercase letters
7:02 and numbers. You will basically just
7:03 copy that and then you'll come back into
7:06 this tool, change this to by ID, and
7:07 then you'll basically just paste that
7:10 right in there. And then you have the
7:12 workflow inputs and profile name to
7:14 actually map. So, just wanted to clear
7:15 that up. Hopefully, it's not too
7:17 confusing. You have the option to keep
7:18 it in this workflow or you can move it
7:20 into a separate workflow and call it
7:21 from a list. But there will also be a
7:23 setup guide kind of right over here that
7:24 explains everything you need to do to
7:27 connect yourself to this agent when you
7:29 download the template. So let's get back
7:30 to the
7:32 video. Okay, cool. So that's the start
7:34 browser tool. I'm going to go back into
7:35 the execution which was our actual
7:38 browser actions and we're back. So like
7:40 I said, browser agent calls this tool.
7:41 It starts the browser. We get session ID
7:43 and window ID. And now the agent's able
7:45 to figure out what do I have to do next.
7:47 So in this example, what it did next was
7:49 it typed. So it called its type tool.
7:51 And what happens here is really, really
7:53 simple. First of all, it sends over
7:55 session ID and window ID, which are
7:57 right up here. And this just says, okay,
7:59 browser agent, this is the session and
8:00 window that you're going to actually
8:03 type in. And then we have to tell it
8:04 basically, what are you going to type
8:05 and where do you type? So, as you can
8:07 see right here, the element description
8:10 is the search box that says search X and
8:12 the text is Google V3. And that's
8:14 exactly what we saw it do in that little
8:16 live window. And then this type tool
8:18 responds to the main agent and says type
8:20 operation completed. And then the agent
8:22 realized, okay, now that we've made that
8:24 search, I have to query the page to see
8:25 what those results look like. So I
8:28 called on this query tool. We do three
8:29 things. Once again, we're sending over
8:31 session ID, window ID, and then we're
8:33 sending over a prompt, which is what do
8:36 you actually extract from this page? And
8:37 the prompt was what are the search
8:39 results showing about Google V3? Please
8:41 summarize the most relevant points and
8:43 information. And then on the right hand
8:45 side, this tool responds back to the
8:47 main agent and says the search results
8:49 for Google V3. It highlights several key
8:51 points and and then it lists off all
8:53 those points. And then finally, before
8:54 it actually responds to the human with
8:56 those answers, it just has to end the
8:57 session. And that's really simple
8:59 because all we have to send over is a
9:01 session ID. All right, so that was one
9:03 quick example. Let's just go through
9:05 another example and we can see if it
9:06 uses any different tools and how that
9:08 all works. All right, so this time we're
9:10 saying find me good deals on laptops
9:12 from Best Buy. So, what I'm going to do
9:14 once again is just kind of move this guy
9:16 over so we can pull up our Slack so we
9:17 can watch it happen live again. It's
9:19 starting up that browser, which is
9:20 actually calling that workflow below us
9:22 that we just looked at. And what it's
9:23 going to do is send us over that live
9:25 link so that we can watch the agent take
9:27 action. Okay, so our browser session has
9:29 been started. We got a link and now
9:30 we're going to load that up. We can see
9:32 the agent right now is typing. So, we'll
9:34 see if it's typing for laptops or
9:36 whatever it does. It basically has full
9:37 autonomy to use whatever tool it needs
9:41 to do to get the job done.
9:42 Okay, so we can see the mouse just
9:44 clicked on the search bar. It typed in
9:46 laptop deals and then it went ahead and
9:48 hit enter. Okay, so the live view is a
9:49 little bit behind, but it loaded up a
9:50 new page and now we can see the agents
9:52 using its click tool. So we'll have to
9:54 take a look and see what it actually did
9:56 and it's actually using the load URL
9:57 tool. So we've got a couple new tools to
9:59 look at. So it looks like it pulled up a
10:02 page of 858 laptops and now it's quering
10:04 it once again. Okay, there we go. So
10:05 it's ending the session. The browser has
10:07 been disconnected. So, let's close out
10:09 and let's come over here and see what
10:11 the agent came up with. All right, there
10:13 we go. So, I found several good laptop
10:15 deals currently available at Best Buy.
10:16 Here are the most notable ones. So,
10:18 we've got a Microsoft Surface laptop,
10:21 latest model, which is 200 bucks off.
10:23 It's highly rated. It has 16 gigs of
10:25 memory, 512 gigs of storage, blah blah
10:28 blah. We have a Samsung Galaxy that is
10:30 142 bucks off. And we also have a Lenovo
10:33 Yoga, which is starting open box, 270
10:35 bucks. So, cool. Now, what we're going
10:37 to do is I want to look through the
10:39 actual agent logs so that we can see
10:40 what it did and why. So, I'm just going
10:42 to click on the agent. I'm going to
10:44 click on logs and let's take a look. So,
10:45 the first thing that it did was
10:46 obviously it started up the browser.
10:48 Then it called its type tool and it says
10:50 to type for laptop deals. So, that's
10:52 exactly what we saw happen. Then it used
10:54 its query tool and the prompt was what
10:56 are some of the best laptop deals
10:58 currently shown? Please list the prices.
11:00 And then the query tool responded and
11:03 said unable to satisfy object. the
11:04 webpage content provided does not
11:06 contain any specific information about
11:08 laptop deals, prices or descriptions. So
11:10 then what it did is it decided to click
11:12 and it decided to click on the link that
11:15 says laptops or laptop deals. So super
11:16 cool. So it basically clicked on that
11:18 link and it loaded up this new URL as
11:20 you can see this one right here about
11:22 laptops and then we go to the query tool
11:24 once again to ask that same prompt which
11:26 what are notable deals and then once we
11:28 actually get that answer back right here
11:30 from that query tool it knows to go
11:32 ahead and end the session. So, let's
11:33 take a quick look at what happens in the
11:35 click and the load URL tool since we
11:36 haven't looked at those yet. Real quick,
11:38 off the bat, there is a native
11:40 integration click tool right here for
11:42 airtop. But what happened when I was
11:44 using that is I kept getting these like
11:46 timeouts. So, I was able to use the HTTP
11:47 request to have more control over it.
11:48 And now I don't run into that issue with
11:50 timeouts. So, in the click one, we're
11:53 sending over session ID, window ID, and
11:54 then we're just saying what to click on.
11:55 So, right here, it says link that says
11:58 laptops or laptop deals. Oh, okay. And
12:00 one thing I actually missed is that it
12:01 didn't work. So that's why it had to go
12:04 load a URL because right here we can see
12:05 that it actually failed to click and
12:07 probably because there wasn't actually a
12:09 button that said one of these things. So
12:10 then it said, "Okay, that didn't work.
12:12 I'm going to just go load my own URL."
12:18 bestby.com/site/laptop/computers blah
12:19 blah blah. And then because that worked,
12:21 the agent was able to query through it.
12:22 And then of course before it responded
12:24 to us, it went ahead and ended that
12:26 session. Okay. So what's special about
12:28 browser automation is that it basically
12:31 spins up a remote browser. So, it looks
12:32 a lot more like a human's actually
12:33 controlling stuff because it moves the
12:35 mouse and it does all this stuff. And
12:36 so, what I showed you guys with the demo
12:38 with X is that I was already logged into
12:40 my account. So, what you can do is when
12:42 you come into Airtop, you'll click on
12:44 profiles and then in an actual profile,
12:46 which I just created, when you connect,
12:48 it basically will let you enter in a
12:49 website and you can log in with your
12:51 credentials and it saves them to that
12:52 profile. So, if you came in here and
12:54 saved your credentials for Amazon X,
12:56 Instagram, LinkedIn, whatever you wanted
12:59 to, you could then connect and do things
13:01 in a browser through an airtop agent. I
13:02 would just strongly encourage you though
13:04 to look into the policies for the
13:06 different websites and providers because
13:08 some of them are really strict on even
13:10 browser automation bots. So, just be
13:12 careful with that. But once you
13:13 basically create all of those different
13:15 credentials, this is the name of your
13:17 profile and that's what I'm passing over
13:20 when I start up the browser. So in here
13:21 when you see I'm starting up the browser
13:23 and I'm passing over a URL which is what
13:26 URL to load in and also a profile name
13:27 which in this case was Nate Herk because
13:28 that's the one that I made right over
13:31 here. But you don't necessarily need a
13:32 profile name. You only need to provide
13:34 this profile name if you want to access
13:37 a site that has credentials or has to
13:39 log in. So just as one final example
13:40 just to show you guys, I'm going to
13:42 completely remove the profile and it's
13:43 there's going to be nothing being passed
13:44 over and I'll show you that we're still
13:47 able to browser automate. So, like this
13:48 previous example right here with Best
13:50 Buy, we didn't have to log in to do so.
13:52 But just to show you guys that I'm not
13:53 lying right here, we're just going to
13:56 say find a Yeti water bottle on Google.
13:57 It's going to go ahead and start a
13:59 browser with no profile, no credentials
14:01 saved. The browser just got started. So,
14:03 let's load up that window. We can see
14:04 that it's going to type. So, it's going
14:06 to type in this Google search bar, Yeti water
14:07 water
14:10 bottle. There we go. Yeti water bottle.
14:12 Hit search. And now, it's going to use
14:15 its query tool to scan these results.
14:17 There we go. It's ending the session
14:18 because we found everything we needed
14:20 to. And then we will just basically see
14:23 that it worked. And keep in mind, I'm
14:25 not passing over any sort of profile
14:26 name. So, you only need to do that if
14:28 you're accessing a site where you
14:30 actually need to log in. Okay. So,
14:31 here's our results. They range from
14:34 sizes 18 oz to 64 ouncez. Prices ranging
14:36 from $20 to 65 bucks. Here are some
14:38 places you can get them. Blah blah blah.
14:39 All right. So, some final things to
14:41 think about real quick. The reason I'm
14:44 using a click https rather than the
14:46 native airtop node for the click tool, I
14:47 was running into some issues where it
14:49 was timing out. So, the click tool does
14:51 work, but for certain requests, it was
14:52 just timing out on me. So, not sure if
14:54 you guys will have a similar experience
14:55 or not, but that's why I switched it
14:57 out. And I'm sure you guys are probably
14:58 wondering about like different use cases
15:00 for this because I know it seems just
15:01 kind of fun and flashy, but there
15:03 definitely are good use cases for this.
15:05 Like I said, I'd encourage you to go to
15:07 Airtop and check out the NAN templates.
15:09 There's a ton of cool stuff like ICP
15:11 scoring, LinkedIn person ICP scoring,
15:13 monitoring competitor price changes,
15:15 monitoring job changes, all that kind of
15:16 stuff. And remember, you can use code
15:20 Nate half off for 50% off, 3 months.
15:21 Okay. And finally, when you get in here
15:23 and you need to set this up for
15:25 yourself, what do you actually do? First
15:26 of all, you'll have to connect a chat
15:28 model. So, I'm using Open Router and I'm
15:30 using Claude 3.5 Sonnet. I tested out
15:33 GPT41. I tested out some other ones. I
15:35 definitely found that 3.5 sonnet or the
15:37 higher power claw models worked the
15:38 best. So that's why I went with that
15:40 here. I also gave it access to a think
15:42 tool which is pretty cool because
15:43 sometimes it would get stuck and it
15:44 would come here and think about it.
15:46 Obviously it's better when it doesn't
15:47 have to use it but on some more complex
15:49 things it may have to use a think tool.
15:50 So it's good to have that there. And
15:51 then of course you'll have to come in
15:53 there and plug in your own airtop API
15:55 key which is super simple because of a
15:56 native node. You'd come in here create
15:58 new credential and just plug in an API
16:00 key. You could get that from airtop. If
16:01 you go to your dashboard and you go to
16:03 API keys right here and all you'd have
16:05 to do is click on create new key and
16:07 copy and paste it in. And finally, I'm
16:08 sure you guys are wondering a little bit
16:10 about the plans. So, you can start for
16:12 free and get 5,000 credits which will
16:13 last you quite a while. If I go to my
16:15 billing, you can see that I've done
16:17 about 2 hours of automation with this
16:19 just testing out and I just finally
16:22 passed over the 5,000 free limit. So, um
16:24 if you go in here and you can get 30,000
16:26 credits for $15 if you use my code, you
16:28 can get 100,000 credits for only 45
16:30 bucks if you use my code. And the other
16:32 thing here is you have different amounts
16:34 of simultaneous browsers. So you know
16:36 how I said you start up a session and
16:37 then the agent uses its tools and then
16:39 it ends the session. How that works is
16:41 back here you have browser sessions and
16:43 this keeps track of all of the different
16:44 sessions you have active. So if you're
16:46 on the free plan you can only do one
16:47 active at a time. If you're on the
16:48 starter plan you can have three running
16:49 at a time but that's it. And that's why
16:51 you have to make sure at the end that
16:52 you actually terminate them using this
16:54 tool right here, end session. Otherwise
16:57 they'll just keep being active forever.
16:58 Well, I'm sure they'll time out
17:00 eventually, but they'll stay active for
17:02 a while. And so, really, you should get
17:03 in here and you should customize the
17:04 prompt a little bit and play with the
17:06 different tools to fit your use case.
17:07 But this is a really good skeleton and a
17:09 really good place to get you started
17:10 rather than starting from scratch. And
17:12 you can download this entire template
17:13 for free by joining my free school
17:14 community. Link for that is down in the
17:16 description. When you download that,
17:17 there'll be a setup guide somewhere
17:18 right over here that will basically tell
17:20 you what you need to plug in and all of
17:21 the different API keys that you need to
17:23 connect. You'll also need to, of course,
17:24 connect something like Slack if you want
17:27 to grab those live URLs. But anyways,
17:28 that's going to do it for this video. If
17:30 you guys enjoyed this style video, then
17:31 definitely check out my paid community.
17:32 The link for that is down in the
17:34 description as well. It's a great
17:35 community filled with like-minded people
17:37 building with NN every day, sharing
17:39 their challenges, sharing their wins,
17:40 and it's a really cool place to be.
17:42 We've got two full courses, one called
17:44 Agent Zero on the foundations of AI
17:45 automation, and then a course called 10
17:47 hours to 10 seconds where you learn how
17:48 to identify, design, and build
17:50 time-saving automations. So, I'd love to
17:52 see you guys in those calls or in the
17:53 community. But if you enjoyed the video
17:55 and you learned something new, please
17:56 give it a like. definitely helps me out
17:57 a ton and I'll see you guys in the next