0:18 [music]
0:34 Typing thoughts into [music] the darkest
0:37 part becomes design. Words evolve
0:39 [music] to whispers meant for something
0:43 more divine. Syntax bends and breeze. I
0:45 see the language change. I'm not
0:49 instructing anymore. I'm rearranging
0:51 faith. Every loop I write [singing]
0:54 rewrites me. Every function hums with
0:57 meaning. I feel the interface dissolve
1:06 new code. Not on the screen but in the
1:08 soul where [music] thought becomes the
1:11 motion and creation takes control. No
1:15 lines no rules just balance [music] in
1:19 between the zero and the one. The
1:31 >> [music]
1:34 >> systems shape our fragile skin. They
1:36 mold [singing] the way we move. We live
1:39 inside the logic gates [music] of what
1:42 we think is true. But deep beneath the
1:44 data post, [music] there's something undefined.
1:45 undefined.
1:49 A [singing] universe compiling the image
1:51 of our [music] minds. Every line reveals
1:54 reflection. Every loop replace [music]
1:56 connection. We're not building, we're
1:59 becoming. And the code becomes confession.
2:07 This is the [music] new code. Not on the
2:10 screen, but in the soul with thought
2:13 becomes the motion. [music] Creation
2:16 takes control. No lines, no rules, just
2:20 balance in between the [music] zero and
2:24 the one. The silence and the dream. [music]
2:39 [music]
2:45 Don't worry. [music] Uh, we're just
2:46 giving you something to do while Codex
3:00 [music] Each prompt, each breath, each
3:04 fragile spin, a universe [music] renewing.
3:12 This is the new code.
3:14 Alive and [music] undefined.
3:17 Where logic meets emotion and structure
3:20 bends to mind. [music] The system hs
3:23 eternal but the soul writes the line. We
3:27 are the new code.
3:40 I'm fired inside. [music]
3:53 [applause]
3:56 Ladies and gentlemen, please join me in
3:57 welcoming to the stage the co-founder [music]
3:58 [music]
4:00 of Morning Brew and the managing partner
4:03 of 10X, your host for the leadership
4:06 [music] track session day, Alex Lieberman.
4:14 Keep it going. Let's get a quick read of
4:16 the room. If you are coming from right
4:18 here in the Big Apple from New York,
4:20 make some noise.
4:22 Okay, now I have to say it. I assume
4:24 this is the biggest group. San Francisco.
4:25 Francisco.
4:29 >> Wow, that is surprising. Uh, Austin.
4:31 >> Okay, we got Austin. Who thinks they
4:34 came from the furthest place and is in
4:35 the room today?
4:37 >> Where? Where?
4:40 >> Ecuador. Can anyone beat Ecuador? [applause]
4:41 [applause]
4:42 >> New Zealand.
4:44 >> I don't think anyone's going to beat New
4:47 Zealand. There we go. Well, first of
4:50 all, uh, I am so excited to welcome you
4:54 all to the AI Engineer Code Summit 2025.
4:56 Uh, I'm Alex Lieberman, co-founder of
5:00 Morning Brew and your MC for the day.
5:03 Um, now you may be wondering, why is a
5:06 newsletter guy hosting an AI engineer
5:08 conference? It's a great question. Well,
5:11 after I left my role at Morning Brew, I
5:13 asked myself one simple question, and it
5:16 was, what space do I want to spend my
5:18 time in for the next 20 years where I
5:20 can build something consequential and
5:22 spend my time with some of the smartest
5:25 people I've ever met? And the answer
5:27 became obvious. I wanted to be as close
5:29 to the frontier of AI as humanly
5:31 possible. Which is why I co-founded
5:33 10x.co, Co. which is an AI
5:35 transformation firm helping mid-market
5:37 and enterprise companies learn how to
5:39 use AI within their business. And I
5:41 spend basically all of my time now with
5:43 AI engineers like yourselves. I'm the
5:45 only non-technical person in the
5:46 business and I wouldn't have it any
5:50 other way. So as you know this year has
5:53 been a banner year for the industry. And
5:55 I would think of today as both a look
5:58 back on where we've been as well as a
6:00 tactical view of where we are headed in
6:03 companies small and large, old and new.
6:05 We're going to hear from the labs. We'll
6:07 hear from Unicorn AI startups. We'll
6:10 hear from academics, big-time management
6:13 consultants, and Fortune50 brands. But
6:15 before we do that, we have to give the
6:17 brands that made this day possible their
6:20 flowers. So, let's go into it. Let's
6:22 give it up for Google DeepMind. today's
6:25 presenting sponsor. [applause]
6:29 Love it. Keep it going for Anthropic,
6:32 the platinum sponsor for the day. [applause]
6:32 [applause]
6:34 And then one more round of applause for
6:37 all of the gold and silver sponsors who
6:39 you can meet in the expo downstairs
6:41 throughout the day. One more. Let's keep
6:47 Are you guys ready to do the damn thing?
6:49 >> Let's do it. To kick things off, let's
6:51 give a huge welcome to head of
6:53 engineering of the Clawed developer
6:56 platform, Caitlyn Les. Let's welcome
7:14 Good morning. Um, so first let's give a
7:16 huge thank you to Swix and the whole AI
7:18 engineer organizing team for bringing us
7:25 I'm Caitlyn and I lead the claw
7:27 developer platform team at Anthropic.
7:29 Um, so let's start with a show of hands.
7:31 Who here is integrated against an LLM
7:34 API to build agents?
7:36 Okay, I'm talking to the right people.
7:38 Love it. Um, so today I want to share
7:40 how we're evolving our platform to help
7:42 you build really powerful agentic
7:45 systems using claude.
7:47 So we love working with developers who
7:49 do what we call raising the ceiling of
7:51 intelligence. They're always trying to
7:52 be on the frontier. They're always
7:54 trying to get the best out of our models
7:56 and build the most high performing
7:58 systems. Um, and so I want to walk you
8:00 through how we're building a platform
8:01 that helps you get the best out of
8:03 cloud. Um, and I'm going to do that
8:05 using a product that you hopefully have
8:07 all heard of before. Um it's an agenda
8:09 coding product. We love it a lot and
8:15 So when we think about maximizing
8:17 performance um from our models, we think
8:19 about building a platform that helps you
8:21 do three things. Um so first the
8:23 platform helps you harness Claude's
8:25 capabilities. We're training Claude to
8:27 get good at a lot of stuff and we need
8:29 to give you the tools in our API to use
8:31 the things that Claude is actually
8:33 getting good at. Next, we help you
8:36 manage Claude's context window. Keeping
8:38 the right context in the window at any
8:40 given time is really really critical to
8:43 getting the best outcomes from Claude.
8:44 And third, we're really excited about
8:46 this lately. We think you should just
8:48 give Claude a computer and let it do its
8:50 thing. So I'll talk about how we're
8:52 we're evolving the platform to give you
8:53 the infrastructure and otherwise that
9:00 So starting with harnessing Claude's
9:02 capabilities. Um, so we're getting
9:04 Claude really good at a bunch of stuff
9:06 and here are the ways that we expose
9:08 that to you um in our API as ideally
9:11 customizable features. So here's a first
9:14 example um relatively basic. Claude got
9:16 good at thinking um and Claude's
9:19 performance on various tasks um scales
9:20 with the amount of time you give it to
9:23 reason through those problems. Um, and
9:25 so, uh, we expose this to you as an API
9:27 feature that you can decide, do you want
9:29 Claude to think longer for something
9:31 more complex or do you want Claude to
9:33 just give you a quick answer. Um, we
9:36 also expose this with a budget. Um, so
9:38 you can tell Claude how many tokens to
9:40 essentially spend on thinking. Um, and
9:42 so for cloud code, um, pretty good
9:44 example. Obviously, you're often
9:46 debugging pretty complex systems with
9:49 cloud code or sometimes you just want a
9:50 quick, um, answer to the thing you're
9:53 trying to do. And so, um, Claude Code
9:54 takes advantage of this feature in our
9:57 API to decide whether or not to have
10:00 Claude think longer.
10:02 Another basic example is tool use.
10:04 Claude has gotten really good at
10:07 reliably calling tools. Um, so we expose
10:09 this in our API with both our own
10:12 built-in tools like our web search tool,
10:14 um, as well as the ability to create
10:16 your own custom tools. You just define a
10:18 name, a description, and an input
10:20 schema. Um, and Claude is pretty good at
10:22 reliably knowing when to actually go um,
10:24 and call those tools and pass the right
10:26 arguments. So, this is relevant for
10:29 cloud code. Cloud code has many, many,
10:31 many tools and it's calling them all the
10:33 time to do things like read files,
10:36 search for files, write to files, um,
10:38 and do stuff like rerun tests and otherwise.
10:41 otherwise.
10:42 So, the next way we're evolving the
10:44 platform to help you ma maximize
10:46 intelligence from claude um, is helping
10:48 you manage Claude's context window.
10:50 Getting the right context at the right
10:52 time in the window is one of the most
10:53 important things that you can do to
10:56 maximize performance.
10:58 But context management is really complex
11:00 to get right. Um especially for a coding
11:03 agent like claude code. You've got your
11:04 technical designs, you've got your
11:06 entire code base. Um you've got
11:08 instructions, you've got tool calls. All
11:10 these things might be in the window at
11:12 any given time. And so how do you make
11:14 sure the right set of those things are
11:16 in the window? Um, so getting that
11:18 context right and keeping it optimized
11:19 over time is something that we've
11:22 thought a lot about.
11:25 So let's start with MCP model context
11:27 protocol. We introduced this a year ago
11:28 and it's been really cool to see the
11:32 community swarm around adopting um MCP
11:34 as a standardized way for agents to
11:37 interact with external systems. Um, and
11:40 so for cloud code, you might imagine
11:42 GitHub or Sentry. there are plenty of
11:44 places kind of outside of the agent's
11:46 context where there might be additional
11:48 information or tools or otherwise that
11:50 you want your agent to be able to
11:52 interact with or the cloud code agent to
11:54 be able to interact with. Um, and so
11:55 this will obviously get you much better
11:57 performance than an agent that only sees
11:59 the things that are in its window as a
12:05 Uh, so the next thing is memory. So, if
12:07 you can use tools like MCP to get
12:10 context into your window, we introduced
12:12 a memory tool to help you actually keep
12:14 context outside of the window that
12:16 Claude knows how to pull back into the
12:18 window only when it actually needs it.
12:20 Um, and so we introduced the first
12:22 iteration of our memory tool as
12:24 essentially a clientside file system.
12:26 So, you control your data, but Claude is
12:28 good at knowing, oh, this is like a good
12:30 thing that I should store away for
12:32 later. And then, uh, it knows when to
12:34 pull that context back in. So for cloud
12:37 code, you could imagine um your patterns
12:39 for your codebase or maybe your
12:41 preferences for your git workflows.
12:42 These are all things that claude can
12:45 store away in memory and pull back in
12:50 And so the third thing is context
12:52 editing. If memory helps you keep stuff
12:54 outside the window and pull it back in
12:57 when it makes sense, context editing
12:59 helps you clear stuff out that's not
13:00 relevant right now and shouldn't be in
13:02 the window. Um, so our first iteration
13:04 of our context editing is just clearing
13:07 out old tool results. Um, and we did
13:08 this because tool results can actually
13:10 just be really large and take up a lot
13:12 of space in the window. And we found
13:14 that tool results from past calls are
13:16 not necessarily super relevant to help
13:19 claude get good responses later on in a
13:20 session. And so you can think about for
13:23 cloud code, cla code is calling hundreds
13:26 of tools. Um, those files that it read
13:27 otherwise, all these things are taking
13:30 up space within the window. Um so they
13:32 take advantage of um context management
13:39 And so um we found that if we combined
13:42 our memory tool with context editing, we
13:46 saw a 39% bump in performance over over
13:49 the benchmark on our own internal evals.
13:51 Um which was really really huge. And so
13:52 it just kind of shows you the importance
13:54 of keeping things in the window that are
13:57 only relevant at any given time. And
13:59 we're expanding on this by giving you
14:01 larger context windows. So for some of
14:03 our models, you can have a million token
14:05 context window. Combining that larger
14:07 window with the tools to actually edit
14:09 what's in your window maximizes your
14:11 performance. Um, and over time, we're
14:12 teaching Claude to get better and better
14:14 at actually understanding what's in its
14:17 context window. So maybe it has a lot of
14:18 room to run, maybe it's almost out of
14:20 space. Um, and Claude will respond
14:23 accordingly depending on how much time
14:25 uh or how much room it has left in the window.
14:30 So, here's the third thing. Um, we think
14:32 you should give Claude a computer and
14:33 just let it do its thing. We're really
14:35 excited about this one. Um, because
14:37 there's a lot of discourse right now
14:39 around agent harnesses. Um, you know,
14:41 how much scaffolding should you have?
14:43 How opinionated should it be? Should it
14:46 be heavy? Should it be light? Um, and I
14:49 think at the end of the day, Claude has
14:50 access to writing code. And if Claude
14:53 has access to running that same code, it
14:54 can accomplish anything. you can get
14:56 really great professional outputs for
14:57 the things that you're doing just by
15:00 giving Claude runway to go and do that.
15:01 But the challenge for letting you do
15:03 that is actually the infrastructure as
15:05 well as stuff like expertise like how do
15:07 you give claude access to things that um
15:09 when it's using a computer it will get
15:12 you better results.
15:14 So a fun story is we recently launched
15:17 cloud code on web and mobile. Um and
15:18 this was a fun project for our team
15:20 because we had a lot of problems to
15:22 solve. When you're running cloud code
15:24 locally, cloud code is essentially using
15:27 your machine as its computer. But if
15:29 you're starting a session on the web or
15:31 on mobile and then you're walking away,
15:32 what's happening? Like where is that
15:34 where is um cloud code running? Where is
15:37 it doing its work? Um and so we had some
15:39 hard problems to solve. We needed a
15:41 secure environment for claude to be able
15:42 to write and run code that's not
15:45 necessarily like approved code by you.
15:47 Um we needed to solve or container
15:50 orchestration at scale. Um and we needed
15:52 session persistence um because uh we
15:54 launched this and many of you were
15:55 excited about it and started many many
15:57 sessions and walked away and we had to
15:59 make sure that um all of these things
16:01 were ready to go when you came back and
16:02 um wanted to see the results of what
16:05 Claude did.
16:08 So one key primitive in this is our code
16:10 execution tool. Um so we released our
16:13 code execution tool in the API um which
16:15 allows claw to run write code and run
16:17 that code in a secure sandboxed
16:20 environment. Um, so our platform handles
16:22 containers, it handles security, and you
16:23 don't have to think about these things
16:25 because they're running on our servers.
16:28 Um, so you can imagine deciding that um,
16:30 you you want Claude to write some code
16:32 and you want Claude to go and be able to
16:34 run that code. And for cloud code,
16:36 there's plenty of examples here. Um,
16:38 like make an animation more sparkly that
16:40 uh, you want Claude to actually be able
16:42 to run that code. Um, so we really think
16:44 the future of agents is letting the
16:46 model work pretty autonomously within a
16:47 sandbox environment and we're giving you
16:49 the infrastructure to be able to do that.
16:52 that.
16:54 And this gets really powerful once you
16:56 think about giving the model actual
16:58 domain expertise in the things that
16:59 you're trying to do. So we recently
17:01 released agent skills which you can use
17:04 in combination with our code execution
17:06 tool. Skills are basically just folders
17:09 of scripts, instructions, and resources
17:11 that Claude has access to and can decide
17:14 to run within its sandbox environment.
17:16 Um, it decides to do that based on the
17:18 request that you gave it as well as the
17:20 description of a skill. Um, and Claude
17:22 is really good at knowing like this is
17:24 the right time to pull this skill into
17:26 context and go ahead and use it. And you
17:29 can combine skills with tools like MCP.
17:31 So MCP gives you access to tools and
17:34 access to context. Um, and then skills
17:35 give you the expertise to actually make
17:37 use of those tools and make use of that
17:40 context. Um, and so for cloud code, a
17:42 good example is web design. Maybe
17:44 whenever you launch a new product or a
17:46 new feature, um, you build landing
17:47 pages. And when you build those landing
17:49 pages, you want them to follow your
17:51 design system and you want them to
17:53 follow the patterns that you've set out.
17:56 Um, and so Claude will know, okay, I'm
17:57 being told to build a landing page. This
17:59 is a good time to pull in the web design
18:02 skill. um and use the right patterns and
18:04 and design system for that landing page.
18:06 Uh tomorrow Barry and Mahes from our
18:08 team are giving a talk on skills.
18:10 They'll go much deeper and I definitely
18:14 recommend checking that out.
18:15 So these are the ways that we're
18:17 evolving our platform um to help you
18:19 take advantage of everything that Claude
18:21 can do to get the absolute best
18:22 performance for the things that you're
18:24 building. First, harnessing Claude's
18:27 capabilities. So, as our research team
18:29 trains Claude, we give you the API
18:30 features to take advantage of those
18:33 things. Next, managing Claude's context.
18:35 It's really, really important to keep
18:37 your context window clean with the right
18:40 context at the right time. And third,
18:41 giving Claude a computer and just
18:46 So, we're going to keep evolving our
18:49 platform. Um, as Claude gets better and
18:51 has more capabilities and gets better at
18:53 the capabilities it already has, we'll
18:55 continue to evolve the API around that
18:57 so that you can stay on the frontier and
18:59 take advantage of the best that Claude
19:04 has to offer. Um, second, as uh, memory
19:06 and context evolve, we're going to up
19:08 the ante on the tools that we give you
19:10 in order to let Claude decide what to
19:12 pull in, what to store away for later,
19:13 and what to clean out of the context
19:15 window. And third, we're really going to
19:18 keep leaning into agent infrastructure.
19:19 Some of the biggest problems with the
19:21 idea of just let Claude have a computer
19:23 and do its thing are those problems that
19:25 I talked about around orchestration,
19:27 secure environments, and sandboxing. And
19:29 so we're going to keep working um to
19:32 make sure that those are um ready for
19:35 you to take advantage of.
19:37 Um and I'm hiring. We're hiring at
19:39 Anthropic. We're really growing our
19:41 team. Um, and so if you're someone who
19:44 loves um, building delightful developer
19:46 products um, and if you're excited about
19:47 what we're doing with Claude, we would
19:50 love to work with you across end product
19:53 design um, Devril, lots of functions. So
19:56 please reach out to us
20:09 Our next [music] presenter is the
20:13 president and head of AI at Replet. He's
20:15 here to speak about building the future
20:17 of coding. Please join me in welcoming
20:32 All right, good morning everyone. So at
20:35 Replet we're building a coding agent for
20:38 nontechnical users. It's a very peculiar
20:39 challenge I would say compared to many
20:41 people in this room. And what I'm going
20:43 to talk about today is why autonomy has
20:46 become kind of the northstar that we
20:47 keep chasing you know since we launched
20:49 the very first version of rapid agent
20:52 September last year.
20:56 Let's start from this very interesting
20:59 plot in case my clicker worked which now
21:01 does. Um I'm sure you all have seen it.
21:04 you know the semiacing value of that
21:06 published by Zwix a few weeks ago and it
21:08 kind of clarified a bit the landscape
21:11 you know for all of us uh agent builders
21:14 on one hand you have the low latency
21:15 interactions that really allow you to
21:17 stay in the loop you know so you can do
21:19 deep work and focus really on the on the
21:21 coding task at hand but you need to be
21:23 an expert you need to know exactly what
21:25 to prom the model for and you need to
21:26 understand quickly if you want to accept
21:29 the changes or not then for several
21:31 months many of us including rapid We
21:34 kind of live in this I think value that
21:36 where the agent wasn't autonomous enough
21:39 to really delegate a task and come back
21:41 and see it accomplished but at the same
21:44 time it run long enough not to keep in
21:46 the zone not to keep in the loop likely
21:48 over time we managed to go all the way
21:50 on the right and now we have agents that
21:52 runs for several hours in a row. What
21:54 I'm going to be arguing with today and
21:56 hope is not going to stop inviting me to
21:58 this event is the fact that there is an
22:00 additional dimension like a third
22:02 dimension to this plot that you know it
22:04 hasn't been covered here and namely the
22:06 fact is how do we build autonomous
22:10 agents for nontechnical users.
22:12 So what I'm going to be arguing today is
22:14 that there are two types of autonomy.
22:17 One of it is more supervised. So think
22:20 of the you know Tesla FSD example. When
22:22 you sit in a Tesla, you're still
22:24 expected to have a driving license.
22:25 You're going to be sitting in front of
22:28 the steering wheel. Perhaps 99% of the
22:29 time, you're not going to use it, but
22:31 you're there in order to take care of
22:34 the longtail events. And similarly, a
22:36 lot of the coding agents that we have
22:38 today require you to be technically
22:41 savvy in order to use them correctly.
22:44 We at Reply and uh other companies at
22:46 this point are focusing on kind of the
22:48 Whimo experience for autonomous coding
22:51 agents. So you're expected to sit in the
22:53 back. You don't even have access to the
22:55 steering wheel. And I expect you
22:56 basically not to need any driving
22:59 license. Uh why is this important?
23:01 Because we want to empower every
23:03 knowledge worker to create software. And
23:05 I can't expect knowledge workers to know
23:07 what kind of technical decisions an
23:08 agent should be making. We should
23:10 offload completely the level of
23:12 complexity away from them.
23:14 Of course, it took a while to get here.
23:16 So I'm I'm sure what I'm showing you
23:18 here is something that all of you are
23:20 very familiar with. It took several
23:24 years to go from I know maybe less than
23:25 a minute feedback loop constant
23:27 supervision and talking about
23:28 completions and talking about
23:30 assistance. These are areas where the AI
23:33 power is and really been pioneering this
23:37 this type of user interaction. Then we
23:39 slowly climbed through you know higher
23:41 levels of autonomy. So we had the first
23:43 version of the agents based on on react.
23:45 So we concocted autonomy with a very
23:49 simple paradigm on top of LMS. Then
23:51 likely AI providers understood that tool
23:53 calling was extremely important poured a
23:55 lot of effort on that. So we built the
23:57 next version of agents with native tool
23:58 calling. And then I would say there is a
24:01 third generation of agents which I call
24:03 autonomous and that's when we started to
24:05 break the barrier of say one hour of
24:07 autonomy. Basically the the agent being
24:09 capable of running on long horizon tasks
24:12 and remaining coherent. It happens to be
24:13 the case that those are also the
24:14 versions of rapid agent that we launched
24:17 over the last year. So the B3 is the one
24:19 that we launched a couple of months ago
24:21 and it has exactly showcases those
24:24 properties. So the question for today is
24:26 can we actually build fully autonomous
24:29 agents and how do we get there.
24:32 So I'm going to try to redefine the
24:33 definition of autonomy today. I think
24:36 that often times we conflate autonomy
24:38 with a concept of something in the lungs
24:41 for a for a lot of time and usually as a
24:45 user you lose control. In reality what
24:47 the autonomy that I want to give to
24:50 agents can be very specifically scoped
24:53 and what I mean by that is especially
24:55 with rapid agent tree what we accomplish
24:57 is we we make sure that our agent takes
24:59 holy technical decisions. Of course,
25:02 that could lead to very long gap between
25:03 the different user interactions and in
25:05 case the agent again runs for several
25:07 hours. But this happens if and only if
25:09 the scope of the task you're giving to
25:12 the agent is really broad. And it turns
25:13 out that in reality you can have an
25:15 agent that is really autonomous and is
25:18 still fast as long as you give it a very
25:19 narrow scope for the task, you know, at
25:23 hand. So what we can accomplish in this
25:25 way is that the user still maintains
25:26 control on the aspects that they care
25:28 about and a user cares about what
25:30 they're building. Especially again our
25:31 users, knowledge workers, they don't
25:34 care about how something has been built.
25:35 They just want to see their goals to be
25:38 accomplished. So autonomy should not be
25:41 basically conflated with long run times.
25:44 And similarly, it shouldn't become a
25:46 dity metric. You know, a lot of us are
25:48 talking about it as a as a badge of
25:49 honor. And it's definitely been exciting
25:51 to see in the last few months that you
25:53 know many of us broke the barrier of uh
25:55 running several hours in a row. But I
25:58 think in terms of how to build agents
25:59 that are going to be more powerful and
26:01 more suitable in the future, we kind of
26:04 have to change a bit uh the the target
26:06 the metric that we that we keep in mind.
26:09 So think about it in this way. Tasks
26:11 have a natural level of complexity and
26:13 basically what we care about is that
26:15 they have a minimum irreducible amount
26:18 of work that they express. What agents
26:19 do is that they always go through this
26:21 loop of planning, implementing and
26:24 testing. And of course to make this
26:25 happen and to make it work correctly,
26:27 you want this work to be happening over
26:30 a long quing trajectory. So our goal is
26:33 to maximize the reducible runtime of the
26:36 agent. By reducible, I mean having a
26:37 span of time where the user doesn't have
26:40 to make any technical decisions and the
26:42 agent can accomplish the task again in
26:44 full autonomy. This is especially
26:46 important for us because I can't trust
26:48 our users to make technical decisions.
26:50 So they they need a proper technical
26:52 collaborator by their side. I want to
26:55 abstract away as much complexity as
26:56 possible from the process of software
27:00 creation. And last but not least, I want
27:02 the users to feel in control of what
27:05 they're creating without startling their
27:06 creativity because they have also to
27:08 think about the technical decision that
27:10 the agent is making.
27:13 So now what are the pillars of autonomy?
27:15 How are we making this happen? I would
27:17 say there are three pillars that are
27:19 extremely important to think about. The
27:21 first one is of course the capabilities
27:23 of frontier models like the baseline IQ
27:26 that we inject in the main agentic loop.
27:28 I'm going to leave this as an exercise
27:29 to the reader and to other people in the
27:31 room. I'm really glad a lot of you are
27:33 building amazing models that you know we
27:35 use all the time at Rabbit. So this is
27:37 the pillar number one. The second pillar
27:40 is verification. It's very important
27:43 that we test for local correctness of
27:45 our agent at every step that it takes
27:47 and the reason is fairly intuitive. If
27:48 you are building on very shaky
27:50 foundations, eventually the castle will
27:54 topple down. So we brought verification
27:56 in the loop to make sure that in a sense
27:57 you are having you know nines or
27:59 reliability whereing the compounding
28:01 errors that an agent will make
28:03 unavoidably if you know you don't put
28:05 any control on it. And last but not
28:07 least, you heard it on stage even
28:08 earlier. I'm sure are going to be
28:09 hearing this you know the entire day or
28:11 the entire duration of the conference.
28:14 Uh the importance of context management.
28:16 So on one end you want to have an agent
28:17 that is capable of being globally
28:19 coherent. So it's align with the intent
28:21 of the user the expectation of the user
28:23 but at the same time it is also to be
28:25 capable of managing both the high level
28:27 goal and the single task that the agent
28:29 is working on. I think we made amazing
28:31 progress in the last months on context
28:33 management. But I'm also excited to see
28:36 you know where we're going as a field.
28:38 Let's start from the first pillar that
28:40 we work actively at rapid which is verification.
28:42 verification.
28:45 So why did we focus on this? Over the
28:49 know last year we realize something that
28:51 I think each one of you has experienced.
28:53 So without testing agents build a lot of
28:56 painted doors. In our case the painted
28:58 doors are very visible because we create
29:00 a lot of web applications. So you end up
29:02 basically trying to click on a button
29:04 and the handler is not looked up or some
29:06 of the data that we're showing is
29:08 actually mock data and it's not coming
29:10 it's not coming from a database. But in
29:11 general this phenomenon spans you know
29:13 across every type of component you're
29:16 building being it front end or back end
29:17 a lot of components are actually not
29:21 fully fleshed uh by the agent. So we run
29:22 some evaluations internally. We found
29:25 out that more than 30% of the individual
29:27 features happen to be broken know the
29:29 first time that are cooked by the agent.
29:32 And that also means that almost every
29:34 applications have at least one broken
29:37 feature or painted door. They're hard to
29:40 find. The reason is users are not going
29:42 to spend time testing every single
29:44 button, every single field. And this is
29:47 also probably one of the reasons why a
29:49 lot of our users, especially the
29:51 nontechnical ones, still can't trust
29:53 coding agents very much. They are
29:54 shocked when they find that there is a
29:57 painted door out there. So, how do we
29:59 solve this problem?
30:01 Fundamentally, we need an agent must
30:03 gather all the feedback that they need
30:05 from their environment, right? It's
30:08 easier said than done. Um again
30:10 nontechnical users not only cannot make
30:12 technical decisions but also they cannot
30:14 provide the technical feedback that you
30:16 know an agent is required to make
30:18 progress and most what they can do is
30:20 basic you know quality assurance
30:22 testing. They can literally go around
30:24 the UI click interact with the
30:26 application. I'm I'm sure you have tried
30:28 it in your life. This is extremely
30:30 tedious to do and it leads to a very bad
30:32 user experience. And even though we
30:34 relied on that with our first release of
30:36 the agent last year, quickly we found
30:38 out that users don't want to spend time
30:40 doing testing. So we had to find a
30:42 complete, you know, orthogonal solution
30:45 to that which is autonomous testing and
30:47 it solves several different issues. The
30:50 first one is it breaks the feedback
30:52 bottleneck. Even if again we ask
30:54 feedback to the user, we were not given
30:56 enough of that. Now we don't have to
30:58 wait anymore for human feedback. we have
31:00 a way to elicit as much information as
31:03 possible from the app autonomously. We
31:05 also want to prevent the accumulation of
31:07 small errors. What I was saying before,
31:08 we don't want to have compounding errors
31:10 while the agent is building. And last
31:12 but not least, we have to overcome the
31:14 laziness of frontier models. So we need
31:16 to verify that whenever a model tells us
31:18 that a task has been completed, there is
31:20 actually the truth and that result is
31:23 not being elucinated.
31:25 There is a wide spectrum of code
31:27 verification that you know you you can
31:29 accomplish. I think we all started from
31:31 the very left. You know you have basic
31:33 study code analysis with LSPs. We have
31:35 been executing the code since we had
31:37 basically lams that were capable of
31:39 debugging and then we slowly started to
31:41 move towards the right. So generating
31:43 unit tests and running them it has a
31:45 limitation. It's limited only to
31:47 functional correctness. Uh unit testing
31:49 is not very powerful to do like proper
31:52 integration testing by definition. We
31:54 started also to do now API testing but
31:56 it's only limited to API code. So you
31:59 can test endpoint of an applications you
32:01 can't really test how a web app
32:04 functions and looks like and for this
32:07 reason in the last few months has and
32:09 other companies are putting a lot of
32:11 effort in really creating autonomous
32:13 testing based on the browser you know in
32:14 case the app that we're building is a
32:16 web application. There are two main
32:18 categories here. One is computer use.
32:20 It's a onetoone mapping with user
32:22 interface. So the model is directly
32:24 interacting with the application. It
32:26 requires screenshots. It tends to be
32:28 fairly expensive and fairly slow. I'm
32:31 sure you you tested it yourself. A good
32:33 way in the middle is browser use where
32:36 we simulate the user interface. You can
32:38 then interact with the browser and with
32:40 the web application and it relies on
32:41 basically accessing the DOM through abstractions.
32:44 abstractions.
32:46 So how do we how do we make this work in
32:49 Weblet? Um what we do is that we
32:51 generate applications that are amenable
32:54 to testing and we sort of merge
32:56 everything together from the previous
32:59 slides that I showed you. So we allow
33:01 the our testing agent to interact with
33:03 an application and gather screenshots in
33:05 case nothing has worked. So we have a
33:07 full back to computer use. But the vast
33:09 majority of times what we do is that we
33:11 have programmatic interactions with the
33:12 applications. So we interact with the
33:15 database, we read the logs, we do API
33:18 calls, we literally click on the app and
33:20 get back all the information that we
33:21 need. And by putting all of this
33:24 together, we collect enough feedback
33:27 that allows our agent both to make
33:29 progress and also to fix all the painted
33:32 doors that it encounters.
33:36 Just a know short technical deep dive on
33:38 how we accomplish this. I'm sure you
33:41 have seen a lot of the toolbased uh
33:43 browser use. There are amazing libraries
33:46 out there. First one comes to man stage
33:48 and the idea is that you have an agent
33:50 that has a few very generic tools
33:53 exposed. So know the agent can create a
33:56 new tab, can click, can fill forms etc
33:58 etc. The limitation here is that it's
34:00 difficult to enumerate all the different
34:02 type of interactions you could be having
34:04 with a browser. The problem of testing
34:07 is very similar to the Tesla analogy I
34:09 was making before. Maybe this cardality
34:12 of tools available is enough for 99% of
34:14 the interaction types. But then there is
34:17 always a long tail of idiosyncratic
34:18 interactions that a user makes with the
34:20 with a web application that are hard to
34:23 map into these tool these different tool
34:26 codes. So what we do uh in our case at
34:30 rapid is we directly write playrite code
34:32 and playwrite code is first of all very
34:35 manable for LLMs. LLMs are kind of
34:36 amazing at writing playright. You know
34:38 this is the experience that we had uh
34:40 since we started to work on this project
34:43 is also very powerful and expressive. So
34:45 in a sense it's a super set of what you
34:48 can express uh on the compared to the
34:51 left on the tools uh testing. And last
34:53 but not least, there is beauty in
34:55 creating playright code because you can
34:57 reuse those tests. The moment you write
34:59 a test in script, then you can rerun it
35:00 as many times as you want. So in a
35:02 sense, the moment you created a test,
35:04 you're also creating a regression test
35:06 suite that you can keep running in the
35:10 future. And all these kind of uh tricks
35:12 that I explained to you right now, they
35:14 helped us to create something that is
35:16 roughly a order of magnitude cheaper and
35:18 faster compared to computer use. And
35:20 we'll go back later on how important
35:22 latency is.
35:24 The second thing that the second pillar
35:25 that I wanted to talk about today of
35:26 course is context management. And I'm
35:29 going to go very fast here because I
35:30 think you're going to be hearing a lot
35:33 of talks today about it. The the high
35:36 level message here is that long context
35:38 models are not needed to work on quer
35:40 and long trajectories. Uh from
35:42 experience we found that most of the
35:44 tasks even the more ambitious one can be
35:47 accomplished within the 200,000 tokens.
35:49 So we're still not in a world where
35:52 working with models that have 10 million
35:54 or 100 million uh context windows is
35:56 necessary to actually run autonomous
35:59 agents. And we accomplish this by means
36:01 of learning how to do context management
36:04 correctly. So first of all, there are
36:06 several different ways to maintain state
36:09 which don't imply chucking all the state
36:11 into your context window. You can do
36:13 that for example by using the codebase
36:15 itself to maintain state. So you can
36:18 write documentation while the agent is
36:20 creating new code. You can also include
36:22 the plan description and all the
36:23 different task list that the agent is
36:25 working on. You can persist them on the
36:27 file system. So even there like have a
36:29 lot of ways to offload your memories.
36:30 And last but not least and this is
36:32 something I think you know Antropic has
36:35 been uh really evangelizing about um you
36:37 can even dump directly your memories in
36:39 the file system and then making sure
36:41 that your agent decides when to write
36:42 them back the moment they become
36:45 relevant to your work. So for this
36:46 reason we have been seeing a lot of
36:48 announcements in the last couple of
36:50 months. I just picked this one from
36:52 entropic you know with cloth sonet 4.7
36:56 so I wish 4.5 uh they have been able to
36:59 run uh focus task for more than 30 hours
37:01 in a row we have seen similar results
37:04 from open AI on the math problems. So I
37:06 think we we kind of broke the barrier of
37:08 running for long and you know being able
37:10 to have querant tasks.
37:12 I would say the key ingredient to make
37:15 this happen has been how good models
37:17 hand as agent builders have become in
37:20 doing sub agent orchestration. Subages
37:22 basically work by means of they are
37:24 invoked in the core loop. So it's a
37:26 completely it's starting from a blank
37:28 slate uh from a completely fresh
37:30 context. You as an agent builder decide
37:32 what subset of the context to inject
37:35 when this sub agent starts. And it's a
37:36 concept that is very similar I think to
37:38 everyone who's been writing software you
37:39 know in the last decades is separation
37:42 of concerns. So you decide what your sub
37:43 engine is going to be working on. You
37:44 give it the least possible amount of
37:46 context. You allow it to run to
37:48 completion. You only get the output the
37:50 results. You inject them back into the
37:52 main loop and you keep running in this
37:54 way. Of course it significantly improves
37:57 the number of memories per compression.
37:59 I just brought this plot from directly
38:02 from reputation run in production. The
38:04 moment we kicked in our new subvision
38:07 orchestrator on the ax on the y-axis you
38:09 can see the number of memories per
38:11 compression. So we went from roughly 35
38:16 to 4550 recently. So big improvement in
38:19 terms of how often we are recompressing
38:22 our context just because we can offload
38:24 a lot of the context pollution by means
38:27 of using sub aents.
38:29 I'm going to give an example where this
38:30 made the difference for us. You know
38:32 what I'm showing you here is more kind
38:34 of a cost optimization in a sense like
38:36 you're compressing less. You also have
38:38 separation of concerns which definitely
38:40 make your agent a bit smarter. In the
38:42 case of testing
38:45 working with sub agent was almost
38:46 mandatory for us and basically we
38:48 started to work on automated testing
38:50 even before we were very advanced in
38:52 terms of subent orchestration. And what
38:55 we found out is of course again as I was
38:57 saying before it makes things easier
39:00 better cost less pollution but when you
39:03 allow the main loop not only to create
39:05 code but also to do browser opt browser
39:08 actions to put back the observation of
39:10 your browser actions into the main loop
39:12 you tend to confuse the the hent loop
39:14 very much because at this point there is
39:15 a lot of heterogeneity in terms of the
39:17 action that your main loop is looking
39:20 at. So in order to make this work not
39:22 only we have to build all the playright
39:23 framework that I was showing to you
39:25 before but we also have to move our
39:27 entire architecture into sub agents. So
39:29 at this point you can see very clearly
39:30 why there is a separation of concern
39:33 here. Get the main agent loop running.
39:35 We decide at a certain point that it's
39:37 time to verify if the output of the
39:39 agent has been correct. We make this
39:41 happen all within a sub agent. Then we
39:42 scratch the context window of that sub
39:44 agent. We just return back the last
39:46 observation to the agent loop and then
39:49 we keep running in that way. So if
39:51 you're having issues today making your
39:53 sub agents uh work correctly, this is
39:55 one of the reasons why that you want to
39:57 take a look at.
40:00 So I think we covered the high level of
40:02 how to create more and more powerful uh
40:05 autonomous agents over time and I only
40:07 see us as a field becoming even more
40:09 proficient than that in the next months.
40:11 There is one additional ingredient
40:12 though that is going to make the
40:14 difference and it's parallelism. And I
40:16 will argue that parallelism is important
40:19 not because it's going to make agents
40:21 more powerful per se, but rather because
40:23 it's going to make the user experience
40:27 more exciting. So of course it is great
40:29 to have an agent that is capable of
40:31 running autonomously for long, but at
40:33 the same time it comes with the price of
40:34 making the user experience less
40:37 thrilling. You are not in the zone
40:39 anymore. What you do is that you write a
40:41 very long prompt. It's translated into a
40:44 task list. Uh and then you go to have
40:45 lunch with your colleagues and then you
40:47 come back and you hope that the agent is
40:48 done. That is not the kind of experience
40:50 that most of the productive people want
40:52 to have in life. You know, you want to
40:53 see as much work as done as possible in
40:56 the shortest span of time.
40:59 So what we do as a as a field at this
41:00 point has been to create parallel
41:03 agents. It's a very common trade-off
41:04 which by the way doesn't only apply to
41:06 agents. it it applies to computing in
41:09 general and for parallel agents what you
41:12 do is that you you trade off basically
41:14 extra compute in exchange for time. Why
41:16 there is this trade-off? So first of all
41:18 when you're running agents in parallel
41:21 you're gathering the same context in
41:23 multiple context windows. So every
41:25 single parallel agent that you will be
41:27 running probably shares say 80% of the
41:29 context across the board. So of course
41:32 you are just putting more computed work
41:34 because you're running those agents in
41:36 parallel. There is also another cost
41:39 that is kind of intangible for a lot of
41:40 you here in the room because I'm sure
41:43 you're all expert software developers.
41:45 But what do you do with the output of
41:47 multiple par agents at the end? Often
41:49 times you need to resolve merge
41:51 conflicts. So as a reminder, my users
41:53 don't even know what's the concept of
41:54 merge conflicts. It's something that I
41:58 have to figure out on our own. So the
41:59 current way in which we think of
42:01 parallel agents in in the space doesn't
42:04 really apply to rapid. Now at the same
42:05 time I still want to very much to
42:08 accomplish this. There are so many
42:10 interesting features that you can enable
42:11 with parallelism. Aside from the fact
42:14 that you can get more work done u at
42:16 times you want to you want testing to be
42:18 running in parallel with the agent that
42:20 creates code. Testing no matter how much
42:22 we optimize it is still very slow. If an
42:24 agent is only spending time on testing
42:26 users are not going to be engaging with
42:28 your application anymore. Um, at the
42:29 same time, it's also great to have a
42:31 synchronous process running while your
42:32 agent is running because you can inject
42:34 useful information back into the main
42:37 core loop. And last but not least is a
42:40 very common technique that we know boost
42:43 performance if you have enough budget to
42:45 do so. You should be sampling multiple
42:48 trajectories at the same time. So a lot
42:49 of perks are coming with parallel
42:52 agents. But u the way in which we
42:54 implement them today which I call
42:56 basically call user has an orchestrator
42:59 is the fact that tasks the parallel task
43:00 that you want to run are determined by
43:03 you by the user and each task is
43:05 dispatched in its own thread. So there
43:08 is a bit of manual process even the task
43:09 de composition in a sense is happening
43:11 in your mind while you're thinking about
43:14 which agents you want to run and then
43:16 the moment you get back all the results
43:17 you need to go through the problem of
43:20 merge conflicts and often times this is
43:22 not trivial at all no matter how many
43:24 amazing tools are out there. So what
43:27 we're working on today for our next
43:30 version of the agent is having the core
43:32 loop as the orchestrator. So the key
43:35 difference here is the fact that the the
43:36 subtask that we're going to be working
43:39 on are not determined by the user but
43:41 they are determined by the corion loop
43:43 and the parallelism is basically decided
43:46 on the fly. The agent does the task de
43:48 composition on behalf of the user and
43:50 this comes with a couple of advantages.
43:52 First of all again there's no cognitive
43:54 burden to for the user to understand how
43:57 they should be decomposing the task. At
43:59 the same time also there are ways in
44:03 which you can create tasks that sort of
44:05 mitigate the problem of merge conflicts.
44:07 I'm not claiming that we're going to be
44:09 able to mitigate it 100%. There are so
44:11 many corner cases in which merge
44:13 conflict will still represent a problem
44:14 but there are a lot of different
44:16 techniques known in software engineering
44:18 to make sure that you can try to have
44:20 multiple subage and not stepping on each
44:23 other toes. So the core loop as an
44:26 orchestrator is going to be the our main
44:29 bet for the next few months.
44:30 And in case you're passionate about
44:32 these topics,
44:34 [music] I'm always hiring a rabbit.
44:37 Thank you. [applause]
44:39 From transforming support tickets into
44:41 merge requests to helping teams ship
44:44 fixes faster than ever, our next
44:46 presenter has been at the center of
44:49 Zapier's AI agent journey. Please
44:57 [music] [applause]
45:04 Hello.
45:06 I'm so excited to tell you about how at
45:08 Zapier we are empowering our support
45:11 team to ship code. Before I tell you
45:14 about that, has anybody here visited the
45:16 Grand Canyon?
45:18 It's a good amount. Anybody rafted
45:21 through the Grand Canyon?
45:24 I see one person. I just got off an
45:26 18-day trip rafting through the Grand
45:28 Canyon over 200 miles. It was
45:31 incredible. No internet, no cell
45:33 service. The moment I got off, I found
45:36 out I was giving this talk. I didn't
45:39 think about uh work at all on the river,
45:41 but once I got off, I started thinking
45:43 about the parallels between the Grand
45:46 Canyon and Zapier. And we have one thing
45:50 in common and that is erosion.
45:52 Now natural erosion happens over
45:55 millions of years with wind, water and
45:58 time. It creates the beautiful canyon
46:02 that we experience and it's never
46:04 stopping, always continuing. At Zapier,
46:07 we have over 8,000 integrations built on
46:10 third party APIs and they are constantly
46:13 changing, which I'm now thinking of as
46:15 app erosion.
46:17 We've been around for 14 years. Some of
46:19 our apps are that old. API changes and
46:23 deprecations impact us and create
46:26 reliability issues. Again, it never stops.
46:27 stops.
46:30 So, I like to think of our apps as like
46:33 layers in the Grand Canyon, and they
46:35 need constant attention.
46:38 So, if we were to create our own Zapier
46:40 Canyon and our apps would be at the
46:43 walls, here's our support team flowing
46:46 down the middle watching out for app
46:50 erosion. And we have a backlog crisis.
46:52 Tickets were coming in faster than we
46:54 could handle them.
46:57 Creates integration reliability issues,
47:00 poor customer experience, even churn. So
47:02 to solve for app erosion, we kicked off
47:07 two parallel experiments. The first was
47:09 moving support from just triaging to
47:12 also fixing these bugs. It's experiment
47:15 number one. Experiment number two, we
47:18 were asking can AI help solve app
47:20 erosion faster.
47:23 So let's jump into experiment one. This
47:25 get kicked off two years ago, but had to
47:27 start with the why. We needed to get
47:29 that buy in to empower our support team
47:32 to ship code.
47:35 So apparosion is one of the major
47:38 sources of bugs coming through to from
47:41 support to engineering. So there's a big
47:44 need support is eager [laughter] for
47:47 this experience to a lot of them want to
47:48 go into engineering eventually and
47:51 unofficially many support members were
47:54 already helping to maintain our apps.
47:56 This moves us into how we started this
48:00 out. Put on some guard rails. We started
48:03 with just four target apps to uh focus
48:06 our fixes on. Engineering was set to
48:08 review any merge requests coming from
48:10 support and we kept the focus on app fixes.
48:12 fixes.
48:14 So jumping into experiment 2, this is
48:15 what I've been leading for the last
48:18 couple of years. How can we use codegen
48:20 to help solve for app erosion? And so
48:23 fortuitously, the name of this project
48:27 is Scout, which ties in so well to the
48:28 Grand Canyon experience that I've just
48:30 been through.
48:33 As any good product manager, we start
48:36 with discovery. We did some dog fooding,
48:39 so I shipped some app fixes. Uh we
48:41 shadowed engineers and support team
48:43 members as they were going through the
48:47 app fix process. We designed out uh what
48:48 are the pain points experienced along
48:51 the way, what are the phases of the work
48:54 and how much time is spent.
48:57 One big discovery we had is how much
49:00 time is spent gathering the context
49:03 going to the third party AP API docs
49:06 even crawling the internet looking for
49:08 information about a bug that's emerging
49:09 maybe somebody else has already
49:10 discovered and solved for it outside of
49:15 Zapier. internal context, logs, all of
49:17 this is a lot of context to go and
49:21 search for as a human uh and a lot to
49:24 gro and work through. This is something
49:28 we knew we needed to solve for.
49:33 where we started with all this great uh
49:36 opportunities and pain points is we
49:39 started building APIs that we believed
49:42 would solve for these individual um pain
49:46 points and some of these APIs are using
49:51 LLMs to you know for our diagnosis tool
49:53 gathering all that context on behalf of
49:56 the uh support person engineer and
49:58 curating that context building a
50:00 diagnosis that's [clears throat] using
50:03 an LLM. And then some aren't like we
50:06 have a unit test uh or unit test
50:08 generator is, but the um test case
50:11 finder is simply using a search query to
50:13 look for the right test cases to pull in
50:17 for your unit test. We built a bunch of
50:20 APIs. We had a bunch of great ideas. So
50:22 there was a lot for us to test with, but
50:24 we ran into some challenges in this
50:26 first phase. We had APIs but they were
50:30 not embedded into our engineers process.
50:33 So our tool I just said they don't like
50:36 to go to so many web pages to find all
50:38 their context. They would love all this
50:40 information to come to them. And yet our
50:42 web interface where we've we've created
50:45 a playground we call autocode internally
50:47 where you can come and play around with
50:51 our APIs. And our ask to the teams was
50:54 come try out our APIs and give us feedback.
50:56 feedback.
50:58 Now this is just one more window to go
50:59 to. So we didn't get a lot of
51:02 engagement. Also because we had shipped
51:06 so many uh APIs our team was spread
51:09 pretty thin. Cursor launched at the same
51:12 time which has gotten great adoption at
51:15 Zapier. We're all huge fans of cursor.
51:16 But from our side, it made some of our
51:20 tools no longer necessary.
51:21 But there was one major win in this
51:24 phase, which is one of our APIs became a
51:27 support darling. It's diagnosis. That
51:29 number one pain point of needing to go
51:31 out and find all of your context, curate
51:33 it for yourself so you can start solving
51:37 the problem. We were doing that on uh
51:39 the support team's behalf with the
51:42 diagnosis API
51:45 and support loved it enough that they
51:48 decided to embed it into their process.
51:49 They asked us to build a zap year
51:52 integration on our autocode APIs so they
51:55 could embed it into their zap that
51:57 creates the jur ticket from the support
52:02 issue and now diagnosis is included.
52:05 So embedding tools is the key to usage
52:07 as we find out. So how can we embed more
52:11 of our tools? Well, then MCP spins up
52:14 and that solves our problem.
52:19 We can now embed these API tools into
52:21 our engineers workflow. Specifically,
52:24 our engineers are pulling in these MCP
52:27 tools as they're using Cursor.
52:31 Our builders using Scout MCP tools are
52:34 leaving the IDE less, spending more time
52:36 in one window.
52:40 Still coming into challenges. One of our
52:42 uh our our key tool diagnosis
52:45 uh is so valuable to pull all that
52:48 context and to provide a recommendation,
52:51 but it takes a long time to run. Now, we
52:54 might run down that runtime. However, as
52:56 you're working synchronously on a ticket
52:58 in your ID, this was frustrating. We
53:00 also weren't keeping up with the
53:03 customization needs. Not only did MCP
53:05 launch and we started leveraging it, Zap
53:07 Your MCP launched too. And some of our
53:09 tools, if we weren't keeping up with the
53:12 customization needs, our engineers
53:16 internally looked to Zap Your MCP, which
53:17 is great. We're all on the same team
53:19 solving the same problem, but some of
53:22 our tools had a dead end. Also adoption
53:25 was scattered. We had a whole suite of
53:26 tools and we thought there was value in
53:28 each of them as it solves for different
53:32 problems across the different stages.
53:34 Not every engineer was using our tools
53:36 and if they were using tools, they're
53:39 only using a few of them. So we have
53:42 tool usage. We're happy about that. But
53:45 we were under the hypothesis that true
53:47 value is going to come from tying these
53:49 tools together.
53:51 So what if we owned orchestration of
53:54 these tools rather than saying here's a
53:56 suite of tools you use them as you wish
53:59 what if we combined them and created an
54:02 agent to orchestrate this. So this we
54:05 are calling scout agent. We take that
54:09 diagnosis run that against a ticket uh
54:11 use that information to actually spin up
54:14 a codegen tool which will then produce a
54:16 merge request using all the right context.
54:18 context.
54:20 So who would benefit the most from
54:22 orchestration? There are several
54:25 integration teams at Zapier who are
54:27 solving for these app fixes of various
54:29 levels of complexity and there's the
54:32 support team. So when we're saying who
54:33 should be our first customer scout
54:36 agent, we're thinking it should probably
54:39 be the the team fielding small bugs that
54:41 are emergent and coming hot off the
54:44 queue which is the support team. And now
54:47 our two experiments merge
54:49 and we have scout agent. We are building
54:52 for the support team.
54:54 And this is the flow of how it works.
54:57 Support is submitting an issue to scout
55:01 agent. We first categorize the issue. We
55:04 next assess its fixability.
55:07 Not every issue that comes from support
55:10 can be fixed. If we thinks it's fixable,
55:12 we'll move on to generating a merge
55:15 request. At that point, the support
55:17 team, this is the first time they're
55:18 picking up the ticket. It already has a
55:21 merge request attached to it. They'll
55:25 review and test. If it's not satisfying
55:28 what they believe is the actual solution
55:31 or the the what what the solution should
55:34 be to best address the customer's need,
55:36 they will make a request for an
55:37 adjustment that can happen right in
55:39 GitLab, which is where we do our work
55:41 and Scout will do another pass and
55:43 hopefully at that point we've gotten it
55:45 right and support can submit that MR for
55:48 review from engineering.
55:50 How we are running Scout, it's all
55:52 kicked off by a Zap. This is a picture
55:54 of one of our zaps. There are many zaps
55:56 that's run this whole process and it
55:58 embeds right into our support team's
56:00 zaps. We do a ton of dog fooding at Zapier.
56:02 Zapier.
56:05 We first run diagnosis and post that
56:07 result to the Jira ticket saying what
56:09 the categorization is. If we believe
56:11 it's fixable and then if we do believe
56:14 it's fixable, we then are kicking off a
56:17 GitLab CI/CD pipeline.
56:18 And we run three phases in that
56:22 pipeline. plan, execute and validate to
56:24 generate this merge request. The tools
56:28 used in this pipeline is Scoutm. So all
56:31 those APIs we invested in a year ago now
56:33 are really coming together and we're
56:36 orchestrating it uh within the GitLab
56:38 pipeline and we're also leveraging
56:41 cursor SDK.
56:42 Once the merge request has been
56:45 completed, we attach it to Jira and
56:47 support picks it up.
56:50 The latest addition to this is doing a
56:56 rapid iteration once a um uh once a
56:58 ticket has been posted with the merge
56:59 request and support team is looking at
57:01 it and they say you know it needs some
57:04 tweaks to save them more time so they
57:05 don't have to go pull that down to their
57:07 ID do the fixes and push it back up.
57:10 they can simply chat with the uh scout
57:14 agent in gitlab that'll kick off another
57:16 uh pipeline which does that phase with
57:19 that new feedback and posts the new
57:22 merge request
57:24 on our side we want to make sure scout
57:26 agent is working so we ask three
57:29 questions the categorization right is
57:31 was it actually fixable uh and was the
57:34 code fix accurate so far we have two eval
57:36 eval
57:39 to 75% accuracy for categorization and
57:42 fixibility. As we get more feedback and
57:44 process more tickets, those become our
57:46 test cases and we can move forward
57:50 improving scout agent over time. So what
57:52 has been scout agents impact on app erosion?
57:54 erosion?
57:58 40% of supports support teams app fixes
58:01 are being generated by scout. So we're
58:04 doing more of the work on behalf of the
58:06 support team.
58:08 This is resulting and for some of our
58:10 support team it's doubling their
58:12 velocity from one to two tickets per
58:14 week which already is amazing. That's
58:17 going from a support team that wasn't
58:19 shipping any fixes, well unofficially
58:21 they were sometimes to now shipping one
58:23 to two per week per person to now
58:25 shipping three to four with the help of Scout.
58:27 Scout.
58:30 Another uh process improvement, Scout
58:32 puts potentially fixable tickets right
58:36 there in the triage flow. takes away a
58:37 lot of the friction of looking for
58:40 something to grab from the backlog.
58:42 It's not just the support who's
58:44 benefiting, it's also engineering.
58:46 Engineering manager said, uh, it's a
58:49 great example of when it works. This
58:51 tool allows us to stay focused on the
58:53 more complex stuff.
58:55 And if you take away anything from this
58:57 talk, I hope it is that there is a
59:01 really powerful magic between support
59:03 and empowering them with codegen and
59:05 allowing them to ship fixes because they
59:07 have three superpowers. The first they
59:10 are the closest to customer pain which
59:12 mean they're closest to the context that
59:14 really matters for figuring out what's
59:16 the problem and how to solve it. They're
59:20 also troubleshooting in real time. These
59:22 tickets aren't stale. the context is
59:25 fresh, the logs aren't missing. You put
59:27 this ticket into engineering backlog
59:29 months later, you might not get access
59:32 to those logs anymore. And then three,
59:35 they're best at validation.
59:37 You've again you put the same ticket
59:40 into an engineering backlog. The
59:42 solution an engineer might come up with
59:44 may change the behavior and that might
59:47 be good for some customers but might not
59:49 necessarily be best for that one
59:53 customer who wrote in about the problem.
59:58 And one other major benefit of this is
60:00 uh support team members who have been part of this experiment are now
60:02 part of this experiment are now engineers.
60:05 engineers. I want to say thank you to the amazing
60:06 I want to say thank you to the amazing team who's helped built this process or
60:09 team who's helped built this process or built all the tools and the scout agent.
60:11 built all the tools and the scout agent. Andy is actually here in the audience.
60:14 Andy is actually here in the audience. So shout out to Andy. If you want to
60:15 So shout out to Andy. If you want to talk about any of the technical bits,
60:17 talk about any of the technical bits, he's here. And I want to impress upon
60:19 he's here. And I want to impress upon you two things. or hiring, but mostly if
60:23 you two things. or hiring, but mostly if you haven't rafted through the Grand
60:24 you haven't rafted through the Grand Canyon, please consider it. It's
60:26 Canyon, please consider it. It's lifechanging and you should go with ORS.
60:29 lifechanging and you should go with ORS. Thank you very much.
60:31 Thank you very much. [applause]
60:43 Our next presenters believe that [music] 2026
60:44 2026 is the year the IDE died. Please join me
60:48 is the year the IDE died. Please join me in welcoming to the stage engineering
60:50 in welcoming to the stage engineering leader at Source Graph and AMP, Steve JG
60:54 leader at Source Graph and AMP, Steve JG and author and researcher at IT
60:56 and author and researcher at IT Revolution, Jean Kim.
61:00 Revolution, Jean Kim. [music]
61:08 Hey everybody. Um, really happy to be here. I'm going to be talking the first
61:09 here. I'm going to be talking the first half. Co-author here, Jean Kim, is going
61:11 half. Co-author here, Jean Kim, is going to talk second half. All right. Looking
61:14 to talk second half. All right. Looking forward to it. Cheers. All right. Today
61:16 forward to it. Cheers. All right. Today I'm going to Well, we're going to talk
61:17 I'm going to Well, we're going to talk real fast. This time is going to go down
61:18 real fast. This time is going to go down fast. Uh I'm going to talk to you about
61:20 fast. Uh I'm going to talk to you about what tools look like next year. Last
61:23 what tools look like next year. Last year I was talking to you all about chat
61:25 year I was talking to you all about chat and everybody ignored me and now
61:27 and everybody ignored me and now everybody's using chat this year and
61:29 everybody's using chat this year and it's like we're gonna we're going to fix
61:31 it's like we're gonna we're going to fix that right now. All right. So here's
61:34 that right now. All right. So here's what it's looked like. I'm going to tell
61:36 what it's looked like. I'm going to tell you right now, everyone's in love with
61:38 you right now, everyone's in love with Cloud Code. There's probably 40
61:40 Cloud Code. There's probably 40 competitors out there. Cloud Code ain't
61:43 competitors out there. Cloud Code ain't it.
61:45 it. completions wasn't it. I love cloud
61:47 completions wasn't it. I love cloud code. I use it 14 hours a day. I mean,
61:49 code. I use it 14 hours a day. I mean, come on. But it ain't it. Developers
61:52 come on. But it ain't it. Developers aren't adopting it. I'm going to talk
61:53 aren't adopting it. I'm going to talk about why in this talk. I'm going to
61:54 about why in this talk. I'm going to talk about what you can do about it and
61:56 talk about what you can do about it and what to look forward to. But the reason
61:58 what to look forward to. But the reason is they're too hard. Okay. Uh cognitive
62:00 is they're too hard. Okay. Uh cognitive overhead. Uh they lie, cheat, and steal.
62:03 overhead. Uh they lie, cheat, and steal. Gene and I talk a lot about this in our
62:05 Gene and I talk a lot about this in our book, all the different ways that they
62:06 book, all the different ways that they can lie, cheat, and steal. And uh most
62:08 can lie, cheat, and steal. And uh most devs just don't like this.
62:12 devs just don't like this. I have come to understand that claude
62:14 I have come to understand that claude code is very much like a drill or a saw,
62:18 code is very much like a drill or a saw, an electric one, right? How much damage
62:21 an electric one, right? How much damage can you do as an untrained person with a
62:23 can you do as an untrained person with a drill, right? Or a saw. Yeah. How much
62:26 drill, right? Or a saw. Yeah. How much damage can you do as an untrained
62:28 damage can you do as an untrained engineer with clawed code? It's real
62:30 engineer with clawed code? It's real similar. Yeah. You can cut your foot
62:31 similar. Yeah. You can cut your foot off,
62:34 off, but you can also be really, really
62:36 but you can also be really, really skilled with it and do really precision
62:38 skilled with it and do really precision work, right? like a craftsman. The
62:41 work, right? like a craftsman. The problem is software is infinitely large.
62:44 problem is software is infinitely large. Our ambition is infinitely large. And so
62:46 Our ambition is infinitely large. And so the analogy that I want to share with
62:47 the analogy that I want to share with you is next year will be the year from
62:49 you is next year will be the year from moving from saws and drills to CNC
62:53 moving from saws and drills to CNC machines. A CNC machine, you strap a
62:56 machines. A CNC machine, you strap a drill on and you give it coordinates and
62:58 drill on and you give it coordinates and it moves it around and very precise,
63:00 it moves it around and very precise, right? We've been doing this for
63:02 right? We've been doing this for centuries and we're not going to stop
63:04 centuries and we're not going to stop this year.
63:09 One thing I hear people say is, "Well, the models are plateaued." This is real
63:11 the models are plateaued." This is real common. Your engineers are probably
63:13 common. Your engineers are probably saying this, okay, even if they
63:16 saying this, okay, even if they plateaued, we have still discovered
63:18 plateaued, we have still discovered steam and electricity and it's going to
63:20 steam and electricity and it's going to take us a little time to harness it. But
63:21 take us a little time to harness it. But it's strictly an engineering problem at
63:23 it's strictly an engineering problem at this point. All code within a year, year
63:27 this point. All code within a year, year and a half will be written by giant
63:29 and a half will be written by giant grinding machines overseen by engineers
63:32 grinding machines overseen by engineers who no longer actually look at the code
63:34 who no longer actually look at the code directly anymore.
63:37 directly anymore. Weird new world. That is where we are
63:39 Weird new world. That is where we are going. Oh my gosh. Yeah. This this
63:42 going. Oh my gosh. Yeah. This this slide. So Gene and I talked to Andrew
63:44 slide. So Gene and I talked to Andrew Glover who I don't know is he here from
63:45 Glover who I don't know is he here from OpenAI and he said that they have this
63:48 OpenAI and he said that they have this incredible dichotomy unfolding at OpenAI
63:50 incredible dichotomy unfolding at OpenAI where you know some percentage of their
63:52 where you know some percentage of their engineers are using codecs and then some
63:55 engineers are using codecs and then some other percentage a larger percentage are
63:56 other percentage a larger percentage are not using codecs and the difference in
63:58 not using codecs and the difference in productivity is so staggering that
64:00 productivity is so staggering that they're having now alarms going off at
64:03 they're having now alarms going off at performance review time because how do
64:04 performance review time because how do you compare these these two engineers
64:06 you compare these these two engineers who are the same level, same title, same
64:08 who are the same level, same title, same everything and one of them is 10 times
64:10 everything and one of them is 10 times as productive as the other one by any
64:12 as productive as the other one by any measure.
64:13 measure. And the answer is they're freaking out.
64:15 And the answer is they're freaking out. They may have to fire 50% of their
64:17 They may have to fire 50% of their engineers. And this is unfolding at
64:18 engineers. And this is unfolding at other companies, too.
64:21 other companies, too. Who is refusing it? It's the senior and
64:24 Who is refusing it? It's the senior and staff engineers. How many minutes are we
64:26 staff engineers. How many minutes are we at?
64:28 at? >> Eight [clears throat] minutes.
64:29 >> Eight [clears throat] minutes. >> We're perfect. This is just like what
64:32 >> We're perfect. This is just like what happened to the Swiss mechanical watch
64:35 happened to the Swiss mechanical watch industry over a couple of Well, it was
64:37 industry over a couple of Well, it was built up for a couple of centuries and
64:39 built up for a couple of centuries and then courts killed it, you know, within
64:40 then courts killed it, you know, within a couple of years. And what happened was
64:42 a couple of years. And what happened was the craftsmen were doing the same thing
64:44 the craftsmen were doing the same thing our staff engineers are doing today. No
64:47 our staff engineers are doing today. No cheap.
64:49 cheap. That's word for word, right? That's what
64:51 That's word for word, right? That's what they say.
64:54 they say. All right. I didn't know where to put
64:56 All right. I didn't know where to put this slide. This is this is Claude's
64:58 this slide. This is this is Claude's view of what next year looks like. And I
65:01 view of what next year looks like. And I I was just like, what do you think it's
65:02 I was just like, what do you think it's going to look like? And it actually does
65:03 going to look like? And it actually does kind of look like this. Most of the
65:04 kind of look like this. Most of the words will be spelled correctly in in
65:06 words will be spelled correctly in in next year. But this is a lot prettier
65:09 next year. But this is a lot prettier than cloud code.
65:11 than cloud code. Yeah, this is what it has to look like.
65:14 Yeah, this is what it has to look like. Some form of a UI, not an IDE. This is
65:19 Some form of a UI, not an IDE. This is the new IDE. Okay. And people are
65:21 the new IDE. Okay. And people are building it. In fact, I think the
65:23 building it. In fact, I think the company that's the furthest along in
65:25 company that's the furthest along in this is Replet, who just talked to you.
65:27 this is Replet, who just talked to you. I think it's amazing what they're doing.
65:28 I think it's amazing what they're doing. It's absolutely bravo, right? We should
65:31 It's absolutely bravo, right? We should not be all chasing tail lights and
65:33 not be all chasing tail lights and building command line interfaces
65:35 building command line interfaces anymore. All right. and and more
65:37 anymore. All right. and and more importantly, Claude Code and all of its,
65:40 importantly, Claude Code and all of its, you know, competitors, they're all doing
65:43 you know, competitors, they're all doing it wrong because they're building the
65:44 it wrong because they're building the world's biggest ant. Okay, this is my my
65:47 world's biggest ant. Okay, this is my my buddy Brendan Hopper at Commonwealth
65:48 buddy Brendan Hopper at Commonwealth Bank of Australia, right? He's like,
65:50 Bank of Australia, right? He's like, "Nature builds ant swarms and Claude
65:52 "Nature builds ant swarms and Claude Code built this huge muscular ant that's
65:54 Code built this huge muscular ant that's just going to bite you in half and take
65:55 just going to bite you in half and take all your resources, right? I mean, it's
65:57 all your resources, right? I mean, it's a serious problem, right? If I say
65:59 a serious problem, right? If I say please analyze this codebase, I, you
66:00 please analyze this codebase, I, you know, go to the expensive model." If I
66:02 know, go to the expensive model." If I say, "Is my git ignore file still
66:04 say, "Is my git ignore file still there?" I've also gone to the expensive
66:06 there?" I've also gone to the expensive model, right? Everything that you say
66:07 model, right? Everything that you say goes to the expensive model. So, what's
66:09 goes to the expensive model. So, what's going to happen? Whoa. What happened? Oh
66:11 going to happen? Whoa. What happened? Oh gosh,
66:13 gosh, my slides are all messed up now.
66:16 my slides are all messed up now. Can you guys see them?
66:18 Can you guys see them? >> No.
66:18 >> No. >> Oh, this always happens to me, man.
66:20 >> Oh, this always happens to me, man. There's something going on. All right.
66:22 There's something going on. All right. So, I thought of a really cool analogy
66:24 So, I thought of a really cool analogy called the diver the diver metaphor,
66:26 called the diver the diver metaphor, which is your context window is like an
66:27 which is your context window is like an oxygen tank. Okay. This is why these
66:30 oxygen tank. Okay. This is why these things are fundamentally wrong because
66:32 things are fundamentally wrong because you're sending a diver down into your
66:34 you're sending a diver down into your codebase underwater to swim around and
66:37 codebase underwater to swim around and take care of stuff for you. One diver
66:39 take care of stuff for you. One diver and we're like, we're going to give him
66:41 and we're like, we're going to give him a bigger tank. 1 million tokens. He's
66:44 a bigger tank. 1 million tokens. He's still going to run out of oxygen. Like
66:46 still going to run out of oxygen. Like you don't, right? You should send a
66:48 you don't, right? You should send a product manager diver down first
66:51 product manager diver down first and then a coding diver, right? And then
66:54 and then a coding diver, right? And then a review diver and a test diver and a
66:56 a review diver and a test diver and a get merge diver, etc. Right? Nobody's
66:58 get merge diver, etc. Right? Nobody's doing this. Everyone's building a bigger
67:00 doing this. Everyone's building a bigger diver. I don't know my slides are all
67:02 diver. I don't know my slides are all messed up. My my my talk is almost done.
67:04 messed up. My my my talk is almost done. But um what we do is as engineers, task
67:08 But um what we do is as engineers, task decomposition,
67:09 decomposition, successive refinement, components, black
67:11 successive refinement, components, black boxes. This is how it's going to be
67:13 boxes. This is how it's going to be built in the future. And it's going to
67:14 built in the future. And it's going to be built with lots and lots of agents,
67:17 be built with lots and lots of agents, not just one agent.
67:19 not just one agent. All right. Until then, I think we're out
67:21 All right. Until then, I think we're out of time, but so until then, learn cloud
67:23 of time, but so until then, learn cloud code. Give up your IDE. Swix told me he
67:26 code. Give up your IDE. Swix told me he wants some hot take, so I'll give you
67:27 wants some hot take, so I'll give you one. If you're using an IDE starting on,
67:31 one. If you're using an IDE starting on, I'll give you till January 1st.
67:34 I'll give you till January 1st. You're a bad engineer.
67:38 You're a bad engineer. There's your hot take. All right, folks.
67:41 There's your hot take. All right, folks. [applause]
67:45 All right, cheers. Well, that that was actually my talk. Um [clears throat]
67:47 actually my talk. Um [clears throat] uh uh learn coding agents and oh yeah,
67:49 uh uh learn coding agents and oh yeah, then there's this guy. Speaking of bad
67:51 then there's this guy. Speaking of bad engineers, so this is this is Jordan
67:54 engineers, so this is this is Jordan Hubard uh who uh who's at NVIDIA and he
67:56 Hubard uh who uh who's at NVIDIA and he tweeted LinkedIn a really nice post on
68:00 tweeted LinkedIn a really nice post on how to get the most out of agents and
68:01 how to get the most out of agents and this guy responded with this, right?
68:03 this guy responded with this, right? This is everyone in your or this is 60%
68:06 This is everyone in your or this is 60% of your org right here. This guy's not
68:08 of your org right here. This guy's not an outlier. Okay, the backlash is very
68:10 an outlier. Okay, the backlash is very real against this. Yeah. And this is
68:13 real against this. Yeah. And this is going to be a problem I'm not going to
68:14 going to be a problem I'm not going to I'm not going to share with you. I don't
68:15 I'm not going to share with you. I don't have time to share how to fix it, but
68:16 have time to share how to fix it, but it's something you should be aware of.
68:17 it's something you should be aware of. And anyway, I'm going to turn it over to
68:19 And anyway, I'm going to turn it over to my co-author, Jean. We had a lot to talk
68:21 my co-author, Jean. We had a lot to talk about. He's got a lot to go. So, let's
68:22 about. He's got a lot to go. So, let's turn it over to Jean.
68:23 turn it over to Jean. >> Yeah. Thank you, Steve.
68:24 >> Yeah. Thank you, Steve. >> Hi, buddy. [applause]
68:27 >> Hi, buddy. [applause] >> Yeah. By the way, um I have let me start
68:31 >> Yeah. By the way, um I have let me start off by introducing myself and then I
68:32 off by introducing myself and then I want to share a little bit about like
68:33 want to share a little bit about like what it's been working like uh what's
68:34 what it's been working like uh what's been like working with Steve on the VIP
68:36 been like working with Steve on the VIP coding book. Uh and so just a little bit
68:38 coding book. Uh and so just a little bit about myself. I've had the privilege of
68:39 about myself. I've had the privilege of studying high performing technology
68:40 studying high performing technology organizations for 26 years. And that was
68:43 organizations for 26 years. And that was a journey that started when I was a
68:44 a journey that started when I was a technical founder uh of a company called
68:46 technical founder uh of a company called Tripwire. I was there for 13 years. But
68:48 Tripwire. I was there for 13 years. But our mission was really to understand
68:49 our mission was really to understand these amazing high performing technology
68:51 these amazing high performing technology organizations. They had the best project
68:52 organizations. They had the best project due date performance and development,
68:53 due date performance and development, the best operational reliability and
68:55 the best operational reliability and stability and also the best posture
68:57 stability and also the best posture compliance uh security and compliance.
68:58 compliance uh security and compliance. So we want to understand how did those
69:00 So we want to understand how did those amazing organizations make their good to
69:01 amazing organizations make their good to great transformation. So we got
69:03 great transformation. So we got understand how to how other
69:04 understand how to how other organizations replicate those amazing
69:06 organizations replicate those amazing outcomes. And so you can imagine in that
69:07 outcomes. And so you can imagine in that 26 year journey there are many
69:08 26 year journey there are many surprises. Among the biggest surprise
69:10 surprises. Among the biggest surprise was how it took me into the middle of
69:11 was how it took me into the middle of the DevOps movement which is so uh
69:13 the DevOps movement which is so uh amazing because it reshaped technology
69:15 amazing because it reshaped technology organizations. you know, it changed how
69:16 organizations. you know, it changed how test and operations worked, information
69:18 test and operations worked, information security. Um, and I thought that would
69:20 security. Um, and I thought that would be the most exciting adventure I'd be on
69:22 be the most exciting adventure I'd be on in my career until I met Steve Yagi in
69:24 in my career until I met Steve Yagi in person. And so, I've admired his work
69:26 person. And so, I've admired his work for over 11 years. And so, some of you
69:28 for over 11 years. And so, some of you may have read this memo of Jeff Bezos's
69:31 may have read this memo of Jeff Bezos's most audacious memo of how in early
69:33 most audacious memo of how in early 2000s they transformed from a gigantic
69:35 2000s they transformed from a gigantic monolith that coupled 3,500 engineers
69:37 monolith that coupled 3,500 engineers together, so none of them had
69:38 together, so none of them had independent action. And uh he talked
69:41 independent action. And uh he talked about how all teams must henceforth
69:43 about how all teams must henceforth communicate and coordinate only through
69:44 communicate and coordinate only through APIs. No back doors allowed. Right? Uh
69:46 APIs. No back doors allowed. Right? Uh anyone who doesn't do this will be
69:47 anyone who doesn't do this will be fired. Thank you and have a nice day.
69:49 fired. Thank you and have a nice day. And the amazing person who chronicled
69:50 And the amazing person who chronicled says number seven is obviously a joke
69:53 says number seven is obviously a joke because Bezos doesn't care whether you
69:54 because Bezos doesn't care whether you have a good day or not. And this is
69:56 have a good day or not. And this is actually enforced by Amazon CIO then
69:58 actually enforced by Amazon CIO then Rick Del. And so it turns out this memo
70:00 Rick Del. And so it turns out this memo that I've been quoting for 11 years uh
70:02 that I've been quoting for 11 years uh was written by Steve Yaggi uh which was
70:04 was written by Steve Yaggi uh which was meant to be a private uh memo on Google+
70:07 meant to be a private uh memo on Google+ which was made public which landed him
70:09 which was made public which landed him on the front page of the Wall Street
70:10 on the front page of the Wall Street Journal. Um and so I finally met him in
70:13 Journal. Um and so I finally met him in uh June and it turns out that we had
70:15 uh June and it turns out that we had many things in common. Uh but one of
70:16 many things in common. Uh but one of them was this uh love of AI and this
70:18 them was this uh love of AI and this sense that AI was going to shape coding
70:21 sense that AI was going to shape coding from underneath us. And so one of our
70:24 from underneath us. And so one of our beliefs is that uh the AI will reshape
70:26 beliefs is that uh the AI will reshape technology organizations you know maybe
70:28 technology organizations you know maybe even 100 times larger than what agile
70:30 even 100 times larger than what agile cloud CI/CD and mobile did you know 10
70:33 cloud CI/CD and mobile did you know 10 years ago. Um and that these technology
70:35 years ago. Um and that these technology breakthroughs not just reshape
70:36 breakthroughs not just reshape organizations but they reshape the
70:37 organizations but they reshape the entire economy. the entire economy
70:39 entire economy. the entire economy rearranges itself to take advantages of
70:41 rearranges itself to take advantages of these you know wild new better ways of
70:43 these you know wild new better ways of uh uh producing things and and uh so
70:45 uh uh producing things and and uh so over the last year and a half we've had
70:47 over the last year and a half we've had a chance to look at these case studies I
70:48 a chance to look at these case studies I think give us a glimpse of what these uh
70:51 think give us a glimpse of what these uh what the shape of technology
70:52 what the shape of technology organizations look like and so I'm going
70:53 organizations look like and so I'm going to share with that what we've learned
70:55 to share with that what we've learned but here's maybe a hint so some of you
70:57 but here's maybe a hint so some of you may know the work of Aiden Cochra he was
70:58 may know the work of Aiden Cochra he was a cloud architect at Netflix right he
71:00 a cloud architect at Netflix right he was what who drove uh the uh entire
71:03 was what who drove uh the uh entire Netflix infrastructure from a data
71:04 Netflix infrastructure from a data center uh back in 2009 to running
71:07 center uh back in 2009 to running entirely in the as cloud and so he wrote
71:09 entirely in the as cloud and so he wrote uh some months ago in 2011 some people
71:12 uh some months ago in 2011 some people got very upset in uh infrastructure and
71:14 got very upset in uh infrastructure and operations because they called it
71:15 operations because they called it noopops right and everyone laughed back
71:17 noopops right and everyone laughed back then but he said oh don't you know uh
71:20 then but he said oh don't you know uh it's happening again this time it might
71:21 it's happening again this time it might be called no dev right not so funny now
71:24 be called no dev right not so funny now right so it's it's interesting right
71:26 right so it's it's interesting right because we heard this amazing
71:27 because we heard this amazing presentation from zapier about like how
71:29 presentation from zapier about like how support ships and turns out designers
71:31 support ships and turns out designers are shipping UX is shipping right anyone
71:33 are shipping UX is shipping right anyone who's been frustrated by developers uh
71:35 who's been frustrated by developers uh who you know say get in line and you
71:36 who you know say get in line and you have to wait quarters or years or maybe
71:38 have to wait quarters or years or maybe never, right, are now suddenly in
71:40 never, right, are now suddenly in position where you can actually vibe
71:41 position where you can actually vibe code your own features into production,
71:43 code your own features into production, right? And that reshapes technology
71:44 right? And that reshapes technology organizations and it reshapes, you know,
71:46 organizations and it reshapes, you know, potentially the entire economy. And so,
71:48 potentially the entire economy. And so, uh, uh, Steve and I, we've had the
71:49 uh, uh, Steve and I, we've had the privilege of watching what happens, you
71:51 privilege of watching what happens, you know, when we change, uh, you know, the
71:53 know, when we change, uh, you know, the way we, uh, deploy, right? It wasn't so
71:55 way we, uh, deploy, right? It wasn't so long ago and 10 years ago, uh, I wrote a
71:57 long ago and 10 years ago, uh, I wrote a book called the Phoenix Project where it
71:59 book called the Phoenix Project where it was all about the catastrophic
72:00 was all about the catastrophic deployment. Would you believe, uh, that
72:02 deployment. Would you believe, uh, that it was, you know, 10 years ago, 15 years
72:04 it was, you know, 10 years ago, 15 years ago, most organizations shipped once a
72:06 ago, most organizations shipped once a year, right? Right. And so I got to work
72:07 year, right? Right. And so I got to work on a project called the state of DevOps
72:09 on a project called the state of DevOps research. It was a cross population
72:10 research. It was a cross population study that spanned 36,000 respondents uh
72:13 study that spanned 36,000 respondents uh from 2013 to 2019. And what we found uh
72:16 from 2013 to 2019. And what we found uh this was Dr. Nicole Forsgrren and Jez
72:18 this was Dr. Nicole Forsgrren and Jez Humble. Um and what we found was that
72:20 Humble. Um and what we found was that these high performers ship multiple
72:21 these high performers ship multiple times a day, right? They can ship in one
72:23 times a day, right? They can ship in one hour or less. And you know back in 2009,
72:26 hour or less. And you know back in 2009, people thought, "Oh my gosh, multiple
72:27 people thought, "Oh my gosh, multiple deployments per day, right? That's
72:28 deployments per day, right? That's reckless and irresponsible, maybe even
72:30 reckless and irresponsible, maybe even immoral, right? What sort of maniac
72:31 immoral, right? What sort of maniac would deploy multiple times a day,
72:33 would deploy multiple times a day, right? And yet it's very common place
72:35 right? And yet it's very common place these days. In fact, if you want to have
72:36 these days. In fact, if you want to have great reliability profiles, if you want
72:37 great reliability profiles, if you want to have short meantime prepare, you have
72:39 to have short meantime prepare, you have to do smaller deployments more
72:40 to do smaller deployments more frequently. And I think we're now seeing
72:42 frequently. And I think we're now seeing these kind of case studies that show
72:43 these kind of case studies that show that this better way of coding, right,
72:45 that this better way of coding, right, where you don't type in code by hand
72:47 where you don't type in code by hand might be, you know, just a vastly better
72:49 might be, you know, just a vastly better way uh to create value. And so our
72:51 way uh to create value. And so our definition of vibe coding that we put
72:52 definition of vibe coding that we put into the uh V coding book was that it's
72:54 into the uh V coding book was that it's basically anything where you don't type
72:56 basically anything where you don't type in code by hand. And so for some of
72:58 in code by hand. And so for some of those of you who don't understand that,
72:59 those of you who don't understand that, that's like sort of a uh typing an ID
73:01 that's like sort of a uh typing an ID hunched over, right? And you're actually
73:02 hunched over, right? And you're actually moving your fingers, right? That's sort
73:04 moving your fingers, right? That's sort of like how some people go into a dark
73:06 of like how some people go into a dark room to develop photographs, right?
73:07 room to develop photographs, right? Believe it or not, some people still do
73:08 Believe it or not, some people still do that. Um and and what I that's a great
73:11 that. Um and and what I that's a great definition that we uh loved until uh Dar
73:14 definition that we uh loved until uh Dar Amade u uh CEO and co-founder of um
73:18 Amade u uh CEO and co-founder of um Anthropic, he gave us an even better
73:19 Anthropic, he gave us an even better definition, right? The vibe coding is
73:21 definition, right? The vibe coding is really the iterative conversation uh
73:23 really the iterative conversation uh that results in AI writing your code.
73:25 that results in AI writing your code. And he said it's on one hand a beautiful
73:27 And he said it's on one hand a beautiful term because it evokes this different
73:28 term because it evokes this different way of coding but he said it's also
73:31 way of coding but he said it's also somewhat misleading because it sounds
73:32 somewhat misleading because it sounds jokey right uh but he said you know
73:35 jokey right uh but he said you know adanthropic there's no other game in
73:36 adanthropic there's no other game in town right and I just thought that was
73:37 town right and I just thought that was just a beautiful way to evoke you know
73:39 just a beautiful way to evoke you know how important uh vibe coding is uh this
73:42 how important uh vibe coding is uh this is Dr. Eric Meyer um you he's probably
73:44 is Dr. Eric Meyer um you he's probably considered one of the greatest
73:45 considered one of the greatest programming language designers of all
73:47 programming language designers of all time. Uh he was part of Visual Basic, C
73:49 time. Uh he was part of Visual Basic, C link, Haskell. He created the hack
73:51 link, Haskell. He created the hack programming language uh that migrated
73:53 programming language uh that migrated millions of lines of code at Meta, you
73:55 millions of lines of code at Meta, you know, within a year uh bringing static
73:57 know, within a year uh bringing static type checking to a bunch of PHP
73:59 type checking to a bunch of PHP programmers and he said we are probably
74:01 programmers and he said we are probably going to be the last generation of
74:02 going to be the last generation of developers uh to write code by hand. So
74:05 developers uh to write code by hand. So let's have fun doing it. Um so one of
74:08 let's have fun doing it. Um so one of the things and uh when uh Steve and I
74:09 the things and uh when uh Steve and I started working on the book last
74:10 started working on the book last November was uh watching him spend
74:12 November was uh watching him spend hundreds of dollars a day on coding
74:14 hundreds of dollars a day on coding agents uh and just seemed so strange
74:17 agents uh and just seemed so strange right um you know and so he's maxing out
74:19 right um you know and so he's maxing out not just you know the uh the monthly
74:21 not just you know the uh the monthly subscriptions right but he's actually
74:23 subscriptions right but he's actually you know going way above and beyond that
74:25 you know going way above and beyond that and yet uh you know things that we're
74:27 and yet uh you know things that we're hearing now is that as an engineer part
74:29 hearing now is that as an engineer part of my job is that I need to be spending
74:30 of my job is that I need to be spending as much on tokens per day as my salary
74:33 as much on tokens per day as my salary right so you know that think about like
74:35 right so you know that think about like $500 to $1,000 a day, right? Because
74:37 $500 to $1,000 a day, right? Because this is the mechanical advantage, the
74:38 this is the mechanical advantage, the cognitive advantage that these tools are
74:40 cognitive advantage that these tools are giving us, right? And as an engineer,
74:41 giving us, right? And as an engineer, right, I'm going to challenge myself,
74:42 right, I'm going to challenge myself, you know, to get that kind of value to
74:44 you know, to get that kind of value to deliver value to people who matter. Um,
74:46 deliver value to people who matter. Um, and so in the book, we talk about, you
74:48 and so in the book, we talk about, you know, why people would do this, right?
74:50 know, why people would do this, right? And [snorts] the, uh, acronym we came up
74:51 And [snorts] the, uh, acronym we came up with FAFO, right? Uh, the most obvious
74:54 with FAFO, right? Uh, the most obvious one is F for faster, right? Yeah, that's
74:56 one is F for faster, right? Yeah, that's obviously true, but I think it's the
74:57 obviously true, but I think it's the most superficial and um part of why we
75:01 most superficial and um part of why we do this because uh the second one is it
75:03 do this because uh the second one is it lets us do more ambitious things, right?
75:06 lets us do more ambitious things, right? Uh the impossible becomes possible. Uh
75:08 Uh the impossible becomes possible. Uh so that's one end of the spectrum. On
75:10 so that's one end of the spectrum. On the other end of the spectrum, you know,
75:11 the other end of the spectrum, you know, the uh the tedious and small tasks
75:13 the uh the tedious and small tasks become free. One of the things I uh the
75:16 become free. One of the things I uh the uh interview of the cloud code team that
75:18 uh interview of the cloud code team that I just loved was uh I think it was
75:20 I just loved was uh I think it was Katherine she said um uh one of the
75:22 Katherine she said um uh one of the things we've noticed is that you know
75:24 things we've noticed is that you know when customer issues come up uh instead
75:26 when customer issues come up uh instead of putting them on a jur backlog and you
75:28 of putting them on a jur backlog and you know arguing about it in the grooming
75:30 know arguing about it in the grooming sessions and so forth right we just fix
75:31 sessions and so forth right we just fix it on the spot right and ship to
75:33 it on the spot right and ship to production or whatever um you know
75:34 production or whatever um you know within 30 minutes right and so yes it
75:36 within 30 minutes right and so yes it gets recorded but you know that whole
75:38 gets recorded but you know that whole sort of coordination cost you know just
75:40 sort of coordination cost you know just disappears right so again the impossible
75:41 disappears right so again the impossible becomes possible right and uh the
75:44 becomes possible right and uh the annoying things just become free. The
75:46 annoying things just become free. The second A is uh um you know the ability
75:49 second A is uh um you know the ability to do things alone or more autonomously,
75:52 to do things alone or more autonomously, right? And so um you know there's really
75:54 right? And so um you know there's really two coordination costs are being
75:56 two coordination costs are being alleviated here. One is you know if you
75:58 alleviated here. One is you know if you ever have to wait for a developer or a
76:00 ever have to wait for a developer or a team of developers, you know, to do what
76:02 team of developers, you know, to do what you need to do, right? You have to
76:04 you need to do, right? You have to communicate and coordinate and
76:05 communicate and coordinate and synchronize and prioritize and cajul and
76:07 synchronize and prioritize and cajul and escalate, you know, do all sorts of
76:09 escalate, you know, do all sorts of things to get them to care about the
76:10 things to get them to care about the problem just as much as you do, right?
76:12 problem just as much as you do, right? And you know now you know with these
76:14 And you know now you know with these amazing new miraculous technologies you
76:16 amazing new miraculous technologies you can do them by yourself right so that's
76:18 can do them by yourself right so that's one coordination co uh tax the other one
76:20 one coordination co uh tax the other one is that even if you get someone to uh
76:22 is that even if you get someone to uh care about a problem as much as you uh
76:24 care about a problem as much as you uh they can't read your mind right and what
76:26 they can't read your mind right and what we're finding is that these LLMs are
76:27 we're finding is that these LLMs are just amazing intermediation vehicles
76:29 just amazing intermediation vehicles right um you know just through an LLM
76:32 right um you know just through an LLM you can coordinate with other functional
76:34 you can coordinate with other functional specialties right through a markdown
76:35 specialties right through a markdown file right that's not the end right but
76:37 file right that's not the end right but it's just this amazing way uh to have
76:39 it's just this amazing way uh to have these high bandwidth coordination so
76:41 these high bandwidth coordination so that you can essentially read each
76:42 that you can essentially read each other's minds, you know, because shared
76:44 other's minds, you know, because shared outcomes require shared goals and shared
76:45 outcomes require shared goals and shared understanding. The second F is fun,
76:48 understanding. The second F is fun, right? As that Steve says, vibe coding
76:50 right? As that Steve says, vibe coding is addictive. This is so true. I mean, I
76:52 is addictive. This is so true. I mean, I cannot I think what I love about the
76:54 cannot I think what I love about the book is that it's a story about two guys
76:56 book is that it's a story about two guys who both thought their best days of
76:57 who both thought their best days of coding were behind them, right? And
76:59 coding were behind them, right? And found that, you know, it's entirely the
77:01 found that, you know, it's entirely the opposite. Um, and I've had so much fun
77:04 opposite. Um, and I've had so much fun and uh, you know, I'm having to force
77:05 and uh, you know, I'm having to force myself to go to sleep at night because
77:07 myself to go to sleep at night because otherwise I'd be up till 2 or 3 in the
77:09 otherwise I'd be up till 2 or 3 in the morning every night. uh and you know so
77:11 morning every night. uh and you know so it's not all great but it certainly
77:13 it's not all great but it certainly beats being boring or tedious or you
77:15 beats being boring or tedious or you know horrible and then optionality you
77:18 know horrible and then optionality you know one of the things that uh I love
77:19 know one of the things that uh I love about Swiss is that he has a shared love
77:21 about Swiss is that he has a shared love of creating option value and he told us
77:23 of creating option value and he told us last night that option value is also
77:25 last night that option value is also important for poker players right
77:26 important for poker players right because you never want to paint yourself
77:27 because you never want to paint yourself in a corner so option value is um one of
77:30 in a corner so option value is um one of the biggest creators of economic value
77:33 the biggest creators of economic value right modularity the reason why it's so
77:35 right modularity the reason why it's so powerful is because it creates option
77:37 powerful is because it creates option value uh and so just the fact that you
77:38 value uh and so just the fact that you can have so many more swings of bat can
77:40 can have so many more swings of bat can do so many more parallel experiments,
77:41 do so many more parallel experiments, right? This is what v coding allows. So
77:43 right? This is what v coding allows. So this is gives us confidence that you
77:45 this is gives us confidence that you know this is not just uh this is a very
77:47 know this is not just uh this is a very powerful tool. Um uh here's the quote
77:50 powerful tool. Um uh here's the quote from Andy Glover that uh Steve Yaggi
77:52 from Andy Glover that uh Steve Yaggi said is that you know as um for people
77:55 said is that you know as um for people who have this aha moment and and
77:56 who have this aha moment and and positioned uh you know I think the
77:58 positioned uh you know I think the instinct is how do we elevate everyone's
78:00 instinct is how do we elevate everyone's productivity to be as productive as you
78:02 productivity to be as productive as you are now being um you know that since
78:04 are now being um you know that since you've had your aha moment. So uh let me
78:08 you've had your aha moment. So uh let me share with you maybe some of our top
78:10 share with you maybe some of our top kind of uh exciting case studies that
78:12 kind of uh exciting case studies that kind of give us a hint of the future. So
78:14 kind of give us a hint of the future. So uh I've run into this conference called
78:15 uh I've run into this conference called the enterprise technology leadership
78:16 the enterprise technology leadership summit for uh 11 years now and Swix we
78:19 summit for uh 11 years now and Swix we had uh the honor of having Swix there
78:21 had uh the honor of having Swix there talking about the rise of the AI
78:23 talking about the rise of the AI engineer just this amazing
78:24 engineer just this amazing prognostication. Uh this year we had a
78:26 prognostication. Uh this year we had a series of amazing uh case studies. One
78:28 series of amazing uh case studies. One was uh Bruno Pasos. He spoke this year
78:30 was uh Bruno Pasos. He spoke this year uh last year at this conference and he
78:32 uh last year at this conference and he presented on uh their in their evolving
78:35 presented on uh their in their evolving experiment to elevate developer
78:36 experiment to elevate developer productivity across 3,000 developers. Um
78:39 productivity across 3,000 developers. Um and this is at Booking.com, the world's
78:41 and this is at Booking.com, the world's largest travel agency and they're
78:43 largest travel agency and they're finding that they're getting double-
78:44 finding that they're getting double- digit increase in productivity, right?
78:45 digit increase in productivity, right? Uh mergers are going in quicker, peer
78:48 Uh mergers are going in quicker, peer review times are uh smaller and and so
78:50 review times are uh smaller and and so forth, right? And so that's just we feel
78:52 forth, right? And so that's just we feel like that's a incomplete view of uh what
78:55 like that's a incomplete view of uh what people are achieving. Uh this is Shri
78:57 people are achieving. Uh this is Shri Balakrishnan. uh he was head of product
78:59 Balakrishnan. uh he was head of product and technology at uh Travelopia. Uh so
79:01 and technology at uh Travelopia. Uh so they're a $ 1.5 billion a year uh travel
79:04 they're a $ 1.5 billion a year uh travel company and one of the things that uh he
79:06 company and one of the things that uh he said is that uh you know they were able
79:08 said is that uh you know they were able to uh replace a legacy application uh in
79:11 to uh replace a legacy application uh in six weeks with a pair of uh with a very
79:13 six weeks with a pair of uh with a very small team. In fact, one of his uh
79:15 small team. In fact, one of his uh conclusions is that before we would need
79:17 conclusions is that before we would need a team of eight people to do something
79:19 a team of eight people to do something meaningful, right? Six developers, a UX
79:21 meaningful, right? Six developers, a UX person and a product owner. and he said
79:23 person and a product owner. and he said maybe these days it might be two a
79:25 maybe these days it might be two a developer and you know a a domain expert
79:27 developer and you know a a domain expert in other words as Kent Beck said a
79:29 in other words as Kent Beck said a person with a problem and a person who
79:30 person with a problem and a person who can solve it right maybe maybe a pair of
79:34 can solve it right maybe maybe a pair of those teams right and so that's going to
79:35 those teams right and so that's going to reshape I think you know how they can go
79:38 reshape I think you know how they can go further and faster uh so again maybe
79:40 further and faster uh so again maybe just a hint of what teams will look like
79:42 just a hint of what teams will look like in the future this is the one that
79:43 in the future this is the one that excites me most this is Dr. top pal uh
79:46 excites me most this is Dr. top pal uh he helped drive the DevOps move in at
79:47 he helped drive the DevOps move in at Capital One um and he's now at uh
79:50 Capital One um and he's now at uh Fidelity and so um among other things he
79:54 Fidelity and so um among other things he owns an application uh that is the
79:56 owns an application uh that is the application you go to ask which
79:58 application you go to ask which applications you know the 25,000
79:59 applications you know the 25,000 applications there have log 4J right and
80:02 applications there have log 4J right and uh it's his team and he's had this
80:04 uh it's his team and he's had this vision of what this application should
80:06 vision of what this application should look like uh but every time he asked
80:08 look like uh but every time he asked like can can we build it his team would
80:09 like can can we build it his team would say it would take about five months
80:11 say it would take about five months right and we'd hire need to hire a a
80:13 right and we'd hire need to hire a a front-end person and he got so
80:14 front-end person and he got so frustrated that he spent five days just
80:17 frustrated that he spent five days just vibe coding it by himself right uh you
80:19 vibe coding it by himself right uh you know directly accessing read only the
80:21 know directly accessing read only the Neo4j uh database um and put it into
80:24 Neo4j uh database um and put it into production right and so I think we're
80:25 production right and so I think we're seeing a world where um you know leaders
80:29 seeing a world where um you know leaders even leaders with their own teams are
80:31 even leaders with their own teams are frustrated saying hey I can do this uh
80:33 frustrated saying hey I can do this uh can I do this better myself not better
80:35 can I do this better myself not better just can I prove that it can be done and
80:37 just can I prove that it can be done and uh by the way what happened afterwards
80:39 uh by the way what happened afterwards um he was looking around who can help me
80:40 um he was looking around who can help me maintain my application production and
80:42 maintain my application production and all the senior engineers like not
80:44 all the senior engineers like not So enter uh Swathy the most junior
80:47 So enter uh Swathy the most junior engineer on the team uh who is helping
80:49 engineer on the team uh who is helping maintain this application and probably
80:50 maintain this application and probably outarning you know everybody in the
80:52 outarning you know everybody in the organization
80:54 organization uh and interestingly uh he he's also
80:56 uh and interestingly uh he he's also getting more headcount because the
80:58 getting more headcount because the number of consumers of this application
80:59 number of consumers of this application just increased by 10fold right so who
81:01 just increased by 10fold right so who saw that coming right um so uh here's
81:05 saw that coming right um so uh here's John Rouser he's senior director of
81:06 John Rouser he's senior director of engineering at Cisco security and he
81:08 engineering at Cisco security and he convinces SVP to um require 100 of the
81:12 convinces SVP to um require 100 of the top leaders inside of Cisco security to
81:14 top leaders inside of Cisco security to vibe code one feature into production in
81:16 vibe code one feature into production in a quarter that ended last month, right?
81:19 a quarter that ended last month, right? And so um you know we're actually
81:21 And so um you know we're actually getting a chance to be able to survey
81:23 getting a chance to be able to survey those people, right? Who finished? Uh
81:25 those people, right? Who finished? Uh you know uh how many completed, didn't
81:28 you know uh how many completed, didn't complete, partially completed, etc. And
81:30 complete, partially completed, etc. And of those who completed, right, what was
81:32 of those who completed, right, what was what aha moment did they have as a
81:34 what aha moment did they have as a leader? What's the magnitude and
81:36 leader? What's the magnitude and direction of what they want to do? And
81:37 direction of what they want to do? And so we're going to go in and study that.
81:38 so we're going to go in and study that. And I just I my prediction is that we're
81:40 And I just I my prediction is that we're going to see parts of that organization
81:42 going to see parts of that organization get reshaped as leaders realize kind of
81:45 get reshaped as leaders realize kind of what's possible. Everything from
81:46 what's possible. Everything from strategy to processes and so forth. And
81:49 strategy to processes and so forth. And so let me just share with you one um you
81:51 so let me just share with you one um you know thing that really excites me which
81:53 know thing that really excites me which is uh I got a chance to uh get back into
81:55 is uh I got a chance to uh get back into the state of DevOps research the Dora
81:57 the state of DevOps research the Dora study with uh u the Google cloud team
82:00 study with uh u the Google cloud team and one of the things that didn't make
82:01 and one of the things that didn't make into the report that I just found really
82:03 into the report that I just found really exciting was around this. It was like
82:06 exciting was around this. It was like what how much do people trust AI? And
82:08 what how much do people trust AI? And we're using a very strange definition of
82:10 we're using a very strange definition of trust, which is to what degree can I
82:12 trust, which is to what degree can I predict how the other party will act and
82:13 predict how the other party will act and react, right? Because the more you trust
82:15 react, right? Because the more you trust the other party, right, you can give
82:16 the other party, right, you can give them bigger requests, you can use fewer
82:18 them bigger requests, you can use fewer words, you have less need for feedback,
82:20 words, you have less need for feedback, right? It's the whole notion of finger
82:21 right? It's the whole notion of finger spits and fuel, right? Like you know,
82:23 spits and fuel, right? Like you know, how many of the 10,000 hours that
82:25 how many of the 10,000 hours that requires to be good at anything have you
82:26 requires to be good at anything have you used to get good at AI? And one of the
82:29 used to get good at AI? And one of the stunning findings was that it's this
82:31 stunning findings was that it's this line. So on the x-axis is how long have
82:33 line. So on the x-axis is how long have you been using AI tools? Y is how much
82:36 you been using AI tools? Y is how much do you trust it? Right? And the longer
82:37 do you trust it? Right? And the longer you use AI, right, the more you trust
82:40 you use AI, right, the more you trust it, right? So every every person who
82:41 it, right? So every every person who says, "I tried it and it's terrible at
82:43 says, "I tried it and it's terrible at coding," right? On what basis did they
82:46 coding," right? On what basis did they make that conclusion after maybe using
82:48 make that conclusion after maybe using for an hour or two? And what this shows
82:51 for an hour or two? And what this shows us is that uh you know it requires
82:52 us is that uh you know it requires practice, right? And this is probably a
82:54 practice, right? And this is probably a teachable skill. Um so length of time on
82:58 teachable skill. Um so length of time on the x-axis is a very incomplete
83:00 the x-axis is a very incomplete expression, right? It's like frequency
83:01 expression, right? It's like frequency and intensity and how many hours, but
83:03 and intensity and how many hours, but it's there's signal there. So it just
83:05 it's there's signal there. So it just shows that uh you know part of your job
83:07 shows that uh you know part of your job is to help other people have the aha
83:09 is to help other people have the aha moment and then help them you practice
83:11 moment and then help them you practice right so they get very very good at it
83:13 right so they get very very good at it so they can use every one of these
83:14 so they can use every one of these amazing technologies to achieve their
83:17 amazing technologies to achieve their goals. So uh I'll leave you with one
83:20 goals. So uh I'll leave you with one last kind of vision. Stephen and I we
83:22 last kind of vision. Stephen and I we did a vibe coding workshop for leaders
83:24 did a vibe coding workshop for leaders um back six weeks ago and what was
83:27 um back six weeks ago and what was amazing to me was in the 3 hours we had
83:31 amazing to me was in the 3 hours we had a 100% completion rate. Everyone built
83:33 a 100% completion rate. Everyone built something, you know, they built a data
83:34 something, you know, they built a data visualization tool. In fact, uh one
83:36 visualization tool. In fact, uh one person uh built a an iOS app and another
83:40 person uh built a an iOS app and another person actually got it into the review
83:41 person actually got it into the review queue in the Apple iOS app store, right?
83:44 queue in the Apple iOS app store, right? Which is which is absolutely
83:45 Which is which is absolutely astonishing. Uh and here's a guy named
83:47 astonishing. Uh and here's a guy named Roger Safner. He said, "I used to be a C
83:50 Roger Safner. He said, "I used to be a C MVP way back in the day. I haven't coded
83:52 MVP way back in the day. I haven't coded in 15 years." Uh and he's showing off an
83:55 in 15 years." Uh and he's showing off an app that helped him automate the process
83:57 app that helped him automate the process of getting checked in to Southwest
83:58 of getting checked in to Southwest Airlines until the bot detection tools
84:00 Airlines until the bot detection tools cut him off. But look at look at the
84:02 cut him off. But look at look at the expression on his face. And so I think
84:03 expression on his face. And so I think uh what we're seeing is like what
84:04 uh what we're seeing is like what happens when support ships right and
84:07 happens when support ships right and support codes and ships when leaders
84:08 support codes and ships when leaders code and ship. And there's no doubt in
84:09 code and ship. And there's no doubt in my mind that this will reshape uh
84:11 my mind that this will reshape uh technology organizations. If you're one
84:13 technology organizations. If you're one of those, Stephen and I want to talk to
84:14 of those, Stephen and I want to talk to you, right? Because you are on the
84:15 you, right? Because you are on the frontier of something really, really
84:17 frontier of something really, really important. I'll share with you a couple
84:18 important. I'll share with you a couple quotes. Here's a technology leader. When
84:20 quotes. Here's a technology leader. When I told my team that I wrote an app that,
84:22 I told my team that I wrote an app that, you know, an AI wrote 60,000 lines of
84:24 you know, an AI wrote 60,000 lines of code and I haven't looked at any of it,
84:26 code and I haven't looked at any of it, they all looked at me as if they wished
84:27 they all looked at me as if they wished I were dead.
84:30 I were dead. Um, we've uh we've had these stupid
84:32 Um, we've uh we've had these stupid problems in legacy applications that
84:34 problems in legacy applications that have been there for over a decade. We
84:36 have been there for over a decade. We got a group of senior engineers
84:37 got a group of senior engineers together. We used AI to generate a fix
84:39 together. We used AI to generate a fix and we submitted PR and the team
84:41 and we submitted PR and the team accepted it. Right? Unlike the time when
84:43 accepted it. Right? Unlike the time when they said it was AI generated and they
84:45 they said it was AI generated and they rejected it as AI slop, right? So this
84:48 rejected it as AI slop, right? So this is maybe happening in your
84:49 is maybe happening in your organizations. Um, our code velocity is
84:51 organizations. Um, our code velocity is so high. Uh, we've concluded that we can
84:53 so high. Uh, we've concluded that we can only have one engineer per repo, right?
84:55 only have one engineer per repo, right? Because of merge conflicts, right? We
84:58 Because of merge conflicts, right? We haven't figured out the coordination
84:58 haven't figured out the coordination cost uh mechanism yet. And so like all
85:01 cost uh mechanism yet. And so like all these were some of the lessons that went
85:02 these were some of the lessons that went into the vibe coding book. Thank you for
85:04 into the vibe coding book. Thank you for everyone who were at the signing
85:05 everyone who were at the signing yesterday. And uh if you're interested
85:07 yesterday. And uh if you're interested in any of the talks we referenced in
85:09 in any of the talks we referenced in excerpts of our book in uh basically uh
85:12 excerpts of our book in uh basically uh all the links that uh are in this
85:14 all the links that uh are in this presentation, just send an email to real
85:16 presentation, just send an email to real gene cams.com
85:17 gene cams.com subjectline vibe and you'll get an
85:19 subjectline vibe and you'll get an automated response in a minute or two.
85:20 automated response in a minute or two. So with that, Steve and I thank you for
85:22 So with that, Steve and I thank you for your time and we were around all week.
85:24 your time and we were around all week. Thanks all. [applause]
85:35 [music] >> Ladies and gentlemen, please welcome
85:37 >> Ladies and gentlemen, please welcome back to the stage, Alex Lieberman.
85:41 back to the stage, Alex Lieberman. [music] Let's give it up again for
85:42 [music] Let's give it up again for Steven Jean and also the rest of the
85:45 Steven Jean and also the rest of the speakers from the morning session.
85:47 speakers from the morning session. Whether you are watching in person or on
85:50 Whether you are watching in person or on YouTube or on the AIE site, you've been
85:54 YouTube or on the AIE site, you've been breaking a mental sweat. So, we are
85:55 breaking a mental sweat. So, we are going to take a 30 minute break, get
85:57 going to take a 30 minute break, get some grub, get some coffee, recharge,
85:59 some grub, get some coffee, recharge, and we will see you back here at 11.
86:01 and we will see you back here at 11. Thanks everyone. Appreciate it.
86:04 Thanks everyone. Appreciate it. [applause]
86:25 Two flames lit the darkness, burning side by [music and singing] side. Both
86:27 side by [music and singing] side. Both sworn to creation. Both relentless in
86:30 sworn to creation. Both relentless in their stride. One walked through the
86:33 their stride. One walked through the mountains, [music] one soared across the
86:35 mountains, [music] one soared across the void. Both chasing the horizon of the
86:39 void. Both chasing the horizon of the worlds they would deploy. [music]
86:41 worlds they would deploy. [music] But the path is not a straight line. And
86:44 But the path is not a straight line. And the future is not flat. Some rules bend
86:47 the future is not flat. Some rules bend through [music] space time and some
86:49 through [music] space time and some break [singing] on impact. Effort is a
86:53 break [singing] on impact. Effort is a kingdom. [music]
86:54 kingdom. [music] Leverage is the key. One builds the
86:56 Leverage is the key. One builds the throne by hand. One shapes [music]
86:59 throne by hand. One shapes [music] reality.
87:07 There is a curvature of time. Not [music] a race, not a throne, but a
87:10 [music] a race, not a throne, but a shift in the dimension of how progress
87:13 shift in the dimension of how progress becomes known. [music] When the universe
87:16 becomes known. [music] When the universe is standing to the will inside the mind,
87:19 is standing to the will inside the mind, you don't win by moving faster. You win
87:23 you don't win by moving faster. You win by [music]
87:24 by [music] breaing
87:26 breaing time.
87:39 Holes of the past try to drag the present [music] down. Systems built on
87:42 present [music] down. Systems built on dust [singing] wearing yesterday as
87:44 dust [singing] wearing yesterday as [music] crown. Some are pulled beneath
87:47 [music] crown. Some are pulled beneath them, [singing] fighting gravity alone.
87:50 them, [singing] fighting gravity alone. Others learn to map the edges and escape
87:54 Others learn to map the edges and escape events horizons. Not all power [music]
87:57 events horizons. Not all power [music] is struggle. [singing] Not all mastery
87:59 is struggle. [singing] Not all mastery is pain. The ones who change direction.
88:02 is pain. The ones who change direction. Rewrite the laws of the game. You can
88:06 Rewrite the laws of the game. You can live your life in [music and singing]
88:07 live your life in [music and singing] labor or an impact that compounds. Every
88:11 labor or an impact that compounds. Every second can be linear or worth a thousand
88:14 second can be linear or worth a thousand rounds.
88:21 There is a curvature of time, [music] not a race, not a throne, but a shift in
88:25 not a race, not a throne, but a shift in the dimension of how [music] progress
88:27 the dimension of how [music] progress becomes known. When the universe is
88:30 becomes known. When the universe is bending to the will inside the mind, you
88:34 bending to the will inside the mind, you don't win by moving [music] faster. You
88:37 don't win by moving [music] faster. You win by rediding
88:40 win by rediding [music] time. [singing]
88:53 The future isn't [music] distant. It accelerates [singing] for those who
88:55 accelerates [singing] for those who wield the tools of power. Instead of
88:58 wield the tools of power. Instead of fighting with their goals, mastery is
89:01 fighting with their goals, mastery is leverage, [music] not a sentence carved
89:03 leverage, [music] not a sentence carved in stone. The horizon does not move
89:06 in stone. The horizon does not move [singing] unless you. [music]
89:16 There is a car of time [music] where the present multiplies where a lifetime
89:19 present multiplies where a lifetime holds a legacy that no clock [music] can
89:22 holds a legacy that no clock [music] can quantify. Not by force, not by fury, but
89:26 quantify. Not by force, not by fury, but by evolution inside. We become eternal
89:30 by evolution inside. We become eternal beings when [music] we synchronize
89:34 beings when [music] we synchronize with
90:18 footstep. their fade, but they never die. Shadows stretch across the sky.
90:29 A whisper grows into a [singing] roar. Do you [music] feel it? Do you want
90:32 Do you [music] feel it? Do you want more?
90:42 Every heartbeat a stone [music] in the street.
90:49 Ripples [music and singing] chasing an endless dream.
91:00 What we do in life [music] echoes in eternity.
91:03 eternity. Every spark ignit [music]
91:41 >> [music] >> Reach out to the [singing] empty air.
91:44 >> Reach out to the [singing] empty air. Trace the stars like they're waiting
91:46 Trace the stars like they're waiting there.
91:48 there. [music]
91:54 The clock ticks but the moment stays forever starts in a single [singing]
91:56 forever starts in a single [singing] prayer.
92:32 >> Every heartbeat [music] stone [singing] in the stream.
92:43 Ripples [music and singing] chasing an endless dream.
92:55 What we do in life echoes in eternity. There sparking lights a fire that will
92:58 There sparking lights a fire that will never see [music]
93:50 >> Shadows [music] crawl where the light won't stay.
93:54 crawl where the light won't stay. The echo whispers don't [music] look
93:57 The echo whispers don't [music] look away.
93:59 away. Heartbeat racing louder than my doubt.
94:03 Heartbeat racing louder than my doubt. Scream inside. I can't let
94:06 Scream inside. I can't let [music and singing] out. But I won't
94:08 [music and singing] out. But I won't fall. I won't drown in the storm all
94:12 fall. I won't drown in the storm all around.
94:19 Fear of the [music] mind. I won't let you in. It seems like a
94:23 I won't let you in. It seems like a ghost, [music] but I keep it within
94:27 ghost, [music] but I keep it within the mind.
94:29 the mind. I'm breaking [music] the chain.
94:55 >> [music] >> Cold winds how but they won't define me.
94:59 >> Cold winds how but they won't define me. The cracks in my soul let the light find
95:03 The cracks in my soul let the light find [music] me. Every step I take the ground
95:06 [music] me. Every step I take the ground fights back. But I'm the fire. I'm the
95:08 fights back. But I'm the fire. I'm the spark. I'm the attack. [music]
95:16 I won't freeze. I won't fade. Through the chaos I've remained.
95:23 [music] Fear is a killer. I won't let it win. It
95:28 Fear is a killer. I won't let it win. It creeps like a ghost, but I keep it
95:31 creeps like a ghost, but I keep it within.
95:33 within. Fear is a killer. I'm breaking [music]
95:36 Fear is a killer. I'm breaking [music] the chain. Don't for
96:10 >> [music] [singing]
96:19 [music] >> I hear the static in the [singing]
96:21 >> I hear the static in the [singing] night. It calls.
96:24 night. It calls. A whisper [music] rising,
96:27 A whisper [music] rising, breaking through the walls.
96:30 breaking through the walls. [music]
96:32 [music] Electric echoes in my veins. Stay home.
96:36 Electric echoes in my veins. Stay home. Chasing the shadows where [music] the
96:39 Chasing the shadows where [music] the wild ones run.
96:47 The air is still the weight is [music and singing] gone. Close your
96:49 [music and singing] gone. Close your eyes. The past is done.
96:59 Free your mind. Let it go. Let [music] it break the chains. Heat. Heat.
97:43 >> Waves come crash against the sky. Friends of a dream. [music]
97:46 Friends of a dream. [music] I see them inside.
97:53 Gravity is a [music] story. We don't need a weather thunder where the speed.
97:58 need a weather thunder where the speed. [music]
98:03 [music] The air is the weight is gone. Close
98:07 The air is the weight is gone. Close your eyes. The [singing] past is done.
98:15 [singing] Free your mind. [music] Let it go. Let
98:18 Free your mind. [music] Let it go. Let it break the chain. Heat. Heat.
98:33 Heat. [music] Heat.
98:44 Heat [music]
99:26 [music] they said the stars don't change
99:28 they said the stars don't change [singing] their course, but I've been
99:30 [singing] their course, but I've been running from their force. A mirror
99:34 running from their force. A mirror crack, but still it [music and singing]
99:36 crack, but still it [music and singing] shows. The fire is mine. It's mine to
99:40 shows. The fire is mine. It's mine to hold. I hear the echo and call [music]
99:43 hold. I hear the echo and call [music] my name,
99:50 but I'm not the shadow. [music] I'm not the same.
99:59 You are who you choose to be. The stars [music]
99:59 [music] are the history.
100:02 are the history. Every breath, every heartbeat.
100:05 Every breath, every heartbeat. [music]
100:38 [music] >> of thorns, a sky of glass. I've walked
100:42 >> of thorns, a sky of glass. I've walked through both. I've let them [singing]
100:44 through both. I've let them [singing] pass. The weight [music] is heavy, but
100:47 pass. The weight [music] is heavy, but I've grown. The voice I hear is now my
100:51 I've grown. The voice I hear is now my own. I see [music] the light
101:00 [music] change.
101:02 change. Heat. Heat. Heat.
101:57 >> I see the lines [music and singing] drawn in the sand. The map of chaos in
102:00 drawn in the sand. The map of chaos in mind.
102:09 Every step a [music] choice. Every beat of voice. The clock ticks
102:11 Every beat of voice. The clock ticks louder. But I stand. [music]
102:19 Close my eyes [singing] and feel it burn. Every [music] failure, every turn.
102:23 burn. Every [music] failure, every turn. It's fue for the fire inside. [music]
103:05 The air is heavy. It doesn't break. A thousand whispers in it wake.
103:16 Each breath [music] a climb. Each fall a sign, but I am more than I
103:20 Each fall a sign, but I am more than I can take. [music]
103:27 Close my eyes and feel it burn. Every failure, every turn is fueled for the
103:31 failure, every turn is fueled for the fire inside.
103:34 fire inside. [music]
104:03 >> No, no
104:07 [music] heat.
104:41 [music] >> The clock keeps ticking loud and clear.
104:57 I've been waiting for the light. [music]
104:58 [music] Holding breath through endless night.
105:06 [music] The air is shifting. Feel it break.
105:08 Feel it break. A single spark is all it [singing]
105:12 A single spark is all it [singing] takes.
105:14 takes. It starts today.
105:17 It starts today. It starts today.
105:20 It starts today. No more [music]
105:21 No more [music] running. No delay.
105:25 running. No delay. The world is spinning in my hands. It
105:30 The world is spinning in my hands. It [music] starts to
105:33 [music] starts to start today.
105:58 Every choice I made my own. [music] I see the dawn breaking through.
106:12 The air is shifting. [music] Feel it rain.
106:15 Feel it rain. A single spark is all in.
106:21 A single spark is all in. It starts to take it start.
106:52 [music] >> Heat up here.
107:29 Oh. [music]
107:39 >> [music] >> Fire in my chest is burning loud.
107:44 >> Fire in my chest is burning loud. Ashes fall, but I won't bow. [music]
107:49 Ashes fall, but I won't bow. [music] I've walked [singing] through the smoke.
107:52 I've walked [singing] through the smoke. I've tasted the scars. Each step I've
107:56 I've tasted the scars. Each step I've taken, lit up the stars. [music]
108:00 taken, lit up the stars. [music] Let it blaze, let it break. Feel the
108:04 Let it blaze, let it break. Feel the crack the ground sh
108:12 I'm forced in [singing] flame [music] I'm falling
108:15 I'm falling the pain they call me [music]
108:19 the pain they call me [music] deep
108:26 again from Heat. Heat. Heat.
108:28 from Heat. Heat. Heat. [music]
108:54 [music] >> The winds they how but I stand still.
108:59 >> The winds they how but I stand still. [music]
109:00 [music] The mountains crumble up my will.
109:05 The mountains crumble up my will. I'm not the same
109:08 I'm not the same I was before. A shadow of fear. I keep
109:23 let it blaze. [music and singing] Let it break. Feel the cracks. The ground will
109:26 break. Feel the cracks. The ground will shake.
109:33 I'm forged in flame. [music] Heat. Heat. Heat.
109:55 Heat. [music]
110:10 [music] Heat.
110:44 Shadows melt in the growing light. Time bends and twists. We feel it star.
110:53 A pulse [music] a spark [singing] an open heart.
110:57 open heart. Do you feel it? Feel it rise.
111:06 The weightless fire in the sky.
111:12 >> [music] >> has come.
111:15 >> has come. We're running to the sun.
111:38 electric in the trees that
111:42 in the trees that [music]
111:43 [music] stars collide, but we stay warm.
111:48 stars collide, but we stay warm. The past dissolves [music]
111:51 The past dissolves [music] like waves on storm.
111:55 like waves on storm. We stand together
112:40 The rush, [music] the fun, the everything.
112:45 the fun, the everything. A new age
112:47 A new age has come.
112:49 has come. We're running to the [music] sun. No
112:52 We're running to the [music] sun. No chains, no walls, just
112:56 chains, no walls, just with me. [music]
113:14 >> [music] >> Heat up
114:12 up [music] here.
114:37 >> up [music]
114:55 Heat [music]
115:59 [music] here.
116:54 [music] Heat.
117:22 Heat. >> [music]
117:57 Heat. [music] Heat.
118:06 [music] Heat. Heat.
118:47 >> Heat. Heat. [music]
120:02 [music] >> Heat. Heat.
120:35 >> Ladies and gentlemen, please welcome back to the stage Alex Lieberman.
120:43 Let's uh keep it going for the morning speakers. [music] Amazing job from
120:45 speakers. [music] Amazing job from everyone who spoke earlier. I asked
120:47 everyone who spoke earlier. I asked before who thought they came from the
120:49 before who thought they came from the furthest place on Earth to to watch this
120:52 furthest place on Earth to to watch this in person. And where's New Zealand
120:54 in person. And where's New Zealand again? I don't know. New Zealand. There
120:55 again? I don't know. New Zealand. There we go.
120:56 we go. >> From Bulgaria.
120:57 >> From Bulgaria. >> Bulgaria. Still, I think closer than New
121:00 >> Bulgaria. Still, I think closer than New Zealand, but still very far.
121:02 Zealand, but still very far. >> Australia via New Zealand.
121:03 >> Australia via New Zealand. >> Australia via New Zealand. We just got
121:05 >> Australia via New Zealand. We just got someone to one up New Zealand. I have
121:07 someone to one up New Zealand. I have another quick question since we just
121:09 another quick question since we just came back from a coffee break. Also, if
121:10 came back from a coffee break. Also, if you're watching live on YouTube, you can
121:12 you're watching live on YouTube, you can comment. Who thinks they're the most
121:13 comment. Who thinks they're the most caffeinated right now? Who thinks
121:16 caffeinated right now? Who thinks they're the most caffeinated in the
121:17 they're the most caffeinated in the room? How many cups of coffee? I'm four
121:19 room? How many cups of coffee? I'm four right now. Anyone beat four? Oh, we got
121:22 right now. Anyone beat four? Oh, we got four. We got a five, maybe. Wow,
121:24 four. We got a five, maybe. Wow, impressive. Well, we are back for an
121:26 impressive. Well, we are back for an incredible next block of sessions. We're
121:28 incredible next block of sessions. We're going to be covering everything from
121:29 going to be covering everything from future proofing uh coding agents to
121:32 future proofing uh coding agents to moving away from agile, how to quantify
121:35 moving away from agile, how to quantify AI ROI in software engineering, the
121:38 AI ROI in software engineering, the state of AI code quality, hype versus
121:41 state of AI code quality, hype versus reality, and Miniax M2. But I am so
121:44 reality, and Miniax M2. But I am so excited to kick off this next block of
121:47 excited to kick off this next block of talks with OpenAI. Please welcome to the
121:50 talks with OpenAI. Please welcome to the stage Bill Chen and Brian Fioa from the
121:53 stage Bill Chen and Brian Fioa from the Applied AI team at OpenAI. Let's hear it
121:55 Applied AI team at OpenAI. Let's hear it for them. [applause]
122:15 >> Hello everyone. Um, today we'll be talking about how to build coding
122:16 talking about how to build coding agents.
122:18 agents. And uh, I'm Bill. I work on the applied
122:21 And uh, I'm Bill. I work on the applied AI startups team at OpenAI.
122:23 AI startups team at OpenAI. >> And I'm Brian. I work with Bill on the
122:26 >> And I'm Brian. I work with Bill on the OpenAI startups team
122:27 OpenAI startups team >> and we specifically uh focus on uh
122:30 >> and we specifically uh focus on uh building coding agents here at OpenAI.
122:33 building coding agents here at OpenAI. Um yeah so why are we talk giving this
122:36 Um yeah so why are we talk giving this talk? Why why are we you know u talking
122:39 talk? Why why are we you know u talking about coding agents? Well it's really
122:41 about coding agents? Well it's really quite interesting because it's been
122:43 quite interesting because it's been booming for the the the past year
122:45 booming for the the the past year actually. It's just if you think about
122:47 actually. It's just if you think about it it's not that much time ago like only
122:49 it it's not that much time ago like only been a year or so. the ground keeps
122:52 been a year or so. the ground keeps shifting really under the uh harness on
122:54 shifting really under the uh harness on on the coding agents. But if you think
122:56 on the coding agents. But if you think about it, it's really like why it's
122:58 about it, it's really like why it's interesting is because it's really a
123:00 interesting is because it's really a signal on how close we are to AGI.
123:02 signal on how close we are to AGI. Software engineering can be set as a
123:04 Software engineering can be set as a universal medium for problem solving.
123:07 universal medium for problem solving. But because the ground is shifting so
123:08 But because the ground is shifting so fast, uh we h kept having to rebuild the
123:11 fast, uh we h kept having to rebuild the agent on top of the model whenever a
123:13 agent on top of the model whenever a model is released. And today we're going
123:15 model is released. And today we're going to talk a little bit about how we might
123:17 to talk a little bit about how we might be able to get around that.
123:21 be able to get around that. So, here's what we're going to go over
123:22 So, here's what we're going to go over today. We'll start with the anatomy of a
123:25 today. We'll start with the anatomy of a coding agent, especially going into the
123:27 coding agent, especially going into the details of models and harnesses and how
123:29 details of models and harnesses and how they work together. We'll share some
123:31 they work together. We'll share some lessons that we learned from putting
123:33 lessons that we learned from putting them together ourselves. And we're
123:36 them together ourselves. And we're specifically going to talk about codeex
123:37 specifically going to talk about codeex here, which is our own coding agent.
123:40 here, which is our own coding agent. We'll talk a little bit about emerging
123:41 We'll talk a little bit about emerging patterns that we're seeing from all of
123:44 patterns that we're seeing from all of you for using agents like Codeex in your
123:46 you for using agents like Codeex in your own products. And lastly, we'll talk a
123:48 own products. And lastly, we'll talk a little bit about what to expect from
123:51 little bit about what to expect from Codeex in the future so that you can
123:53 Codeex in the future so that you can build along with us if you want to.
124:01 To start, let's talk a little bit about what makes a coding agent an agent as a
124:04 what makes a coding agent an agent as a whole. Um, it really is quite simple. I
124:07 whole. Um, it really is quite simple. I think, you know, people kind of over
124:09 think, you know, people kind of over complicate things a little bit these
124:10 complicate things a little bit these days. It's made out of three parts. It's
124:12 days. It's made out of three parts. It's a user interface. It has a model. It's a
124:15 a user interface. It has a model. It's a harness, right? Uh the interface quite
124:17 harness, right? Uh the interface quite self-explanatory could be a computer uh
124:21 self-explanatory could be a computer uh like a CLI tool or it could be a uh
124:24 like a CLI tool or it could be a uh integrated developer environment could
124:26 integrated developer environment could be also cloud or background agent. Um,
124:30 be also cloud or background agent. Um, models also very quite self-explanatory
124:32 models also very quite self-explanatory are, you know, the things like the
124:35 are, you know, the things like the latest and greatest, the GPD 5.1 codeex
124:38 latest and greatest, the GPD 5.1 codeex uh max that we just released yesterday
124:41 uh max that we just released yesterday uh or the GPD 5.1 series of models or
124:45 uh or the GPD 5.1 series of models or other uh models from other providers as
124:47 other uh models from other providers as well. And the harness uh is a little bit
124:50 well. And the harness uh is a little bit more of an interesting part. This is the
124:52 more of an interesting part. This is the part that directly interacts with the
124:54 part that directly interacts with the model uh in the most reductive way. You
124:56 model uh in the most reductive way. You can sort of think of it as a collection
124:58 can sort of think of it as a collection of prompts and tools combined in a core
125:01 of prompts and tools combined in a core agent loop which provides input and
125:03 agent loop which provides input and outputs uh from a model. Uh the last
125:07 outputs uh from a model. Uh the last part will be our focus for today.
125:16 As touched on a bit earlier, coding is one of the most active frontiers in
125:17 one of the most active frontiers in applied AI and uh how models are
125:20 applied AI and uh how models are constantly getting released and we're
125:22 constantly getting released and we're not making the problem uh easier for
125:24 not making the problem uh easier for everybody
125:26 everybody is that people have to constantly adapt
125:30 is that people have to constantly adapt uh the agents to the new models.
125:39 So, um, Bill's done a great job of giving us an overview of coding agents,
125:41 giving us an overview of coding agents, what they're made up of. So, let's zoom
125:44 what they're made up of. So, let's zoom in a little bit on the harness. Um,
125:47 in a little bit on the harness. Um, turns out that's a little bit tricky.
125:49 turns out that's a little bit tricky. So, what is a harness? A harness is
125:52 So, what is a harness? A harness is really the interface layer to the model.
125:54 really the interface layer to the model. It's the surface area the model uses to
125:57 It's the surface area the model uses to talk to users and the code and perform
126:00 talk to users and the code and perform actions with tools. It's made up of all
126:03 actions with tools. It's made up of all of the pieces that the model needs to
126:06 of the pieces that the model needs to work over many turns, call tools, and
126:09 work over many turns, call tools, and and really write code for you and
126:11 and really write code for you and interpret what the user is actually
126:13 interpret what the user is actually asking. [snorts] Um, for some, the
126:16 asking. [snorts] Um, for some, the harness might actually be the special
126:18 harness might actually be the special sauce of the product. But as we're going
126:21 sauce of the product. But as we're going to go into a little bit more, it's
126:22 to go into a little bit more, it's really challenging work to build a good
126:25 really challenging work to build a good harness. And we'll talk about how we did
126:28 harness. And we'll talk about how we did that.
126:30 that. So let's see what are some of these
126:32 So let's see what are some of these challenges. Um just to name a few, AV is
126:37 challenges. Um just to name a few, AV is one. Um your [laughter]
126:39 one. Um your [laughter] um your brand new innovative custom tool
126:42 um your brand new innovative custom tool that you're giving to your agent might
126:44 that you're giving to your agent might not actually be something the model is
126:45 not actually be something the model is using is used to using. It may not have
126:48 using is used to using. It may not have ever seen that tool before in trading.
126:50 ever seen that tool before in trading. And even if it is, you need to spend
126:52 And even if it is, you need to spend time tuning your prompt to that
126:54 time tuning your prompt to that particular model and the habits that it
126:57 particular model and the habits that it comes with.
126:59 comes with. And new models are coming out all the
127:01 And new models are coming out all the time. What about latency? Like does the
127:04 time. What about latency? Like does the model take a while to think about
127:06 model take a while to think about certain things? Which things do you
127:08 certain things? Which things do you prompt it not to? How do you expose the
127:10 prompt it not to? How do you expose the UX of what a thinking model is doing
127:13 UX of what a thinking model is doing while it's thinking? Is it communicating
127:15 while it's thinking? Is it communicating with you while it's thinking or do you
127:17 with you while it's thinking or do you have to summarize it? Managing the
127:19 have to summarize it? Managing the context window and compaction can be
127:22 context window and compaction can be really challenging. We just launched
127:24 really challenging. We just launched Codeex Max that does that out of the box
127:26 Codeex Max that does that out of the box for you. you don't have to worry about
127:28 for you. you don't have to worry about compaction and context window
127:30 compaction and context window management. It's really hard to do. Um,
127:33 management. It's really hard to do. Um, and so if you were to do it yourself,
127:36 and so if you were to do it yourself, have fun. Um, and then also like the
127:38 have fun. Um, and then also like the APIs keep changing, right? So we have
127:40 APIs keep changing, right? So we have completions, we have responses, we have
127:41 completions, we have responses, we have whatever else is coming in the future.
127:44 whatever else is coming in the future. What does the model know how to use and
127:46 What does the model know how to use and get to get the most intelligence out of
127:48 get to get the most intelligence out of the box?
127:50 the box? And so
127:52 And so this is the interesting part. Fitting a
127:54 this is the interesting part. Fitting a model into a harness takes a lot of
127:57 model into a harness takes a lot of prompting.
127:59 prompting. It turns out that how the model is
128:00 It turns out that how the model is trained has side effects.
128:04 trained has side effects. I like to think about it this way.
128:07 I like to think about it this way. Intelligence plus habit. Intelligence.
128:10 Intelligence plus habit. Intelligence. What is the model good at? What
128:13 What is the model good at? What languages does it know really well? What
128:15 languages does it know really well? What is what is its capabilities in terms of
128:18 is what is its capabilities in terms of like how well it can write code in
128:20 like how well it can write code in certain frameworks? And then what habits
128:24 certain frameworks? And then what habits did it learn to to use to solve those
128:27 did it learn to to use to solve those problems? We've trained our models to
128:30 problems? We've trained our models to have habits of like planning a solution,
128:33 have habits of like planning a solution, looking around, gathering context, and
128:36 looking around, gathering context, and and thinking about a problem before
128:38 and thinking about a problem before diving in and writing code, and then
128:40 diving in and writing code, and then testing its work at the end.
128:43 testing its work at the end. Developing a feel for these habits is
128:45 Developing a feel for these habits is how you become a good prompt engineer.
128:48 how you become a good prompt engineer. If you don't instruct the model in ways
128:50 If you don't instruct the model in ways that it's familiar with, you can have
128:53 that it's familiar with, you can have problems. We saw this when we launched
128:56 problems. We saw this when we launched GPD5. A lot of people who weren't used
128:59 GPD5. A lot of people who weren't used to using our models encoding tried to
129:01 to using our models encoding tried to take prompts that existed for other
129:04 take prompts that existed for other models and put them into their harness
129:06 models and put them into their harness and have GPD5 follow those instructions.
129:09 and have GPD5 follow those instructions. And it turned out that we taught our
129:12 And it turned out that we taught our model to do some of the things that the
129:14 model to do some of the things that the other models didn't really do out of the
129:16 other models didn't really do out of the box. And so when they were prompting
129:18 box. And so when they were prompting them to look really hard at the context
129:21 them to look really hard at the context and like examine every single file
129:23 and like examine every single file before making a a code edit, our model
129:27 before making a a code edit, our model was being very kind of thorough about
129:29 was being very kind of thorough about that and it was taking a really long
129:31 that and it was taking a really long time and they weren't seeing the best
129:32 time and they weren't seeing the best performance. And so we figured out that
129:35 performance. And so we figured out that if you let the model just do the
129:37 if you let the model just do the behaviors that it's used to and don't
129:39 behaviors that it's used to and don't overprompt it, it'll actually perform
129:41 overprompt it, it'll actually perform really better. We found out by asking. I
129:44 really better. We found out by asking. I was literally like, "Hey, like I like
129:46 was literally like, "Hey, like I like the solution, but it took you a long
129:47 the solution, but it took you a long time to get there. What can I do
129:50 time to get there. What can I do differently in your instructions to help
129:52 differently in your instructions to help you get there faster next time?" And
129:53 you get there faster next time?" And literally it said, "Uh, you're telling
129:55 literally it said, "Uh, you're telling me to go look at everything and I don't
129:57 me to go look at everything and I don't really need to. So that's what's taking
129:59 really need to. So that's what's taking forever."
130:06 And so you can actually see the advantages of building both the model
130:07 advantages of building both the model and the harness together because you
130:09 and the harness together because you just like know all of that while you're
130:11 just like know all of that while you're building it. And that's why Codex is
130:14 building it. And that's why Codex is both a model and a harness combined.
130:17 both a model and a harness combined. So let's dig deeper into Codeex and what
130:20 So let's dig deeper into Codeex and what it can actually do.
130:23 it can actually do. So we built Codex to be an agent for
130:25 So we built Codex to be an agent for everywhere that you code. It's a VS Code
130:27 everywhere that you code. It's a VS Code plugin. It's a CLI. You can call it in
130:30 plugin. It's a CLI. You can call it in the cloud from the VS Code plugin or
130:32 the cloud from the VS Code plugin or from chatgbt from your phone. Um, and
130:36 from chatgbt from your phone. Um, and it's very basic. You can use it to turn
130:38 it's very basic. You can use it to turn your specs into runnable code starting
130:40 your specs into runnable code starting from a prompt. Um, having a plan. It
130:44 from a prompt. Um, having a plan. It navigates your repo to edit files. It
130:46 navigates your repo to edit files. It runs commands, executes tasks, and you
130:49 runs commands, executes tasks, and you can call it from Slack or you can have
130:52 can call it from Slack or you can have it review PRs and GitHub. So, all of the
130:54 it review PRs and GitHub. So, all of the things that you would expect.
130:58 things that you would expect. And that means that the that codec um
131:00 And that means that the that codec um the harness of codec needs to be able to
131:02 the harness of codec needs to be able to do a lot of really complex things. Uh
131:06 do a lot of really complex things. Uh when I talked to a member of the codeex
131:08 when I talked to a member of the codeex team about this slide and what should be
131:09 team about this slide and what should be on it, he was like it's way harder than
131:11 on it, he was like it's way harder than you think. [laughter]
131:13 you think. [laughter] You have to manage parallel tool calls
131:15 You have to manage parallel tool calls like thread merging and all of the
131:17 like thread merging and all of the things involved in that. Think about all
131:19 things involved in that. Think about all the security considerations you have
131:20 the security considerations you have with sandboxing, prompt forwarding,
131:23 with sandboxing, prompt forwarding, permissions, uh, port management. Um,
131:26 permissions, uh, port management. Um, compaction is a whole thing. Um, and
131:29 compaction is a whole thing. Um, and doing that well is really complex. When
131:31 doing that well is really complex. When do you trigger compaction? When do you
131:33 do you trigger compaction? When do you reingject? How do you worry about uh
131:35 reingject? How do you worry about uh cache optimization during that MCP,
131:38 cache optimization during that MCP, right? Like all of the uh plumbing you
131:41 right? Like all of the uh plumbing you have to build for MCP support into the
131:43 have to build for MCP support into the harness. Uh, and then not even
131:45 harness. Uh, and then not even mentioning images and what's the
131:47 mentioning images and what's the resolution that you need to compress
131:49 resolution that you need to compress them to to send them to the model. All
131:50 them to to send them to the model. All this all of this is like work that you
131:51 this all of this is like work that you have to do if you're going to build this
131:52 have to do if you're going to build this from scratch and keep it updated as new
131:55 from scratch and keep it updated as new features come online.
132:02 So since we've bundled all of these features together for you in an agent
132:05 features together for you in an agent that can safely write its own tools to
132:07 that can safely write its own tools to solve new problems that it encounters.
132:12 solve new problems that it encounters. Oops.
132:14 Oops. Uh we actually have here uh a computer
132:18 Uh we actually have here uh a computer use agent for the terminal.
132:28 Wow, that sounds quite a bit powerful than just plain old coding agent,
132:30 than just plain old coding agent, doesn't it? Um but just think about it
132:32 doesn't it? Um but just think about it again. Well, before browser and graphic
132:34 again. Well, before browser and graphic user interface was a thing, wasn't that
132:36 user interface was a thing, wasn't that how we always operate a computer?
132:39 how we always operate a computer? they're writing code and chaining them
132:40 they're writing code and chaining them together in a command line interface. So
132:43 together in a command line interface. So that means if you can express your tasks
132:45 that means if you can express your tasks in command line as well as files tasks
132:49 in command line as well as files tasks codeex will be able to know what to do.
132:52 codeex will be able to know what to do. Um the example is I like to use codeex
132:54 Um the example is I like to use codeex to organize a lot of the photos from my
132:57 to organize a lot of the photos from my desktop into a folder and that's a very
133:00 desktop into a folder and that's a very simple use case but what it can also do
133:02 simple use case but what it can also do is it can analyze huge amounts of CSV
133:05 is it can analyze huge amounts of CSV files inside of a folder uh doing data
133:08 files inside of a folder uh doing data analysis it does not have to be a coding
133:11 analysis it does not have to be a coding task and if it can be accomplished by
133:13 task and if it can be accomplished by running tools from command line you can
133:14 running tools from command line you can use codeex
133:16 use codeex so now that we see codeex is such a cool
133:18 so now that we see codeex is such a cool harness um I want to also share a little
133:21 harness um I want to also share a little a bit about how you can use it to build
133:23 a bit about how you can use it to build your own agents. And what you can do is
133:26 your own agents. And what you can do is you can use codeex the agent inside of
133:29 you can use codeex the agent inside of your own agent.
133:32 your own agent. Um, how does that work? Well, if you
133:36 Um, how does that work? Well, if you want to build uh a coding uh the next
133:40 want to build uh a coding uh the next coding startup, we don't really have all
133:42 coding startup, we don't really have all the answers, but we do have a few
133:43 the answers, but we do have a few patterns uh that we thought uh might
133:46 patterns uh that we thought uh might help you having worked with some of the
133:48 help you having worked with some of the top coding customers uh like cursor and
133:51 top coding customers uh like cursor and VS code. Uh one of those patterns is uh
133:55 VS code. Uh one of those patterns is uh harness becoming the new abstraction
133:57 harness becoming the new abstraction layer. The benefits of this is quite
134:00 layer. The benefits of this is quite obvious. Um, you no longer have to care
134:03 obvious. Um, you no longer have to care about prioritize uh optimizing the
134:05 about prioritize uh optimizing the prompt and tools with every model
134:07 prompt and tools with every model upgrade.
134:10 upgrade. >> But, um, does that mean you're just
134:11 >> But, um, does that mean you're just building a wrapper?
134:13 building a wrapper? >> Well, I disagree with that take.
134:15 >> Well, I disagree with that take. [snorts] I disagree. I was disagreeing
134:18 [snorts] I disagree. I was disagreeing with my colleague here. Um, just like
134:21 with my colleague here. Um, just like how building rappers on top of models I
134:23 how building rappers on top of models I think is really reductive on uh on the
134:27 think is really reductive on uh on the whole value prop of the infrastructure
134:29 whole value prop of the infrastructure layer. Sorry, I used to be a VC.
134:31 layer. Sorry, I used to be a VC. [laughter]
134:32 [laughter] >> Focusing most of your efforts on
134:34 >> Focusing most of your efforts on differentiating your product is what
134:36 differentiating your product is what this pattern allows you to do. And
134:38 this pattern allows you to do. And that's where most of the value lies.
134:46 Exactly. Okay. So, let's look at some of these patterns that we've seen and
134:48 these patterns that we've seen and actually have helped our customers build
134:51 actually have helped our customers build um along with them. Codeex is an SDK. It
134:55 um along with them. Codeex is an SDK. It can be called through a TypeScript
134:56 can be called through a TypeScript library. You can call it
134:58 library. You can call it programmatically in a Python exec.
135:00 programmatically in a Python exec. There's a GitHub action that you can
135:02 There's a GitHub action that you can plug into to have it merge merge
135:04 plug into to have it merge merge conflicts on PRs that everybody hates
135:07 conflicts on PRs that everybody hates doing. Then uh you can also add it to
135:10 doing. Then uh you can also add it to the agents SDK and give it MCP
135:14 the agents SDK and give it MCP connectors back to your product. So now
135:16 connectors back to your product. So now you have an agent. I like to say we
135:18 you have an agent. I like to say we started with chat bots that you can talk
135:20 started with chat bots that you can talk to. Then we gave the chatbots tools to
135:22 to. Then we gave the chatbots tools to use and then now you can give uh a tool
135:27 use and then now you can give uh a tool to your chatbot that can make other
135:29 to your chatbot that can make other tools that it doesn't have. And so now
135:32 tools that it doesn't have. And so now you can actually build out enterprise
135:35 you can actually build out enterprise software that does it that writes its
135:36 software that does it that writes its own plug-in connectors to the API level
135:39 own plug-in connectors to the API level for each customer on the spot. That's
135:43 for each customer on the spot. That's something that a professional services
135:44 something that a professional services team used to have to do. Um, so you have
135:46 team used to have to do. Um, so you have fully customizable software that can now
135:49 fully customizable software that can now talk back to itself. Um, I made a conbon
135:52 talk back to itself. Um, I made a conbon board for Devday that can actually fix
135:53 board for Devday that can actually fix its own bugs. Um, it's pretty fun. And
135:57 its own bugs. Um, it's pretty fun. And then lastly, um, you can actually do
135:59 then lastly, um, you can actually do something like what Zed has done. They
136:01 something like what Zed has done. They have just decided to wrap codeex inside
136:05 have just decided to wrap codeex inside of a layer and give it an interface to
136:07 of a layer and give it an interface to the IDE for talking back and forth for
136:10 the IDE for talking back and forth for the user and making code edits. And now
136:12 the user and making code edits. And now they don't actually have to do all the
136:14 they don't actually have to do all the work of staying on top of all of the
136:16 work of staying on top of all of the things that we're good at doing and they
136:17 things that we're good at doing and they can focus on building like the best code
136:20 can focus on building like the best code editor.
136:26 Uh so our top coding partners like GitHub has used this uh to great effect
136:29 GitHub has used this uh to great effect and well uh we've created an SDK for it
136:33 and well uh we've created an SDK for it that they used to directly integrate uh
136:35 that they used to directly integrate uh with codeex. You can also use the SDK to
136:38 with codeex. You can also use the SDK to uh control codecs as part of your CI/CD
136:41 uh control codecs as part of your CI/CD pipeline as well as use it as an agent
136:43 pipeline as well as use it as an agent that directly interacts with your own
136:45 that directly interacts with your own agent as well. Uh if you really want to
136:49 agent as well. Uh if you really want to customize the agent layer, you can do it
136:51 customize the agent layer, you can do it too. As an example of this, we worked
136:53 too. As an example of this, we worked with closely with the cursor team to get
136:56 with closely with the cursor team to get the best performance out of the codecs.
136:57 the best performance out of the codecs. The model, not the agent, we're bad at
136:59 The model, not the agent, we're bad at naming things. The model is different
137:01 naming things. The model is different from the agent. They did so by aligning
137:04 from the agent. They did so by aligning their tools to be in distribution with
137:06 their tools to be in distribution with how the model is trained and they did so
137:08 how the model is trained and they did so by aligning uh their harness with our
137:10 by aligning uh their harness with our open- source uh implementation of codeex
137:13 open- source uh implementation of codeex CLI. All of this is publicly available.
137:16 CLI. All of this is publicly available. Uh you can fork the repo, you can use
137:19 Uh you can fork the repo, you can use our source code, you can use it. Uh go
137:22 our source code, you can use it. Uh go nuts.
137:29 So what does the future hold for Codeex? It hasn't even been out for a year. Um
137:32 It hasn't even been out for a year. Um and especially with the lo la la la la
137:33 la la la la la la la la la la la la la la la la la la la la la la la la la la
137:33 la la la la la la la la la la la la la la la la la la la la la la launch of
137:33 la la la la la la la la la launch of codeex match yesterday like things are
137:35 codeex match yesterday like things are really changing fast. Uh it's the
137:38 really changing fast. Uh it's the fastest growing model in usage now
137:40 fastest growing model in usage now serving dozens of trillions of tokens
137:42 serving dozens of trillions of tokens per week which has actually doubled
137:45 per week which has actually doubled since dev day.
137:48 since dev day. It's always good to build where the
137:51 It's always good to build where the models are going. It's safe to assume
137:53 models are going. It's safe to assume that the models will get better. They'll
137:56 that the models will get better. They'll be able to get to work on much longer
137:58 be able to get to work on much longer horizon tasks unsupervised.
138:00 horizon tasks unsupervised. New models will raise the trust ceiling.
138:03 New models will raise the trust ceiling. I trust these models now to do some way
138:06 I trust these models now to do some way harder work than I would have six months
138:08 harder work than I would have six months ago. And that's going to keep
138:10 ago. And that's going to keep increasing. The future is about
138:12 increasing. The future is about sprawling code bases and non-standard
138:15 sprawling code bases and non-standard libraries and knowing how to work in
138:16 libraries and knowing how to work in closed source environments, matching
138:18 closed source environments, matching existing templates and practices
138:21 existing templates and practices and the models uh and and and so you can
138:24 and the models uh and and and so you can imagine that the SDK will evolve to
138:26 imagine that the SDK will evolve to better support these model capabilities,
138:29 better support these model capabilities, letting the model learn as it goes and
138:31 letting the model learn as it goes and not repeat mistakes and generally
138:33 not repeat mistakes and generally provide more surface area for an agent
138:36 provide more surface area for an agent that writes code and uses a terminal to
138:39 that writes code and uses a terminal to solve whatever problems it encounters.
138:41 solve whatever problems it encounters. counters and you can use that in your
138:44 counters and you can use that in your products via the SDK.
138:48 products via the SDK. So, what have we learned? Harnesses are
138:50 So, what have we learned? Harnesses are really complicated and take a lot of
138:52 really complicated and take a lot of work to maintain, especially with all
138:54 work to maintain, especially with all the new models coming out. So, we've
138:57 the new models coming out. So, we've built one for you inside of Codeex that
138:59 built one for you inside of Codeex that you can use off the shelf or look at the
139:02 you can use off the shelf or look at the source if you want to and you can use it
139:04 source if you want to and you can use it to build new things outside of coding
139:07 to build new things outside of coding and let us do all of the work making
139:09 and let us do all of the work making sure that you have the most capable
139:11 sure that you have the most capable computer agent.
139:13 computer agent. And we're really excited to see what you
139:15 And we're really excited to see what you craft.
139:23 Our [applause] [music]
139:32 next presenters believe that most enterprises are failing to unlock real
139:34 enterprises are failing to unlock real value from AI because the systems in
139:37 value from AI because the systems in which they operate are [music] stuck in
139:39 which they operate are [music] stuck in the past. Here to share how agents are
139:42 the past. Here to share how agents are reshaping software delivery are McKenzie
139:44 reshaping software delivery are McKenzie partners Martin Harrison and Natasha
139:47 partners Martin Harrison and Natasha Mania.
140:00 >> All right, good morning. Hello everyone. It's really great to be here. Uh so I'm
140:03 It's really great to be here. Uh so I'm Martin and I'm here with my colleague
140:04 Martin and I'm here with my colleague Natasha. Uh we're from a part of
140:07 Natasha. Uh we're from a part of Mckenzie you may may not be as familiar
140:09 Mckenzie you may may not be as familiar with. We have a practice called software
140:11 with. We have a practice called software X and we work with uh mostly enterprise
140:14 X and we work with uh mostly enterprise clients on how to build better software
140:17 clients on how to build better software products which has messed mostly using
140:19 products which has messed mostly using AI uh in the in the past couple of
140:21 AI uh in the in the past couple of years.
140:23 years. Uh and so what our talk is about today
140:26 Uh and so what our talk is about today is really more focused on the people and
140:29 is really more focused on the people and the operating model aspects of
140:31 the operating model aspects of leveraging AI for software development
140:33 leveraging AI for software development and and that we believe that that has
140:35 and and that we believe that that has changed quite significantly and and
140:37 changed quite significantly and and that's what we're excited to talk to you
140:39 that's what we're excited to talk to you about.
140:41 about. If I take a quick step back uh in in
140:43 If I take a quick step back uh in in time and we just uh you know think
140:46 time and we just uh you know think through some of these the major
140:47 through some of these the major technology breakthroughs that we've seen
140:49 technology breakthroughs that we've seen in the last few decades uh they tend to
140:52 in the last few decades uh they tend to always come with a paradigm shift in
140:54 always come with a paradigm shift in also how we develop software and so I
140:57 also how we develop software and so I still recall uh almost 20 years ago now
140:59 still recall uh almost 20 years ago now I started working as a software engineer
141:01 I started working as a software engineer an entry- level developer um in a tech
141:04 an entry- level developer um in a tech company and the company I was working
141:06 company and the company I was working for was just switching to to agile we
141:09 for was just switching to to agile we were using camb boards we were doing uh
141:11 were using camb boards we were doing uh standups and and other ceremonies. This
141:14 standups and and other ceremonies. This was a big change. It was a massive
141:16 was a big change. It was a massive change for the for the company. And now
141:20 change for the for the company. And now with everything that is happening
141:21 with everything that is happening happening in AI, we're at the precipice
141:24 happening in AI, we're at the precipice of another such paradigm shift.
141:27 of another such paradigm shift. And
141:29 And um
141:31 um if we think about some of the um some of
141:33 if we think about some of the um some of the things that are happening um with AI
141:36 the things that are happening um with AI and software development that we've seen
141:38 and software development that we've seen at this um at this conference, there's
141:41 at this um at this conference, there's no doubt that this is a new paradigm
141:43 no doubt that this is a new paradigm that is about us. And so we'll talk
141:44 that is about us. And so we'll talk about two things. Uh we'll first touch a
141:47 about two things. Uh we'll first touch a little bit about how do you go from
141:49 little bit about how do you go from these things that we're seeing at
141:51 these things that we're seeing at individual productivity to scaling that
141:53 individual productivity to scaling that to the whole team and what that what
141:55 to the whole team and what that what type of changes we think that implies
141:57 type of changes we think that implies and then we'll talk a little bit uh
141:59 and then we'll talk a little bit uh about how do you scale that across uh a
142:01 about how do you scale that across uh a whole organization and to really get get
142:03 whole organization and to really get get value
142:08 um if if you sort of I I'm talking to an
142:10 if if you sort of I I'm talking to an audience here which is using AI agents
142:13 audience here which is using AI agents all the time and I if I if I asked you
142:16 all the time and I if I if I asked you about some examples. I'm sure you could
142:18 about some examples. I'm sure you could rattle off, you know, 10 different ones
142:20 rattle off, you know, 10 different ones where you would say, "Look, there was
142:22 where you would say, "Look, there was this thing that I used to do. It it used
142:24 this thing that I used to do. It it used to take uh maybe even days and and and
142:28 to take uh maybe even days and and and hours that are now taking only minutes,
142:31 hours that are now taking only minutes, right? There's no shortage of those
142:33 right? There's no shortage of those those stories. And you can go over to
142:35 those stories. And you can go over to the expo and and talk to any of the
142:37 the expo and and talk to any of the companies there about all these all
142:39 companies there about all these all these great use cases. It really shows
142:41 these great use cases. It really shows that these tools work and they can be
142:42 that these tools work and they can be really impactful." And so yet despite
142:46 really impactful." And so yet despite seeing you know some of these uh
142:47 seeing you know some of these uh improvement uh improvements
142:51 improvement uh improvements uh we've done some research to gauge you
142:53 uh we've done some research to gauge you know where are our clients at the
142:55 know where are our clients at the moment. We we recently surveyed about
142:57 moment. We we recently surveyed about 300 uh companies uh mostly enterprises
143:01 300 uh companies uh mostly enterprises around what are they seeing in terms of
143:03 around what are they seeing in terms of productivity improvements. So you have
143:05 productivity improvements. So you have this and then they would say uh on
143:08 this and then they would say uh on average we're often seeing only 5 10 15%
143:11 average we're often seeing only 5 10 15% improvements overall as as a company. So
143:14 improvements overall as as a company. So we're in a place where there's a bit of
143:16 we're in a place where there's a bit of a disconnect between this this big
143:18 a disconnect between this this big potential uh around AI as uh from the
143:22 potential uh around AI as uh from the reality.
143:23 reality. And so we we think that um there is this
143:28 And so we we think that um there is this gap because as we've started
143:30 gap because as we've started implementing AI whether it's um you know
143:32 implementing AI whether it's um you know coding assistance or whether it's now
143:34 coding assistance or whether it's now using you know you just heard about uh
143:37 using you know you just heard about uh you know how open AI is using agents and
143:40 you know how open AI is using agents and more complex uh workflows. What has
143:43 more complex uh workflows. What has started to emerge is a is a set of
143:45 started to emerge is a is a set of bottlenecks uh that that were not
143:48 bottlenecks uh that that were not necessarily there before. Like for for
143:51 necessarily there before. Like for for example, as we now start moving much
143:54 example, as we now start moving much faster in certain in certain aspects of
143:56 faster in certain in certain aspects of the work, uh we haven't really changed
143:58 the work, uh we haven't really changed how we collaborate among people and and
144:00 how we collaborate among people and and team members. That's not quite keeping
144:02 team members. That's not quite keeping up.
144:03 up. We started generating way more more
144:05 We started generating way more more code, but we're it's still being
144:07 code, but we're it's still being reviewed in a in a pretty manual way in
144:09 reviewed in a in a pretty manual way in in many companies. Then we also have
144:12 in many companies. Then we also have this this theme which was recently
144:14 this this theme which was recently highlighted in in even a research report
144:16 highlighted in in even a research report from from Carnegie Melon uh about how
144:19 from from Carnegie Melon uh about how all the new code that is being generated
144:21 all the new code that is being generated is also amplifying uh the generation of
144:23 is also amplifying uh the generation of tech debt in some in some cases and
144:25 tech debt in some in some cases and actually generating complexity. And so
144:29 actually generating complexity. And so there are these bottlenecks. They're not
144:30 there are these bottlenecks. They're not impossible to overcome but this is what
144:32 impossible to overcome but this is what we believe is limiting uh many companies
144:35 we believe is limiting uh many companies from seeing the the the real value that
144:37 from seeing the the the real value that that they should be seeing.
144:40 that they should be seeing. Let me talk about maybe just a couple of
144:43 Let me talk about maybe just a couple of examples to to make that uh come to life
144:46 examples to to make that uh come to life a little bit more. One of the things
144:48 a little bit more. One of the things that we see as a big rate limiter at the
144:50 that we see as a big rate limiter at the moment is around how work is allocated.
144:53 moment is around how work is allocated. And so what what we've learned over the
144:55 And so what what we've learned over the last couple of years is that the impact
144:57 last couple of years is that the impact from AI and agents is highly uneven.
144:59 from AI and agents is highly uneven. There are some tasks which where it
145:01 There are some tasks which where it works amazingly well today and you see
145:04 works amazingly well today and you see uh huge improvements and there are
145:06 uh huge improvements and there are others where it it's not as effective
145:08 others where it it's not as effective and so you have that variability. You
145:10 and so you have that variability. You also have variability among people. Some
145:12 also have variability among people. Some have have uh lots of experience now
145:14 have have uh lots of experience now using these tools and and know how to
145:17 using these tools and and know how to pick that up and others uh are less
145:19 pick that up and others uh are less experienced right now and so what that
145:21 experienced right now and so what that means for for team leaders for
145:23 means for for team leaders for engineering managers and so on is it's
145:25 engineering managers and so on is it's very highly non-trivial to know how to
145:28 very highly non-trivial to know how to allocate work and resources in in a good
145:30 allocate work and resources in in a good way and this is creating a lot of
145:32 way and this is creating a lot of inefficiencies.
145:34 inefficiencies. Another example uh is is around how work
145:38 Another example uh is is around how work is being reviewed. So agents are often
145:41 is being reviewed. So agents are often giving given pretty uh fuzzy uh you know
145:45 giving given pretty uh fuzzy uh you know stories that are written in pros with
145:47 stories that are written in pros with pretty fussy acceptance criteria. Uh
145:50 pretty fussy acceptance criteria. Uh which which means that the code that
145:51 which which means that the code that comes back is not always what it was
145:53 comes back is not always what it was intended to be and and for many
145:56 intended to be and and for many companies the only mechanism to control
145:58 companies the only mechanism to control that is is often manual review. So
146:00 that is is often manual review. So you've automated some things but we've
146:02 you've automated some things but we've generated more manual review. So these
146:04 generated more manual review. So these are some of the some of the examples of
146:06 are some of the some of the examples of uh these bottlenecks that we that we see
146:08 uh these bottlenecks that we that we see coming up.
146:13 And as mentioned what what has that has resulted in so far is that most most
146:17 resulted in so far is that most most large companies today uh are are stuck a
146:20 large companies today uh are are stuck a little bit in in a world of relatively
146:23 little bit in in a world of relatively marginal gains. Uh they're working in
146:26 marginal gains. Uh they're working in ways that was developed with constraints
146:29 ways that was developed with constraints that we had in the past paradigm of
146:30 that we had in the past paradigm of human development. So you have you you
146:33 human development. So you have you you know if you go out to most companies you
146:34 know if you go out to most companies you see 8 to 10 person teams you see working
146:38 see 8 to 10 person teams you see working in two week sprints you have all these
146:40 in two week sprints you have all these these elements that were largely parts
146:42 these elements that were largely parts of like an of an agile operating uh
146:44 of like an of an agile operating uh model and that is and that is uh putting
146:47 model and that is and that is uh putting in some some limits to what they can
146:49 in some some limits to what they can see. Over the past year, we've been
146:52 see. Over the past year, we've been working with lots of clients to to sort
146:54 working with lots of clients to to sort of break that model a bit uh and develop
146:57 of break that model a bit uh and develop new ways of of working in smaller teams
147:00 new ways of of working in smaller teams in new roles uh in with shorter cycles.
147:03 in new roles uh in with shorter cycles. And when you do that, we see really
147:05 And when you do that, we see really great performance improvements and
147:07 great performance improvements and that's what gives you gives us this uh
147:09 that's what gives you gives us this uh path to where we see things are going to
147:12 path to where we see things are going to improve.
147:18 So we realized that rewiring the PDLC is not just a one-sizefits-all solution.
147:21 not just a one-sizefits-all solution. For example, different types of
147:22 For example, different types of engineering functions across the
147:24 engineering functions across the enterprise along the product life cycle
147:26 enterprise along the product life cycle may require different operating models
147:28 may require different operating models based on how humans and agents best
147:30 based on how humans and agents best collaborate. So if we take the example
147:33 collaborate. So if we take the example of modernizing legacy code bases, this
147:36 of modernizing legacy code bases, this task requires a high context of
147:39 task requires a high context of potentially the entire codebase but also
147:41 potentially the entire codebase but also has clearly well- definfined outputs. So
147:44 has clearly well- definfined outputs. So an example operating model could look
147:46 an example operating model could look like a factory of agents where humans
147:48 like a factory of agents where humans provide an initial spec and final review
147:51 provide an initial spec and final review with minimal intervention.
147:53 with minimal intervention. For new features for green field and
147:56 For new features for green field and brownfield projects, the operating model
147:58 brownfield projects, the operating model may look like an iterative loop because
148:00 may look like an iterative loop because they may benefit from the
148:02 they may benefit from the non-deterministic outputs and increased
148:05 non-deterministic outputs and increased variation where agents act as
148:07 variation where agents act as co-creators um providing more options to
148:10 co-creators um providing more options to facilitate faster feedback loops.
148:15 facilitate faster feedback loops. So, as we mentioned, we did a survey
148:17 So, as we mentioned, we did a survey among 300 enterprises globally to
148:19 among 300 enterprises globally to understand what sets these top
148:21 understand what sets these top performers apart. We found that they are
148:23 performers apart. We found that they are seven times more likely to have AI
148:26 seven times more likely to have AI native workflows which meant scaling
148:28 native workflows which meant scaling over four use cases across the software
148:30 over four use cases across the software development life cycle rather than just
148:32 development life cycle rather than just having point solutions for just code
148:34 having point solutions for just code review or for just code dov. They were
148:37 review or for just code dov. They were also six times more likely to have AI
148:39 also six times more likely to have AI native roles which meant having smaller
148:42 native roles which meant having smaller pods with different skill sets and new
148:44 pods with different skill sets and new roles.
148:46 roles. To enable these shifts, these
148:48 To enable these shifts, these organizations were investing in
148:50 organizations were investing in continuous and hands-on upskilling,
148:52 continuous and hands-on upskilling, impact measurement, and also incentive
148:55 impact measurement, and also incentive structures to incentivize developers and
148:58 structures to incentivize developers and PMs to adopt AI.
149:01 PMs to adopt AI. This led to five to six times increase
149:04 This led to five to six times increase in time to market and delivery speed as
149:07 in time to market and delivery speed as well as higher quality and more
149:09 well as higher quality and more consistent artifacts.
149:12 consistent artifacts. So when we talk about AI native
149:13 So when we talk about AI native workflows we mean that these enterprises
149:16 workflows we mean that these enterprises are moving away from quarterly planning
149:18 are moving away from quarterly planning to continuous planning and also um the
149:21 to continuous planning and also um the unit of work is moving from storydriven
149:24 unit of work is moving from storydriven to spec driven development so that these
149:27 to spec driven development so that these PMs are iterating on the specs with
149:29 PMs are iterating on the specs with agents rather than iterating on long
149:31 agents rather than iterating on long PRDs.
149:33 PRDs. On the talent side, AI native roles
149:35 On the talent side, AI native roles essentially means that we're moving away
149:37 essentially means that we're moving away from the two pizza structure to one
149:40 from the two pizza structure to one pizza pods of three to five individuals.
149:42 pizza pods of three to five individuals. Instead of having separate QA front-end
149:45 Instead of having separate QA front-end and back-end engineers, there are more
149:48 and back-end engineers, there are more consolidated roles where product
149:49 consolidated roles where product builders are managing and orchestrating
149:51 builders are managing and orchestrating agents with full stack fluency and also
149:54 agents with full stack fluency and also a better understanding of the full
149:56 a better understanding of the full architecture of their codebase. PMS are
149:59 architecture of their codebase. PMS are starting to create direct um prototypes
150:01 starting to create direct um prototypes in code rather than iterating on these
150:04 in code rather than iterating on these long PRDs.
150:06 long PRDs. And one example um that we've described
150:08 And one example um that we've described in our article, we've studied some AI
150:10 in our article, we've studied some AI native startups and realized that
150:12 native startups and realized that they've actually implemented all of
150:14 they've actually implemented all of these shifts to accelerate their
150:16 these shifts to accelerate their outcomes. And in our article, we've
150:18 outcomes. And in our article, we've described how cursor actually operates
150:19 described how cursor actually operates internally.
150:21 internally. But if you're a large enterprise
150:23 But if you're a large enterprise predicated on the agile model, what are
150:25 predicated on the agile model, what are some steps you can take? So in in a
150:28 some steps you can take? So in in a recent client study with a leading
150:29 recent client study with a leading international bank, we tested some team
150:32 international bank, we tested some team level interventions to address the
150:33 level interventions to address the bottlenecks previously mentioned before
150:36 bottlenecks previously mentioned before mainly around the sequencing of steps
150:38 mainly around the sequencing of steps within the agile ceremony and how uh to
150:41 within the agile ceremony and how uh to define the roles of agents and humans
150:44 define the roles of agents and humans within the sprint cycle. So let's walk
150:46 within the sprint cycle. So let's walk through some examples.
150:48 through some examples. First, team leads would assign sprint
150:51 First, team leads would assign sprint stories using agents based on the data
150:53 stories using agents based on the data of the team velocity and delivery
150:55 of the team velocity and delivery history. And then they would create
150:58 history. And then they would create co-create multiple prototypes and
151:00 co-create multiple prototypes and iterate with agents on the acceptance
151:02 iterate with agents on the acceptance criteria around security and
151:04 criteria around security and observability needs to have more
151:05 observability needs to have more consistent artifacts across teams. This
151:08 consistent artifacts across teams. This prevents downstream rework that was
151:11 prevents downstream rework that was mentioned before so that developers
151:13 mentioned before so that developers don't have to constantly be iterating
151:15 don't have to constantly be iterating with the agents during during the code
151:17 with the agents during during the code process. The squads were also
151:19 process. The squads were also reorganized by workflow. So there would
151:21 reorganized by workflow. So there would be one which would be focused on um
151:24 be one which would be focused on um small bug fixes and another focused on
151:27 small bug fixes and another focused on green field development. In the
151:29 green field development. In the background agents would be used to look
151:32 background agents would be used to look and impact uh look at um the potential
151:36 and impact uh look at um the potential cross repository impacts um to prevent
151:39 cross repository impacts um to prevent debugging time for developers.
151:42 debugging time for developers. And another example is that instead of
151:44 And another example is that instead of re for reducing the collaboration
151:46 re for reducing the collaboration overhead and meetings that happen within
151:48 overhead and meetings that happen within this sprint cycle um instead of waiting
151:51 this sprint cycle um instead of waiting for data scientist input PMS would
151:54 for data scientist input PMS would directly be observing the real-time
151:56 directly be observing the real-time customer feedback to rep prioritize
151:58 customer feedback to rep prioritize these features
152:00 these features and this would lead to an acceleration
152:02 and this would lead to an acceleration in the backlog within the same amount of
152:05 in the backlog within the same amount of time.
152:07 time. So we studied the um impact of these
152:10 So we studied the um impact of these interventions and found high promising
152:12 interventions and found high promising results. For example, not just the
152:15 results. For example, not just the increase in agent consumption by over 60
152:18 increase in agent consumption by over 60 times, but there was also an increase in
152:20 times, but there was also an increase in the delivery speed that was tied
152:22 the delivery speed that was tied directly to the business priorities for
152:24 directly to the business priorities for this bank. There was a 51% increase in
152:27 this bank. There was a 51% increase in code mergers, but also a decrease in um
152:31 code mergers, but also a decrease in um an increase in efficiency.
152:34 an increase in efficiency. The other aspect of this is is uh around
152:37 The other aspect of this is is uh around the different roles and and and the
152:39 the different roles and and and the talent model. And so one of the biggest
152:42 talent model. And so one of the biggest differentiators that we saw as mentioned
152:44 differentiators that we saw as mentioned was around what you have actually
152:46 was around what you have actually changed the roles that uh that are
152:48 changed the roles that uh that are involved in software development. And so
152:50 involved in software development. And so you know what what you all are seeing is
152:53 you know what what you all are seeing is that engineers are moving away from
152:55 that engineers are moving away from execution and and just simply writing
152:57 execution and and just simply writing code to being more of orchestrators and
153:00 code to being more of orchestrators and and thinking through more how to divide
153:02 and thinking through more how to divide up work to agents. for example. And we
153:04 up work to agents. for example. And we also heard some examples of how the role
153:06 also heard some examples of how the role of the product manager is changing. And
153:09 of the product manager is changing. And so while this this may sound, you know,
153:11 so while this this may sound, you know, pretty straightforward to many of you
153:12 pretty straightforward to many of you here who are who are working with these
153:14 here who are who are working with these tools like day-to-day that you have to
153:16 tools like day-to-day that you have to change what you do, the reality is that
153:19 change what you do, the reality is that about 70% of the companies that we that
153:22 about 70% of the companies that we that we survey have have not changed the
153:23 we survey have have not changed the roles at all. Right? And so you have
153:26 roles at all. Right? And so you have this background expectation that people
153:28 this background expectation that people are going to do things differently but
153:29 are going to do things differently but the the role is still defined in the
153:31 the the role is still defined in the same way and it's the same understanding
153:33 same way and it's the same understanding uh as it was you know a couple of years
153:35 uh as it was you know a couple of years ago.
153:37 ago. Um but we are starting to see you know
153:40 Um but we are starting to see you know some companies changing this. So this is
153:42 some companies changing this. So this is another example from a from another
153:43 another example from a from another recent recent client. They were set up
153:46 recent recent client. They were set up in a in a way that is, you know, pretty
153:48 in a in a way that is, you know, pretty common for for u many companies and a
153:51 common for for u many companies and a kind of typical two pizza uh team model
153:55 kind of typical two pizza uh team model with with the types of roles that you
153:56 with with the types of roles that you would be familiar with. Um the we ran a
154:01 would be familiar with. Um the we ran a bunch of experiments and front runners
154:03 bunch of experiments and front runners and and tested new models that were had
154:05 and and tested new models that were had much smaller pods uh that had uh new
154:09 much smaller pods uh that had uh new roles which consolidated some of the
154:12 roles which consolidated some of the tasks that were previously done with
154:14 tasks that were previously done with different roles.
154:15 different roles. And and so by doing that we could we
154:18 And and so by doing that we could we could create basically more pods or more
154:20 could create basically more pods or more teams uh with with the same number of
154:23 teams uh with with the same number of people uh but retaining the expectation
154:25 people uh but retaining the expectation that each pod is is uh is um performing
154:29 that each pod is is uh is um performing at about the same level as as they were
154:31 at about the same level as as they were before.
154:33 before. And so so we also see really uh really
154:36 And so so we also see really uh really positive results from that uh with with
154:39 positive results from that uh with with uh maintaining and even improving in
154:41 uh maintaining and even improving in some is the quality of the code that was
154:43 some is the quality of the code that was generated. In particular there was a
154:45 generated. In particular there was a there was a high speed up in in terms of
154:49 there was a high speed up in in terms of uh the output from from the different
154:50 uh the output from from the different teams and you can see some of the
154:51 teams and you can see some of the metrics uh here.
154:57 Let's shift gears a little bit and and and go from talking about just the team
154:59 and go from talking about just the team level. So how does this now scale uh
155:02 level. So how does this now scale uh across a big organization?
155:05 across a big organization? The reality is that many many companies
155:08 The reality is that many many companies don't just have like one or two of these
155:10 don't just have like one or two of these these teams but often hundreds of teams
155:12 these teams but often hundreds of teams even and thousands or even tens of
155:14 even and thousands or even tens of thousands of people who are working in
155:16 thousands of people who are working in this way. And uh this is where one of
155:20 this way. And uh this is where one of the biggest differences that we that we
155:22 the biggest differences that we that we saw between those that are stuck a bit
155:24 saw between those that are stuck a bit in the um in in getting only 10% or so
155:28 in the um in in getting only 10% or so change improvements from those who are
155:30 change improvements from those who are seeing outsized improvements is around
155:33 seeing outsized improvements is around how you manage that how you manage that
155:35 how you manage that how you manage that change and change management I is like
155:38 change and change management I is like one of these is a little bit of an often
155:40 one of these is a little bit of an often catch or elusive term for uh for a lot
155:43 catch or elusive term for uh for a lot of different things but but I think in
155:46 of different things but but I think in some ways it's not a bad way to think
155:47 some ways it's not a bad way to think about Right. I I usually say that the
155:49 about Right. I I usually say that the change management is about getting a lot
155:51 change management is about getting a lot of like small things right. And so the
155:54 of like small things right. And so the crux to like actually scaling this is
155:56 crux to like actually scaling this is often about getting 20 30 or even more
155:59 often about getting 20 30 or even more things right at the same time that
156:01 things right at the same time that involve the way you communicate uh what
156:04 involve the way you communicate uh what this means, the way you incentivize
156:05 this means, the way you incentivize people, uh the way you upskill them, and
156:08 people, uh the way you upskill them, and it all has to come together.
156:11 it all has to come together. Um and when it when it's not, we we we
156:14 Um and when it when it's not, we we we see what happens. And so this is an
156:16 see what happens. And so this is an example from from another tech company
156:18 example from from another tech company that we worked with um where initially
156:21 that we worked with um where initially we're rolling out new AI tools for them
156:24 we're rolling out new AI tools for them that that hit different parts of the
156:26 that that hit different parts of the product development life cycle. Um we we
156:29 product development life cycle. Um we we rolled we rolled out the tools there was
156:31 rolled we rolled out the tools there was some usage but often it dropped off. It
156:33 some usage but often it dropped off. It was either not used or it was um it was
156:36 was either not used or it was um it was sort of um used in very suboptimal ways.
156:39 sort of um used in very suboptimal ways. So that's the sort of jagged part that
156:40 So that's the sort of jagged part that you're seeing on the on the left hand
156:42 you're seeing on the on the left hand side here. despite kind of adding more
156:44 side here. despite kind of adding more users uh the overall impact did not
156:47 users uh the overall impact did not change at all. So we had to do a quite a
156:50 change at all. So we had to do a quite a reset and and um start over effectively
156:54 reset and and um start over effectively reset the expectations. What should what
156:56 reset the expectations. What should what what does this mean if you're a
156:57 what does this mean if you're a developer dayto-day? What does it mean
156:59 developer dayto-day? What does it mean for a PM? Uh we had much more hands-on
157:02 for a PM? Uh we had much more hands-on upskilling. There was could bring your
157:04 upskilling. There was could bring your own code. there were, you know, coaches
157:06 own code. there were, you know, coaches available, especially those first like
157:08 available, especially those first like few sprints before you get make this a
157:10 few sprints before you get make this a habit and work it into the way that you
157:13 habit and work it into the way that you develop software dayto-day. It's a very
157:15 develop software dayto-day. It's a very critical time and that's when when this
157:17 critical time and that's when when this matters a lot. Um, and having a bit of a
157:21 matters a lot. Um, and having a bit of a a measurement system as well, so you
157:23 a measurement system as well, so you know what's changing and and you're able
157:24 know what's changing and and you're able to to see what's uh what's what what's
157:28 to to see what's uh what's what what's improving.
157:30 improving. Another example just to put this alive a
157:32 Another example just to put this alive a little bit as mentioned like this is
157:34 little bit as mentioned like this is about getting a lot of things um right
157:38 about getting a lot of things um right and it's each one of these individually
157:40 and it's each one of these individually may not seem like it's the biggest deal
157:42 may not seem like it's the biggest deal uh but put together they really make a
157:44 uh but put together they really make a make a huge difference like this is for
157:47 make a huge difference like this is for this is some of the top uh interventions
157:49 this is some of the top uh interventions that another client had to go through
157:52 that another client had to go through for them it really helped having you
157:54 for them it really helped having you know setting up code labs for example
157:56 know setting up code labs for example really you know instituting a new set of
157:58 really you know instituting a new set of certifications that help motivate and
158:00 certifications that help motivate and and drive people to to change what they
158:02 and drive people to to change what they do day dayto-day. And these these things
158:05 do day dayto-day. And these these things really added up to uh the change they
158:08 really added up to uh the change they needed.
158:10 needed. >> But building a robust measurement system
158:12 >> But building a robust measurement system that prioritizes outcomes and not just
158:14 that prioritizes outcomes and not just adoption is important not just to
158:16 adoption is important not just to monitor progress but also pinpoint
158:19 monitor progress but also pinpoint issues and course correct quickly. So,
158:22 issues and course correct quickly. So, one surprising result from the survey
158:24 one surprising result from the survey was that these enterprises that were
158:26 was that these enterprises that were bottom performers were not even
158:27 bottom performers were not even measuring speed and only 10% were
158:29 measuring speed and only 10% were measuring productivity.
158:31 measuring productivity. But our goal is to make our clients top
158:33 But our goal is to make our clients top performing organizations. So, we've
158:35 performing organizations. So, we've worked with them to create a holistic
158:36 worked with them to create a holistic measurement system that captures impact
158:40 measurement system that captures impact all the way down to inputs. So for
158:42 all the way down to inputs. So for inputs this would include the investment
158:45 inputs this would include the investment into coding tools and other AI tools but
158:47 into coding tools and other AI tools but also the time and resources in
158:49 also the time and resources in upskilling and change management. These
158:52 upskilling and change management. These inputs would lead to direct outputs but
158:54 inputs would lead to direct outputs but a lot of organizations are just focusing
158:56 a lot of organizations are just focusing on how the increased breath and depth of
158:59 on how the increased breath and depth of adoption with of AI tools is leading to
159:01 adoption with of AI tools is leading to increased velocity and capac capacity
159:04 increased velocity and capac capacity increase. However, it's also important
159:06 increase. However, it's also important to understand how developers have uh
159:09 to understand how developers have uh different uh NPS scores and if they're
159:11 different uh NPS scores and if they're enjoying their craft more um rather than
159:13 enjoying their craft more um rather than feeling more frustrated. And it's also
159:16 feeling more frustrated. And it's also important to understand whether the code
159:18 important to understand whether the code is becoming more secure and have has
159:20 is becoming more secure and have has better quality but also more resilient.
159:22 better quality but also more resilient. And one proxy for resiliency that we
159:24 And one proxy for resiliency that we used for our client was the meantime to
159:27 used for our client was the meantime to resolve priority bugs.
159:29 resolve priority bugs. Now if we look at economic outcomes
159:31 Now if we look at economic outcomes which is priority for um the seauite
159:34 which is priority for um the seauite executives they look into what is the
159:36 executives they look into what is the time to revenue target. What is the
159:38 time to revenue target. What is the increased price differential for higher
159:40 increased price differential for higher quality features or expanding the number
159:42 quality features or expanding the number of customers to meet the feature demand
159:44 of customers to meet the feature demand and also what is the cost reduction per
159:47 and also what is the cost reduction per pod for reduced human labor.
159:50 pod for reduced human labor. In aggregate, having these larger
159:53 In aggregate, having these larger economic outcomes can also lead um to
159:56 economic outcomes can also lead um to for organizations to understand how
159:58 for organizations to understand how there is an increased reinvestment in
160:00 there is an increased reinvestment in green field and brownfield development.
160:02 green field and brownfield development. But as these tools evolve, the proxies
160:05 But as these tools evolve, the proxies for these metrics will also evolve. But
160:07 for these metrics will also evolve. But hopefully this provides a MECI framework
160:10 hopefully this provides a MECI framework as an initial starting point.
160:12 as an initial starting point. So what's next? The future of course is
160:15 So what's next? The future of course is difficult to predict, let alone in the
160:17 difficult to predict, let alone in the next 5 years. But we hope that with our
160:20 next 5 years. But we hope that with our vision of a new software development
160:21 vision of a new software development model, even as agents increase in their
160:24 model, even as agents increase in their intelligence and humans become more
160:26 intelligence and humans become more fluent in AI, that this model still
160:29 fluent in AI, that this model still stands. So hopefully this model that
160:32 stands. So hopefully this model that includes um shorter sprints, smaller
160:35 includes um shorter sprints, smaller teams, but large u smaller but larger
160:37 teams, but large u smaller but larger number of teams will set enterprises up
160:40 number of teams will set enterprises up for success in the long term.
160:42 for success in the long term. >> So just leave you with some some key
160:44 >> So just leave you with some some key takeaways. um start now. I would say to
160:48 takeaways. um start now. I would say to to our our clients, this is a human
160:50 to our our clients, this is a human change and it takes some times and it's
160:52 change and it takes some times and it's a big change and and it's going to be a
160:55 a big change and and it's going to be a journey and so I think um this is
160:57 journey and so I think um this is something that everyone needs to go on.
160:59 something that everyone needs to go on. I think it's also important to figure
161:01 I think it's also important to figure out which model works for you and set a
161:03 out which model works for you and set a really bold ambition and with that say
161:06 really bold ambition and with that say thank you so much for listening to us
161:08 thank you so much for listening to us and and uh we have an article here if
161:10 and and uh we have an article here if you're more interested in in the
161:12 you're more interested in in the research that we've conducted. Thank you
161:14 research that we've conducted. Thank you so much for having us. Our [applause]
161:28 next presenter is a researcher at Stanford who studies how AI impacts over
161:31 Stanford who studies how AI impacts over 100,000 developers in the real world.
161:35 100,000 developers in the real world. Please welcome Jaor Dennis Blanch.
161:55 So companies spend millions on AI tools for software engineering. But do we
161:57 for software engineering. But do we actually know how well these tools work
161:59 actually know how well these tools work in the enterprise or are these tools
162:01 in the enterprise or are these tools just all hype? to answer this and for
162:05 just all hype? to answer this and for the past two years we've been
162:06 the past two years we've been researching the impact of AI on software
162:08 researching the impact of AI on software engineering productivity and our
162:11 engineering productivity and our research is time series because we look
162:13 research is time series because we look at get historical data meaning we can go
162:15 at get historical data meaning we can go back in time and it's also
162:17 back in time and it's also cross-sectional because we cut across
162:19 cross-sectional because we cut across companies and the way we use to measure
162:23 companies and the way we use to measure most of the of the impact is by a
162:25 most of the of the impact is by a machine learning model that replicates a
162:27 machine learning model that replicates a panel of human experts. The way this
162:30 panel of human experts. The way this works is that imagine you have a
162:33 works is that imagine you have a software engineer who writes a code
162:35 software engineer who writes a code commit and this code commit would be
162:37 commit and this code commit would be evaluated by multiple panels or of 10
162:40 evaluated by multiple panels or of 10 and 15 independent experts who would
162:42 and 15 independent experts who would evaluate that code commit across
162:45 evaluate that code commit across implementation time maintainability and
162:47 implementation time maintainability and complexity and then produce an output
162:49 complexity and then produce an output evaluation. So we took the labels of
162:52 evaluation. So we took the labels of these panels across you know millions of
162:54 these panels across you know millions of of kind of evaluations and then trained
162:56 of kind of evaluations and then trained a model to replicate this panel of
162:58 a model to replicate this panel of experts meaning that we can deploy this
163:00 experts meaning that we can deploy this at scale and if there's ever any doubts
163:03 at scale and if there's ever any doubts around the models output you can always
163:05 around the models output you can always kind of assemble your own panel and see
163:07 kind of assemble your own panel and see that it correlates pretty well with
163:08 that it correlates pretty well with reality.
163:10 reality. Today we'll talk about four things.
163:12 Today we'll talk about four things. We'll start off with looking at some of
163:14 We'll start off with looking at some of the things that are driving AI
163:16 the things that are driving AI productivity gains in software. Then
163:18 productivity gains in software. Then we'll look at a AI practices benchmark
163:21 we'll look at a AI practices benchmark that we developed. We'll then look at
163:24 that we developed. We'll then look at how we propose to measure AI return on
163:27 how we propose to measure AI return on investment in software engineering. And
163:29 investment in software engineering. And lastly, we'll finish things off with a
163:31 lastly, we'll finish things off with a case study.
163:33 case study. So here we took 46 teams that were using
163:37 So here we took 46 teams that were using AI and we matched them with 46 similar
163:40 AI and we matched them with 46 similar teams that were not using AI and we
163:43 teams that were not using AI and we measured their net productivity gains
163:45 measured their net productivity gains from AI quarterly. And the shaded area
163:49 from AI quarterly. And the shaded area is the middle 50% of the data and the
163:51 is the middle 50% of the data and the dark blue line is the median which as of
163:53 dark blue line is the median which as of July of this year stands at about 10%
163:55 July of this year stands at about 10% for this cohort.
163:58 for this cohort. I'd like to direct your attention to the
163:59 I'd like to direct your attention to the fact that the discrepancy between the
164:02 fact that the discrepancy between the top performers and the bottom ones is
164:04 top performers and the bottom ones is increasing. There's a widening gap. And
164:07 increasing. There's a widening gap. And so if we very unscientifically and very
164:10 so if we very unscientifically and very illustratively project this forward, we
164:12 illustratively project this forward, we might get something like this, right?
164:14 might get something like this, right? Where uh you can have these top
164:15 Where uh you can have these top performers being part of this the rich
164:18 performers being part of this the rich gets richer effect where they these
164:20 gets richer effect where they these successful early AI adopters might
164:22 successful early AI adopters might compound their gains while these
164:24 compound their gains while these strugglers could fall further behind. At
164:26 strugglers could fall further behind. At some point this is going to converge and
164:28 some point this is going to converge and this is very directional. But my point
164:30 this is very directional. But my point here is that if you're a leader in a
164:32 here is that if you're a leader in a company, you definitely need to know in
164:34 company, you definitely need to know in which cohort you are right now so that
164:35 which cohort you are right now so that you can course correct. And without
164:37 you can course correct. And without measuring the impact of AI on your
164:40 measuring the impact of AI on your engineers, you're not going to be able
164:41 engineers, you're not going to be able to do this.
164:44 to do this. So we started investigating what are
164:46 So we started investigating what are some of the factors that drive these top
164:48 some of the factors that drive these top teams to perform better. And the first
164:50 teams to perform better. And the first thing we looked at is AI usage or
164:52 thing we looked at is AI usage or basically token spent. In this graph you
164:55 basically token spent. In this graph you have the same kind of on the vertical
164:58 have the same kind of on the vertical axis the productivity increase and then
165:00 axis the productivity increase and then on the horizontal one you have the token
165:02 on the horizontal one you have the token usage per engineer per month on a
165:03 usage per engineer per month on a logarithmic scale. And what you can see
165:06 logarithmic scale. And what you can see is that the correlation is quite loose
165:08 is that the correlation is quite loose 20 or so linearly. And there is a bit of
165:11 20 or so linearly. And there is a bit of a death valley effect around the 10
165:13 a death valley effect around the 10 million uh token mark whereby com teams
165:16 million uh token mark whereby com teams that were using that amount of tokens
165:17 that were using that amount of tokens seem to be doing worse than teams that
165:19 seem to be doing worse than teams that were using a bit less tokens. It's very
165:21 were using a bit less tokens. It's very directional but interesting
165:22 directional but interesting nevertheless.
165:24 nevertheless. The conclusion here might be that AI
165:26 The conclusion here might be that AI usage quality matters more than AI usage
165:30 usage quality matters more than AI usage value.
165:32 value. We dug deeper and we said well does the
165:35 We dug deeper and we said well does the environment in which the engineers work
165:37 environment in which the engineers work impact the productivity from AI and we
165:40 impact the productivity from AI and we came up with an environment cleaniness
165:42 came up with an environment cleaniness index index. It's quite experimental.
165:44 index index. It's quite experimental. It's a composite score that looks at
165:46 It's a composite score that looks at tests looks at uh types at documentation
165:49 tests looks at uh types at documentation and at modularity and at code quality.
165:52 and at modularity and at code quality. And that index is on the bottom axis
165:54 And that index is on the bottom axis here from 0 to one. And then on the
165:56 here from 0 to one. And then on the vertical axis once again you have the
165:57 vertical axis once again you have the kind of productivity lift relative to
165:59 kind of productivity lift relative to teams not using AI. And so what you can
166:02 teams not using AI. And so what you can see is that there's a40 R squar meaning
166:05 see is that there's a40 R squar meaning a pretty decent correlation around
166:07 a pretty decent correlation around environment cleanliness and gains from
166:10 environment cleanliness and gains from uh AI or productivity gains from using
166:13 uh AI or productivity gains from using AI. And so the takeaway here is to
166:15 AI. And so the takeaway here is to invest in codebased hygiene to unlock
166:18 invest in codebased hygiene to unlock these AI productivity gains.
166:21 these AI productivity gains. We dug deeper to illustrate this
166:23 We dug deeper to illustrate this concept. And here we have on this graph
166:26 concept. And here we have on this graph on the vertical axis the percentage of
166:28 on the vertical axis the percentage of tasks that might uh be able to be
166:31 tasks that might uh be able to be completed by AI based on three colors.
166:33 completed by AI based on three colors. And so green means that AI can do most
166:36 And so green means that AI can do most of the work for that task in that
166:37 of the work for that task in that sprint. Yellow means that AI can help
166:40 sprint. Yellow means that AI can help someone and red uh means that AI is not
166:43 someone and red uh means that AI is not very useful. And this is quite
166:44 very useful. And this is quite illustrative but it it conveys the
166:46 illustrative but it it conveys the point. And so then any code base at any
166:48 point. And so then any code base at any point in time sits on a vertical line
166:50 point in time sits on a vertical line across this graphic. And what you can
166:52 across this graphic. And what you can see is that clean code amplifies AI
166:55 see is that clean code amplifies AI gains.
166:56 gains. Secondly is that you need to manage your
166:59 Secondly is that you need to manage your codebase entropy, right? Your codebase
167:01 codebase entropy, right? Your codebase tech debt because if you just use AI
167:04 tech debt because if you just use AI unchecked, this is going to accelerate
167:06 unchecked, this is going to accelerate this entropy which is going to push and
167:08 this entropy which is going to push and degrade your cleaniness to the left kind
167:10 degrade your cleaniness to the left kind of right. And then you as as a human
167:12 of right. And then you as as a human need to push on the other side to kind
167:14 need to push on the other side to kind of improve or maintain that cleanliness
167:16 of improve or maintain that cleanliness to keep reaping the benefits from AI.
167:20 to keep reaping the benefits from AI. Thirdly is that it's important that
167:21 Thirdly is that it's important that engineers need to know when to use AI
167:24 engineers need to know when to use AI and when not to use AI. And what happens
167:26 and when not to use AI. And what happens when they don't is this kind of line on
167:29 when they don't is this kind of line on the left whereby you have AI AI outputs
167:32 the left whereby you have AI AI outputs that are rejected or need heavy
167:34 that are rejected or need heavy rewriting which then leads to engineers
167:37 rewriting which then leads to engineers losing trust in AI saying okay this just
167:39 losing trust in AI saying okay this just doesn't work. I'm not going to use it.
167:40 doesn't work. I'm not going to use it. Which then further collapses your AI
167:42 Which then further collapses your AI gains.
167:51 Now, we said, can we find out whether we can look not only at usage but at how
167:53 can look not only at usage but at how are these companies and these engineers
167:55 are these companies and these engineers using AI? And we came up with an AI
167:59 using AI? And we came up with an AI engineering practices benchmark. The way
168:01 engineering practices benchmark. The way this works is that we can scan your
168:03 this works is that we can scan your codebase and detect these AI
168:05 codebase and detect these AI fingerprints or artifacts. Basically,
168:07 fingerprints or artifacts. Basically, traces of how your team is using AI.
168:10 traces of how your team is using AI. It's quite directional at this point,
168:11 It's quite directional at this point, but evolving. And we can quantify this
168:15 but evolving. And we can quantify this based on the percentage of your active
168:17 based on the percentage of your active engineering work that uses each AI
168:19 engineering work that uses each AI pattern. And then we kind of repeat this
168:21 pattern. And then we kind of repeat this monthly using get history. And the way
168:24 monthly using get history. And the way this works is more or less you have kind
168:26 this works is more or less you have kind of a few levels. And level zero might be
168:28 of a few levels. And level zero might be how humans are just not using AI and
168:30 how humans are just not using AI and write all of the code. Level one is kind
168:33 write all of the code. Level one is kind of like personal use where engineers are
168:35 of like personal use where engineers are not sharing prompts across the team or
168:38 not sharing prompts across the team or not versioning them. Level two is team
168:40 not versioning them. Level two is team use whereby teams are are sharing these
168:43 use whereby teams are are sharing these kind of prompts and rules. And then
168:45 kind of prompts and rules. And then level three is even more sophisticated.
168:46 level three is even more sophisticated. It's where AI autonomously does specific
168:49 It's where AI autonomously does specific tasks maybe not the entire workflow. And
168:51 tasks maybe not the entire workflow. And level four is you know agentic
168:53 level four is you know agentic orchestration which is where AI just
168:55 orchestration which is where AI just runs the entire process. And so this is
168:57 runs the entire process. And so this is going to be an open- source tool which
168:59 going to be an open- source tool which you can leverage if you sign up on the
169:01 you can leverage if you sign up on the sweeper research portal.
169:05 sweeper research portal. We applied this benchmark to one of the
169:08 We applied this benchmark to one of the companies in our research data set and
169:10 companies in our research data set and we saw this this company had two
169:12 we saw this this company had two business units with equal access to AI
169:15 business units with equal access to AI tools, right? Same licenses, same spend,
169:18 tools, right? Same licenses, same spend, same tools, same everything. But the
169:20 same tools, same everything. But the adoption rate and the usage rate was
169:22 adoption rate and the usage rate was very different by business unit. On the
169:24 very different by business unit. On the left, the first business unit, you can
169:27 left, the first business unit, you can as you can see in the area in the blue,
169:28 as you can see in the area in the blue, seemed to be using AI a lot more for
169:31 seemed to be using AI a lot more for almost 40% of their work. Whereas on the
169:34 almost 40% of their work. Whereas on the on the uh right, the second business
169:36 on the uh right, the second business unit seem to struggle behind a bit more.
169:39 unit seem to struggle behind a bit more. And so the takeaway here is that access
169:41 And so the takeaway here is that access to AI and even AI usage doesn't mean or
169:45 to AI and even AI usage doesn't mean or doesn't guarantee that that AI is going
169:48 doesn't guarantee that that AI is going to be used in the same way across a
169:50 to be used in the same way across a company.
169:52 company. As a leader, you really want to be
169:53 As a leader, you really want to be understanding not just whether they're
169:55 understanding not just whether they're using but also how your engineers are
169:57 using but also how your engineers are using AI.
170:03 Great. Now let's dive into how do we actually measure AI return on investment
170:06 actually measure AI return on investment in software engineering.
170:14 Oh uh there we go. Okay. So here ideally we would be measuring this based on
170:15 we would be measuring this based on business outcomes, right? I give my AI
170:18 business outcomes, right? I give my AI engineer my engineers AI and then I make
170:21 engineer my engineers AI and then I make more money, more revenue, net revenue
170:23 more money, more revenue, net revenue retention, whatever business KPI you
170:24 retention, whatever business KPI you want to track. The problem is that
170:27 want to track. The problem is that there's too much noise between the
170:29 there's too much noise between the treatment right giving AI and the result
170:32 treatment right giving AI and the result which is the business outcome. And on
170:34 which is the business outcome. And on top of this there's confounding
170:36 top of this there's confounding variables such as your sales execution,
170:38 variables such as your sales execution, the macro environment, your product
170:39 the macro environment, your product strategy and therefore although that
170:42 strategy and therefore although that would be ideal unfortunately uh I think
170:44 would be ideal unfortunately uh I think we need to find alternative paths and
170:47 we need to find alternative paths and the most logical one is to simply look
170:48 the most logical one is to simply look at the engineering outcomes because
170:50 at the engineering outcomes because there is a clear signal right but here
170:52 there is a clear signal right but here we need to go beyond measuring AI usage
170:55 we need to go beyond measuring AI usage into measuring engineering outcomes.
170:57 into measuring engineering outcomes. There's a few caveats and this topic is
170:59 There's a few caveats and this topic is quite heavily discussed and so I want to
171:01 quite heavily discussed and so I want to mention some of them.
171:03 mention some of them. The first one is that this is assuming
171:05 The first one is that this is assuming that our product function can properly
171:08 that our product function can properly direct that increased capacity into
171:09 direct that increased capacity into something that generates value. And if
171:11 something that generates value. And if they aren't directing that, then it's a
171:13 they aren't directing that, then it's a product problem, which although sits
171:16 product problem, which although sits quite close to engineering, it's
171:18 quite close to engineering, it's slightly different, right?
171:20 slightly different, right? The second caveat is that this assumes
171:22 The second caveat is that this assumes that engineering is a meaningful
171:23 that engineering is a meaningful bottleneck for value which frankly it
171:25 bottleneck for value which frankly it typically is and that you can guard
171:27 typically is and that you can guard against good hards law by using a
171:30 against good hards law by using a balanced set of metrics and also by
171:32 balanced set of metrics and also by having a good company culture that
171:34 having a good company culture that doesn't weaponize these metrics.
171:36 doesn't weaponize these metrics. And thirdly is that AI is still very new
171:39 And thirdly is that AI is still very new and measuring proxy metrics is still
171:42 and measuring proxy metrics is still better than not measuring. There's going
171:44 better than not measuring. There's going to be winners and losers in this AI
171:46 to be winners and losers in this AI race. And progress is better than
171:48 race. And progress is better than perfection here. And so metrics don't
171:50 perfection here. And so metrics don't need to be flawless to be useful is what
171:52 need to be flawless to be useful is what I want to illustrate.
172:03 So then um here we have uh two parts which you need to do to get the ROI from
172:05 which you need to do to get the ROI from AI, right? You kind of need to measure
172:07 AI, right? You kind of need to measure usage and then you need to measure
172:08 usage and then you need to measure engineering outcomes. And so let's start
172:12 engineering outcomes. And so let's start with usage.
172:14 with usage. There's really two buckets for
172:15 There's really two buckets for enterprises. There's kind of more in a
172:17 enterprises. There's kind of more in a research environment, but to make it
172:18 research environment, but to make it simple, there's access based and there's
172:20 simple, there's access based and there's usage based. Accessbased is basically
172:22 usage based. Accessbased is basically looking at when did people get access to
172:25 looking at when did people get access to the tool. And here we have you can kind
172:27 the tool. And here we have you can kind of do a pilot group, give that group AI
172:30 of do a pilot group, give that group AI and then compare it to a similar group
172:32 and then compare it to a similar group without AI or you can measure the same
172:34 without AI or you can measure the same team across time. The problem is that
172:36 team across time. The problem is that access based is noisy and the gold
172:39 access based is noisy and the gold standard is really usage based which uh
172:42 standard is really usage based which uh uses telemetry from APIs from these
172:45 uses telemetry from APIs from these coding assistants right to uh give you
172:48 coding assistants right to uh give you the right data to know who's using AI
172:50 the right data to know who's using AI and and where and the caveat here is
172:52 and and where and the caveat here is that the vendor API is different
172:54 that the vendor API is different unfortunately tools like GitHub copilot
172:56 unfortunately tools like GitHub copilot aggregate the data and other tools like
172:58 aggregate the data and other tools like cursor give you more granular data
173:01 cursor give you more granular data the big takeaway is that you can measure
173:03 the big takeaway is that you can measure impact of um retroactively by using get
173:07 impact of um retroactively by using get history. And so you don't need to set up
173:09 history. And so you don't need to set up an experiment now and wait 6 months. You
173:11 an experiment now and wait 6 months. You can actually if you've already adopted
173:12 can actually if you've already adopted AI, you can go back in time and and and
173:15 AI, you can go back in time and and and do this. It's quite easy.
173:18 do this. It's quite easy. Now we've seen usage. Let's look into
173:20 Now we've seen usage. Let's look into how do we actually measure engineering
173:21 how do we actually measure engineering outcomes? What are some of the metrics
173:23 outcomes? What are some of the metrics we propose?
173:33 Here we have um our framework which we proposed which is using a primary metric
173:35 proposed which is using a primary metric and a guardrail metric. And so here um
173:37 and a guardrail metric. And so here um the primary metric is engineering
173:39 the primary metric is engineering output. It's not lines of code. It's not
173:41 output. It's not lines of code. It's not PR counts and it's not dura. And it's
173:43 PR counts and it's not dura. And it's basically based on this machine learning
173:45 basically based on this machine learning model that replicates the panel of
173:46 model that replicates the panel of experts, right? And the second set of
173:48 experts, right? And the second set of metrics are the guard ones which you
173:51 metrics are the guard ones which you want to maintain at a healthy level but
173:53 want to maintain at a healthy level but you don't want to maximize. It doesn't
173:54 you don't want to maximize. It doesn't make sense to maximize them truly. And
173:57 make sense to maximize them truly. And so then there's three categories within
173:59 so then there's three categories within the guardrail ones rework and
174:01 the guardrail ones rework and refactoring quality tech and risk and
174:03 refactoring quality tech and risk and then people and devops. The third bucket
174:05 then people and devops. The third bucket is important to highlight that these are
174:07 is important to highlight that these are not productivity metrics. They're useful
174:09 not productivity metrics. They're useful but you cannot just kind of use them
174:11 but you cannot just kind of use them like maximize them to maximize developer
174:13 like maximize them to maximize developer productivity. They kind of fall off at
174:15 productivity. They kind of fall off at some point. And so the goal here might
174:17 some point. And so the goal here might be to keep your guardrail metrics
174:18 be to keep your guardrail metrics healthy while increasing the primary
174:20 healthy while increasing the primary metric to whatever degree possible.
174:24 metric to whatever degree possible. Now let's dive into a case study. Here
174:28 Now let's dive into a case study. Here we worked with
174:31 we worked with a company that uh large enterprise. We
174:34 a company that uh large enterprise. We took a team of uh 350 people under a
174:36 took a team of uh 350 people under a vice president and we measured pull
174:38 vice president and we measured pull requests. The reason we did this is to
174:41 requests. The reason we did this is to illustrate that you cannot measure pull
174:43 illustrate that you cannot measure pull requests to understand whether AI is
174:45 requests to understand whether AI is helping you. And so here this team
174:47 helping you. And so here this team adopted um AI in May of this year and we
174:49 adopted um AI in May of this year and we measured the four months before four
174:51 measured the four months before four months after. We saw a 14% increase.
174:54 months after. We saw a 14% increase. Great. That's fantastic. But what about
174:56 Great. That's fantastic. But what about reviewer burden? What about code
174:58 reviewer burden? What about code quality? So we measured code quality.
175:01 quality? So we measured code quality. And here what we saw is um I mean
175:04 And here what we saw is um I mean firstly actually code quality think of
175:06 firstly actually code quality think of it as maintainability scale from 0 to
175:08 it as maintainability scale from 0 to 10. And uh there's kind of these bands.
175:11 10. And uh there's kind of these bands. Uh it uses our our methodology. You can
175:13 Uh it uses our our methodology. You can read it online. But basically what you
175:16 read it online. But basically what you see is that in the preAI period their
175:18 see is that in the preAI period their code quality was quite stable and
175:20 code quality was quite stable and consistent. And once they adopted AI,
175:22 consistent. And once they adopted AI, two things happened. Code quality
175:23 two things happened. Code quality decreased and then code quality became
175:25 decreased and then code quality became more erratic.
175:31 Next, we took a look at our metric, which is engineering output. It's not
175:33 which is engineering output. It's not lines of code. And here for every month,
175:35 lines of code. And here for every month, you see the sigma, the sum of the output
175:38 you see the sigma, the sum of the output delivered for that month, broken down
175:39 delivered for that month, broken down into four buckets. Rework and
175:42 into four buckets. Rework and refactoring. So rework is when you're
175:44 refactoring. So rework is when you're changing or editing code that was it's
175:47 changing or editing code that was it's still kind of fresh, so it's recent.
175:48 still kind of fresh, so it's recent. refactoring is when you're changing code
175:50 refactoring is when you're changing code that's a bit older and uh what uh then
175:54 that's a bit older and uh what uh then like added and removed it's pretty
175:56 like added and removed it's pretty self-explanatory and then also you can
175:58 self-explanatory and then also you can see these kind of benchmarks so we can
176:00 see these kind of benchmarks so we can benchmark this company against similar
176:01 benchmark this company against similar companies in their industry and here AI
176:04 companies in their industry and here AI usage had two effects firstly is that
176:06 usage had two effects firstly is that rework went up by 2.5 times which is
176:09 rework went up by 2.5 times which is really bad and effective output which is
176:11 really bad and effective output which is kind of like a proxy for productivity or
176:13 kind of like a proxy for productivity or so didn't really change
176:15 so didn't really change and so then what's the conclusion here
176:17 and so then what's the conclusion here let's do a recap app. So we saw that PRs
176:20 let's do a recap app. So we saw that PRs went up by 14%. But this is inconclusive
176:23 went up by 14%. But this is inconclusive because more PRs doesn't mean better. We
176:26 because more PRs doesn't mean better. We saw that code quality decreased by 9%
176:28 saw that code quality decreased by 9% which is problematic. We saw that
176:30 which is problematic. We saw that effective output didn't increase
176:32 effective output didn't increase meaningfully. And then we saw that
176:34 meaningfully. And then we saw that rework increased by a lot. And so then
176:36 rework increased by a lot. And so then the question here is what is the ROI of
176:39 the question here is what is the ROI of this AI adoption, right? It might be
176:41 this AI adoption, right? It might be negative. And what I want to point out
176:42 negative. And what I want to point out here is that had this company not
176:44 here is that had this company not measured this more thoroughly and simply
176:46 measured this more thoroughly and simply measured PR counts, they would have
176:48 measured PR counts, they would have thought, hey, we're doing great. We
176:50 thought, hey, we're doing great. We increased our productivity by 14%. Let's
176:52 increased our productivity by 14%. Let's run from the numbers. That's how many
176:54 run from the numbers. That's how many million lots of millions of dollars. And
176:56 million lots of millions of dollars. And does this offset the AI license? Sure
176:58 does this offset the AI license? Sure thing it does, right? The other thing is
177:00 thing it does, right? The other thing is that I don't think this company should
177:02 that I don't think this company should abandon AI. They should simply use this
177:03 abandon AI. They should simply use this data to understand what they're doing
177:05 data to understand what they're doing wrong. How can they improve? Because AI
177:07 wrong. How can they improve? Because AI is here to stay. It's a tool that's
177:08 is here to stay. It's a tool that's going to transform how engineers are are
177:10 going to transform how engineers are are working, right? and you can just um kind
177:13 working, right? and you can just um kind of like abandon it or yourself.
177:16 of like abandon it or yourself. Great. So, this concludes our insights
177:19 Great. So, this concludes our insights for today. If you've enjoyed this uh
177:21 for today. If you've enjoyed this uh talk and you would like similar insights
177:23 talk and you would like similar insights for your company, I invite you to
177:24 for your company, I invite you to participate in our research. Everything
177:26 participate in our research. Everything you've seen today can uh be accessed
177:29 you've seen today can uh be accessed through kind of participating in our
177:30 through kind of participating in our research, some of them through live
177:31 research, some of them through live dashboards in our research portal. And
177:34 dashboards in our research portal. And especially I'd like to invite companies
177:36 especially I'd like to invite companies that have access to Cursor Enterprise to
177:39 that have access to Cursor Enterprise to participate because we have a high need
177:41 participate because we have a high need for this so we can publish papers around
177:43 for this so we can publish papers around the granularity of using AI um in
177:45 the granularity of using AI um in software engineering. You can sign up at
177:47 software engineering. You can sign up at software engineering
177:48 software engineering productivity.stanford.edu.
177:50 productivity.stanford.edu. Thank you so much. [applause]
178:04 [music] next speaker will separate hype from reality on AI code quality using
178:07 from reality on AI code quality using realworld data to show when AI generated
178:11 realworld data to show when AI generated code can be trusted in production.
178:14 code can be trusted in production. Please welcome CEO of Kodto, Edidomar
178:17 Please welcome CEO of Kodto, Edidomar Freriedman.
178:28 It will grow. It will grow one or two more months. I'm really excited being
178:30 more months. I'm really excited being here. So many so much pragmatic and
178:32 here. So many so much pragmatic and insight and suggestions. I was sitting
178:34 insight and suggestions. I was sitting there uh just just before. So I'm Edmar
178:37 there uh just just before. So I'm Edmar Freiedman, the CEO and co-founder of
178:38 Freiedman, the CEO and co-founder of Kodto. Codto stands for quality of
178:40 Kodto. Codto stands for quality of development and I'm going to share uh
178:43 development and I'm going to share uh our reports and other companies reports
178:45 our reports and other companies reports about state of AI code quality. uh you
178:48 about state of AI code quality. uh you know trying to uh talk about the hype
178:51 know trying to uh talk about the hype versus reality which was uh like one of
178:53 versus reality which was uh like one of the uh points that were discussed here
178:56 the uh points that were discussed here quite a lot which is awesome. So in the
178:58 quite a lot which is awesome. So in the last three weeks, four weeks, we saw
179:00 last three weeks, four weeks, we saw like three outages in the clouds
179:02 like three outages in the clouds unfortunately, right? And these are
179:05 unfortunately, right? And these are coming from companies that really care
179:07 coming from companies that really care about moving fast, right? They're
179:09 about moving fast, right? They're they're they're saying themselves that
179:11 they're they're saying themselves that they're using AI to generate code 10%,
179:14 they're using AI to generate code 10%, 30%, 50%, at the same time, they care
179:16 30%, 50%, at the same time, they care about quality. So how did that happen?
179:19 about quality. So how did that happen? And is it is it related? I don't know.
179:21 And is it is it related? I don't know. But let's have some I'm going to share
179:23 But let's have some I'm going to share some guess. So by the way 60% of
179:25 some guess. So by the way 60% of developers say that the like quarter of
179:28 developers say that the like quarter of their code is either generated by AI or
179:31 their code is either generated by AI or in in like uh uh shaped by I and 15% say
179:35 in in like uh uh shaped by I and 15% say that even more than 80 80% of their code
179:38 that even more than 80 80% of their code uh is basically generated or or shaped
179:40 uh is basically generated or or shaped by AI. Now people are using AI to do
179:45 by AI. Now people are using AI to do vibe coding but actually they're even
179:47 vibe coding but actually they're even doing it for vibe checking vibe
179:49 doing it for vibe checking vibe reviewing. This is the command of cloud.
179:53 reviewing. This is the command of cloud. This is the prompt for the command of
179:55 This is the prompt for the command of claude code for security review. It was
179:58 claude code for security review. It was hyped like two months ago. Do you know
179:59 hyped like two months ago. Do you know what I'm talking about now? It says
180:01 what I'm talking about now? It says there, I don't know if you see it. Uh,
180:03 there, I don't know if you see it. Uh, you are a senior security engineer.
180:05 you are a senior security engineer. Good. And then like somewhere there uh
180:08 Good. And then like somewhere there uh down the line, it says please exclude
180:11 down the line, it says please exclude denial of service. Don't don't uh catch
180:14 denial of service. Don't don't uh catch denial of service issues. Maybe that's
180:17 denial of service issues. Maybe that's part of the part of the reason like
180:19 part of the part of the reason like we're we're having uh cloud outages.
180:21 we're we're having uh cloud outages. probably not just that, but you get the
180:23 probably not just that, but you get the point. Like we need to be rigorous about
180:26 point. Like we need to be rigorous about how we deal with quality. It's not just
180:28 how we deal with quality. It's not just like vibe quality or or so like we're
180:31 like vibe quality or or so like we're doing vibe coding sometimes. Uh let's go
180:33 doing vibe coding sometimes. Uh let's go to another example. Okay, cursor I guess
180:37 to another example. Okay, cursor I guess like or or pilot most of you use rules,
180:40 like or or pilot most of you use rules, right? We're going to talk about it. You
180:41 right? We're going to talk about it. You invest in code generation. After a
180:43 invest in code generation. After a [snorts] while, you understand if you
180:44 [snorts] while, you understand if you invest, you'll get more out of it. And
180:47 invest, you'll get more out of it. And uh we we asked like a bunch of of
180:50 uh we we asked like a bunch of of developers and I'm asking you as well
180:52 developers and I'm asking you as well think for a second for all the
180:53 think for a second for all the developers there in the audience like
180:56 developers there in the audience like when you write cursor rules or copilot
180:58 when you write cursor rules or copilot rules etc. Do you feel they're
181:00 rules etc. Do you feel they're completely followed or it's like mostly
181:02 completely followed or it's like mostly followed? Do you know how much they're
181:04 followed? Do you know how much they're followed? And what extent are they
181:05 followed? And what extent are they followed? It's rigorously like how
181:07 followed? It's rigorously like how technical deep they're they're being
181:09 technical deep they're they're being followed. So the what we get back like
181:11 followed. So the what we get back like the answer from what you see here on the
181:13 the answer from what you see here on the screen is mostly like B, C, and D. They
181:16 screen is mostly like B, C, and D. They are followed but they're not completely
181:18 are followed but they're not completely followed. Okay. So that means like we
181:21 followed. Okay. So that means like we are generating code trying to push it to
181:23 are generating code trying to push it to the standards but it's not necessarily
181:25 the standards but it's not necessarily still like getting to the quality we
181:27 still like getting to the quality we wanted. I'm going to share a bit more
181:29 wanted. I'm going to share a bit more statistics and and information and some
181:31 statistics and and information and some insight from three reports. One done by
181:34 insight from three reports. One done by Codo, another by done by Sonar, another
181:38 Codo, another by done by Sonar, another by far and all of them are are focused
181:41 by far and all of them are are focused on code code quality review etc. The
181:44 on code code quality review etc. The sample size is thousands of developers
181:46 sample size is thousands of developers in some cases even more millions of pull
181:48 in some cases even more millions of pull requests and and a billion of of lines
181:51 requests and and a billion of of lines lines of code that were uh uh being
181:54 lines of code that were uh uh being checked. Like for example, if you think
181:56 checked. Like for example, if you think about uh Sonar, this is a company, yeah,
181:59 about uh Sonar, this is a company, yeah, a bit like coming from pre-AI, but they
182:02 a bit like coming from pre-AI, but they see code at scale and you they're doing
182:06 see code at scale and you they're doing like a lot of checks in code that are
182:09 like a lot of checks in code that are not necessarily AI focused, but are
182:11 not necessarily AI focused, but are necessary in order to check uh your your
182:14 necessary in order to check uh your your software from all possible direction.
182:17 software from all possible direction. And that's why their scaling and the
182:19 And that's why their scaling and the scale of the code that you're seeing is
182:20 scale of the code that you're seeing is is immense. Okay. So for example, we
182:22 is immense. Okay. So for example, we took information from from their report
182:25 took information from from their report and eventually my purpose here is to
182:27 and eventually my purpose here is to break down the different dimension of
182:29 break down the different dimension of what uh code quality means and give you
182:31 what uh code quality means and give you some share some stats and and insights.
182:34 some share some stats and and insights. I want to start with the end. Okay, this
182:38 I want to start with the end. Okay, this is the takeaway I want you all all like
182:40 is the takeaway I want you all all like to take from from the next 13 minutes
182:42 to take from from the next 13 minutes that I have. We started with code
182:45 that I have. We started with code generation. We like out of the box use
182:48 generation. We like out of the box use it autocomplete etc. and you invest in
182:50 it autocomplete etc. and you invest in it and you can get more out of it. But
182:54 it and you can get more out of it. But there's the glass ceiling for how much
182:55 there's the glass ceiling for how much productivity you can get from code
182:57 productivity you can get from code generation. And then we move to the
182:59 generation. And then we move to the agent code generation, right? Let's call
183:02 agent code generation, right? Let's call it gen 2.0. And that's a higher glass
183:04 it gen 2.0. And that's a higher glass ceiling. It could do much more
183:06 ceiling. It could do much more productivity and especially if you
183:08 productivity and especially if you invest in it, for example, rules, etc.
183:11 invest in it, for example, rules, etc. Then with AI breaking outside of the
183:15 Then with AI breaking outside of the IDE, we can start using AI also for code
183:19 IDE, we can start using AI also for code for agentic quality workflows. It could
183:23 for agentic quality workflows. It could be inside the ID, but the the truth is
183:25 be inside the ID, but the the truth is that if you think about all the
183:26 that if you think about all the workflows you have in your organization,
183:28 workflows you have in your organization, especially if you're more than 100
183:29 especially if you're more than 100 developers or so, you probably have a
183:31 developers or so, you probably have a lot of workflows that you related to
183:33 lot of workflows that you related to quality that you need to auto automate.
183:35 quality that you need to auto automate. And that's where you start like breaking
183:38 And that's where you start like breaking through the glass ceiling of
183:40 through the glass ceiling of productivity. if you invest in it. And
183:42 productivity. if you invest in it. And finally, I I claim that you need those
183:45 finally, I I claim that you need those agentic workflows. Keep learning. And we
183:48 agentic workflows. Keep learning. And we might touch a little bit of that like
183:50 might touch a little bit of that like later later on. Okay? Like because
183:52 later later on. Okay? Like because quality is something dynamic. So you'll
183:54 quality is something dynamic. So you'll only finally break break the glass
183:57 only finally break break the glass ceiling if if you really have those
183:59 ceiling if if you really have those quality workflows and rules and standard
184:01 quality workflows and rules and standard being dynamic. And then then you will
184:04 being dynamic. And then then you will see the promised 2x let alone the 10x
184:07 see the promised 2x let alone the 10x that you were promised the hyped and and
184:08 that you were promised the hyped and and you you heard from McKenzie and from
184:10 you you heard from McKenzie and from Stanford you're not getting that. I
184:12 Stanford you're not getting that. I don't need to tell you the 2x 10x for
184:14 don't need to tell you the 2x 10x for the entire software development uh life
184:16 the entire software development uh life cycle. So a bit about more about the the
184:19 cycle. So a bit about more about the the market adoption. Uh one of the report
184:22 market adoption. Uh one of the report says that 82% of adoption already for AI
184:27 says that 82% of adoption already for AI dev tools are being used daily or
184:28 dev tools are being used daily or weekly. uh some people at 60 60% 59
184:32 weekly. uh some people at 60 60% 59 report that they're using more than
184:34 report that they're using more than three and 20% saying that they're using
184:36 three and 20% saying that they're using more than five code generation tools. If
184:38 more than five code generation tools. If you think about it for a second uh don't
184:40 you think about it for a second uh don't only take like cursor compil
184:43 only take like cursor compil etc. Sorry if I'm insulting anyone in
184:45 etc. Sorry if I'm insulting anyone in the that I forgot their tool but there's
184:47 the that I forgot their tool but there's also the lovable etc. They also generate
184:50 also the lovable etc. They also generate code and by the way you're going to get
184:51 code and by the way you're going to get to 10 I'm count on me you're going to
184:53 to 10 I'm count on me you're going to get to 10 tools in two three years that
184:56 get to 10 tools in two three years that generate code for you okay come to talk
184:58 generate code for you okay come to talk to me about later I'll try to convince
184:59 to me about later I'll try to convince you and and the thing is that it it's
185:02 you and and the thing is that it it's coming from bottom up like 50% of the
185:04 coming from bottom up like 50% of the usage is coming from less than 10 teams
185:07 usage is coming from less than 10 teams that are less than 10 developers but it
185:09 that are less than 10 developers but it is propagating also to the enterprise
185:11 is propagating also to the enterprise again I'm sure you know I mean talk
185:14 again I'm sure you know I mean talk propagating to the enterprise at scale
185:15 propagating to the enterprise at scale like not just like five developers in
185:17 like not just like five developers in the last year we're seeing like more and
185:19 the last year we're seeing like more and more enterprise using co code
185:20 more enterprise using co code generation. Uh so if like an in average
185:24 generation. Uh so if like an in average with within reports we saw 82 to 92%
185:28 with within reports we saw 82 to 92% using weekly to a monthly uh code
185:30 using weekly to a monthly uh code generation tools and in some cases maybe
185:33 generation tools and in some cases maybe extreme maybe not we're going to talk
185:35 extreme maybe not we're going to talk about it we saw 3x productivity boost in
185:39 about it we saw 3x productivity boost in writing code okay but that doesn't mean
185:42 writing code okay but that doesn't mean that if you have uh 3x productivity in
185:44 that if you have uh 3x productivity in writing code that you actually guarantee
185:46 writing code that you actually guarantee any quality like I presented before so
185:49 any quality like I presented before so actually 67% of the developer that we as
185:52 actually 67% of the developer that we as asked have serious equality concerns
185:55 asked have serious equality concerns about all the AI generated all the
185:57 about all the AI generated all the generated code uh uh the code generated
186:00 generated code uh uh the code generated by AI or influenced by AI and they're
186:03 by AI or influenced by AI and they're claiming that they're missing the
186:04 claiming that they're missing the framework how to deal with quality how
186:07 framework how to deal with quality how to measure quality it's a big question
186:09 to measure quality it's a big question what is quality I'm going to talk about
186:11 what is quality I'm going to talk about it in the next few slides okay think
186:12 it in the next few slides okay think about it for a second before I break
186:14 about it for a second before I break break it down what what is quality
186:17 break it down what what is quality um so what we're actually saying that
186:19 um so what we're actually saying that the crisis with VIP coding uh viable
186:22 the crisis with VIP coding uh viable coding we're seeing it shifting and
186:24 coding we're seeing it shifting and evolving is that you're getting like
186:27 evolving is that you're getting like more task being done like 20 some report
186:30 more task being done like 20 some report 20% more task you know velocity and like
186:34 20% more task you know velocity and like 97 more% or so of PRs being opened and
186:38 97 more% or so of PRs being opened and eventually it takes more time to review
186:40 eventually it takes more time to review PR like 90% more time to review PR and
186:44 PR like 90% more time to review PR and by the way like there's a lot of
186:45 by the way like there's a lot of statistics about AI generating code at
186:48 statistics about AI generating code at least there's not less amount amount of
186:51 least there's not less amount amount of bugs per line of code. I'm not claiming
186:52 bugs per line of code. I'm not claiming that there are more, but even if there's
186:54 that there are more, but even if there's not less bugs per line of code, you have
186:57 not less bugs per line of code, you have much more bugs because there are much
186:58 much more bugs because there are much more PRs, much more code being
187:00 more PRs, much more code being generated, etc. Right? So that that's a
187:02 generated, etc. Right? So that that's a problem for the reviewer. So it's
187:04 problem for the reviewer. So it's somebody surprised it takes more time to
187:06 somebody surprised it takes more time to review these, especially in the age of
187:08 review these, especially in the age of agents, right? When five minutes calling
187:11 agents, right? When five minutes calling to cloud code, I have 1,000 line of code
187:13 to cloud code, I have 1,000 line of code after 5 minutes. Once upon a time, it
187:15 after 5 minutes. Once upon a time, it took me like hours to write 10 proper
187:17 took me like hours to write 10 proper lines of code. Right? Now let's zoom out
187:19 lines of code. Right? Now let's zoom out for a second. Code generation is
187:21 for a second. Code generation is magnificent. Okay? Like it it's a
187:24 magnificent. Okay? Like it it's a gamecher when you're talking about green
187:26 gamecher when you're talking about green field. You saw people talk about it a
187:28 field. You saw people talk about it a few slides a few minutes before me. Uh
187:32 few slides a few minutes before me. Uh it it revolutionized how we do p proof
187:34 it it revolutionized how we do p proof of concept uh project etc. But when
187:37 of concept uh project etc. But when you're dealing with heavyduty software
187:40 you're dealing with heavyduty software then you you like it or not we are
187:42 then you you like it or not we are dealing with a lot of things when uh
187:45 dealing with a lot of things when uh when you serve millions of clients you
187:47 when you serve millions of clients you have financial transactions when you're
187:49 have financial transactions when you're doing transportation you're dealing with
187:51 doing transportation you're dealing with code integrity if you like code
187:53 code integrity if you like code governance uh review standards testing
187:56 governance uh review standards testing relability etc. That's what we need to
187:59 relability etc. That's what we need to uh uh to deal with. Now let's break that
188:02 uh uh to deal with. Now let's break that under the surface part of the glacier
188:04 under the surface part of the glacier into two dimensions. This is one
188:06 into two dimensions. This is one dimension you can look on the quality
188:09 dimension you can look on the quality issues in throughout the software
188:11 issues in throughout the software development life cycle like planning and
188:14 development life cycle like planning and then development writing code review
188:17 then development writing code review code review is a bit of a process but
188:19 code review is a bit of a process but like what you're like checking quality
188:22 like what you're like checking quality that's part of the process of code
188:23 that's part of the process of code review testing which is another part of
188:25 review testing which is another part of of quality and and deployment and I know
188:28 of quality and and deployment and I know I didn't cover the entire like uh
188:31 I didn't cover the entire like uh software development life cycle but just
188:32 software development life cycle but just to give you an example and each one of
188:34 to give you an example and each one of them like possess like introduce new
188:37 them like possess like introduce new problems that are coming because you're
188:39 problems that are coming because you're using more and more AI generated code.
188:41 using more and more AI generated code. Um now another dimension to look at it
188:44 Um now another dimension to look at it is actually code level problems and
188:46 is actually code level problems and process level problems. Okay, I'm not
188:50 process level problems. Okay, I'm not I'm not opening the you know list of
188:52 I'm not opening the you know list of functional just opening the list of
188:54 functional just opening the list of non-functional. You're talking about
188:56 non-functional. You're talking about security inefficiency that are not
188:59 security inefficiency that are not necessarily uh functional. Use I'll show
189:02 necessarily uh functional. Use I'll show you some statistics about that. And then
189:04 you some statistics about that. And then process level is for example learning.
189:07 process level is for example learning. Hey if you will have a
189:11 Hey if you will have a a a bad outage because of AI generated
189:14 a a bad outage because of AI generated code who is responsible is it the AI or
189:16 code who is responsible is it the AI or or the team that own that okay like you
189:20 or the team that own that okay like you need to learn and own the code
189:21 need to learn and own the code eventually that's a process that needs
189:23 eventually that's a process that needs to be done verification porting
189:25 to be done verification porting guardrails standards uh etc. So, so all
189:29 guardrails standards uh etc. So, so all of those issues when they are introduced
189:31 of those issues when they are introduced to thousands of developer that we asked
189:34 to thousands of developer that we asked them do you think like actually AI
189:36 them do you think like actually AI helped to reduce with those problems or
189:39 helped to reduce with those problems or or actually made more like more
189:42 or actually made more like more challenging 42 people reported that they
189:45 challenging 42 people reported that they spend 42 more of the development time on
189:48 spend 42 more of the development time on solving issues on fixing bugs etc and
189:51 solving issues on fixing bugs etc and and they saw 35 uh% project delays
189:57 and they saw 35 uh% project delays we're talking But we're talking about
189:58 we're talking But we're talking about maybe games they're talking about like
190:00 maybe games they're talking about like delays. Okay, there's some bias. We told
190:02 delays. Okay, there's some bias. We told them we talked about problem with
190:04 them we talked about problem with quality and what's the impact etc. Um
190:07 quality and what's the impact etc. Um but that's what they they they present
190:10 but that's what they they they present uh to when they they answer uh when when
190:12 uh to when they they answer uh when when you're talking about like when you're
190:14 you're talking about like when you're mass using AI code AI generated code and
190:17 mass using AI code AI generated code and we see reports uh some of the reports
190:19 we see reports uh some of the reports talking about 3x more security inc
190:22 talking about 3x more security inc incidents. By the way, it makes sense.
190:23 incidents. By the way, it makes sense. You remember we had a slide saying 3x
190:25 You remember we had a slide saying 3x more writing code. So 3x more security
190:28 more writing code. So 3x more security incidents like the same amount of line
190:29 incidents like the same amount of line of code the same amount of uh uh
190:31 of code the same amount of uh uh problems correlation. So what to do with
190:33 problems correlation. So what to do with that? Like I talked about problems and
190:35 that? Like I talked about problems and problems and problems. Okay, help help
190:37 problems and problems. Okay, help help me deal with it. Like let's let's spend
190:39 me deal with it. Like let's let's spend a few minutes on on that. So one one
190:42 a few minutes on on that. So one one suspect of course is testing and
190:45 suspect of course is testing and actually really interesting. We asked a
190:47 actually really interesting. We asked a couple of question about testing and one
190:49 couple of question about testing and one really relevant saying that people said
190:51 really relevant saying that people said that when they heavily use AI to on
190:54 that when they heavily use AI to on testing use AI to do testing they
190:58 testing use AI to do testing they actually double their trust in the AI
191:01 actually double their trust in the AI generated code. Okay, that's one thing.
191:04 generated code. Okay, that's one thing. The ne next suspect to help us with the
191:06 The ne next suspect to help us with the quality is code review. What really
191:09 quality is code review. What really interesting about code review that it's
191:10 interesting about code review that it's a process that helps almost with all the
191:13 a process that helps almost with all the process level and the code level like
191:17 process level and the code level like issues. For example, you can set your AI
191:19 issues. For example, you can set your AI code review tool to tell you block this
191:22 code review tool to tell you block this PR if it doesn't cover certain level of
191:25 PR if it doesn't cover certain level of test coverage. So through the PR, you
191:28 test coverage. So through the PR, you take care of the testing process
191:30 take care of the testing process problem. Okay. So code like code review
191:33 problem. Okay. So code like code review with AI is actually one of one of the
191:36 with AI is actually one of one of the major things you you you can do and
191:38 major things you you you can do and people that are developers that are
191:40 people that are developers that are using AI code review tool they're saying
191:42 using AI code review tool they're saying that they're saying they're seeing
191:44 that they're saying they're seeing double the quality gain and they're
191:46 double the quality gain and they're saying that actually it's it helps them
191:49 saying that actually it's it helps them to uh uh improve improve 47% in
191:53 to uh uh improve improve 47% in productivity of writing code. Okay. Now
191:56 productivity of writing code. Okay. Now a bit statistics from our own uh AI code
191:59 a bit statistics from our own uh AI code review tool. We scan a million of PRs a
192:02 review tool. We scan a million of PRs a month and we took one mill million of
192:04 month and we took one mill million of those PRs and we noticed that 17%
192:07 those PRs and we noticed that 17% include like high severity issues. By
192:09 include like high severity issues. By the way, we're now analyzing uh before
192:11 the way, we're now analyzing uh before and after using AI. I don't have that
192:13 and after using AI. I don't have that statistics yet, but we are noticing
192:15 statistics yet, but we are noticing since we're starting uh most of the
192:17 since we're starting uh most of the companies we serve, they use AI
192:19 companies we serve, they use AI generated code. So that's why uh I don't
192:21 generated code. So that's why uh I don't have before. We need to go scan
192:23 have before. We need to go scan backwards. Uh and that's like a really
192:26 backwards. Uh and that's like a really big a big number. Another thing I want
192:28 big a big number. Another thing I want to talk to you like about uh when you're
192:30 to talk to you like about uh when you're trying to improve on quality is is the
192:33 trying to improve on quality is is the foundation of having the right context
192:35 foundation of having the right context that is brought to the uh code
192:38 that is brought to the uh code generation tool that is brought to the
192:40 generation tool that is brought to the AI code review tool. Better context
192:43 AI code review tool. Better context better quality across the board wherever
192:45 better quality across the board wherever you're using AI. Uh so when we asked
192:47 you're using AI. Uh so when we asked developers when when you h when you
192:49 developers when when you h when you don't trust AI generated code like you
192:52 don't trust AI generated code like you remember like 67% sa like are really
192:54 remember like 67% sa like are really worried about that they said 80 80% of
192:58 worried about that they said 80 80% of the time they don't trust the context
193:00 the time they don't trust the context that the LLM have okay and and and uh
193:04 that the LLM have okay and and and uh when we asked developers what would you
193:06 when we asked developers what would you like to be improved in your AI generated
193:09 like to be improved in your AI generated code in your AI code review tool they
193:11 code in your AI code review tool they said the number one was context it was
193:13 said the number one was context it was number one was 33% they could choose
193:16 number one was 33% they could choose among many things to to improve. So
193:18 among many things to to improve. So context is extremely important. I can
193:20 context is extremely important. I can tell you that as codto one of our
193:22 tell you that as codto one of our technology moes uh is is around context
193:25 technology moes uh is is around context and when you connect our context engine
193:27 and when you connect our context engine we're seeing it as the number one tool
193:29 we're seeing it as the number one tool that is being used like 60% of code
193:32 that is being used like 60% of code generator or code review tools 60% of
193:35 generator or code review tools 60% of their calls to an MCP would be to a
193:37 their calls to an MCP would be to a context MCP. Okay. And just to tell you
193:41 context MCP. Okay. And just to tell you the context doesn't necessarily need to
193:43 the context doesn't necessarily need to include only your code. It could also
193:45 include only your code. It could also include context to your standards, your
193:47 include context to your standards, your best practices. We're seeing in our AI
193:49 best practices. We're seeing in our AI code review that 8% of the context usage
193:52 code review that 8% of the context usage is actually from files that are related
193:54 is actually from files that are related to standards and and best practices etc.
193:57 to standards and and best practices etc. Okay, I have to CEO of Kodo like
194:00 Okay, I have to CEO of Kodo like marketing will be mad on me if I don't
194:01 marketing will be mad on me if I don't brag a little bit. Right? So this is uh
194:04 brag a little bit. Right? So this is uh kind of like our architecture of our
194:06 kind of like our architecture of our context engine being presented by Jensen
194:08 context engine being presented by Jensen on GTC keynote. And he notice he didn't
194:11 on GTC keynote. And he notice he didn't talk about our qu co code review
194:13 talk about our qu co code review capabilities about our testing
194:14 capabilities about our testing capabilities. He talked about our
194:16 capabilities. He talked about our context engine that Nvidia checked
194:18 context engine that Nvidia checked because there's a realization that AI
194:21 because there's a realization that AI quality AI generated whatever review
194:23 quality AI generated whatever review testing will come from bringing the
194:25 testing will come from bringing the right context. So invest in that you
194:28 right context. So invest in that you need to build your context. Buy a
194:30 need to build your context. Buy a solution and invest in it. Build your
194:32 solution and invest in it. Build your solution uh etc. And the context needs
194:35 solution uh etc. And the context needs to include code uh uh versioning PR
194:39 to include code uh uh versioning PR history uh organization logs etc. That's
194:42 history uh organization logs etc. That's where all the context sits. It's not
194:44 where all the context sits. It's not just in the last branch of your
194:46 just in the last branch of your codebase. Okay. So I'm I'm zooming out
194:49 codebase. Okay. So I'm I'm zooming out starting to talk about like
194:50 starting to talk about like recommendations and uh and like uh
194:54 recommendations and uh and like uh takeaways. So what what what's next? So
194:57 takeaways. So what what what's next? So automated uh quality gateways invest in
195:00 automated uh quality gateways invest in that. People talked throughout the
195:02 that. People talked throughout the morning about parallel agents. You know
195:04 morning about parallel agents. You know what I'm talking about? Like background
195:05 what I'm talking about? Like background agents. You can use a lot of those like
195:09 agents. You can use a lot of those like tools and capabilities to build build
195:10 tools and capabilities to build build your quality gates. Uh use intelligent
195:13 your quality gates. Uh use intelligent code review testing and you need a li
195:17 code review testing and you need a li living and breathing like documentation
195:20 living and breathing like documentation and and what documentation means is is a
195:22 and and what documentation means is is a story by itself. Uh I'm not going to
195:24 story by itself. Uh I'm not going to double click on it. And and this is how
195:27 double click on it. And and this is how I present for three years now and I
195:30 I present for three years now and I think I'm going to go all the way until
195:33 think I'm going to go all the way until age of 60 with this slide of how I think
195:36 age of 60 with this slide of how I think the future of software development looks
195:38 the future of software development looks like. Okay. So basically you have your
195:41 like. Okay. So basically you have your specification and you have your code
195:44 specification and you have your code right and you have multiple agents
195:46 right and you have multiple agents parallel agents that are helping you to
195:49 parallel agents that are helping you to improve your spec write your spec
195:50 improve your spec write your spec improve your code transfer transfer from
195:53 improve your code transfer transfer from your spec to your to your code uh make
195:57 your spec to your to your code uh make tests which are executable specs right
196:00 tests which are executable specs right uh and and then you're going to have
196:01 uh and and then you're going to have your context engine the software
196:02 your context engine the software development database and you will build
196:05 development database and you will build your tools especially MCPs around
196:07 your tools especially MCPs around quality and verification and you will
196:10 quality and verification and you will Make sure you have environments, stable,
196:13 Make sure you have environments, stable, secured sandboxes where those agents can
196:16 secured sandboxes where those agents can run and and run validation and quality
196:18 run and and run validation and quality uh workflows. So don't don't forget like
196:21 uh workflows. So don't don't forget like the path forward is quality is your
196:24 the path forward is quality is your competitive edge over your uh
196:27 competitive edge over your uh competition. AI is a tool. It's not it's
196:30 competition. AI is a tool. It's not it's not a solution. Okay? And don't like
196:33 not a solution. Okay? And don't like only think about code generation as the
196:35 only think about code generation as the only thing. Look on the entire SDLC or
196:38 only thing. Look on the entire SDLC or product development life cycle. I saw
196:40 product development life cycle. I saw one of the uh people talked um speakers
196:44 one of the uh people talked um speakers and it iterate with everything we talked
196:46 and it iterate with everything we talked about today. I have uh I want to tell
196:48 about today. I have uh I want to tell you that you will gain value from it.
196:51 you that you will gain value from it. We're seeing in the reports people
196:53 We're seeing in the reports people seeing like security availability being
196:55 seeing like security availability being reduced faster code review you we just
196:58 reduced faster code review you we just got a hit on that because of AI
197:00 got a hit on that because of AI generated code and test coverage in a
197:02 generated code and test coverage in a month can can triple depends on on the
197:05 month can can triple depends on on the project etc. with with the last minute I
197:08 project etc. with with the last minute I want to show you like a really small
197:09 want to show you like a really small piece of what you can do with codo. uh
197:12 piece of what you can do with codo. uh you can go into codto and define your
197:14 you can go into codto and define your own rule for example almost the same
197:17 own rule for example almost the same rule you'll put on cursor of I don't
197:19 rule you'll put on cursor of I don't like nested ifs if this is a problem
197:21 like nested ifs if this is a problem that you have but then codto will look
197:24 that you have but then codto will look on your context build the good example
197:26 on your context build the good example the bad example and then start giving
197:29 the bad example and then start giving like building a workflow that is
197:31 like building a workflow that is specifically to catch that issue and
197:35 specifically to catch that issue and give you statistics over time when it's
197:38 give you statistics over time when it's being accepted and when not so you can
197:40 being accepted and when not so you can adjust that rule and really know and
197:42 adjust that rule and really know and have visibility to to your standards.
197:45 have visibility to to your standards. Okay. So when a PR is written with a few
197:48 Okay. So when a PR is written with a few ifs and else although it was written
197:50 ifs and else although it was written with cursor copilot that had a rule do
197:53 with cursor copilot that had a rule do not do nested ifs etc. then eventually
197:56 not do nested ifs etc. then eventually when you open a PR you will get uh codo
198:00 when you open a PR you will get uh codo uh uh catching that and giving a
198:02 uh uh catching that and giving a suggestion according to the good and the
198:04 suggestion according to the good and the bad example. COD will also make a graph,
198:06 bad example. COD will also make a graph, give you a CLI checks like check each
198:09 give you a CLI checks like check each one of the rules and eventually tell you
198:11 one of the rules and eventually tell you the nested if and then we'll record and
198:15 the nested if and then we'll record and learn what you did or did not do with
198:17 learn what you did or did not do with that suggestion in order to adapt the
198:19 that suggestion in order to adapt the standard and of the of the quality. Um
198:22 standard and of the of the quality. Um there will also automated like
198:24 there will also automated like suggestion. You don't need to write your
198:25 suggestion. You don't need to write your own. It learns your your your standards
198:28 own. It learns your your your standards and quality and offer that to you. And
198:30 and quality and offer that to you. And that's it. I'm I'm really really excited
198:33 that's it. I'm I'm really really excited about like breaking the glass ceiling,
198:35 about like breaking the glass ceiling, okay, with what we did with code
198:37 okay, with what we did with code generation and then a jet to code
198:39 generation and then a jet to code generation. Now we're turning into the
198:42 generation. Now we're turning into the era of putting AI into work and through
198:45 era of putting AI into work and through the entire SDLC. The most important part
198:48 the entire SDLC. The most important part is related to quality. You would need to
198:50 is related to quality. You would need to invest in that. It's not out of the box.
198:52 invest in that. It's not out of the box. Okay. And then you would see eventually
198:55 Okay. And then you would see eventually the promised 2x that that that probably
198:59 the promised 2x that that that probably promised to the CEO or something like
199:01 promised to the CEO or something like that once they give you the budget for
199:02 that once they give you the budget for for the relevant tools. Thank you so
199:04 for the relevant tools. Thank you so much. [applause]
199:07 much. [applause] [music]
199:16 Our next speaker is introducing Miniaax's latest model and how it powers
199:19 Miniaax's latest model and how it powers nextG experiences for code generation.
199:22 nextG experiences for code generation. Please welcome to the stage senior
199:24 Please welcome to the stage senior researcher at Miniaax, Olive Song.
199:30 researcher at Miniaax, Olive Song. [music]
199:42 Hi. Hi everyone. Um, I'm Olive. It's my great honor here today to present on our
199:44 great honor here today to present on our new model Mini Max M2. Um, I actually
199:47 new model Mini Max M2. Um, I actually lived in New York City for six years, so
199:49 lived in New York City for six years, so it feels great to come back. Um, but
199:51 it feels great to come back. Um, but with a different role. Um, I currently
199:53 with a different role. Um, I currently study reinforcement learning and model
199:55 study reinforcement learning and model evaluation at Miniax. Um, let me just
199:59 evaluation at Miniax. Um, let me just get a quick sense of the room. Who here
200:01 get a quick sense of the room. Who here has heard or have tried of Miniax
200:03 has heard or have tried of Miniax before? Oh, a couple of there. Yeah, not
200:08 before? Oh, a couple of there. Yeah, not everybody, but I guess yeah, but here's
200:10 everybody, but I guess yeah, but here's the value, right, of me standing here
200:12 the value, right, of me standing here today. Um so we are a global company
200:17 today. Um so we are a global company that works on both foundation models and
200:19 that works on both foundation models and applications. We develop multi modality
200:22 applications. We develop multi modality models including text um vision language
200:26 models including text um vision language models our video generation model hyo
200:28 models our video generation model hyo and speech generation music generation
200:31 and speech generation music generation stuff and we also have um many
200:33 stuff and we also have um many applications including agents and stuff
200:36 applications including agents and stuff um inhouse. So that that's the specific
200:40 um inhouse. So that that's the specific thing that's different from the other
200:41 thing that's different from the other labs for other companies. So we both
200:44 labs for other companies. So we both develop foundation models um and
200:47 develop foundation models um and applications. So we have research and
200:49 applications. So we have research and developers sitting uh sitting side by
200:52 developers sitting uh sitting side by side working on things. Um so our
200:55 side working on things. Um so our difference would be that we have
200:57 difference would be that we have firsthand experience from our um
201:01 firsthand experience from our um in-house developers into developing
201:03 in-house developers into developing models that developers would really need
201:07 models that developers would really need in the community. And here I want to
201:09 in the community. And here I want to introduce our Miniax M2 um which is an
201:13 introduce our Miniax M2 um which is an openweight model very small with only 10
201:16 openweight model very small with only 10 billion active parameters um that was
201:19 billion active parameters um that was designed specifically for coding
201:21 designed specifically for coding workplace agentic tasks. It's very
201:24 workplace agentic tasks. It's very costefficient.
201:27 costefficient. Um let me just go over the benchmark
201:30 Um let me just go over the benchmark performance because people care about
201:31 performance because people care about it. So uh we rank very top in both um
201:36 it. So uh we rank very top in both um intelligence benchmarks and also agent
201:39 intelligence benchmarks and also agent benchmarks. Uh we I think we're on the
201:42 benchmarks. Uh we I think we're on the top of the open source models. But then
201:45 top of the open source models. But then numbers don't tell everything because
201:47 numbers don't tell everything because sometimes you get those super high
201:50 sometimes you get those super high number models you plug into them um into
201:53 number models you plug into them um into your environment and they suck, right?
201:55 your environment and they suck, right? So we really care about the dynamics in
201:59 So we really care about the dynamics in the community and in our first week we
202:01 the community and in our first week we had the most downloads
202:04 had the most downloads and also we climbed up to top three
202:06 and also we climbed up to top three token usage on open router. So we're
202:09 token usage on open router. So we're very glad that people in the community
202:11 very glad that people in the community are really loving our model um into
202:13 are really loving our model um into their development cycle.
202:16 their development cycle. So today what I want to share is how we
202:19 So today what I want to share is how we actually shape these men model
202:22 actually shape these men model characteristics that made M2 so good in
202:25 characteristics that made M2 so good in your coding experience. And I'm going to
202:28 your coding experience. And I'm going to present to you um the training be behind
202:31 present to you um the training be behind it that supports each one of them from
202:34 it that supports each one of them from coding experience to long horizon state
202:37 coding experience to long horizon state tracking tasks um to robust
202:40 tracking tasks um to robust generalization to different scaffolds to
202:42 generalization to different scaffolds to multi- aent uh scalability.
202:46 multi- aent uh scalability. So first let's talk about code
202:48 So first let's talk about code experience which we sc uh which we
202:50 experience which we sc uh which we supported with um scaled environments
202:53 supported with um scaled environments and scaled experts.
202:56 and scaled experts. So um developers need a model that can
203:00 So um developers need a model that can actually work in the language they use
203:02 actually work in the language they use and across the workflow that they deal
203:04 and across the workflow that they deal with every day. So which means that we
203:07 with every day. So which means that we need to utilize the real data from from
203:09 need to utilize the real data from from the internet and then um scale the
203:12 the internet and then um scale the number of environments so that the model
203:15 number of environments so that the model when during training for example during
203:16 when during training for example during reinforcement learning it can actually
203:19 reinforcement learning it can actually um reacts to the uh environment. it can
203:22 um reacts to the uh environment. it can actually target verifiable coding goals
203:25 actually target verifiable coding goals and to learn from it. So that's why we
203:27 and to learn from it. So that's why we scaled both the number uh of
203:30 scaled both the number uh of environments and also our um
203:33 environments and also our um infrastructure so that we can perform
203:35 infrastructure so that we can perform those training very efficiently.
203:38 those training very efficiently. So um with data construction and
203:41 So um with data construction and reinforcement learning we were able to
203:43 reinforcement learning we were able to train the model so that it's very strong
203:46 train the model so that it's very strong um it's full stack multilingual
203:50 um it's full stack multilingual and what I want to mention here is that
203:52 and what I want to mention here is that besides scaling environment that
203:54 besides scaling environment that everybody talks about we actually scale
203:56 everybody talks about we actually scale something called expert developers um as
203:59 something called expert developers um as reward models. So as I mentioned before
204:02 reward models. So as I mentioned before uh we have a ton of um super expert
204:06 uh we have a ton of um super expert developers in house that could give us
204:08 developers in house that could give us feedback to our model's performance. So
204:11 feedback to our model's performance. So they participated closely into the model
204:14 they participated closely into the model development and training cycle including
204:16 development and training cycle including problem definition for example um bugs
204:20 problem definition for example um bugs bug fixing for example um repo
204:23 bug fixing for example um repo refactoring and stuff like that. And
204:25 refactoring and stuff like that. And also they identify the model behaviors
204:27 also they identify the model behaviors that developers enjoy and they identify
204:31 that developers enjoy and they identify what's reliable and uh what developers
204:33 what's reliable and uh what developers would trust
204:35 would trust and they give precise reward and
204:37 and they give precise reward and evaluation to the model's behaviors to
204:40 evaluation to the model's behaviors to the final um deliverables so that um it
204:43 the final um deliverables so that um it is a model that developers really want
204:45 is a model that developers really want to work with and that can adds
204:47 to work with and that can adds efficiency to the developers.
204:54 So with that we were able to lead in many um languages in real use.
204:58 many um languages in real use. And the second characteristic that
204:59 And the second characteristic that Miniax M2 has is it it performs good in
205:04 Miniax M2 has is it it performs good in those long horizon tasks. Uh those long
205:07 those long horizon tasks. Uh those long tasks that require interacting with
205:10 tasks that require interacting with complex environments that requiring um
205:12 complex environments that requiring um using multiple tools with reasoning.
205:16 using multiple tools with reasoning. And we supported that with the interled
205:19 And we supported that with the interled thinking pattern um and reinforcement
205:21 thinking pattern um and reinforcement learning.
205:24 learning. So what is interled thinking? Um so with
205:28 So what is interled thinking? Um so with a normal reasoning model that can use
205:31 a normal reasoning model that can use tools, it it normally works like this.
205:33 tools, it it normally works like this. You have the tools information given to
205:35 You have the tools information given to it. You have the system prompts. Um you
205:38 it. You have the system prompts. Um you have user prompts and then the model
205:40 have user prompts and then the model would think and then it calls tools. It
205:43 would think and then it calls tools. It can be a couple of tools at the same
205:45 can be a couple of tools at the same time. And then they get the tool
205:48 time. And then they get the tool response from the environment and then
205:50 response from the environment and then it performs a final thinking and deliver
205:52 it performs a final thinking and deliver a final content. But but here's the
205:55 a final content. But but here's the truth, right? In real world, the
205:58 truth, right? In real world, the environments are often noisy and
206:00 environments are often noisy and dynamic. You can't really perform this
206:03 dynamic. You can't really perform this one test just by once. You can get um
206:06 one test just by once. You can get um tool errors for example. You can get um
206:09 tool errors for example. You can get um unexpected results from the environment
206:12 unexpected results from the environment and stuff like that. So um what we did
206:15 and stuff like that. So um what we did is that we imagine how humans interact
206:17 is that we imagine how humans interact with the world. We we we look at
206:20 with the world. We we we look at something we get feedbacks and then we
206:22 something we get feedbacks and then we think about it. We think if the feedback
206:24 think about it. We think if the feedback is good or not and then we make other
206:26 is good or not and then we make other actions, make other decisions. And
206:29 actions, make other decisions. And that's why we did the same thing with
206:31 that's why we did the same thing with our M2 model. So if we look at this um
206:34 our M2 model. So if we look at this um chart over a diagram on the right. So
206:38 chart over a diagram on the right. So instead of just stopping um after one
206:42 instead of just stopping um after one round of tool calling, it actually
206:44 round of tool calling, it actually thinks again and reacts to the uh reacts
206:48 thinks again and reacts to the uh reacts to the environments to see if the
206:51 to the environments to see if the information is enough for it to uh get
206:54 information is enough for it to uh get what it wants. So basically we call the
206:57 what it wants. So basically we call the interle thinking or people call it
206:59 interle thinking or people call it interle thinking because it interle
207:01 interle thinking because it interle thinking with tool calling. um a couple
207:04 thinking with tool calling. um a couple of time it can be you know uh tens to a
207:07 of time it can be you know uh tens to a hundred um turns of tool calling within
207:10 hundred um turns of tool calling within just one user interaction term
207:15 just one user interaction term so it helps um adaptation to environment
207:18 so it helps um adaptation to environment noise for example uh just like what I
207:21 noise for example uh just like what I mentioned the environment is it's it's
207:24 mentioned the environment is it's it's not stable all the time and then
207:26 not stable all the time and then something is suboptimal and then it can
207:28 something is suboptimal and then it can choose to use other tools or do other
207:30 choose to use other tools or do other decisions it can focus on long horizon
207:33 decisions it can focus on long horizon has um can automate your workflow um
207:37 has um can automate your workflow um using for example Gmails, notions um
207:40 using for example Gmails, notions um terminal all at the same time. You just
207:42 terminal all at the same time. You just need to maybe make one model call
207:45 need to maybe make one model call without minim with minimal um human
207:48 without minim with minimal um human intervention. It can do it all by
207:50 intervention. It can do it all by itself. And here's a cool illustration
207:53 itself. And here's a cool illustration on the right because it's New York City.
207:55 on the right because it's New York City. I feel the vibe of you know trading and
207:57 I feel the vibe of you know trading and marketing. Um so you can see that there
208:01 marketing. Um so you can see that there was some um there was some pertabbations
208:04 was some um there was some pertabbations in the stock market uh I think last week
208:07 in the stock market uh I think last week and then our model was able to keep it
208:10 and then our model was able to keep it stable. So just like I said there's like
208:12 stable. So just like I said there's like environment noise there's no new
208:14 environment noise there's no new information there's like yeah news it
208:18 information there's like yeah news it looks like there there's like other
208:19 looks like there there's like other trading policies and stuff like that but
208:21 trading policies and stuff like that but our model was able to uh to perform
208:25 our model was able to uh to perform pretty stably in these kind of
208:27 pretty stably in these kind of environments.
208:29 environments. And the third characteristic is our
208:32 And the third characteristic is our robust um generalization to many agent
208:35 robust um generalization to many agent scaffolds which was supported by our
208:38 scaffolds which was supported by our perturbations in the data pipeline.
208:42 perturbations in the data pipeline. So we want our agent to generalize. But
208:45 So we want our agent to generalize. But what is agent generalization?
208:47 what is agent generalization? At first we thought it was just tool
208:49 At first we thought it was just tool scaling. We train the model with enough
208:52 scaling. We train the model with enough tools, various tools kind of new tools.
208:54 tools, various tools kind of new tools. we invent tools um and then it will just
208:57 we invent tools um and then it will just perform good on unseen tools. Well, that
209:00 perform good on unseen tools. Well, that was kind of the truth. It worked at
209:02 was kind of the truth. It worked at first. Uh but then we soon realized that
209:05 first. Uh but then we soon realized that if we perturb the environment a little
209:08 if we perturb the environment a little bit, for example, we change another
209:09 bit, for example, we change another agent scaffold, then it doesn't
209:11 agent scaffold, then it doesn't generalize. So what is agent
209:14 generalize. So what is agent generalization?
209:16 generalization? Well, we conclude that um it's
209:18 Well, we conclude that um it's adaptation to perturbations across the
209:20 adaptation to perturbations across the model's entire uh operational space.
209:24 model's entire uh operational space. If we uh think back what's the model's
209:27 If we uh think back what's the model's um operational space that we talked
209:30 um operational space that we talked about it can be tool information it can
209:34 about it can be tool information it can be system prompts it can be user prompts
209:36 be system prompts it can be user prompts they can all all be different they can
209:39 they can all all be different they can be the chat template they can be the
209:41 be the chat template they can be the environment they can be the tool
209:43 environment they can be the tool response. So what we did is that we
209:45 response. So what we did is that we designed and maintain perturbation
209:47 designed and maintain perturbation pipelines of our data so that um our
209:50 pipelines of our data so that um our model can actually gen generalized to a
209:53 model can actually gen generalized to a lot of agent scaffolds.
209:57 lot of agent scaffolds. And the fourth characteristic that I
209:59 And the fourth characteristic that I want to mention is the multi- aent
210:01 want to mention is the multi- aent scalability
210:03 scalability um which is very possible with M2
210:06 um which is very possible with M2 because it's very small and cost
210:08 because it's very small and cost effective.
210:14 I have a couple of videos here. Um, this is M2 powered by our own Miniax agent uh
210:19 is M2 powered by our own Miniax agent uh app. We actually have the QR code
210:21 app. We actually have the QR code downside. So, if you want it, you can
210:23 downside. So, if you want it, you can just scan and try it. So, it's like an
210:25 just scan and try it. So, it's like an agent app we we developed. And here we
210:29 agent app we we developed. And here we can see different copies of M2, right?
210:31 can see different copies of M2, right? It can do research. um it can write the
210:36 It can do research. um it can write the write the research results and analyze
210:37 write the research results and analyze it and put it in a re report. It can put
210:40 it and put it in a re report. It can put it in some kind of front-end
210:43 it in some kind of front-end illustration and they can work in
210:45 illustration and they can work in parallel. So because it is so small um
210:48 parallel. So because it is so small um and so cost effective, it can really um
210:51 and so cost effective, it can really um support those long run agentic tasks and
210:54 support those long run agentic tasks and tasks that maybe um require some kind of
210:57 tasks that maybe um require some kind of parallelism.
211:04 So what's next right for Miniax M2 from what I've introduced we gathered
211:06 what I've introduced we gathered environments um algorithms data expert
211:10 environments um algorithms data expert values model architecture inference
211:12 values model architecture inference evaluation all these stuff to build a
211:15 evaluation all these stuff to build a model um that was you know fast that was
211:19 model um that was you know fast that was uh intelligent that could use tools that
211:22 uh intelligent that could use tools that generalizes
211:23 generalizes what's next
211:26 what's next for um M2.1 1 and M3 were in the future.
211:31 for um M2.1 1 and M3 were in the future. We thinks of better coding, maybe memory
211:33 We thinks of better coding, maybe memory work, context management, proactive AI
211:37 work, context management, proactive AI for workplace, vertical experts, and
211:40 for workplace, vertical experts, and because we have those great audio
211:43 because we have those great audio generation, video generation models,
211:46 generation, video generation models, maybe we can integrate them. But all our
211:48 maybe we can integrate them. But all our mission is that we're committed to bring
211:51 mission is that we're committed to bring all these resources, whatever is on the
211:53 all these resources, whatever is on the screen and maybe more. Yeah. and values
211:56 screen and maybe more. Yeah. and values and put them all together to develop
211:59 and put them all together to develop models for uh the community to use. So
212:04 models for uh the community to use. So um we really need feedback from the
212:06 um we really need feedback from the community if possible because we want to
212:08 community if possible because we want to build this together and you know this is
212:10 build this together and you know this is kind of a race that everyone needs to
212:13 kind of a race that everyone needs to participate and then um we com we are
212:18 participate and then um we com we are committed to share it with the
212:19 committed to share it with the community. Yeah.
212:22 community. Yeah. And that's all the insights for today.
212:25 And that's all the insights for today. Um, we really hope again we really hope
212:28 Um, we really hope again we really hope you to try the model because it's pretty
212:31 you to try the model because it's pretty good. And then we can contact contact us
212:33 good. And then we can contact contact us up there. You can try the models by
212:36 up there. You can try the models by scanning the QR code. Yeah, basically
212:38 scanning the QR code. Yeah, basically that's it. Thank you all for listening.
212:40 that's it. Thank you all for listening. [applause]
212:52 Ladies [music] and gentlemen, please welcome back to the stage, Alex
212:54 welcome back to the stage, Alex Lieberman.
212:56 Lieberman. Let's give it up again for Olive [music]
212:58 Let's give it up again for Olive [music] and all the other speakers from the
213:00 and all the other speakers from the morning. [applause]
213:02 morning. [applause] It is time for lunch. Very exciting. Uh,
213:06 It is time for lunch. Very exciting. Uh, one thing I want to say before we head
213:07 one thing I want to say before we head out for lunch and we're it's going to be
213:09 out for lunch and we're it's going to be downstairs in the expo. check out all
213:12 downstairs in the expo. check out all the boos, talk to people, have food is,
213:16 the boos, talk to people, have food is, you know, my own experience with going
213:17 you know, my own experience with going to conferences is even though I come up
213:21 to conferences is even though I come up talk on stage a lot, I find it very
213:23 talk on stage a lot, I find it very difficult to engage in conversation with
213:25 difficult to engage in conversation with people when there's like these little
213:26 people when there's like these little small group settings. I don't know like
213:28 small group settings. I don't know like can I go and chat with people? Can I
213:29 can I go and chat with people? Can I not? This is a kind of awkward. I give
213:32 not? This is a kind of awkward. I give you all permission to butt into
213:33 you all permission to butt into conversations, introduce yourself. Ben
213:36 conversations, introduce yourself. Ben and Swix have done an incredible job of
213:38 and Swix have done an incredible job of cultivating such a high quality
213:40 cultivating such a high quality community here. And the most value you
213:43 community here. And the most value you will get is not just from these
213:45 will get is not just from these incredible presentations. It's from
213:47 incredible presentations. It's from meeting other folks in the crowd. So
213:49 meeting other folks in the crowd. So please, you have my permission butt into
213:51 please, you have my permission butt into conversations. Introduce yourself. Share
213:53 conversations. Introduce yourself. Share what you've learned with folks. And if
213:55 what you've learned with folks. And if you need any sort of uh ice breakers to
213:57 you need any sort of uh ice breakers to get the conversation going, I have two
213:59 get the conversation going, I have two for you. One is just go into a group and
214:02 for you. One is just go into a group and share your hottest take on uh the state
214:04 share your hottest take on uh the state of AI today. It's a great way to get off
214:06 of AI today. It's a great way to get off to a good start with someone. The
214:08 to a good start with someone. The second, a little less intense, is is a
214:10 second, a little less intense, is is a hot dog a sandwich? Is cereal uh in milk
214:14 hot dog a sandwich? Is cereal uh in milk a soup? That is how you're going to
214:16 a soup? That is how you're going to start the conversations with folks.
214:18 start the conversations with folks. Everyone enjoy lunch. We'll see you back
214:19 Everyone enjoy lunch. We'll see you back in an hour and uh thanks so much for
214:21 in an hour and uh thanks so much for your time.
215:16 Heat. Heat. [music]
216:02 Heat. [music]
216:34 [music] >> Heat up here.
217:24 Heat up
217:30 here. [music]
218:16 >> Heat up here.
218:19 here. [music]
219:07 Heat. [music] Heat.
219:35 [music] Heat.
220:21 [music] here.
220:46 >> Heat [music]
220:47 [music] up here.
221:39 >> Heat up here.
221:42 up here. [music]
222:10 >> Heat. Heat. [music]
222:15 Heat. [music]
223:09 [music] >> Heat.
223:16 [music] Heat.
224:42 Heat. [music]
225:35 [music] Heat. Heat.
226:05 Heat. Heat. [music]
226:24 >> Heat. [music] Heat.
230:20 [music] Heat.
230:22 Heat. Heat [music]
233:09 Heat up
234:21 Heat up here. [music]
234:50 Heat [music] up here.
235:06 Heat. [music]
235:23 Heat. [music]
235:44 Heat. [music]
237:02 >> [music] >> Heat. Heat.
239:17 here. [music]
240:34 [music] Heat
241:04 [music] Heat
241:49 [music] Heat.
242:15 Heat [music]
242:53 >> Heat. Heat. [music]
243:01 [music] Heat. Heat.
244:48 here. [music]
244:49 [music] Heat.
245:31 Heat up here. [music]
245:55 >> Heat up here.
246:26 up here. [music]
246:52 >> [music] >> Heat. Heat.
247:27 here. Heat. Heat.
250:23 [music] >> Heat. Heat.
252:02 [music] Heat.
252:46 >> Heat up here.
253:09 >> Heat. [music]
253:11 [music] Heat.
253:33 >> up [music] here.
254:03 [music] Heat.
254:22 [music] Heat.
255:30 >> [music] >> Heat up
255:34 >> Heat up here.
256:22 here. [music]
257:06 >> [music] >> up
257:20 here. Heat. Heat.
257:57 >> Heat up
258:51 Heat. Heat. [music]
259:42 Heat up here.
260:52 [music] Heat.
261:57 [music] >> Heat up here.
262:29 >> [music] >> Heat up here.
266:26 Heat. [music]
267:04 Heat. Heat.
267:58 >> Heat. Heat.
268:04 [music] Heat. Heat.
268:51 [music] Hey. Hey.
269:08 >> Heat. Heat.
269:47 Heat. Heat. Heat. [music]
270:08 Heat. Heat. [music]
271:13 >> Heat up here.
271:15 here. [music]
271:48 [music] Heat. Heat.
272:29 [music] up
272:31 up here.
272:38 Heat. [music]
272:55 Heat. [music]
273:14 >> Heat up here. [music]
273:44 >> Heat [music]
273:52 up here. Heat. Heat.
274:18 [music] Heat. Heat.
276:05 >> Heat up [music]
276:32 [music] Heat.
276:35 Heat. [music]
277:05 >> Heat. Heat. [music]
277:32 Heat. Heat. [music]
279:27 >> up [music]
280:39 [music] Heat.
280:46 [music] Heat. Heat.
281:39 >> Heat >> [music]
281:48 [music] >> Heat.
281:54 Heat. [music]
282:16 >> [music] >> Heat up here.
282:53 Heat. [music]
283:25 >> [music] >> Heat up here.
283:41 Heat >> [music]
284:07 Heat [music]
284:18 [music] up here.
285:05 >> How's everyone How you doing? Good lunch. [music]
285:06 lunch. [music] Excited for the afternoon sessions. Out
285:08 Excited for the afternoon sessions. Out of curiosity, did anyone have the hot
285:10 of curiosity, did anyone have the hot dog conversation? Does anyone think Who
285:12 dog conversation? Does anyone think Who thinks that a hot dog's a sandwich?
285:15 thinks that a hot dog's a sandwich? We got one. We got two.
285:18 We got one. We got two. Uh, anyone think a hot dog isn't a
285:20 Uh, anyone think a hot dog isn't a sandwich? Most of the crowd. That is
285:22 sandwich? Most of the crowd. That is that is usually the consensus. Uh, one
285:24 that is usually the consensus. Uh, one other question. Who thinks that they
285:26 other question. Who thinks that they have the hottest take on the state of AI
285:29 have the hottest take on the state of AI or AI engineering right now in the room?
285:32 or AI engineering right now in the room? Anyone think they have the hottest take?
285:34 Anyone think they have the hottest take? Well, I I'll I'll give you uh a tea up
285:36 Well, I I'll I'll give you uh a tea up for later. My co-founder Arman is
285:39 for later. My co-founder Arman is speaking around four and I would say he
285:40 speaking around four and I would say he has one of the hotter takes I've seen,
285:42 has one of the hotter takes I've seen, which is he thinks all engineers should
285:44 which is he thinks all engineers should be paid like salespeople based on
285:46 be paid like salespeople based on output. That is going to attract a lot
285:48 output. That is going to attract a lot of debate and I give you full permission
285:50 of debate and I give you full permission to debate him after his talk. Well, are
285:53 to debate him after his talk. Well, are you guys ready to jump into the next
285:55 you guys ready to jump into the next group of sessions?
285:56 group of sessions? >> Let's do it. We will be diving into
285:59 >> Let's do it. We will be diving into proactive agents from Google labs,
286:01 proactive agents from Google labs, building Gen BI at a Fortune 100
286:03 building Gen BI at a Fortune 100 business, deploying AI within
286:06 business, deploying AI within Bloomberg's engineering org, lessons
286:08 Bloomberg's engineering org, lessons learned building an AI browser, and
286:11 learned building an AI browser, and developer experience in the age of AI
286:13 developer experience in the age of AI coding agents. With that, please join me
286:16 coding agents. With that, please join me in welcoming our next speaker, Kath
286:18 in welcoming our next speaker, Kath Corvec, director of product at Google
286:20 Corvec, director of product at Google Labs. Let's give it to her.
286:37 >> Hi everybody. I'm so excited to be here. I love New
286:39 I'm so excited to be here. I love New York and I love meeting everybody here.
286:42 York and I love meeting everybody here. And I am Kath Corbec. I'm from Google
286:45 And I am Kath Corbec. I'm from Google Labs and I work on this little team
286:47 Labs and I work on this little team called ADA. And I'm going to be talking
286:48 called ADA. And I'm going to be talking about some of the stuff that we've been
286:50 about some of the stuff that we've been doing on this project called Jewels. So,
286:53 doing on this project called Jewels. So, a few months ago in my household, our
286:56 a few months ago in my household, our dishwasher broke. And while it was being
286:58 dishwasher broke. And while it was being repaired, my husband decided that he was
287:00 repaired, my husband decided that he was going to do all the dishes. And so, he
287:02 going to do all the dishes. And so, he told me he was going to do this. But
287:03 told me he was going to do this. But every single night, I found myself
287:05 every single night, I found myself reminding him to do the dishes. And you
287:07 reminding him to do the dishes. And you can imagine that got old pretty fast.
287:10 can imagine that got old pretty fast. And I realized that even though I wasn't
287:12 And I realized that even though I wasn't physically washing the dishes, I was
287:14 physically washing the dishes, I was still carrying this mental load. I know
287:16 still carrying this mental load. I know a lot of you can probably relate to
287:18 a lot of you can probably relate to this. I was keeping track of whether or
287:20 this. I was keeping track of whether or not that task was done, following up,
287:23 not that task was done, following up, making sure that things kept moving. And
287:25 making sure that things kept moving. And I realized in that moment that that's
287:27 I realized in that moment that that's exactly where we are with asynchronous
287:29 exactly where we are with asynchronous agents today. They can handle some of
287:31 agents today. They can handle some of the work, but we're still the ones as
287:33 the work, but we're still the ones as developers carrying that mental load and
287:35 developers carrying that mental load and monitoring them. So here's the truth.
287:39 monitoring them. So here's the truth. Humans, we are serial processors, not
287:42 Humans, we are serial processors, not parallel ones. We can juggle multiple
287:45 parallel ones. We can juggle multiple goals, but we execute them in sequence,
287:48 goals, but we execute them in sequence, not all at once. When you manually kick
287:50 not all at once. When you manually kick off a task in jewels, you're usually
287:52 off a task in jewels, you're usually waiting to be able to move on. And it's
287:56 waiting to be able to move on. And it's that pause, it's that gap in attention
287:58 that pause, it's that gap in attention where we really lose momentum. And this
288:00 where we really lose momentum. And this is actually backed up by science where
288:03 is actually backed up by science where uh humans actually think we think we're
288:05 uh humans actually think we think we're multitaskers, but we're actually
288:07 multitaskers, but we're actually executing many tasks very rapidly. But
288:11 executing many tasks very rapidly. But switching between these tasks comes with
288:13 switching between these tasks comes with a huge cost. It can cost up to 40% of
288:16 a huge cost. It can cost up to 40% of your productive time. So that's like
288:18 your productive time. So that's like half a day lost to switching contexts
288:22 half a day lost to switching contexts and reloading. So if humans are uniters,
288:27 and reloading. So if humans are uniters, what's the solution here with agents? So
288:30 what's the solution here with agents? So for async agents, in order in order for
288:32 for async agents, in order in order for them to succeed, developers can't be
288:34 them to succeed, developers can't be expected to babysit them.
288:37 expected to babysit them. We've all seen that post on Twitter of
288:40 We've all seen that post on Twitter of 16 different cla tasks running in
288:42 16 different cla tasks running in parallel on 16 different terminals on
288:45 parallel on 16 different terminals on three different huge browsers or huge
288:47 three different huge browsers or huge monitors. And when I first saw this, I
288:49 monitors. And when I first saw this, I thought, god forbid that is the DevX of
288:52 thought, god forbid that is the DevX of the future. I want to I don't want to
288:55 the future. I want to I don't want to manage work. I don't want to manage my
288:56 manage work. I don't want to manage my agents. I want to be a coder. I want to
288:58 agents. I want to be a coder. I want to build. And so we need to think we need
289:02 build. And so we need to think we need uh uh collaborators in our system that
289:04 uh uh collaborators in our system that we can trust. agents that really
289:06 we can trust. agents that really understand context, can anticipate our
289:09 understand context, can anticipate our needs, and they know really when to step
289:12 needs, and they know really when to step in. And then uh I think finally, we're
289:15 in. And then uh I think finally, we're reaching that point with models where
289:17 reaching that point with models where they're getting better and better at
289:18 they're getting better and better at executing end to end as long as they
289:22 executing end to end as long as they understand what our goals are clearly.
289:24 understand what our goals are clearly. And that's where trust really becomes
289:26 And that's where trust really becomes this unlock where you can trust the
289:28 this unlock where you can trust the system to know what's missing, to fill
289:30 system to know what's missing, to fill in the gaps, and to really keep progress
289:33 in the gaps, and to really keep progress moving forward while you manage on
289:35 moving forward while you manage on something else where where while you
289:36 something else where where while you focus on what matters most. And
289:39 focus on what matters most. And essentially, we want jewels to do the
289:40 essentially, we want jewels to do the dishes without being asked.
289:43 dishes without being asked. So most AI developer tools today are
289:45 So most AI developer tools today are fundamentally reactive. you open up your
289:47 fundamentally reactive. you open up your CLI or your ID and you ask the agent to
289:50 CLI or your ID and you ask the agent to do something and it responds or it waits
289:52 do something and it responds or it waits for you to start typing and then it
289:53 for you to start typing and then it autocompletes a suggestion. And there's
289:55 autocompletes a suggestion. And there's a benefit to this model. It's very
289:57 a benefit to this model. It's very efficient. It only uses compute when you
290:00 efficient. It only uses compute when you explicitly ask for it. But the real
290:02 explicitly ask for it. But the real question I'm asking myself is, is this
290:04 question I'm asking myself is, is this how I want to manage AI? And if you
290:06 how I want to manage AI? And if you think about in the future, imagine a
290:08 think about in the future, imagine a world where compute is not a limiting
290:11 world where compute is not a limiting factor anymore. Instead of a single
290:14 factor anymore. Instead of a single reactive assistant for instructions, you
290:16 reactive assistant for instructions, you could have dozens of small proactive
290:18 could have dozens of small proactive agents working with you in parallel,
290:21 agents working with you in parallel, quietly looking for patterns, noticing
290:23 quietly looking for patterns, noticing friction, and taking on the boring tasks
290:26 friction, and taking on the boring tasks that you don't want to do before you
290:28 that you don't want to do before you even ask. It can do things like fixing
290:31 even ask. It can do things like fixing authentication bugs that you've been
290:33 authentication bugs that you've been avoiding. uh updating configs, flagging
290:36 avoiding. uh updating configs, flagging potential order uh errors, preparing uh
290:40 potential order uh errors, preparing uh migrations and all of this can happen in
290:42 migrations and all of this can happen in the background triggered off of things
290:44 the background triggered off of things in my natural workflow. So I really
290:46 in my natural workflow. So I really think there are four essential
290:48 think there are four essential ingredients that make up proactive
290:49 ingredients that make up proactive systems today. There's observation. The
290:52 systems today. There's observation. The agent has to really continually
290:53 agent has to really continually understand what is happening and of what
290:56 understand what is happening and of what your code changes are, what your
290:58 your code changes are, what your patterns are, what your workflow is,
291:00 patterns are, what your workflow is, etc. to get context about your entire
291:01 etc. to get context about your entire project. And then there's
291:03 project. And then there's personalization. And this one's
291:04 personalization. And this one's difficult. It has to learn how you work,
291:06 difficult. It has to learn how you work, what you care about, what you tend to
291:08 what you care about, what you tend to ignore, what your preferences are, the
291:10 ignore, what your preferences are, the code that you absolutely don't want to
291:11 code that you absolutely don't want to ever touch. And then it has to be timely
291:14 ever touch. And then it has to be timely as well. If it comes in too soon, it's
291:16 as well. If it comes in too soon, it's going to interrupt you. And if it's too
291:17 going to interrupt you. And if it's too late, then the moment is lost. And it
291:19 late, then the moment is lost. And it also has to work seamlessly across your
291:22 also has to work seamlessly across your workflow. It has to insert itself into
291:24 workflow. It has to insert itself into spaces where you naturally work already
291:26 spaces where you naturally work already in your terminal, in your repository, in
291:28 in your terminal, in your repository, in your IDE, not forcing you to go
291:31 your IDE, not forcing you to go somewhere else to some application
291:33 somewhere else to some application that's secret or that you forgot about.
291:35 that's secret or that you forgot about. So bringing all these tools together,
291:36 So bringing all these tools together, you can imagine, is not trivial.
291:46 >> So is running this presentation. Um, and uh, you you want to be able to ask your
291:48 uh, you you want to be able to ask your agent to understand your workflow and
291:51 agent to understand your workflow and anticipate your needs and then intervene
291:53 anticipate your needs and then intervene at exactly the right moment without
291:55 at exactly the right moment without breaking your workflow.
291:57 breaking your workflow. And that's when it really starts to feel
291:59 And that's when it really starts to feel like magic. The interesting thing is pro
292:02 like magic. The interesting thing is pro these proactive systems, they're all
292:03 these proactive systems, they're all around us today. One of my favorite
292:05 around us today. One of my favorite examples is Google Nest where you put it
292:07 examples is Google Nest where you put it in your house, you install it, and then
292:09 in your house, you install it, and then you configure it and then it starts to
292:12 you configure it and then it starts to learn your habits as you leave the
292:14 learn your habits as you leave the house, as you come back, uh, as you go
292:16 house, as you come back, uh, as you go to sleep, as you wake up in the morning.
292:18 to sleep, as you wake up in the morning. And then pretty soon, you don't have to
292:20 And then pretty soon, you don't have to think about climate control in your
292:21 think about climate control in your house anymore because it's learned what
292:23 house anymore because it's learned what your habits are. Another one is your own
292:25 your habits are. Another one is your own body. your heart rate elevates as you go
292:27 body. your heart rate elevates as you go for a run or start to work out or it
292:30 for a run or start to work out or it anticipates that you're about to fall
292:32 anticipates that you're about to fall and so it reacts before you consciously
292:34 and so it reacts before you consciously think I'm going to put my hand out. So
292:37 think I'm going to put my hand out. So when you look at it like that
292:38 when you look at it like that proactivity is actually not that
292:40 proactivity is actually not that proactivity for AI is actually not that
292:42 proactivity for AI is actually not that futuristic. It's very familiar and it is
292:45 futuristic. It's very familiar and it is very human and that's exactly the point.
292:48 very human and that's exactly the point. What we're building is tools that behave
292:50 What we're building is tools that behave more like a good collaborator and less
292:53 more like a good collaborator and less like command line utilities. So we're
292:56 like command line utilities. So we're already doing this in this tool called
292:58 already doing this in this tool called jewels which is this uh proactive
293:00 jewels which is this uh proactive asynchronous autonomous coding agent
293:02 asynchronous autonomous coding agent from Google labs. And we're doing this
293:05 from Google labs. And we're doing this in kind of three levels of of uh
293:07 in kind of three levels of of uh proactivity. Level one is where a
293:10 proactivity. Level one is where a collaboration really starts to emerge.
293:12 collaboration really starts to emerge. And this is how Jules works today where
293:15 And this is how Jules works today where it can detect things like missing tests,
293:17 it can detect things like missing tests, unused dependencies, unsafe patterns,
293:19 unused dependencies, unsafe patterns, and then it starts to automatically fix
293:21 and then it starts to automatically fix those things as it's doing other other
293:23 those things as it's doing other other tasks that you've asked it to do. This
293:25 tasks that you've asked it to do. This is sort of like this attentive sue chef
293:27 is sort of like this attentive sue chef in your workflow where it's keeping the
293:29 in your workflow where it's keeping the kitchen clean, the knives sharp, the
293:31 kitchen clean, the knives sharp, the kitchen uh stocked so that you can focus
293:33 kitchen uh stocked so that you can focus on what comes next. And that's the
293:35 on what comes next. And that's the beginning of proactive software. At
293:38 beginning of proactive software. At level two, the agent becomes more
293:40 level two, the agent becomes more contextually aware of the entire
293:42 contextually aware of the entire project. It observes how you work, the
293:44 project. It observes how you work, the code you write. If you're a back-end
293:46 code you write. If you're a back-end engineer, maybe you need help with
293:47 engineer, maybe you need help with React. If you're a designer, maybe it
293:49 React. If you're a designer, maybe it wants you to may maybe it'll help uh uh
293:52 wants you to may maybe it'll help uh uh write the database schema. And then it
293:54 write the database schema. And then it learns what your frameworks are and what
293:57 learns what your frameworks are and what your deployment style is, etc. And this
293:59 your deployment style is, etc. And this is the kitchen manager. This is the
294:01 is the kitchen manager. This is the person in your workflow keeping the
294:03 person in your workflow keeping the rhythm and anticipating what you need
294:04 rhythm and anticipating what you need next. And then comes level three. And
294:07 next. And then comes level three. And this is what we're working on pretty
294:09 this is what we're working on pretty hard right now going into December. And
294:11 hard right now going into December. And I'll show you a little bit of what we're
294:12 I'll show you a little bit of what we're what we're going to be shipping in
294:13 what we're going to be shipping in December in a minute. But level three is
294:15 December in a minute. But level three is where things start to converge around
294:17 where things start to converge around that context. It's where the agent
294:19 that context. It's where the agent starts to understand not just context,
294:22 starts to understand not just context, but also consequence. How these choices
294:25 but also consequence. How these choices are actually affecting the users of your
294:27 are actually affecting the users of your products, the performance, and the
294:28 products, the performance, and the outcomes. And at that level, we have
294:31 outcomes. And at that level, we have this thing jewels. We also have an agent
294:33 this thing jewels. We also have an agent called Stitch, which is a design agent.
294:35 called Stitch, which is a design agent. and another one we're building called
294:36 and another one we're building called insights which is a data agent and
294:38 insights which is a data agent and they're all coming together to build
294:39 they're all coming together to build this collective intelligence across your
294:41 this collective intelligence across your application tools can see what's
294:43 application tools can see what's breaking in the software stitch
294:45 breaking in the software stitch understands how users are interacting
294:47 understands how users are interacting with it and insights connects behaviors
294:50 with it and insights connects behaviors from real world signals like analytics
294:53 from real world signals like analytics telemetry and conversion rates and then
294:55 telemetry and conversion rates and then together they can propose improvements
294:58 together they can propose improvements across boundaries of how the system all
295:00 across boundaries of how the system all works together doing things like
295:02 works together doing things like performance fixes to improve UX and then
295:04 performance fixes to improve UX and then design changes to prevent regressions
295:07 design changes to prevent regressions and then all of that is organized based
295:09 and then all of that is organized based on live data. So the trick here is that
295:13 on live data. So the trick here is that the human stays firmly in the loop.
295:14 the human stays firmly in the loop. You're observing what the agents are
295:16 You're observing what the agents are doing. You're refining when you when
295:18 doing. You're refining when you when they when you need to intervene and then
295:20 they when you need to intervene and then you're redirecting it when it has when
295:23 you're redirecting it when it has when it has been misdirected. So level three
295:26 it has been misdirected. So level three isn't really about autonomy anymore.
295:28 isn't really about autonomy anymore. It's actually about alignment to your
295:30 It's actually about alignment to your project. a a agents and humans
295:33 project. a a agents and humans collaborating together across the full
295:35 collaborating together across the full life cycle of your project.
295:39 life cycle of your project. So right now Jules is focused on this
295:41 So right now Jules is focused on this code awareness piece that understands
295:43 code awareness piece that understands the environment, the frameworks and the
295:45 the environment, the frameworks and the project structures and we're moving
295:47 project structures and we're moving towards more of that system awareness.
295:49 towards more of that system awareness. So things that we're introducing in
295:51 So things that we're introducing in Jules now, we've added something called
295:52 Jules now, we've added something called memory which I'm sure a lot of you are
295:54 memory which I'm sure a lot of you are familiar with. It's the ability for
295:56 familiar with. It's the ability for Jules to write its own memories and you
295:59 Jules to write its own memories and you can edit them and interact with them. it
296:01 can edit them and interact with them. it can edit them and it understands that
296:02 can edit them and it understands that and builds this memory and context and
296:05 and builds this memory and context and knowledge of of your project as you work
296:07 knowledge of of your project as you work with it. We've added a critic agent
296:09 with it. We've added a critic agent which works adversarially with Jules to
296:11 which works adversarially with Jules to make sure that the code is is high
296:13 make sure that the code is is high quality but then also does a full code
296:15 quality but then also does a full code review. And then we've added
296:17 review. And then we've added verification where Jules will write a
296:18 verification where Jules will write a playwright script, take a screenshot,
296:20 playwright script, take a screenshot, and then put that back into the
296:22 and then put that back into the trajectory for you to validate. And then
296:24 trajectory for you to validate. And then we're also doing things like adding uh a
296:27 we're also doing things like adding uh a to-do bot that will look through your
296:29 to-do bot that will look through your code and look through your repository
296:32 code and look through your repository and pick up on anything that where
296:34 and pick up on anything that where you've said this is a to-do I want to
296:35 you've said this is a to-do I want to get to in the future and it will start
296:37 get to in the future and it will start to proactively work on those things with
296:39 to proactively work on those things with that context. We're also adding in
296:41 that context. We're also adding in things like best practices where Jules
296:43 things like best practices where Jules will understand best practices and start
296:45 will understand best practices and start to suggest those and also environment
296:48 to suggest those and also environment setup. We have an environment agent that
296:50 setup. We have an environment agent that we use internally for running evals and
296:53 we use internally for running evals and we're extending that externally to
296:55 we're extending that externally to better understand how environment how
296:57 better understand how environment how your environments work and and set those
296:59 your environments work and and set those up for you. And then we also are adding
297:01 up for you. And then we also are adding something called a just in time context.
297:03 something called a just in time context. It's like a jewels cheat sheet where if
297:05 It's like a jewels cheat sheet where if it's doing something very specific it
297:07 it's doing something very specific it can and gets stuck it can just
297:09 can and gets stuck it can just immediately look at that cheat sheet
297:10 immediately look at that cheat sheet instead of reaching out to you. So, this
297:13 instead of reaching out to you. So, this is all moving Jules very close to being
297:15 is all moving Jules very close to being that proactive teammate, not just this
297:17 that proactive teammate, not just this reactive assistant. Okay, so this
297:21 reactive assistant. Okay, so this morning I was talking to my team back in
297:22 morning I was talking to my team back in San Francisco and I was thinking, okay,
297:25 San Francisco and I was thinking, okay, I'm going to do a live demo, but the
297:27 I'm going to do a live demo, but the live demo gods did not align with me
297:29 live demo gods did not align with me this morning. We still have CLS that are
297:30 this morning. We still have CLS that are being pushed to staging right now. So,
297:32 being pushed to staging right now. So, I'm going to walk you through a little
297:34 I'm going to walk you through a little bit of this. And if you know Jed, he's
297:36 bit of this. And if you know Jed, he's going to, I think, be talking tomorrow.
297:38 going to, I think, be talking tomorrow. We're gonna um affectionately try to fix
297:40 We're gonna um affectionately try to fix Jed's code here. Um, so this is a view
297:44 Jed's code here. Um, so this is a view of of proactivity and this is this is
297:47 of of proactivity and this is this is Jules where you prompt it and the first
297:49 Jules where you prompt it and the first thing you that you do when you configure
297:51 thing you that you do when you configure and enable proactivity is Jules will
297:53 and enable proactivity is Jules will index your entire uh codebase. It'll
297:56 index your entire uh codebase. It'll index your directory and start looking
297:57 index your directory and start looking for things that it can do and then it'll
297:59 for things that it can do and then it'll that'll show up on the screen. So right
298:02 that'll show up on the screen. So right here we're looking at a little bit more
298:05 here we're looking at a little bit more in this um in this repository ADK Python
298:08 in this um in this repository ADK Python and uh and it's indexed the repository
298:12 and uh and it's indexed the repository and it's found a bunch of to-dos. It's
298:14 and it's found a bunch of to-dos. It's found a bunch of best practices that it
298:16 found a bunch of best practices that it can update and it's giving me some
298:17 can update and it's giving me some signal about what it's finding. And so
298:19 signal about what it's finding. And so you can see the signal is high
298:21 you can see the signal is high confidence, medium confidence, and low.
298:23 confidence, medium confidence, and low. And so it's actually telling me what it
298:25 And so it's actually telling me what it thinks it can achieve based on what's in
298:28 thinks it can achieve based on what's in my code and what it wants to do. And
298:31 my code and what it wants to do. And that's so it has high confidence in
298:32 that's so it has high confidence in green, medium and purple, low in yellow
298:35 green, medium and purple, low in yellow way down at the bottom. Um, and so I can
298:37 way down at the bottom. Um, and so I can go through this and I can manually click
298:39 go through this and I can manually click these and say I want to start these. And
298:42 these and say I want to start these. And so I don't have to think about the
298:43 so I don't have to think about the prompt. I don't have to look at the
298:45 prompt. I don't have to look at the code. I don't I I can do kind of less
298:47 code. I don't I I can do kind of less cognitive load here. We're working on
298:50 cognitive load here. We're working on something to just start these
298:51 something to just start these automatically. And so that's coming in
298:53 automatically. And so that's coming in the future. But I can also delete these.
298:55 the future. But I can also delete these. I can say, "Hey, this one isn't isn't
298:56 I can say, "Hey, this one isn't isn't for me. Isn't good." And so once it gets
298:59 for me. Isn't good." And so once it gets started on a task, I can kind of drill
299:01 started on a task, I can kind of drill into it and see a little bit more. I can
299:03 into it and see a little bit more. I can peek into the code that it is suggesting
299:06 peek into the code that it is suggesting uh that uh it's suggesting it work on. I
299:10 uh that uh it's suggesting it work on. I can find the location of that code. And
299:11 can find the location of that code. And it also gives me some rationale about
299:15 it also gives me some rationale about why it wants to work on that code, why
299:16 why it wants to work on that code, why what it's doing, etc. And so it's giving
299:18 what it's doing, etc. And so it's giving me a lot more context and helping me
299:21 me a lot more context and helping me trust that it knows what to do here.
299:26 trust that it knows what to do here. Okay. So that's proactivity that's
299:28 Okay. So that's proactivity that's coming in December and hopefully we'll
299:31 coming in December and hopefully we'll be able to give that to everybody here.
299:33 be able to give that to everybody here. We're very excited about it and I want
299:35 We're very excited about it and I want to tell you a little story about uh
299:38 to tell you a little story about uh something my husband and I were working
299:39 something my husband and I were working on just to kind of set set wrap things
299:42 on just to kind of set set wrap things up. We uh tinker a bunch with hardware
299:46 up. We uh tinker a bunch with hardware and we live on this slow street in the
299:48 and we live on this slow street in the middle of San Francisco in Haydashberry
299:49 middle of San Francisco in Haydashberry district. And so on Halloween we get a
299:51 district. And so on Halloween we get a lot of people walking by our house and
299:53 lot of people walking by our house and so we are trying to take advantage of
299:55 so we are trying to take advantage of that with our Halloween decorations. And
299:57 that with our Halloween decorations. And so we built this six-foot animatronic
300:00 so we built this six-foot animatronic head that sits in the front of our
300:03 head that sits in the front of our house. It's this old Victorian house.
300:06 house. It's this old Victorian house. And he sculpted it out of foam, epoxy,
300:08 And he sculpted it out of foam, epoxy, and fiberglass. And then I our our kids
300:11 and fiberglass. And then I our our kids also called this lovingly the bald head.
300:14 also called this lovingly the bald head. And it's based off of if you ever see
300:16 And it's based off of if you ever see saw Peewee Herman from the 80s. It's
300:18 saw Peewee Herman from the 80s. It's based off of the Peewee Herman Peewee's
300:19 based off of the Peewee Herman Peewee's Big Adventures head. Um so while my
300:22 Big Adventures head. Um so while my husband was doing this I was spending my
300:25 husband was doing this I was spending my time working with Jules on updating the
300:27 time working with Jules on updating the firmware, controlling the stepper
300:28 firmware, controlling the stepper motors, working on the um on the LEDs
300:31 motors, working on the um on the LEDs and the sensors. And for me that's the
300:34 and the sensors. And for me that's the fun part for me is like really getting
300:35 fun part for me is like really getting creative with what the LEDs are doing.
300:38 creative with what the LEDs are doing. So I wanted to focus on that, the LED
300:40 So I wanted to focus on that, the LED animations, but I ended up spending most
300:42 animations, but I ended up spending most of my time actually fixing bugs and
300:45 of my time actually fixing bugs and swapping libraries and doing things like
300:47 swapping libraries and doing things like that. So what I would do is I would
300:48 that. So what I would do is I would prompt Jules, I'd wait 10 minutes and
300:51 prompt Jules, I'd wait 10 minutes and then I would repeat. And I found that
300:54 then I would repeat. And I found that process very very tedious. And what I
300:57 process very very tedious. And what I wanted was actually Jules to do the
300:59 wanted was actually Jules to do the research. I wanted it to handle the the
301:01 research. I wanted it to handle the the ugly parts where it was researching how
301:04 ugly parts where it was researching how to fix a bug. Uh doing the debugging
301:07 to fix a bug. Uh doing the debugging itself. And I wanted it to do this so
301:09 itself. And I wanted it to do this so that I could focus on the creative
301:10 that I could focus on the creative parts. I wanted the eyes to move and
301:13 parts. I wanted the eyes to move and like follow people as they walk down the
301:15 like follow people as they walk down the street and like have lasers coming in
301:17 street and like have lasers coming in out of its eyes and stuff like I
301:18 out of its eyes and stuff like I mentioned it was Halloween. It was very
301:19 mentioned it was Halloween. It was very scary. Uh and and this but but I
301:22 scary. Uh and and this but but I couldn't really do as much of that. and
301:24 couldn't really do as much of that. and I ended up actually not shipping as much
301:26 I ended up actually not shipping as much as I wanted to with this animatronic
301:29 as I wanted to with this animatronic bald head. And so it's that gap that we
301:33 bald head. And so it's that gap that we actually want to close. It's the space
301:35 actually want to close. It's the space between with jewels, it's the space
301:36 between with jewels, it's the space between that tool friction and creative
301:39 between that tool friction and creative freedom that we're trying to unlock with
301:42 freedom that we're trying to unlock with these kinds of proactive agents.
301:44 these kinds of proactive agents. So what I really want you guys to take
301:48 So what I really want you guys to take away from it, I give this advice to the
301:50 away from it, I give this advice to the the folks on on the Jules team a lot is
301:53 the folks on on the Jules team a lot is that the product we build today actually
301:56 that the product we build today actually won't be the project the products that
301:57 won't be the project the products that we have in the future and I think a lot
301:59 we have in the future and I think a lot of us know that but in reality I want
302:02 of us know that but in reality I want everybody in this room and everyone
302:03 everybody in this room and everyone building working with AI to be able to
302:06 building working with AI to be able to take those big steps. I think the
302:08 take those big steps. I think the patterns that we rely on today git uh
302:10 patterns that we rely on today git uh your your idees even the code how we
302:13 your your idees even the code how we think about the code itself might not
302:16 think about the code itself might not exist a year from now might not exist
302:18 exist a year from now might not exist six months from now and that's the
302:20 six months from now and that's the exciting part for me it's sort of we get
302:22 exciting part for me it's sort of we get to invent the future right now we get to
302:25 to invent the future right now we get to describe and decide how software is made
302:28 describe and decide how software is made and built uh kind of all the people in
302:30 and built uh kind of all the people in this room so my my challenge to you is
302:34 this room so my my challenge to you is to not be afraid to question the old
302:37 to not be afraid to question the old ways of how you're building software
302:38 ways of how you're building software because really the future is coming
302:41 because really the future is coming faster than any of us know. It's
302:43 faster than any of us know. It's probably already here and the cool thing
302:45 probably already here and the cool thing is we get to build it together. Thank
302:48 is we get to build it together. Thank you.
302:49 you. [applause]
302:51 [applause] [music]
302:53 [music] Our
303:01 next talk is a case study from the enterprise on incremental rollout of AI.
303:04 enterprise on incremental rollout of AI. Here to provide us with a blueprint for
303:06 Here to provide us with a blueprint for making AI transformation fundable,
303:08 making AI transformation fundable, governable and real inside large risk
303:11 governable and real inside large risk averse organizations is engineering
303:14 averse organizations is engineering leader at Northwestern Mutual, ASAF
303:17 leader at Northwestern Mutual, ASAF board.
303:23 >> [music] [applause]
303:32 >> Doesn't this look like something's going to drop from the ceiling? Like a ground
303:35 to drop from the ceiling? Like a ground zero type thing? [snorts] Be honest.
303:37 zero type thing? [snorts] Be honest. Like, who has a buzzer that if I'm I
303:39 Like, who has a buzzer that if I'm I really suck, they press it and
303:41 really suck, they press it and everything falls down through the trap
303:43 everything falls down through the trap door? No.
303:44 door? No. >> Be careful.
303:44 >> Be careful. >> Yeah. Okay. Who was it? Okay. you tell
303:48 >> Yeah. Okay. Who was it? Okay. you tell me if I'm doing okay or if I should take
303:49 me if I'm doing okay or if I should take a couple steps back. Right. So, hi
303:52 a couple steps back. Right. So, hi everyone. I'm Assaf. Um, and I'm here to
303:55 everyone. I'm Assaf. Um, and I'm here to talk about Genbi. And kind of first
303:58 talk about Genbi. And kind of first disclaimer, this presentation was not
304:00 disclaimer, this presentation was not created with Genai. Um, to be honest, I
304:03 created with Genai. Um, to be honest, I actually started doing it uh with uh
304:06 actually started doing it uh with uh GPT03 back in August. Uh, [snorts] and
304:09 GPT03 back in August. Uh, [snorts] and then I did kind of a first draft and
304:11 then I did kind of a first draft and then a couple of weeks back I wanted to
304:13 then a couple of weeks back I wanted to come in and refresh it before the
304:15 come in and refresh it before the conference and then GPT5 took over
304:18 conference and then GPT5 took over completely messed up my slide so I ended
304:21 completely messed up my slide so I ended up doing it manually kind of
304:22 up doing it manually kind of oldfashioned. So if I'm missing like an
304:25 oldfashioned. So if I'm missing like an M dash somewhere in the middle let me
304:27 M dash somewhere in the middle let me know after. Okay. [snorts] Uh, so first
304:30 know after. Okay. [snorts] Uh, so first of all a bit of housekeeping. What's
304:31 of all a bit of housekeeping. What's GenBI? So it's a fusion of Gen AI and
304:34 GenBI? So it's a fusion of Gen AI and BI. It's basically an agent that helps
304:37 BI. It's basically an agent that helps people answer business questions with
304:39 people answer business questions with data like a a business intelligence
304:42 data like a a business intelligence person would do in real life. Uh the
304:44 person would do in real life. Uh the reason that we're pursuing GenBI is
304:46 reason that we're pursuing GenBI is really because of the data
304:47 really because of the data democratization that it can bring.
304:49 democratization that it can bring. Right? So having access to data at your
304:52 Right? So having access to data at your fingertips without having to be reliant
304:54 fingertips without having to be reliant on a BI team that helps you find a
304:56 on a BI team that helps you find a report, figure out what it means, uh
304:58 report, figure out what it means, uh understand your world before they can
305:00 understand your world before they can even give you any kind of input. Uh, so
305:03 even give you any kind of input. Uh, so that's Genbi. Uh, a bit about
305:05 that's Genbi. Uh, a bit about Northwestern Mutual. That's where I
305:07 Northwestern Mutual. That's where I work. So, we're a financial services,
305:10 work. So, we're a financial services, life insurance, and wealth management.
305:11 life insurance, and wealth management. Been around for 160 years. Uh, [snorts]
305:14 Been around for 160 years. Uh, [snorts] some very impressive numbers there. But
305:16 some very impressive numbers there. But first of all, I want to say why is
305:18 first of all, I want to say why is Northwestern Mutual a great place to do
305:20 Northwestern Mutual a great place to do Gen AI? We got a lot of data, we got a
305:23 Gen AI? We got a lot of data, we got a lot of money, we got a lot of use cases,
305:26 lot of money, we got a lot of use cases, and we got access to some of the best
305:27 and we got access to some of the best talent uh, anyone can dream of. really
305:30 talent uh, anyone can dream of. really truly humbled by the people that I get
305:32 truly humbled by the people that I get to work with. Um, but on the flip side,
305:36 to work with. Um, but on the flip side, why is it hard to do Gen AI at
305:38 why is it hard to do Gen AI at Northwestern Mutual? Because it is a
305:40 Northwestern Mutual? Because it is a very riskaverse company, right? If you
305:43 very riskaverse company, right? If you think about it, our main motto is
305:45 think about it, our main motto is generational responsibility. I call it
305:48 generational responsibility. I call it don't f up. Uh, because what we end
305:51 don't f up. Uh, because what we end up selling to people is a decadesl long
305:55 up selling to people is a decadesl long commitment, right? you buy life
305:58 commitment, right? you buy life insurance now,
306:00 insurance now, uh, if you stay with us until it comes
306:03 uh, if you stay with us until it comes to term, so to speak, that can be 20,
306:06 to term, so to speak, that can be 20, 40, 80 years down the line, depending on
306:09 40, 80 years down the line, depending on when you buy it and how long you get to
306:10 when you buy it and how long you get to live. And so stability is something
306:13 live. And so stability is something that's very important for us because
306:15 that's very important for us because it's important for our clients. So, how
306:17 it's important for our clients. So, how do we balance stability with innovation?
306:20 do we balance stability with innovation? That's what I want to talk about today.
306:22 That's what I want to talk about today. Um, and really the four main challenges
306:26 Um, and really the four main challenges that we had when we even came up with
306:28 that we had when we even came up with the idea kind of a pie in the sky Genbi
306:31 the idea kind of a pie in the sky Genbi concept. Uh, first [snorts] of all, no
306:34 concept. Uh, first [snorts] of all, no one's done it before, right? Truly, no
306:36 one's done it before, right? Truly, no one's done GenBI in this fashion in the
306:38 one's done GenBI in this fashion in the past. Uh, secondly, and this was really
306:41 past. Uh, secondly, and this was really a preference for us, we wanted to use
306:44 a preference for us, we wanted to use actual data that's messy because we knew
306:47 actual data that's messy because we knew that those were that's where the real
306:50 that those were that's where the real challenges are going to be, right?
306:51 challenges are going to be, right? understanding actual messy data for 160
306:54 understanding actual messy data for 160 year old company and how can we perform
306:57 year old company and how can we perform well within that ecosystem. Um the third
307:00 well within that ecosystem. Um the third was kind of a blind trust bias. So um
307:05 was kind of a blind trust bias. So um the bi the trust that we had to build
307:07 the bi the trust that we had to build was both with the users but also with
307:09 was both with the users but also with the leadership of the company, right?
307:11 the leadership of the company, right? How can we bring accurate information,
307:14 How can we bring accurate information, accurate answers to people when uh all
307:17 accurate answers to people when uh all of these things that we know about and
307:19 of these things that we know about and everyone's talked about is is just out
307:21 everyone's talked about is is just out there, right? No one's blind to the
307:23 there, right? No one's blind to the trust barriers. No one's blind to the
307:24 trust barriers. No one's blind to the accuracy barriers. So, how do we
307:26 accuracy barriers. So, how do we convince that this is actually something
307:28 convince that this is actually something that we can trust in the company? And
307:31 that we can trust in the company? And lastly,
307:33 lastly, um but really firstly, when we go to
307:36 um but really firstly, when we go to approach this from an enterprise
307:37 approach this from an enterprise perspective, budget impact, right? How
307:40 perspective, budget impact, right? How do we convince someone in a leadership
307:42 do we convince someone in a leadership uh organization where risk averseman is
307:46 uh organization where risk averseman is ingrained in the DNA to even invest in
307:49 ingrained in the DNA to even invest in something like this that no one's done
307:51 something like this that no one's done before? We don't really know how we
307:53 before? We don't really know how we would do it. Uh we're not even sure how
307:54 would do it. Uh we're not even sure how it would look like when it comes to
307:56 it would look like when it comes to term.
307:58 term. Uh so I'll start kind of one by one uh
308:01 Uh so I'll start kind of one by one uh and first of all really talk about why
308:03 and first of all really talk about why we chose to use actual data uh and not
308:06 we chose to use actual data uh and not synthesized data or cleanse data. Uh
308:08 synthesized data or cleanse data. Uh [snorts] so really it's about making
308:10 [snorts] so really it's about making sure that we understand the actual
308:11 sure that we understand the actual complexities that we will have to face
308:14 complexities that we will have to face when we eventually want to go to
308:16 when we eventually want to go to production right we know that you know
308:18 production right we know that you know building uh PC's and demos is so easy
308:20 building uh PC's and demos is so easy but the gap from PC to production is so
308:23 but the gap from PC to production is so broad uh especially in this genai space
308:26 broad uh especially in this genai space especially because we don't know upfront
308:28 especially because we don't know upfront how to design the system what we would
308:30 how to design the system what we would expect it to behave like so making sure
308:33 expect it to behave like so making sure that we operate with real data just gave
308:35 that we operate with real data just gave us that extra confidence that when
308:36 us that extra confidence that when something works in the it's very likely
308:39 something works in the it's very likely to also work in reality. Uh but also and
308:42 to also work in reality. Uh but also and maybe not uh in the least less important
308:46 maybe not uh in the least less important is that we got to work with actual
308:48 is that we got to work with actual people who work with the data day in and
308:50 people who work with the data day in and day out and that gave us two things.
308:52 day out and that gave us two things. Okay, first of all subject matter
308:54 Okay, first of all subject matter expertise which are super critical for
308:56 expertise which are super critical for us to be able to validate that the
308:57 us to be able to validate that the system is actually working gave us a lot
309:00 system is actually working gave us a lot of real life examples of what people are
309:02 of real life examples of what people are actually asking in a corporate and what
309:04 actually asking in a corporate and what people have answered to them. So
309:06 people have answered to them. So basically the eval right and all the
309:08 basically the eval right and all the testing and stuff uh but at the end of
309:11 testing and stuff uh but at the end of the day it also brought the business to
309:15 the day it also brought the business to be a part of the research project itself
309:18 be a part of the research project itself and they became kind of bought into the
309:20 and they became kind of bought into the idea as part of the process. So we
309:23 idea as part of the process. So we didn't just test something in the lab
309:24 didn't just test something in the lab and then had to convince someone to go
309:26 and then had to convince someone to go ahead and use it. The end users were
309:29 ahead and use it. The end users were part of the research process itself. And
309:32 part of the research process itself. And so when eventually it matured enough so
309:34 so when eventually it matured enough so we can take some of that to production,
309:37 we can take some of that to production, they were already there and they
309:38 they were already there and they actually were pulling that. They told us
309:40 actually were pulling that. They told us we want to take this, how can we wrap
309:42 we want to take this, how can we wrap it? How can we package it uh quickly
309:44 it? How can we package it uh quickly enough so we can put it into practice?
309:50 Uh and the next part was really about building trust. Uh so this is about
309:53 building trust. Uh so this is about building trust first of all with our
309:55 building trust first of all with our management team. right now. I don't know
309:57 management team. right now. I don't know about you, but last time that I got a
309:59 about you, but last time that I got a million dollar to do a research project
310:01 million dollar to do a research project that I wanted in a pie sky idea, I woke
310:04 that I wanted in a pie sky idea, I woke up from the dream and I realized that
310:06 up from the dream and I realized that this is not how things work in reality.
310:08 this is not how things work in reality. You don't just get a million dollars and
310:10 You don't just get a million dollars and go ahead and try something out. Uh you
310:13 go ahead and try something out. Uh you had to show that you know what you're
310:14 had to show that you know what you're doing. And part of what we did, it's
310:17 doing. And part of what we did, it's kind of listed out here, but obviously,
310:19 kind of listed out here, but obviously, you know, we did all the regular stuff,
310:21 you know, we did all the regular stuff, right? We worked in a sandbox
310:22 right? We worked in a sandbox environment. We made sure that we're not
310:24 environment. We made sure that we're not using actual client data. We made sure
310:26 using actual client data. We made sure to put in all the security risks aside,
310:29 to put in all the security risks aside, but uh one of the first approaches that
310:31 but uh one of the first approaches that we said we're going to take is we're not
310:33 we said we're going to take is we're not just going to build a tool that's going
310:35 just going to build a tool that's going to be uh released to everyone, right? We
310:38 to be uh released to everyone, right? We understood very quickly that um how
310:43 understood very quickly that um how people interact with the tool, their
310:45 people interact with the tool, their ability to verify that what they're
310:47 ability to verify that what they're getting is right and also give us
310:49 getting is right and also give us feedback changes dramatically depending
310:51 feedback changes dramatically depending on their expertise and understanding of
310:53 on their expertise and understanding of the data. So we took that crawl, walk,
310:55 the data. So we took that crawl, walk, run approach that basically said we're
310:58 run approach that basically said we're first going to release it to actual BI
311:01 first going to release it to actual BI experts, right? People that would be
311:04 experts, right? People that would be able to do it on their own and know what
311:06 able to do it on their own and know what good looks like when they get it. and
311:07 good looks like when they get it. and we're just going to expedite the process
311:09 we're just going to expedite the process for them. Kind of like a GitHub
311:10 for them. Kind of like a GitHub co-pilot. The next phase would be to
311:13 co-pilot. The next phase would be to bring it to business managers. And
311:15 bring it to business managers. And again, people who are closer to the BI
311:17 again, people who are closer to the BI team, but when they see a mistake, they
311:20 team, but when they see a mistake, they can pretty much figure out that what
311:22 can pretty much figure out that what they're seeing is wrong because they're
311:24 they're seeing is wrong because they're used to seeing that on day-to-day basis.
311:26 used to seeing that on day-to-day basis. Um, and they will might be less
311:28 Um, and they will might be less sensitive to these types of mistakes and
311:30 sensitive to these types of mistakes and be more inclined to give us that
311:31 be more inclined to give us that feedback instead of just, you know,
311:33 feedback instead of just, you know, dumping it aside and never using it
311:35 dumping it aside and never using it again. giving this type of tool to
311:37 again. giving this type of tool to executives in the company. I don't even
311:39 executives in the company. I don't even know when we're going to get there,
311:41 know when we're going to get there, right? Like an executive, they want
311:43 right? Like an executive, they want clear, concise answers that they know
311:46 clear, concise answers that they know they can trust. We're definitely not
311:48 they can trust. We're definitely not there yet. I think that's the vision uh
311:50 there yet. I think that's the vision uh at some point in time, but the system is
311:52 at some point in time, but the system is not accurate enough for us to get there.
311:54 not accurate enough for us to get there. Maybe it never will be.
311:56 Maybe it never will be. Um, [snorts] another way that we another
311:58 Um, [snorts] another way that we another liver that we kind of used to build
312:01 liver that we kind of used to build inherent transit the system is that we
312:03 inherent transit the system is that we said well in the get-go we're not going
312:06 said well in the get-go we're not going to even try to build SQLs right this is
312:10 to even try to build SQLs right this is very complex this is very hard even for
312:13 very complex this is very hard even for a person so we said step number one
312:15 a person so we said step number one let's just bring information that is
312:18 let's just bring information that is already in the ecosystem that's already
312:20 already in the ecosystem that's already verified right we have a lot of uh
312:22 verified right we have a lot of uh certified reports and dashboards um and
312:25 certified reports and dashboards um and actually in the conversation we had with
312:27 actually in the conversation we had with some of the BI teams that we worked
312:29 some of the BI teams that we worked with. They told us guys like 80% of the
312:32 with. They told us guys like 80% of the work that we do is basically sending
312:33 work that we do is basically sending people to the right report and helping
312:35 people to the right report and helping them figure out how to use it. So the
312:37 them figure out how to use it. So the report is already there. Um and that
312:40 report is already there. Um and that again built some inherent trust into how
312:43 again built some inherent trust into how we architected the system because we
312:44 we architected the system because we said we're not going to make up
312:46 said we're not going to make up information. We're just going to deliver
312:48 information. We're just going to deliver you the same asset that you would have
312:49 you the same asset that you would have gotten anyway just in a much faster much
312:52 gotten anyway just in a much faster much more interactive way. uh and that was
312:54 more interactive way. uh and that was the alignment of expectations that we
312:56 the alignment of expectations that we did very upfront with the uh users and
312:59 did very upfront with the uh users and also with the management team.
313:01 also with the management team. Now [clears throat]
313:03 Now [clears throat] the biggest um
313:07 the biggest um process or kind of the most important
313:09 process or kind of the most important approach that we took when uh
313:11 approach that we took when uh approaching our leadership team and
313:12 approaching our leadership team and convincing them that we want to do this
313:15 convincing them that we want to do this was to create a very gradual incremental
313:18 was to create a very gradual incremental process that gave them a lot of
313:20 process that gave them a lot of visibility and control. [snorts] Uh and
313:23 visibility and control. [snorts] Uh and it was very important for us to build
313:25 it was very important for us to build incremental deliveries throughout that
313:27 incremental deliveries throughout that process so that uh not only did they
313:31 process so that uh not only did they have the the visibility into what are we
313:33 have the the visibility into what are we funding now, what do we get out of it,
313:35 funding now, what do we get out of it, they actually had business deliverables
313:37 they actually had business deliverables they could realize potential from
313:39 they could realize potential from throughout the process and at any point
313:42 throughout the process and at any point in time they could pull the plug right
313:43 in time they could pull the plug right and say okay like it's not working well
313:45 and say okay like it's not working well or we got enough out of it or you know
313:48 or we got enough out of it or you know the next phase is so you know unknown
313:50 the next phase is so you know unknown and long that we don't want further
313:52 and long that we don't want further invest in it. And this is how we
313:54 invest in it. And this is how we basically broke it down. So phase one
313:57 basically broke it down. So phase one was just pure research, right? We kind
313:59 was just pure research, right? We kind of did the shift from natural language
314:01 of did the shift from natural language to SQL. We figured out how to write
314:03 to SQL. We figured out how to write responses. We figure out how to
314:05 responses. We figure out how to understand questions that coming in.
314:07 understand questions that coming in. Just kind of setting the stage. Phase
314:09 Just kind of setting the stage. Phase [snorts] two was about really
314:10 [snorts] two was about really understanding, okay, so what does good
314:12 understanding, okay, so what does good metadata and good context look like in
314:16 metadata and good context look like in the perspective of a BI agent, right? It
314:18 the perspective of a BI agent, right? It looks very different if you're just
314:20 looks very different if you're just chatting with something or if you're
314:21 chatting with something or if you're trying to do a rag with you know
314:23 trying to do a rag with you know unstructured data like documents and uh
314:26 unstructured data like documents and uh business knowledge and stuff like that.
314:28 business knowledge and stuff like that. And this phase on its own already had uh
314:31 And this phase on its own already had uh impact on the business because when we
314:33 impact on the business because when we define what good metadata looks like for
314:34 define what good metadata looks like for an an LLM uh we could immediately apply
314:39 an an LLM uh we could immediately apply that also to just the ecosystem of data
314:41 that also to just the ecosystem of data users across the enterprise. Um, and by
314:45 users across the enterprise. Um, and by understanding how to extract LLM from
314:47 understanding how to extract LLM from the information, we could also how to
314:50 the information, we could also how to extract metadata. Sorry, here's where
314:52 extract metadata. Sorry, here's where the [snorts] trap door comes into play,
314:53 the [snorts] trap door comes into play, right? Um, we could also project that on
314:57 right? Um, we could also project that on how or what good metadata looks like for
315:00 how or what good metadata looks like for humans interacting with the data. We
315:02 humans interacting with the data. We have another initiative around semantic
315:04 have another initiative around semantic layer going on which tries to model
315:06 layer going on which tries to model exactly that and this provided a very
315:08 exactly that and this provided a very valuable input to that initiative as
315:10 valuable input to that initiative as well. But the immediate next step was
315:12 well. But the immediate next step was basically just doing this kind of uh
315:15 basically just doing this kind of uh multicontext semantic search, right?
315:17 multicontext semantic search, right? People coming in asking different
315:18 People coming in asking different questions and having the system figure
315:21 questions and having the system figure out what's the right context, what's the
315:23 out what's the right context, what's the right information we uh bring them. And
315:25 right information we uh bring them. And this is something that could already be
315:27 this is something that could already be packaged as its own product and
315:30 packaged as its own product and delivered uh and basically just do kind
315:32 delivered uh and basically just do kind of a data finder and data owner finder
315:35 of a data finder and data owner finder which is something that could take
315:38 which is something that could take anywhere between two to maybe four weeks
315:41 anywhere between two to maybe four weeks in an enterprise like Northwestern
315:43 in an enterprise like Northwestern Mutual just finding what data exists and
315:45 Mutual just finding what data exists and who owns it so I can start talk uh the
315:47 who owns it so I can start talk uh the conversation with them.
315:49 conversation with them. Um and the next layer was really about
315:52 Um and the next layer was really about pulling in information and trying to do
315:54 pulling in information and trying to do some light pivoting around the data. Um
315:57 some light pivoting around the data. Um each one of these steps as you can see
316:00 each one of these steps as you can see also created an input to the step to the
316:02 also created an input to the step to the following step so that the research
316:05 following step so that the research itself was kind of self u
316:07 itself was kind of self u self-propelling and there were
316:08 self-propelling and there were incremental outcomes coming out of each
316:10 incremental outcomes coming out of each one of these phases. Uh the next one is
316:13 one of these phases. Uh the next one is more kind of setting it up for
316:15 more kind of setting it up for enterprise level usage. So understanding
316:17 enterprise level usage. So understanding roles of in uh of different users coming
316:20 roles of in uh of different users coming in what they may be asking about what
316:22 in what they may be asking about what type of access we want to give them etc
316:24 type of access we want to give them etc and eventually and this is still some
316:27 and eventually and this is still some ways to go ahead uh building kind of a
316:29 ways to go ahead uh building kind of a fullyfledged NBI agent which doesn't
316:31 fullyfledged NBI agent which doesn't only quote information from existing
316:33 only quote information from existing reports but I can actually run SQL
316:36 reports but I can actually run SQL queries on its own uh pull in more data
316:39 queries on its own uh pull in more data do more sophisticated joints between
316:41 do more sophisticated joints between different data so it can answer more
316:42 different data so it can answer more complex questions so that's the road map
316:45 complex questions so that's the road map right That's kind of the high level
316:47 right That's kind of the high level plan. Now, why did that work? Well, kind
316:49 plan. Now, why did that work? Well, kind of quickly summarize them. We talked
316:50 of quickly summarize them. We talked about uh so we get value uh early and we
316:54 about uh so we get value uh early and we get value often. Each one of this was a
316:56 get value often. Each one of this was a six week sprint at the end of which we
316:58 six week sprint at the end of which we had had a very tangible deliverable
317:01 had had a very tangible deliverable coming back to the business that we
317:03 coming back to the business that we could decide to productize. Uh and at
317:06 could decide to productize. Uh and at any point in time, we could decide how
317:08 any point in time, we could decide how we want to move forward. There was
317:10 we want to move forward. There was transparent progress. There was
317:11 transparent progress. There was incremental business value. Uh each one
317:14 incremental business value. Uh each one of these steps allowed us to learn
317:16 of these steps allowed us to learn something that helped feed the next
317:18 something that helped feed the next step.
317:19 step. And maybe the most important part and
317:21 And maybe the most important part and that's the bottom line here and that's
317:23 that's the bottom line here and that's the part that executives really look at.
317:26 the part that executives really look at. How do we control the risk in continuing
317:28 How do we control the risk in continuing to invest in this type of research
317:30 to invest in this type of research project and this is really about
317:33 project and this is really about eliminating things like some cost bias
317:35 eliminating things like some cost bias right? We already paid you know you know
317:37 right? We already paid you know you know whatever a million dollar let's just get
317:39 whatever a million dollar let's just get through the project see what we get at
317:40 through the project see what we get at the end. This eliminates the uh uh fear
317:44 the end. This eliminates the uh uh fear of of competitors coming in and maybe we
317:46 of of competitors coming in and maybe we don't need to continue investing in this
317:48 don't need to continue investing in this right so everyone in the industry is
317:50 right so everyone in the industry is researching Genbi and there are
317:51 researching Genbi and there are solutions like data bricks genie that
317:53 solutions like data bricks genie that are coming up and they're getting better
317:54 are coming up and they're getting better and better maybe at some point in time
317:56 and better maybe at some point in time it's better for us as an organization to
317:58 it's better for us as an organization to actually adopt data bricks genie but at
318:01 actually adopt data bricks genie but at that point again first it's much easier
318:03 that point again first it's much easier for us to pull the plug and the funding
318:05 for us to pull the plug and the funding but we already have a good understanding
318:08 but we already have a good understanding of what good looks like we have
318:09 of what good looks like we have benchmarks that we used for ourselves
318:11 benchmarks that we used for ourselves when testing our own system that we can
318:13 when testing our own system that we can test a third party solution with. And we
318:16 test a third party solution with. And we know what to expect, right? We know what
318:18 know what to expect, right? We know what works, we know what doesn't. We know
318:20 works, we know what doesn't. We know what a kind of fluffy demo from a vendor
318:23 what a kind of fluffy demo from a vendor would look like. And we know where to
318:24 would look like. And we know where to drill in to ask the tough questions.
318:30 So let's see kind of what it looks like under the hood and how we productize
318:33 under the hood and how we productize different elements uh of this
318:34 different elements uh of this architecture. Uh and maybe kind of very
318:36 architecture. Uh and maybe kind of very quickly, why can't we just do it with uh
318:38 quickly, why can't we just do it with uh chat GPT? So you know [snorts] just
318:40 chat GPT? So you know [snorts] just dumping a schema into chat GPT doesn't
318:42 dumping a schema into chat GPT doesn't work. Usually schemas are very messy.
318:44 work. Usually schemas are very messy. It's not uh easy to understand the
318:46 It's not uh easy to understand the context and the meaning of things. Uh
318:48 context and the meaning of things. Uh [snorts] and eventually governance is
318:50 [snorts] and eventually governance is super important. So there was a lot of
318:51 super important. So there was a lot of governance built into the architecture
318:53 governance built into the architecture that was very hard to apply on Czech GPD
318:55 that was very hard to apply on Czech GPD from the outside but even solutions like
318:58 from the outside but even solutions like you know data bricks genius third party
319:00 you know data bricks genius third party much harder to govern from the outside
319:01 much harder to govern from the outside than from the inside. But still TBD.
319:05 than from the inside. But still TBD. Uh so the stack kind of looks like this.
319:07 Uh so the stack kind of looks like this. Uh we have a data and metadata layer
319:10 Uh we have a data and metadata layer that we produced. We have four different
319:11 that we produced. We have four different agents that are running across the
319:13 agents that are running across the pipeline. A metadata agent that
319:16 pipeline. A metadata agent that understands the context. A rag agent
319:17 understands the context. A rag agent that finds the different reports. An SQL
319:20 that finds the different reports. An SQL agent that can pull more data if we need
319:22 agent that can pull more data if we need that. And then eventually what we call a
319:23 that. And then eventually what we call a BI agent that takes all that information
319:25 BI agent that takes all that information and delivers an answer to the question
319:27 and delivers an answer to the question that was asked. On top of that, we slap
319:30 that was asked. On top of that, we slap governance and trust and orchestration
319:32 governance and trust and orchestration and eventually some kind of a contextual
319:34 and eventually some kind of a contextual UI. Um and this is how the flow goes. So
319:39 UI. Um and this is how the flow goes. So when a business question comes in we uh
319:42 when a business question comes in we uh push it into the orchestrator and
319:44 push it into the orchestrator and basically decides how to facilitate the
319:46 basically decides how to facilitate the process. The first thing that we do is
319:48 process. The first thing that we do is understanding the context. So that's
319:50 understanding the context. So that's where that metadata agent comes in works
319:52 where that metadata agent comes in works with the catalog works with all the
319:54 with the catalog works with all the documentation that we have across the
319:55 documentation that we have across the system to understand what we're being
319:57 system to understand what we're being asked about and what's the relevant
319:59 asked about and what's the relevant information to share. Then we go to the
320:01 information to share. Then we go to the rag agent which tries to find an
320:03 rag agent which tries to find an existing report again out of a list of
320:05 existing report again out of a list of certified reports that we know are
320:08 certified reports that we know are allowed for people to use and people
320:09 allowed for people to use and people have spent a lot of time fine-tuning
320:11 have spent a lot of time fine-tuning them and making them as accurate as
320:13 them and making them as accurate as possible.
320:14 possible. If we can't find the report or if it's
320:17 If we can't find the report or if it's not exactly what we need to um to use,
320:19 not exactly what we need to um to use, that's where we go to the SQL agent that
320:22 that's where we go to the SQL agent that basically tries to create a more um
320:25 basically tries to create a more um exact query or a more elaborate query.
320:28 exact query or a more elaborate query. And even if the report that we have is
320:31 And even if the report that we have is not usable as is, it gives us that
320:34 not usable as is, it gives us that initial seed of a query that we can then
320:36 initial seed of a query that we can then expand on rather than having to build
320:38 expand on rather than having to build one from scratch. So it's kind of like a
320:41 one from scratch. So it's kind of like a fewot uh example, but in this case the
320:45 fewot uh example, but in this case the example that we give is very very close
320:47 example that we give is very very close to the actual result that we're
320:49 to the actual result that we're expecting to get. We then execute it
320:52 expecting to get. We then execute it against the database pull and push it
320:53 against the database pull and push it into the BI agent which gen with which
320:56 into the BI agent which gen with which gen uh translate that to a business
320:59 gen uh translate that to a business answer and not just dumping data back on
321:02 answer and not just dumping data back on the user and this is what goes into the
321:05 the user and this is what goes into the final answer. Now there's obviously some
321:06 final answer. Now there's obviously some kind of a loop that says if I'm in the
321:08 kind of a loop that says if I'm in the same conversation I'm probably talking
321:10 same conversation I'm probably talking about the same data so we don't have to
321:12 about the same data so we don't have to talk about this or do this again and
321:13 talk about this or do this again and again. Now
321:16 again. Now each one of these three components, each
321:17 each one of these three components, each one of these three agents can be
321:19 one of these three agents can be packaged as its own product and
321:23 packaged as its own product and delivered to production with a very
321:26 delivered to production with a very tangible and actual impact on business
321:29 tangible and actual impact on business metrics. Okay. And that's the kind of
321:32 metrics. Okay. And that's the kind of beauty of this uh approach that after we
321:36 beauty of this uh approach that after we productize each one of these, we could
321:38 productize each one of these, we could have basically said stop or let's move
321:40 have basically said stop or let's move forward.
321:42 forward. uh and just some giving bottom line
321:44 uh and just some giving bottom line numbers around some of these. So just
321:46 numbers around some of these. So just the rag agent that pulls the right
321:49 the rag agent that pulls the right report uh allowed us to take about 20%
321:53 report uh allowed us to take about 20% of the overall capacity of the BI team
321:56 of the overall capacity of the BI team that basically said uh all we do is just
322:00 that basically said uh all we do is just share the right report with the right
322:01 share the right report with the right person. So we were able to automate
322:03 person. So we were able to automate around 80% out of those uh 20% and we're
322:06 around 80% out of those uh 20% and we're talking about a team of 10 people. So
322:10 talking about a team of 10 people. So roughly two people full-time job all
322:13 roughly two people full-time job all they do is find the right report and
322:15 they do is find the right report and send it to the right person.
322:18 send it to the right person. uh the metadata understandings that we
322:20 uh the metadata understandings that we got from learning how to interact with
322:22 got from learning how to interact with the data through an LLM allowed us to
322:24 the data through an LLM allowed us to run AB test in a in the semantic layer
322:27 run AB test in a in the semantic layer project that we did and that allowed us
322:29 project that we did and that allowed us to prove back again to the senior
322:33 to prove back again to the senior leadership in the company that there is
322:34 leadership in the company that there is value and tangible value measurable
322:38 value and tangible value measurable value in enriching metadata. And we did
322:41 value in enriching metadata. And we did that basically by running uh a a battery
322:44 that basically by running uh a a battery of questions um against a database that
322:47 of questions um against a database that had good metadata and one that didn't
322:49 had good metadata and one that didn't have good metadata. And we show how much
322:52 have good metadata. And we show how much better an LLM performs when having the
322:54 better an LLM performs when having the right metadata in place. So basically
322:56 right metadata in place. So basically proving the value of something that can
322:58 proving the value of something that can be very fluffy like hey let's bring in
323:00 be very fluffy like hey let's bring in more documentation into the code. Uh
323:03 more documentation into the code. Uh right now we're experimenting with the
323:05 right now we're experimenting with the data pivoting bot. Uh so once you have a
323:08 data pivoting bot. Uh so once you have a dashboard or a report be able to change
323:11 dashboard or a report be able to change the time horizon some of the views some
323:13 the time horizon some of the views some of the segmentations and the groupings
323:14 of the segmentations and the groupings of the data again kind of real time
323:16 of the data again kind of real time without having a person do that for uh a
323:20 without having a person do that for uh a business stakeholder and some of the
323:22 business stakeholder and some of the next steps is really evaluating the
323:23 next steps is really evaluating the tools that are out there for uh Genbi
323:26 tools that are out there for uh Genbi like data bricks genie for example and
323:28 like data bricks genie for example and we're going to go into a much more
323:30 we're going to go into a much more rigorous process of enriching our
323:31 rigorous process of enriching our catalog with metadata and documentation
323:34 catalog with metadata and documentation and that's also going to come out of a
323:36 and that's also going to come out of a lot of the learnings that we got from uh
323:38 lot of the learnings that we got from uh the research that we've done. So even if
323:40 the research that we've done. So even if we don't end up writing a GenBI agent
323:44 we don't end up writing a GenBI agent full-fledged end to end, we already got
323:47 full-fledged end to end, we already got a lot of value back from this and this
323:49 a lot of value back from this and this is really what allowed our senior
323:51 is really what allowed our senior leadership team to continuously invest
323:53 leadership team to continuously invest in this project quarter over quarter.
323:57 in this project quarter over quarter. One thing that I want to wrap up with is
324:00 One thing that I want to wrap up with is just a couple of thoughts I had about
324:02 just a couple of thoughts I had about the future. So um I think we talk a lot
324:05 the future. So um I think we talk a lot about how to prepare data. I think
324:07 about how to prepare data. I think that's going to be a huge area in the
324:08 that's going to be a huge area in the market and they're going to be probably
324:09 market and they're going to be probably a lot of companies and tools that are
324:11 a lot of companies and tools that are going to help us with that. Uh building
324:14 going to help us with that. Uh building very specific task specific models and
324:17 very specific task specific models and applications. I think a lot of startups
324:19 applications. I think a lot of startups and companies are going to come up from
324:20 and companies are going to come up from that area. Uh co-pilots is really making
324:24 that area. Uh co-pilots is really making sure that we meet the users where they
324:25 sure that we meet the users where they are. Uh and securing of models obviously
324:28 are. Uh and securing of models obviously a very big thing. The last thing is the
324:30 a very big thing. The last thing is the one the the one I want to focus on the
324:32 one the the one I want to focus on the most because that's kind of a recent
324:34 most because that's kind of a recent thought that came to me a couple of
324:36 thought that came to me a couple of weeks ago. How we do pricing of SAS in
324:39 weeks ago. How we do pricing of SAS in the Gen AI era. Uh this is really about
324:42 the Gen AI era. Uh this is really about the fact that one individual person
324:45 the fact that one individual person today can be 10x more effective uh than
324:48 today can be 10x more effective uh than they used to be in the past. And then do
324:51 they used to be in the past. And then do we price uh software based on seats or
324:54 we price uh software based on seats or do we price software based on how much
324:56 do we price software based on how much they used it or do we price software
324:58 they used it or do we price software based on the value that they got out of
325:00 based on the value that they got out of it. Uh Salesforce is already
325:03 it. Uh Salesforce is already experimenting with that. So the the data
325:05 experimenting with that. So the the data cloud product at Salesforce is starting
325:06 cloud product at Salesforce is starting to be uh usage priced and not seats
325:10 to be uh usage priced and not seats priced. And I think this is going to
325:11 priced. And I think this is going to have a big impact on just the uh kind of
325:14 have a big impact on just the uh kind of SAS economics worldwide.
325:17 SAS economics worldwide. uh and it it doesn't even matter if the
325:19 uh and it it doesn't even matter if the product itself is genai. It's really
325:20 product itself is genai. It's really about what does the person using the
325:23 about what does the person using the product can do and what can they do in
325:25 product can do and what can they do in their other time uh and whether it still
325:28 their other time uh and whether it still makes sense to price it by how many
325:29 makes sense to price it by how many employees you have or how much work you
325:32 employees you have or how much work you get done with the employees that you
325:34 get done with the employees that you have.
325:36 have. That is me and thank you very much for
325:38 That is me and thank you very much for listening and thanks for not opening the
325:40 listening and thanks for not opening the door on me.
325:42 door on me. [applause]
325:44 [applause] Our
325:55 next presenter [music] is the head of technology infrastructure engineering at
325:58 technology infrastructure engineering at Bloomberg. He's here to tell us what
326:00 Bloomberg. He's here to tell us what they learned deploying AI within
326:03 they learned deploying AI within Bloomberg's engineering organization.
326:05 Bloomberg's engineering organization. Please join me in welcoming to the stage
326:08 Please join me in welcoming to the stage Le Jang.
326:20 >> All right. I don't have a joke about the dot. I don't have a joke about the uh
326:21 dot. I don't have a joke about the uh hot dog either. So, I would just jump to
326:23 hot dog either. So, I would just jump to the topic right away. Um so, my name is
326:26 the topic right away. Um so, my name is Lei. Um I lead the uh department of
326:29 Lei. Um I lead the uh department of technology infrastructure in Bloommer.
326:31 technology infrastructure in Bloommer. So, we're basically a group of
326:32 So, we're basically a group of technologists focused on global
326:34 technologists focused on global infrastructure. Think data centers
326:36 infrastructure. Think data centers connectivities
326:38 connectivities um developer productivities. uh think
326:41 um developer productivities. uh think SCS tooling and also uh reliability
326:45 SCS tooling and also uh reliability solutions think telemetry and instant
326:48 solutions think telemetry and instant responses right so um depends on the
326:52 responses right so um depends on the audience sometimes uh you know you're
326:54 audience sometimes uh you know you're familiar with what Bloomer is sometimes
326:56 familiar with what Bloomer is sometimes you don't so I thought it might be good
326:57 you don't so I thought it might be good idea to talk a little bit about about
327:00 idea to talk a little bit about about our company
327:02 our company um so there's no better way to talk
327:03 um so there's no better way to talk about our company by sharing some
327:05 about our company by sharing some numbers I want to highlight a few
327:07 numbers I want to highlight a few numbers we have more than 9,000
327:09 numbers we have more than 9,000 engineers and most of them are software
327:11 engineers and most of them are software engineers. Uh we handle a lot of market
327:13 engineers. Uh we handle a lot of market techs uh which in the billions and 600
327:16 techs uh which in the billions and 600 billions I believe and um we also have
327:20 billions I believe and um we also have tons of folks uh focus on AI research
327:24 tons of folks uh focus on AI research and engineering. So we have a more than
327:27 and engineering. So we have a more than you know really today's 500 plus
327:29 you know really today's 500 plus employees focus on AI products uh for um
327:33 employees focus on AI products uh for um sort of our customers. So takeaway here
327:36 sort of our customers. So takeaway here is we are I guess you know building a
327:40 is we are I guess you know building a lot of software and use a lot of data to
327:44 lot of software and use a lot of data to empower our flagship product which is
327:46 empower our flagship product which is called the bloomer terminal and to
327:48 called the bloomer terminal and to really support our users to make the
327:50 really support our users to make the most important f decisions for them to
327:53 most important f decisions for them to do their job um the best. Um in the
327:58 do their job um the best. Um in the technical lens um a lot of time kind of
328:02 technical lens um a lot of time kind of to explain that we actually have one of
328:04 to explain that we actually have one of the largest private network uh in the
328:06 the largest private network uh in the whole world. We also have one of the
328:08 whole world. We also have one of the largest JavaScript codebase um in the
328:11 largest JavaScript codebase um in the world. Um we because the domain we're in
328:16 world. Um we because the domain we're in uh so terminal is really you can think
328:18 uh so terminal is really you can think of a um software that supports thousands
328:23 of a um software that supports thousands of different applications. uh we call
328:25 of different applications. uh we call them functions right um email is a
328:28 them functions right um email is a function uh news is a group of functions
328:33 function uh news is a group of functions um let's say fixed income price to yield
328:36 um let's say fixed income price to yield calculation to spread calculation is
328:37 calculation to spread calculation is another function um trading workflows is
328:41 another function um trading workflows is another group of functions so there's
328:43 another group of functions so there's many many many different type of
328:44 many many many different type of functions as you can imagine we kind of
328:46 functions as you can imagine we kind of have to utilize different technologies
328:48 have to utilize different technologies to really support those uh
328:50 to really support those uh functionalities
328:52 functionalities uh we also been
328:55 uh we also been increasingly more than use but also
328:57 increasingly more than use but also contribute to open source communities.
328:59 contribute to open source communities. um for this audience I guess I want to
329:01 um for this audience I guess I want to call out you know we kind of helped
329:03 call out you know we kind of helped creation of the case envoy AI gateways
329:07 creation of the case envoy AI gateways and among many many other things that
329:09 and among many many other things that that we deploy inhouse and support the
329:11 that we deploy inhouse and support the communities again in summary there's a
329:14 communities again in summary there's a lot of software there's a lot of data uh
329:16 lot of software there's a lot of data uh we kind of have to um figure out how to
329:20 we kind of have to um figure out how to make the best of AI tooling to support
329:22 make the best of AI tooling to support us to do our engineering work all right
329:25 us to do our engineering work all right so get to what is AI for coding Um we
329:30 so get to what is AI for coding Um we started about two years ago maybe a
329:33 started about two years ago maybe a little bit more than that. Um and as I
329:37 little bit more than that. Um and as I guess the rest of the world we look at
329:38 guess the rest of the world we look at the toolings provided and you know I
329:41 the toolings provided and you know I apologize if if your logos are not here.
329:44 apologize if if your logos are not here. Um but as you can imagine it's kind of
329:46 Um but as you can imagine it's kind of like overwhelming right there's so many
329:48 like overwhelming right there's so many things and every day there's news about
329:50 things and every day there's news about this is great this is great. Um so at
329:53 this is great this is great. Um so at the time we actually didn't know what
329:57 the time we actually didn't know what all the AI solutions can help us to
330:01 all the AI solutions can help us to uh boost our productivities as well as
330:03 uh boost our productivities as well as stability. But one thing we knew at the
330:06 stability. But one thing we knew at the time is um unless we deploy and try we
330:13 time is um unless we deploy and try we wouldn't know what's the best way to
330:14 wouldn't know what's the best way to benefit from all the awesome work and
330:16 benefit from all the awesome work and and you know a lot of folks are
330:18 and you know a lot of folks are contributing to. So at the time uh we
330:21 contributing to. So at the time uh we quickly form a team people start
330:25 quickly form a team people start kind of like release um kind a set of
330:28 kind of like release um kind a set of capabilities so that people start
330:30 capabilities so that people start iterating on um utilizing the toolings
330:33 iterating on um utilizing the toolings and then of course you know we are data
330:35 and then of course you know we are data company so kind of want to get a sense
330:37 company so kind of want to get a sense of how we measure the impact and um what
330:41 of how we measure the impact and um what we can do from the capability we provide
330:44 we can do from the capability we provide right so we look at the typical
330:47 right so we look at the typical developer productivity measurements we
330:50 developer productivity measurements we run a you survey. Uh it was very obvious
330:52 run a you survey. Uh it was very obvious that people felt like there's much
330:55 that people felt like there's much quicker uh proof concept, people roll
330:58 quicker uh proof concept, people roll out tests, uh there's a lot of one time
331:01 out tests, uh there's a lot of one time use scripts being generated and then the
331:03 use scripts being generated and then the measurements dropped actually pretty
331:05 measurements dropped actually pretty quickly when you
331:07 quickly when you go beyond all the green field type of
331:10 go beyond all the green field type of thing, right? And then then we start
331:13 thing, right? And then then we start thinking like okay so what are the
331:16 thinking like okay so what are the things that we should really be doing
331:18 things that we should really be doing using all those wonderful things so that
331:19 using all those wonderful things so that we can really make a dent um in the in
331:23 we can really make a dent um in the in the space and then at this time we also
331:27 the space and then at this time we also kind of like also be thoughtful of um
331:31 kind of like also be thoughtful of um unleash a very powerful tooling right uh
331:36 unleash a very powerful tooling right uh the the benefits is it's very fast the
331:39 the the benefits is it's very fast the challenge is also it's very fast, right?
331:42 challenge is also it's very fast, right? U for any of you who actually dealt with
331:45 U for any of you who actually dealt with hundreds of millions of lines code, you
331:49 hundreds of millions of lines code, you probably understand the system
331:51 probably understand the system complexity is a at least
331:55 complexity is a at least um exponential or at least polinomial as
331:59 um exponential or at least polinomial as function of your line of code or
332:01 function of your line of code or software assets, right? So at some point
332:03 software assets, right? So at some point you kind of want to be very careful uh
332:05 you kind of want to be very careful uh what you do with your software assets.
332:08 what you do with your software assets. And what we thought so maybe we should
332:09 And what we thought so maybe we should look at some of the basics. One idea we
332:13 look at some of the basics. One idea we had is
332:14 had is um all right so AI for coding there's
332:17 um all right so AI for coding there's narrow definition of what coding is but
332:19 narrow definition of what coding is but there's also a broader definition of
332:20 there's also a broader definition of what software engineering right and then
332:23 what software engineering right and then maybe we can also look into some of the
332:25 maybe we can also look into some of the work our developers don't really prefer
332:29 work our developers don't really prefer to do for instance
332:32 to do for instance um some men work some of the migration
332:35 um some men work some of the migration work some of the I don't know
332:37 work some of the I don't know maintenance work and stuff like that so
332:39 maintenance work and stuff like that so I want to give some examples of the
332:40 I want to give some examples of the things that we been trying and we think
332:43 things that we been trying and we think there's pretty good return on
332:44 there's pretty good return on investment.
332:46 investment. So the question we ask ourselves is how
332:48 So the question we ask ourselves is how do we evolve our codebase right the
332:50 do we evolve our codebase right the first one is all right wouldn't it be
332:53 first one is all right wouldn't it be cool uh the day you get a ticket say hey
332:56 cool uh the day you get a ticket say hey you know this piece of software is
332:57 you know this piece of software is patched and at the same time you have a
333:00 patched and at the same time you have a pull request with the fix with a patch
333:04 pull request with the fix with a patch and also with thinking why the patch
333:06 and also with thinking why the patch happened that way right so it's kind of
333:08 happened that way right so it's kind of like we're trying to uh broadly deploy
333:12 like we're trying to uh broadly deploy something called uplift agents
333:14 something called uplift agents um broadly scan through our codebase and
333:17 um broadly scan through our codebase and figure out what the patch would be
333:18 figure out what the patch would be applicable and be able to apply those
333:20 applicable and be able to apply those patch step back a little bit. We did
333:22 patch step back a little bit. We did have a reg based refraction tool. Um it
333:26 have a reg based refraction tool. Um it works to some extent but it's limited
333:28 works to some extent but it's limited right now with um LMS and other tooling.
333:31 right now with um LMS and other tooling. So we are able to uh see very much
333:34 So we are able to uh see very much better results from the um uplift
333:36 better results from the um uplift agents. So there are a few challenges in
333:38 agents. So there are a few challenges in case you also plan to deploy such
333:41 case you also plan to deploy such capabilities. The first one is
333:44 capabilities. The first one is I guess any AI or ML it would be really
333:46 I guess any AI or ML it would be really nice if there's some detistic
333:48 nice if there's some detistic verification capability. uh oftentimes
333:51 verification capability. uh oftentimes it's not so easy especially if you have
333:52 it's not so easy especially if you have test cases you don't have good llinter
333:54 test cases you don't have good llinter if you don't have good verification the
333:56 if you don't have good verification the the patch can sometimes be uh uh
334:00 the patch can sometimes be uh uh difficult to to to be applied
334:03 difficult to to to be applied and uh one thing we also realized when
334:05 and uh one thing we also realized when we deploy AI tooling is the average open
334:09 we deploy AI tooling is the average open pull requests increased and time to
334:12 pull requests increased and time to merge also increased uh because you're
334:15 merge also increased uh because you're spinning a lot of new code and then
334:17 spinning a lot of new code and then still we have to review the code and
334:19 still we have to review the code and merge the code right so time to merge
334:20 merge the code right so time to merge become a challenge sometimes. And the
334:22 become a challenge sometimes. And the last one is um I think it applies to any
334:26 last one is um I think it applies to any gen is the shift becomes what do we want
334:29 gen is the shift becomes what do we want to achieve rather than how we want to
334:30 to achieve rather than how we want to achieve. Right? So the second example
334:34 achieve. Right? So the second example that I I want to share is uh the other
334:38 that I I want to share is uh the other area that people kind of like sometimes
334:41 area that people kind of like sometimes really impact our productivity in a
334:43 really impact our productivity in a negative way or impact our stability in
334:45 negative way or impact our stability in negative way is how we handle instance.
334:47 negative way is how we handle instance. So we're trying to develop and then
334:49 So we're trying to develop and then deploy um in response agents. Um now
334:56 deploy um in response agents. Um now the importance of this is if you really
334:59 the importance of this is if you really think about GI tools it's really really
335:01 think about GI tools it's really really fast and it's also unbiased right
335:05 fast and it's also unbiased right instance it can go through your codebase
335:08 instance it can go through your codebase really quickly. It can go through your
335:10 really quickly. It can go through your telemetry system very quickly. It can go
335:12 telemetry system very quickly. It can go through your feature flags very quickly.
335:14 through your feature flags very quickly. it can go through your um I don't know
335:16 it can go through your um I don't know call trace very quickly and in I
335:19 call trace very quickly and in I unbiased lens when we do troubleshooting
335:21 unbiased lens when we do troubleshooting sometimes we have this biased views it
335:23 sometimes we have this biased views it must be this it turns out to be not the
335:25 must be this it turns out to be not the case so there's many many interesting
335:28 case so there's many many interesting benefits um by uh deploying agents from
335:32 benefits um by uh deploying agents from this perspective
335:35 this perspective and then the second question is become
335:38 and then the second question is become interesting is imagine you have
335:40 interesting is imagine you have organization of 10,000 pe um let's say
335:43 organization of 10,000 pe um let's say 9,000 people as I described a lot of
335:46 9,000 people as I described a lot of people trying to fix those problems,
335:47 people trying to fix those problems, right? And you can have 10 teams who
335:49 right? And you can have 10 teams who wants to build a pull request review
335:51 wants to build a pull request review bots. You have 20 teams who wants to
335:54 bots. You have 20 teams who wants to build a instant response agents, right?
335:57 build a instant response agents, right? They become very quickly chaotic and
336:00 They become very quickly chaotic and sometimes can have duplications.
336:03 sometimes can have duplications. So before I talk about the pay pass, I'm
336:06 So before I talk about the pay pass, I'm going to give example of the uh instance
336:09 going to give example of the uh instance response agent. So basically this is
336:11 response agent. So basically this is what you know a in response agent will
336:15 what you know a in response agent will look like. Um the key part is we're
336:17 look like. Um the key part is we're going to need to build a lot of MCP
336:18 going to need to build a lot of MCP servers to connect to the uh the metrics
336:22 servers to connect to the uh the metrics and logs dashboards you have connect to
336:24 and logs dashboards you have connect to the topology you have whether it's
336:26 the topology you have whether it's network topology or it's the um your
336:29 network topology or it's the um your service dependency topology uh your
336:31 service dependency topology uh your alarms your triggers right your SLOs's
336:35 alarms your triggers right your SLOs's and then we kind of don't want people
336:37 and then we kind of don't want people just start building MCP servers uh
336:41 just start building MCP servers uh without a pay pass so we created a pay
336:43 without a pay pass so we created a pay pass in partnership with our AI
336:45 pass in partnership with our AI organization and I will talk a little
336:48 organization and I will talk a little bit what that means.
336:50 bit what that means. Before that
336:52 Before that um I do want to explain a little bit
336:54 um I do want to explain a little bit some of the platform principles.
336:57 some of the platform principles. Some company allow teams to be have a
337:01 Some company allow teams to be have a lot of freedom as at the same time
337:03 lot of freedom as at the same time responsibility in the sense a business
337:05 responsibility in the sense a business unit can build whatever infrastructure
337:06 unit can build whatever infrastructure whatever platform.
337:08 whatever platform. um some organization
337:11 um some organization have a very very strong tight
337:13 have a very very strong tight abstraction of the service
337:14 abstraction of the service infrastructure and typically kind of
337:16 infrastructure and typically kind of have to use their platforms right so
337:19 have to use their platforms right so Bloomberg is kind of in the middle if
337:20 Bloomberg is kind of in the middle if you look at the golden ones we kind of
337:23 you look at the golden ones we kind of believe in provide a golden path
337:26 believe in provide a golden path um with enablement teams so my team is
337:30 um with enablement teams so my team is really a en enabling team and one of the
337:34 really a en enabling team and one of the guiding principle for us is we want to
337:37 guiding principle for us is we want to make easy is extremely easy to do. Uh
337:40 make easy is extremely easy to do. Uh sorry, the right thing is extremely easy
337:41 sorry, the right thing is extremely easy to do and we want to make sure the wrong
337:43 to do and we want to make sure the wrong thing is ridiculously hard to do. So
337:45 thing is ridiculously hard to do. So that's the guiding principle here.
337:48 that's the guiding principle here. Now move on. So what is the pay path
337:51 Now move on. So what is the pay path here? So the pay path is
337:53 here? So the pay path is uh we have a gateway so that teams can
337:56 uh we have a gateway so that teams can easily figure out which model works the
337:58 easily figure out which model works the best. They can do quick experiments.
338:00 best. They can do quick experiments. they can um we can have visibility of
338:03 they can um we can have visibility of what kind of models being used and we
338:05 what kind of models being used and we can also guide through the teams which
338:06 can also guide through the teams which model should is a better fit for the for
338:08 model should is a better fit for the for the problem they want to solve. uh we
338:10 the problem they want to solve. uh we have a two discovery uh basically MCP
338:13 have a two discovery uh basically MCP directory via hub so that let's say team
338:16 directory via hub so that let's say team A wants to do something they will go to
338:17 A wants to do something they will go to the hub okay someone is building MCB
338:19 the hub okay someone is building MCB server already maybe I should partner
338:21 server already maybe I should partner with them to build it together right
338:24 with them to build it together right uh tool creation and deployment is via a
338:26 uh tool creation and deployment is via a pass it's basically a um you know a
338:31 pass it's basically a um you know a standard platform service where you can
338:33 standard platform service where you can do your SLC and and we provide runtime
338:35 do your SLC and and we provide runtime environment for you as well taking care
338:37 environment for you as well taking care of all and side of things as well so it
338:40 of all and side of things as well so it really reduce friction of for for teams
338:42 really reduce friction of for for teams to to deploy um their MT MCP servers.
338:46 to to deploy um their MT MCP servers. And then this is kind of interesting is
338:49 And then this is kind of interesting is we want to make demo very easy so that
338:52 we want to make demo very easy so that or I really say proof concept very easy
338:54 or I really say proof concept very easy so that people can try have ideal
338:56 so that people can try have ideal generation uh because we believe in
338:58 generation uh because we believe in creativity come from some freedom of try
339:02 creativity come from some freedom of try different new things but we also want to
339:04 different new things but we also want to make sure the production requires some
339:06 make sure the production requires some quality control. um
339:10 quality control. um because at the end of the day stability
339:12 because at the end of the day stability and system reliability is at the core of
339:15 and system reliability is at the core of our business. So this is sort of the
339:17 our business. So this is sort of the path that we deployed um and enabled the
339:21 path that we deployed um and enabled the rest of engineering really the 9,000
339:23 rest of engineering really the 9,000 software engineers to do their job.
339:25 software engineers to do their job. Okay.
339:27 Okay. And um with all this and then we start
339:30 And um with all this and then we start maybe okay yes we got path uh path we
339:34 maybe okay yes we got path uh path we have some good ideas of how to evolve
339:36 have some good ideas of how to evolve our codebase.
339:37 our codebase. help out our people right um now this is
339:42 help out our people right um now this is where I find that
339:45 where I find that any new things any adoption of new
339:47 any new things any adoption of new things provide opportunity to leverage
339:50 things provide opportunity to leverage the strength you have and also identify
339:52 the strength you have and also identify the some of the weakness that you may
339:54 the some of the weakness that you may have so um in Bloomberg we have a
339:58 have so um in Bloomberg we have a wellestablished training program uh it's
340:00 wellestablished training program uh it's more than 20 years so there's on
340:01 more than 20 years so there's on boarding training depends on entry level
340:03 boarding training depends on entry level it depends on senior level um so we have
340:06 it depends on senior level um so we have this whole training program to prepare
340:08 this whole training program to prepare folks to before they join a team. And
340:11 folks to before they join a team. And what we did is we just incorporate AI
340:13 what we did is we just incorporate AI coding in on boarding training program
340:15 coding in on boarding training program and also show them how to best utilize
340:18 and also show them how to best utilize them with our principles and our
340:19 them with our principles and our technologies right there's a huge
340:22 technologies right there's a huge benefits here because um if any of you
340:24 benefits here because um if any of you run into the challenge of adoption
340:26 run into the challenge of adoption somehow run into a chasm right the rest
340:29 somehow run into a chasm right the rest of is not uh adopt as quick as possible.
340:33 of is not uh adopt as quick as possible. Whenever we have folks join a company,
340:35 Whenever we have folks join a company, they learn how to do things in new way.
340:36 they learn how to do things in new way. When they go back to their team, they
340:38 When they go back to their team, they were like, "Hey, why don't we do that?"
340:39 were like, "Hey, why don't we do that?" Right? They're going to challenge the
340:41 Right? They're going to challenge the some of the senior folks as well to say,
340:43 some of the senior folks as well to say, "Hey, there's a new way to do this type
340:44 "Hey, there's a new way to do this type of things. Why don't we do that?" So, we
340:45 of things. Why don't we do that?" So, we actually find this program extremely
340:47 actually find this program extremely effective uh to be a change agent for
340:50 effective uh to be a change agent for anything I want to push out.
340:53 anything I want to push out. And then bunch of results. There's a lot
340:55 And then bunch of results. There's a lot more familiarity and comfortable with
340:57 more familiarity and comfortable with the tooling. Um and also the important
341:00 the tooling. Um and also the important part is there's lot more nuance insights
341:03 part is there's lot more nuance insights of where it's at value right
341:07 of where it's at value right the second one is um often times we run
341:10 the second one is um often times we run organization to push uh new initiatives
341:14 organization to push uh new initiatives so within bloomer we have something
341:16 so within bloomer we have something called um a champ program and a guild
341:19 called um a champ program and a guild program that's basically a cross
341:20 program that's basically a cross organization or tech communities where
341:23 organization or tech communities where people have similar interest and similar
341:24 people have similar interest and similar passion they get together and get stuff
341:26 passion they get together and get stuff done so Um we had this for more than 10
341:30 done so Um we had this for more than 10 years now. Uh we sort of bootstrapped
341:34 years now. Uh we sort of bootstrapped engineer AI productivity community two
341:36 engineer AI productivity community two years back leveraged the community we
341:38 years back leveraged the community we have already and then have some few
341:41 have already and then have some few results uh because we have this pretty
341:44 results uh because we have this pretty much everyone passionate about this and
341:46 much everyone passionate about this and will be in that community. So
341:48 will be in that community. So organically it dduplicates efforts and
341:52 organically it dduplicates efforts and there's shared learning uh shared
341:53 there's shared learning uh shared learning happening
341:55 learning happening and it also helps to boost inner source
341:58 and it also helps to boost inner source contributions and then visit engineer
341:59 contributions and then visit engineer idea right often times team A wants to
342:01 idea right often times team A wants to do something team B let's say a platform
342:03 do something team B let's say a platform team have different prioritization and
342:07 team have different prioritization and the way we solve this is via inner
342:09 the way we solve this is via inner source or via visit engineer we just
342:11 source or via visit engineer we just move someone over the team work for six
342:12 move someone over the team work for six months a year get it done and then we
342:14 months a year get it done and then we can move on Um the last one is
342:18 can move on Um the last one is interesting. So our data shows
342:20 interesting. So our data shows individual contributors have a much
342:22 individual contributors have a much better stronger adoption than our
342:24 better stronger adoption than our leadership team. Now if you think about
342:27 leadership team. Now if you think about this a lot of software TLS and managers
342:32 this a lot of software TLS and managers in the age of AI they kind of don't
342:35 in the age of AI they kind of don't really have
342:38 really have um enough experience to truly guide
342:40 um enough experience to truly guide their teams to build software right so
342:44 their teams to build software right so often times the stuff that they learned
342:45 often times the stuff that they learned before might not be exactly applicable
342:48 before might not be exactly applicable it's still very valuable but there's
342:49 it's still very valuable but there's some missing piece there to make sure
342:51 some missing piece there to make sure they can continue to guide the team to
342:52 they can continue to guide the team to do the right thing. So, we're rolling
342:54 do the right thing. So, we're rolling out leadership workshops to make sure
342:56 out leadership workshops to make sure our leaders are equipped with whatever
342:58 our leaders are equipped with whatever knowledge they need to have to drive the
342:59 knowledge they need to have to drive the techn um innovation.
343:03 techn um innovation. So, um I'm going to close my part and to
343:08 So, um I'm going to close my part and to share with you what uh the part I'm I
343:10 share with you what uh the part I'm I feel most excited about. The part I feel
343:13 feel most excited about. The part I feel most excite most excited about is that
343:16 most excite most excited about is that with a lot of um creativity and
343:19 with a lot of um creativity and innovation in the geni space, it
343:22 innovation in the geni space, it actually changes the cost function of
343:25 actually changes the cost function of software engineering.
343:27 software engineering. Meaning
343:29 Meaning the trade-off decision of whether we do
343:31 the trade-off decision of whether we do something versus we don't do something
343:33 something versus we don't do something actually changed because some of the
343:35 actually changed because some of the work become a lot cheaper to do and some
343:37 work become a lot cheaper to do and some work become a lot more expensive to do.
343:40 work become a lot more expensive to do. I tend to think it is a great
343:42 I tend to think it is a great opportunity for engineers and
343:45 opportunity for engineers and engineering leaders to get back to some
343:48 engineering leaders to get back to some of the uh basic principles and sort of
343:52 of the uh basic principles and sort of ask a soul searching question. What is a
343:55 ask a soul searching question. What is a high quality soft engineering and how
343:56 high quality soft engineering and how can we use a tool for that purpose? So
343:58 can we use a tool for that purpose? So that's it. Thank you very much.
344:01 that's it. Thank you very much. [applause]
344:03 [applause] [music]
344:04 [music] Our
344:12 next speaker helped to reimagine a beloved browser from Arcadia by
344:14 beloved browser from Arcadia by rebuilding it around AI native
344:16 rebuilding it around AI native experiences.
344:18 experiences. Please welcome to the stage head of AI
344:21 Please welcome to the stage head of AI engineering at the browser company Samir
344:24 engineering at the browser company Samir Motti. [music]
344:36 Hey everyone. Oh wow. How's it going? Um, my name is Samir and I'm the head of
344:39 Um, my name is Samir and I'm the head of AI engineering at the browser company of
344:42 AI engineering at the browser company of New York. And today I'm going to talk a
344:44 New York. And today I'm going to talk a little bit about how we transitioned
344:46 little bit about how we transitioned from building ARC to DIA and the lessons
344:49 from building ARC to DIA and the lessons we learned in building an AI browser.
344:53 we learned in building an AI browser. But first, a little about the browser
344:54 But first, a little about the browser company.
344:56 company. So we started with a mission to rethink
344:59 So we started with a mission to rethink how people use the internet. At its
345:01 how people use the internet. At its core, we believe that the browser is one
345:04 core, we believe that the browser is one of the most important pieces of software
345:06 of the most important pieces of software in your life and it wasn't getting the
345:08 in your life and it wasn't getting the attention it deserved.
345:11 attention it deserved. Simply put, the way we've used a browser
345:13 Simply put, the way we've used a browser has changed over the last couple
345:15 has changed over the last couple decades, but the browser itself hadn't.
345:17 decades, but the browser itself hadn't. And think about this, we we started this
345:19 And think about this, we we started this company in 2019. Um, and so this is a
345:23 company in 2019. Um, and so this is a screen cap of Josh, our CEO, sharing a
345:26 screen cap of Josh, our CEO, sharing a little bit about our idea on the
345:28 little bit about our idea on the internet a few years ago, which we
345:30 internet a few years ago, which we endearingly called the internet
345:31 endearingly called the internet computer. So our mission has been to
345:35 computer. So our mission has been to build a browser that reflects how people
345:38 build a browser that reflects how people use the internet today and how we think
345:40 use the internet today and how we think the browser should be used tomorrow.
345:44 the browser should be used tomorrow. So through years of discovery, trial and
345:49 So through years of discovery, trial and error, and some ups and some downs, we
345:52 error, and some ups and some downs, we shipped our first browser, Arc, in 2022.
345:56 shipped our first browser, Arc, in 2022. It was a browser we felt was an
345:58 It was a browser we felt was an improvement over the browsers of that
346:00 improvement over the browsers of that time. It made the internet more
346:02 time. It made the internet more personal, more organized, and to us a
346:05 personal, more organized, and to us a little more delightful with a little
346:07 little more delightful with a little more craft.
346:08 more craft. And it was a browser that was loved by
346:10 And it was a browser that was loved by many. It still is by millions. many of
346:13 many. It still is by millions. many of whom are probably in this audience
346:14 whom are probably in this audience today. I've gotten a lot of questions
346:16 today. I've gotten a lot of questions about ARC today. Um, and it's great, but
346:22 about ARC today. Um, and it's great, but um, if we took a step back, we felt that
346:24 um, if we took a step back, we felt that ARC was still just an incremental
346:26 ARC was still just an incremental improvement over the browsers of that
346:28 improvement over the browsers of that time. And it didn't really hit the
346:30 time. And it didn't really hit the vision that we set out to create. And
346:33 vision that we set out to create. And so, uh, we kept building. And then in
346:37 so, uh, we kept building. And then in 2022, we got access to LLMs like the GPT
346:40 2022, we got access to LLMs like the GPT models. And so we started like we always
346:43 models. And so we started like we always do with prototyping. We started trying
346:46 do with prototyping. We started trying new ideas um and eventually shipped a
346:49 new ideas um and eventually shipped a few of them in ARC. But what started as
346:52 few of them in ARC. But what started as a you know a basic exploration turned
346:54 a you know a basic exploration turned into a fully formed thesis. In the
346:56 into a fully formed thesis. In the beginning of 2024, uh, our company put
346:59 beginning of 2024, uh, our company put out what we called act two, a video on
347:02 out what we called act two, a video on YouTube where we shared that thesis that
347:05 YouTube where we shared that thesis that we believe that AI is going to transform
347:08 we believe that AI is going to transform how people use the internet and in turn
347:11 how people use the internet and in turn fundamentally change the browser itself.
347:13 fundamentally change the browser itself. And so with that, we started building
347:16 And so with that, we started building again, but this time we built a new
347:18 again, but this time we built a new browser with AI speed and security in
347:22 browser with AI speed and security in mind and from the ground up. And later
347:24 mind and from the ground up. And later or sorry earlier this year we shipped
347:26 or sorry earlier this year we shipped DIA our AI native browser.
347:30 DIA our AI native browser. It allows you to have an assistant
347:32 It allows you to have an assistant alongside you in all the work you do in
347:33 alongside you in all the work you do in the browser. It gets to know you,
347:36 the browser. It gets to know you, personalizes, helps you get work done
347:38 personalizes, helps you get work done with your tabs and effectively get more
347:41 with your tabs and effectively get more work done through the apps you use. And
347:45 work done through the apps you use. And while it hasn't achieved our vision yet,
347:47 while it hasn't achieved our vision yet, we fully believe it's well on the way
347:49 we fully believe it's well on the way too.
347:57 So it is not easy to build a product. You all know that. Let alone two. The
347:59 You all know that. Let alone two. The latter of which an AI native one. We've
348:02 latter of which an AI native one. We've had a lot of years of iteration, trial
348:04 had a lot of years of iteration, trial and error. And through that we've
348:06 and error. And through that we've learned a lot. And I'm going to just
348:08 learned a lot. And I'm going to just talk about a few of those things uh here
348:11 talk about a few of those things uh here today.
348:13 today. The first I want to talk about is
348:15 The first I want to talk about is optimizing your tools and process for
348:16 optimizing your tools and process for faster iteration. From the beginning,
348:19 faster iteration. From the beginning, browser company has believed that we're
348:21 browser company has believed that we're not going to win unless we build the
348:23 not going to win unless we build the tools, the process, the platform, and
348:26 tools, the process, the platform, and the mindset to iterate, build, ship, and
348:29 the mindset to iterate, build, ship, and learn faster than everyone else. And
348:31 learn faster than everyone else. And that of course holds true today. But the
348:33 that of course holds true today. But the form it takes with AI and an AI native
348:36 form it takes with AI and an AI native product has changed.
348:38 product has changed. So even as a small company, where are we
348:41 So even as a small company, where are we investing in tooling these days? First
348:44 investing in tooling these days? First is prototyping for AI product features.
348:46 is prototyping for AI product features. Second is building and running evals.
348:49 Second is building and running evals. Third is collecting data for training
348:51 Third is collecting data for training and for evals. And uh last but
348:54 and for evals. And uh last but definitely not least automation for hill
348:56 definitely not least automation for hill climbing.
348:58 climbing. So let's start with tools. Initially uh
349:01 So let's start with tools. Initially uh as we always do we built some tools. The
349:03 as we always do we built some tools. The first was a very rudimentary uh prompt
349:06 first was a very rudimentary uh prompt editor and it was only in dev builds.
349:08 editor and it was only in dev builds. What did what did this mean for us? Well
349:10 What did what did this mean for us? Well it meant a few things. One limited
349:12 it meant a few things. One limited access as only engineers were able to
349:14 access as only engineers were able to access this. two slow iteration speeds
349:18 access this. two slow iteration speeds and three none of your personal context
349:20 and three none of your personal context and as you all know with an AI product
349:22 and as you all know with an AI product the context is what matters and what's
349:24 the context is what matters and what's gives you the feel of whether product is
349:25 gives you the feel of whether product is good or not. So we evolved and since
349:29 good or not. So we evolved and since then we built all of our tools into our
349:31 then we built all of our tools into our product. the product that we as a
349:33 product. the product that we as a company internally use every day and
349:35 company internally use every day and that includes the prompts, the tools,
349:37 that includes the prompts, the tools, the context, the models, every
349:39 the context, the models, every parameter. Um, which has not only
349:41 parameter. Um, which has not only allowed us to 10x our speed of ideating,
349:44 allowed us to 10x our speed of ideating, iterating and refining our products, but
349:46 iterating and refining our products, but has also widened the number of people
349:48 has also widened the number of people who can access and iterate on our
349:49 who can access and iterate on our products themselves from our CEO to our
349:52 products themselves from our CEO to our newest hire can ideate and create a new
349:54 newest hire can ideate and create a new product in DIA and also refine an
349:56 product in DIA and also refine an existing one all with their full
349:58 existing one all with their full context.
350:00 context. And this holds true with all of our
350:02 And this holds true with all of our major product protocols. We have tools
350:04 major product protocols. We have tools for optimizing our memory knowledge
350:05 for optimizing our memory knowledge graph which all of us use and we have
350:08 graph which all of us use and we have tools for creating iterating on our
350:10 tools for creating iterating on our computer use mechanism. We actually
350:12 computer use mechanism. We actually tried tens of different types of
350:14 tried tens of different types of computer use strategies before landing
350:16 computer use strategies before landing on one before even building it into the
350:18 on one before even building it into the product itself.
350:21 product itself. And I'll say and I'll end this part with
350:24 And I'll say and I'll end this part with uh it actually is a lot of fun. People
350:26 uh it actually is a lot of fun. People don't talk about that a lot but uh
350:28 don't talk about that a lot but uh actually building these tools into our
350:30 actually building these tools into our product has enabled so much creativity.
350:32 product has enabled so much creativity. It has enabled our PMs, our designers,
350:35 It has enabled our PMs, our designers, uh customer service and strategy and ops
350:37 uh customer service and strategy and ops to try out new ideas that are tailored
350:39 to try out new ideas that are tailored to their use cases. And that ultimately
350:42 to their use cases. And that ultimately is what we're trying to do.
350:44 is what we're trying to do. The next thing I want to talk about is
350:46 The next thing I want to talk about is how we evolve and optimize our prompts
350:49 how we evolve and optimize our prompts through a mechanism called Jeepa. This
350:51 through a mechanism called Jeepa. This for us is very nent but an important
350:54 for us is very nent but an important learning nevertheless.
350:56 learning nevertheless. How we hill climb and refine our AI
350:58 How we hill climb and refine our AI products is just as important as
351:00 products is just as important as ideating them in the first place. So
351:02 ideating them in the first place. So we're investing in mechanisms to help
351:04 we're investing in mechanisms to help with this to enable faster hill climbing
351:06 with this to enable faster hill climbing and one of those being Jeepa and this is
351:08 and one of those being Jeepa and this is based on a paper from earlier this year
351:10 based on a paper from earlier this year from a few smart folks.
351:13 from a few smart folks. So the key motivation here is simple.
351:15 So the key motivation here is simple. It's a sample efficient way to improve a
351:16 It's a sample efficient way to improve a complex LLM system without having to
351:19 complex LLM system without having to leverage RL or other fine-tuning
351:20 leverage RL or other fine-tuning techniques. And for us as a small
351:23 techniques. And for us as a small company that's hugely critical.
351:25 company that's hugely critical. And how it works is you're able to seed
351:27 And how it works is you're able to seed the system with a set of prompts, then
351:29 the system with a set of prompts, then execute it across a set of tasks and
351:31 execute it across a set of tasks and score them. Then leverage a mechanism
351:34 score them. Then leverage a mechanism called PA selection to select the best
351:36 called PA selection to select the best ones. And then leverage an LLM on top of
351:38 ones. And then leverage an LLM on top of that to reflect on what went well and
351:40 that to reflect on what went well and what didn't and then generate new
351:42 what didn't and then generate new prompts and then repeat with the key
351:44 prompts and then repeat with the key innovations here being around that
351:46 innovations here being around that reflective prompt mutation technique.
351:48 reflective prompt mutation technique. the selection process which allows you
351:50 the selection process which allows you to explore more of the space of
351:51 to explore more of the space of prompting rather than one avenue and the
351:54 prompting rather than one avenue and the ability to tune text and not weights.
351:58 ability to tune text and not weights. And here's a modest uh example of this
352:01 And here's a modest uh example of this at work for us. You know, you can
352:03 at work for us. You know, you can provide it a very simple uh a simple
352:06 provide it a very simple uh a simple simple prompt and run it through JPA and
352:08 simple prompt and run it through JPA and it's able to optimize it uh along the
352:10 it's able to optimize it uh along the metrics and scoring mechanisms that we
352:13 metrics and scoring mechanisms that we uh created to refine that prompt.
352:20 And so if I take a step back and talk about kind of how we build uh for
352:23 about kind of how we build uh for certain types of features, I would buck
352:25 certain types of features, I would buck it into a couple different phases. The
352:27 it into a couple different phases. The first is that prototyping and ideation
352:29 first is that prototyping and ideation phase where we have widened the breadth
352:31 phase where we have widened the breadth of number of ideas at the top of the
352:33 of number of ideas at the top of the funnel um and lower the threshold on who
352:35 funnel um and lower the threshold on who can build them and how. And so we try
352:38 can build them and how. And so we try out a bunch of ideas every week, every
352:39 out a bunch of ideas every week, every day from all types of people and we dog
352:41 day from all types of people and we dog food those. And if we feel like there's
352:43 food those. And if we feel like there's actually real utility there, it's
352:45 actually real utility there, it's solving a real problem for us and there
352:48 solving a real problem for us and there is a path towards actually hitting the
352:50 is a path towards actually hitting the quality threshold that we believe we
352:51 quality threshold that we believe we need to hit, then we'll move on to this
352:53 need to hit, then we'll move on to this next phase where we collect and refine
352:55 next phase where we collect and refine eval to clarify product requirements and
352:58 eval to clarify product requirements and then hill climb through code through
353:00 then hill climb through code through prompting and automated techniques like
353:02 prompting and automated techniques like Jeepa and then dog food as we always do
353:04 Jeepa and then dog food as we always do internally and then chip.
353:08 internally and then chip. And I do want to kind of double down on
353:10 And I do want to kind of double down on these phases. The ideation phase is
353:13 these phases. The ideation phase is extremely important just as much as that
353:15 extremely important just as much as that refinement phase.
353:21 And our goal is to enable faster ideation and a more efficient path to
353:23 ideation and a more efficient path to shipping because with all these AI
353:25 shipping because with all these AI advancements every week, new
353:27 advancements every week, new possibilities are unlocked in DIA. And
353:29 possibilities are unlocked in DIA. And it's up to us as a browser, as a product
353:32 it's up to us as a browser, as a product to get as many at bats with these new
353:34 to get as many at bats with these new ideas and try out as many of them and
353:36 ideas and try out as many of them and explore as many of them as possible. At
353:38 explore as many of them as possible. At the same time though not underestimating
353:40 the same time though not underestimating the path it takes to ship some of these
353:42 the path it takes to ship some of these ideas to productions as a high quality
353:44 ideas to productions as a high quality experience.
353:50 Next uh I want to talk about treating model behavior as a craft and
353:52 model behavior as a craft and discipline.
353:53 discipline. So what is model behavior to us? It's
353:56 So what is model behavior to us? It's the function that defines evaluates and
353:58 the function that defines evaluates and ships the desired behavior models. It's
354:01 ships the desired behavior models. It's turning principles into product
354:02 turning principles into product requirements, prompts and evals, and
354:05 requirements, prompts and evals, and ultimately shaping the behavior and the
354:06 ultimately shaping the behavior and the personality of our LLM products and
354:09 personality of our LLM products and ultimately for us our DIA assistant.
354:13 ultimately for us our DIA assistant. So, I'd buck it into a few different
354:14 So, I'd buck it into a few different areas. First, it's that behavior design
354:16 areas. First, it's that behavior design defining the product experience we
354:18 defining the product experience we actually want, the style, the tone, the
354:20 actually want, the style, the tone, the shape of responses in some cases. Then,
354:23 shape of responses in some cases. Then, it's collecting that data for
354:24 it's collecting that data for measurement and training, clarifying
354:26 measurement and training, clarifying those product requirements through eval.
354:29 those product requirements through eval. And last but not least, it's the model
354:30 And last but not least, it's the model steering. It's the building of the
354:32 steering. It's the building of the product itself. It's the prompting. It's
354:34 product itself. It's the prompting. It's the model selection. It's defining the
354:35 the model selection. It's defining the what's in the context window, the
354:37 what's in the context window, the parameters, etc. Um, and so much more.
354:41 parameters, etc. Um, and so much more. And to us, that that process is
354:44 And to us, that that process is iterative, very iterative. We build,
354:47 iterative, very iterative. We build, refine, we create evals, and then we
354:49 refine, we create evals, and then we ship, and then we collect more feedback
354:52 ship, and then we collect more feedback and feed that into our iterative
354:54 and feed that into our iterative building process. That could be internal
354:56 building process. That could be internal feedback, and that could be also uh
354:57 feedback, and that could be also uh external feedback.
354:59 external feedback. And so I move on for a second. One
355:02 And so I move on for a second. One analogy we've thought about uh is for
355:04 analogy we've thought about uh is for model behaviors that to product design
355:07 model behaviors that to product design through the evolution of the internet.
355:09 through the evolution of the internet. At first websites were functional. They
355:11 At first websites were functional. They got the job done. But over time that
355:14 got the job done. But over time that evolved as we tried to achieve more on
355:16 evolved as we tried to achieve more on the internet and technology advanced. Uh
355:19 the internet and technology advanced. Uh product design and the craft of the
355:21 product design and the craft of the internet itself grew as well as well as
355:23 internet itself grew as well as well as the complexity.
355:25 the complexity. And so what might that be for model
355:27 And so what might that be for model behavior? Well, at first it was
355:29 behavior? Well, at first it was functional. We had prompts, we had
355:31 functional. We had prompts, we had evals, we had instructions in and output
355:33 evals, we had instructions in and output out. Now we frame it through agent
355:35 out. Now we frame it through agent behaviors. It's goal- directed
355:37 behaviors. It's goal- directed reasoning, the shaping of autonomous
355:39 reasoning, the shaping of autonomous tasks, selfcorrection in learning, and
355:42 tasks, selfcorrection in learning, and even shaping the personality of the LLM
355:44 even shaping the personality of the LLM models themselves.
355:46 models themselves. And so, what might the future hold? I'm
355:48 And so, what might the future hold? I'm excited to see. But what we believe is
355:51 excited to see. But what we believe is that we are in the early days of
355:52 that we are in the early days of building AI products and model behavior
355:55 building AI products and model behavior will continue to evolve and into a
355:57 will continue to evolve and into a specialized and prevalent function of
355:59 specialized and prevalent function of its own even at product companies.
356:02 its own even at product companies. And the last thing I'll leave you with
356:03 And the last thing I'll leave you with here is that the best people for it
356:05 here is that the best people for it might just surprise you. One of my
356:08 might just surprise you. One of my favorite stories about building DIA
356:10 favorite stories about building DIA these last couple years has been uh the
356:12 these last couple years has been uh the formation of actually this model
356:14 formation of actually this model behavior team. As I mentioned earlier,
356:16 behavior team. As I mentioned earlier, uh engineers were writing the prompts at
356:17 uh engineers were writing the prompts at first and then we built these prompt
356:19 first and then we built these prompt tools to enable more people at the
356:20 tools to enable more people at the company to actually prompt and iterate.
356:23 company to actually prompt and iterate. And there was a person on our team on
356:24 And there was a person on our team on the strategy and ops team. And he
356:26 the strategy and ops team. And he actually leveraged these prompt tools
356:28 actually leveraged these prompt tools one weekend to rewrite all our prompts.
356:30 one weekend to rewrite all our prompts. And he came in on a Monday morning and
356:32 And he came in on a Monday morning and dropped a Loom video sharing what he
356:35 dropped a Loom video sharing what he did, how he did it, and why. And a set
356:37 did, how he did it, and why. And a set of prompts. And those prompts alone
356:39 of prompts. And those prompts alone unlocked a new level of capability and
356:42 unlocked a new level of capability and quality and experience in our product.
356:44 quality and experience in our product. And consequentially uh it was the
356:46 And consequentially uh it was the formation of our model behavior team.
356:49 formation of our model behavior team. And so one thing I'd emphasize to you
356:51 And so one thing I'd emphasize to you all is to think about who are those
356:53 all is to think about who are those people at the company agnostic of their
356:55 people at the company agnostic of their role who can help shape your product and
356:57 role who can help shape your product and help shape and steer the model itself.
356:59 help shape and steer the model itself. It might not be an engineer or it might
357:01 It might not be an engineer or it might be it could also be someone on the
357:02 be it could also be someone on the strategy and ops team.
357:06 strategy and ops team. Next, I want to talk about AI security
357:08 Next, I want to talk about AI security as an emergent property of product
357:10 as an emergent property of product building. And today, I'm going to focus
357:11 building. And today, I'm going to focus specifically on prompt injections.
357:14 specifically on prompt injections. So, what is a prompt injection? Well,
357:17 So, what is a prompt injection? Well, it's a prompt attack in which a third
357:19 it's a prompt attack in which a third party can override the instructions of
357:20 party can override the instructions of an LLM to cause harm. That might be data
357:23 an LLM to cause harm. That might be data exfiltration, the execution of malicious
357:25 exfiltration, the execution of malicious commands, or ignoring safety rules.
357:30 commands, or ignoring safety rules. And so here's an example in which you
357:32 And so here's an example in which you give uh the context of a website to an
357:35 give uh the context of a website to an LLM and instruct it to summarize it.
357:38 LLM and instruct it to summarize it. Little did you know that there was a
357:39 Little did you know that there was a prompt injection hidden in that
357:40 prompt injection hidden in that website's uh HTML. So instead of
357:44 website's uh HTML. So instead of actually summarizing the web page, the
357:45 actually summarizing the web page, the LM actually gets directed to open a new
357:48 LM actually gets directed to open a new website, extracting your personal
357:49 website, extracting your personal information and embedding it as get
357:51 information and embedding it as get parameters in the website's URL,
357:53 parameters in the website's URL, effectively excfiltrating that data.
357:56 effectively excfiltrating that data. So, as a browser, prompt injections are
357:59 So, as a browser, prompt injections are extremely crucial for us to prevent.
358:02 extremely crucial for us to prevent. They're critical to prevent
358:04 They're critical to prevent because browsers sit at the middle of
358:07 because browsers sit at the middle of what we can call a lethal trifecta.
358:10 what we can call a lethal trifecta. It has access to your private data. It
358:12 It has access to your private data. It has exposure to untrusted content and it
358:15 has exposure to untrusted content and it has the ability to externally
358:17 has the ability to externally communicate. And for us, that means
358:19 communicate. And for us, that means opening websites, sending emails,
358:21 opening websites, sending emails, scheduling events, etc. So, how to
358:24 scheduling events, etc. So, how to prevent this? Well, there's some
358:27 prevent this? Well, there's some technical strategies we can try. First
358:29 technical strategies we can try. First is wrapping that untrusted context in
358:31 is wrapping that untrusted context in tax. You can tell the LM, listen to
358:33 tax. You can tell the LM, listen to these instructions around these tags and
358:35 these instructions around these tags and don't listen to the content around these
358:36 don't listen to the content around these tags. But this is easily escapable and
358:40 tags. But this is easily escapable and quite trivy, an attacker could still uh
358:43 quite trivy, an attacker could still uh leverage a prompt injection on your
358:45 leverage a prompt injection on your browser.
358:46 browser. Well, another solution we could try is
358:48 Well, another solution we could try is separating that data and that
358:50 separating that data and that instructions. We can assign uh the
358:54 instructions. We can assign uh the operating instructions to a system role
358:56 operating instructions to a system role and we can assign a user role for the
358:58 and we can assign a user role for the content of the third party and even
359:00 content of the third party and even layer on randomly generated tags to wrap
359:02 layer on randomly generated tags to wrap that user content to be extra sure that
359:05 that user content to be extra sure that the LM listens to the instructions and
359:07 the LM listens to the instructions and not the content. And while this can
359:09 not the content. And while this can help, there are no guarantees and prompt
359:12 help, there are no guarantees and prompt injections will still happen.
359:16 injections will still happen. So what do we do? Well, it's on us to
359:19 So what do we do? Well, it's on us to design a product with that in mind. We
359:22 design a product with that in mind. We have to blend technology approaches and
359:24 have to blend technology approaches and user experience and design into a
359:26 user experience and design into a cohesive story that actually builds them
359:29 cohesive story that actually builds them from the ground up and solves it
359:30 from the ground up and solves it together.
359:32 together. So, what that might what that excuse me
359:34 So, what that might what that excuse me what might that be for a feature in DIA?
359:37 what might that be for a feature in DIA? Well, let's take the autofill tool in
359:39 Well, let's take the autofill tool in DIA. The autofill tool allows you to
359:41 DIA. The autofill tool allows you to leverage an LLM with context, memory,
359:44 leverage an LLM with context, memory, and your details to fill forms on the
359:46 and your details to fill forms on the internet. It's extremely powerful, but
359:49 internet. It's extremely powerful, but as you can imagine, it has some
359:51 as you can imagine, it has some vulnerabilities. A prompt injection here
359:53 vulnerabilities. A prompt injection here could extract your data and put it on a
359:56 could extract your data and put it on a form, and once it's on that form, it's
359:58 form, and once it's on that form, it's out of your hands.
359:59 out of your hands. So, we try to build with that in mind.
360:02 So, we try to build with that in mind. In this case, before the form is written
360:04 In this case, before the form is written to, we actually let the user read and
360:06 to, we actually let the user read and confirm that data in plain text. This
360:09 confirm that data in plain text. This doesn't prevent a prompt injection, but
360:11 doesn't prevent a prompt injection, but it gives the user control, awareness,
360:13 it gives the user control, awareness, and trust in what is happening. And this
360:16 and trust in what is happening. And this is a framing we carry throughout our
360:18 is a framing we carry throughout our product and how we build every single
360:20 product and how we build every single feature. So here are some examples.
360:22 feature. So here are some examples. Scheduling events in DIA, we have a
360:24 Scheduling events in DIA, we have a similar confirmation step. Writing
360:27 similar confirmation step. Writing emails India, we also have a similar
360:30 emails India, we also have a similar confirmation step.
360:35 So I've talked about three different things here today. First is optimizing
360:37 things here today. First is optimizing your tools and process for fast
360:39 your tools and process for fast iteration. Second, treating model
360:41 iteration. Second, treating model behavior as a craft and discipline. And
360:44 behavior as a craft and discipline. And third, AI security as an emergent
360:46 third, AI security as an emergent property of building products.
360:50 property of building products. But uh the last thing I want to leave
360:52 But uh the last thing I want to leave you with, when we started on this
360:54 you with, when we started on this journey to building DIA, we recognized a
360:56 journey to building DIA, we recognized a technology shift and we sought to evolve
360:59 technology shift and we sought to evolve our product of ARC. We initially came at
361:01 our product of ARC. We initially came at it from a hey, how can we leverage AI to
361:04 it from a hey, how can we leverage AI to make ARC better, make the browser
361:06 make ARC better, make the browser better. But what we quickly learned and
361:09 better. But what we quickly learned and adapted to was that it wasn't just a
361:11 adapted to was that it wasn't just a product evolution. It was a company one
361:14 product evolution. It was a company one and today I shared a glimpse of that.
361:16 and today I shared a glimpse of that. How we build and how it's changed a team
361:18 How we build and how it's changed a team we've literally created around this and
361:21 we've literally created around this and how we think about security for AI
361:22 how we think about security for AI products. But really it's so much more.
361:24 products. But really it's so much more. It goes beyond that. It's how we train
361:26 It goes beyond that. It's how we train everyone here. It's how we hire. It's
361:28 everyone here. It's how we hire. It's how we communicate. It's how we
361:29 how we communicate. It's how we collaborate and so much more. And if
361:32 collaborate and so much more. And if there's one thing I'll leave you all
361:33 there's one thing I'll leave you all with, if there's one thing we've learned
361:35 with, if there's one thing we've learned over the last couple years, it's that
361:37 over the last couple years, it's that when when you recognize that technology
361:39 when when you recognize that technology shift, you have to embrace it. And you
361:41 shift, you have to embrace it. And you have to embrace it with conviction.
361:44 have to embrace it with conviction. Thank you.
361:46 Thank you. [applause]
361:56 Our next speaker [music] draws on over 20 years in enterprise
361:59 draws on over 20 years in enterprise developer experience to ask what will
362:02 developer experience to ask what will still matter when AI coding agents are
362:05 still matter when AI coding agents are everywhere. Please welcome to the stage
362:08 everywhere. Please welcome to the stage executive distinguished engineer at
362:10 executive distinguished engineer at Capital 1, Max Canet Alexander.
362:15 Capital 1, Max Canet Alexander. [music]
362:16 [music] [applause]
362:25 Hey, how's everybody doing? Still awake? Okay, great. So
362:28 Okay, great. So like the robot voice said, I have been
362:29 like the robot voice said, I have been doing developer experience for a very
362:31 doing developer experience for a very long time and I have never in my life
362:34 long time and I have never in my life seen anything like the last 12 months.
362:38 seen anything like the last 12 months. The you know about every two to three
362:39 The you know about every two to three weeks software engineers been making
362:41 weeks software engineers been making this face on the screen.
362:44 this face on the screen. Okay. And if you work in developer
362:46 Okay. And if you work in developer experience the problem is even worse.
362:49 experience the problem is even worse. You're like this guy on the screen every
362:51 You're like this guy on the screen every few weeks. You're like, "Oh yeah, yeah,
362:53 few weeks. You're like, "Oh yeah, yeah, yeah, yeah, yeah. Here's the new
362:55 yeah, yeah, yeah. Here's the new hotness." And then somebody else comes
362:56 hotness." And then somebody else comes up and they're like, "Well, can I use
362:58 up and they're like, "Well, can I use the the new new hotness?" And you know,
362:59 the the new new hotness?" And you know, people have been doing that for years.
363:01 people have been doing that for years. I've been working in developer
363:02 I've been working in developer experience for a long time. Everybody
363:03 experience for a long time. Everybody always shows up and they're like, "Oh,
363:05 always shows up and they're like, "Oh, can I use this tool that came out
363:06 can I use this tool that came out yesterday?" And you're like, "No, of
363:08 yesterday?" And you're like, "No, of course not." And now we're like, "Uh,
363:10 course not." And now we're like, "Uh, maybe yes." Right? And what this leads
363:13 maybe yes." Right? And what this leads to overall is the future is super hard
363:16 to overall is the future is super hard to predict right now. So
363:20 to predict right now. So a I think a lot of people a lot of CTO's
363:22 a I think a lot of people a lot of CTO's a lot of people who work in developer
363:24 a lot of people who work in developer experience to people who care about
363:25 experience to people who care about helping developers are asking themselves
363:26 helping developers are asking themselves this question
363:28 this question are all of my investments going to go to
363:30 are all of my investments going to go to waste like what could I invest in now
363:34 waste like what could I invest in now that if I look back at the end of 2026
363:36 that if I look back at the end of 2026 I'll be like I sure am glad that I
363:38 I'll be like I sure am glad that I invested in that for my developers and I
363:41 invested in that for my developers and I think a lot of people have just decided
363:42 think a lot of people have just decided well I don't know I guess it's just
363:43 well I don't know I guess it's just coding agents and I guess they'll fix
363:45 coding agents and I guess they'll fix every single thing about my entire
363:46 every single thing about my entire company by themselves.
363:49 company by themselves. they're amazing.
364:02 The first one is how can we use our understanding of the principles of
364:03 understanding of the principles of developer experience to know what's
364:05 developer experience to know what's going to be valuable no matter what
364:07 going to be valuable no matter what happens. Okay. And what do we need to do
364:11 happens. Okay. And what do we need to do to get the maximum possible value from
364:14 to get the maximum possible value from AI agents? like what would we need to
364:17 AI agents? like what would we need to fix at all levels outside of the agents
364:20 fix at all levels outside of the agents in order to make sure that the agents
364:22 in order to make sure that the agents and our developers can be as effective
364:24 and our developers can be as effective as possible. And this isn't like a minor
364:27 as possible. And this isn't like a minor question. These are the sorts of things
364:28 question. These are the sorts of things that could make or break you as a
364:30 that could make or break you as a software business going into the future.
364:34 software business going into the future. So let's talk about what some of those
364:36 So let's talk about what some of those things are that I think are no regrets
364:38 things are that I think are no regrets investments that will help both our
364:39 investments that will help both our human beings and our agents. So the in
364:43 human beings and our agents. So the in general one of the framings that I think
364:45 general one of the framings that I think about here is things that are inputs to
364:47 about here is things that are inputs to the agents things around the agents that
364:49 the agents things around the agents that help them be more effective. And one of
364:51 help them be more effective. And one of the biggest one is the development
364:52 the biggest one is the development environment. What are the tools that you
364:55 environment. What are the tools that you use to build your code? What package
364:57 use to build your code? What package manager do you use? What llinters do you
364:59 manager do you use? What llinters do you run? Those sorts of things. You want to
365:02 run? Those sorts of things. You want to use the industry standard tools in the
365:05 use the industry standard tools in the same way the industry uses them and
365:08 same way the industry uses them and ideally in the same way the outside
365:10 ideally in the same way the outside world uses them because that's what's in
365:12 world uses them because that's what's in the training set. And look, yes, you can
365:14 the training set. And look, yes, you can write instruction files and you can try
365:16 write instruction files and you can try your best to try to fight the training
365:17 your best to try to fight the training set and make it do something unnatural
365:19 set and make it do something unnatural and unholy with some crazy amalgamation
365:22 and unholy with some crazy amalgamation that or modification that you've made of
365:24 that or modification that you've made of those developer tools. Like you might be
365:26 those developer tools. Like you might be you invented your own package manager.
365:28 you invented your own package manager. You probably should not do that. you
365:29 You probably should not do that. you probably should undo that and try to go
365:31 probably should undo that and try to go back to the way the outside world does
365:33 back to the way the outside world does software development because then you
365:34 software development because then you are not fighting the training set. Um,
365:38 are not fighting the training set. Um, and also it means it means things like
365:41 and also it means it means things like you can't use obscure programming
365:42 you can't use obscure programming languages anymore. Look, I'm a
365:43 languages anymore. Look, I'm a programming language nerd. I love those
365:46 programming language nerd. I love those things. I do not use them anymore in my
365:49 things. I do not use them anymore in my day-to-day agentic software development
365:51 day-to-day agentic software development work. as an enthusiast, I do come
365:52 work. as an enthusiast, I do come sometimes go and I code on, you know,
365:54 sometimes go and I code on, you know, frontline uh software engineering
365:56 frontline uh software engineering languages, but not in my like real work
365:59 languages, but not in my like real work anymore.
366:01 anymore. So, what people ask me sometimes, does
366:03 So, what people ask me sometimes, does that mean like we're never going to ever
366:04 that mean like we're never going to ever have any new tools again because we're
366:05 have any new tools again because we're always going to be dependent on the
366:06 always going to be dependent on the tools that the model already knows?
366:07 tools that the model already knows? Probably not because like I said,
366:09 Probably not because like I said, there's still going to be enthusiasts.
366:10 there's still going to be enthusiasts. And also, but like I would like to make
366:13 And also, but like I would like to make a point. The thing that I'm talking
366:14 a point. The thing that I'm talking about has always been a real problem.
366:17 about has always been a real problem. Like there's always some developer at
366:18 Like there's always some developer at the company has always come up to you
366:20 the company has always come up to you and be like, "Can I use this technology
366:22 and be like, "Can I use this technology that came out last week and has never
366:23 that came out last week and has never been vetted in an enterprise to run my
366:25 been vetted in an enterprise to run my like 100,000 queries per second service
366:27 like 100,000 queries per second service that serves a billion users?" And I'm
366:29 that serves a billion users?" And I'm like, "No, you can't do that now and you
366:32 like, "No, you can't do that now and you can't do that yesterday. It's still the
366:33 can't do that yesterday. It's still the same."
366:35 same." Uh, another one is
366:38 Uh, another one is in order to take action today, agents
366:40 in order to take action today, agents need either a CLI or an API to take that
366:42 need either a CLI or an API to take that action. Yes, there's computer use. Yes,
366:44 action. Yes, there's computer use. Yes, you can make them write playright and
366:45 you can make them write playright and orchestrate a browser. But why? Like if
366:49 orchestrate a browser. But why? Like if you could have a CLI that the agent can
366:51 you could have a CLI that the agent can just execute natively in its normal
366:53 just execute natively in its normal format that it understands the most
366:54 format that it understands the most natively, which is text interaction, why
366:57 natively, which is text interaction, why why would you choose to do something
366:59 why would you choose to do something else, especially in an area where
367:01 else, especially in an area where accuracy matters dramatically and where
367:03 accuracy matters dramatically and where that accuracy dramatically influences
367:05 that accuracy dramatically influences the effectiveness of the agent?
367:08 the effectiveness of the agent? One of the most important things that
367:10 One of the most important things that you can invest in is validation. So any
367:13 you can invest in is validation. So any kind of objective deterministic
367:14 kind of objective deterministic validation that you give an agent will
367:16 validation that you give an agent will increase its capabilities. So yes,
367:18 increase its capabilities. So yes, sometimes you can create this with the
367:20 sometimes you can create this with the agent. I'm going to talk about that in a
367:21 agent. I'm going to talk about that in a second. But it doesn't really matter how
367:23 second. But it doesn't really matter how you get it or where you get it from. You
367:24 you get it or where you get it from. You just need to think about how do I have
367:27 just need to think about how do I have high quality validation that produces
367:30 high quality validation that produces very clear error messages. This is the
367:33 very clear error messages. This is the same thing you always wanted by the way
367:35 same thing you always wanted by the way in your tests and your llinters, right?
367:37 in your tests and your llinters, right? But it's even more important for the
367:38 But it's even more important for the agents because the agents cannot divine
367:40 agents because the agents cannot divine what you mean by 500 internal error with
367:43 what you mean by 500 internal error with no other message, right? Like they need
367:47 no other message, right? Like they need a way to actually understand what the
367:49 a way to actually understand what the problem was and what they should do
367:50 problem was and what they should do about it.
367:52 about it. However, there is a problem here. So,
367:54 However, there is a problem here. So, you know, you think, okay, I'll just get
367:56 you know, you think, okay, I'll just get the agent to do it. They'll write my
367:57 the agent to do it. They'll write my tests and then I'll be fine. But have
367:59 tests and then I'll be fine. But have you ever asked an agent to write a test
368:01 you ever asked an agent to write a test on a completely untestable codebase?
368:04 on a completely untestable codebase? They do kind of what it's like is
368:06 They do kind of what it's like is happening on the screen here. They will
368:09 happening on the screen here. They will write a test that says, "Hey boss, I
368:11 write a test that says, "Hey boss, I pushed the button and the button pushed
368:13 pushed the button and the button pushed successfully. Test passed."
368:16 successfully. Test passed." Um, like so there is a sort of a a
368:20 Um, like so there is a sort of a a larger problem that a lot of enterprises
368:23 larger problem that a lot of enterprises have in particular, which is there's a
368:25 have in particular, which is there's a lot of legacy code bases that either
368:26 lot of legacy code bases that either were not designed with testing in mind
368:28 were not designed with testing in mind or were not designed with like high
368:30 or were not designed with like high quality testing in mind. like maybe they
368:32 quality testing in mind. like maybe they just have like some very high level
368:33 just have like some very high level endto-end tests and they don't have like
368:35 endto-end tests and they don't have like great unit tests that the agent can
368:36 great unit tests that the agent can actually run iteratively in a loop and
368:38 actually run iteratively in a loop and that will produce actionable and useful
368:39 that will produce actionable and useful errors.
368:41 errors. So another thing that you can invest in
368:44 So another thing that you can invest in that will can be perennially valuable
368:45 that will can be perennially valuable both to humans and to agents is
368:48 both to humans and to agents is structure of your systems and structure
368:50 structure of your systems and structure of your code bases. Agents work better
368:52 of your code bases. Agents work better on better structured code bases. And for
368:55 on better structured code bases. And for those of you who have never worked in a
368:57 those of you who have never worked in a large enterprise and seen very old
368:59 large enterprise and seen very old legacy codebases, you might not be
369:00 legacy codebases, you might not be familiar with what I'm talking about.
369:01 familiar with what I'm talking about. But for those who have, you know that
369:03 But for those who have, you know that there are code bases that no human being
369:05 there are code bases that no human being could reason about in any kind of
369:06 could reason about in any kind of successful way because the information
369:08 successful way because the information necessary to reason about that codebase
369:10 necessary to reason about that codebase isn't in the codebase and the structure
369:13 isn't in the codebase and the structure of the codebase makes the codebase
369:14 of the codebase makes the codebase impossible to reason about looking at
369:17 impossible to reason about looking at it. Yes, you the agents can do the same
369:20 it. Yes, you the agents can do the same thing human beings do in that case,
369:22 thing human beings do in that case, which is sort of go through an iterative
369:23 which is sort of go through an iterative process of trying to run the thing and
369:25 process of trying to run the thing and see what breaks, but that decreases the
369:28 see what breaks, but that decreases the capability of the agent so much compared
369:30 capability of the agent so much compared to just it having the ability to just
369:32 to just it having the ability to just look at the code and reason about it the
369:34 look at the code and reason about it the exact same way the human capability is
369:36 exact same way the human capability is decreased. And of course, like I said,
369:40 decreased. And of course, like I said, that all has to lead up to being
369:42 that all has to lead up to being testable. If the only thing I can do
369:44 testable. If the only thing I can do with your codebase is push a button and
369:46 with your codebase is push a button and know if the button pushed successfully
369:47 know if the button pushed successfully and not see the explosion behind it,
369:50 and not see the explosion behind it, like if if if there's no way to get that
369:52 like if if if there's no way to get that information out of the codebase from the
369:53 information out of the codebase from the test, then the agent's not going to be
369:55 test, then the agent's not going to be able to do that either unless it it goes
369:57 able to do that either unless it it goes and refactors it or you go and refactor
369:58 and refactors it or you go and refactor it first.
370:03 And you know, there's a lot of talk about documentation. There's always been
370:04 about documentation. There's always been a lot of talk about documentation in the
370:06 a lot of talk about documentation in the field of developer experience, in the
370:08 field of developer experience, in the field of improving things. And there's
370:10 field of improving things. And there's people go back and forth about it.
370:12 people go back and forth about it. Engineers hate writing documentation.
370:14 Engineers hate writing documentation. Uh, and the value of it is often debated
370:17 Uh, and the value of it is often debated like what kind of documentation you want
370:19 like what kind of documentation you want or don't want, do or don't want. But
370:21 or don't want, do or don't want. But here's the thing. The agent, let's just
370:24 here's the thing. The agent, let's just take this in the context of the agent.
370:25 take this in the context of the agent. The agent cannot read your mind. It did
370:28 The agent cannot read your mind. It did not attend your verbal meeting that had
370:31 not attend your verbal meeting that had no transcript.
370:33 no transcript. Okay? Now there are many companies in
370:36 Okay? Now there are many companies in the world that depend on that sort of
370:38 the world that depend on that sort of tribal knowledge to understand what the
370:41 tribal knowledge to understand what the requirements are for the system. Why the
370:44 requirements are for the system. Why the code is being written? What is the
370:46 code is being written? What is the specification that we're we're writing
370:48 specification that we're we're writing towards if things are not written down?
370:51 towards if things are not written down? And that sounds like blatantly obvious
370:55 And that sounds like blatantly obvious but like there are a lot of things that
370:57 but like there are a lot of things that are fundamentally written like if the
370:59 are fundamentally written like if the code is comprehensible like all the
371:00 code is comprehensible like all the other steps are in that we've gotten to
371:02 other steps are in that we've gotten to so far. you don't need to reexplain
371:04 so far. you don't need to reexplain what's in the code. So, there's actually
371:06 what's in the code. So, there's actually probably a whole class of documentation
371:07 probably a whole class of documentation that we may not need anymore where you
371:09 that we may not need anymore where you can just ask the agent like, "Hey, tell
371:11 can just ask the agent like, "Hey, tell me about the structure of this codebase
371:12 me about the structure of this codebase overall and it'll just do it, but it
371:14 overall and it'll just do it, but it won't be able to ever know why you wrote
371:17 won't be able to ever know why you wrote it unless that's written down
371:18 it unless that's written down somewhere." or things that happen
371:20 somewhere." or things that happen outside of the program like what is the
371:23 outside of the program like what is the shape of the data that comes in from
371:25 shape of the data that comes in from this URL parameter as an example like if
371:28 this URL parameter as an example like if you have already written the code
371:30 you have already written the code there's a validator and that does
371:32 there's a validator and that does explain it but if you haven't written
371:33 explain it but if you haven't written the code yet it doesn't know what comes
371:35 the code yet it doesn't know what comes in from the outside world so basically
371:36 in from the outside world so basically anything that can't be in the code or
371:38 anything that can't be in the code or isn't in the code needs to somehow be
371:40 isn't in the code needs to somehow be written somewhere that the agent can
371:42 written somewhere that the agent can access
371:47 now we've covered sort of a few technical aspects of things that we need
371:48 technical aspects of things that we need to improve improve. But there's a point
371:52 to improve improve. But there's a point about software development in general
371:54 about software development in general and it that's always been true and one
371:56 and it that's always been true and one of and that's you've heard this we spend
371:59 of and that's you've heard this we spend more time reading code than writing it.
372:03 more time reading code than writing it. The difference today is that writing
372:05 The difference today is that writing code has become reading code. So even
372:08 code has become reading code. So even now when we are writing code we spend
372:10 now when we are writing code we spend more time reading it than actually
372:13 more time reading it than actually typing things into the terminal.
372:16 typing things into the terminal. And what that means is
372:19 And what that means is every software engineer becomes a code
372:22 every software engineer becomes a code reviewer as basically their primary job.
372:26 reviewer as basically their primary job. In addition, as anybody who has worked
372:29 In addition, as anybody who has worked in a in a shop that has deeply adopted
372:31 in a in a shop that has deeply adopted agentic coding, we generate far more PRs
372:34 agentic coding, we generate far more PRs than ever before, which has led to code
372:36 than ever before, which has led to code review itself, the like the big scale
372:38 review itself, the like the big scale code review being a bottleneck.
372:42 code review being a bottleneck. So one of the things that we need to do
372:45 So one of the things that we need to do is we need to figure out how to improve
372:47 is we need to figure out how to improve code review velocity both for the big
372:50 code review velocity both for the big code reviews that we like where we you
372:52 code reviews that we like where we you send a PR and somebody like you know
372:54 send a PR and somebody like you know writes comments on it and you go back
372:55 writes comments on it and you go back and forth and also just the iterative
372:57 and forth and also just the iterative process of working with the agent. How
372:59 process of working with the agent. How do you speed up a person's ability to
373:02 do you speed up a person's ability to look at code and know what to do with
373:04 look at code and know what to do with it? So
373:06 it? So the principles are pretty similar for
373:08 the principles are pretty similar for both of those, but the exact way you
373:10 both of those, but the exact way you implement them is a little bit
373:11 implement them is a little bit different. What you care about the most
373:14 different. What you care about the most is making each individual response fast.
373:17 is making each individual response fast. You don't actually want to shorten the
373:19 You don't actually want to shorten the whole timeline of code review generally
373:22 whole timeline of code review generally because code review is a quality
373:24 because code review is a quality process. It's the same thing with agent
373:26 process. It's the same thing with agent iteration. Like what you want with agent
373:28 iteration. Like what you want with agent iteration is you want to get to the
373:30 iteration is you want to get to the place where you've got the right result.
373:33 place where you've got the right result. You don't want to like just be like,
373:34 You don't want to like just be like, "Well, I guess I've hit my five minute
373:36 "Well, I guess I've hit my five minute time limit, so I'm going to check in
373:37 time limit, so I'm going to check in this garbage that doesn't work, right?
373:39 this garbage that doesn't work, right? You you But what you do want is you want
373:42 You you But what you do want is you want the iterations to be fast." Not just the
373:44 the iterations to be fast." Not just the agents iterations, but the human
373:47 agents iterations, but the human response time to the agent to be fast.
373:50 response time to the agent to be fast. And in order to do that, they have to
373:51 And in order to do that, they have to get very good at doing code reviews or
373:53 get very good at doing code reviews or knowing what the next step is to do with
373:54 knowing what the next step is to do with a lot of code. At the big code of review
373:57 a lot of code. At the big code of review level, one thing that I see that I think
373:59 level, one thing that I see that I think is sort of a social disease that has
374:01 is sort of a social disease that has infected a lot of companies is when
374:03 infected a lot of companies is when people want PR reviews, they just send a
374:05 people want PR reviews, they just send a Slack message to a team channel and say,
374:07 Slack message to a team channel and say, "Hey, could one of the 10 of you review
374:09 "Hey, could one of the 10 of you review my PR?" And what and you know what that
374:11 my PR?" And what and you know what that means is one person does all those
374:13 means is one person does all those reviews. That's what really happens.
374:15 reviews. That's what really happens. There there's like when you look at the
374:17 There there's like when you look at the code of review stats of teams like that
374:18 code of review stats of teams like that there's one person who has like 50 and
374:20 there's one person who has like 50 and the other person have like three two
374:22 the other person have like three two five seven because there's just one
374:24 five seven because there's just one person is like super responsive. So but
374:27 person is like super responsive. So but what that means is if you start
374:28 what that means is if you start generating dramatically more PRs that
374:30 generating dramatically more PRs that one person cannot handle the load you
374:32 one person cannot handle the load you have to distribute it and really the
374:33 have to distribute it and really the only way to distribute it is to assign
374:35 only way to distribute it is to assign it to specific individuals have a system
374:37 it to specific individuals have a system that distributes it among those
374:39 that distributes it among those individuals and then set SLOs's that
374:40 individuals and then set SLOs's that have some mechanism of enforcement. And
374:44 have some mechanism of enforcement. And another thing is like that GitHub, for
374:46 another thing is like that GitHub, for example, is not very good at today is
374:48 example, is not very good at today is making it clear whose turn it is to take
374:50 making it clear whose turn it is to take action. Like I left a bunch of comments
374:52 action. Like I left a bunch of comments on your PR. Uh,
374:55 on your PR. Uh, you now responded to one of my comments.
374:58 you now responded to one of my comments. Should I come back again now? Oh, wait.
375:00 Should I come back again now? Oh, wait. No, no, now you pushed a no change.
375:02 No, no, now you pushed a no change. Should I come back now? Okay. No, no,
375:04 Should I come back now? Okay. No, no, now you've responded to more comments.
375:06 now you've responded to more comments. What I rely on mostly is people telling
375:08 What I rely on mostly is people telling me in Slack, I'm ready for you to review
375:09 me in Slack, I'm ready for you to review my PR again. Which is a terrible and
375:11 my PR again. Which is a terrible and inefficient system.
375:18 And another thing you got to think about a lot is the quality of code reviews.
375:21 a lot is the quality of code reviews. And I mean this once again both for the
375:23 And I mean this once again both for the individual developers doing it with the
375:25 individual developers doing it with the agent and the people doing it in the
375:28 agent and the people doing it in the code review pipeline.
375:31 code review pipeline. You have to keep holding a high bar. I
375:34 You have to keep holding a high bar. I know that people have other opinions
375:36 know that people have other opinions about this. And yes, depending on the
375:38 about this. And yes, depending on the timeline that you expect your software
375:39 timeline that you expect your software to live, you might not need as much
375:41 to live, you might not need as much software design. Like look, it's
375:42 software design. Like look, it's software design is not the goal of
375:43 software design is not the goal of perfection. It's a goal of good enough
375:45 perfection. It's a goal of good enough and better than you had before, right?
375:48 and better than you had before, right? But sometimes good enough for a very
375:50 But sometimes good enough for a very long lived system is a much higher bar
375:52 long lived system is a much higher bar than people expect it to be.
375:55 than people expect it to be. And if you don't have a process that is
375:58 And if you don't have a process that is capable of rejecting things that
375:59 capable of rejecting things that shouldn't go in, you will very likely
376:02 shouldn't go in, you will very likely actually see decreasing productivity
376:04 actually see decreasing productivity gains from your agentic coders over time
376:07 gains from your agentic coders over time as the system becomes harder and harder
376:08 as the system becomes harder and harder for both the agent and the human to work
376:10 for both the agent and the human to work with.
376:11 with. The problem is this. In many companies,
376:15 The problem is this. In many companies, we have the people who are the best code
376:17 we have the people who are the best code reviewers not doing any of their time
376:20 reviewers not doing any of their time doing code review. They are spending all
376:22 doing code review. They are spending all their times in meetings doing highle
376:24 their times in meetings doing highle reviews doing strategy. And so we aren't
376:28 reviews doing strategy. And so we aren't teaching junior engineers to be better
376:31 teaching junior engineers to be better software engineers and to be better code
376:33 software engineers and to be better code reviewers. So we have to have some
376:35 reviewers. So we have to have some mechanism that allows the people who are
376:37 mechanism that allows the people who are the best at this to do this through
376:38 the best at this to do this through apprenticeship. If somebody else has a
376:40 apprenticeship. If somebody else has a better way of doing this than doing code
376:42 better way of doing this than doing code reviews with people, I would love to
376:43 reviews with people, I would love to know because in the 20 plus years that
376:45 know because in the 20 plus years that I've been doing this, I have never found
376:47 I've been doing this, I have never found a way to teach people to be good code
376:48 a way to teach people to be good code reviewers other than doing good code
376:50 reviewers other than doing good code reviews with them.
376:58 Now, if you do if you don't do all the things that I talked about, what is the
377:01 things that I talked about, what is the danger?
377:02 danger? The danger is you take a bad codebase
377:05 The danger is you take a bad codebase with a confusing environment. You give
377:08 with a confusing environment. You give it to an agent or a developer working
377:10 it to an agent or a developer working with that agent. The agent produces
377:14 with that agent. The agent produces relative levels of nonsense
377:17 relative levels of nonsense and the developer experiences more or
377:20 and the developer experiences more or less frustration. And depending on how
377:22 less frustration. And depending on how persistent they are, at some point they
377:23 persistent they are, at some point they give up and they just send their PR off
377:26 give up and they just send their PR off for review. They're like, I think it
377:28 for review. They're like, I think it works. Right? And then if you have
377:31 works. Right? And then if you have low-quality code reviews or code
377:33 low-quality code reviews or code reviewers who are overwhelmed, they go,
377:35 reviewers who are overwhelmed, they go, I don't know. I don't know what to do
377:36 I don't know. I don't know what to do with this. I guess it's okay. And you
377:39 with this. I guess it's okay. And you just have lots and lots and lots of bad
377:41 just have lots and lots and lots of bad rubber stamp PRs that keep going in and
377:43 rubber stamp PRs that keep going in and you get into a vicious cycle where what
377:45 you get into a vicious cycle where what I expect to occur and what my prediction
377:47 I expect to occur and what my prediction is is if you are in this cycle, uh, your
377:49 is is if you are in this cycle, uh, your agent productivity will decrease
377:51 agent productivity will decrease consistently through the year. On the
377:53 consistently through the year. On the other hand, we live in an amazing time
377:57 other hand, we live in an amazing time where if we increase the ability of the
378:00 where if we increase the ability of the agents to help us be productive, then
378:02 agents to help us be productive, then they can actually help us be more
378:04 they can actually help us be more productive and we actually get into a
378:06 productive and we actually get into a virtuous cycle instead where we actually
378:09 virtuous cycle instead where we actually accelerate more and more and more and
378:11 accelerate more and more and more and more. And yes, some of these things
378:13 more. And yes, some of these things sound like very expensive fundamental
378:15 sound like very expensive fundamental investments, but I think now is the time
378:17 investments, but I think now is the time to make them because now is one of the
378:20 to make them because now is one of the times you're going to have the biggest
378:21 times you're going to have the biggest differentiation in your business in
378:23 differentiation in your business in terms of software engineering velocity
378:25 terms of software engineering velocity if you can do these things versus other
378:27 if you can do these things versus other in industries or companies that can't
378:28 in industries or companies that can't structurally do these things.
378:32 structurally do these things. So to summarize, here's a few things.
378:35 So to summarize, here's a few things. Not literally everything in the world
378:36 Not literally everything in the world you can do that's no regrets, but you
378:38 you can do that's no regrets, but you can standardize your development
378:39 can standardize your development environments. You can make CLIs or APIs
378:42 environments. You can make CLIs or APIs for anything that needs a CLI or API.
378:44 for anything that needs a CLI or API. Those CLIs or APIs have to run at
378:46 Those CLIs or APIs have to run at development time. By the way, too,
378:47 development time. By the way, too, another big thing that people miss is
378:49 another big thing that people miss is sometimes they have things that only run
378:50 sometimes they have things that only run in CI. If you're CI takes 15, 20 minutes
378:54 in CI. If you're CI takes 15, 20 minutes and you know, agents are like way more
378:57 and you know, agents are like way more persistent and patient than a human
378:59 persistent and patient than a human being is. So like, but they're also more
379:01 being is. So like, but they're also more errorprone than human beings are. So
379:03 errorprone than human beings are. So like they will run the thing and then
379:04 like they will run the thing and then run your test and then run a thing and
379:05 run your test and then run a thing and then run your test and then run a thing
379:06 then run your test and then run a thing and then run your test and they'll do it
379:07 and then run your test and they'll do it like five times in a row. If that takes
379:09 like five times in a row. If that takes 20 minutes, your developers productivity
379:12 20 minutes, your developers productivity is going to be shot to heck. Whereas, if
379:14 is going to be shot to heck. Whereas, if it takes 30 seconds, you're going to
379:16 it takes 30 seconds, you're going to have a they're going to have a much
379:17 have a they're going to have a much better experience. You can improve
379:19 better experience. You can improve validation. You can refactor for both
379:21 validation. You can refactor for both testability and the ability to reason
379:23 testability and the ability to reason about the codebase. You can make sure
379:25 about the codebase. You can make sure all the external context and your
379:26 all the external context and your intentions, the why is written down. You
379:29 intentions, the why is written down. You can make every response during code
379:30 can make every response during code review faster. And you can raise the bar
379:31 review faster. And you can raise the bar on code review quality. But if you look
379:33 on code review quality. But if you look at all of these things, there's one
379:36 at all of these things, there's one lesson and one principle that we take
379:37 lesson and one principle that we take away from all these things that covers
379:39 away from all these things that covers even more things than this. And it's
379:42 even more things than this. And it's basically that what's good for humans is
379:44 basically that what's good for humans is good for AI. And the great thing about
379:47 good for AI. And the great thing about this, one second. The great thing about
379:49 this, one second. The great thing about this is that it means that when we
379:52 this is that it means that when we invest in this thing, we will help our
379:55 invest in this thing, we will help our developers no matter what. Even if
379:57 developers no matter what. Even if sometimes we miss on helping the agent,
379:59 sometimes we miss on helping the agent, we are guaranteed to help the humans.
380:01 we are guaranteed to help the humans. Thank you very much. [applause]
380:16 >> Ladies and gentlemen, please welcome back to the stage, Alex Lieberman.
380:19 back to the stage, Alex Lieberman. [music] Let's give it up again for Max.
380:23 [music] Let's give it up again for Max. [applause]
380:24 [applause] We have one more break now and then the
380:27 We have one more break now and then the last block of sessions where we'll have
380:28 last block of sessions where we'll have speakers talking about AI. uh
380:31 speakers talking about AI. uh consultancies uh paying engineers like
380:34 consultancies uh paying engineers like salespeople and how to make your company
380:36 salespeople and how to make your company AI native. So be back here at 4 o'clock
380:39 AI native. So be back here at 4 o'clock or if you're watching the live stream be
380:41 or if you're watching the live stream be back online at four o'clock and we'll
380:43 back online at four o'clock and we'll see you then. Thanks everyone.
383:56 Heat. [music]
385:12 >> Heat. Heat. [music]
385:41 [music] >> Heat. Heat.
385:57 >> Heat [music]
386:21 [music] >> Heat.
386:24 >> Heat. Heat.
387:16 >> Down no [music]
387:18 no [music] down.
388:27 Heat up [music]
389:44 >> Heat. Heat.
390:50 >> Heat. Heat. [music]
390:55 Heat >> [music]
391:16 >> up here. [music]
392:05 Heat. [music]
393:01 [music] >> Heat. Heat.
393:08 [music] Heat.
393:46 up [music]
394:16 Heat. [music]
395:50 >> [music] >> Heat. Heat.
396:56 >> [music] >> Heat. Heat.
399:02 Heat. [music]
400:14 up Heat.
402:39 >> [music] >> Heat.
402:42 >> Heat. Heat.
404:06 Heat. Heat. [music]
405:59 >> Heat up [music] here.
407:13 Heat [music]
407:25 up here.
408:18 >> Heat. Heat. [music]
409:47 [music] Heat. Heat.
411:40 [music] Heat. Heat.
412:05 Heat [music]
413:14 Heat. Heat. [music]
414:00 [music] Heat
414:13 [music] >> up here.
414:47 >> [music] >> Heat. Heat.
418:45 >> [music] >> Heat. Heat.
419:47 [music] Heat.
420:13 How we doing? We are officially 7 hours in. How's the energy level? 7 hours in.
420:16 in. How's the energy level? 7 hours in. Let's hear it. There we go. There we go.
420:19 Let's hear it. There we go. There we go. So this is our last block of sessions
420:22 So this is our last block of sessions before you all get to enjoy the graphite
420:25 before you all get to enjoy the graphite afterparty. More coming on that in a
420:27 afterparty. More coming on that in a few. And for this block, we're going to
420:30 few. And for this block, we're going to cover a lot. AI consulting in practice,
420:33 cover a lot. AI consulting in practice, paying engineers like salespeople, as I
420:36 paying engineers like salespeople, as I mentioned earlier, leadership in AI
420:38 mentioned earlier, leadership in AI assisted engineering, and how to build
420:41 assisted engineering, and how to build an AI native company. You guys ready for
420:44 an AI native company. You guys ready for this?
420:46 this? >> Oh, come on. Let's go. So [applause]
420:48 >> Oh, come on. Let's go. So [applause] with that, please join me in welcoming
420:50 with that, please join me in welcoming our next speaker and one of last year's
420:53 our next speaker and one of last year's MC's to talk about helping organizations
420:56 MC's to talk about helping organizations transform with AI. Let's hear it for
420:58 transform with AI. Let's hear it for NLW.
421:06 [music] All right. Great to be back here, guys.
421:09 All right. Great to be back here, guys. Uh for those of you who are here in
421:11 Uh for those of you who are here in February, I had the privilege of MCing.
421:13 February, I had the privilege of MCing. Uh and today I'm excited to talk about
421:15 Uh and today I'm excited to talk about something a little bit different. So
421:17 something a little bit different. So right now uh there's been the last
421:20 right now uh there's been the last couple of months have been an
421:21 couple of months have been an interesting time in AI. There's been a
421:23 interesting time in AI. There's been a sort of surge in the air uh the
421:24 sort of surge in the air uh the narrative of an AI bubble. A lot of it
421:27 narrative of an AI bubble. A lot of it driven by dubious studies uh like the
421:29 driven by dubious studies uh like the MIT report. And so what I wanted to do
421:33 MIT report. And so what I wanted to do today is get into not so much the
421:36 today is get into not so much the practice of consulting and transforming
421:38 practice of consulting and transforming but what organizations are actually
421:39 but what organizations are actually finding value in right now. So for those
421:42 finding value in right now. So for those of you who don't know me, uh there's
421:43 of you who don't know me, uh there's kind of two contexts I bring to this
421:44 kind of two contexts I bring to this conversation. The first is as the host
421:46 conversation. The first is as the host of the AI daily brief which is a daily
421:48 of the AI daily brief which is a daily uh news analysis podcast about AI. The
421:50 uh news analysis podcast about AI. The second is as the CEO of super
421:52 second is as the CEO of super intelligent which is an AI planning
421:53 intelligent which is an AI planning platform. And so the different
421:54 platform. And so the different perspectives are sort of very high level
421:56 perspectives are sort of very high level macro thinking about the news that's
421:58 macro thinking about the news that's happening and then a much more kind of
422:00 happening and then a much more kind of ground level view where we're spending a
422:01 ground level view where we're spending a ton of time interviewing executives
422:03 ton of time interviewing executives about what's going on inside their
422:04 about what's going on inside their organizations. And what we're going to
422:06 organizations. And what we're going to talk about is sort of one kind of
422:08 talk about is sort of one kind of briefly in the first part just the
422:09 briefly in the first part just the status of enterprise adoption uh as it
422:12 status of enterprise adoption uh as it currently stands. And two, um, and the
422:14 currently stands. And two, um, and the more interesting part is we've been live
422:17 more interesting part is we've been live with a study in in the market for about
422:19 with a study in in the market for about a month now collecting self-reported
422:21 a month now collecting self-reported information about ROI around different
422:23 information about ROI around different use cases. And this will be the first
422:25 use cases. And this will be the first time uh, this week was the first time I
422:27 time uh, this week was the first time I did some analysis on it. And so I'm
422:29 did some analysis on it. And so I'm going to share what people have uh, what
422:31 going to share what people have uh, what people have told us around the first
422:32 people have told us around the first kind of 2500 or so use cases that
422:34 kind of 2500 or so use cases that they've shared. Um, so it should be
422:36 they've shared. Um, so it should be pretty pretty interesting stuff. talking
422:38 pretty pretty interesting stuff. talking about kind of enterprise AI adoption
422:40 about kind of enterprise AI adoption first. I'll go through this pretty
422:41 first. I'll go through this pretty quickly because it's um pretty
422:42 quickly because it's um pretty well-known stuff. Uh the short of it is
422:45 well-known stuff. Uh the short of it is enterprises are adopting AI uh in in a
422:48 enterprises are adopting AI uh in in a growing fashion. Um pretty much everyone
422:49 growing fashion. Um pretty much everyone is using it at least a little bit. Uh
422:51 is using it at least a little bit. Uh and increasingly they're using it a lot.
422:54 and increasingly they're using it a lot. Uh this year I will need to tell none of
422:56 Uh this year I will need to tell none of you that there is a major inflection
422:58 you that there is a major inflection around um specifically adoption in the
423:01 around um specifically adoption in the uh coding and software engineering.
423:02 uh coding and software engineering. Right? You saw a huge huge uptick in
423:04 Right? You saw a huge huge uptick in this. Um there's a lot that's
423:06 this. Um there's a lot that's interesting about that from an
423:07 interesting about that from an enterprise perspective because it wasn't
423:09 enterprise perspective because it wasn't just with the software engineering
423:10 just with the software engineering organizations. Other parts of the
423:11 organizations. Other parts of the organization are also now thinking about
423:13 organization are also now thinking about how they can communicate with code,
423:15 how they can communicate with code, build things with code. Uh but that's a
423:17 build things with code. Uh but that's a huge huge theme of this year [snorts]
423:19 huge huge theme of this year [snorts] coming into 2025. One of the big sort of
423:22 coming into 2025. One of the big sort of thoughts that many people had was that
423:23 thoughts that many people had was that this would be the year of agents inside
423:25 this would be the year of agents inside the enterprise, right? That big chunks
423:27 the enterprise, right? That big chunks of work would get automated away. And on
423:29 of work would get automated away. And on the one hand, I think it's pretty clear
423:31 the one hand, I think it's pretty clear that we didn't see some sort of mass
423:34 that we didn't see some sort of mass shift towards automation uh at large
423:36 shift towards automation uh at large across different functions in the
423:37 across different functions in the organization. But when you dig into the
423:40 organization. But when you dig into the numbers, there has been actually pretty
423:42 numbers, there has been actually pretty significant uh shifts in the patterns of
423:44 significant uh shifts in the patterns of of agent adoption. So this is from
423:46 of agent adoption. So this is from KPMG's quarterly pulse survey. And it's
423:49 KPMG's quarterly pulse survey. And it's a measure of how many enterprises that
423:51 a measure of how many enterprises that are a part of their survey, which is all
423:52 are a part of their survey, which is all companies over a billion dollars in
423:54 companies over a billion dollars in revenue, have uh actual sort of full
423:56 revenue, have uh actual sort of full production agents in deployment. So this
423:58 production agents in deployment. So this isn't pilots, this isn't experiments.
424:00 isn't pilots, this isn't experiments. This is where they consider uh some
424:02 This is where they consider uh some agent that's actually doing doing kind
424:04 agent that's actually doing doing kind of work in a in a full way. And it's
424:06 of work in a in a full way. And it's jumped from 11% in Q1 of this year to
424:08 jumped from 11% in Q1 of this year to 42% in their most recent study for Q3.
424:10 42% in their most recent study for Q3. So you actually are seeing pretty
424:12 So you actually are seeing pretty meaningful uptake of of agents inside
424:15 meaningful uptake of of agents inside the enterprise. In fact, I would argue
424:17 the enterprise. In fact, I would argue based on our conversations that people
424:19 based on our conversations that people have that it's moved more quickly
424:22 have that it's moved more quickly through the pilot or experimental phase
424:24 through the pilot or experimental phase than people might have thought. um so
424:26 than people might have thought. um so much so that you're actually seeing now
424:28 much so that you're actually seeing now a big shift in the emphasis around kind
424:31 a big shift in the emphasis around kind of the human side of agents and how
424:33 of the human side of agents and how humans are going to interact with agents
424:34 humans are going to interact with agents and it's involving a shift in upskilling
424:37 and it's involving a shift in upskilling and and uh and enablement work. Um
424:39 and and uh and enablement work. Um you're seeing a decrease in the sort of
424:41 you're seeing a decrease in the sort of resistance to agents as people start to
424:43 resistance to agents as people start to actually dig in with them. You're seeing
424:44 actually dig in with them. You're seeing more experiments like these sandboxes
424:46 more experiments like these sandboxes where people can interact with agents.
424:48 where people can interact with agents. So this is a big theme even if it wasn't
424:50 So this is a big theme even if it wasn't necessarily the dominant theme that some
424:52 necessarily the dominant theme that some thought it might be coming into this
424:53 thought it might be coming into this year. At the same time, it is absolutely
424:57 year. At the same time, it is absolutely the case that many many if not most
424:59 the case that many many if not most enterprises are broadly speaking stuck
425:03 enterprises are broadly speaking stuck inside sort of pilot and experimental
425:04 inside sort of pilot and experimental phases. There is a lot of challenge
425:07 phases. There is a lot of challenge around moving from some of those first
425:09 around moving from some of those first exciting experiments to something that's
425:10 exciting experiments to something that's more scaled. Um, so this is from
425:12 more scaled. Um, so this is from McKenzie state of AI study which came
425:14 McKenzie state of AI study which came out I think a couple weeks ago now and
425:15 out I think a couple weeks ago now and you can see only 7% of the organizations
425:17 you can see only 7% of the organizations that they talk to claim or sort of see
425:20 that they talk to claim or sort of see themselves as as fully at scale with
425:22 themselves as as fully at scale with with AI and agents. and it's something
425:23 with AI and agents. and it's something like 62% are either still experimenting
425:26 like 62% are either still experimenting or piloting.
425:28 or piloting. Interestingly, big organizations are on
425:31 Interestingly, big organizations are on in general a little bit ahead in terms
425:33 in general a little bit ahead in terms of uh the organizations that are scaling
425:36 of uh the organizations that are scaling as compared to small organizations. This
425:38 as compared to small organizations. This has been a a thing that we've noticed
425:39 has been a a thing that we've noticed kind of throughout the trajectory of uh
425:42 kind of throughout the trajectory of uh of AI um adoption over the last couple
425:44 of AI um adoption over the last couple of years that you would think that
425:45 of years that you would think that perhaps smaller, more nimble companies
425:48 perhaps smaller, more nimble companies uh would be more kind of quick to adopt
425:50 uh would be more kind of quick to adopt these things, but in fact, it's often
425:52 these things, but in fact, it's often been the opposite with the biggest
425:53 been the opposite with the biggest organizations making the biggest
425:55 organizations making the biggest efforts. You can also see from the chart
425:57 efforts. You can also see from the chart on the bottom that there's very sort of
425:59 on the bottom that there's very sort of jagged patterns of adoption, right?
426:01 jagged patterns of adoption, right? you're starting to see uh from you know
426:03 you're starting to see uh from you know last year if you looked there's very
426:04 last year if you looked there's very similar kind of rates of experimentation
426:06 similar kind of rates of experimentation across lots of different departments
426:08 across lots of different departments you're starting to see some pretty big
426:09 you're starting to see some pretty big breakouts now uh with for example you
426:11 breakouts now uh with for example you know IT operations kind of jumping out
426:14 know IT operations kind of jumping out ahead of other functions
426:16 ahead of other functions I won't spend too much time on this sort
426:18 I won't spend too much time on this sort of high performer piece but I think the
426:19 of high performer piece but I think the thing to note because it comes back in
426:21 thing to note because it comes back in and in and some of the stuff that we
426:22 and in and some of the stuff that we found with our ROI study is that you are
426:25 found with our ROI study is that you are also starting to see a pretty
426:27 also starting to see a pretty significant bifurcation between leaders
426:29 significant bifurcation between leaders and laggers when it comes to AI
426:31 and laggers when it comes to AI adoption. And one of the things that
426:33 adoption. And one of the things that tends to distinguish the companies that
426:35 tends to distinguish the companies that are leading is that they are just doing
426:37 are leading is that they are just doing more of it and they are thinking more
426:38 more of it and they are thinking more comprehensively and systematically about
426:41 comprehensively and systematically about AI and agent adoption. So they are not
426:43 AI and agent adoption. So they are not just sort of doing spot experiments.
426:45 just sort of doing spot experiments. They're thinking about their strategy as
426:46 They're thinking about their strategy as a whole. They're doing multiple things
426:48 a whole. They're doing multiple things at once. And importantly, they're not
426:50 at once. And importantly, they're not just thinking about sort of the very
426:52 just thinking about sort of the very kind of first tier time savings or
426:54 kind of first tier time savings or productivity types of use cases. They're
426:56 productivity types of use cases. They're also thinking about how do we grow
426:57 also thinking about how do we grow revenue? How do we create new
426:58 revenue? How do we create new capabilities? How do we create new
426:59 capabilities? How do we create new product lines?
427:03 product lines? Overall, it's very clear that despite
427:06 Overall, it's very clear that despite what is sort of, you know, the the the
427:08 what is sort of, you know, the the the concerns in the media that spend is
427:10 concerns in the media that spend is going to do nothing but increase on
427:11 going to do nothing but increase on this. Um, so the bottom is the KPMG
427:13 this. Um, so the bottom is the KPMG pulse survey again, and this is a an
427:15 pulse survey again, and this is a an estimation of the amount of money that
427:17 estimation of the amount of money that these organizations intend to spend on
427:19 these organizations intend to spend on AI over the next 12 months. The
427:20 AI over the next 12 months. The beginning of the year was 114, which by
427:22 beginning of the year was 114, which by the way was up from like 88 in Q4 of
427:24 the way was up from like 88 in Q4 of last year. It's now up to in their last
427:26 last year. It's now up to in their last study 130 million is what they expect to
427:28 study 130 million is what they expect to spend uh in the in the year ahead which
427:31 spend uh in the in the year ahead which obviously the the total magnitude
427:32 obviously the the total magnitude doesn't matter as much as the change. Um
427:34 doesn't matter as much as the change. Um you also see the green charts are from
427:36 you also see the green charts are from Deote and you can see 90% plus of
427:39 Deote and you can see 90% plus of organizations intend to increase their
427:41 organizations intend to increase their spend uh on AI in the next 12 months and
427:44 spend uh on AI in the next 12 months and as part of that I think that you're
427:46 as part of that I think that you're going to see a much more determined
427:48 going to see a much more determined conversation around impact and ROI uh
427:51 conversation around impact and ROI uh which is a particularly thorny topic but
427:53 which is a particularly thorny topic but interestingly
427:55 interestingly there has been an increase in optimism
427:58 there has been an increase in optimism over the course of this year around the
428:00 over the course of this year around the realization of AI. So this is from a
428:02 realization of AI. So this is from a different KPMG study, their annual CEO
428:05 different KPMG study, their annual CEO survey, which interviews tons and tons
428:07 survey, which interviews tons and tons of CEOs. And if you look at the 2024
428:09 of CEOs. And if you look at the 2024 numbers, 63% of those pled thought that
428:13 numbers, 63% of those pled thought that it would take between 3 and 5 years to
428:15 it would take between 3 and 5 years to realize ROI from their AI investments.
428:17 realize ROI from their AI investments. 20% said 1 to three and 16% said more
428:20 20% said 1 to three and 16% said more than five. This year in that same
428:22 than five. This year in that same survey, the number that said 1 to 3
428:24 survey, the number that said 1 to 3 years had gone up to 67%. There were now
428:27 years had gone up to 67%. There were now 19% who said 6 months to one year. uh
428:30 19% who said 6 months to one year. uh and 3 to 5 years was down to just 12%.
428:33 and 3 to 5 years was down to just 12%. So huge huge kind of pull forward of
428:36 So huge huge kind of pull forward of expectations of of ROI realization. The
428:39 expectations of of ROI realization. The challenge is that ROI is really tough.
428:42 challenge is that ROI is really tough. So this is back to the pulse survey. 78%
428:45 So this is back to the pulse survey. 78% of those pled in that in that survey
428:47 of those pled in that in that survey said that they thought that ROI was
428:49 said that they thought that ROI was going to basically become a bigger
428:50 going to basically become a bigger consideration in the year to come. Uh
428:52 consideration in the year to come. Uh but also 78% said that traditional
428:56 but also 78% said that traditional impact metrics and measures were having
428:58 impact metrics and measures were having a very hard time keeping up with the
429:00 a very hard time keeping up with the with the new reality that we were living
429:01 with the new reality that we were living in. And this is something that I've
429:02 in. And this is something that I've heard constantly over and over from CIOS
429:05 heard constantly over and over from CIOS and other people who are in charge of
429:06 and other people who are in charge of these investments that the the the ways
429:08 these investments that the the the ways that we have measured impact of previous
429:10 that we have measured impact of previous technologies and just previous
429:11 technologies and just previous initiatives are kind of falling flat
429:13 initiatives are kind of falling flat with AI. And so that got us thinking
429:16 with AI. And so that got us thinking about the the the overall need that we
429:18 about the the the overall need that we have to just have more information. I'm
429:21 have to just have more information. I'm not even talking about good systematic
429:24 not even talking about good systematic information, just more information
429:26 information, just more information around what ROI looks like, what impact
429:28 around what ROI looks like, what impact looks like, and you know, I've got this
429:30 looks like, and you know, I've got this great podcast audience. They're super
429:32 great podcast audience. They're super engaged. And so, we just decided, screw
429:33 engaged. And so, we just decided, screw it. We're going to ask them, we're just
429:35 it. We're going to ask them, we're just going to ask them to report on what ROI
429:37 going to ask them to report on what ROI they're finding from their use cases.
429:39 they're finding from their use cases. So, this went up at the very end of
429:40 So, this went up at the very end of October. Uh like I said as of this
429:42 October. Uh like I said as of this morning or when I looked last looked
429:44 morning or when I looked last looked we've had over a thousand submissions uh
429:46 we've had over a thousand submissions uh a thousand individual organizations
429:48 a thousand individual organizations rather submit something like 3500 use
429:50 rather submit something like 3500 use cases and um this is uh some some of the
429:53 cases and um this is uh some some of the first observations that we had around um
429:55 first observations that we had around um kind of the first 2500.
429:57 kind of the first 2500. So the impact categories the way that we
430:00 So the impact categories the way that we divided things was into sort of eight
430:02 divided things was into sort of eight broad categories of impact um which will
430:04 broad categories of impact um which will all I think be very intuitive to you
430:06 all I think be very intuitive to you guys. time savings, increased output,
430:09 guys. time savings, increased output, improvement in quality, new
430:10 improvement in quality, new capabilities, improved decision- making,
430:12 capabilities, improved decision- making, cost savings, increased revenue, and
430:14 cost savings, increased revenue, and risk reduction. So, basically, it was
430:16 risk reduction. So, basically, it was trying to think of like kind of a a
430:17 trying to think of like kind of a a broad simple heristic for uh for for
430:20 broad simple heristic for uh for for kind of dividing or subdividing the
430:22 kind of dividing or subdividing the different the different ways that people
430:23 different the different ways that people are thinking about ROI. And TLDDR is
430:27 are thinking about ROI. And TLDDR is that people are finding uh ROI right
430:29 that people are finding uh ROI right now. Um, now again, the caveats are that
430:32 now. Um, now again, the caveats are that this is a highly infranchised audience.
430:34 this is a highly infranchised audience. they're listening to a daily AI podcast
430:36 they're listening to a daily AI podcast and they are voluntarily sharing this.
430:38 and they are voluntarily sharing this. So, I think that, you know, there's
430:39 So, I think that, you know, there's there's some caveing there, but you have
430:41 there's some caveing there, but you have 44.3% saying that they're seeing modest
430:43 44.3% saying that they're seeing modest ROI right now. And then you have another
430:46 ROI right now. And then you have another 37.6% seeing high ROI. For the purposes
430:49 37.6% seeing high ROI. For the purposes of a lot of these stats, high ROI will
430:51 of a lot of these stats, high ROI will be significant plus transformational. Uh
430:54 be significant plus transformational. Uh only 5% or so are seeing negative ROI.
430:56 only 5% or so are seeing negative ROI. And keep in mind, negative ROI doesn't
430:58 And keep in mind, negative ROI doesn't mean that they think programs are
430:59 mean that they think programs are failing. It just means they haven't
431:00 failing. It just means they haven't they've spent more than they've gained
431:02 they've spent more than they've gained uh in terms of how their their
431:03 uh in terms of how their their perception is. More [snorts] than that,
431:06 perception is. More [snorts] than that, expectations are absolutely skyhigh. 67%
431:10 expectations are absolutely skyhigh. 67% think over the next year they will see
431:12 think over the next year they will see uh increased and high growth in their
431:14 uh increased and high growth in their ROI. So we have really optimistic sense
431:18 ROI. So we have really optimistic sense from the ground view of where ROI is
431:20 from the ground view of where ROI is going to be in AI. Um you even have the
431:23 going to be in AI. Um you even have the teams that are currently experiencing
431:25 teams that are currently experiencing negative ROI. 53% say that they're going
431:28 negative ROI. 53% say that they're going to see high growth. So very very
431:30 to see high growth. So very very optimistic. Um as [snorts] you might
431:32 optimistic. Um as [snorts] you might imagine, time savings is the default.
431:35 imagine, time savings is the default. It's the starting point for so many
431:36 It's the starting point for so many organizations. It represents about 35%
431:38 organizations. It represents about 35% of the use cases. After that, increasing
431:41 of the use cases. After that, increasing output, quality improvement, basically
431:42 output, quality improvement, basically all those things that you would imagine
431:43 all those things that you would imagine around productivity are sort of like the
431:46 around productivity are sort of like the dominant categories when it comes to
431:47 dominant categories when it comes to these uh when it comes to these use
431:49 these uh when it comes to these use cases. When it comes to the specifics
431:51 cases. When it comes to the specifics around time savings, you see a real
431:53 around time savings, you see a real cluster between 1 and 10 hours,
431:54 cluster between 1 and 10 hours, especially right around 5 hours. And I
431:56 especially right around 5 hours. And I think this is interesting to call out
431:58 think this is interesting to call out because it's so obvious to all of us who
432:00 because it's so obvious to all of us who are inside building these things uh
432:03 are inside building these things uh whether you are a developer or an
432:04 whether you are a developer or an entrepreneur or just someone sort of in
432:06 entrepreneur or just someone sort of in and around it how the the vast breadth
432:09 and around it how the the vast breadth of opportunity that AI represents new
432:11 of opportunity that AI represents new capabilities things unimagined yet. It's
432:14 capabilities things unimagined yet. It's hard to or it's easy to forget that if
432:16 hard to or it's easy to forget that if you save 5 hours a week or 10 hours a
432:18 you save 5 hours a week or 10 hours a week you're talking about winning back 7
432:19 week you're talking about winning back 7 to 10 work weeks a year. Uh and that's
432:22 to 10 work weeks a year. Uh and that's very very powerful. And when it comes to
432:24 very very powerful. And when it comes to a lot of these enterprises, that is a
432:26 a lot of these enterprises, that is a very meaningful thing, even if it's not
432:28 very meaningful thing, even if it's not what they're ultimately in it for.
432:31 what they're ultimately in it for. Interestingly though, it's very clear
432:33 Interestingly though, it's very clear that the story, although it might be uh
432:35 that the story, although it might be uh have a concentration in time savings, is
432:37 have a concentration in time savings, is about much more than time savings. So
432:39 about much more than time savings. So this is the ROI distribution category uh
432:42 this is the ROI distribution category uh ROI distribution by organization size.
432:44 ROI distribution by organization size. And this starts to get really
432:46 And this starts to get really interesting where you can see that there
432:47 interesting where you can see that there are some differences in where different
432:50 are some differences in where different size organizations are focused. So for
432:53 size organizations are focused. So for example, the organization size between
432:55 example, the organization size between 200 and a,000 people has a higher
432:58 200 and a,000 people has a higher portion of their use cases concentrated
433:00 portion of their use cases concentrated in increasing output. Now we haven't
433:02 in increasing output. Now we haven't taken the time yet to really figure out
433:04 taken the time yet to really figure out exactly what this means or even
433:05 exactly what this means or even speculate on on what this means. But I
433:07 speculate on on what this means. But I think it's interesting that this is a
433:09 think it's interesting that this is a category of organization that has often
433:11 category of organization that has often reached a certain scale but is still
433:13 reached a certain scale but is still very much striving for more and so seems
433:14 very much striving for more and so seems to be focused more on use cases that
433:17 to be focused more on use cases that expand their capabilities.
433:19 expand their capabilities. Same thing with uh when you start to
433:22 Same thing with uh when you start to divide things by role you see real kind
433:24 divide things by role you see real kind of variance where for example seuitees
433:26 of variance where for example seuitees and leaders uh are less focused on those
433:29 and leaders uh are less focused on those time savings use cases and more focused
433:31 time savings use cases and more focused on other things like increased output
433:33 on other things like increased output and uh and new capabilities
433:37 and uh and new capabilities in general we're finding that sea
433:39 in general we're finding that sea leaders uh and just sort of seuite and
433:41 leaders uh and just sort of seuite and and leaders in general are even more
433:44 and leaders in general are even more optimistic and excited and seeing
433:46 optimistic and excited and seeing transformational impact than people who
433:48 transformational impact than people who are in more junior positions. Now, some
433:50 are in more junior positions. Now, some of this might be sort of selection bias
433:52 of this might be sort of selection bias in terms of um what types of use cases
433:55 in terms of um what types of use cases you are focused on. If you are in that
433:57 you are focused on. If you are in that seuite, you're thinking about things
433:58 seuite, you're thinking about things that inherently if they work are more
434:00 that inherently if they work are more transformational. Uh but it is notable
434:02 transformational. Uh but it is notable that 17% of uh of the use cases that
434:05 that 17% of uh of the use cases that that people in those leadership
434:06 that people in those leadership positions have submitted uh they say
434:08 positions have submitted uh they say have transformational impact and ROI
434:10 have transformational impact and ROI already. [snorts] Uh I'm going to skip
434:12 already. [snorts] Uh I'm going to skip this because there's we don't have time
434:14 this because there's we don't have time for too much. um you're seeing
434:16 for too much. um you're seeing interestingly uh a concentration um
434:20 interestingly uh a concentration um where the smallest organizations are
434:22 where the smallest organizations are getting more of that transformational
434:24 getting more of that transformational benefit early. Um one of the things that
434:26 benefit early. Um one of the things that I want to do following this study is
434:28 I want to do following this study is maybe do a sort of second round where we
434:30 maybe do a sort of second round where we dig into what this 1 to 50 person uh
434:35 dig into what this 1 to 50 person uh size really looks like. I actually think
434:37 size really looks like. I actually think that whereas there might be a lot of
434:39 that whereas there might be a lot of similarity between a 1000 and a 2,000
434:41 similarity between a 1000 and a 2,000 person organization, there could be a
434:43 person organization, there could be a wild difference between a threeperson,
434:45 wild difference between a threeperson, you know, small company and a 40 person
434:48 you know, small company and a 40 person company. And so I'd really like to dig
434:49 company. And so I'd really like to dig into that more. But you are definitely
434:51 into that more. But you are definitely seeing a a lot of impact in those sort
434:55 seeing a a lot of impact in those sort of more small nimble moving
434:56 of more small nimble moving organizations.
434:58 organizations. Uh as you might expect, coding and uh
435:01 Uh as you might expect, coding and uh and software related or uh use cases
435:04 and software related or uh use cases have a higher ROI than average and a
435:06 have a higher ROI than average and a lower negative ROI than average. Um one
435:08 lower negative ROI than average. Um one really interesting kind of you know
435:10 really interesting kind of you know pulling on a specific category of use
435:12 pulling on a specific category of use cases. Risk reduction is our lowest
435:15 cases. Risk reduction is our lowest category in terms of the percentage of
435:17 category in terms of the percentage of use cases that that that was their
435:19 use cases that that that was their primary benefit. So when you're filling
435:21 primary benefit. So when you're filling out the survey, which is by the way at
435:22 out the survey, which is by the way at ROI survey.ai AI if you want to check it
435:24 ROI survey.ai AI if you want to check it out. Uh you basically only get to pick a
435:28 out. Uh you basically only get to pick a primary ROI benefit. We didn't want it
435:30 primary ROI benefit. We didn't want it to be super sort of um we wanted you to
435:32 to be super sort of um we wanted you to pick and and hone in on the thing that
435:34 pick and and hone in on the thing that was uh seemed most important or most
435:35 was uh seemed most important or most significant. And so only 3.4% have risk
435:38 significant. And so only 3.4% have risk reduction as their primary benefit uh in
435:42 reduction as their primary benefit uh in terms of ROI categories. But it is by
435:45 terms of ROI categories. But it is by far those use cases are by far the most
435:48 far those use cases are by far the most likely to have transformational impact
435:51 likely to have transformational impact as as the as as their outcome. It's at
435:54 as as the as as their outcome. It's at 25%. So a full quarter of those uh have
435:57 25%. So a full quarter of those uh have transformational ROI. And interestingly,
435:59 transformational ROI. And interestingly, I was having this conversation with a
436:00 I was having this conversation with a couple of my friends who work in sort of
436:02 couple of my friends who work in sort of back office and compliance and risk
436:04 back office and compliance and risk functions, and this has been their
436:05 functions, and this has been their experience as well, where there are a
436:08 experience as well, where there are a lot of uh a lot of the the the
436:10 lot of uh a lot of the the the challenges for those organizations
436:12 challenges for those organizations involve sheer volume and quantity uh in
436:14 involve sheer volume and quantity uh in ways that that AI can be really helpful
436:16 ways that that AI can be really helpful for.
436:18 for. We also are finding some interesting
436:19 We also are finding some interesting patterns among organizations. And again,
436:22 patterns among organizations. And again, this is where we get into some of the
436:23 this is where we get into some of the limits of this just being a whoever
436:25 limits of this just being a whoever walks through the door of my listeners.
436:27 walks through the door of my listeners. We have a pretty heavy concentration
436:28 We have a pretty heavy concentration among technology, as you might expect,
436:30 among technology, as you might expect, industries and among professional
436:32 industries and among professional services, but we still have fairly
436:33 services, but we still have fairly decent sample sizes for some others. And
436:36 decent sample sizes for some others. And in both healthcare and manufacturing,
436:38 in both healthcare and manufacturing, the use cases are meaningfully higher
436:40 the use cases are meaningfully higher impact on average uh than the average
436:42 impact on average uh than the average across all organizations. Um, which I
436:44 across all organizations. Um, which I think is uh it was kind of worthy of
436:46 think is uh it was kind of worthy of further study.
436:48 further study. Last sort of part of this as I wrap up,
436:51 Last sort of part of this as I wrap up, you know, a lot of these use cases as
436:53 you know, a lot of these use cases as you saw have to do with that sort of
436:56 you saw have to do with that sort of first tier that most enterprises are
436:57 first tier that most enterprises are going to be in. Uh, increasing the
436:59 going to be in. Uh, increasing the amount of content that you output,
437:01 amount of content that you output, increasing the quality of that content,
437:03 increasing the quality of that content, just finding ways to win back, you know,
437:04 just finding ways to win back, you know, your 5 hours a week. Um but increasingly
437:07 your 5 hours a week. Um but increasingly there are automation and agentic use
437:09 there are automation and agentic use cases and we are absolutely seeing that
437:12 cases and we are absolutely seeing that where those are the the focus where
437:14 where those are the the focus where those use cases mention certain types of
437:16 those use cases mention certain types of automation or they mention agents they
437:18 automation or they mention agents they wildly outperform in terms of the
437:20 wildly outperform in terms of the self-reported ROI from them that's both
437:22 self-reported ROI from them that's both on automation and it's on agents and I
437:25 on automation and it's on agents and I think that that's sort of a a trend
437:26 think that that's sort of a a trend towards where we're headed with sort of
437:29 towards where we're headed with sort of the next layer of more advanced use
437:31 the next layer of more advanced use cases.
437:33 cases. The last thing that uh from this sort of
437:35 The last thing that uh from this sort of first first look of observations is
437:38 first first look of observations is there is clearly benefits and this goes
437:40 there is clearly benefits and this goes back to to what we saw with that
437:42 back to to what we saw with that Mackenzie study as well of thinking
437:44 Mackenzie study as well of thinking about AI and agentic transformation in
437:47 about AI and agentic transformation in systematic cross-organizational
437:49 systematic cross-organizational cross-disciplinary types of terms. um
437:52 cross-disciplinary types of terms. um effectively pretty much uh directly the
437:55 effectively pretty much uh directly the more use cases that a person or an
437:57 more use cases that a person or an organization submitted the the better
438:00 organization submitted the the better they tended to see uh ROI for. Now
438:03 they tended to see uh ROI for. Now there's lots of reasons for that but I
438:04 there's lots of reasons for that but I do think it speaks to that that core
438:06 do think it speaks to that that core idea that once you move beyond kind of
438:08 idea that once you move beyond kind of your single spot experiments there's a
438:11 your single spot experiments there's a lot of opportunity uh to to sort of grow
438:13 lot of opportunity uh to to sort of grow grow the impact of the organization. So,
438:15 grow the impact of the organization. So, like I said, that is the the first look.
438:17 like I said, that is the the first look. Uh, it's kind of the first twothirds of
438:19 Uh, it's kind of the first twothirds of these uh of these use cases. We'll be
438:21 these uh of these use cases. We'll be open for another week and then we'll
438:23 open for another week and then we'll have the full study out at the beginning
438:24 have the full study out at the beginning of December. Um, I'm really excited, I
438:27 of December. Um, I'm really excited, I think, heading into next year to see how
438:29 think, heading into next year to see how we move from sort of generic
438:31 we move from sort of generic conversations about impact uh and our
438:34 conversations about impact uh and our gut senses about impact to a lot more
438:37 gut senses about impact to a lot more random experiments like this to figure
438:38 random experiments like this to figure out where the impact really is and uh
438:41 out where the impact really is and uh and where we go next. So, look at that.
438:43 and where we go next. So, look at that. I'm going to end 27 seconds early and
438:45 I'm going to end 27 seconds early and really throw off the time, but
438:46 really throw off the time, but appreciate you guys all being here. Uh,
438:48 appreciate you guys all being here. Uh, and again, if you want to check this
438:49 and again, if you want to check this out, it's roicervey.ai.
439:05 As AI [music] changes our business and engineering landscape, do we need to
439:08 engineering landscape, do we need to rethink how we incentivize and
439:10 rethink how we incentivize and compensate engineers? Here to provide us
439:13 compensate engineers? Here to provide us with a case study for scaling output,
439:16 with a case study for scaling output, not overhead, is the co-founder and
439:19 not overhead, is the co-founder and managing partner at 10X, Arman Hezarki.
439:35 How's everybody feeling? It's been uh 7 and 1/2 hours. We doing what? Are we
439:37 and 1/2 hours. We doing what? Are we doing okay?
439:39 doing okay? >> Awesome. I'm Arman. Uh like the voice of
439:43 >> Awesome. I'm Arman. Uh like the voice of God apparently. That's what they're
439:44 God apparently. That's what they're called, Voice of God. Apparently. Uh so
439:48 called, Voice of God. Apparently. Uh so my name's Armon. I'm one of the
439:49 my name's Armon. I'm one of the co-founders and managing partners at a
439:50 co-founders and managing partners at a company called 10X. Uh my co-founder is
439:52 company called 10X. Uh my co-founder is Alex who's been uh kindly announcing
439:55 Alex who's been uh kindly announcing everybody all day. We do a lot of cool
439:58 everybody all day. We do a lot of cool work. We uh we help companies with their
440:00 work. We uh we help companies with their AI transformation. We have incredible
440:02 AI transformation. We have incredible clients all over the world. But I'm not
440:04 clients all over the world. But I'm not going to talk about any of that today.
440:06 going to talk about any of that today. I'm going to talk about something much
440:07 I'm going to talk about something much more niche. I'm going to talk about how
440:09 more niche. I'm going to talk about how we pay engineers. And we pay engineers
440:13 we pay engineers. And we pay engineers like salespeople. Earlier I was just in
440:15 like salespeople. Earlier I was just in the green room with a bunch of
440:17 the green room with a bunch of distinguished engineers that I've grown
440:19 distinguished engineers that I've grown to uh respect for my entire career. And
440:22 to uh respect for my entire career. And we were talking and I was telling them
440:24 we were talking and I was telling them that we pay engineers based on the story
440:26 that we pay engineers based on the story points that they complete. And we had a
440:30 points that they complete. And we had a lot of people roll their eyes and and
440:32 lot of people roll their eyes and and laugh. And they asked, "What do you
440:34 laugh. And they asked, "What do you mean?" And I said, "Clients pay us for
440:36 mean?" And I said, "Clients pay us for the number of story points that we
440:37 the number of story points that we deliver and we pay engineers based on
440:39 deliver and we pay engineers based on the number of story points that they
440:41 the number of story points that they complete." And similar to the looks that
440:44 complete." And similar to the looks that I'm getting from some of you, there was
440:46 I'm getting from some of you, there was skepticism.
440:47 skepticism. And I know this sounds crazy, but it's
440:50 And I know this sounds crazy, but it's working. We've been able to hire
440:53 working. We've been able to hire incredible engineers, many of whom have
440:54 incredible engineers, many of whom have started and exited uh companies before
440:57 started and exited uh companies before this. We have been able to hire
440:59 this. We have been able to hire worldclass machine learning and AI
441:01 worldclass machine learning and AI researchers. We've hired rocket
441:03 researchers. We've hired rocket scientists from NASA. We are shipping
441:06 scientists from NASA. We are shipping code incredibly quickly, and it's
441:08 code incredibly quickly, and it's maintainable and high quality code. Of
441:11 maintainable and high quality code. Of course, that is everyone's dream.
441:12 course, that is everyone's dream. Everybody wants to hire great people.
441:14 Everybody wants to hire great people. Everyone wants to deliver really uh fast
441:16 Everyone wants to deliver really uh fast code.
441:17 code. So, my goal here is not to convince you
441:20 So, my goal here is not to convince you all to adopt our model. My goal is to
441:22 all to adopt our model. My goal is to show you what compensation looks like in
441:25 show you what compensation looks like in AI and hopefully provide a new
441:27 AI and hopefully provide a new perspective on the fact that things
441:29 perspective on the fact that things might change as we introduce this
441:30 might change as we introduce this technology. Before I jump in though, I
441:33 technology. Before I jump in though, I want to talk about uh how we got here.
441:36 want to talk about uh how we got here. So, I'm a software engineer by training.
441:38 So, I'm a software engineer by training. I went to Carnegie Melon and then I
441:40 I went to Carnegie Melon and then I taught there in their school of computer
441:41 taught there in their school of computer science. After that, I went to Google
441:43 science. After that, I went to Google and I helped them scale their AI, cloud,
441:45 and I helped them scale their AI, cloud, and mobile practices internationally
441:47 and mobile practices internationally before starting a few ventureback
441:49 before starting a few ventureback startups. And in my last startup, I
441:51 startups. And in my last startup, I would work out of a weiwork. And I was
441:53 would work out of a weiwork. And I was sitting in this uh 33 Irving Weiwork. If
441:55 sitting in this uh 33 Irving Weiwork. If any of you are from New York, you you
441:56 any of you are from New York, you you might have worked out of that we work.
441:58 might have worked out of that we work. And they have these big tables and there
442:00 And they have these big tables and there were 12 of 12 of us kind of sitting
442:01 were 12 of 12 of us kind of sitting around. No one's talking. Everyone has
442:03 around. No one's talking. Everyone has their headphones in. And I look to my
442:05 their headphones in. And I look to my left and I see I see somebody with
442:08 left and I see I see somebody with Visual Studio Code open, right? I'm
442:09 Visual Studio Code open, right? I'm like, "Okay, I have a fellow engineer to
442:11 like, "Okay, I have a fellow engineer to my left." And I see that he was typing,
442:14 my left." And I see that he was typing, but I didn't see a chat window. This
442:16 but I didn't see a chat window. This person was typing into the code editor.
442:19 person was typing into the code editor. They were typing fo
442:22 They were typing fo like a caveman. This this poor person
442:24 like a caveman. This this poor person was typing like with their little
442:26 was typing like with their little chopstick fingers individual characters.
442:29 chopstick fingers individual characters. I I I couldn't believe it. On my
442:31 I I I couldn't believe it. On my computer, I had 45 agents. Three were
442:33 computer, I had 45 agents. Three were ordering me lunch. Two were writing
442:35 ordering me lunch. Two were writing code. One was doing research. Just
442:37 code. One was doing research. Just different worlds were happening on my
442:40 different worlds were happening on my computer versus this person's computer.
442:42 computer versus this person's computer. And I felt bad. I thought maybe we
442:43 And I felt bad. I thought maybe we should do a GoFundMe or something. But I
442:45 should do a GoFundMe or something. But I I I tried to look deeply at what is
442:48 I I tried to look deeply at what is actually causing this difference. Why am
442:51 actually causing this difference. Why am I using AI in the way that I am? And why
442:54 I using AI in the way that I am? And why is this person not?
442:56 is this person not? There are different ways that that
442:58 There are different ways that that people try AI and there are different
443:00 people try AI and there are different reasons why people don't use it. We've
443:02 reasons why people don't use it. We've all heard people who have tried it and
443:03 all heard people who have tried it and have said it's not as good as me. We've
443:05 have said it's not as good as me. We've all heard people who have not tried it
443:07 all heard people who have not tried it because they don't want to. But
443:08 because they don't want to. But regardless, my belief is that this is an
443:11 regardless, my belief is that this is an incentive issue. For me, I was a founder
443:14 incentive issue. For me, I was a founder and I wanted to squeak out every bit of
443:17 and I wanted to squeak out every bit of incremental value and and efficiency
443:20 incremental value and and efficiency that I could. And so I would sit on
443:22 that I could. And so I would sit on Twitter and LinkedIn and read blog posts
443:23 Twitter and LinkedIn and read blog posts and try to understand what is the
443:26 and try to understand what is the cutting edge in software engineering and
443:28 cutting edge in software engineering and what's going to give me the ability to
443:30 what's going to give me the ability to output more code higher quality faster.
443:34 output more code higher quality faster. And because of that, I was using all
443:36 And because of that, I was using all these all these different agents. But
443:38 these all these different agents. But this person probably worked at a
443:40 this person probably worked at a startup, probably had a base salary with
443:43 startup, probably had a base salary with an annual bonus and some equity. And
443:46 an annual bonus and some equity. And that was supposed to be the model that
443:48 that was supposed to be the model that incentivized people to be innovated, to
443:50 incentivized people to be innovated, to be innovative, and to work smarter and
443:52 be innovative, and to work smarter and faster and harder. But it wasn't
443:54 faster and harder. But it wasn't working. And so, in order to understand
443:57 working. And so, in order to understand how we got to where we are, I'm going to
443:58 how we got to where we are, I'm going to do a brief uh history of compensation.
444:02 do a brief uh history of compensation. And this is by no means accurate. I'm
444:05 And this is by no means accurate. I'm making a lot of things up here. It's all
444:07 making a lot of things up here. It's all illustrative. Okay. So, back in the day,
444:10 illustrative. Okay. So, back in the day, we had some cavemen who were writing
444:11 we had some cavemen who were writing code. We were we're uh probably
444:14 code. We were we're uh probably inscribing C in a in a tablet somewhere
444:17 inscribing C in a in a tablet somewhere and we were paying people hourly, right?
444:20 and we were paying people hourly, right? This makes sense. I look at somebody
444:21 This makes sense. I look at somebody sitting in a chair and I'm going to pay
444:23 sitting in a chair and I'm going to pay them some amount of dollars for some
444:25 them some amount of dollars for some amount of time. That makes sense for me
444:27 amount of time. That makes sense for me and it makes sense for the for the
444:28 and it makes sense for the for the engineer. But why is that broken? I
444:30 engineer. But why is that broken? I actually I want to hear from people. Why
444:32 actually I want to hear from people. Why is hourly broken?
444:38 >> It's slow output. >> No upside.
444:39 >> No upside. >> There's no upside. there's no reason to
444:41 >> There's no upside. there's no reason to work faster, right? And in fact, there's
444:43 work faster, right? And in fact, there's a disincentive to work faster. And so,
444:46 a disincentive to work faster. And so, what if I I notice this as the buyer of
444:48 what if I I notice this as the buyer of this technology and I say, "Okay, how
444:49 this technology and I say, "Okay, how long is it going to take you? It's going
444:50 long is it going to take you? It's going to take you five hours. Okay, so I'll
444:52 to take you five hours. Okay, so I'll pay you 500 bucks, right? Hourly $100.
444:55 pay you 500 bucks, right? Hourly $100. Multiply that by five." And then you as
444:57 Multiply that by five." And then you as the engineer, if you work faster, great.
444:59 the engineer, if you work faster, great. You get to keep the $500. And if you
445:01 You get to keep the $500. And if you work slower, that's on you. As
445:04 work slower, that's on you. As engineers, we're really, really bad at
445:05 engineers, we're really, really bad at estimating how long things are going to
445:07 estimating how long things are going to take. And so because of that, I'm not
445:10 take. And so because of that, I'm not going to say it's going to take five
445:10 going to say it's going to take five hours. I'm going to say it's going to
445:11 hours. I'm going to say it's going to take 15 hours, 20 hours, so that I have
445:14 take 15 hours, 20 hours, so that I have no downside. And so again, as the buyer,
445:16 no downside. And so again, as the buyer, I don't want to pay you based on the
445:19 I don't want to pay you based on the project.
445:20 project. So what if we hire people on salary and
445:22 So what if we hire people on salary and give them a bonus, right? Well, we in
445:25 give them a bonus, right? Well, we in the startup community know what happens
445:26 the startup community know what happens in that when when this is the case.
445:29 in that when when this is the case. People punch in at five, leave or nine,
445:31 People punch in at five, leave or nine, leave at five. And so I'm Larry Paige. I
445:35 leave at five. And so I'm Larry Paige. I notice this and I see why am I working
445:36 notice this and I see why am I working so hard at Google? Why am I putting my
445:39 so hard at Google? Why am I putting my blood, sweat, and tears into this? It's
445:40 blood, sweat, and tears into this? It's because I have some of the upside. I own
445:43 because I have some of the upside. I own the company, right? And so when we exit
445:45 the company, right? And so when we exit for for many, many dollars, I'm going to
445:47 for for many, many dollars, I'm going to see that. So what if I can share that
445:49 see that. So what if I can share that with my employees? And that's when
445:51 with my employees? And that's when equity comes in. And and this has
445:53 equity comes in. And and this has worked. This has worked for many many
445:55 worked. This has worked for many many years to incentivize employees. This is
445:58 years to incentivize employees. This is this is the foundation of the startup
446:00 this is the foundation of the startup community that we all know and are a
446:01 community that we all know and are a part of. It's incredible.
446:04 part of. It's incredible. But
446:05 But the not every company is Google. In
446:08 the not every company is Google. In fact, for every one Google, there are
446:10 fact, for every one Google, there are many many failures. And software
446:12 many many failures. And software engineers know this, right? For those
446:14 engineers know this, right? For those who want to take the risk, many will
446:16 who want to take the risk, many will just go to YC or or start their own
446:18 just go to YC or or start their own company. And for the ones who don't want
446:20 company. And for the ones who don't want the risk, they're opting for cash over
446:23 the risk, they're opting for cash over equity. Many of us who've hired
446:24 equity. Many of us who've hired engineers know that the cash is
446:27 engineers know that the cash is non-negotiable. Equity. Yeah, sure. I'll
446:29 non-negotiable. Equity. Yeah, sure. I'll take some upside.
446:32 take some upside. And so my contention is that this model
446:34 And so my contention is that this model needs to be reinvented in the age of AI.
446:37 needs to be reinvented in the age of AI. We need to directly incentivize people
446:40 We need to directly incentivize people to use these tools and to use them well
446:42 to use these tools and to use them well and to still maintain really high
446:45 and to still maintain really high quality standards of code. And so here's
446:48 quality standards of code. And so here's how it works for us. So we basically
446:50 how it works for us. So we basically just to take a step back, we do two
446:51 just to take a step back, we do two types of work at 10X. One is road
446:53 types of work at 10X. One is road mapping and one is execution. So
446:56 mapping and one is execution. So companies come to us and they say, "Hey,
446:58 companies come to us and they say, "Hey, we want AI." That's generally the
447:01 we want AI." That's generally the request. Sometimes it's more specific.
447:03 request. Sometimes it's more specific. It's like, hey, I want my customer
447:04 It's like, hey, I want my customer service team to have 10% more uh output
447:08 service team to have 10% more uh output using AI, right? But but generally they
447:09 using AI, right? But but generally they come to us with a request. We do a bunch
447:11 come to us with a request. We do a bunch of studying and learning and then we
447:13 of studying and learning and then we output a road map and based on that road
447:16 output a road map and based on that road map, they can take it and work on it on
447:19 map, they can take it and work on it on their own or we can do it. For a lot of
447:21 their own or we can do it. For a lot of things, we're taking off-the-shelf
447:22 things, we're taking off-the-shelf tools, but a lot of what we do is custom
447:25 tools, but a lot of what we do is custom builds and that's where the story point
447:26 builds and that's where the story point model comes in. So we will build a
447:29 model comes in. So we will build a roadmap for a lot of our clients. But
447:31 roadmap for a lot of our clients. But once they see that then they're putting
447:33 once they see that then they're putting in requests on their own as well. And we
447:35 in requests on their own as well. And we have two roles in the company that are
447:36 have two roles in the company that are client facing. One is the strategist and
447:38 client facing. One is the strategist and the other is the AI engineer. The
447:40 the other is the AI engineer. The strategists are all are mostly
447:43 strategists are all are mostly technical. And so we've have we have
447:45 technical. And so we've have we have former PMs, we have former engineers.
447:48 former PMs, we have former engineers. They are doing PM type work, consulting
447:50 They are doing PM type work, consulting type work. They're the ones that are
447:52 type work. They're the ones that are taking the product requirements and
447:54 taking the product requirements and distilling that down with the client.
447:57 distilling that down with the client. Then they hand that over to the engineer
447:59 Then they hand that over to the engineer and the engineer puts together an
448:00 and the engineer puts together an architecture design document. They spend
448:03 architecture design document. They spend a lot of time doing that. In fact, that
448:05 a lot of time doing that. In fact, that is where most of our engineering time
448:07 is where most of our engineering time goes. Then they write code and they
448:11 goes. Then they write code and they start implementing that that
448:13 start implementing that that architecture design document includes
448:14 architecture design document includes tickets and each ticket is graded on
448:17 tickets and each ticket is graded on some number of story points. This is a
448:18 some number of story points. This is a very traditional method of doing work,
448:20 very traditional method of doing work, right?
448:22 right? And when that ticket is accepted, the
448:25 And when that ticket is accepted, the engineer gets paid a a fee per story
448:27 engineer gets paid a a fee per story point that they complete. Our engineers
448:29 point that they complete. Our engineers have a flat base that they're paid and
448:31 have a flat base that they're paid and then every quarter we round up based on
448:33 then every quarter we round up based on the story points that they've completed.
448:35 the story points that they've completed. And again, this has led to us being able
448:36 And again, this has led to us being able to hire incredible people, but we've
448:38 to hire incredible people, but we've also been able to do incredible work.
448:40 also been able to do incredible work. So, I'm going to walk through a couple
448:41 So, I'm going to walk through a couple of projects that we've done. So, this is
448:43 of projects that we've done. So, this is one. This is a billboard company.
448:46 one. This is a billboard company. If you go to Times Square right now,
448:47 If you go to Times Square right now, you'll see some billboards that they've
448:49 you'll see some billboards that they've sold that inventory for.
448:51 sold that inventory for. They sell in two ways. One is you can
448:53 They sell in two ways. One is you can call them up traditional sales, you can
448:55 call them up traditional sales, you can buy that inventory, but the other is
448:57 buy that inventory, but the other is they have like an Uber for billboards
448:59 they have like an Uber for billboards type of product where you can go online,
449:01 type of product where you can go online, you can upload a PNG, you can choose
449:03 you can upload a PNG, you can choose where you want this to run and for how
449:05 where you want this to run and for how long, similar to like a Facebook or
449:07 long, similar to like a Facebook or Google ad. It's very similar to that
449:09 Google ad. It's very similar to that experience. And they came to us and they
449:11 experience. And they came to us and they said, "Hey, we think that there's some
449:12 said, "Hey, we think that there's some opportunities for AI in our product." We
449:14 opportunities for AI in our product." We did an analysis and we found a few. One
449:16 did an analysis and we found a few. One of them is this. We found that when an
449:19 of them is this. We found that when an image is uploaded to their system, it
449:21 image is uploaded to their system, it has to go through two rounds of
449:23 has to go through two rounds of moderation. One is internal to the
449:24 moderation. One is internal to the company and the other is with the
449:26 company and the other is with the billboard owner. Internal to their
449:29 billboard owner. Internal to their company, they're spending money on that
449:31 company, they're spending money on that to actually hire the people to do that
449:33 to actually hire the people to do that and there's a lot of inaccuracy
449:36 and there's a lot of inaccuracy and it takes a lot of time. So that
449:38 and it takes a lot of time. So that costs them money and it costs them
449:40 costs them money and it costs them revenue because every moment that the
449:42 revenue because every moment that the billboard is not running, they're not
449:44 billboard is not running, they're not making money. And so we found what if we
449:46 making money. And so we found what if we could build an AI model that can
449:48 could build an AI model that can actually do this moderation for them. We
449:50 actually do this moderation for them. We scoped that out. We built the
449:52 scoped that out. We built the architecture uh design doc. We broke it
449:54 architecture uh design doc. We broke it down into tickets and we built this for
449:55 down into tickets and we built this for them. We did it in two weeks and we got
449:57 them. We did it in two weeks and we got to 96% accuracy when compared to the
449:59 to 96% accuracy when compared to the human moderator. We've done a lot of
450:01 human moderator. We've done a lot of other projects with this company as
450:02 other projects with this company as well. This is another company. This is
450:05 well. This is another company. This is they work with retailers all around the
450:07 they work with retailers all around the world and currently they have devices in
450:10 world and currently they have devices in these retailers and they're low power
450:12 these retailers and they're low power devices. And so because of this, they're
450:14 devices. And so because of this, they're able to run one AI model on device. And
450:18 able to run one AI model on device. And what this model does is does heat
450:19 what this model does is does heat mapping. So imagine there's a camera in
450:20 mapping. So imagine there's a camera in this room looks down and it can
450:22 this room looks down and it can basically generate a heat map of where
450:24 basically generate a heat map of where the traffic is throughout the day. And
450:26 the traffic is throughout the day. And for retailers, of course, this is very,
450:27 for retailers, of course, this is very, very useful. But there's other things
450:30 very useful. But there's other things you can do too, right? If we just sit
450:32 you can do too, right? If we just sit here for a few minutes, we can probably
450:33 here for a few minutes, we can probably come up with a lot of ideas of if you
450:35 come up with a lot of ideas of if you have a camera with a chip, you can make
450:37 have a camera with a chip, you can make a lot of money from that. You can show
450:39 a lot of money from that. You can show really useful information. And so that's
450:41 really useful information. And so that's what we did. We we came up with what are
450:43 what we did. We we came up with what are some of the things that we could do with
450:45 some of the things that we could do with this? If you put a little bit little bit
450:47 this? If you put a little bit little bit more power in that chip, if you make the
450:49 more power in that chip, if you make the models, if you quantize them so they can
450:51 models, if you quantize them so they can run in parallel, what could you do? And
450:54 run in parallel, what could you do? And so we gave them this report and then we
450:56 so we gave them this report and then we built them five models that can run in
450:57 built them five models that can run in parallel. It does everything from heat
450:59 parallel. It does everything from heat mapping to Q detection to theft
451:00 mapping to Q detection to theft detection and more. And again, we start
451:03 detection and more. And again, we start with the product requirement stock. We
451:05 with the product requirement stock. We break this down into architecture. Then
451:06 break this down into architecture. Then we build it and then we pay engineers
451:08 we build it and then we pay engineers based on the output.
451:14 This is the big question. What are the risks? Right? I just talked about
451:16 risks? Right? I just talked about dandelions and rainbows, right? Uh so I
451:20 dandelions and rainbows, right? Uh so I promised you that my goal is not to
451:21 promised you that my goal is not to convince you to do this. And part of
451:23 convince you to do this. And part of this is showing you what the potential
451:25 this is showing you what the potential risks are. These are a few that come up.
451:28 risks are. These are a few that come up. One is what if an engineer inflates the
451:31 One is what if an engineer inflates the story points, right? What if an engineer
451:33 story points, right? What if an engineer says, "Okay, you want me to add a
451:34 says, "Okay, you want me to add a button? 45 story points." Right?
451:38 button? 45 story points." Right? What if an engineer rushes and quality
451:40 What if an engineer rushes and quality drops? You're saying that it took two
451:42 drops? You're saying that it took two weeks to do that? Well, was it good? Did
451:44 weeks to do that? Well, was it good? Did it work?
451:47 it work? And what if engineers get sharp elbowed?
451:50 And what if engineers get sharp elbowed? I started this by saying that we
451:51 I started this by saying that we compensate engineers like salespeople.
451:54 compensate engineers like salespeople. It's not a it's not a culture that we
451:56 It's not a it's not a culture that we necessarily want to emulate in software
451:58 necessarily want to emulate in software engineering, right? So, how do we how do
452:00 engineering, right? So, how do we how do we uh make sure that that's not
452:01 we uh make sure that that's not happening?
452:04 happening? First of all, I mentioned that we have
452:05 First of all, I mentioned that we have two different roles and we compensate
452:07 two different roles and we compensate like a counterbalance. So strategists
452:09 like a counterbalance. So strategists are compensated based on NR which really
452:12 are compensated based on NR which really is like customer happiness
452:14 is like customer happiness and every single ticket has to be
452:16 and every single ticket has to be approved internally with multiple rounds
452:18 approved internally with multiple rounds of QA of which the strategist is
452:20 of QA of which the strategist is involved but also by the client and so
452:22 involved but also by the client and so there's a counterbalance to every single
452:24 there's a counterbalance to every single ticket that is delivered.
452:28 ticket that is delivered. Uh, I skipped to the second one. For the
452:30 Uh, I skipped to the second one. For the first one, inflating story points. The
452:32 first one, inflating story points. The strategists are the ones who scope it.
452:33 strategists are the ones who scope it. And again, we have to review all of
452:35 And again, we have to review all of that. And for the third, how do you make
452:38 that. And for the third, how do you make sure that all of this is correct? And
452:39 sure that all of this is correct? And how do you make sure that there's no
452:40 how do you make sure that there's no sharp elbows? How do you make sure that
452:42 sharp elbows? How do you make sure that everybody is happy and the dandelions
452:43 everybody is happy and the dandelions and rainbows are continue throughout
452:45 and rainbows are continue throughout this parade of joy? Well, you have to
452:47 this parade of joy? Well, you have to hire the right people. And this is what
452:50 hire the right people. And this is what I tell everybody.
452:52 I tell everybody. We make hiring incredibly difficult for
452:54 We make hiring incredibly difficult for ourselves so that everything else is
452:56 ourselves so that everything else is easy. And that is a principle that we
452:58 easy. And that is a principle that we all know and we all stand true to. And
453:00 all know and we all stand true to. And this is incredibly important with AI. My
453:02 this is incredibly important with AI. My co-founder, Alex, always says, "AI makes
453:04 co-founder, Alex, always says, "AI makes people look like one of those crazy
453:06 people look like one of those crazy mirrors where any one of your
453:08 mirrors where any one of your attributes, it makes it 10 times
453:09 attributes, it makes it 10 times larger." If you're a great engineer, AI
453:13 larger." If you're a great engineer, AI makes you great. If you're not, it makes
453:16 makes you great. If you're not, it makes you sloppier. And this is the case with
453:18 you sloppier. And this is the case with all of these things. You have to start
453:19 all of these things. You have to start with hiring.
453:21 with hiring. Our belief is that AI gives people
453:24 Our belief is that AI gives people superpowers and it makes all of us
453:26 superpowers and it makes all of us smarter, faster, and better at what we
453:27 smarter, faster, and better at what we do. But my belief is that the current
453:29 do. But my belief is that the current way that we compensate people is
453:31 way that we compensate people is actually holding them back. And I would
453:34 actually holding them back. And I would invite you to think about how can you
453:35 invite you to think about how can you compensate people on your team
453:37 compensate people on your team differently, whether it's software
453:38 differently, whether it's software engineering or anything else. If you
453:41 engineering or anything else. If you want to unlock your employees potential,
453:43 want to unlock your employees potential, feel free to reach out at arman@10x.co.
453:45 feel free to reach out at arman@10x.co. Thank you. [applause]
453:48 Thank you. [applause] Our
453:58 next presenter [music] is deputy CTO at DX, the engineering intelligence
454:01 DX, the engineering intelligence platform designed by leading researchers
454:04 platform designed by leading researchers speaking about effective leadership in
454:06 speaking about effective leadership in AI enhanced organizations.
454:09 AI enhanced organizations. Please join me in welcoming to the stage
454:12 Please join me in welcoming to the stage Justin Rio.
454:24 >> Hello. Thanks for joining me in one of the
454:25 Thanks for joining me in one of the later day sessions. Looks like we we we
454:26 later day sessions. Looks like we we we kept a lot of people here. This is a
454:28 kept a lot of people here. This is a nice full room. I'm great to see it.
454:29 nice full room. I'm great to see it. We're going to go through a lot of
454:30 We're going to go through a lot of content in a short amount of time. So,
454:32 content in a short amount of time. So, I'm going to get right into it. If you
454:33 I'm going to get right into it. If you want to get deeper into any of this
454:35 want to get deeper into any of this stuff, we have published this uh AI
454:37 stuff, we have published this uh AI strategy playbook for senior executives.
454:40 strategy playbook for senior executives. And uh a lot of the content that I'm
454:41 And uh a lot of the content that I'm going to go through, I'm not going to
454:42 going to go through, I'm not going to have time to get quite as deep, but this
454:44 have time to get quite as deep, but this is just a nice PDF copy that you can
454:46 is just a nice PDF copy that you can come and refer to later. If you missed
454:47 come and refer to later. If you missed this QR code, don't worry. I'll show it
454:49 this QR code, don't worry. I'll show it again uh at the end. So, what is the
454:52 again uh at the end. So, what is the current impact of Genai?
454:54 current impact of Genai? Nobody knows, right? We've got Google on
454:56 Nobody knows, right? We've got Google on the one hand telling us that everyone's
454:58 the one hand telling us that everyone's 10% more productive. That's interesting.
455:00 10% more productive. That's interesting. Now, they're Google. They were already
455:01 Now, they're Google. They were already pretty productive to begin with. But we
455:02 pretty productive to begin with. But we have this sort of now infamous meter MER
455:06 have this sort of now infamous meter MER study which has some flaws in the way
455:07 study which has some flaws in the way that study was put together that showed
455:09 that study was put together that showed actually a 19% decrease in productivity
455:12 actually a 19% decrease in productivity using codec assistance. So there's a lot
455:14 using codec assistance. So there's a lot of volatility, a lot of variability. Uh
455:16 of volatility, a lot of variability. Uh what was really interesting about this
455:17 what was really interesting about this study even though I I mentioned there
455:19 study even though I I mentioned there were some flaws. Um but every engineer
455:21 were some flaws. Um but every engineer that took part in this study felt more
455:25 that took part in this study felt more productive but then the data actually
455:26 productive but then the data actually bore out that they were less productive.
455:29 bore out that they were less productive. Kind of interesting, right? we've got
455:30 Kind of interesting, right? we've got this induced flow uh that makes us feel
455:32 this induced flow uh that makes us feel really good about what we're doing. So,
455:34 really good about what we're doing. So, we need to address this. Dora has put
455:36 we need to address this. Dora has put out some really good research on this
455:38 out some really good research on this too. But this is based on industry
455:40 too. But this is based on industry averages. This is impact based on what
455:42 averages. This is impact based on what do we look at when we see a large sample
455:44 do we look at when we see a large sample and an average of how certain factors
455:46 and an average of how certain factors are being impacted by in this case 25%
455:49 are being impacted by in this case 25% increase in AI adoption. We see these
455:52 increase in AI adoption. We see these modest but positive leaning indicators.
455:55 modest but positive leaning indicators. 7.5% increase in documentation quality
455:59 7.5% increase in documentation quality and uh increase in code quality by about
456:01 and uh increase in code quality by about 3.4%. At least that's not leaning in the
456:03 3.4%. At least that's not leaning in the other direction, right? And when we
456:05 other direction, right? And when we started digging through some of DX's
456:07 started digging through some of DX's data, we have, you know, we're the
456:08 data, we have, you know, we're the developer productivity measurement
456:09 developer productivity measurement company. We have lots of aggregate data
456:11 company. We have lots of aggregate data that we can look at with this. We found
456:13 that we can look at with this. We found the same thing. When we looked at
456:15 the same thing. When we looked at averages, we see about a 2.6% 6%
456:17 averages, we see about a 2.6% 6% increase in overall uh change
456:19 increase in overall uh change confidence, which is a a percentage of
456:22 confidence, which is a a percentage of people who answered positively that they
456:24 people who answered positively that they feel confident in the changes that
456:25 feel confident in the changes that they're putting into production. Uh
456:27 they're putting into production. Uh similar positive leaning average when we
456:30 similar positive leaning average when we looked at code maintainability, another
456:31 looked at code maintainability, another qualitative metric, a1% reduction in
456:35 qualitative metric, a1% reduction in change failure rate. uh which when you
456:37 change failure rate. uh which when you think about the industry benchmark being
456:39 think about the industry benchmark being 4% it's not insignificant
456:41 4% it's not insignificant but this is not the full story because
456:43 but this is not the full story because this is what we saw when we broke the
456:45 this is what we saw when we broke the same studies down per company. Every
456:48 same studies down per company. Every company here is a every every bar
456:50 company here is a every every bar represents a company right we have some
456:53 represents a company right we have some that are seeing 20% increases in change
456:55 that are seeing 20% increases in change confidence while others are seeing 20%
456:58 confidence while others are seeing 20% decreases. We're seeing extreme
456:59 decreases. We're seeing extreme volatility which is why these averages
457:02 volatility which is why these averages look so innocuous but they're belying
457:04 look so innocuous but they're belying the greater story of variability. See
457:07 the greater story of variability. See the same thing with code
457:08 the same thing with code maintainability.
457:09 maintainability. The same thing with change failure rate.
457:11 The same thing with change failure rate. So this is a 2% increase in change
457:14 So this is a 2% increase in change failure rate up here at the top. Again
457:16 failure rate up here at the top. Again with an industry benchmark of 4%. That
457:19 with an industry benchmark of 4%. That means shipping as much as 50% more
457:22 means shipping as much as 50% more defects than we were shipping before.
457:23 defects than we were shipping before. Right? We want to make sure we're on the
457:25 Right? We want to make sure we're on the lower end of this. But how? Like what
457:26 lower end of this. But how? Like what should we be doing? Well, we found some
457:30 should we be doing? Well, we found some patterns here. We see that some
457:32 patterns here. We see that some organizations are seeing positive
457:33 organizations are seeing positive impacts to KPIs, but others are
457:35 impacts to KPIs, but others are struggling with adoption and even seeing
457:36 struggling with adoption and even seeing some of these negative impacts. Top down
457:39 some of these negative impacts. Top down mandates are not working, right? Driving
457:41 mandates are not working, right? Driving towards, oh, we must have 100% adoption
457:43 towards, oh, we must have 100% adoption of AI. Great, I will update my read my
457:45 of AI. Great, I will update my read my file every morning and I will be
457:47 file every morning and I will be compliant, right? We're not actually
457:48 compliant, right? We're not actually moving the needle anywhere when we do
457:49 moving the needle anywhere when we do that. We also find that lack of
457:52 that. We also find that lack of education and enablement uh has a big
457:55 education and enablement uh has a big impact on sort of negatively impacting
457:57 impact on sort of negatively impacting this. Some organizations just turn on
457:59 this. Some organizations just turn on the tech and expect it to just start
458:01 the tech and expect it to just start working and everybody to know the best
458:02 working and everybody to know the best ways to use it. Uh and a difficulty
458:05 ways to use it. Uh and a difficulty measuring the impact or even knowing
458:07 measuring the impact or even knowing what we should be measuring like what
458:08 what we should be measuring like what metrics would should we be looking at
458:10 metrics would should we be looking at you know does utilization really tell us
458:12 you know does utilization really tell us much about the full story of genai
458:13 much about the full story of genai impact. This is another graph from Dora.
458:17 impact. This is another graph from Dora. uh this is a basian uh posterior
458:20 uh this is a basian uh posterior distribution which is an interesting way
458:21 distribution which is an interesting way of representing data. Basically you want
458:24 of representing data. Basically you want your mass to be on the yellow side of
458:26 your mass to be on the yellow side of this line uh the the uh the right side
458:29 this line uh the the uh the right side of this line for the audience. Yeah. And
458:31 of this line for the audience. Yeah. And you want a sharp peak which is telling
458:33 you want a sharp peak which is telling you that we're pretty confident that
458:34 you that we're pretty confident that this initiative will have this impact.
458:36 this initiative will have this impact. And if we look at some of the topline
458:38 And if we look at some of the topline initiatives here, these are things like
458:40 initiatives here, these are things like clear AI policies. All right, we want to
458:42 clear AI policies. All right, we want to make sure we have that. We want time to
458:44 make sure we have that. We want time to learn. Not just giving people materials,
458:46 learn. Not just giving people materials, but actually giving them space to
458:47 but actually giving them space to experiment, right? Um, and so these
458:50 experiment, right? Um, and so these types of factors are the ones that seem
458:52 types of factors are the ones that seem to be moving the needle the most. So,
458:54 to be moving the needle the most. So, we're going to go over some quick tips
458:56 we're going to go over some quick tips on how we can do all of these things.
458:57 on how we can do all of these things. And again, the guide will go deeper into
458:59 And again, the guide will go deeper into this. We want to integrate across the
459:01 this. We want to integrate across the SDLC. All right. For most organizations,
459:04 SDLC. All right. For most organizations, writing code has never been the
459:06 writing code has never been the bottleneck, right? We can in uh we can
459:09 bottleneck, right? We can in uh we can increase productivity a bit by helping
459:10 increase productivity a bit by helping with code completion, but our our
459:12 with code completion, but our our biggest bottlenecks are elsewhere within
459:14 biggest bottlenecks are elsewhere within the SDLC. There's a lot more to creating
459:16 the SDLC. There's a lot more to creating software than just writing code. We want
459:18 software than just writing code. We want to unblock usage. We can't just say,
459:19 to unblock usage. We can't just say, well, we're worried about data
459:21 well, we're worried about data xfiltration, so we can't try this thing
459:22 xfiltration, so we can't try this thing like no, get creative about it. We've
459:24 like no, get creative about it. We've got really good infrastructure out there
459:25 got really good infrastructure out there now like bedrock and fireworks AI that
459:28 now like bedrock and fireworks AI that can let us run powerful models in safe
459:31 can let us run powerful models in safe spaces. We have to have open discussions
459:34 spaces. We have to have open discussions about these metrics. We need to
459:35 about these metrics. We need to evangelize the wins and we need to let
459:37 evangelize the wins and we need to let our engineers know why we're gathering
459:40 our engineers know why we're gathering metrics and data. What is it that we're
459:42 metrics and data. What is it that we're trying to improve? We have to reduce the
459:44 trying to improve? We have to reduce the fear of AI, right? We have to make sure
459:46 fear of AI, right? We have to make sure that people understand that this is not
459:48 that people understand that this is not a technology that is ready to replace
459:50 a technology that is ready to replace engineers. This is a a technology that's
459:53 engineers. This is a a technology that's really good at augmenting engineers and
459:55 really good at augmenting engineers and increasing the throughput of our
459:57 increasing the throughput of our business. We have to establish better
459:59 business. We have to establish better compliance and trust and we need to tie
460:01 compliance and trust and we need to tie this stuff to employee success. These
460:03 this stuff to employee success. These are new skill sets. AI is not coming for
460:05 are new skill sets. AI is not coming for your job, but somebody really good at AI
460:08 your job, but somebody really good at AI might take your job. And so, as leaders,
460:11 might take your job. And so, as leaders, we have the opportunity to help our
460:12 we have the opportunity to help our employees become more successful with
460:14 employees become more successful with this technology. So, how do we reduce
460:16 this technology. So, how do we reduce the fear? Well, first of all, why do we
460:18 the fear? Well, first of all, why do we need to do this? Well, there's a lot of
460:19 need to do this? Well, there's a lot of good reasons, but I love to point to
460:22 good reasons, but I love to point to Google's project Aristotle. This was a
460:24 Google's project Aristotle. This was a 2012 study where Google wanted to figure
460:26 2012 study where Google wanted to figure out what are the characteristics of
460:28 out what are the characteristics of highly performant teams. uh they thought
460:30 highly performant teams. uh they thought that the recipe was just going to be
460:31 that the recipe was just going to be what Google had this combination of high
460:33 what Google had this combination of high performers, experienced managers and
460:35 performers, experienced managers and basically unlimited resources and they
460:37 basically unlimited resources and they were dead wrong. Overwhelmingly the
460:40 were dead wrong. Overwhelmingly the biggest indicator of productivity was
460:42 biggest indicator of productivity was psychological safety. Okay. And so that
460:44 psychological safety. Okay. And so that very much applies now. We also have data
460:47 very much applies now. We also have data like this is SWEBench. I'm sure a lot of
460:49 like this is SWEBench. I'm sure a lot of you have seen this and there are some
460:50 you have seen this and there are some impressive benchmarks that the agents
460:53 impressive benchmarks that the agents can do like a third of the things
460:55 can do like a third of the things they're asked to do without any human
460:57 they're asked to do without any human intervention. That means that they're
460:59 intervention. That means that they're not able to do twothirds of them. Right?
461:02 not able to do twothirds of them. Right? Again, we are augmenting. We're not
461:03 Again, we are augmenting. We're not replacing. We're not ready. We may never
461:05 replacing. We're not ready. We may never be ready. So, we need to be very
461:06 be ready. So, we need to be very transparent with what we're doing. We
461:08 transparent with what we're doing. We need to set very clear intents. Why, you
461:10 need to set very clear intents. Why, you know, are we uh using this to to
461:13 know, are we uh using this to to augment, not to replace. We need to be
461:15 augment, not to replace. We need to be proactive in the way that we communicate
461:17 proactive in the way that we communicate that and not just wait for people to get
461:19 that and not just wait for people to get upset and possibly scared. We need to
461:21 upset and possibly scared. We need to say, "No, we are here to help you to
461:23 say, "No, we are here to help you to give you a better developer experience
461:25 give you a better developer experience and to increase the throughput of the
461:27 and to increase the throughput of the business." And again we have to have
461:28 business." And again we have to have these discussions about metrics. Now
461:31 these discussions about metrics. Now what metrics what should we be looking
461:32 what metrics what should we be looking at? Well DX again developer experience
461:35 at? Well DX again developer experience and productivity measurement company. Um
461:38 and productivity measurement company. Um there are two sort of classes of metrics
461:40 there are two sort of classes of metrics that we can be looking at really two
461:42 that we can be looking at really two levers that matter here and that's speed
461:44 levers that matter here and that's speed and quality. Right? We want to increase
461:46 and quality. Right? We want to increase PR throughput. We want to increase our
461:48 PR throughput. We want to increase our velocity but not by just creating a
461:50 velocity but not by just creating a bunch of slop that's going to give us a
461:52 bunch of slop that's going to give us a bunch of tech debt later that we're
461:53 bunch of tech debt later that we're going to have to deal with and we just
461:54 going to have to deal with and we just kick the bottleneck down the road if we
461:56 kick the bottleneck down the road if we do that. Right? So we want to be looking
461:57 do that. Right? So we want to be looking at things like change failure rate, our
462:00 at things like change failure rate, our overall perception of quality, change
462:01 overall perception of quality, change confidence, maintainability.
462:04 confidence, maintainability. And we have three types of metrics that
462:06 And we have three types of metrics that we can be looking at here. We have our
462:08 we can be looking at here. We have our telemetry metrics. These are the things
462:10 telemetry metrics. These are the things coming out of the API. And they're good
462:12 coming out of the API. And they're good for some stuff, but they're not always
462:14 for some stuff, but they're not always accurate, right? We know like accept
462:16 accurate, right? We know like accept versus suggest was kind of like all the
462:19 versus suggest was kind of like all the rage until we realize that engineers
462:20 rage until we realize that engineers need to click accept in the IDE in order
462:23 need to click accept in the IDE in order for the API to know about it. even if
462:25 for the API to know about it. even if they do click accept, who's to say they
462:26 they do click accept, who's to say they didn't just go back and rewrite every
462:28 didn't just go back and rewrite every line that was suggested, right? So
462:30 line that was suggested, right? So that's providing us some context, but we
462:32 that's providing us some context, but we also need to do some experience
462:33 also need to do some experience sampling. We need to like for instance
462:35 sampling. We need to like for instance add a new field to a PR form that says I
462:38 add a new field to a PR form that says I used AI to generate this PR or I enjoyed
462:41 used AI to generate this PR or I enjoyed using AI to generate this PR and get
462:43 using AI to generate this PR and get some data that way. And then
462:45 some data that way. And then self-reported data or survey data. We
462:47 self-reported data or survey data. We are big on surveys, but let me
462:49 are big on surveys, but let me underscore we're big on effective
462:50 underscore we're big on effective surveys. 90% plus participation rates
462:54 surveys. 90% plus participation rates engineered against questions that treat
462:56 engineered against questions that treat developer experience as a systems
462:59 developer experience as a systems problem not a people problem because
463:01 problem not a people problem because that's what it is W. Edwards Deming 90
463:03 that's what it is W. Edwards Deming 90 to 95% of the productivity output of an
463:06 to 95% of the productivity output of an organization is determined by the system
463:08 organization is determined by the system and not the worker. Okay, so
463:11 and not the worker. Okay, so foundational developer experience and
463:13 foundational developer experience and developer productivity metrics still
463:15 developer productivity metrics still matter the most. Right? Our AI metrics
463:17 matter the most. Right? Our AI metrics like utilization and things are telling
463:19 like utilization and things are telling us what's happening with the tech, but
463:21 us what's happening with the tech, but these core metrics that we've been able
463:23 these core metrics that we've been able to trust are telling us whether these
463:24 to trust are telling us whether these initiatives are actually working, right?
463:27 initiatives are actually working, right? Are we actually moving the needle and
463:28 Are we actually moving the needle and having the outcomes that we want to see?
463:31 having the outcomes that we want to see? So top companies are looking at
463:32 So top companies are looking at different things, right? We are seeing
463:34 different things, right? We are seeing like adoption metrics coming out of
463:36 like adoption metrics coming out of Microsoft. They've also got this great
463:37 Microsoft. They've also got this great metric called a bad developer day. I'm
463:40 metric called a bad developer day. I'm not going to go into it, but there's a
463:41 not going to go into it, but there's a really good white paper that shows like
463:43 really good white paper that shows like all the different telemetry that they
463:44 all the different telemetry that they can look at to determine what makes a
463:46 can look at to determine what makes a bad developer day. Dropbox is looking at
463:48 bad developer day. Dropbox is looking at similar stuff. Adoption like weekly
463:51 similar stuff. Adoption like weekly active users, daily active users, that
463:52 active users, daily active users, that sort of thing, but also looking at
463:54 sort of thing, but also looking at quality metrics like change failure
463:55 quality metrics like change failure rate. And booking is looking at similar
463:57 rate. And booking is looking at similar stuff as well. And so we built a
463:59 stuff as well. And so we built a framework around this. We were first to
464:01 framework around this. We were first to market with what we call our DXAI
464:03 market with what we call our DXAI measurement framework. And this is very
464:05 measurement framework. And this is very much inspired by things like Dora space
464:07 much inspired by things like Dora space framework, DevX just like our core four
464:09 framework, DevX just like our core four metric set which you can ask me about
464:11 metric set which you can ask me about later. Uh and we take these metrics and
464:14 later. Uh and we take these metrics and we uh normalize them into these three
464:17 we uh normalize them into these three dimensions of utilization, impact and
464:20 dimensions of utilization, impact and cost. And you can kind of think about
464:22 cost. And you can kind of think about this as a maturity curve too. A lot of
464:24 this as a maturity curve too. A lot of people start just figuring out okay
464:26 people start just figuring out okay what's happening? who's using the tech,
464:28 what's happening? who's using the tech, what's the percentage of pull requests
464:30 what's the percentage of pull requests that we're getting that are AI assisted
464:31 that we're getting that are AI assisted maybe through experience sampling? How
464:33 maybe through experience sampling? How many tasks are being assigned to agents?
464:35 many tasks are being assigned to agents? But then we can mature that perspective
464:37 But then we can mature that perspective a little bit and we can correlate that
464:39 a little bit and we can correlate that utilization to impact. What is this
464:41 utilization to impact. What is this actually doing to velocity? What is this
464:44 actually doing to velocity? What is this actually doing to quality? And this is
464:46 actually doing to quality? And this is when we start getting more mature in our
464:47 when we start getting more mature in our picture of our impact. And then finally,
464:50 picture of our impact. And then finally, cost. Although I like to joke that we're
464:51 cost. Although I like to joke that we're 15 years past the last hype cycle, which
464:53 15 years past the last hype cycle, which was cloud, and we still have new
464:55 was cloud, and we still have new companies spinning up that are teaching
464:56 companies spinning up that are teaching us how to understand and optimize our
464:58 us how to understand and optimize our cloud costs. So, we will see if we get
465:00 cloud costs. So, we will see if we get there. Although, I also hear horror
465:01 there. Although, I also hear horror stories about people burning through
465:02 stories about people burning through 2,000 tokens at $2,000 worth of tokens a
465:05 2,000 tokens at $2,000 worth of tokens a day. So, we probably do need to hit that
465:07 day. So, we probably do need to hit that as well. What about compliance and
465:09 as well. What about compliance and trust? What can we do to ensure that the
465:11 trust? What can we do to ensure that the output uh that that's being generated is
465:13 output uh that that's being generated is something that can be trusted by our
465:15 something that can be trusted by our engineers? We have a lot of levers to
465:17 engineers? We have a lot of levers to pull here, but one of the ones that I'd
465:19 pull here, but one of the ones that I'd like to talk about is setting up a
465:21 like to talk about is setting up a feedback loop for our system prompts. So
465:24 feedback loop for our system prompts. So these could be called system prompts,
465:26 these could be called system prompts, cursor rules, agent markdown. Pretty
465:28 cursor rules, agent markdown. Pretty much all of the mainstream solutions
465:30 much all of the mainstream solutions have something like this where you can
465:32 have something like this where you can go and provide a set of rules uh to
465:34 go and provide a set of rules uh to control how these models behave. Uh and
465:37 control how these models behave. Uh and I won't get too much into the technical
465:39 I won't get too much into the technical details here. We have an example where
465:40 details here. We have an example where like the uh models have been providing
465:42 like the uh models have been providing outdated Spring Boot uh stuff. We want
465:45 outdated Spring Boot uh stuff. We want Spring Boot 3. It's It's been sending us
465:47 Spring Boot 3. It's It's been sending us Spring Boot 2 stuff. The big takeaway
465:49 Spring Boot 2 stuff. The big takeaway here is to have the feedback loop. Have
465:51 here is to have the feedback loop. Have a gatekeeper, right? Have somebody or a
465:53 a gatekeeper, right? Have somebody or a group in the organization that can
465:55 group in the organization that can receive this feedback that understand
465:57 receive this feedback that understand how to maintain and continuously improve
465:59 how to maintain and continuously improve these system prompts, right? And that
466:01 these system prompts, right? And that way we're always maintaining the way
466:03 way we're always maintaining the way that these assistants or models or
466:05 that these assistants or models or agents affect the whole business. It
466:07 agents affect the whole business. It also pays to understand the way that uh
466:10 also pays to understand the way that uh temperature works, especially when we're
466:12 temperature works, especially when we're building agents, right? we do have some
466:14 building agents, right? we do have some control over the determinism and
466:16 control over the determinism and nondeterminism of these models. Uh again
466:18 nondeterminism of these models. Uh again like when a model is predicting a next
466:20 like when a model is predicting a next token, it doesn't just have like one
466:22 token, it doesn't just have like one token. It has a matrix of tokens and
466:24 token. It has a matrix of tokens and those are associated with a certain
466:25 those are associated with a certain probability of that being like the right
466:27 probability of that being like the right token. And so we have this setting
466:29 token. And so we have this setting called temperature, which is heat, which
466:31 called temperature, which is heat, which is entropy, which is randomness that can
466:33 is entropy, which is randomness that can control the amount of randomness
466:34 control the amount of randomness involved in actually picking that token.
466:36 involved in actually picking that token. This is sometimes called increasing the
466:37 This is sometimes called increasing the creativity of the model. And it's a
466:40 creativity of the model. And it's a number between 0 and one. For those
466:41 number between 0 and one. For those reasons I just mentioned, don't use zero
466:43 reasons I just mentioned, don't use zero or don't use one. Weird things will
466:45 or don't use one. Weird things will happen. But you want some decimal in
466:47 happen. But you want some decimal in between zero and one. When we have a
466:49 between zero and one. When we have a lower temperature, like we're seeing
466:50 lower temperature, like we're seeing here, 0.001,
466:52 here, 0.001, we give it the same task twice, and it
466:55 we give it the same task twice, and it gives us the exact same output character
466:57 gives us the exact same output character for character. When we set that
466:58 for character. When we set that temperature higher, this is an example
467:00 temperature higher, this is an example of 0.9. I'm asking the agent to create a
467:03 of 0.9. I'm asking the agent to create a gradient for me, uh, simple task. It's
467:06 gradient for me, uh, simple task. It's giving me two relatively valid
467:08 giving me two relatively valid solutions. I did ask it for a JavaScript
467:10 solutions. I did ask it for a JavaScript method and this is the only one that's
467:12 method and this is the only one that's giving me a JavaScript method. But the
467:14 giving me a JavaScript method. But the point is they are wildly different
467:15 point is they are wildly different approaches to the same problem when I've
467:17 approaches to the same problem when I've increased the creativity of that model.
467:19 increased the creativity of that model. So we need to think about like use case-
467:21 So we need to think about like use case- wise where should we have more
467:23 wise where should we have more creativity and where should we have more
467:25 creativity and where should we have more determinism and temperature is another
467:27 determinism and temperature is another setting that we have that can help
467:28 setting that we have that can help control this. You can experiment with
467:30 control this. You can experiment with all this using like docker model runner
467:32 all this using like docker model runner lama lm studio that sort of thing. How
467:35 lama lm studio that sort of thing. How can we tie this to better employee
467:36 can we tie this to better employee success? We had to provide both
467:38 success? We had to provide both education and adequate time to learn. So
467:42 education and adequate time to learn. So we put together a study where we sampled
467:44 we put together a study where we sampled a bunch of uh developers that were
467:46 a bunch of uh developers that were saving at least an hour a day uh uh
467:49 saving at least an hour a day uh uh excuse me an hour a week and we asked
467:50 excuse me an hour a week and we asked them to stack rank their top five most
467:53 them to stack rank their top five most valuable use cases. And we built a guide
467:55 valuable use cases. And we built a guide around that. A guide that effectively
467:57 around that. A guide that effectively goes through code examples, prompting
468:00 goes through code examples, prompting examples uh of what we determined using
468:02 examples uh of what we determined using the sort of data approach where we
468:05 the sort of data approach where we should get more reflexive about our best
468:06 should get more reflexive about our best practice and about uh the use cases that
468:09 practice and about uh the use cases that we're becoming reflexive in in our use
468:11 we're becoming reflexive in in our use of AI. And so that's what this guide was
468:13 of AI. And so that's what this guide was about. And uh we've had this become
468:15 about. And uh we've had this become required reading in certain engineering
468:16 required reading in certain engineering groups and uh proud of that. And this is
468:18 groups and uh proud of that. And this is another way that we can help educate.
468:20 another way that we can help educate. But we need to give time. Uh we don't
468:21 But we need to give time. Uh we don't have time to go through all of this. I
468:23 have time to go through all of this. I do think it's interesting that the
468:25 do think it's interesting that the number one use case for this was stack
468:26 number one use case for this was stack trace analysis, right? So, not a
468:28 trace analysis, right? So, not a generative use case, actually more of an
468:30 generative use case, actually more of an interpretive use case. And we see some
468:32 interpretive use case. And we see some other ones here that are not too
468:33 other ones here that are not too surprising. And there's examples of each
468:35 surprising. And there's examples of each of these. What about unblocking usage?
468:38 of these. What about unblocking usage? How can we make sure that we can
468:39 How can we make sure that we can creatively ensure that engineers can
468:41 creatively ensure that engineers can take the most advantage of this? Well,
468:43 take the most advantage of this? Well, leverage self-hosted and private models.
468:45 leverage self-hosted and private models. That's getting easier and easier to do.
468:47 That's getting easier and easier to do. Partner with compliance on day one,
468:49 Partner with compliance on day one, right? Make sure that what you're doing
468:51 right? Make sure that what you're doing is in line with your organization's
468:53 is in line with your organization's compliance. You may find that you're
468:55 compliance. You may find that you're making a lot of assumptions about things
468:56 making a lot of assumptions about things that you don't think you can do that you
468:58 that you don't think you can do that you can actually do, right? And then think
469:00 can actually do, right? And then think creatively around various barriers.
469:02 creatively around various barriers. Finally, how can we integrate across the
469:05 Finally, how can we integrate across the SDLC? What should we think about doing
469:07 SDLC? What should we think about doing there? You know, and I'm a big Ellie
469:09 there? You know, and I'm a big Ellie Gold theory of constraints fan. Probably
469:11 Gold theory of constraints fan. Probably have some others in the audience. An
469:12 have some others in the audience. An hour saved on something that isn't the
469:14 hour saved on something that isn't the bottleneck is worthless. And when we
469:17 bottleneck is worthless. And when we look at data across in this case almost
469:19 look at data across in this case almost 140,000 engineers, we find that there
469:22 140,000 engineers, we find that there are definitely good like annualized time
469:24 are definitely good like annualized time savings with AI that are being eclipsed
469:27 savings with AI that are being eclipsed by sources of context switching and
469:29 by sources of context switching and interruption, meeting heavy days, these
469:32 interruption, meeting heavy days, these other things that it's like, yeah, we
469:33 other things that it's like, yeah, we can save time here, but we're losing so
469:35 can save time here, but we're losing so much more time over there. So find the
469:37 much more time over there. So find the bottleneck, fix the bottleneck, right?
469:40 bottleneck, fix the bottleneck, right? Morgan Stanley's been very public about
469:41 Morgan Stanley's been very public about their uh building this thing called Dev
469:44 their uh building this thing called Dev Gen AI that looks at a bunch of legacy
469:46 Gen AI that looks at a bunch of legacy code, Cobalt, mainframe natural. I hate
469:48 code, Cobalt, mainframe natural. I hate to admit Pearl because I'm an old school
469:50 to admit Pearl because I'm an old school Pearl developer. Uh but apparently
469:52 Pearl developer. Uh but apparently that's legacy now, too. And basically
469:54 that's legacy now, too. And basically creating specs uh for developers that
469:57 creating specs uh for developers that can just be handed to developers to
469:58 can just be handed to developers to start modernizing the code without
470:00 start modernizing the code without having to do all that reverse
470:02 having to do all that reverse engineering, right? And they're saving
470:03 engineering, right? And they're saving about 300,000 hours annually right now
470:05 about 300,000 hours annually right now doing this. There's a Wall Street
470:06 doing this. There's a Wall Street Journal journal article about this,
470:08 Journal journal article about this, Business Insider article about it. Uh
470:10 Business Insider article about it. Uh they're very public about that. Zapier,
470:13 they're very public about that. Zapier, Zapier should be the example for
470:15 Zapier should be the example for everyone. They have a whole series of
470:18 everyone. They have a whole series of bots and agents that are doing things
470:19 bots and agents that are doing things like assisting with onboarding. They can
470:21 like assisting with onboarding. They can now make engineers effective in two
470:23 now make engineers effective in two weeks. Industry benchmark on the good
470:25 weeks. Industry benchmark on the good side is like a month. On the medium side
470:28 side is like a month. On the medium side is like 90 days. And uh because they're
470:32 is like 90 days. And uh because they're able to increase the effectiveness of
470:34 able to increase the effectiveness of the engineers that they're h that
470:36 the engineers that they're h that they've bringing into the organization,
470:37 they've bringing into the organization, they realized that they should be hiring
470:39 they realized that they should be hiring more, right? As opposed to trying to
470:42 more, right? As opposed to trying to maintain status quo by like cutting
470:44 maintain status quo by like cutting headcount and trying to make individual
470:46 headcount and trying to make individual engineers more productive. They said,
470:47 engineers more productive. They said, "No, we could get more value out of a
470:50 "No, we could get more value out of a single engineer. We should be hiring
470:51 single engineer. We should be hiring faster than ever." And they are, and
470:54 faster than ever." And they are, and it's really increasing their competitive
470:55 it's really increasing their competitive edge. I think that's the right attitude.
470:58 edge. I think that's the right attitude. Spotify has been helping out their S
470:59 Spotify has been helping out their S surres by pulling together context when
471:02 surres by pulling together context when incidents uh are detected and then
471:05 incidents uh are detected and then taking things like run but steps and and
471:07 taking things like run but steps and and other areas of context and documentation
471:09 other areas of context and documentation and pushing them directly into S sur
471:11 and pushing them directly into S sur channels so that those critical minutes
471:14 channels so that those critical minutes of trying to get to the bottom of what's
471:15 of trying to get to the bottom of what's actually happening and what we should do
471:17 actually happening and what we should do do to resolve the incident uh they just
471:20 do to resolve the incident uh they just eliminated that time right it's
471:21 eliminated that time right it's significantly increased their MTTR so
471:23 significantly increased their MTTR so let's get creative about areas in the
471:25 let's get creative about areas in the STLC that are our actual bottlenecks
471:28 STLC that are our actual bottlenecks All right, next steps. Uh, distribute
471:30 All right, next steps. Uh, distribute this guide as a reference for
471:32 this guide as a reference for integrating AI into the development
471:34 integrating AI into the development workflows that you have. Uh, determine a
471:36 workflows that you have. Uh, determine a method for measuring and evaluating
471:38 method for measuring and evaluating Genai impact. It's really important to
471:40 Genai impact. It's really important to make sure that we're not on the bad
471:42 make sure that we're not on the bad sides of those graphs that I showed you
471:44 sides of those graphs that I showed you earlier and then track and measure AI
471:46 earlier and then track and measure AI adoption and and see how that correlates
471:48 adoption and and see how that correlates to overall impact metrics and iterate on
471:51 to overall impact metrics and iterate on best practices and use cases. And here's
471:53 best practices and use cases. And here's a guide. Again, thank you so much Our
471:56 a guide. Again, thank you so much Our [applause]
472:08 closing presentation will teach us how to build an AI native company even if
472:11 to build an AI native company even if that company is 50 years old. Please
472:13 that company is 50 years old. Please join me in welcoming to the stage the
472:16 join me in welcoming to the stage the founder of Every, Dan [music] Shipper.
472:21 founder of Every, Dan [music] Shipper. >> Hello.
472:22 >> Hello. [applause]
472:29 How's it going everybody? I'm the last speaker of the day, so I'm just between
472:31 speaker of the day, so I'm just between you and dinner or drinks. So, I'm going
472:33 you and dinner or drinks. So, I'm going to try to make this fun and hopefully a
472:34 to try to make this fun and hopefully a little bit short.
472:42 So, first of all, I just want to say I'm very glad to see everybody and I'm
472:44 very glad to see everybody and I'm actually kind of surprised to see so
472:46 actually kind of surprised to see so many people here um because I've been I
472:49 many people here um because I've been I live here, but I've been traveling. I
472:50 live here, but I've been traveling. I was in Portugal uh last week and I was
472:54 was in Portugal uh last week and I was on Twitter and someone said that
472:57 on Twitter and someone said that everyone was moving to San Francisco.
473:00 everyone was moving to San Francisco. Uh but it's great to have everybody here
473:03 Uh but it's great to have everybody here instead because I love New York.
473:08 instead because I love New York. [laughter]
473:08 [laughter] Come on. Come on.
473:11 Come on. Come on. >> [applause]
473:12 >> [applause] >> Um, so I'm supposed to talk uh today
473:15 >> Um, so I'm supposed to talk uh today about uh how to build an a playbook for
473:19 about uh how to build an a playbook for how to build an AI native company. And
473:21 how to build an AI native company. And um I actually don't have one
473:23 um I actually don't have one unfortunately. Um
473:26 unfortunately. Um and that's because I think the playbook
473:29 and that's because I think the playbook is actually being invented right now. So
473:32 is actually being invented right now. So we're doing it at the company that I run
473:33 we're doing it at the company that I run every but all of you are doing it here
473:35 every but all of you are doing it here today as well. and and and so I don't
473:38 today as well. and and and so I don't want to do this talk from the
473:40 want to do this talk from the perspective of I have all the answers
473:41 perspective of I have all the answers and I'm going to tell you the framework
473:42 and I'm going to tell you the framework and the playbook and all that kind of
473:44 and the playbook and all that kind of stuff. Um but um I do think it is
473:48 stuff. Um but um I do think it is helpful when we're in this beginning
473:51 helpful when we're in this beginning stage of
473:54 stage of uh learning how to use AI to do
473:57 uh learning how to use AI to do engineering to build companies uh to
474:00 engineering to build companies uh to share like the the personal experiences
474:02 share like the the personal experiences that we're having inside of our
474:03 that we're having inside of our companies um and uh and sort of
474:06 companies um and uh and sort of collaboratively figure out the playbook
474:08 collaboratively figure out the playbook together. So I think the best that I can
474:10 together. So I think the best that I can offer is really just sort of dispatches
474:12 offer is really just sort of dispatches from the future. Uh notes on what I've
474:14 from the future. Uh notes on what I've figured out um and the work that we've
474:16 figured out um and the work that we've done inside of every um and I think the
474:19 done inside of every um and I think the the first big thing the first the first
474:20 the first big thing the first the first big thing I really noticed is that there
474:23 big thing I really noticed is that there is definitely a huge there's a 10x
474:26 is definitely a huge there's a 10x difference between an org where 90% of
474:28 difference between an org where 90% of the engineers are using AI versus an org
474:31 the engineers are using AI versus an org where 100% of the engineers are using
474:33 where 100% of the engineers are using AI. It's it's it's totally different.
474:35 AI. It's it's it's totally different. Um, I think the I think the big thing is
474:37 Um, I think the I think the big thing is if even 10% of your company is
474:42 if even 10% of your company is uh is using a more traditional
474:44 uh is using a more traditional engineering method, you you sort of have
474:45 engineering method, you you sort of have to lean all the way back over into that
474:48 to lean all the way back over into that world. Um, and so it it prevents you
474:50 world. Um, and so it it prevents you from doing some of the things that you
474:51 from doing some of the things that you might do if everyone was uh not typing
474:55 might do if everyone was uh not typing into a code editor all the time. Um,
474:59 into a code editor all the time. Um, and I know this because this is what we
475:01 and I know this because this is what we do at every um, which is the company
475:04 do at every um, which is the company that I run and it has totally
475:06 that I run and it has totally transformed what we are able to do as a
475:09 transformed what we are able to do as a small company. Um, and so I think of us
475:11 small company. Um, and so I think of us as like a little bit of a lab for what's
475:13 as like a little bit of a lab for what's possible that I I'm excited to share
475:15 possible that I I'm excited to share with you. So for people who don't know,
475:18 with you. So for people who don't know, I run every um, inside of every we have
475:23 I run every um, inside of every we have six business units. We have four
475:26 six business units. We have four software products. We run four software
475:28 software products. We run four software products with just 15 people, which is
475:30 products with just 15 people, which is kind of crazy. Um, and these software
475:33 kind of crazy. Um, and these software products are not toys. We've grown at
475:35 products are not toys. We've grown at every we've grown MR by double digits
475:38 every we've grown MR by double digits every month for the last 6 months. We
475:40 every month for the last 6 months. We have over 7,000 paying subscribers and
475:42 have over 7,000 paying subscribers and over 100,000 free subscribers. Um, and
475:45 over 100,000 free subscribers. Um, and we've done this in a very capital-like
475:47 we've done this in a very capital-like way. We've only raised about a million
475:48 way. We've only raised about a million dollars in total. Um and very
475:50 dollars in total. Um and very importantly for for this audience and
475:52 importantly for for this audience and for this discussion um 99% of our code
475:56 for this discussion um 99% of our code is written by AI agents. Uh no one is
475:59 is written by AI agents. Uh no one is handwriting code. No one is writing code
476:01 handwriting code. No one is writing code at all. Um it's all done with cloud
476:04 at all. Um it's all done with cloud code, codeex, Droid, what have you. Um
476:08 code, codeex, Droid, what have you. Um uh coding agent of your of your choice.
476:13 uh coding agent of your of your choice. Um, and also really importantly for the
476:15 Um, and also really importantly for the size of team we are, each one of our
476:18 size of team we are, each one of our apps is built by a single developer,
476:21 apps is built by a single developer, which is crazy. And these are not like
476:23 which is crazy. And these are not like uh little apps. Uh, here here's an
476:25 uh little apps. Uh, here here's an example. This is Kora, which is a um AI
476:29 example. This is Kora, which is a um AI email management app. Um, it's sort of
476:31 email management app. Um, it's sort of an it's an it is it's an assistant for
476:33 an it's an it is it's an assistant for your email. It on on the left over here,
476:36 your email. It on on the left over here, it summarizes all of your all of your
476:37 it summarizes all of your all of your emails that come in. So, you can kind of
476:38 emails that come in. So, you can kind of read your email that way. This is what
476:39 read your email that way. This is what my inbox looks like. on the right is a
476:42 my inbox looks like. on the right is a um email assistant that you can ask
476:44 um email assistant that you can ask questions like I asked where's when's my
476:46 questions like I asked where's when's my AI engineer talk um today and it gave me
476:49 AI engineer talk um today and it gave me just gave me the answer um and this is
476:51 just gave me the answer um and this is built primarily by one engineer um that
476:55 built primarily by one engineer um that he's got one or two contractors that
476:57 he's got one or two contractors that have helped in in certain ways but like
476:59 have helped in in certain ways but like almost all of this is built by one guy
477:01 almost all of this is built by one guy same thing for um
477:04 same thing for um uh this app which is another one that we
477:07 uh this app which is another one that we we make called monologue which is a
477:09 we make called monologue which is a speechto text app It's sort of like
477:11 speechto text app It's sort of like Super Whisper or Whisper Flow if you
477:13 Super Whisper or Whisper Flow if you know of those. Um, again, one guy,
477:16 know of those. Um, again, one guy, thousands of users. Um, I I love it.
477:19 thousands of users. Um, I I love it. It's a it's a it's just a beautifully
477:21 It's a it's a it's just a beautifully done app and it's not it's not simple.
477:23 done app and it's not it's not simple. It's complicated. There's a lot of stuff
477:25 It's complicated. There's a lot of stuff to it. Same thing for this app called
477:27 to it. Same thing for this app called Spiral. You can see there's it's it's
477:30 Spiral. You can see there's it's it's big. Um, and again, one engineer.
477:35 big. Um, and again, one engineer. So, obviously, this would not have been
477:37 So, obviously, this would not have been possible um a few years ago. it would
477:39 possible um a few years ago. it would not have been possible even a year ago.
477:41 not have been possible even a year ago. And I think the big change that happened
477:43 And I think the big change that happened that we're all starting to catch up to
477:45 that we're all starting to catch up to is um it started with cloud code, the
477:48 is um it started with cloud code, the sort of like terminal UI that gets rid
477:49 sort of like terminal UI that gets rid of the code editor
477:51 of the code editor really push pushed us into a place where
477:54 really push pushed us into a place where um we are delegating tasks to these
477:57 um we are delegating tasks to these agents. We are and and that allows us to
478:01 agents. We are and and that allows us to uh work in parallel and do much more
478:03 uh work in parallel and do much more than we would have ordinarily. Um,
478:07 than we would have ordinarily. Um, so some of the things that some of the
478:09 so some of the things that some of the things that I've noticed that we can do
478:11 things that I've noticed that we can do that I I assume people in this room are
478:12 that I I assume people in this room are starting to see but um [snorts] I think
478:15 starting to see but um [snorts] I think is is sort of important to put put our
478:17 is is sort of important to put put our finger on is uh the reason we can go
478:20 finger on is uh the reason we can go much faster is we can work on multiple
478:21 much faster is we can work on multiple multiple features and bugs in parallel.
478:23 multiple features and bugs in parallel. And I think that there's a um
478:26 And I think that there's a um there's like a little bit of a meme of
478:27 there's like a little bit of a meme of the vibe coder on Twitter that is oh
478:30 the vibe coder on Twitter that is oh [snorts] like they they have um they
478:32 [snorts] like they they have um they have four panes open but they're not
478:34 have four panes open but they're not actually doing any work. And I actually
478:37 actually doing any work. And I actually you can do it that way and I think there
478:39 you can do it that way and I think there are also definitely engineers and I know
478:41 are also definitely engineers and I know that they are because they work at every
478:43 that they are because they work at every that are productively using four panes
478:46 that are productively using four panes of agents at the same time. Um, and
478:50 of agents at the same time. Um, and that's that's crazy and that that
478:51 that's that's crazy and that that contributes a lot to the um ability for
478:54 contributes a lot to the um ability for a single developer to build and run a
478:56 a single developer to build and run a production application. Um, another like
479:00 production application. Um, another like really important thing about this, a
479:01 really important thing about this, a really big um unlock is because code is
479:04 really big um unlock is because code is cheap, you can prototype risky ideas and
479:07 cheap, you can prototype risky ideas and that allows you to do more experiments
479:09 that allows you to do more experiments than you would ordinarily. And that lets
479:10 than you would ordinarily. And that lets you make way more progress because the
479:14 you make way more progress because the starting energy to try something is so
479:16 starting energy to try something is so much lower because you just like say,
479:17 much lower because you just like say, "Oh, go do this. go do some research on
479:19 "Oh, go do this. go do some research on this like big refactor I might want to
479:20 this like big refactor I might want to do and then you go off and do something
479:22 do and then you go off and do something else. And that's a really big deal.
479:30 Um, and another really interesting thing that I love about this stuff that I'
479:33 that I love about this stuff that I' I've noticed in inside of inside of our
479:35 I've noticed in inside of inside of our organization is we move we're moving a
479:37 organization is we move we're moving a bit more toward a demo culture where um
479:41 bit more toward a demo culture where um instead of you know previously if you
479:43 instead of you know previously if you wanted to make something you'd have to
479:44 wanted to make something you'd have to be like maybe write a memo or do a do a
479:46 be like maybe write a memo or do a do a deck or um or you know convince a bunch
479:50 deck or um or you know convince a bunch of people that it was a good idea to
479:51 of people that it was a good idea to spend time on because you can vibe code
479:53 spend time on because you can vibe code something uh in a couple hours that sort
479:57 something uh in a couple hours that sort of shows the thing that you're uh that
479:59 of shows the thing that you're uh that you want to make. It it allows you to
480:01 you want to make. It it allows you to show everybody and uh I think that being
480:04 show everybody and uh I think that being a being a sort of de democultural allows
480:06 a being a sort of de democultural allows you to do weirder things that you only
480:09 you to do weirder things that you only get if you can feel it. Um which is I
480:12 get if you can feel it. Um which is I think really amazing
480:14 think really amazing and beyond just like sort of the basic
480:17 and beyond just like sort of the basic productivity unlocks.
480:19 productivity unlocks. um
480:21 um AI has and the way that we use it has
480:23 AI has and the way that we use it has caused us to sort of invent an entirely
480:25 caused us to sort of invent an entirely new set of engineering primitives and
480:26 new set of engineering primitives and processes which I'm sure that everybody
480:29 processes which I'm sure that everybody in this room is starting to do already.
480:31 in this room is starting to do already. I think everyone is sort of approaching
480:33 I think everyone is sort of approaching the same things from different angles
480:35 the same things from different angles and a lot of them definitely do echo
480:38 and a lot of them definitely do echo engineering processes from the past but
480:39 engineering processes from the past but I think it's really helpful to try to
480:41 I think it's really helpful to try to put our finger on okay what is the new
480:44 put our finger on okay what is the new way of programming if we're moving up a
480:47 way of programming if we're moving up a level of the stack and and we're moving
480:48 level of the stack and and we're moving from you know Python and JavaScript and
480:50 from you know Python and JavaScript and scripting languages up into um up into
480:53 scripting languages up into um up into English and the uh the the name that
480:57 English and the uh the the name that we've given to this process is
480:59 we've given to this process is compounding engineering
481:02 compounding engineering Um, and the way that I talk about
481:04 Um, and the way that I talk about compounding engineering is in
481:05 compounding engineering is in traditional engineering, each feature
481:08 traditional engineering, each feature makes the next feature harder to build.
481:11 makes the next feature harder to build. In compounding engineering, your goal is
481:13 In compounding engineering, your goal is to make sure that each feature makes the
481:16 to make sure that each feature makes the next feature easier to build. Um, and we
481:20 next feature easier to build. Um, and we do that in this loop.
481:22 do that in this loop. Um, the loop has four steps. The first
481:24 Um, the loop has four steps. The first one is plan. And if you're you've been
481:26 one is plan. And if you're you've been here today, you've been paying
481:27 here today, you've been paying attention, you know how important it is
481:29 attention, you know how important it is when you're working with agents to make
481:30 when you're working with agents to make a really really detailed plan. So I
481:32 a really really detailed plan. So I think everyone is doing that. Second
481:34 think everyone is doing that. Second step is delegate. Just like go tell the
481:35 step is delegate. Just like go tell the agent to do it. Everyone's doing that
481:37 agent to do it. Everyone's doing that too. Third step is assess. And we have
481:40 too. Third step is assess. And we have tons and tons of ways to um assess
481:42 tons and tons of ways to um assess whether the work that the agent did is
481:44 whether the work that the agent did is any good. There's tests, there's trying
481:46 any good. There's tests, there's trying it, there's having the agent uh figure
481:48 it, there's having the agent uh figure it out. There's there's code review,
481:50 it out. There's there's code review, there's agent code review, there's all
481:51 there's agent code review, there's all this types of stuff. And then the last
481:53 this types of stuff. And then the last step which is I think the most
481:55 step which is I think the most interesting one is codify. And this is
481:56 interesting one is codify. And this is kind of like the the money step which is
481:58 kind of like the the money step which is where you compound
482:00 where you compound everything that you've learned from the
482:03 everything that you've learned from the planning stage, the delegation stage,
482:05 planning stage, the delegation stage, the assessment stage back into prompts
482:07 the assessment stage back into prompts that go into your, you know, your cloud
482:10 that go into your, you know, your cloud MD file or your um your sub aents or
482:13 MD file or your um your sub aents or your slash commands and you start to um
482:17 your slash commands and you start to um basically create this library. You take
482:20 basically create this library. You take all the tacet knowledge that you pick up
482:23 all the tacet knowledge that you pick up um that all your engineers are picking
482:24 um that all your engineers are picking up um as they find bugs, fix plans, um
482:30 up um as they find bugs, fix plans, um delegate work, and you um you make it
482:33 delegate work, and you um you make it into an explicit collection of prompts
482:34 into an explicit collection of prompts that you can spread for your entire
482:36 that you can spread for your entire organization.
482:37 organization. And um when you do that really well,
482:40 And um when you do that really well, there's a lot of like really interesting
482:43 there's a lot of like really interesting um second order effects that are are not
482:46 um second order effects that are are not I think that well understood or or that
482:48 I think that well understood or or that commonly talked about that I think would
482:50 commonly talked about that I think would be interesting to to bring here because
482:52 be interesting to to bring here because my guess is that um some people are
482:55 my guess is that um some people are already seeing this, but like maybe it
482:57 already seeing this, but like maybe it needs to be pushed on a little bit more
482:59 needs to be pushed on a little bit more to like really be brought out and some
483:01 to like really be brought out and some people uh it might be an interesting way
483:04 people uh it might be an interesting way to get more of your organization to buy
483:05 to get more of your organization to buy into using these tools. tools 100% of
483:08 into using these tools. tools 100% of the time.
483:10 the time. Um so the first thing that you notice if
483:13 Um so the first thing that you notice if you sort of if you set up this process
483:14 you sort of if you set up this process and you and you're like 100% bought in
483:16 and you and you're like 100% bought in on something like compounding
483:17 on something like compounding engineering um is that tacet code
483:20 engineering um is that tacet code sharing
483:22 sharing becomes much easier. So uh we have we
483:25 becomes much easier. So uh we have we have multiple products at every a lot of
483:28 have multiple products at every a lot of a lot of products a lot of times need to
483:29 a lot of products a lot of times need to implement similar things even if they
483:31 implement similar things even if they use different technologies or imple
483:32 use different technologies or imple implementing similar things like a
483:34 implementing similar things like a team's feature or a certain type of ooth
483:36 team's feature or a certain type of ooth or whatever. Um
483:39 or whatever. Um previously in order to share code you'd
483:41 previously in order to share code you'd have to like abstract out whatever you
483:42 have to like abstract out whatever you did into a library and then like allow
483:44 did into a library and then like allow someone else to download and it it'd be
483:46 someone else to download and it it'd be hard to do or you'd have to talk about
483:47 hard to do or you'd have to talk about it.
483:49 it. With agents, um you can just point your
483:51 With agents, um you can just point your Cloud Code instance at um the repo from
483:54 Cloud Code instance at um the repo from the developer sitting next to you and
483:56 the developer sitting next to you and learn the process that they went through
483:57 learn the process that they went through to build the feature that they that you
483:59 to build the feature that they that you need to reimplement and reimplement it
484:02 need to reimplement and reimplement it yourself in your own tech stack in your
484:04 yourself in your own tech stack in your own framework and in your own way. Um,
484:06 own framework and in your own way. Um, and that's really really cool to kind of
484:07 and that's really really cool to kind of have this the more developers you have
484:10 have this the more developers you have working on different things inside of
484:11 working on different things inside of the org, the more you can um share
484:14 the org, the more you can um share without any extra cost because AI can
484:17 without any extra cost because AI can just go read all the code and and um and
484:19 just go read all the code and and um and use it. Um, another really cool thing
484:22 use it. Um, another really cool thing that I've noticed is that new hires are
484:25 that I've noticed is that new hires are productive on their first day because
484:27 productive on their first day because you've taken all of the things that
484:28 you've taken all of the things that you've learned about like, okay, how do
484:30 you've learned about like, okay, how do I set up an environment and what does a
484:32 I set up an environment and what does a good commit look like and all this kind
484:34 good commit look like and all this kind of stuff and on the first day they have
484:36 of stuff and on the first day they have all that set up in their in in their,
484:38 all that set up in their in in their, you know, cloud MD files or their cursor
484:40 you know, cloud MD files or their cursor files or uh codeex files or whatever and
484:44 files or uh codeex files or whatever and um the agent just sets up their local
484:47 um the agent just sets up their local environment and knows how write a good
484:50 environment and knows how write a good PR. That's really cool. It also helps if
484:54 PR. That's really cool. It also helps if you um want to hire like expert
484:57 you um want to hire like expert freelancers. Like there's some there's
484:59 freelancers. Like there's some there's one guy there's one person who just is
485:01 one guy there's one person who just is really good at this one specific thing.
485:03 really good at this one specific thing. You can have them come in for a day and
485:04 You can have them come in for a day and like do that thing. It's I think of it a
485:06 like do that thing. It's I think of it a little bit like um like a DJ or whatever
485:10 little bit like um like a DJ or whatever can like go in on like a couple bars of
485:11 can like go in on like a couple bars of a song. Like you can just sort of drop
485:13 a song. Like you can just sort of drop in and that's really helpful. it's it
485:15 in and that's really helpful. it's it would ordinarily be like too hard to
485:17 would ordinarily be like too hard to collaborate because the the startup cost
485:19 collaborate because the the startup cost is too high, but you can do that a lot
485:21 is too high, but you can do that a lot better now.
485:24 better now. Um, another thing that I've noticed
485:26 Um, another thing that I've noticed which is really cool too is um
485:28 which is really cool too is um developers inside of every commit to um
485:31 developers inside of every commit to um other products. So, uh you know, we have
485:34 other products. So, uh you know, we have four products that run internally.
485:37 four products that run internally. Everybody uses all the products. If
485:39 Everybody uses all the products. If someone uh runs into a bug or a paper
485:41 someone uh runs into a bug or a paper cutter, like a little minor quality of
485:43 cutter, like a little minor quality of life thing that they want, they will um
485:45 life thing that they want, they will um often just um they will often just uh
485:50 often just um they will often just uh just submit a poll request for it to
485:53 just submit a poll request for it to other GM of the app um because it's very
485:55 other GM of the app um because it's very easy for them to go download the repo
485:57 easy for them to go download the repo and figure out uh or have really have
486:00 and figure out uh or have really have Claude or Codex figure out, okay, this
486:01 Claude or Codex figure out, okay, this is how we fix the bug or this is how we
486:03 is how we fix the bug or this is how we fix the paper cut. Um and that's really
486:06 fix the paper cut. Um and that's really really cool because you have this
486:08 really cool because you have this much um much easier way of collaborating
486:11 much um much easier way of collaborating across apps that I I think over the next
486:15 across apps that I I think over the next couple years. I imagine that you will
486:17 couple years. I imagine that you will also be able to let customers do this to
486:19 also be able to let customers do this to some extent. Like if you run into a bug,
486:21 some extent. Like if you run into a bug, um this is, you know, speculative, but
486:24 um this is, you know, speculative, but if you run into a bug, you can have your
486:26 if you run into a bug, you can have your little agent fix it um and submit it as
486:27 little agent fix it um and submit it as pull request. It's a weird open source
486:30 pull request. It's a weird open source thing, but um yeah, this is really
486:32 thing, but um yeah, this is really really cool and and definitely is
486:34 really cool and and definitely is happening a lot inside of our company.
486:37 happening a lot inside of our company. Um,
486:39 Um, another really cool thing is um we we
486:41 another really cool thing is um we we have not this may get different as we as
486:44 have not this may get different as we as we scale but um we have not yet had to
486:47 we scale but um we have not yet had to standardize onto a particular stack or
486:49 standardize onto a particular stack or language. We instead let everyone who's
486:51 language. We instead let everyone who's building different products like pick
486:52 building different products like pick the thing that they like best and the
486:54 the thing that they like best and the reason is because it makes it AI makes
486:57 reason is because it makes it AI makes it much easier to translate between
486:58 it much easier to translate between them. Um and it makes it much easier to
487:01 them. Um and it makes it much easier to to jump into any language and framework
487:03 to jump into any language and framework and environment and be productive. And
487:06 and environment and be productive. And so it we don't uh it's easier for us to
487:08 so it we don't uh it's easier for us to let people just do the thing that that
487:10 let people just do the thing that that they like and let AI kind of like handle
487:11 they like and let AI kind of like handle the translation in between.
487:14 the translation in between. Um and the last thing which is my
487:16 Um and the last thing which is my favorite but like is also the horror I
487:18 favorite but like is also the horror I think of of some developers and to some
487:20 think of of some developers and to some degree maybe the horror of my team um is
487:23 degree maybe the horror of my team um is that managers can commit code. um if
487:25 that managers can commit code. um if you're technical uh even the CEO and
487:29 you're technical uh even the CEO and um for for me like I have no business
487:32 um for for me like I have no business committing code because we've got four
487:34 committing code because we've got four products we've got 15 people we're
487:35 products we've got 15 people we're growing really fast um I'm doing tons
487:38 growing really fast um I'm doing tons and tons of other things but I can and I
487:40 and tons of other things but I can and I I have like committed production code
487:42 I have like committed production code over the last couple months and the
487:43 over the last couple months and the reason for that is AI allows um
487:47 reason for that is AI allows um engineers to work with fractured
487:49 engineers to work with fractured attention so previously you might have
487:52 attention so previously you might have needed like a 3 or 4 hour block of focus
487:54 needed like a 3 or 4 hour block of focus time in order to like get anything done.
487:56 time in order to like get anything done. Um, but with cloud code, you can kind of
487:58 Um, but with cloud code, you can kind of like get out of meeting and say, "Hey,
488:00 like get out of meeting and say, "Hey, like I want you to investigate this
488:01 like I want you to investigate this bug." And then go do something else and
488:03 bug." And then go do something else and then come back and you have like a a
488:05 then come back and you have like a a plan or like a um root cause fix and
488:08 plan or like a um root cause fix and then you can submit a PR. And it's not
488:11 then you can submit a PR. And it's not easy. It's not magic, but it is actually
488:13 easy. It's not magic, but it is actually possible. And I think that's a that's
488:16 possible. And I think that's a that's just a totally new way of thinking how
488:18 just a totally new way of thinking how thinking of thinking about how managers
488:21 thinking of thinking about how managers interact with the products that they
488:23 interact with the products that they make.
488:30 So, um, just to just to summarize, um, there's a I really think there's a 10x
488:32 there's a I really think there's a 10x difference in how things work when you
488:35 difference in how things work when you hit 100% AI adoption. I think, um, from
488:39 hit 100% AI adoption. I think, um, from what we've seen, a single engineer
488:40 what we've seen, a single engineer should be able to build and maintain a
488:42 should be able to build and maintain a complex production product. what we call
488:45 complex production product. what we call compounding engineering, but I think
488:46 compounding engineering, but I think what all of us are are sort of pointing
488:48 what all of us are are sort of pointing to um is I I think really works to make
488:52 to um is I I think really works to make each feature easier to build and then
488:55 each feature easier to build and then creates all of these sort of nonobvious
488:58 creates all of these sort of nonobvious second order effects that makes it
488:59 second order effects that makes it easier for the entire organization to
489:01 easier for the entire organization to collaborate together.
489:03 collaborate together. And very importantly, many people in San
489:05 And very importantly, many people in San Francisco don't know this yet. Um so
489:07 Francisco don't know this yet. Um so you're you're the first to hear it. Um
489:10 you're you're the first to hear it. Um so that is my talk. So, if you're
489:14 so that is my talk. So, if you're interested in um in what we do, uh I run
489:17 interested in um in what we do, uh I run every uh Every is the only subscription
489:19 every uh Every is the only subscription you need to stay at the edge of AI. You
489:21 you need to stay at the edge of AI. You can find us at every.to. Um we uh we
489:25 can find us at every.to. Um we uh we have a daily newsletter about AI. So, we
489:27 have a daily newsletter about AI. So, we do ideas, apps, and training. We have a
489:28 do ideas, apps, and training. We have a on the ideas side, we have a daily
489:30 on the ideas side, we have a daily newsletter. We review all the new models
489:32 newsletter. We review all the new models when they come out and all the new
489:33 when they come out and all the new products when they come out. The apps,
489:35 products when they come out. The apps, you already saw, we've a bundle of all
489:36 you already saw, we've a bundle of all these apps and then we do training and
489:38 these apps and then we do training and consulting with big companies to help
489:39 consulting with big companies to help them use AI and it's all bundled into
489:42 them use AI and it's all bundled into one subscription. So you get everything
489:43 one subscription. So you get everything for one price and that's it. Thank you
489:47 for one price and that's it. Thank you very much. [applause]
489:50 very much. [applause] [music]
489:56 [music] >> Ladies and gentlemen, please welcome
489:58 >> Ladies and gentlemen, please welcome back to the stage Alex Lieberman.
490:10 Okay, 8 hours in. We did it. Um I have some housekeeping. We have to finish the
490:11 some housekeeping. We have to finish the day with housekeeping. First of all, I
490:13 day with housekeeping. First of all, I want to thank you all. It has been
490:14 want to thank you all. It has been phenomenal to be on this journey with
490:17 phenomenal to be on this journey with you all. But uh let's give a a shout out
490:19 you all. But uh let's give a a shout out just to you all for being here, going
490:21 just to you all for being here, going through a full day listening to the
490:22 through a full day listening to the programming. So, round of applause for
490:23 programming. So, round of applause for everyone in the crowd, everyone
490:25 everyone in the crowd, everyone [applause] online who's been watching.
490:27 [applause] online who's been watching. Let's also keep it going for all the
490:29 Let's also keep it going for all the team in production behind the scenes
490:31 team in production behind the scenes making this possible. I watched them
490:33 making this possible. I watched them work tirelessly throughout the day to
490:35 work tirelessly throughout the day to make this happen. And then finally,
490:37 make this happen. And then finally, let's give a huge shout out to Swix and
490:38 let's give a huge shout out to Swix and Ben who made this whole thing happen.
490:41 Ben who made this whole thing happen. >> [applause]
490:46 >> So get comfortable for a second. I have some housekeeping. Make sure everyone
490:48 some housekeeping. Make sure everyone knows where to go. And then we have one
490:50 knows where to go. And then we have one final speaker who's going to chat uh
490:52 final speaker who's going to chat uh right after I hop off stage. So let's
490:55 right after I hop off stage. So let's just dive in for a sec. Uh tomorrow is
490:58 just dive in for a sec. Uh tomorrow is the engineering session day. I will not
491:01 the engineering session day. I will not be your MC. So you will be taken care of
491:03 be your MC. So you will be taken care of by Jed who works at Google. I spent the
491:05 by Jed who works at Google. I spent the day with Jed. He is incredible. He's
491:07 day with Jed. He is incredible. He's just like a taller, better looking
491:09 just like a taller, better looking version of me. and he's actually an
491:11 version of me. and he's actually an engineer. So, you get a true engineer
491:13 engineer. So, you get a true engineer tomorrow. Uh, if you have a bundle pass,
491:16 tomorrow. Uh, if you have a bundle pass, your ticket includes tomorrow's track.
491:19 your ticket includes tomorrow's track. So, we'll see you tomorrow at 8:00 a.m.
491:21 So, we'll see you tomorrow at 8:00 a.m. here. If you have the leadership pass
491:24 here. If you have the leadership pass only, your ticket does not include
491:26 only, your ticket does not include access to the sessions or the venue
491:28 access to the sessions or the venue tomorrow. However, we have organized an
491:31 tomorrow. However, we have organized an off-site brunch for you on us at a
491:34 off-site brunch for you on us at a restaurant not far from here. So, check
491:36 restaurant not far from here. So, check your calendar for the invite and the
491:38 your calendar for the invite and the location. But right now we are headed
491:40 location. But right now we are headed into the afterparty. And not only is
491:42 into the afterparty. And not only is there an afterparty, but there are after
491:44 there an afterparty, but there are after afterparties. There's a lot of side
491:45 afterparties. There's a lot of side events. So your entire night is planned
491:47 events. So your entire night is planned for you. And we have Graphite to thank
491:49 for you. And we have Graphite to thank for sponsoring the afterparty. So here
491:52 for sponsoring the afterparty. So here to give us the last word for a brief
491:54 to give us the last word for a brief message is the co-founder and CEO of
491:56 message is the co-founder and CEO of Graphite, Mel Lutzky.
492:03 [applause] >> [music]
492:09 >> Good evening everyone. My name is Mel Lutzky and I'm the co-founder and CEO of
492:11 Lutzky and I'm the co-founder and CEO of Graphite. Uh we're the AI powered code
492:14 Graphite. Uh we're the AI powered code review platform for this new age of
492:15 review platform for this new age of agentic software development. Now I know
492:18 agentic software development. Now I know you guys heard a lot today about agents
492:20 you guys heard a lot today about agents and how to make them as effective as
492:21 and how to make them as effective as possible in generating code and building
492:24 possible in generating code and building features faster than ever. And they're
492:26 features faster than ever. And they're incredible at this. But I think
492:28 incredible at this. But I think everybody who's who's built software in
492:30 everybody who's who's built software in a professional environment knows that
492:32 a professional environment knows that writing the code is only the first part
492:33 writing the code is only the first part of the story. Every code change then
492:36 of the story. Every code change then needs to be tested. It needs to be
492:38 needs to be tested. It needs to be reviewed, merged, deployed. And
492:40 reviewed, merged, deployed. And oftentimes that second half of the
492:42 oftentimes that second half of the process takes just as long if not longer
492:44 process takes just as long if not longer than actually generating the code. And
492:46 than actually generating the code. And that's what we do with graphite. We're
492:48 that's what we do with graphite. We're applying AI to the entire development
492:50 applying AI to the entire development process and making code review as
492:52 process and making code review as quickly as as quick as possible. Uh we
492:55 quickly as as quick as possible. Uh we have an agent that's integrated fully
492:57 have an agent that's integrated fully into our pull request page. Um it's like
492:59 into our pull request page. Um it's like reviewing code in 2025 and not you it
493:02 reviewing code in 2025 and not you it doesn't feel like 2015 anymore. U that's
493:04 doesn't feel like 2015 anymore. U that's what we build. Um we're super excited
493:06 what we build. Um we're super excited about it. Uh if you want to come check
493:08 about it. Uh if you want to come check it out, we have our booth uh in the expo
493:10 it out, we have our booth uh in the expo hall and also we're going to be around
493:12 hall and also we're going to be around all day tomorrow. We're the official
493:13 all day tomorrow. We're the official sponsors of tonight's afterparty and
493:16 sponsors of tonight's afterparty and also tomorrow's event at public records.
493:18 also tomorrow's event at public records. So we wanted you all you guys who came
493:19 So we wanted you all you guys who came from out of town out of town, we wanted
493:21 from out of town out of town, we wanted to show you good time in New York. Uh,
493:22 to show you good time in New York. Uh, so we have two events for you, uh, to
493:24 so we have two events for you, uh, to make sure that you have you have a good
493:26 make sure that you have you have a good time and, uh, see what New York is all
493:27 time and, uh, see what New York is all about. Uh, want to give a big shout out
493:29 about. Uh, want to give a big shout out to Swix and Ben and the whole AIG team
493:32 to Swix and Ben and the whole AIG team for organizing and we're excited to see
493:34 for organizing and we're excited to see you guys all at the party tonight. Thank
493:36 you guys all at the party tonight. Thank you very much.
493:38 you very much. [applause]
493:50 taking place in the halls on both doors. Expo
495:41 >> [music] >> Heat. Heat.