YouTube Transcript: AIE CODE 2025: AI Leadership ft Anthropic, OpenAI, McKinsey, Bloomberg, Google Deepmind, and Tenex
Summary
Core Theme
The AI Engineer Code Summit 2025 explored the transformative impact of AI on software development, focusing on how AI agents are reshaping coding practices, developer workflows, and organizational structures to drive efficiency, innovation, and new business models.
[music]
Typing thoughts into [music] the darkest
part becomes design. Words evolve
[music] to whispers meant for something
more divine. Syntax bends and breeze. I
see the language change. I'm not
instructing anymore. I'm rearranging
faith. Every loop I write [singing]
rewrites me. Every function hums with
meaning. I feel the interface dissolve
new code. Not on the screen but in the
soul where [music] thought becomes the
motion and creation takes control. No
lines no rules just balance [music] in
between the zero and the one. The
>> [music]
>> systems shape our fragile skin. They
mold [singing] the way we move. We live
inside the logic gates [music] of what
we think is true. But deep beneath the
data post, [music] there's something undefined.
A [singing] universe compiling the image
of our [music] minds. Every line reveals
reflection. Every loop replace [music]
connection. We're not building, we're
becoming. And the code becomes confession.
This is the [music] new code. Not on the
screen, but in the soul with thought
becomes the motion. [music] Creation
takes control. No lines, no rules, just
balance in between the [music] zero and
the one. The silence and the dream. [music]
[music]
Don't worry. [music] Uh, we're just
giving you something to do while Codex
[music] Each prompt, each breath, each
fragile spin, a universe [music] renewing.
This is the new code.
Alive and [music] undefined.
Where logic meets emotion and structure
bends to mind. [music] The system hums
eternal but the soul writes the line. We
are the new code.
I'm fired inside. [music]
[applause]
Ladies and gentlemen, please join me in
welcoming to the stage the co-founder [music]
[music]
of Morning Brew and the managing partner
of 10X, your host for the leadership
[music] track session day, Alex Lieberman.
Keep it going. Let's get a quick read of
the room. If you are coming from right
here in the Big Apple from New York,
make some noise.
Okay, now I have to say it. I assume
this is the biggest group. San Francisco.
>> Wow, that is surprising. Uh, Austin.
>> Okay, we got Austin. Who thinks they
came from the furthest place and is in
the room today?
>> Where? Where?
>> Ecuador. Can anyone beat Ecuador? [applause]
>> New Zealand.
>> I don't think anyone's going to beat New
Zealand. There we go. Well, first of
all, uh, I am so excited to welcome you
all to the AI Engineer Code Summit 2025.
Uh, I'm Alex Lieberman, co-founder of
Morning Brew and your MC for the day.
Um, now you may be wondering, why is a
newsletter guy hosting an AI engineer
conference? It's a great question. Well,
after I left my role at Morning Brew, I
asked myself one simple question, and it
was, what space do I want to spend my
time in for the next 20 years where I
can build something consequential and
spend my time with some of the smartest
people I've ever met? And the answer
became obvious. I wanted to be as close
to the frontier of AI as humanly
possible. Which is why I co-founded
10x.co, which is an AI
transformation firm helping mid-market
and enterprise companies learn how to
use AI within their business. And I
spend basically all of my time now with
AI engineers like yourselves. I'm the
only non-technical person in the
business and I wouldn't have it any
other way. So as you know this year has
been a banner year for the industry. And
I would think of today as both a look
back on where we've been as well as a
tactical view of where we are headed in
companies small and large, old and new.
We're going to hear from the labs. We'll
hear from Unicorn AI startups. We'll
hear from academics, big-time management
consultants, and Fortune 500 brands. But
before we do that, we have to give the
brands that made this day possible their
flowers. So, let's go into it. Let's
give it up for Google DeepMind, today's
presenting sponsor. [applause]
Love it. Keep it going for Anthropic,
the platinum sponsor for the day. [applause]
And then one more round of applause for
all of the gold and silver sponsors who
you can meet in the expo downstairs
throughout the day. One more. Let's keep it going.
Are you guys ready to do the damn thing?
>> Let's do it. To kick things off, let's
give a huge welcome to head of
engineering of the Claude developer
platform, Caitlyn Les. Let's welcome her to the stage.
Good morning. Um, so first let's give a
huge thank you to swyx and the whole AI
engineer organizing team for bringing us all together.
I'm Caitlyn and I lead the Claude
developer platform team at Anthropic.
Um, so let's start with a show of hands.
Who here has integrated against an LLM
API to build agents?
Okay, I'm talking to the right people.
Love it. Um, so today I want to share
how we're evolving our platform to help
you build really powerful agentic
systems using Claude.
So we love working with developers who
do what we call raising the ceiling of
intelligence. They're always trying to
be on the frontier. They're always
trying to get the best out of our models
and build the most high performing
systems. Um, and so I want to walk you
through how we're building a platform
that helps you get the best out of
Claude. Um, and I'm going to do that
using a product that you hopefully have
all heard of before. Um it's an agentic
coding product. We love it a lot.
So when we think about maximizing
performance um from our models, we think
about building a platform that helps you
do three things. Um so first the
platform helps you harness Claude's
capabilities. We're training Claude to
get good at a lot of stuff and we need
to give you the tools in our API to use
the things that Claude is actually
getting good at. Next, we help you
manage Claude's context window. Keeping
the right context in the window at any
given time is really really critical to
getting the best outcomes from Claude.
And third, we're really excited about
this lately. We think you should just
give Claude a computer and let it do its
thing. So I'll talk about how we're
evolving the platform to give you
the infrastructure and otherwise to do that.
So starting with harnessing Claude's
capabilities. Um, so we're getting
Claude really good at a bunch of stuff
and here are the ways that we expose
that to you um in our API as ideally
customizable features. So here's a first
example um relatively basic. Claude got
good at thinking um and Claude's
performance on various tasks um scales
with the amount of time you give it to
reason through those problems. Um, and
so, uh, we expose this to you as an API
feature that you can decide, do you want
Claude to think longer for something
more complex or do you want Claude to
just give you a quick answer. Um, we
also expose this with a budget. Um, so
you can tell Claude how many tokens to
essentially spend on thinking. Um, and
so for Claude Code, um, pretty good
example. Obviously, you're often
debugging pretty complex systems with
Claude Code, or sometimes you just want a
quick, um, answer to the thing you're
trying to do. And so, um, Claude Code
takes advantage of this feature in our
API to decide whether or not to have
Claude think longer.
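As a rough sketch of what toggling extended thinking with a token budget looks like against the Anthropic Python SDK (the model id is a placeholder and the budget values are just examples):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Grant Claude a thinking budget for a complex debugging question; for a quick
# answer you would simply omit the `thinking` parameter.
response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=4096,            # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Why does this integration test fail intermittently?"}],
)
print(response.content)
```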
Another basic example is tool use.
Claude has gotten really good at
reliably calling tools. Um, so we expose
this in our API with both our own
built-in tools like our web search tool,
um, as well as the ability to create
your own custom tools. You just define a
name, a description, and an input
schema. Um, and Claude is pretty good at
reliably knowing when to actually go um,
and call those tools and pass the right
arguments. So, this is relevant for
Claude Code. Claude Code has many, many,
many tools and it's calling them all the
time to do things like read files,
search for files, write to files, um,
and do stuff like rerun tests and otherwise.
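A minimal sketch of such a custom tool definition; the `run_tests` tool itself is hypothetical, while the name / description / input-schema shape follows the Messages API:

```python
import anthropic

client = anthropic.Anthropic()

# A custom tool is just a name, a description, and a JSON Schema for its input.
tools = [{
    "name": "run_tests",  # hypothetical tool, for illustration only
    "description": "Run the project's test suite and return any failures.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "File or directory to test."}},
        "required": ["path"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "The auth tests look flaky, can you check them?"}],
)

# If Claude decided to call the tool, the response contains a tool_use block
# with the arguments it chose; your code runs the tool and returns the result.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```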
So, the next way we're evolving the
platform to help you maximize
intelligence from Claude um, is helping
you manage Claude's context window.
Getting the right context at the right
time in the window is one of the most
important things that you can do to
maximize performance.
But context management is really complex
to get right. Um especially for a coding
agent like claude code. You've got your
technical designs, you've got your
entire code base. Um you've got
instructions, you've got tool calls. All
these things might be in the window at
any given time. And so how do you make
sure the right set of those things are
in the window? Um, so getting that
context right and keeping it optimized
over time is something that we've
thought a lot about.
So let's start with MCP, the Model Context
Protocol. We introduced this a year ago
and it's been really cool to see the
community swarm around adopting um MCP
as a standardized way for agents to
interact with external systems. Um, and
so for Claude Code, you might imagine
GitHub or Sentry. there are plenty of
places kind of outside of the agent's
context where there might be additional
information or tools or otherwise that
you want your agent to be able to
interact with or the Claude Code agent to
be able to interact with. Um, and so
this will obviously get you much better
performance than an agent that only sees
the things that are already in its window.
Uh, so the next thing is memory. So, if
you can use tools like MCP to get
context into your window, we introduced
a memory tool to help you actually keep
context outside of the window that
Claude knows how to pull back into the
window only when it actually needs it.
Um, and so we introduced the first
iteration of our memory tool as
essentially a client-side file system.
So, you control your data, but Claude is
good at knowing, oh, this is like a good
thing that I should store away for
later. And then, uh, it knows when to
pull that context back in. So for Claude
Code, you could imagine um your patterns
for your codebase or maybe your
preferences for your git workflows.
These are all things that Claude can
store away in memory and pull back in later.
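A minimal sketch of the client side of that idea, a memory store that is just files the client controls; the command names here are illustrative, not the actual memory-tool schema from the Anthropic docs:

```python
from pathlib import Path

MEMORY_DIR = Path("./claude_memory")
MEMORY_DIR.mkdir(exist_ok=True)

def handle_memory_command(command: str, path: str, content: str | None = None) -> str:
    """Persist or fetch snippets Claude chose to remember (illustrative only)."""
    target = MEMORY_DIR / path
    if command == "write":   # e.g. "git workflow: rebase, no merge commits"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content or "")
        return f"stored {path}"
    if command == "read":    # pulled back into the context window only when needed
        return target.read_text() if target.exists() else ""
    raise ValueError(f"unknown memory command: {command}")
```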
And so the third thing is context
editing. If memory helps you keep stuff
outside the window and pull it back in
when it makes sense, context editing
helps you clear stuff out that's not
relevant right now and shouldn't be in
the window. Um, so our first iteration
of our context editing is just clearing
out old tool results. Um, and we did
this because tool results can actually
just be really large and take up a lot
of space in the window. And we found
that tool results from past calls are
not necessarily super relevant to help
Claude get good responses later on in a
session. And so you can think about for
Claude Code: Claude Code is calling hundreds
of tools. Um, those files that it read
and otherwise, all these things are taking
up space within the window. Um so they
take advantage of um context management.
And so um we found that if we combined
our memory tool with context editing, we
saw a 39% bump in performance over
the benchmark on our own internal evals.
Um which was really really huge. And so
it just kind of shows you the importance
of keeping things in the window that are
only relevant at any given time. And
we're expanding on this by giving you
larger context windows. So for some of
our models, you can have a million token
context window. Combining that larger
window with the tools to actually edit
what's in your window maximizes your
performance. Um, and over time, we're
teaching Claude to get better and better
at actually understanding what's in its
context window. So maybe it has a lot of
room to run, maybe it's almost out of
space. Um, and Claude will respond
accordingly depending on how much time
uh or how much room it has left in the window.
So, here's the third thing. Um, we think
you should give Claude a computer and
just let it do its thing. We're really
excited about this one. Um, because
there's a lot of discourse right now
around agent harnesses. Um, you know,
how much scaffolding should you have?
How opinionated should it be? Should it
be heavy? Should it be light? Um, and I
think at the end of the day, Claude has
access to writing code. And if Claude
has access to running that same code, it
can accomplish anything. you can get
really great professional outputs for
the things that you're doing just by
giving Claude runway to go and do that.
But the challenge for letting you do
that is actually the infrastructure as
well as stuff like expertise like how do
you give Claude access to things that um
when it's using a computer it will get
you better results.
So a fun story is we recently launched
Claude Code on web and mobile. Um and
this was a fun project for our team
because we had a lot of problems to
solve. When you're running Claude Code
locally, Claude Code is essentially using
your machine as its computer. But if
you're starting a session on the web or
on mobile and then you're walking away,
what's happening? Like, where
is um Claude Code running? Where is
it doing its work? Um and so we had some
hard problems to solve. We needed a
secure environment for claude to be able
to write and run code that's not
necessarily like approved code by you.
Um we needed to solve for container
orchestration at scale. Um and we needed
session persistence um because uh we
launched this and many of you were
excited about it and started many many
sessions and walked away and we had to
make sure that um all of these things
were ready to go when you came back and
um wanted to see the results of what
Claude did.
So one key primitive in this is our code
execution tool. Um so we released our
code execution tool in the API um which
allows Claude to write code and run
that code in a secure sandboxed
environment. Um, so our platform handles
containers, it handles security, and you
don't have to think about these things
because they're running on our servers.
Um, so you can imagine deciding that um,
you want Claude to write some code
and you want Claude to go and be able to
run that code. And for Claude Code,
there's plenty of examples here. Um,
like 'make an animation more sparkly', where
uh, you want Claude to actually be able
to run that code. Um, so we really think
the future of agents is letting the
model work pretty autonomously within a
sandbox environment and we're giving you
the infrastructure to be able to do that.
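A hedged sketch of calling the code execution tool through the API; the beta flag and tool version string below are the ones documented when the tool launched and may have changed, and the model id is a placeholder:

```python
import anthropic

client = anthropic.Anthropic()

# Server-side code execution: Claude writes code and the platform runs it in a
# sandboxed container, so nothing executes on your machine.
response = client.beta.messages.create(
    model="claude-sonnet-4-5",                  # placeholder model id
    max_tokens=4096,
    betas=["code-execution-2025-05-22"],        # beta name as of the tool's launch
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
    messages=[{"role": "user", "content": "Make the landing page animation more sparkly and verify it renders."}],
)
print(response.content)
```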
And this gets really powerful once you
think about giving the model actual
domain expertise in the things that
you're trying to do. So we recently
released agent skills which you can use
in combination with our code execution
tool. Skills are basically just folders
of scripts, instructions, and resources
that Claude has access to and can decide
to run within its sandbox environment.
Um, it decides to do that based on the
request that you gave it as well as the
description of a skill. Um, and Claude
is really good at knowing like this is
the right time to pull this skill into
context and go ahead and use it. And you
can combine skills with tools like MCP.
So MCP gives you access to tools and
access to context. Um, and then skills
give you the expertise to actually make
use of those tools and make use of that
context. Um, and so for Claude Code, a
good example is web design. Maybe
whenever you launch a new product or a
new feature, um, you build landing
pages. And when you build those landing
pages, you want them to follow your
design system and you want them to
follow the patterns that you've set out.
Um, and so Claude will know, okay, I'm
being told to build a landing page. This
is a good time to pull in the web design
skill, um, and use the right patterns
and design system for that landing page.
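A hedged sketch of what such a web-design skill folder might contain; the SKILL.md frontmatter fields are assumptions based on the description in the talk, and the file contents are illustrative:

```python
from pathlib import Path
from textwrap import dedent

# Scaffold a hypothetical "web-design" skill: instructions plus resources that
# Claude can pull into context when a landing-page request matches the description.
skill = Path("skills/web-design")
(skill / "resources").mkdir(parents=True, exist_ok=True)

(skill / "SKILL.md").write_text(dedent("""\
    ---
    name: web-design
    description: Build landing pages that follow our design system and layout patterns.
    ---
    When asked to build a landing page, load resources/design-tokens.json and
    follow the component patterns below before writing any markup.
    """))

(skill / "resources" / "design-tokens.json").write_text('{"primary": "#3B82F6"}')
```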
Uh tomorrow Barry and Mahesh from our
team are giving a talk on skills.
They'll go much deeper and I definitely
recommend checking that out.
So these are the ways that we're
evolving our platform um to help you
take advantage of everything that Claude
can do to get the absolute best
performance for the things that you're
building. First, harnessing Claude's
capabilities. So, as our research team
trains Claude, we give you the API
features to take advantage of those
things. Next, managing Claude's context.
It's really, really important to keep
your context window clean with the right
context at the right time. And third,
giving Claude a computer and just letting it do its thing.
So, we're going to keep evolving our
platform. Um, as Claude gets better and
has more capabilities and gets better at
the capabilities it already has, we'll
continue to evolve the API around that
so that you can stay on the frontier and
take advantage of the best that Claude
has to offer. Um, second, as uh, memory
and context evolve, we're going to up
the ante on the tools that we give you
in order to let Claude decide what to
pull in, what to store away for later,
and what to clean out of the context
window. And third, we're really going to
keep leaning into agent infrastructure.
Some of the biggest problems with the
idea of just let Claude have a computer
and do its thing are those problems that
I talked about around orchestration,
secure environments, and sandboxing. And
so we're going to keep working um to
make sure that those are um ready for
you to take advantage of.
Um and I'm hiring. We're hiring at
Anthropic. We're really growing our
team. Um, and so if you're someone who
loves um, building delightful developer
products um, and if you're excited about
what we're doing with Claude, we would
love to work with you across eng, product,
design um, DevRel, lots of functions. So
please reach out to us.
Our next [music] presenter is the
president and head of AI at Replit. He's
here to speak about building the future
of coding. Please join me in welcoming
All right, good morning everyone. So at
Replit we're building a coding agent for
nontechnical users. It's a very peculiar
challenge I would say compared to many
people in this room. And what I'm going
to talk about today is why autonomy has
become kind of the northstar that we
keep chasing you know since we launched
the very first version of Replit Agent in
September last year.
Let's start from this very interesting
plot in case my clicker worked which now
does. Um I'm sure you all have seen it.
You know, the one
published by swyx a few weeks ago, and it
kind of clarified a bit the landscape
you know for all of us uh agent builders
on one hand you have the low latency
interactions that really allow you to
stay in the loop you know so you can do
deep work and focus really on the on the
coding task at hand but you need to be
an expert you need to know exactly what
to prompt the model for and you need to
understand quickly if you want to accept
the changes or not then for several
months many of us, including Replit, we
kind of lived in this, I think, valley
where the agent wasn't autonomous enough
to really delegate a task and come back
and see it accomplished, but at the same
time it ran long enough not to keep you in
the zone, not to keep you in the loop. Luckily,
over time we managed to go all the way
on the right and now we have agents that
run for several hours in a row. What
I'm going to be arguing today, and I
hope it's not going to stop you from inviting me to
this event, is the fact that there is an
additional dimension like a third
dimension to this plot that you know it
hasn't been covered here and namely the
fact is how do we build autonomous
agents for nontechnical users.
So what I'm going to be arguing today is
that there are two types of autonomy.
One of it is more supervised. So think
of the you know Tesla FSD example. When
you sit in a Tesla, you're still
expected to have a driving license.
You're going to be sitting in front of
the steering wheel. Perhaps 99% of the
time, you're not going to use it, but
you're there in order to take care of
the longtail events. And similarly, a
lot of the coding agents that we have
today require you to be technically
savvy in order to use them correctly.
We at Replit and uh other companies at
this point are focusing on kind of the
Waymo experience for autonomous coding
agents. So you're expected to sit in the
back. You don't even have access to the
steering wheel. And I expect you
basically not to need any driving
license. Uh why is this important?
Because we want to empower every
knowledge worker to create software. And
I can't expect knowledge workers to know
what kind of technical decisions an
agent should be making. We should
offload completely the level of
complexity away from them.
Of course, it took a while to get here.
So I'm I'm sure what I'm showing you
here is something that all of you are
very familiar with. It took several
years to go from I know maybe less than
a minute feedback loop constant
supervision and talking about
completions and talking about
assistance. These are areas where AI
products have really been pioneering
this type of user interaction. Then we
slowly climbed through you know higher
levels of autonomy. So we had the first
version of the agents based on ReAct.
So we concocted autonomy with a very
simple paradigm on top of LLMs. Then
luckily AI providers understood that tool
calling was extremely important poured a
lot of effort on that. So we built the
next version of agents with native tool
calling. And then I would say there is a
third generation of agents which I call
autonomous and that's when we started to
break the barrier of say one hour of
autonomy. Basically the the agent being
capable of running on long horizon tasks
and remaining coherent. It happens to be
the case that those are also the
versions of Replit Agent that we launched
over the last year. So V3 is the one
that we launched a couple of months ago
and it exactly showcases those
properties. So the question for today is
can we actually build fully autonomous
agents and how do we get there.
So I'm going to try to redefine the
definition of autonomy today. I think
that oftentimes we conflate autonomy
with the concept of something that runs
for a lot of time, and usually as a
user you lose control. In reality what
the autonomy that I want to give to
agents can be very specifically scoped
and what I mean by that is especially
with Replit Agent 3, what we accomplish
is we make sure that our agent takes
all the technical decisions. Of course,
that could lead to very long gap between
the different user interactions and in
case the agent again runs for several
hours. But this happens if and only if
the scope of the task you're giving to
the agent is really broad. And it turns
out that in reality you can have an
agent that is really autonomous and is
still fast as long as you give it a very
narrow scope for the task, you know, at
hand. So what we can accomplish in this
way is that the user still maintains
control on the aspects that they care
about and a user cares about what
they're building. Especially again our
users, knowledge workers, they don't
care about how something has been built.
They just want to see their goals to be
accomplished. So autonomy should not be
basically conflated with long run times.
And similarly, it shouldn't become a
vanity metric. You know, a lot of us are
talking about it as a as a badge of
honor. And it's definitely been exciting
to see in the last few months that you
know many of us broke the barrier of uh
running several hours in a row. But I
think in terms of how to build agents
that are going to be more powerful and
more suitable in the future, we kind of
have to change a bit uh the the target
the metric that we that we keep in mind.
So think about it in this way. Tasks
have a natural level of complexity and
basically what we care about is that
they have a minimum irreducible amount
of work that they express. What agents
do is that they always go through this
loop of planning, implementing and
testing. And of course to make this
happen and to make it work correctly,
you want this work to be happening over
a long, coherent trajectory. So our goal is
to maximize the reducible runtime of the
agent. By reducible, I mean having a
span of time where the user doesn't have
to make any technical decisions and the
agent can accomplish the task again in
full autonomy. This is especially
important for us because I can't trust
our users to make technical decisions.
So they they need a proper technical
collaborator by their side. I want to
abstract away as much complexity as
possible from the process of software
creation. And last but not least, I want
the users to feel in control of what
they're creating without stifling their
creativity because they have also to
think about the technical decision that
the agent is making.
So now what are the pillars of autonomy?
How are we making this happen? I would
say there are three pillars that are
extremely important to think about. The
first one is of course the capabilities
of frontier models like the baseline IQ
that we inject in the main agentic loop.
I'm going to leave this as an exercise
to the reader and to other people in the
room. I'm really glad a lot of you are
building amazing models that you know we
use all the time at Replit. So this is
the pillar number one. The second pillar
is verification. It's very important
that we test for local correctness of
our agent at every step that it takes
and the reason is fairly intuitive. If
you are building on very shaky
foundations, eventually the castle will
topple down. So we brought verification
in the loop to make sure that in a sense
you are having, you know, nines of
reliability, avoiding the compounding
errors that an agent will make
unavoidably if you know you don't put
any control on it. And last but not
least, you heard it on stage even
earlier. I'm sure are going to be
hearing this you know the entire day or
the entire duration of the conference.
Uh the importance of context management.
So on one end you want to have an agent
that is capable of being globally
coherent. So it's aligned with the intent
of the user the expectation of the user
but at the same time it is also to be
capable of managing both the high level
goal and the single task that the agent
is working on. I think we made amazing
progress in the last months on context
management. But I'm also excited to see
you know where we're going as a field.
Let's start from the first pillar that
we work on actively at Replit, which is verification.
So why did we focus on this? Over the
last year we realized something that
I think each one of you has experienced.
So without testing agents build a lot of
painted doors. In our case the painted
doors are very visible because we create
a lot of web applications. So you end up
basically trying to click on a button
and the handler is not hooked up or some
of the data that we're showing is
actually mock data and it's not coming
from a database. But in
general this phenomenon spans you know
across every type of component you're
building being it front end or back end
a lot of components are actually not
fully fleshed out by the agent. So we ran
some evaluations internally. We found
out that more than 30% of the individual
features happen to be broken, you know, the
first time they are created by the agent.
And that also means that almost every
application has at least one broken
feature or painted door. They're hard to
find. The reason is users are not going
to spend time testing every single
button, every single field. And this is
also probably one of the reasons why a
lot of our users, especially the
nontechnical ones, still can't trust
coding agents very much. They are
shocked when they find that there is a
painted door out there. So, how do we
solve this problem?
Fundamentally, an agent must
gather all the feedback that it needs
from its environment, right? It's
easier said than done. Um again
nontechnical users not only cannot make
technical decisions but also they cannot
provide the technical feedback that you
know an agent requires to make
progress, and the most they can do is
basic, you know, quality assurance
testing. They can literally go around
the UI click interact with the
application. I'm I'm sure you have tried
it in your life. This is extremely
tedious to do and it leads to a very bad
user experience. And even though we
relied on that with our first release of
the agent last year, quickly we found
out that users don't want to spend time
doing testing. So we had to find a
complete, you know, orthogonal solution
to that which is autonomous testing and
it solves several different issues. The
first one is it breaks the feedback
bottleneck. Even if again we ask
feedback to the user, we were not given
enough of that. Now we don't have to
wait anymore for human feedback. we have
a way to elicit as much information as
possible from the app autonomously. We
also want to prevent the accumulation of
small errors. What I was saying before,
we don't want to have compounding errors
while the agent is building. And last
but not least, we have to overcome the
laziness of frontier models. So we need
to verify that whenever a model tells us
that a task has been completed, it is
actually true and that result is
not being hallucinated.
There is a wide spectrum of code
verification that you know you you can
accomplish. I think we all started from
the very left. You know you have basic
static code analysis with LSPs. We have
been executing the code since we had
basically LLMs that were capable of
debugging and then we slowly started to
move towards the right. So generating
unit tests and running them it has a
limitation. It's limited only to
functional correctness. Uh unit testing
is not very powerful to do like proper
integration testing by definition. We
started also to do now API testing but
it's only limited to API code. So you
can test the endpoints of an application, but you
can't really test how a web app
functions and looks, and for this
reason in the last few months us and
other companies have been putting a lot of
effort in really creating autonomous
testing based on the browser you know in
case the app that we're building is a
web application. There are two main
categories here. One is computer use.
It's a one-to-one mapping with the user
interface. So the model is directly
interacting with the application. It
requires screenshots. It tends to be
fairly expensive and fairly slow. I'm
sure you you tested it yourself. A good
way in the middle is browser use where
we simulate the user interface. You can
then interact with the browser and with
the web application and it relies on
basically accessing the DOM through abstractions.
So how do we make this work at
Replit? Um what we do is that we
generate applications that are amenable
to testing and we sort of merge
everything together from the previous
slides that I showed you. So we allow
our testing agent to interact with
an application and gather screenshots in
case nothing has worked. So we have a
fallback to computer use. But the vast
majority of times what we do is that we
have programmatic interactions with the
applications. So we interact with the
database, we read the logs, we do API
calls, we literally click on the app and
get back all the information that we
need. And by putting all of this
together, we collect enough feedback
that allows our agent both to make
progress and also to fix all the painted
doors that it encounters.
Just a, you know, short technical deep dive on
how we accomplish this. I'm sure you
have seen a lot of the tool-based uh
browser use. There are amazing libraries
out there, a few come to mind,
and the idea is that you have an agent
that has a few very generic tools
exposed. So, you know, the agent can create a
new tab, can click, can fill forms etc
etc. The limitation here is that it's
difficult to enumerate all the different
type of interactions you could be having
with a browser. The problem of testing
is very similar to the Tesla analogy I
was making before. Maybe this cardinality
of tools available is enough for 99% of
the interaction types. But then there is
always a long tail of idiosyncratic
interactions that a user makes with the
with a web application that are hard to
map into these different tool
calls. So what we do uh in our case at
Replit is we directly write Playwright code
and Playwright code is first of all very
amenable for LLMs. LLMs are kind of
amazing at writing Playwright. You know
this is the experience that we had uh
since we started to work on this project.
It is also very powerful and expressive. So
in a sense it's a superset of what you
can express compared to the
tool-based testing on the left. And last
but not least, there is beauty in
creating Playwright code because you can
reuse those tests. The moment you write
a test script, then you can rerun it
as many times as you want. So in a
sense, the moment you created a test,
you're also creating a regression test
suite that you can keep running in the
future. And all these kind of uh tricks
that I explained to you right now, they
helped us to create something that is
roughly an order of magnitude cheaper and
faster compared to computer use. And
we'll go back later on how important
latency is.
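A minimal sketch, in Playwright's Python API, of the kind of script an agent might generate to verify a feature end to end; once written it doubles as a regression test. The URL and selectors are hypothetical.

```python
from playwright.sync_api import sync_playwright

def test_signup_button_is_wired_up() -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("http://localhost:3000")              # hypothetical app under test
        page.get_by_role("button", name="Sign up").click()
        page.wait_for_url("**/signup")                  # a painted door would never navigate
        assert page.get_by_label("Email").is_visible()  # and the form actually rendered
        browser.close()
```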
The second thing that the second pillar
that I wanted to talk about today of
course is context management. And I'm
going to go very fast here because I
think you're going to be hearing a lot
of talks today about it. The the high
level message here is that long context
models are not needed to work on coherent
and long trajectories. Uh from
experience we found that most of the
tasks, even the more ambitious ones, can be
accomplished within 200,000 tokens.
So we're still not in a world where
working with models that have 10 million
or 100 million uh context windows is
necessary to actually run autonomous
agents. And we accomplish this by means
of learning how to do context management
correctly. So first of all, there are
several different ways to maintain state
which don't imply chucking all the state
into your context window. You can do
that for example by using the codebase
itself to maintain state. So you can
write documentation while the agent is
creating new code. You can also include
the plan description and all the
different task list that the agent is
working on. You can persist them on the
file system. So even there like have a
lot of ways to offload your memories.
And last but not least and this is
something I think you know Anthropic has
been uh really evangelizing about um you
can even dump directly your memories in
the file system and then making sure
that your agent decides when to write
them back the moment they become
relevant to your work. So for this
reason we have been seeing a lot of
announcements in the last couple of
months. I just picked this one from
Anthropic, you know, with Claude Sonnet 4.5:
uh they have been able to
run uh a focused task for more than 30 hours
in a row. We have seen similar results
from OpenAI on the math problems. So I
think we kind of broke the barrier of
running for long and you know being able
to have coherent tasks.
I would say the key ingredient to make
this happen has been how good models
and us as agent builders have become at
doing sub-agent orchestration. Sub-agents
basically work by being
invoked in the core loop. So it's
completely starting from a blank
slate uh from a completely fresh
context. You as an agent builder decide
what subset of the context to inject
when this sub agent starts. And it's a
concept that is very similar I think to
everyone who's been writing software you
know in the last decades is separation
of concerns. So you decide what your sub
agent is going to be working on. You
give it the least possible amount of
context. You allow it to run to
completion. You only get the output the
results. You inject them back into the
main loop and you keep running in this
way. Of course it significantly improves
the number of memories per compression.
I just brought this plot directly
from Replit, run in production, the
moment we kicked in our new sub-agent
orchestrator. On the y-axis you
can see the number of memories per
compression. So we went from roughly 35
to 45-50 recently. So big improvement in
terms of how often we are recompressing
our context just because we can offload
a lot of the context pollution by means
of using sub-agents.
I'm going to give an example where this
made the difference for us. You know
what I'm showing you here is more kind
of a cost optimization in a sense like
you're compressing less. You also have
separation of concerns which definitely
make your agent a bit smarter. In the
case of testing
working with sub-agents was almost
mandatory for us and basically we
started to work on automated testing
even before we were very advanced in
terms of sub-agent orchestration. And what
we found out is of course again as I was
saying before it makes things easier
better cost less pollution but when you
allow the main loop not only to create
code but also to do browser
actions, and to put back the observations of
your browser actions into the main loop,
you tend to confuse the agent loop
very much, because at this point there is
a lot of heterogeneity in terms of the
actions that your main loop is looking
at. So in order to make this work not
only did we have to build all the Playwright
framework that I was showing to you
before but we also have to move our
entire architecture into sub agents. So
at this point you can see very clearly
why there is a separation of concern
here. You've got the main agent loop running.
We decide at a certain point that it's
time to verify if the output of the
agent has been correct. We make this
happen all within a sub agent. Then we
scratch the context window of that sub
agent. We just return back the last
observation to the agent loop and then
we keep running in that way. So if
you're having issues today making your
sub-agents uh work correctly, this is
one of the reasons that you want to
take a look at.
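A minimal sketch of that separation of concerns, assuming a generic `run_agent` loop of your own (a stand-in, not a real library call): the verification sub-agent starts from a narrow, fresh context and only its final observation flows back to the main loop.

```python
from dataclasses import dataclass

@dataclass
class SubAgentResult:
    summary: str   # the only thing the main loop ever sees
    passed: bool

def run_agent(system_prompt: str, task: str) -> SubAgentResult:
    # Placeholder for your own LLM agent loop (tool calls, browser actions, ...).
    return SubAgentResult(summary="clicked through the new flow, no painted doors", passed=True)

def verify_in_subagent(feature_description: str, app_url: str) -> str:
    # Fresh context: only the feature under test and where to find the app,
    # none of the main loop's editing history or tool results.
    result = run_agent(
        system_prompt="You are a QA agent. Exercise the feature and report pass/fail.",
        task=f"Feature: {feature_description}\nApp: {app_url}",
    )
    # The sub-agent's context window is discarded; only the observation returns.
    return f"verification {'passed' if result.passed else 'failed'}: {result.summary}"
```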
So I think we covered the high level of
how to create more and more powerful uh
autonomous agents over time and I only
see us as a field becoming even more
proficient at that in the next months.
There is one additional ingredient
though that is going to make the
difference and it's parallelism. And I
will argue that parallelism is important
not because it's going to make agents
more powerful per se, but rather because
it's going to make the user experience
more exciting. So of course it is great
to have an agent that is capable of
running autonomously for long, but at
the same time it comes with the price of
making the user experience less
thrilling. You are not in the zone
anymore. What you do is that you write a
very long prompt. It's translated into a
task list. Uh and then you go to have
lunch with your colleagues and then you
come back and you hope that the agent is
done. That is not the kind of experience
that most of the productive people want
to have in life. You know, you want to
see as much work as done as possible in
the shortest span of time.
So what we do as a as a field at this
point has been to create parallel
agents. It's a very common trade-off
which by the way doesn't only apply to
agents. it it applies to computing in
general and for parallel agents what you
do is that you you trade off basically
extra compute in exchange for time. Why
there is this trade-off? So first of all
when you're running agents in parallel
you're gathering the same context in
multiple context windows. So every
single parallel agent that you will be
running probably shares say 80% of the
context across the board. So of course
you are just putting more compute to work
because you're running those agents in
parallel. There is also another cost
that is kind of intangible for a lot of
you here in the room because I'm sure
you're all expert software developers.
But what do you do with the output of
multiple parallel agents at the end? Often
times you need to resolve merge
conflicts. So as a reminder, my users
don't even know what's the concept of
merge conflicts. It's something that we
have to figure out on our own. So the
current way in which we think of
parallel agents in the space doesn't
really apply to Replit. Now at the same
time I still want to very much to
accomplish this. There are so many
interesting features that you can enable
with parallelism. Aside from the fact
that you can get more work done, uh at
times you want testing to be
running in parallel with the agent that
creates code. Testing no matter how much
we optimize it is still very slow. If an
agent is only spending time on testing
users are not going to be engaging with
your application anymore. Um, at the
same time, it's also great to have an
asynchronous process running while your
agent is running because you can inject
useful information back into the main
core loop. And last but not least is a
very common technique that we know boost
performance if you have enough budget to
do so. You should be sampling multiple
trajectories at the same time. So a lot
of perks are coming with parallel
agents. But uh the way in which we
implement them today, which I
basically call 'user as the orchestrator',
is the fact that the parallel tasks
that you want to run are determined by
you, the user, and each task is
dispatched in its own thread. So there
is a bit of a manual process; even the task
decomposition in a sense is happening
in your mind while you're thinking about
which agents you want to run and then
the moment you get back all the results
you need to go through the problem of
merge conflicts and often times this is
not trivial at all no matter how many
amazing tools are out there. So what
we're working on today for our next
version of the agent is having the core
loop as the orchestrator. So the key
difference here is the fact that the
subtasks that we're going to be working
on are not determined by the user but
they are determined by the core loop
and the parallelism is basically decided
on the fly. The agent does the task
decomposition on behalf of the user and
this comes with a couple of advantages.
First of all again there's no cognitive
burden for the user to understand how
they should be decomposing the task. At
the same time also there are ways in
which you can create tasks that sort of
mitigate the problem of merge conflicts.
I'm not claiming that we're going to be
able to mitigate it 100%. There are so
many corner cases in which merge
conflict will still represent a problem
but there are a lot of different
techniques known in software engineering
to make sure that you can try to have
multiple sub-agents not stepping on each
other's toes. So the core loop as an
orchestrator is going to be our main
bet for the next few months.
And in case you're passionate about
these topics,
[music] I'm always hiring at Replit.
Thank you. [applause]
From transforming support tickets into
merge requests to helping teams ship
fixes faster than ever, our next
presenter has been at the center of
Zapier's AI agent journey. Please
[music] [applause]
Hello.
I'm so excited to tell you about how at
Zapier we are empowering our support
team to ship code. Before I tell you
about that, has anybody here visited the
Grand Canyon?
It's a good amount. Anybody rafted
through the Grand Canyon?
I see one person. I just got off an
18-day trip rafting through the Grand
Canyon over 200 miles. It was
incredible. No internet, no cell
service. The moment I got off, I found
out I was giving this talk. I didn't
think about uh work at all on the river,
but once I got off, I started thinking
about the parallels between the Grand
Canyon and Zapier. And we have one thing
in common and that is erosion.
Now natural erosion happens over
millions of years with wind, water and
time. It creates the beautiful canyon
that we experience and it's never
stopping, always continuing. At Zapier,
we have over 8,000 integrations built on
third party APIs and they are constantly
changing, which I'm now thinking of as
app erosion.
We've been around for 14 years. Some of
our apps are that old. API changes and
deprecations impact us and create
reliability issues. Again, it never stops.
So, I like to think of our apps as like
layers in the Grand Canyon, and they
need constant attention.
So, if we were to create our own Zapier
Canyon and our apps would be at the
walls, here's our support team flowing
down the middle watching out for app
erosion. And we have a backlog crisis.
Tickets were coming in faster than we
could handle them.
Creates integration reliability issues,
poor customer experience, even churn. So
to solve for app erosion, we kicked off
two parallel experiments. The first was
moving support from just triaging to
also fixing these bugs. It's experiment
number one. Experiment number two, we
were asking can AI help solve app
erosion faster.
So let's jump into experiment one. This
got kicked off two years ago, but had to
start with the why. We needed to get
that buy in to empower our support team
to ship code.
So app erosion is one of the major
sources of bugs coming through from
support to engineering. So there's a big
need. Support is eager [laughter] for
this experience too; a lot of them want to
go into engineering eventually and
unofficially many support members were
already helping to maintain our apps.
This moves us into how we started this
out. Put on some guard rails. We started
with just four target apps to uh focus
our fixes on. Engineering was set to
review any merge requests coming from
support and we kept the focus on app fixes.
So jumping into experiment 2, this is
what I've been leading for the last
couple of years. How can we use codegen
to help solve for app erosion? And so
fortuitously, the name of this project
is Scout, which ties in so well to the
Grand Canyon experience that I've just
been through.
As any good product manager, we start
with discovery. We did some dog fooding,
so I shipped some app fixes. Uh we
shadowed engineers and support team
members as they were going through the
app fix process. We designed out uh what
are the pain points experienced along
the way, what are the phases of the work
and how much time is spent.
One big discovery we had is how much
time is spent gathering the context
going to the third-party API docs,
even crawling the internet looking for
information about a bug that's emerging
maybe somebody else has already
discovered and solved for it outside of
Zapier. Internal context, logs, all of
this is a lot of context to go and
search for as a human uh and a lot to
grok and work through. This is something
we knew we needed to solve for.
Where we started with all these great uh
opportunities and pain points is we
started building APIs that we believed
would solve for these individual um pain
points and some of these APIs are using
LLMs; you know, for our diagnosis tool,
gathering all that context on behalf of
the uh support person or engineer and
curating that context, building a
diagnosis, that's [clears throat] using
an LLM. And then some aren't; like, we
have a unit test
generator that is, but the um test case
finder is simply using a search query to
look for the right test cases to pull in
for your unit test. We built a bunch of
APIs. We had a bunch of great ideas. So
there was a lot for us to test with, but
we ran into some challenges in this
first phase. We had APIs but they were
not embedded into our engineers process.
So our tool I just said they don't like
to go to so many web pages to find all
their context. They would love all this
information to come to them. And yet our
web interface where we've created
a playground we call autocode internally
where you can come and play around with
our APIs. And our ask to the teams was
come try out our APIs and give us feedback.
Now this is just one more window to go
to. So we didn't get a lot of
engagement. Also because we had shipped
so many uh APIs our team was spread
pretty thin. Cursor launched at the same
time which has gotten great adoption at
Zapier. We're all huge fans of cursor.
But from our side, it made some of our
tools no longer necessary.
But there was one major win in this
phase, which is one of our APIs became a
support darling. It's diagnosis. That
number one pain point of needing to go
out and find all of your context, curate
it for yourself so you can start solving
the problem. We were doing that on uh
the support team's behalf with the
diagnosis API
and support loved it enough that they
decided to embed it into their process.
They asked us to build a Zapier
integration on our autocode APIs so they
could embed it into their Zap that
creates the Jira ticket from the support
issue and now diagnosis is included.
So embedding tools is the key to usage
as we find out. So how can we embed more
of our tools? Well, then MCP spins up
and that solves our problem.
We can now embed these API tools into
our engineers workflow. Specifically,
our engineers are pulling in these MCP
tools as they're using Cursor.
Our builders using Scout MCP tools are
leaving the IDE less, spending more time
in one window.
Still running into challenges. One of our
uh key tools, diagnosis,
uh is so valuable to pull all that
context and to provide a recommendation,
but it takes a long time to run. Now, we
might bring down that runtime. However, as
you're working synchronously on a ticket
in your ID, this was frustrating. We
also weren't keeping up with the
customization needs. Not only did MCP
launch and we started leveraging it, Zapier
MCP launched too. And some of our
tools, if we weren't keeping up with the
customization needs, our engineers
internally looked to Zapier MCP, which
is great. We're all on the same team
solving the same problem, but some of
our tools had a dead end. Also adoption
was scattered. We had a whole suite of
tools and we thought there was value in
each of them as it solves for different
problems across the different stages.
Not every engineer was using our tools
and if they were using tools, they're
only using a few of them. So we have
tool usage. We're happy about that. But
we were under the hypothesis that true
value is going to come from tying these
tools together.
So what if we owned orchestration of
these tools rather than saying here's a
suite of tools you use them as you wish
what if we combined them and created an
agent to orchestrate this? So this we
are calling Scout agent. We take that
diagnosis, run that against a ticket, uh
use that information to actually spin up
a codegen tool which will then produce a
merge request using all the right context.
So who would benefit the most from
orchestration? There are several
integration teams at Zapier who are
solving for these app fixes of various
levels of complexity and there's the
support team. So when we're saying who
should be the first customer of Scout
agent, we're thinking it should probably
be the team fielding small bugs that
are emergent and coming hot off the
queue which is the support team. And now
our two experiments merge
and we have scout agent. We are building
for the support team.
And this is the flow of how it works.
Support is submitting an issue to scout
agent. We first categorize the issue. We
next assess its fixability.
Not every issue that comes from support
can be fixed. If Scout thinks it's fixable,
we'll move on to generating a merge
request. At that point, the support
team, this is the first time they're
picking up the ticket. It already has a
merge request attached to it. They'll
review and test. If it's not satisfying
what they believe is the actual solution,
or what the solution should
be to best address the customer's need,
they will make a request for an
adjustment that can happen right in
GitLab, which is where we do our work
and Scout will do another pass and
hopefully at that point we've gotten it
right and support can submit that MR for
review from engineering.
How we are running Scout, it's all
kicked off by a Zap. This is a picture
of one of our Zaps. There are many Zaps
that run this whole process and it
embeds right into our support team's
Zaps. We do a ton of dogfooding at Zapier.
We first run diagnosis and post that
result to the Jira ticket saying what
the categorization is and whether we believe
it's fixable, and then if we do believe
it's fixable, we then are kicking off a
GitLab CI/CD pipeline.
And we run three phases in that
pipeline: plan, execute, and validate, to
generate this merge request. The tools
used in this pipeline are Scout MCP. So all
those APIs we invested in a year ago now
are really coming together and we're
orchestrating it uh within the GitLab
pipeline and we're also leveraging
cursor SDK.
Once the merge request has been
completed, we attach it to Jira and
support picks it up.
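A hedged sketch of that plan / execute / validate flow; every function here is an illustrative stand-in for Zapier's internal Scout tooling, not a real API.

```python
def diagnose(ticket: dict) -> dict:
    # Stand-in for the diagnosis step: gather logs, partner API docs, prior fixes.
    return {"fixable": True, "summary": "deprecated auth parameter on partner API"}

def plan_fix(diagnosis: dict) -> list[str]:
    return [f"update the integration to address: {diagnosis['summary']}"]

def apply_fix(plan: list[str]) -> str:
    return "scout/fix-auth-param"   # branch the codegen step pushed to

def validate_fix(branch: str) -> bool:
    return True                     # e.g. unit tests and lint in the CI job

def run_scout_pipeline(ticket: dict) -> str | None:
    diagnosis = diagnose(ticket)             # assess fixability before doing any work
    if not diagnosis["fixable"]:
        return None                          # hand the ticket back to a human
    branch = apply_fix(plan_fix(diagnosis))  # plan, then execute
    if not validate_fix(branch):             # validate before opening the MR
        return None
    return f"merge request opened from {branch}"
```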
The latest addition to this is doing a
rapid iteration: once a um uh once a
ticket has been posted with the merge
request and the support team is looking at
it and they say, you know, it needs some
tweaks, to save them more time so they
don't have to go pull that down to their
IDE, do the fixes, and push it back up,
they can simply chat with the uh Scout
agent in GitLab. That'll kick off another
uh pipeline which does that phase with
that new feedback and posts the new
merge request.
On our side we want to make sure Scout
agent is working, so we ask three
questions: was the categorization right,
was it actually fixable, uh and was the
code fix accurate? So far we have two evals,
at around 75% accuracy for categorization and
fixability. As we get more feedback and
process more tickets, those become our
test cases and we can move forward
improving Scout agent over time. So what
has been Scout agent's impact on app erosion?
40% of the support team's app fixes
are being generated by Scout. So we're
doing more of the work on behalf of the
support team.
This is resulting, for some of our
support team, in doubling their
velocity from one to two tickets per
week, which already is amazing. That's
going from a support team that wasn't
shipping any fixes (well, unofficially
they were sometimes) to shipping one
to two per week per person, to now
shipping three to four with the help of Scout.
Another uh process improvement, Scout
puts potentially fixable tickets right
there in the triage flow. It takes away a
lot of the friction of looking for
something to grab from the backlog.
It's not just the support who's
benefiting, it's also engineering.
Engineering manager said, uh, it's a
great example of when it works. This
tool allows us to stay focused on the
more complex stuff.
And if you take away anything from this
talk, I hope it is that there is a
really powerful magic between support
and empowering them with codegen and
allowing them to ship fixes because they
have three superpowers. The first they
are the closest to customer pain which
means they're closest to the context that
really matters for figuring out what's
the problem and how to solve it. They're
also troubleshooting in real time. These
tickets aren't stale. The context is
fresh, the logs aren't missing. You put
this ticket into engineering backlog
months later, you might not get access
to those logs anymore. And then three,
they're best at validation.
Again, you put the same ticket
into an engineering backlog. The
solution an engineer might come up with
may change the behavior and that might
be good for some customers but might not
necessarily be best for that one
customer who wrote in about the problem.
And one other major benefit of this is
uh support team members who have been
part of this experiment are now engineers.
I want to say thank you to the amazing
team who's helped build this process,
built all the tools and the Scout agent.
Andy is actually here in the audience.
So shout out to Andy. If you want to
talk about any of the technical bits,
he's here. And I want to impress upon
you two things. We're hiring, but mostly, if
you haven't rafted through the Grand
Canyon, please consider it. It's
life-changing and you should go with OARS.
Thank you very much.
[applause]
Our next presenters believe that [music] 2026 is the year the IDE died. Please join me in welcoming to the stage engineering leader at Sourcegraph and Amp, Steve Yegge, and author and researcher at IT Revolution, Gene Kim.
[music]
Hey everybody. Really happy to be here. I'm going to be talking the first half; co-author here, Gene Kim, is going to talk the second half. All right, looking forward to it. Cheers. Today we're going to talk real fast; this time is going to go down fast. I'm going to talk to you about what tools look like next year. Last year I was talking to you all about chat and everybody ignored me, and now everybody's using chat this year, and we're going to fix that right now. So here's what it's looked like. I'm going to tell you right now, everyone's in love with Claude Code. There's probably 40
learning happening, and it also helps to boost inner-source contributions, and then the visiting-engineer idea. Oftentimes team A wants to do something and team B, let's say a platform team, has different prioritization, and the way we solve this is via inner source or via a visiting engineer: we just move someone over to that team to work for six months to a year, get it done, and then we can move on. The last one is interesting. Our data shows individual contributors have much stronger adoption than our leadership team. If you think about this, a lot of software tech leads and managers in the age of AI don't really have enough experience to truly guide their teams to build software, so oftentimes the stuff they learned before might not be exactly applicable. It's still very valuable, but there's some missing piece there to make sure they can continue to guide the team to do the right thing. So we're rolling out leadership workshops to make sure our leaders are equipped with whatever knowledge they need to drive the technical innovation.
I'm going to close my part by sharing the part I feel most excited about: with all the creativity and innovation in the GenAI space, it actually changes the cost function of software engineering. Meaning, the trade-off decision of whether we do something versus not doing it has actually changed, because some of the work has become a lot cheaper to do and some work has become a lot more expensive to do. I tend to think it is a great opportunity for engineers and engineering leaders to get back to some of the basic principles and ask a soul-searching question: what is high-quality software engineering, and how can we use the tools for that purpose? So that's it. Thank you very much.
[applause]
[music]
Our next speaker helped to reimagine a beloved browser, from Arc to Dia, by rebuilding it around AI-native experiences. Please welcome to the stage head of AI engineering at the Browser Company, Samir Motti. [music]
Hey everyone. Oh wow, how's it going? My name is Samir and I'm the head of AI engineering at the Browser Company of New York. Today I'm going to talk a little bit about how we transitioned from building Arc to Dia and the lessons we learned in building an AI browser. But first, a little about the Browser Company.
We started with a mission to rethink how people use the internet. At its core, we believe that the browser is one of the most important pieces of software in your life, and it wasn't getting the attention it deserved. Simply put, the way we use a browser has changed over the last couple of decades, but the browser itself hadn't. And think about this: we started this company in 2019. This is a screen cap of Josh, our CEO, sharing a little bit about our idea on the internet a few years ago, which we endearingly called the internet computer. So our mission has been to build a browser that reflects how people use the internet today and how we think the browser should be used tomorrow.
Through years of discovery, trial and error, and some ups and downs, we shipped our first browser, Arc, in 2022. It was a browser we felt was an improvement over the browsers of that time. It made the internet more personal, more organized, and to us a little more delightful, with a little more craft. And it was a browser that was loved by many. It still is, by millions, many of whom are probably in this audience today. I've gotten a lot of questions about Arc today, and it's great, but if we took a step back, we felt that Arc was still just an incremental improvement over the browsers of that time, and it didn't really hit the vision that we set out to create. So we kept building. And then in 2022, we got access to LLMs like the GPT models. So we started, like we always do, with prototyping. We started trying new ideas and eventually shipped a few of them in Arc. But what started as a basic exploration turned into a fully formed thesis. In the beginning of 2024, our company put out what we called Act Two, a video on YouTube where we shared that thesis: that we believe AI is going to transform how people use the internet and, in turn, fundamentally change the browser itself. And so with that, we started building again, but this time we built a new browser with AI, speed, and security in mind, from the ground up. And earlier this year we shipped Dia, our AI-native browser. It allows you to have an assistant alongside you in all the work you do in the browser. It gets to know you, personalizes, helps you get work done with your tabs, and effectively get more work done through the apps you use. And while it hasn't achieved our vision yet, we fully believe it's well on the way.
It is not easy to build a product; you all know that. Let alone two, the latter of which is an AI-native one. We've had a lot of years of iteration, trial and error, and through that we've learned a lot. I'm going to talk about a few of those things here today.
The first thing I want to talk about is optimizing your tools and process for faster iteration. From the beginning, the Browser Company has believed that we're not going to win unless we build the tools, the process, the platform, and the mindset to iterate, build, ship, and learn faster than everyone else. That of course holds true today, but the form it takes with AI and an AI-native product has changed. So even as a small company, where are we investing in tooling these days? First is prototyping for AI product features. Second is building and running evals. Third is collecting data for training and for evals. And last but definitely not least, automation for hill climbing.
So let's start with tools. Initially, as we always do, we built some tools. The first was a very rudimentary prompt editor, and it was only in dev builds. What did this mean for us? Well, it meant a few things: one, limited access, as only engineers were able to use it; two, slow iteration speeds; and three, none of your personal context. And as you all know, with an AI product the context is what matters and what gives you a feel for whether the product is good or not. So we evolved, and since then we've built all of our tools into our product, the product that we as a company internally use every day, and that includes the prompts, the tools, the context, the models, every parameter. That has not only allowed us to 10x our speed of ideating, iterating, and refining our products, but has also widened the number of people who can access and iterate on our products. Everyone from our CEO to our newest hire can ideate and create a new product in Dia, and also refine an existing one, all with their full context.
And this holds true with all of our major product protocols. We have tools for optimizing our memory knowledge graph, which all of us use, and we have tools for creating and iterating on our computer-use mechanism. We actually tried tens of different computer-use strategies before landing on one, before even building it into the product itself. And I'll end this part with this: it actually is a lot of fun. People don't talk about that a lot, but building these tools into our product has enabled so much creativity. It has enabled our PMs, our designers, customer service, and strategy and ops to try out new ideas that are tailored to their use cases. And that, ultimately, is what we're trying to do.
The next thing I want to talk about is how we evolve and optimize our prompts through a mechanism called GEPA. This for us is very nascent, but an important learning nevertheless. How we hill climb and refine our AI products is just as important as ideating them in the first place. So we're investing in mechanisms to enable faster hill climbing, one of those being GEPA, which is based on a paper from earlier this year from a few smart folks.
The key motivation here is simple: it's a sample-efficient way to improve a complex LLM system without having to leverage RL or other fine-tuning techniques, and for us, as a small company, that's hugely critical. How it works is that you seed the system with a set of prompts, then execute it across a set of tasks and score them. Then you leverage a mechanism called Pareto selection to select the best ones, and then leverage an LLM on top of that to reflect on what went well and what didn't, generate new prompts, and repeat. The key innovations here are the reflective prompt mutation technique, the selection process, which allows you to explore more of the space of prompting rather than one avenue, and the ability to tune text and not weights.
And here's a modest example of this at work for us: you can provide a very simple prompt, run it through GEPA, and it's able to optimize it along the metrics and scoring mechanisms that we created to refine that prompt.
If I take a step back and talk about how we build certain types of features, I would bucket it into a couple of different phases. The first is the prototyping and ideation phase, where we have widened the breadth of ideas at the top of the funnel and lowered the threshold on who can build them and how. So we try out a bunch of ideas every week, every day, from all types of people, and we dogfood those. If we feel like there's actually real utility there, it's solving a real problem for us, and there is a path toward actually hitting the quality threshold we believe we need to hit, then we'll move on to the next phase, where we collect and refine evals to clarify product requirements, then hill climb through code, through prompting, and through automated techniques like GEPA, then dogfood as we always do internally, and then ship.
And I do want to double down on these phases: the ideation phase is extremely important, just as much as the refinement phase. Our goal is to enable faster ideation and a more efficient path to shipping, because with all these AI advancements every week, new possibilities are unlocked in Dia. It's up to us as a browser, as a product, to get as many at-bats with these new ideas and try out and explore as many of them as possible, while at the same time not underestimating the path it takes to ship some of these ideas to production as a high-quality experience.
Next, I want to talk about treating model behavior as a craft and a discipline. So what is model behavior to us? It's the function that defines, evaluates, and ships the desired behavior of models. It's turning principles into product requirements, prompts, and evals, and ultimately shaping the behavior and the personality of our LLM products, and for us, our Dia assistant.
I'd bucket it into a few different areas. First, there's behavior design: defining the product experience we actually want, the style, the tone, the shape of responses in some cases. Then it's collecting data for measurement and training, and clarifying those product requirements through evals. And last but not least, it's model steering: the building of the product itself, the prompting, the model selection, defining what's in the context window, the parameters, and so much more. To us, that process is very iterative. We build, we refine, we create evals, and then we ship, and then we collect more feedback and feed that into our iterative building process. That could be internal feedback, and it could also be external feedback.
One analogy we've thought about is comparing model behavior to product design through the evolution of the internet. At first, websites were functional; they got the job done. But over time that evolved as we tried to achieve more on the internet and technology advanced. Product design and the craft of the internet itself grew, as did the complexity. So what might that look like for model behavior? Well, at first it was functional: we had prompts, we had evals, we had instructions in and output out. Now we frame it through agent behaviors: goal-directed reasoning, the shaping of autonomous tasks, self-correction and learning, and even shaping the personality of the models themselves. So what might the future hold? I'm excited to see. But what we believe is that we are in the early days of building AI products, and model behavior will continue to evolve into a specialized and prevalent function of its own, even at product companies.
And the last thing I'll leave you with here is that the best people for it might just surprise you. One of my favorite stories about building Dia these last couple of years has been the formation of this model behavior team. As I mentioned earlier, engineers were writing the prompts at first, and then we built these prompt tools to enable more people at the company to actually prompt and iterate. There was a person on our strategy and ops team who leveraged these prompt tools one weekend to rewrite all our prompts. He came in on a Monday morning and dropped a Loom video sharing what he did, how he did it, and why, along with a set of prompts. Those prompts alone unlocked a new level of capability, quality, and experience in our product, and consequently that was the formation of our model behavior team. So one thing I'd emphasize to you all is to think about who those people at your company are, agnostic of their role, who can help shape your product and help shape and steer the model itself. It might not be an engineer; it could also be someone on the strategy and ops team.
Next, I want to talk about AI security as an emergent property of product building, and today I'm going to focus specifically on prompt injections. So what is a prompt injection? It's an attack in which a third party can override the instructions of an LLM to cause harm. That might be data exfiltration, the execution of malicious commands, or ignoring safety rules. Here's an example: you give the context of a website to an LLM and instruct it to summarize it. Little did you know that there was a prompt injection hidden in that website's HTML. So instead of actually summarizing the web page, the LLM gets directed to open a new website, extracting your personal information and embedding it as GET parameters in that website's URL, effectively exfiltrating that data.
As a browser, prompt injections are extremely crucial for us to prevent. They're critical to prevent because browsers sit in the middle of what we can call a lethal trifecta: they have access to your private data, they have exposure to untrusted content, and they have the ability to communicate externally. For us, that means opening websites, sending emails, scheduling events, and so on. So how do we prevent this? Well, there are some technical strategies we can try. First is wrapping that untrusted context in tags. You can tell the LLM: listen to the instructions around these tags and don't listen to the content inside them. But this is easily escapable, and quite trivially an attacker could still leverage a prompt injection on your browser. Another solution we could try is separating the data from the instructions. We can assign the operating instructions to a system role, assign a user role to the third-party content, and even layer on randomly generated tags to wrap that user content, to be extra sure that the LLM listens to the instructions and not the content. And while this can help, there are no guarantees, and prompt injections will still happen.
So what do we do? Well, it's on us to design a product with that in mind. We have to blend technology approaches, user experience, and design into a cohesive story that builds it from the ground up and solves it together. So what might that be for a feature in Dia? Let's take the autofill tool. The autofill tool allows you to leverage an LLM with context, memory, and your details to fill forms on the internet. It's extremely powerful, but as you can imagine, it has some vulnerabilities. A prompt injection here could extract your data and put it on a form, and once it's on that form, it's out of your hands. So we try to build with that in mind. In this case, before the form is written to, we actually let the user read and confirm that data in plain text. This doesn't prevent a prompt injection, but it gives the user control, awareness, and trust in what is happening. This is a framing we carry throughout our product and how we build every single feature. Here are some examples: scheduling events in Dia, we have a similar confirmation step; writing emails in Dia, we also have a similar confirmation step.
I've talked about three different things here today. First, optimizing your tools and process for fast iteration. Second, treating model behavior as a craft and discipline. And third, AI security as an emergent property of building products.
But the last thing I want to leave you with: when we started on this journey of building Dia, we recognized a technology shift and we sought to evolve our product, Arc. We initially came at it from, hey, how can we leverage AI to make Arc better, make the browser better. But what we quickly learned and adapted to was that it wasn't just a product evolution, it was a company one, and today I shared a glimpse of that: how we build and how it's changed, a team we've literally created around this, and how we think about security for AI products. But really it's so much more. It goes beyond that. It's how we train everyone here, how we hire, how we communicate, how we collaborate, and so much more. And if there's one thing I'll leave you all with, if there's one thing we've learned over the last couple of years, it's that when you recognize that technology shift, you have to embrace it. And you have to embrace it with conviction. Thank you.
[applause]
Our next speaker [music] draws on over 20 years in enterprise developer experience to ask what will still matter when AI coding agents are everywhere. Please welcome to the stage executive distinguished engineer at Capital One, Max Kanat-Alexander.
[music]
[applause]
Hey, how's everybody doing? Still awake? Okay, great. So, like the robot voice said, I have been doing developer experience for a very long time, and I have never in my life seen anything like the last 12 months. About every two to three weeks, software engineers have been making this face on the screen. And if you work in developer experience, the problem is even worse. You're like this guy on the screen every few weeks: "Oh yeah, yeah, here's the new hotness." And then somebody else comes up and they're like, "Well, can I use the new new hotness?" People have been doing that for years. I've been working in developer experience for a long time; everybody always shows up and asks, "Oh, can I use this tool that came out yesterday?" and you say, "No, of course not." And now we're like, "Uh, maybe yes." Right? And what this leads to overall is that the future is super hard to predict right now.
So I think a lot of people, a lot of CTOs, a lot of people who work in developer experience, people who care about helping developers, are asking themselves this question: are all of my investments going to go to waste? What could I invest in now that, if I look back at the end of 2026, I'll say I sure am glad that I invested in that for my developers? And I think a lot of people have just decided, well, I don't know, I guess it's just coding agents and I guess they'll fix every single thing about my entire company by themselves. They're amazing.
The first one is: how can we use our understanding of the principles of developer experience to know what's going to be valuable no matter what happens? And the second: what do we need to do to get the maximum possible value from AI agents? What would we need to fix at all levels, outside of the agents, in order to make sure that the agents and our developers can be as effective as possible? And this isn't a minor question. These are the sorts of things that could make or break you as a software business going into the future. So let's talk about what some of those things are that I think are no-regrets investments that will help both our human beings and our agents.
human beings and our agents. So the in general one of the framings that I think
general one of the framings that I think about here is things that are inputs to
about here is things that are inputs to the agents things around the agents that
the agents things around the agents that help them be more effective. And one of
help them be more effective. And one of the biggest one is the development
the biggest one is the development environment. What are the tools that you
environment. What are the tools that you use to build your code? What package
use to build your code? What package manager do you use? What llinters do you
manager do you use? What llinters do you run? Those sorts of things. You want to
run? Those sorts of things. You want to use the industry standard tools in the
use the industry standard tools in the same way the industry uses them and
same way the industry uses them and ideally in the same way the outside
ideally in the same way the outside world uses them because that's what's in
world uses them because that's what's in the training set. And look, yes, you can
the training set. And look, yes, you can write instruction files and you can try
write instruction files and you can try your best to try to fight the training
your best to try to fight the training set and make it do something unnatural
set and make it do something unnatural and unholy with some crazy amalgamation
and unholy with some crazy amalgamation that or modification that you've made of
that or modification that you've made of those developer tools. Like you might be
those developer tools. Like you might be you invented your own package manager.
you invented your own package manager. You probably should not do that. you
You probably should not do that. you probably should undo that and try to go
probably should undo that and try to go back to the way the outside world does
back to the way the outside world does software development because then you
software development because then you are not fighting the training set. Um,
are not fighting the training set. Um, and also it means it means things like
and also it means it means things like you can't use obscure programming
you can't use obscure programming languages anymore. Look, I'm a
languages anymore. Look, I'm a programming language nerd. I love those
programming language nerd. I love those things. I do not use them anymore in my
things. I do not use them anymore in my day-to-day agentic software development
day-to-day agentic software development work. as an enthusiast, I do come
work. as an enthusiast, I do come sometimes go and I code on, you know,
sometimes go and I code on, you know, frontline uh software engineering
frontline uh software engineering languages, but not in my like real work
languages, but not in my like real work anymore.
So people ask me sometimes, does that mean we're never going to have any new tools again, because we're always going to be dependent on the tools the model already knows? Probably not, because like I said, there are still going to be enthusiasts. But I would like to make a point: the thing I'm talking about has always been a real problem. There's always some developer at the company who comes up to you and says, "Can I use this technology that came out last week and has never been vetted in an enterprise to run my 100,000-queries-per-second service that serves a billion users?" And I'm like, "No, you can't do that now, and you couldn't do that yesterday. It's still the same."
same." Uh, another one is
Uh, another one is in order to take action today, agents
in order to take action today, agents need either a CLI or an API to take that
need either a CLI or an API to take that action. Yes, there's computer use. Yes,
action. Yes, there's computer use. Yes, you can make them write playright and
you can make them write playright and orchestrate a browser. But why? Like if
orchestrate a browser. But why? Like if you could have a CLI that the agent can
you could have a CLI that the agent can just execute natively in its normal
just execute natively in its normal format that it understands the most
format that it understands the most natively, which is text interaction, why
natively, which is text interaction, why why would you choose to do something
why would you choose to do something else, especially in an area where
else, especially in an area where accuracy matters dramatically and where
accuracy matters dramatically and where that accuracy dramatically influences
that accuracy dramatically influences the effectiveness of the agent?
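As an illustration of that point, here is a minimal, hypothetical sketch of wrapping an internal action in a text-first CLI that an agent can call directly: plain arguments in, a plain success or error message out. The deployctl name and redeploy action are invented for the example, not a real tool.

```python
#!/usr/bin/env python3
"""Hypothetical example: exposing an internal action ("redeploy a service") as a
plain-text CLI so an agent can shell out to it instead of driving a browser."""
import argparse
import sys

def redeploy(service: str, env: str) -> int:
    # Stand-in for the real internal API call.
    if env not in ("staging", "prod"):
        print(f"error: unknown environment '{env}' (expected: staging, prod)", file=sys.stderr)
        return 2
    print(f"ok: redeploy of {service} to {env} queued")
    return 0

if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog="deployctl", description="Redeploy a service.")
    parser.add_argument("service")
    parser.add_argument("--env", default="staging")
    args = parser.parse_args()
    sys.exit(redeploy(args.service, args.env))
```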
One of the most important things that you can invest in is validation. Any kind of objective, deterministic validation that you give an agent will increase its capabilities. Yes, sometimes you can create this with the agent; I'm going to talk about that in a second. But it doesn't really matter how you get it or where you get it from. You just need to think about: how do I have high-quality validation that produces very clear error messages? This is the same thing you always wanted in your tests and your linters, by the way. But it's even more important for agents, because an agent cannot divine what you mean by "500 internal error" with no other message. It needs a way to actually understand what the problem was and what it should do about it.
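A tiny sketch of that difference, using a made-up config check: the commented-out error gives an agent nothing to iterate on, while the raised ones name the field, the bad value, and the fix.

```python
def validate_config(config: dict) -> None:
    """Raise errors an agent (or a human) can actually act on."""
    # Vague version an agent can't do anything with:
    #   raise ValueError("invalid config")
    timeout = config.get("timeout_seconds")
    if timeout is None:
        raise ValueError(
            "config error: 'timeout_seconds' is missing; add an integer between 1 and 300"
        )
    if not isinstance(timeout, int) or not 1 <= timeout <= 300:
        raise ValueError(
            f"config error: 'timeout_seconds' is {timeout!r}; expected an integer between 1 and 300"
        )

try:
    validate_config({"timeout_seconds": "fast"})
except ValueError as e:
    print(e)  # the agent can read this, fix the field, and retry
```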
However, there is a problem here. You think, okay, I'll just get the agent to do it. It'll write my tests and then I'll be fine. But have you ever asked an agent to write a test on a completely untestable codebase? They do kind of what's happening on the screen here: they will write a test that says, "Hey boss, I pushed the button and the button pushed successfully. Test passed." So there is a larger problem that a lot of enterprises have in particular, which is that there are a lot of legacy codebases that either were not designed with testing in mind or were not designed with high-quality testing in mind. Maybe they just have some very high-level end-to-end tests, and they don't have great unit tests that the agent can actually run iteratively in a loop and that will produce actionable and useful errors.
Another thing that you can invest in that will be perennially valuable, both to humans and to agents, is the structure of your systems and the structure of your codebases. Agents work better on better-structured codebases. For those of you who have never worked in a large enterprise and seen very old legacy codebases, you might not be familiar with what I'm talking about. But for those who have, you know that there are codebases that no human being could reason about in any kind of successful way, because the information necessary to reason about the codebase isn't in the codebase, and the structure of the codebase makes it impossible to reason about just by looking at it. Yes, agents can do the same thing human beings do in that case, which is go through an iterative process of trying to run the thing and see what breaks, but that decreases the capability of the agent enormously compared to just having the ability to look at the code and reason about it, the exact same way human capability is decreased. And of course, like I said, that all has to lead up to being testable. If the only thing I can do with your codebase is push a button and know whether the button pushed successfully, and not see the explosion behind it, if there's no way to get that information out of the codebase from the test, then the agent's not going to be able to do that either, unless it goes and refactors it or you go and refactor it first.
And, you know, there's a lot of talk about documentation. There's always been a lot of talk about documentation in the field of developer experience, and people go back and forth about it. Engineers hate writing documentation, and its value is often debated: what kind of documentation do you want or not want? But here's the thing, and let's take this purely in the context of the agent: the agent cannot read your mind, and it did not attend your verbal meeting that had no transcript.
Now, there are many companies in the world that depend on exactly that sort of tribal knowledge to understand what the requirements for the system are, why the code is being written, and what specification we're writing towards. That sounds blatantly obvious, but a lot of things are already fundamentally written down: if the code is comprehensible, if all the other steps we've covered so far are in place, you don't need to re-explain what's in the code. So there's probably a whole class of documentation we may not need anymore, where you can just ask the agent, "Hey, tell me about the overall structure of this codebase," and it will do it. But it won't ever be able to know why you wrote it unless that's written down somewhere. The same goes for things that happen outside the program, like the shape of the data that comes in from a URL parameter. If you have already written the code, there's a validator, and that does explain it; but if you haven't written the code yet, the agent doesn't know what comes in from the outside world. So basically, anything that can't be in the code, or isn't in the code, needs to be written somewhere the agent can access.
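As a minimal sketch of that URL-parameter point, assuming TypeScript and made-up parameter names: a validator is written-down knowledge the agent can read, while anything about the outside world that is not in code still needs to live somewhere it can access.

```typescript
// Hypothetical sketch: the validator doubles as written-down knowledge about
// what arrives from the outside world via a URL query string.

interface SearchParams {
  query: string;    // free-text search term, required
  page: number;     // 1-based page index, defaults to 1
  pageSize: number; // 10..100, defaults to 25
}

// Anything the agent cannot infer from code (e.g. conventions enforced by an
// upstream gateway) still needs to live in a doc it can read.
function parseSearchParams(raw: URLSearchParams): SearchParams {
  const query = raw.get("query");
  if (!query) throw new Error("query is required");

  const page = Number(raw.get("page") ?? "1");
  if (!Number.isInteger(page) || page < 1) throw new Error("page must be a positive integer");

  const pageSize = Number(raw.get("pageSize") ?? "25");
  if (!Number.isInteger(pageSize) || pageSize < 10 || pageSize > 100) {
    throw new Error("pageSize must be between 10 and 100");
  }

  return { query, page, pageSize };
}
```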
Now, we've covered a few technical aspects of things we need to improve. But there's a point about software development in general that's always been true, and you've heard it: we spend more time reading code than writing it. The difference today is that writing code has become reading code. Even while we're writing code, we spend more time reading it than actually typing things into the terminal. What that means is that every software engineer becomes a code reviewer as basically their primary job. In addition, as anybody who has worked in a shop that has deeply adopted agentic coding knows, we generate far more PRs than ever before, which has led to code review itself, the big formal code review, becoming a bottleneck.
review itself, the like the big scale code review being a bottleneck.
code review being a bottleneck. So one of the things that we need to do
So one of the things that we need to do is we need to figure out how to improve
is we need to figure out how to improve code review velocity both for the big
code review velocity both for the big code reviews that we like where we you
code reviews that we like where we you send a PR and somebody like you know
send a PR and somebody like you know writes comments on it and you go back
writes comments on it and you go back and forth and also just the iterative
and forth and also just the iterative process of working with the agent. How
process of working with the agent. How do you speed up a person's ability to
do you speed up a person's ability to look at code and know what to do with
look at code and know what to do with it? So
it? So the principles are pretty similar for
the principles are pretty similar for both of those, but the exact way you
both of those, but the exact way you implement them is a little bit
implement them is a little bit different. What you care about the most
different. What you care about the most is making each individual response fast.
is making each individual response fast. You don't actually want to shorten the
You don't actually want to shorten the whole timeline of code review generally
whole timeline of code review generally because code review is a quality
because code review is a quality process. It's the same thing with agent
process. It's the same thing with agent iteration. Like what you want with agent
iteration. Like what you want with agent iteration is you want to get to the
iteration is you want to get to the place where you've got the right result.
place where you've got the right result. You don't want to like just be like,
You don't want to like just be like, "Well, I guess I've hit my five minute
"Well, I guess I've hit my five minute time limit, so I'm going to check in
time limit, so I'm going to check in this garbage that doesn't work, right?
this garbage that doesn't work, right? You you But what you do want is you want
You you But what you do want is you want the iterations to be fast." Not just the
the iterations to be fast." Not just the agents iterations, but the human
agents iterations, but the human response time to the agent to be fast.
And in order to do that, people have to get very good at doing code reviews, at knowing what the next step is when faced with a lot of code. At the big code review level, one thing I see that I think is a social disease infecting a lot of companies is that when people want PR reviews, they just send a Slack message to a team channel and say, "Hey, could one of the ten of you review my PR?" And you know what that means: one person does all those reviews. That's what really happens. When you look at the code review stats of teams like that, there's one person with fifty reviews and the others with three, two, five, seven, because one person is just super responsive. So if you start generating dramatically more PRs, that one person cannot handle the load. You have to distribute it, and really the only way to distribute it is to assign reviews to specific individuals, have a system that distributes them among those individuals, and then set SLOs that have some mechanism of enforcement.
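As a minimal sketch of that idea, with invented reviewer names and an assumed 24-hour SLO (none of this is from the talk): assign each PR to one named person in rotation, and flag anything past its SLO instead of letting it sit.

```typescript
// Hypothetical sketch of distributing review load instead of broadcasting to a channel.

const reviewers = ["alice", "bob", "carol", "dave"]; // the specific individuals
const REVIEW_SLO_HOURS = 24;                         // example SLO: first response within 24h

let nextIndex = 0;

// Deterministic round-robin: every new PR gets exactly one named reviewer.
function assignReviewer(prNumber: number): { pr: number; reviewer: string; respondBy: Date } {
  const reviewer = reviewers[nextIndex % reviewers.length];
  nextIndex += 1;
  const respondBy = new Date(Date.now() + REVIEW_SLO_HOURS * 60 * 60 * 1000);
  return { pr: prNumber, reviewer, respondBy };
}

// Enforcement hook: anything past its SLO gets escalated rather than silently ignored.
function overdue(assignments: ReturnType<typeof assignReviewer>[], now = new Date()) {
  return assignments.filter((a) => now > a.respondBy);
}
```

The point of the rotation is simply that responsibility lands on one named person per PR, so the load spreads instead of pooling on whoever is most responsive.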
Another thing: GitHub, for example, is not very good today at making it clear whose turn it is to take action. I left a bunch of comments on your PR. You responded to one of my comments. Should I come back again now? Oh wait, no, now you pushed a change. Should I come back now? No, now you've responded to more comments. What I rely on mostly is people telling me in Slack, "I'm ready for you to review my PR again," which is a terrible and inefficient system.
Another thing you have to think about a lot is the quality of code reviews. And I mean this, once again, both for the individual developers doing it with the agent and for the people doing it in the code review pipeline. You have to keep holding a high bar. I know people have other opinions about this, and yes, depending on how long you expect your software to live, you might not need as much software design. Look, the goal of software design is not perfection; it's good enough, and better than what you had before, right? But sometimes "good enough" for a very long-lived system is a much higher bar than people expect. And if you don't have a process that is capable of rejecting things that shouldn't go in, you will very likely see decreasing productivity gains from your agentic coders over time, as the system becomes harder and harder for both the agent and the human to work with.
The problem is this: in many companies, the people who are the best code reviewers spend none of their time doing code review. They spend all their time in meetings, doing high-level reviews, doing strategy. And so we aren't teaching junior engineers to be better software engineers and better code reviewers. We have to have some mechanism that lets the people who are best at this pass it on through apprenticeship. If somebody has a better way of doing that than doing code reviews with people, I would love to know, because in the twenty-plus years I've been doing this, I have never found a way to teach people to be good code reviewers other than doing good code reviews with them.
Now, if you don't do the things I've talked about, what is the danger? The danger is that you take a bad codebase with a confusing environment and you give it to an agent, or to a developer working with that agent. The agent produces some level of nonsense, and the developer experiences some level of frustration. Depending on how persistent they are, at some point they give up and just send their PR off for review: "I think it works." Right? And then, if you have low-quality code reviews, or code reviewers who are overwhelmed, they go, "I don't know what to do with this. I guess it's okay." And you just get lots and lots of bad rubber-stamp PRs going in, and you get into a vicious cycle. My prediction is that if you are in this cycle, your agent productivity will decrease consistently through the year. On the other hand, we live in an amazing time where, if we increase the ability of the agents to help us, they actually make us more productive, and we get into a virtuous cycle instead, where we accelerate more and more. And yes, some of these things sound like very expensive, fundamental investments, but I think now is the time to make them, because now is when you will see the biggest differentiation in your business in terms of software engineering velocity, if you can do these things while other industries or companies structurally can't.
So, to summarize, here are a few things. Not literally everything no-regrets you can do, but: you can standardize your development environments. You can make CLIs or APIs for anything that needs a CLI or API, and those CLIs or APIs have to run at development time. By the way, another big thing people miss is that sometimes they have things that only run in CI. If your CI takes 15 or 20 minutes: agents are way more persistent and patient than a human being, but they're also more error-prone, so they will run the thing, then run your tests, then run the thing, then run your tests, five times in a row. If that loop takes 20 minutes, your developers' productivity is going to be shot to heck. Whereas if it takes 30 seconds, they're going to have a much better experience.
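As a minimal sketch of the "runs at development time" point, assuming a Node/TypeScript project and placeholder npm commands: one entry point that a developer, an agent, and CI can all call, returning full output in seconds instead of only after a long CI run.

```typescript
// Hypothetical check runner: the same entry point a developer, an agent, and CI all call,
// so feedback arrives in seconds locally instead of only after a 15-20 minute CI run.
import { execSync } from "node:child_process";

const checks: Array<{ name: string; cmd: string }> = [
  { name: "lint", cmd: "npm run lint" }, // placeholder commands, not the speaker's setup
  { name: "unit tests", cmd: "npm test" },
];

for (const check of checks) {
  const started = Date.now();
  try {
    execSync(check.cmd, { stdio: "inherit" }); // show full output, not just pass/fail
  } catch {
    console.error(`FAILED: ${check.name}`);
    process.exit(1);
  }
  console.log(`ok: ${check.name} (${((Date.now() - started) / 1000).toFixed(1)}s)`);
}
```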
You can improve validation. You can refactor both for testability and for the ability to reason about the codebase. You can make sure all the external context and your intentions, the why, are written down. You can make every response during code review faster. And you can raise the bar on code review quality. But if you look at all of these things, there's one lesson, one principle, we can take away that covers even more than this: what's good for humans is good for AI. And the great thing about that, one second, the great thing about that is it means that when we invest in these things, we will help our developers no matter what. Even if we sometimes miss on helping the agent, we are guaranteed to help the humans. Thank you very much. [applause]
>> Ladies and gentlemen, please welcome back to the stage, Alex Lieberman.
[music] Let's give it up again for Max. [applause]
We have one more break now, and then the last block of sessions, where we'll have speakers talking about AI consultancies, paying engineers like salespeople, and how to make your company AI native. So be back here at 4 o'clock, or if you're watching the live stream, be back online at 4 o'clock, and we'll see you then. Thanks, everyone.
[music]
How we doing? We are officially 7 hours in. How's the energy level? 7 hours in. Let's hear it. There we go. There we go. So this is our last block of sessions before you all get to enjoy the Graphite afterparty; more coming on that in a few. And for this block, we're going to cover a lot: AI consulting in practice, paying engineers like salespeople, as I mentioned earlier, leadership in AI-assisted engineering, and how to build an AI-native company. You guys ready for this?
>> Oh, come on. Let's go. [applause] So with that, please join me in welcoming our next speaker, and one of last year's MCs, to talk about helping organizations transform with AI. Let's hear it for NLW.
[music] All right. Great to be back here, guys. For those of you who were here in February, I had the privilege of MCing, and today I'm excited to talk about something a little different. The last couple of months have been an interesting time in AI: there's been a surge in the narrative of an AI bubble, a lot of it driven by dubious studies like the MIT report. So what I wanted to do today is get into not so much the practice of consulting and transformation, but what organizations are actually finding value in right now.
For those of you who don't know me, there are two contexts I bring to this conversation. The first is as the host of The AI Daily Brief, a daily news and analysis podcast about AI. The second is as the CEO of Superintelligent, an AI planning platform. So the two perspectives are, on one hand, very high-level macro thinking about the news that's happening, and on the other, a much more ground-level view, where we spend a ton of time interviewing executives about what's going on inside their organizations. What we're going to talk about is, briefly in the first part, the status of enterprise adoption as it currently stands. And two, the more interesting part: we've been live with a study in the market for about a month now, collecting self-reported information about ROI across different use cases. This week was the first time I did some analysis on it, so I'm going to share what people have told us across the first 2,500 or so use cases they've shared. It should be pretty interesting stuff.
Talking about enterprise AI adoption first: I'll go through this pretty quickly because it's pretty well-known stuff. The short of it is that enterprises are adopting AI in a growing fashion. Pretty much everyone is using it at least a little, and increasingly they're using it a lot. I don't need to tell any of you that this year there was a major inflection specifically around adoption in coding and software engineering. You saw a huge uptick there. There's a lot that's interesting about that from an enterprise perspective, because it wasn't just the software engineering organizations; other parts of the organization are also now thinking about how they can communicate with code and build things with code. That's a huge theme of this year.
Coming into 2025, one of the big thoughts many people had was that this would be the year of agents inside the enterprise, right? That big chunks of work would get automated away. On the one hand, I think it's pretty clear we didn't see some sort of mass shift towards automation at large across different functions in the organization. But when you dig into the numbers, there have actually been pretty significant shifts in the patterns of agent adoption. This is from KPMG's quarterly pulse survey, and it measures how many enterprises in their survey, which covers companies over a billion dollars in revenue, have actual full production agents in deployment. This isn't pilots, this isn't experiments; this is where they consider some agent to be actually doing work in a full way. And it jumped from 11% in Q1 of this year to 42% in their most recent study, for Q3. So you actually are seeing pretty meaningful uptake of agents inside the enterprise. In fact, I would argue, based on the conversations we have, that it has moved through the pilot or experimental phase more quickly than people might have thought. So much so that you're now seeing a big shift in emphasis towards the human side of agents and how humans are going to interact with them, and it's involving a shift to upskilling and enablement work. You're seeing a decrease in resistance to agents as people actually start to dig in with them. You're seeing more experiments like sandboxes where people can interact with agents. So this is a big theme, even if it wasn't necessarily the dominant theme some thought it might be coming into this year.
thought it might be coming into this year. At the same time, it is absolutely
year. At the same time, it is absolutely the case that many many if not most
the case that many many if not most enterprises are broadly speaking stuck
enterprises are broadly speaking stuck inside sort of pilot and experimental
inside sort of pilot and experimental phases. There is a lot of challenge
phases. There is a lot of challenge around moving from some of those first
around moving from some of those first exciting experiments to something that's
exciting experiments to something that's more scaled. Um, so this is from
more scaled. Um, so this is from McKenzie state of AI study which came
McKenzie state of AI study which came out I think a couple weeks ago now and
out I think a couple weeks ago now and you can see only 7% of the organizations
you can see only 7% of the organizations that they talk to claim or sort of see
that they talk to claim or sort of see themselves as as fully at scale with
themselves as as fully at scale with with AI and agents. and it's something
with AI and agents. and it's something like 62% are either still experimenting
like 62% are either still experimenting or piloting.
or piloting. Interestingly, big organizations are on
Interestingly, big organizations are on in general a little bit ahead in terms
in general a little bit ahead in terms of uh the organizations that are scaling
of uh the organizations that are scaling as compared to small organizations. This
as compared to small organizations. This has been a a thing that we've noticed
has been a a thing that we've noticed kind of throughout the trajectory of uh
kind of throughout the trajectory of uh of AI um adoption over the last couple
of AI um adoption over the last couple of years that you would think that
of years that you would think that perhaps smaller, more nimble companies
perhaps smaller, more nimble companies uh would be more kind of quick to adopt
uh would be more kind of quick to adopt these things, but in fact, it's often
these things, but in fact, it's often been the opposite with the biggest
been the opposite with the biggest organizations making the biggest
organizations making the biggest efforts. You can also see from the chart
efforts. You can also see from the chart on the bottom that there's very sort of
on the bottom that there's very sort of jagged patterns of adoption, right?
jagged patterns of adoption, right? you're starting to see uh from you know
you're starting to see uh from you know last year if you looked there's very
last year if you looked there's very similar kind of rates of experimentation
similar kind of rates of experimentation across lots of different departments
across lots of different departments you're starting to see some pretty big
you're starting to see some pretty big breakouts now uh with for example you
breakouts now uh with for example you know IT operations kind of jumping out
know IT operations kind of jumping out ahead of other functions
ahead of other functions I won't spend too much time on this sort
I won't spend too much time on this sort of high performer piece but I think the
of high performer piece but I think the thing to note because it comes back in
thing to note because it comes back in and in and some of the stuff that we
and in and some of the stuff that we found with our ROI study is that you are
found with our ROI study is that you are also starting to see a pretty
also starting to see a pretty significant bifurcation between leaders
significant bifurcation between leaders and laggers when it comes to AI
and laggers when it comes to AI adoption. And one of the things that
adoption. And one of the things that tends to distinguish the companies that
tends to distinguish the companies that are leading is that they are just doing
are leading is that they are just doing more of it and they are thinking more
more of it and they are thinking more comprehensively and systematically about
comprehensively and systematically about AI and agent adoption. So they are not
AI and agent adoption. So they are not just sort of doing spot experiments.
just sort of doing spot experiments. They're thinking about their strategy as
They're thinking about their strategy as a whole. They're doing multiple things
a whole. They're doing multiple things at once. And importantly, they're not
at once. And importantly, they're not just thinking about sort of the very
just thinking about sort of the very kind of first tier time savings or
kind of first tier time savings or productivity types of use cases. They're
productivity types of use cases. They're also thinking about how do we grow
also thinking about how do we grow revenue? How do we create new
revenue? How do we create new capabilities? How do we create new
capabilities? How do we create new product lines?
Overall, it's very clear that, despite the concerns in the media, spend is going to do nothing but increase. The bottom chart is the KPMG pulse survey again, and it's an estimate of the amount of money these organizations intend to spend on AI over the next 12 months. At the beginning of the year it was $114 million, which, by the way, was up from around $88 million in Q4 of last year. In their latest study it's up to $130 million of expected spend in the year ahead, and obviously the total magnitude matters less than the change. The green charts are from Deloitte, and you can see that 90-plus percent of organizations intend to increase their AI spend in the next 12 months. As part of that, I think you're going to see a much more determined conversation around impact and ROI, which is a particularly thorny topic. But interestingly, there has been an increase in optimism over the course of this year around realizing ROI from AI. This is from a different KPMG study, their annual CEO survey, which interviews tons of CEOs. If you look at the 2024 numbers, 63% of those polled thought it would take between three and five years to realize ROI from their AI investments; 20% said one to three, and 16% said more than five. This year, in the same survey, the number who said one to three years had gone up to 67%, 19% now said six months to one year, and three to five years was down to just 12%. So a huge pull-forward of expectations for ROI realization.
The challenge is that ROI is really tough. Back to the pulse survey: 78% of those polled said they thought ROI was going to become a bigger consideration in the year to come. But 78% also said that traditional impact metrics and measures were having a very hard time keeping up with the new reality we're living in. And this is something I've heard constantly from CIOs and other people in charge of these investments: the ways we have measured the impact of previous technologies and previous initiatives are falling flat with AI. That got us thinking about the overall need to just have more information. I'm not even talking about good, systematic information, just more information about what ROI looks like, what impact looks like. And, you know, I've got this great podcast audience; they're super engaged. So we decided, screw it, we're just going to ask them to report on what ROI they're finding from their use cases. This went up at the very end of October. As of this morning, or when I last looked, we've had over a thousand submissions, a thousand individual organizations, rather, submitting something like 3,500 use cases, and these are some of our first observations around the first 2,500.
The way we divided things was into eight broad categories of impact, which will all, I think, be very intuitive to you: time savings, increased output, improvement in quality, new capabilities, improved decision-making, cost savings, increased revenue, and risk reduction. Basically, it was an attempt at a broad, simple heuristic for subdividing the different ways people think about ROI. And the TL;DR is that people are finding ROI right now. Now, again, the caveat is that this is a highly self-selected audience: they're listening to a daily AI podcast and they are voluntarily sharing this. So there's some caveating there, but you have 44.3% saying they're seeing modest ROI right now, and another 37.6% seeing high ROI. For the purposes of a lot of these stats, "high ROI" means significant plus transformational. Only 5% or so are seeing negative ROI. And keep in mind, negative ROI doesn't mean they think their programs are failing; it just means that, in their perception, they've spent more than they've gained so far. More than that, expectations are absolutely sky-high: 67% think that over the next year they will see high growth in their ROI. So we have a really optimistic sense, from the ground-level view, of where ROI in AI is going. Even among the teams currently experiencing negative ROI, 53% say they're going to see high growth. So very, very optimistic.
optimistic. Um as [snorts] you might imagine, time savings is the default.
imagine, time savings is the default. It's the starting point for so many
It's the starting point for so many organizations. It represents about 35%
organizations. It represents about 35% of the use cases. After that, increasing
of the use cases. After that, increasing output, quality improvement, basically
output, quality improvement, basically all those things that you would imagine
all those things that you would imagine around productivity are sort of like the
around productivity are sort of like the dominant categories when it comes to
dominant categories when it comes to these uh when it comes to these use
these uh when it comes to these use cases. When it comes to the specifics
cases. When it comes to the specifics around time savings, you see a real
around time savings, you see a real cluster between 1 and 10 hours,
cluster between 1 and 10 hours, especially right around 5 hours. And I
especially right around 5 hours. And I think this is interesting to call out
think this is interesting to call out because it's so obvious to all of us who
because it's so obvious to all of us who are inside building these things uh
are inside building these things uh whether you are a developer or an
whether you are a developer or an entrepreneur or just someone sort of in
entrepreneur or just someone sort of in and around it how the the vast breadth
and around it how the the vast breadth of opportunity that AI represents new
of opportunity that AI represents new capabilities things unimagined yet. It's
capabilities things unimagined yet. It's hard to or it's easy to forget that if
hard to or it's easy to forget that if you save 5 hours a week or 10 hours a
you save 5 hours a week or 10 hours a week you're talking about winning back 7
week you're talking about winning back 7 to 10 work weeks a year. Uh and that's
to 10 work weeks a year. Uh and that's very very powerful. And when it comes to
very very powerful. And when it comes to a lot of these enterprises, that is a
a lot of these enterprises, that is a very meaningful thing, even if it's not
very meaningful thing, even if it's not what they're ultimately in it for.
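As a rough back-of-the-envelope check on that claim, assuming roughly 48 working weeks a year and 40-hour work weeks (both assumptions are mine, not the speaker's):

```typescript
// Hypothetical sanity check: hours saved per week -> full work weeks recovered per year.
const workingWeeksPerYear = 48; // assumption
const hoursPerWorkWeek = 40;    // assumption

function weeksRecovered(hoursSavedPerWeek: number): number {
  return (hoursSavedPerWeek * workingWeeksPerYear) / hoursPerWorkWeek;
}

console.log(weeksRecovered(5));  // 6  -> on the order of the "7 to 10" quoted
console.log(weeksRecovered(10)); // 12
```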
Interestingly, though, it's very clear that the story, even if it has a concentration in time savings, is about much more than time savings. This is the ROI distribution by organization size, and it starts to get really interesting, because you can see there are differences in where different-sized organizations are focused. For example, organizations between 200 and 1,000 people have a higher portion of their use cases concentrated in increasing output. Now, we haven't taken the time yet to figure out exactly what this means, or even to speculate on it, but I think it's interesting that this is a category of organization that has often reached a certain scale but is still very much striving for more, and so seems more focused on use cases that expand its capabilities.
Same thing when you divide things by role: you see real variance, where, for example, C-suite executives and leaders are less focused on time savings use cases and more focused on other things like increased output and new capabilities. In general, we're finding that the C-suite and leaders are even more optimistic and excited, and seeing more transformational impact, than people in more junior positions. Now, some of this might be selection bias in terms of what types of use cases you are focused on: if you're in the C-suite, you're thinking about things that, if they work, are inherently more transformational. But it is notable that 17% of the use cases submitted by people in those leadership positions are described as already having transformational impact and ROI. I'm going to skip this one because we don't have time for too much. Interestingly, you're seeing a concentration where the smallest organizations are getting more of that transformational benefit early. One of the things I want to do after this study is maybe a second round where we dig into what this 1-to-50-person size really looks like. I actually think that, whereas there might be a lot of similarity between a 1,000- and a 2,000-person organization, there could be a wild difference between a three-person company and a 40-person company, and I'd really like to dig into that more. But you are definitely seeing a lot of impact in those smaller, more nimble organizations.
organizations. Uh as you might expect, coding and uh
Uh as you might expect, coding and uh and software related or uh use cases
and software related or uh use cases have a higher ROI than average and a
have a higher ROI than average and a lower negative ROI than average. Um one
lower negative ROI than average. Um one really interesting kind of you know
really interesting kind of you know pulling on a specific category of use
pulling on a specific category of use cases. Risk reduction is our lowest
cases. Risk reduction is our lowest category in terms of the percentage of
category in terms of the percentage of use cases that that that was their
use cases that that that was their primary benefit. So when you're filling
primary benefit. So when you're filling out the survey, which is by the way at
out the survey, which is by the way at ROI survey.ai AI if you want to check it
ROI survey.ai AI if you want to check it out. Uh you basically only get to pick a
out. Uh you basically only get to pick a primary ROI benefit. We didn't want it
primary ROI benefit. We didn't want it to be super sort of um we wanted you to
to be super sort of um we wanted you to pick and and hone in on the thing that
pick and and hone in on the thing that was uh seemed most important or most
was uh seemed most important or most significant. And so only 3.4% have risk
significant. And so only 3.4% have risk reduction as their primary benefit uh in
reduction as their primary benefit uh in terms of ROI categories. But it is by
terms of ROI categories. But it is by far those use cases are by far the most
far those use cases are by far the most likely to have transformational impact
likely to have transformational impact as as the as as their outcome. It's at
as as the as as their outcome. It's at 25%. So a full quarter of those uh have
25%. So a full quarter of those uh have transformational ROI. And interestingly,
transformational ROI. And interestingly, I was having this conversation with a
I was having this conversation with a couple of my friends who work in sort of
couple of my friends who work in sort of back office and compliance and risk
back office and compliance and risk functions, and this has been their
functions, and this has been their experience as well, where there are a
experience as well, where there are a lot of uh a lot of the the the
lot of uh a lot of the the the challenges for those organizations
challenges for those organizations involve sheer volume and quantity uh in
involve sheer volume and quantity uh in ways that that AI can be really helpful
ways that that AI can be really helpful for.
We are also finding some interesting patterns across industries. And again, this is where we get into some of the limits of this being whoever of my listeners walks through the door. We have a pretty heavy concentration in technology, as you might expect, and in professional services, but we still have fairly decent sample sizes for some others. And in both healthcare and manufacturing, the use cases are meaningfully higher impact on average than the average across all organizations, which I think is worthy of further study.
The last part of this, as I wrap up: a lot of these use cases, as you saw, have to do with that first tier that most enterprises are going to be in. Increasing the amount of content you output, increasing the quality of that content, finding ways to win back your 5 hours a week. But increasingly there are automation and agentic use cases, and we are absolutely seeing that where those are the focus, where use cases mention certain types of automation or mention agents, they wildly outperform in terms of self-reported ROI. That's true both for automation and for agents, and I think it points towards where we're headed with the next layer of more advanced use cases.
The last thing from this first look at the observations is that there are clearly benefits, and this goes back to what we saw with that McKinsey study as well, to thinking about AI and agentic transformation in systematic, cross-organizational, cross-disciplinary terms. Pretty much directly, the more use cases a person or an organization submitted, the better the ROI they tended to see. Now, there are lots of reasons for that, but I do think it speaks to the core idea that once you move beyond your single spot experiments, there's a lot of opportunity to grow the impact across the organization. So, like I said, that's the first look; it's roughly the first two-thirds of these use cases. We'll be open for another week, and then we'll have the full study out at the beginning of December. I'm really excited, heading into next year, to see how we move from generic conversations about impact, and our gut senses about impact, to a lot more experiments like this, to figure out where the impact really is and where we go next. So, look at that, I'm going to end 27 seconds early and really throw off the schedule, but I appreciate you all being here. And again, if you want to check this out, it's roisurvey.ai.
[music] As AI changes our business and engineering landscape, do we need to rethink how we incentivize and compensate engineers? Here to provide us with a case study for scaling output, not overhead, is the co-founder and managing partner at 10X, Arman Hezarki.
How's everybody feeling? It's been 7 and a half hours. Are we doing okay?
>> Awesome. I'm Arman. Like the voice of God, apparently. That's what they're called, the Voice of God, apparently. So my name's Arman. I'm one of the co-founders and managing partners at a company called 10X. My co-founder is Alex, who's been kindly announcing everybody all day. We do a lot of cool work. We help companies with their AI transformation. We have incredible clients all over the world. But I'm not going to talk about any of that today. I'm going to talk about something much more niche. I'm going to talk about how we pay engineers. And we pay engineers like salespeople. Earlier I was just in the green room with a bunch of distinguished engineers that I've grown to respect over my entire career. We were talking, and I was telling them that we pay engineers based on the story points that they complete. We had a lot of people roll their eyes and laugh. And they asked, "What do you mean?" And I said, "Clients pay us for the number of story points that we deliver, and we pay engineers based on the number of story points that they complete." And similar to the looks that I'm getting from some of you, there was skepticism.
And I know this sounds crazy, but it's working. We've been able to hire incredible engineers, many of whom have started and exited companies before this. We have been able to hire world-class machine learning and AI researchers. We've hired rocket scientists from NASA. We are shipping code incredibly quickly, and it's maintainable, high-quality code. Of course, that is everyone's dream. Everybody wants to hire great people. Everyone wants to deliver code really fast.
So, my goal here is not to convince you all to adopt our model. My goal is to show you what compensation looks like in AI and hopefully provide a new perspective on the fact that things might change as we introduce this technology. Before I jump in, though, I want to talk about how we got here.
So, I'm a software engineer by training. I went to Carnegie Mellon and then I taught there in their school of computer science. After that, I went to Google and helped them scale their AI, cloud, and mobile practices internationally before starting a few venture-backed startups. In my last startup, I would work out of a WeWork. I was sitting in this 33 Irving WeWork; if any of you are from New York, you might have worked out of that WeWork. They have these big tables, and there were 12 of us sitting around. No one's talking. Everyone has their headphones in. And I look to my left and I see somebody with Visual Studio Code open, right? I'm like, "Okay, I have a fellow engineer to my left." And I see that he was typing, but I didn't see a chat window. This person was typing into the code editor. They were typing like a caveman. This poor person was typing individual characters with their little chopstick fingers. I couldn't believe it. On my computer, I had 45 agents. Three were ordering me lunch. Two were writing code. One was doing research. Just different worlds were happening on my computer versus this person's computer. And I felt bad. I thought maybe we should do a GoFundMe or something. But I tried to look deeply at what is actually causing this difference. Why am I using AI in the way that I am? And why is this person not?
There are different ways that people try AI, and there are different reasons why people don't use it. We've all heard people who have tried it and have said it's not as good as me. We've all heard people who have not tried it because they don't want to. But regardless, my belief is that this is an incentive issue. For me, I was a founder and I wanted to squeeze out every bit of incremental value and efficiency that I could. And so I would sit on Twitter and LinkedIn and read blog posts and try to understand what is the cutting edge in software engineering and what's going to give me the ability to output more code, higher quality, faster. And because of that, I was using all these different agents. But this person probably worked at a startup, probably had a base salary with an annual bonus and some equity. And that was supposed to be the model that incentivized people to be innovative and to work smarter and faster and harder. But it wasn't working. And so, in order to understand how we got to where we are, I'm going to do a brief history of compensation. And this is by no means accurate. I'm making a lot of things up here. It's all illustrative.
Okay. So, back in the day, we had some cavemen who were writing code. We were probably inscribing C in a tablet somewhere, and we were paying people hourly, right? This makes sense. I look at somebody sitting in a chair and I'm going to pay them some amount of dollars for some amount of time. That makes sense for me and it makes sense for the engineer. But why is that broken? I actually want to hear from people. Why is hourly broken?
>> It's slow output.
>> No upside.
>> There's no upside. There's no reason to work faster, right? And in fact, there's a disincentive to work faster. So what if I notice this as the buyer of this technology and I say, "Okay, how long is it going to take you? It's going to take you five hours. Okay, so I'll pay you 500 bucks, right? Hourly rate of $100, multiplied by five." And then you as the engineer, if you work faster, great, you get to keep the $500. And if you work slower, that's on you. As engineers, we're really, really bad at estimating how long things are going to take. And so because of that, I'm not going to say it's going to take five hours. I'm going to say it's going to take 15 hours, 20 hours, so that I have no downside. And so again, as the buyer, I don't want to pay you based on the project.
So what if we hire people on salary and give them a bonus, right? Well, we in the startup community know what happens when this is the case. People punch in at nine, leave at five. And so I'm Larry Page. I notice this and I ask, why am I working so hard at Google? Why am I putting my blood, sweat, and tears into this? It's because I have some of the upside. I own the company, right? And so when we exit for many, many dollars, I'm going to see that. So what if I can share that with my employees? That's when equity comes in. And this has worked. This has worked for many, many years to incentivize employees. This is the foundation of the startup community that we all know and are a part of. It's incredible.
But not every company is Google. In fact, for every one Google, there are many, many failures. And software engineers know this, right? For those who want to take the risk, many will just go to YC or start their own company. And for the ones who don't want the risk, they're opting for cash over equity. Many of us who've hired engineers know that the cash is non-negotiable. Equity? Yeah, sure, I'll take some upside.
And so my contention is that this model needs to be reinvented in the age of AI. We need to directly incentivize people to use these tools, to use them well, and to still maintain really high quality standards for code. So here's how it works for us. Just to take a step back, we do two types of work at 10X. One is roadmapping and one is execution. Companies come to us and they say, "Hey, we want AI." That's generally the request. Sometimes it's more specific, like, hey, I want my customer service team to have 10% more output using AI. But generally they come to us with a request. We do a bunch of studying and learning, and then we output a roadmap, and based on that roadmap they can take it and work on it on their own, or we can do it. For a lot of things, we're taking off-the-shelf tools, but a lot of what we do is custom builds, and that's where the story point model comes in. So we will build a roadmap for a lot of our clients, but once they see that, then they're putting in requests on their own as well. And we have two roles in the company that are client facing. One is the strategist and the other is the AI engineer. The strategists are mostly technical, so we have former PMs, we have former engineers. They are doing PM-type work, consulting-type work. They're the ones taking the product requirements and distilling those down with the client. Then they hand that over to the engineer, and the engineer puts together an architecture design document. They spend a lot of time doing that. In fact, that is where most of our engineering time goes. Then they write code and they start implementing. That architecture design document includes tickets, and each ticket is graded on some number of story points. This is a very traditional method of doing work, right?
And when that ticket is accepted, the engineer gets paid a fee per story point that they complete. Our engineers have a flat base that they're paid, and then every quarter we round up based on the story points that they've completed. And again, this has led to us being able to hire incredible people, but we've also been able to do incredible work.
So, I'm going to walk through a couple of projects that we've done. This is one. This is a billboard company. If you go to Times Square right now, you'll see some billboards that they've sold the inventory for. They sell in two ways. One is you can call them up, traditional sales, and buy that inventory. But the other is they have an Uber-for-billboards type of product where you can go online, upload a PNG, and choose where you want this to run and for how long, similar to a Facebook or Google ad. It's very similar to that experience. And they came to us and they said, "Hey, we think that there are some opportunities for AI in our product." We did an analysis and we found a few. One of them is this. We found that when an image is uploaded to their system, it has to go through two rounds of moderation. One is internal to the company, and the other is with the billboard owner. Internal to their company, they're spending money to hire the people to do that, there's a lot of inaccuracy, and it takes a lot of time. So that costs them money, and it costs them revenue, because every moment that the billboard is not running, they're not making money. And so we asked, what if we could build an AI model that can actually do this moderation for them? We scoped that out. We built the architecture design doc. We broke it down into tickets, and we built this for them. We did it in two weeks, and we got to 96% accuracy when compared to the human moderator. We've done a lot of other projects with this company as well.
well. This is another company. This is they work with retailers all around the
they work with retailers all around the world and currently they have devices in
world and currently they have devices in these retailers and they're low power
these retailers and they're low power devices. And so because of this, they're
devices. And so because of this, they're able to run one AI model on device. And
able to run one AI model on device. And what this model does is does heat
what this model does is does heat mapping. So imagine there's a camera in
mapping. So imagine there's a camera in this room looks down and it can
this room looks down and it can basically generate a heat map of where
basically generate a heat map of where the traffic is throughout the day. And
the traffic is throughout the day. And for retailers, of course, this is very,
for retailers, of course, this is very, very useful. But there's other things
very useful. But there's other things you can do too, right? If we just sit
you can do too, right? If we just sit here for a few minutes, we can probably
here for a few minutes, we can probably come up with a lot of ideas of if you
come up with a lot of ideas of if you have a camera with a chip, you can make
have a camera with a chip, you can make a lot of money from that. You can show
a lot of money from that. You can show really useful information. And so that's
really useful information. And so that's what we did. We we came up with what are
what we did. We we came up with what are some of the things that we could do with
some of the things that we could do with this? If you put a little bit little bit
this? If you put a little bit little bit more power in that chip, if you make the
more power in that chip, if you make the models, if you quantize them so they can
models, if you quantize them so they can run in parallel, what could you do? And
run in parallel, what could you do? And so we gave them this report and then we
so we gave them this report and then we built them five models that can run in
built them five models that can run in parallel. It does everything from heat
parallel. It does everything from heat mapping to Q detection to theft
mapping to Q detection to theft detection and more. And again, we start
detection and more. And again, we start with the product requirement stock. We
with the product requirement stock. We break this down into architecture. Then
break this down into architecture. Then we build it and then we pay engineers
we build it and then we pay engineers based on the output.
This is the big question: what are the risks? Right? I just talked about dandelions and rainbows. So, I promised you that my goal is not to convince you to do this, and part of that is showing you what the potential risks are. These are a few that come up. One is, what if an engineer inflates the story points? What if an engineer says, "Okay, you want me to add a button? 45 story points." Right? What if an engineer rushes and quality drops? You're saying that it took two weeks to do that. Well, was it good? Did it work? And what if engineers get sharp-elbowed? I started this by saying that we compensate engineers like salespeople. That's not a culture that we necessarily want to emulate in software engineering, right? So, how do we make sure that that's not happening?
First of all, I mentioned that we have two different roles, and we compensate them as a counterbalance. Strategists are compensated based on NR, which is really a proxy for customer happiness, and every single ticket has to be approved internally with multiple rounds of QA, in which the strategist is involved, but also by the client. So there's a counterbalance to every single ticket that is delivered. I skipped to the second one; for the first one, inflating story points: the strategists are the ones who scope it, and again, we have to review all of that. And for the third, how do you make sure that all of this is correct? How do you make sure that there are no sharp elbows? How do you make sure that everybody is happy and the dandelions and rainbows continue throughout this parade of joy? Well, you have to hire the right people. And this is what I tell everybody.
We make hiring incredibly difficult for ourselves so that everything else is easy. And that is a principle that we all know and all stand true to. And this is incredibly important with AI. My co-founder, Alex, always says, "AI makes people look like one of those crazy mirrors: whatever your attributes are, it makes them 10 times larger." If you're a great engineer, AI makes you great. If you're not, it makes you sloppier. And this is the case with all of these things. You have to start with hiring.
Our belief is that AI gives people superpowers and makes all of us smarter, faster, and better at what we do. But my belief is that the current way we compensate people is actually holding them back. And I would invite you to think about how you can compensate people on your team differently, whether it's software engineering or anything else. If you want to unlock your employees' potential, feel free to reach out at arman@10x.co. Thank you. [applause]
Our next presenter [music] is deputy CTO at DX, the engineering intelligence platform designed by leading researchers, speaking about effective leadership in AI-enhanced organizations. Please join me in welcoming to the stage Justin Rio.
>> Hello. Thanks for joining me in one of the later-day sessions. Looks like we kept a lot of people here. This is a nice full room; it's great to see. We're going to go through a lot of content in a short amount of time, so I'm going to get right into it. If you want to get deeper into any of this stuff, we have published this AI strategy playbook for senior executives. A lot of the content that I'm going to go through, I'm not going to have time to get quite as deep on, but this is just a nice PDF copy that you can come and refer to later. If you missed this QR code, don't worry, I'll show it again at the end. So, what is the current impact of GenAI?
Nobody knows, right? We've got Google on the one hand telling us that everyone's 10% more productive. That's interesting. Now, they're Google; they were already pretty productive to begin with. But we have this now-infamous METR study, which has some flaws in the way it was put together, that actually showed a 19% decrease in productivity when using coding assistants. So there's a lot of volatility, a lot of variability. What was really interesting about this study, even though I mentioned there were some flaws, is that every engineer who took part in it felt more productive, but then the data actually bore out that they were less productive. Kind of interesting, right? We've got this induced flow that makes us feel really good about what we're doing. So, we need to address this.
DORA has put out some really good research on this too, but it's based on industry averages. This is impact based on what we see when we look at a large sample and an average of how certain factors are affected by, in this case, a 25% increase in AI adoption. We see these modest but positive-leaning indicators: a 7.5% increase in documentation quality and an increase in code quality of about 3.4%. At least that's not leaning in the other direction, right? And when we started digging through some of DX's data, and we have lots of aggregate data we can look at since we're the developer productivity measurement company, we found the same thing. When we looked at averages, we saw about a 2.6% increase in overall change confidence, which is the percentage of people who answered positively that they feel confident in the changes they're putting into production. We saw a similar positive-leaning average when we looked at code maintainability, another qualitative metric, and a 1% reduction in change failure rate, which, when you consider the industry benchmark of 4%, is not insignificant.
But this is not the full story, because this is what we saw when we broke the same studies down per company. Every bar here represents a company, right? We have some that are seeing 20% increases in change confidence while others are seeing 20% decreases. We're seeing extreme volatility, which is why these averages look so innocuous: they're belying the greater story of variability. We see the same thing with code maintainability, and the same thing with change failure rate. So this is a 2% increase in change failure rate up here at the top. Again, with an industry benchmark of 4%, that means shipping as much as 50% more defects than we were shipping before (a 2-point rise on a 4% baseline is a 50% relative increase). We want to make sure we're on the lower end of this. But how? What should we be doing?
Well, we found some patterns here. We see that some organizations are seeing positive impacts to KPIs, but others are struggling with adoption and even seeing some of these negative impacts. Top-down mandates are not working, right? Driving towards "we must have 100% adoption of AI": great, I will update my README file every morning and I will be compliant, right? We're not actually moving the needle anywhere when we do that. We also find that a lack of education and enablement has a big negative effect. Some organizations just turn on the tech and expect it to start working and everybody to know the best ways to use it. And there's the difficulty of measuring the impact, or even knowing what we should be measuring: what metrics should we be looking at? Does utilization really tell us much about the full story of GenAI impact?
impact. This is another graph from Dora. uh this is a basian uh posterior
uh this is a basian uh posterior distribution which is an interesting way
distribution which is an interesting way of representing data. Basically you want
of representing data. Basically you want your mass to be on the yellow side of
your mass to be on the yellow side of this line uh the the uh the right side
this line uh the the uh the right side of this line for the audience. Yeah. And
of this line for the audience. Yeah. And you want a sharp peak which is telling
you want a sharp peak which is telling you that we're pretty confident that
you that we're pretty confident that this initiative will have this impact.
this initiative will have this impact. And if we look at some of the topline
And if we look at some of the topline initiatives here, these are things like
initiatives here, these are things like clear AI policies. All right, we want to
clear AI policies. All right, we want to make sure we have that. We want time to
make sure we have that. We want time to learn. Not just giving people materials,
learn. Not just giving people materials, but actually giving them space to
but actually giving them space to experiment, right? Um, and so these
experiment, right? Um, and so these types of factors are the ones that seem
types of factors are the ones that seem to be moving the needle the most. So,
to be moving the needle the most. So, we're going to go over some quick tips
we're going to go over some quick tips on how we can do all of these things.
on how we can do all of these things. And again, the guide will go deeper into
And again, the guide will go deeper into this. We want to integrate across the
We want to integrate across the SDLC. For most organizations, writing code has never been the bottleneck, right? We can increase productivity a bit by helping with code completion, but our biggest bottlenecks are elsewhere within the SDLC. There's a lot more to creating software than just writing code. We want to unblock usage. We can't just say, well, we're worried about data exfiltration, so we can't try this thing. No, get creative about it. We've got really good infrastructure out there now, like Bedrock and Fireworks AI, that can let us run powerful models in safe spaces. We have to have open discussions about these metrics. We need to evangelize the wins, and we need to let our engineers know why we're gathering metrics and data. What is it that we're trying to improve? We have to reduce the fear of AI. We have to make sure that people understand that this is not a technology that is ready to replace engineers; this is a technology that's really good at augmenting engineers and increasing the throughput of our business. We have to establish better compliance and trust, and we need to tie this stuff to employee success. These are new skill sets. AI is not coming for your job, but somebody really good at AI might take your job. And so, as leaders, we have the opportunity to help our employees become more successful with this technology.
the fear? Well, first of all, why do we need to do this? Well, there's a lot of
need to do this? Well, there's a lot of good reasons, but I love to point to
good reasons, but I love to point to Google's project Aristotle. This was a
Google's project Aristotle. This was a 2012 study where Google wanted to figure
2012 study where Google wanted to figure out what are the characteristics of
out what are the characteristics of highly performant teams. uh they thought
highly performant teams. uh they thought that the recipe was just going to be
that the recipe was just going to be what Google had this combination of high
what Google had this combination of high performers, experienced managers and
performers, experienced managers and basically unlimited resources and they
basically unlimited resources and they were dead wrong. Overwhelmingly the
were dead wrong. Overwhelmingly the biggest indicator of productivity was
biggest indicator of productivity was psychological safety. Okay. And so that
psychological safety. Okay. And so that very much applies now. We also have data
very much applies now. We also have data like this is SWEBench. I'm sure a lot of
like this is SWEBench. I'm sure a lot of you have seen this and there are some
you have seen this and there are some impressive benchmarks that the agents
impressive benchmarks that the agents can do like a third of the things
can do like a third of the things they're asked to do without any human
they're asked to do without any human intervention. That means that they're
intervention. That means that they're not able to do twothirds of them. Right?
not able to do twothirds of them. Right? Again, we are augmenting. We're not
Again, we are augmenting. We're not replacing. We're not ready. We may never
replacing. We're not ready. We may never be ready. So, we need to be very
So, we need to be very transparent with what we're doing. We need to set very clear intents: why we're using this, to augment, not to replace. We need to be proactive in the way that we communicate that, and not just wait for people to get upset and possibly scared. We need to say, "No, we are here to help you, to give you a better developer experience, and to increase the throughput of the business." And again, we have to have these discussions about metrics. Now, what metrics should we be looking at? Well, DX, again, is a developer experience and productivity measurement company. There are two classes of metrics that we can be looking at, really two levers that matter here, and that's speed and quality. We want to increase PR throughput. We want to increase our velocity, but not by just creating a bunch of slop that's going to give us a bunch of tech debt later that we'll have to deal with; we just kick the bottleneck down the road if we do that. So we want to be looking at things like change failure rate, our overall perception of quality, change confidence, and maintainability.
And we have three types of metrics that we can be looking at here. We have our
we can be looking at here. We have our telemetry metrics. These are the things
telemetry metrics. These are the things coming out of the API. And they're good
coming out of the API. And they're good for some stuff, but they're not always
for some stuff, but they're not always accurate, right? We know like accept
accurate, right? We know like accept versus suggest was kind of like all the
versus suggest was kind of like all the rage until we realize that engineers
rage until we realize that engineers need to click accept in the IDE in order
need to click accept in the IDE in order for the API to know about it. even if
for the API to know about it. even if they do click accept, who's to say they
they do click accept, who's to say they didn't just go back and rewrite every
didn't just go back and rewrite every line that was suggested, right? So
line that was suggested, right? So that's providing us some context, but we
that's providing us some context, but we also need to do some experience
also need to do some experience sampling. We need to like for instance
sampling. We need to like for instance add a new field to a PR form that says I
add a new field to a PR form that says I used AI to generate this PR or I enjoyed
used AI to generate this PR or I enjoyed using AI to generate this PR and get
using AI to generate this PR and get some data that way. And then
some data that way. And then self-reported data or survey data. We
self-reported data or survey data. We are big on surveys, but let me
are big on surveys, but let me underscore we're big on effective
underscore we're big on effective surveys. 90% plus participation rates
surveys. 90% plus participation rates engineered against questions that treat
engineered against questions that treat developer experience as a systems
developer experience as a systems problem not a people problem because
problem not a people problem because that's what it is W. Edwards Deming 90
that's what it is W. Edwards Deming 90 to 95% of the productivity output of an
to 95% of the productivity output of an organization is determined by the system
organization is determined by the system and not the worker. Okay, so
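As one way to picture combining experience sampling with a quality metric, here is a minimal sketch in Python: it takes hypothetical PR records that carry the "I used AI to generate this PR" flag and compares failure rates between AI-assisted and other PRs. The field names and data are illustrative, not DX's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PullRequest:
    ai_assisted: bool      # value of the "I used AI to generate this PR" form field
    caused_incident: bool  # whether the change later failed in production

def failure_rate(prs: List[PullRequest], ai_assisted: bool) -> float:
    """Change failure rate within one group of pull requests."""
    group = [p for p in prs if p.ai_assisted == ai_assisted]
    if not group:
        return float("nan")
    return sum(p.caused_incident for p in group) / len(group)

# Hypothetical sample of pull requests.
prs = [
    PullRequest(True, False), PullRequest(True, True),
    PullRequest(False, False), PullRequest(False, False),
]
print("AI-assisted CFR:", failure_rate(prs, True))   # 0.5 on this toy data
print("Other CFR:", failure_rate(prs, False))        # 0.0 on this toy data
```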
Okay, so foundational developer experience and developer productivity metrics still matter the most. Our AI metrics, like utilization, are telling us what's happening with the tech, but these core metrics that we've been able to trust are telling us whether these initiatives are actually working. Are we actually moving the needle and having the outcomes that we want to see? So top companies are looking at different things. We are seeing adoption metrics coming out of Microsoft. They've also got this great metric called a bad developer day. I'm not going to go into it, but there's a really good white paper that shows all the different telemetry they can look at to determine what makes a bad developer day. Dropbox is looking at similar stuff: adoption, like weekly active users and daily active users, that sort of thing, but also quality metrics like change failure rate. And Booking is looking at similar stuff as well.
stuff as well. And so we built a framework around this. We were first to
framework around this. We were first to market with what we call our DXAI
market with what we call our DXAI measurement framework. And this is very
measurement framework. And this is very much inspired by things like Dora space
much inspired by things like Dora space framework, DevX just like our core four
framework, DevX just like our core four metric set which you can ask me about
metric set which you can ask me about later. Uh and we take these metrics and
later. Uh and we take these metrics and we uh normalize them into these three
we uh normalize them into these three dimensions of utilization, impact and
dimensions of utilization, impact and cost. And you can kind of think about
cost. And you can kind of think about this as a maturity curve too. A lot of
this as a maturity curve too. A lot of people start just figuring out okay
people start just figuring out okay what's happening? who's using the tech,
what's happening? who's using the tech, what's the percentage of pull requests
what's the percentage of pull requests that we're getting that are AI assisted
that we're getting that are AI assisted maybe through experience sampling? How
maybe through experience sampling? How many tasks are being assigned to agents?
many tasks are being assigned to agents? But then we can mature that perspective
But then we can mature that perspective a little bit and we can correlate that
a little bit and we can correlate that utilization to impact. What is this
utilization to impact. What is this actually doing to velocity? What is this
actually doing to velocity? What is this actually doing to quality? And this is
actually doing to quality? And this is when we start getting more mature in our
when we start getting more mature in our picture of our impact. And then finally,
picture of our impact. And then finally, cost. Although I like to joke that we're
cost. Although I like to joke that we're 15 years past the last hype cycle, which
15 years past the last hype cycle, which was cloud, and we still have new
was cloud, and we still have new companies spinning up that are teaching
companies spinning up that are teaching us how to understand and optimize our
us how to understand and optimize our cloud costs. So, we will see if we get
cloud costs. So, we will see if we get there. Although, I also hear horror
there. Although, I also hear horror stories about people burning through
stories about people burning through 2,000 tokens at $2,000 worth of tokens a
2,000 tokens at $2,000 worth of tokens a day. So, we probably do need to hit that
day. So, we probably do need to hit that as well. What about compliance and
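To make the three dimensions concrete, here is a minimal sketch of how a team might roll weekly telemetry up into utilization and impact numbers. The field names, data shapes, and figures are hypothetical illustrations, not part of the speaker's framework.

```python
# Hypothetical telemetry rollup illustrating the utilization -> impact maturity steps.
from dataclasses import dataclass

@dataclass
class WeekOfTelemetry:
    engineers: int            # engineers active this week
    ai_active_engineers: int  # engineers who used an AI tool at least once
    pull_requests: int        # total PRs merged
    ai_assisted_prs: int      # PRs flagged AI-assisted (e.g. via experience sampling)
    failed_changes: int       # changes that caused incidents or rollbacks

def utilization(week: WeekOfTelemetry) -> dict:
    """Step 1: who is using the tech, and how much of the work touches AI."""
    return {
        "weekly_active_ai_users_pct": 100 * week.ai_active_engineers / week.engineers,
        "ai_assisted_pr_pct": 100 * week.ai_assisted_prs / week.pull_requests,
    }

def impact(week: WeekOfTelemetry) -> dict:
    """Step 2: correlate that utilization with velocity and quality signals."""
    return {
        "prs_per_engineer": week.pull_requests / week.engineers,
        "change_failure_rate_pct": 100 * week.failed_changes / week.pull_requests,
    }

week = WeekOfTelemetry(engineers=120, ai_active_engineers=96,
                       pull_requests=540, ai_assisted_prs=310, failed_changes=21)
print(utilization(week))  # {'weekly_active_ai_users_pct': 80.0, 'ai_assisted_pr_pct': ~57.4}
print(impact(week))       # {'prs_per_engineer': 4.5, 'change_failure_rate_pct': ~3.9}
```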
What about compliance and trust? What can we do to ensure that the output being generated is something that can be trusted by our engineers? We have a lot of levers to pull here, but the one I'd like to talk about is setting up a feedback loop for our system prompts. These could be called system prompts, cursor rules, or agent markdown; pretty much all of the mainstream solutions have something like this, where you can provide a set of rules to control how these models behave. I won't get too far into the technical details here, but we have an example where the models had been producing outdated Spring Boot code: we want Spring Boot 3, and it kept sending us Spring Boot 2. The big takeaway is to have the feedback loop. Have a gatekeeper, right? Have somebody, or a group in the organization, that can receive this feedback and that understands how to maintain and continuously improve these system prompts. That way we're always maintaining the way these assistants or models or agents affect the whole business.
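As a concrete illustration of the kind of rule file being described, here is a hypothetical excerpt; the file name, wording, and the specific Spring Boot rule are examples of the idea, not the speaker's actual prompt.

```
# AGENTS.md / cursor rules (hypothetical excerpt)
- Target Spring Boot 3.x only. Do not generate Spring Boot 2 code;
  prefer jakarta.* imports over the legacy javax.* namespace.
- State framework and library versions explicitly in any generated PR description.
- If a rule produces bad output, report it to the prompt-gatekeeper group
  instead of editing this file directly, so improvements stay centralized.
```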
It also pays to understand how temperature works, especially when we're building agents. We do have some control over the determinism and nondeterminism of these models. When a model is predicting the next token, it doesn't just have one token; it has a matrix of candidate tokens, each associated with a certain probability of being the right one. So we have this setting called temperature, which is heat, which is entropy, which is randomness, and it controls the amount of randomness involved in actually picking that token. This is sometimes described as increasing the creativity of the model, and it's a number between zero and one. For the reasons I just mentioned, don't use zero and don't use one; weird things will happen. You want some decimal in between. When we have a lower temperature, like we're seeing here, 0.001, we give it the same task twice and it gives us the exact same output, character for character. When we set the temperature higher, in this example 0.9, I'm asking the agent to create a gradient for me, a simple task, and it gives me two relatively valid solutions. I did ask it for a JavaScript method, and only one of them actually gives me a JavaScript method, but the point is they are wildly different approaches to the same problem once I've increased the creativity of the model. So we need to think, use case by use case, about where we should have more creativity and where we should have more determinism, and temperature is another setting we have that helps control this. You can experiment with all of this using Docker Model Runner, Ollama, LM Studio, that sort of thing.
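As a minimal sketch of what that experiment can look like, here is a Python snippet against a local OpenAI-compatible server (Ollama and LM Studio both expose one). The base URL, model name, and prompt are placeholders for your own setup, not anything shown in the talk.

```python
# Compare low vs. high temperature against a local OpenAI-compatible endpoint.
# Base URL, model name, and prompt are placeholders; adjust for your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

def ask(prompt: str, temperature: float) -> str:
    resp = client.chat.completions.create(
        model="llama3.1",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

prompt = "Write a JavaScript function that returns a CSS linear-gradient string."

# Near-zero temperature: the two answers are (almost always) identical.
print(ask(prompt, 0.001) == ask(prompt, 0.001))

# Higher temperature: expect noticeably different approaches to the same task.
print(ask(prompt, 0.9))
print(ask(prompt, 0.9))
```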
How can we tie this to better employee success? We had to provide both education and adequate time to learn. So we put together a study where we sampled a bunch of developers who were saving at least an hour a day, excuse me, an hour a week, and we asked them to stack rank their top five most valuable use cases. And we built a guide around that: a guide that goes through code examples and prompting examples for the use cases that, based on that data, we should become reflexive about as best practices in our use of AI. That's what this guide was about, and it has become required reading in certain engineering groups, which I'm proud of. It's another way we can help educate, but we also need to give people time. We don't have time to go through all of it here, but I do think it's interesting that the number one use case was stack trace analysis. So not a generative use case, actually more of an interpretive use case. We see some other ones here that aren't too surprising, and there are examples of each of these. What about unblocking usage? How can we creatively ensure that engineers can take the most advantage of this? Well, leverage self-hosted and private models; that's getting easier and easier to do. Partner with compliance on day one, right? Make sure that what you're doing is in line with your organization's compliance. You may find that you're making a lot of assumptions about things you don't think you can do that you actually can do. And then think creatively around the various barriers.
Finally, how can we integrate across the SDLC? What should we think about doing there? I'm a big Eli Goldratt, theory of constraints fan; I probably have some others in the audience. An hour saved on something that isn't the bottleneck is worthless. And when we look at data across, in this case, almost 140,000 engineers, we find that there are definitely good annualized time savings with AI, but they're being eclipsed by sources of context switching and interruption, meeting-heavy days, these other things. Yes, we can save time here, but we're losing so much more time over there. So find the bottleneck and fix the bottleneck, right? Morgan Stanley has been very public about building a thing called DevGen.AI that looks at a bunch of legacy code, COBOL, mainframe Natural, and, I hate to admit, Perl, because I'm an old-school Perl developer, but apparently that's legacy now too, and basically creates specs that can just be handed to developers to start modernizing the code without having to do all that reverse engineering. They're saving about 300,000 hours annually doing this. There's a Wall Street Journal article about it and a Business Insider article about it; they're very public about that. Zapier should be the example for everyone. They have a whole series of bots and agents doing things like assisting with onboarding, and they can now make engineers effective in two weeks; the industry benchmark on the good side is about a month, and on the medium side about 90 days. And because they're able to increase the effectiveness of the engineers they're bringing into the organization, they realized they should be hiring more, as opposed to trying to maintain the status quo by cutting headcount and making individual engineers more productive. They said, "No, we can get more value out of a single engineer, so we should be hiring faster than ever." And they are, and it's really increasing their competitive edge. I think that's the right attitude. Spotify has been helping out their SREs by pulling together context when incidents are detected, taking things like runbook steps and other context and documentation and pushing them directly into SRE channels, so those critical minutes of figuring out what's actually happening and what we should do to resolve the incident are just eliminated. It has significantly reduced their MTTR. So let's get creative about the areas of the SDLC that are our actual bottlenecks.
All right, next steps. Distribute this guide as a reference for integrating AI into the development workflows that you have. Determine a method for measuring and evaluating GenAI impact; it's really important to make sure we're not on the bad sides of those graphs I showed you earlier. Then track and measure AI adoption, see how it correlates to your overall impact metrics, and iterate on best practices and use cases. And here's a guide. Again, thank you so much.
[applause]
>> Our closing presentation will teach us how to build an AI native company, even if that company is 50 years old. Please join me in welcoming to the stage the founder of Every, Dan Shipper.
[music]
>> Hello.
[applause]
How's it going, everybody? I'm the last speaker of the day, so I'm just between you and dinner or drinks. So I'm going to try to make this fun and hopefully a little bit short.
So, first of all, I just want to say I'm very glad to see everybody, and I'm actually kind of surprised to see so many people here, because I live here but I've been traveling. I was in Portugal last week, and I was on Twitter and someone said that everyone was moving to San Francisco. But it's great to have everybody here instead, because I love New York.
[laughter]
Come on. Come on.
[applause]
So I'm supposed to talk today about a playbook for how to build an AI native company. And I actually don't have one, unfortunately, because I think the playbook is being invented right now. We're doing it at the company that I run, Every, but all of you are doing it here today as well. So I don't want to give this talk from the perspective of having all the answers, the framework, the playbook, all that kind of stuff. But I do think that in this beginning stage of learning how to use AI to do engineering and to build companies, it is helpful to share the personal experiences we're having inside our companies and to collaboratively figure out the playbook together. So the best I can offer is really just dispatches from the future: notes on what I've figured out and the work we've done inside of Every. The first big thing I really noticed is that there is a 10x difference between an org where 90% of the engineers are using AI and an org where 100% of the engineers are using AI. It's totally different. The big thing is that if even 10% of your company is using a more traditional engineering method, you sort of have to lean all the way back into that world, and it prevents you from doing some of the things you might do if everyone was not typing into a code editor all the time. I know this because this is what we do at Every, the company I run, and it has totally transformed what we're able to do as a small company. So I think of us as a little bit of a lab for what's possible, and I'm excited to share that with you.
So, for people who don't know, I run Every. Inside of Every we have six business units and four software products, and we run those four software products with just 15 people, which is kind of crazy. These software products are not toys. At Every we've grown MRR by double digits every month for the last six months. We have over 7,000 paying subscribers and over 100,000 free subscribers. And we've done this in a very capital-light way; we've only raised about a million dollars in total. Very importantly for this audience and this discussion, 99% of our code is written by AI agents. No one is handwriting code. No one is writing code at all. It's all done with Claude Code, Codex, Droid, what have you, the coding agent of your choice. Also really important for the size of team we are: each one of our apps is built by a single developer, which is crazy. And these are not little apps. Here's an example. This is Cora, which is an AI email management app; it's an assistant for your email. On the left over here, it summarizes all of your emails that come in, so you can read your email that way. This is what my inbox looks like. On the right is an email assistant that you can ask questions; I asked "when's my AI Engineer talk today?" and it just gave me the answer. This is built primarily by one engineer. He's got one or two contractors who have helped in certain ways, but almost all of it is built by one guy. Same thing for this app, another one we make called Monologue, which is a speech-to-text app, sort of like Superwhisper or Wispr Flow if you know those. Again, one guy, thousands of users. I love it. It's just a beautifully done app, and it's not simple; it's complicated, there's a lot of stuff to it. Same thing for this app called Spiral. You can see it's big, and again, one engineer. Obviously, this would not have been possible a few years ago; it would not have been possible even a year ago. And I think the big change that happened, that we're all starting to catch up to, is that it started with Claude Code, the sort of terminal UI that gets rid of the code editor. It really pushed us into a place where we are delegating tasks to these agents, and that allows us to work in parallel and do much more than we would have ordinarily.
So, some of the things I've noticed we can do, which I assume people in this room are starting to see but which I think are important to put our finger on: the reason we can go much faster is that we can work on multiple features and bugs in parallel. There's a bit of a meme on Twitter of the vibe coder who has four panes open but isn't actually doing any work, and you can do it that way, but there are also definitely engineers, and I know they exist because they work at Every, who are productively using four panes of agents at the same time. That's crazy, and it contributes a lot to a single developer's ability to build and run a production application. Another really important thing, a really big unlock, is that because code is cheap, you can prototype risky ideas, which lets you run more experiments than you would ordinarily. That lets you make way more progress, because the starting energy to try something is so much lower: you just say, "Go do some research on this big refactor I might want to do," and then you go off and do something else. That's a really big deal. Another interesting thing I've noticed inside our organization is that we're moving a bit more toward a demo culture. Previously, if you wanted to make something, you'd have to write a memo or do a deck or convince a bunch of people that it was a good idea to spend time on. Because you can vibe code something in a couple of hours that shows the thing you want to make, you can just show everybody, and being a demo culture lets you do weirder things, the kind you only get if you can feel them, which I think is really amazing. And beyond the basic productivity unlocks, AI, and the way we use it, has caused us to invent an entirely new set of engineering primitives and processes, which I'm sure everybody in this room is starting to do already. I think everyone is approaching the same things from different angles, and a lot of them echo engineering processes from the past, but it's really helpful to put our finger on what the new way of programming is if we're moving up a level of the stack, from Python and JavaScript and scripting languages up into English. The name we've given to this process is compounding engineering. The way I talk about compounding engineering is: in traditional engineering, each feature makes the next feature harder to build; in compounding engineering, your goal is to make sure that each feature makes the next feature easier to build. And we do that in this loop.
The loop has four steps. The first is plan. If you've been here today and paying attention, you know how important it is when you're working with agents to make a really detailed plan, so I think everyone is doing that. The second step is delegate: just go tell the agent to do it. Everyone's doing that too. The third step is assess, and we have tons of ways to assess whether the work the agent did is any good: there are tests, there's trying it, there's having the agent figure it out, there's code review, there's agent code review, all of that. And then the last step, which I think is the most interesting one, is codify. This is the money step, where you compound everything you've learned from the planning, delegation, and assessment stages back into prompts that go into your CLAUDE.md file, your subagents, or your slash commands, and you start to build up a library. You take all the tacit knowledge that you and your engineers pick up as they find bugs, fix plans, and delegate work, and you turn it into an explicit collection of prompts that you can spread across your entire organization.
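As a rough sketch of what the codify step could look like mechanically, here is a hypothetical helper that folds a lesson learned back into a shared prompt file after a review; the file path, function name, and entry format are made up for illustration and are not Every's actual tooling.

```python
# Hypothetical "codify" helper: append a lesson learned to a shared prompt file
# (e.g. CLAUDE.md) so the next agent run benefits from it. Paths and format are examples.
from datetime import date
from pathlib import Path

PROMPT_FILE = Path("CLAUDE.md")

def codify_lesson(lesson: str, source: str) -> None:
    """Append a dated, attributed rule to the shared agent instructions."""
    entry = f"\n- ({date.today().isoformat()}, from {source}) {lesson}"
    with PROMPT_FILE.open("a", encoding="utf-8") as f:
        f.write(entry)

# Example: something learned while assessing an agent's PR becomes a standing rule.
codify_lesson(
    "Always run the full migration suite before opening a PR that touches the schema.",
    source="PR review of billing refactor",
)
```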
And when you do that really well, there are a lot of really interesting second order effects that I don't think are well understood or commonly talked about, and that I think are worth bringing here, because my guess is that some people are already seeing them but maybe they need to be pushed on a bit more to really come out. They might also be an interesting way to get more of your organization to buy into using these tools 100% of the time. The first thing you notice, if you set up this process and you're 100% bought in on something like compounding engineering, is that tacit code sharing becomes much easier. We have multiple products at Every, and products often need to implement similar things even if they use different technologies: a teams feature, a certain type of OAuth, whatever. Previously, in order to share code, you'd have to abstract what you did into a library and let someone else download it, which is hard, or you'd have to talk about it. With agents, you can just point your Claude Code instance at the repo of the developer sitting next to you, learn the process they went through to build the feature you need to reimplement, and reimplement it yourself in your own tech stack, your own framework, your own way. And that's really cool: the more developers you have working on different things inside the org, the more you can share without any extra cost, because AI can just go read all the code and use it. Another really cool thing I've noticed is that new hires are productive on their first day, because you've taken everything you've learned about how to set up an environment, what a good commit looks like, all that kind of stuff, and on day one they have it all set up in their CLAUDE.md files, their Cursor files, their Codex files, or whatever, and the agent just sets up their local environment and knows how to write a good PR. That's really cool. It also helps if you want to hire expert freelancers. There's one person who is just really good at one specific thing; you can have them come in for a day and do that thing. I think of it a little like a DJ who can drop in on a couple bars of a song. You can just sort of drop in, and that's really helpful. It would ordinarily be too hard to collaborate because the startup cost is too high, but you can do that a lot better now.
Another thing I've noticed, which is really cool too, is that developers inside of Every commit to other products. We have four products that run internally, and everybody uses all the products. If someone runs into a bug or a paper cut, a little minor quality-of-life thing they want, they will often just submit a pull request for it to the GM of that app, because it's very easy for them to download the repo and have Claude or Codex figure out how to fix the bug or the paper cut. That's really cool, because you have this much easier way of collaborating across apps, and over the next couple of years I imagine you will also be able to let customers do this to some extent. This is speculative, but if you run into a bug, you could have your little agent fix it and submit it as a pull request. It's a weird open-source-ish thing, but yeah, this is really cool and it's definitely happening a lot inside our company. Another really cool thing, and this may change as we scale, is that we have not yet had to standardize on a particular stack or language. Instead we let everyone who's building different products pick the thing they like best, and the reason is that AI makes it much easier to translate between them, and much easier to jump into any language, framework, and environment and be productive. So it's easier for us to let people just do the thing they like and let AI handle the translation in between. And the last thing, which is my favorite, but which is also the horror of some developers and to some degree maybe the horror of my team, is that managers can commit code, if you're technical, even the CEO. For me, I have no business committing code: we've got four products, we've got 15 people, we're growing really fast, and I'm doing tons of other things. But I can, and I have committed production code over the last couple of months. The reason is that AI lets engineers work with fractured attention. Previously you might have needed a three- or four-hour block of focus time to get anything done, but with Claude Code you can get out of a meeting, say, "Hey, I want you to investigate this bug," go do something else, come back to a plan or a root cause fix, and then submit a PR. It's not easy, and it's not magic, but it is actually possible, and I think that's just a totally new way of thinking about how managers interact with the products that they make.
So, just to summarize: I really think there's a 10x difference in how things work when you hit 100% AI adoption. From what we've seen, a single engineer should be able to build and maintain a complex production product. And what we call compounding engineering, which I think is what all of us are sort of pointing toward, really works to make each feature easier to build, and it creates all of these nonobvious second order effects that make it easier for the entire organization to collaborate. And very importantly, many people in San Francisco don't know this yet, so you're the first to hear it. That is my talk. If you're interested in what we do: I run Every, and Every is the only subscription you need to stay at the edge of AI. You can find us at every.to. We do ideas, apps, and training. On the ideas side, we have a daily newsletter about AI, and we review all the new models and products when they come out. The apps you already saw; we have a bundle of all of them. And then we do training and consulting with big companies to help them use AI. It's all bundled into one subscription, so you get everything for one price. That's it. Thank you very much.
[applause]
[music]
>> Ladies and gentlemen, please welcome back to the stage Alex Lieberman.
Okay, eight hours in. We did it. I have some housekeeping; we have to finish the day with housekeeping. First of all, I want to thank you all. It has been phenomenal to be on this journey with you. Let's give a shout out to all of you for being here and going through a full day of the programming: a round of applause for everyone in the crowd and everyone online who's been watching.
[applause]
Let's also keep it going for the production team behind the scenes making this possible; I watched them work tirelessly throughout the day to make this happen. And finally, let's give a huge shout out to Swix and Ben, who made this whole thing happen.
[applause]
So get comfortable for a second. I have some housekeeping to make sure everyone knows where to go, and then we have one final speaker who's going to chat right after I hop off stage. So let's dive in for a sec. Tomorrow is the engineering session day. I will not be your MC; you will be taken care of by Jed, who works at Google. I spent the day with Jed. He is incredible. He's just like a taller, better looking version of me, and he's actually an engineer, so you get a true engineer tomorrow. If you have a bundle pass, your ticket includes tomorrow's track, so we'll see you tomorrow at 8:00 a.m. here. If you have the leadership pass only, your ticket does not include access to the sessions or the venue tomorrow; however, we have organized an off-site brunch for you, on us, at a restaurant not far from here, so check your calendar for the invite and the location. But right now we are headed into the afterparty. And not only is there an afterparty, there are after-afterparties; there are a lot of side events, so your entire night is planned for you. We have Graphite to thank for sponsoring the afterparty. So here to give us the last word with a brief message is the co-founder and CEO of Graphite, Merrill Lutsky.
[applause]
[music]
>> Good evening, everyone. My name is Merrill Lutsky and I'm the co-founder and CEO of Graphite. We're the AI-powered code review platform for this new age of agentic software development. I know you heard a lot today about agents and how to make them as effective as possible at generating code and building features faster than ever, and they're incredible at this. But everybody who's built software in a professional environment knows that writing the code is only the first part of the story. Every code change then needs to be tested, reviewed, merged, and deployed, and oftentimes that second half of the process takes just as long, if not longer, than actually generating the code. That's what we do with Graphite: we're applying AI to the entire development process and making code review as quick as possible. We have an agent that's fully integrated into our pull request page; it's like reviewing code in 2025, and it doesn't feel like 2015 anymore. That's what we build, and we're super excited about it. If you want to come check it out, we have our booth in the expo hall, and we're also going to be around all day tomorrow. We're the official sponsors of tonight's afterparty and also tomorrow's event at Public Records. We wanted to show all of you who came from out of town a good time in New York, so we have two events for you, to make sure you have a good time and see what New York is all about. I want to give a big shout out to Swix and Ben and the whole AIE team for organizing, and we're excited to see you all at the party tonight. Thank you very much.
[applause]
>> ...taking place in the halls on both doors. Expo.
[music]