คำบรรยาย YouTube:
AIE CODE 2025: AI Leadership ft Anthropic, OpenAI, McKinsey, Bloomberg, Google Deepmind, and Tenex

ไม่ต้องนั่งดูวิดีโอทั้งหมด — รับคำบรรยายฉบับเต็ม ค้นหาคีย์เวิร์ด และคัดลอกด้วยคลิกเดียว

แชร์:

AutoDub

เข้าใจวิดีโอ YouTube ภาษาต่างประเทศ

พากย์ YouTube เป็นภาษาไทยแบบอิมเมอร์ซีฟ

ทลายกำแพงภาษา เปิดรับเนื้อหาคุณภาพจากทั่วโลก

ใช้ฟรี

คำบรรยายวิดีโอ

สรุปวิดีโอ

Summary

Core Theme

The AI Engineer Code Summit 2025 explored the transformative impact of AI on software development, focusing on how AI agents are reshaping coding practices, developer workflows, and organizational structures to drive efficiency, innovation, and new business models.

Mind Map

คลิกเพื่อขยาย

คลิกเพื่อสำรวจ Mind Map แบบอินเตอร์แอคทีฟฉบับเต็ม

[music]

Typing thoughts into [music] the darkest

part becomes design. Words evolve

[music] to whispers meant for something

more divine. Syntax bends and breeze. I

see the language change. I'm not

instructing anymore. I'm rearranging

faith. Every loop I write [singing]

rewrites me. Every function hums with

meaning. I feel the interface dissolve

new code. Not on the screen but in the

soul where [music] thought becomes the

motion and creation takes control. No

lines no rules just balance [music] in

between the zero and the one. The

>> [music]

>> systems shape our fragile skin. They

mold [singing] the way we move. We live

inside the logic gates [music] of what

we think is true. But deep beneath the

data post, [music] there's something undefined.

undefined.

A [singing] universe compiling the image

of our [music] minds. Every line reveals

reflection. Every loop replace [music]

connection. We're not building, we're

becoming. And the code becomes confession.

This is the [music] new code. Not on the

screen, but in the soul with thought

becomes the motion. [music] Creation

takes control. No lines, no rules, just

balance in between the [music] zero and

the one. The silence and the dream. [music]

[music]

Don't worry. [music] Uh, we're just

giving you something to do while Codex

[music] Each prompt, each breath, each

fragile spin, a universe [music] renewing.

This is the new code.

Alive and [music] undefined.

Where logic meets emotion and structure

bends to mind. [music] The system hs

eternal but the soul writes the line. We

are the new code.

I'm fired inside. [music]

[applause]

Ladies and gentlemen, please join me in

welcoming to the stage the co-founder [music]

[music]

of Morning Brew and the managing partner

of 10X, your host for the leadership

[music] track session day, Alex Lieberman.

Keep it going. Let's get a quick read of

the room. If you are coming from right

here in the Big Apple from New York,

make some noise.

Okay, now I have to say it. I assume

this is the biggest group. San Francisco.

Francisco.

>> Wow, that is surprising. Uh, Austin.

>> Okay, we got Austin. Who thinks they

came from the furthest place and is in

the room today?

>> Where? Where?

>> Ecuador. Can anyone beat Ecuador? [applause]

[applause]

>> New Zealand.

>> I don't think anyone's going to beat New

Zealand. There we go. Well, first of

all, uh, I am so excited to welcome you

all to the AI Engineer Code Summit 2025.

Uh, I'm Alex Lieberman, co-founder of

Morning Brew and your MC for the day.

Um, now you may be wondering, why is a

newsletter guy hosting an AI engineer

conference? It's a great question. Well,

after I left my role at Morning Brew, I

asked myself one simple question, and it

was, what space do I want to spend my

time in for the next 20 years where I

can build something consequential and

spend my time with some of the smartest

people I've ever met? And the answer

became obvious. I wanted to be as close

to the frontier of AI as humanly

possible. Which is why I co-founded

10x.co, Co. which is an AI

transformation firm helping mid-market

and enterprise companies learn how to

use AI within their business. And I

spend basically all of my time now with

AI engineers like yourselves. I'm the

only non-technical person in the

business and I wouldn't have it any

other way. So as you know this year has

been a banner year for the industry. And

I would think of today as both a look

back on where we've been as well as a

tactical view of where we are headed in

companies small and large, old and new.

We're going to hear from the labs. We'll

hear from Unicorn AI startups. We'll

hear from academics, big-time management

consultants, and Fortune50 brands. But

before we do that, we have to give the

brands that made this day possible their

flowers. So, let's go into it. Let's

give it up for Google DeepMind. today's

presenting sponsor. [applause]

Love it. Keep it going for Anthropic,

the platinum sponsor for the day. [applause]

[applause]

And then one more round of applause for

all of the gold and silver sponsors who

you can meet in the expo downstairs

throughout the day. One more. Let's keep

Are you guys ready to do the damn thing?

>> Let's do it. To kick things off, let's

give a huge welcome to head of

engineering of the Clawed developer

platform, Caitlyn Les. Let's welcome

Good morning. Um, so first let's give a

huge thank you to Swix and the whole AI

engineer organizing team for bringing us

I'm Caitlyn and I lead the claw

developer platform team at Anthropic.

Um, so let's start with a show of hands.

Who here is integrated against an LLM

API to build agents?

Okay, I'm talking to the right people.

Love it. Um, so today I want to share

how we're evolving our platform to help

you build really powerful agentic

systems using claude.

So we love working with developers who

do what we call raising the ceiling of

intelligence. They're always trying to

be on the frontier. They're always

trying to get the best out of our models

and build the most high performing

systems. Um, and so I want to walk you

through how we're building a platform

that helps you get the best out of

cloud. Um, and I'm going to do that

using a product that you hopefully have

all heard of before. Um it's an agenda

coding product. We love it a lot and

So when we think about maximizing

performance um from our models, we think

about building a platform that helps you

do three things. Um so first the

platform helps you harness Claude's

capabilities. We're training Claude to

get good at a lot of stuff and we need

to give you the tools in our API to use

the things that Claude is actually

getting good at. Next, we help you

manage Claude's context window. Keeping

the right context in the window at any

given time is really really critical to

getting the best outcomes from Claude.

And third, we're really excited about

this lately. We think you should just

give Claude a computer and let it do its

thing. So I'll talk about how we're

we're evolving the platform to give you

the infrastructure and otherwise that

So starting with harnessing Claude's

capabilities. Um, so we're getting

Claude really good at a bunch of stuff

and here are the ways that we expose

that to you um in our API as ideally

customizable features. So here's a first

example um relatively basic. Claude got

good at thinking um and Claude's

performance on various tasks um scales

with the amount of time you give it to

reason through those problems. Um, and

so, uh, we expose this to you as an API

feature that you can decide, do you want

Claude to think longer for something

more complex or do you want Claude to

just give you a quick answer. Um, we

also expose this with a budget. Um, so

you can tell Claude how many tokens to

essentially spend on thinking. Um, and

so for cloud code, um, pretty good

example. Obviously, you're often

debugging pretty complex systems with

cloud code or sometimes you just want a

quick, um, answer to the thing you're

trying to do. And so, um, Claude Code

takes advantage of this feature in our

API to decide whether or not to have

Claude think longer.

Another basic example is tool use.

Claude has gotten really good at

reliably calling tools. Um, so we expose

this in our API with both our own

built-in tools like our web search tool,

um, as well as the ability to create

your own custom tools. You just define a

name, a description, and an input

schema. Um, and Claude is pretty good at

reliably knowing when to actually go um,

and call those tools and pass the right

arguments. So, this is relevant for

cloud code. Cloud code has many, many,

many tools and it's calling them all the

time to do things like read files,

search for files, write to files, um,

and do stuff like rerun tests and otherwise.

otherwise.

So, the next way we're evolving the

platform to help you ma maximize

intelligence from claude um, is helping

you manage Claude's context window.

Getting the right context at the right

time in the window is one of the most

important things that you can do to

maximize performance.

But context management is really complex

to get right. Um especially for a coding

agent like claude code. You've got your

technical designs, you've got your

entire code base. Um you've got

instructions, you've got tool calls. All

these things might be in the window at

any given time. And so how do you make

sure the right set of those things are

in the window? Um, so getting that

context right and keeping it optimized

over time is something that we've

thought a lot about.

So let's start with MCP model context

protocol. We introduced this a year ago

and it's been really cool to see the

community swarm around adopting um MCP

as a standardized way for agents to

interact with external systems. Um, and

so for cloud code, you might imagine

GitHub or Sentry. there are plenty of

places kind of outside of the agent's

context where there might be additional

information or tools or otherwise that

you want your agent to be able to

interact with or the cloud code agent to

be able to interact with. Um, and so

this will obviously get you much better

performance than an agent that only sees

the things that are in its window as a

Uh, so the next thing is memory. So, if

you can use tools like MCP to get

context into your window, we introduced

a memory tool to help you actually keep

context outside of the window that

Claude knows how to pull back into the

window only when it actually needs it.

Um, and so we introduced the first

iteration of our memory tool as

essentially a clientside file system.

So, you control your data, but Claude is

good at knowing, oh, this is like a good

thing that I should store away for

later. And then, uh, it knows when to

pull that context back in. So for cloud

code, you could imagine um your patterns

for your codebase or maybe your

preferences for your git workflows.

These are all things that claude can

store away in memory and pull back in

And so the third thing is context

editing. If memory helps you keep stuff

outside the window and pull it back in

when it makes sense, context editing

helps you clear stuff out that's not

relevant right now and shouldn't be in

the window. Um, so our first iteration

of our context editing is just clearing

out old tool results. Um, and we did

this because tool results can actually

just be really large and take up a lot

of space in the window. And we found

that tool results from past calls are

not necessarily super relevant to help

claude get good responses later on in a

session. And so you can think about for

cloud code, cla code is calling hundreds

of tools. Um, those files that it read

otherwise, all these things are taking

up space within the window. Um so they

take advantage of um context management

And so um we found that if we combined

our memory tool with context editing, we

saw a 39% bump in performance over over

the benchmark on our own internal evals.

Um which was really really huge. And so

it just kind of shows you the importance

of keeping things in the window that are

only relevant at any given time. And

we're expanding on this by giving you

larger context windows. So for some of

our models, you can have a million token

context window. Combining that larger

window with the tools to actually edit

what's in your window maximizes your

performance. Um, and over time, we're

teaching Claude to get better and better

at actually understanding what's in its

context window. So maybe it has a lot of

room to run, maybe it's almost out of

space. Um, and Claude will respond

accordingly depending on how much time

uh or how much room it has left in the window.

So, here's the third thing. Um, we think

you should give Claude a computer and

just let it do its thing. We're really

excited about this one. Um, because

there's a lot of discourse right now

around agent harnesses. Um, you know,

how much scaffolding should you have?

How opinionated should it be? Should it

be heavy? Should it be light? Um, and I

think at the end of the day, Claude has

access to writing code. And if Claude

has access to running that same code, it

can accomplish anything. you can get

really great professional outputs for

the things that you're doing just by

giving Claude runway to go and do that.

But the challenge for letting you do

that is actually the infrastructure as

well as stuff like expertise like how do

you give claude access to things that um

when it's using a computer it will get

you better results.

So a fun story is we recently launched

cloud code on web and mobile. Um and

this was a fun project for our team

because we had a lot of problems to

solve. When you're running cloud code

locally, cloud code is essentially using

your machine as its computer. But if

you're starting a session on the web or

on mobile and then you're walking away,

what's happening? Like where is that

where is um cloud code running? Where is

it doing its work? Um and so we had some

hard problems to solve. We needed a

secure environment for claude to be able

to write and run code that's not

necessarily like approved code by you.

Um we needed to solve or container

orchestration at scale. Um and we needed

session persistence um because uh we

launched this and many of you were

excited about it and started many many

sessions and walked away and we had to

make sure that um all of these things

were ready to go when you came back and

um wanted to see the results of what

Claude did.

So one key primitive in this is our code

execution tool. Um so we released our

code execution tool in the API um which

allows claw to run write code and run

that code in a secure sandboxed

environment. Um, so our platform handles

containers, it handles security, and you

don't have to think about these things

because they're running on our servers.

Um, so you can imagine deciding that um,

you you want Claude to write some code

and you want Claude to go and be able to

run that code. And for cloud code,

there's plenty of examples here. Um,

like make an animation more sparkly that

uh, you want Claude to actually be able

to run that code. Um, so we really think

the future of agents is letting the

model work pretty autonomously within a

sandbox environment and we're giving you

the infrastructure to be able to do that.

that.

And this gets really powerful once you

think about giving the model actual

domain expertise in the things that

you're trying to do. So we recently

released agent skills which you can use

in combination with our code execution

tool. Skills are basically just folders

of scripts, instructions, and resources

that Claude has access to and can decide

to run within its sandbox environment.

Um, it decides to do that based on the

request that you gave it as well as the

description of a skill. Um, and Claude

is really good at knowing like this is

the right time to pull this skill into

context and go ahead and use it. And you

can combine skills with tools like MCP.

So MCP gives you access to tools and

access to context. Um, and then skills

give you the expertise to actually make

use of those tools and make use of that

context. Um, and so for cloud code, a

good example is web design. Maybe

whenever you launch a new product or a

new feature, um, you build landing

pages. And when you build those landing

pages, you want them to follow your

design system and you want them to

follow the patterns that you've set out.

Um, and so Claude will know, okay, I'm

being told to build a landing page. This

is a good time to pull in the web design

skill. um and use the right patterns and

and design system for that landing page.

Uh tomorrow Barry and Mahes from our

team are giving a talk on skills.

They'll go much deeper and I definitely

recommend checking that out.

So these are the ways that we're

evolving our platform um to help you

take advantage of everything that Claude

can do to get the absolute best

performance for the things that you're

building. First, harnessing Claude's

capabilities. So, as our research team

trains Claude, we give you the API

features to take advantage of those

things. Next, managing Claude's context.

It's really, really important to keep

your context window clean with the right

context at the right time. And third,

giving Claude a computer and just

So, we're going to keep evolving our

platform. Um, as Claude gets better and

has more capabilities and gets better at

the capabilities it already has, we'll

continue to evolve the API around that

so that you can stay on the frontier and

take advantage of the best that Claude

has to offer. Um, second, as uh, memory

and context evolve, we're going to up

the ante on the tools that we give you

in order to let Claude decide what to

pull in, what to store away for later,

and what to clean out of the context

window. And third, we're really going to

keep leaning into agent infrastructure.

Some of the biggest problems with the

idea of just let Claude have a computer

and do its thing are those problems that

I talked about around orchestration,

secure environments, and sandboxing. And

so we're going to keep working um to

make sure that those are um ready for

you to take advantage of.

Um and I'm hiring. We're hiring at

Anthropic. We're really growing our

team. Um, and so if you're someone who

loves um, building delightful developer

products um, and if you're excited about

what we're doing with Claude, we would

love to work with you across end product

design um, Devril, lots of functions. So

please reach out to us

Our next [music] presenter is the

president and head of AI at Replet. He's

here to speak about building the future

of coding. Please join me in welcoming

All right, good morning everyone. So at

Replet we're building a coding agent for

nontechnical users. It's a very peculiar

challenge I would say compared to many

people in this room. And what I'm going

to talk about today is why autonomy has

become kind of the northstar that we

keep chasing you know since we launched

the very first version of rapid agent

September last year.

Let's start from this very interesting

plot in case my clicker worked which now

does. Um I'm sure you all have seen it.

you know the semiacing value of that

published by Zwix a few weeks ago and it

kind of clarified a bit the landscape

you know for all of us uh agent builders

on one hand you have the low latency

interactions that really allow you to

stay in the loop you know so you can do

deep work and focus really on the on the

coding task at hand but you need to be

an expert you need to know exactly what

to prom the model for and you need to

understand quickly if you want to accept

the changes or not then for several

months many of us including rapid We

kind of live in this I think value that

where the agent wasn't autonomous enough

to really delegate a task and come back

and see it accomplished but at the same

time it run long enough not to keep in

the zone not to keep in the loop likely

over time we managed to go all the way

on the right and now we have agents that

runs for several hours in a row. What

I'm going to be arguing with today and

hope is not going to stop inviting me to

this event is the fact that there is an

additional dimension like a third

dimension to this plot that you know it

hasn't been covered here and namely the

fact is how do we build autonomous

agents for nontechnical users.

So what I'm going to be arguing today is

that there are two types of autonomy.

One of it is more supervised. So think

of the you know Tesla FSD example. When

you sit in a Tesla, you're still

expected to have a driving license.

You're going to be sitting in front of

the steering wheel. Perhaps 99% of the

time, you're not going to use it, but

you're there in order to take care of

the longtail events. And similarly, a

lot of the coding agents that we have

today require you to be technically

savvy in order to use them correctly.

We at Reply and uh other companies at

this point are focusing on kind of the

Whimo experience for autonomous coding

agents. So you're expected to sit in the

back. You don't even have access to the

steering wheel. And I expect you

basically not to need any driving

license. Uh why is this important?

Because we want to empower every

knowledge worker to create software. And

I can't expect knowledge workers to know

what kind of technical decisions an

agent should be making. We should

offload completely the level of

complexity away from them.

Of course, it took a while to get here.

So I'm I'm sure what I'm showing you

here is something that all of you are

very familiar with. It took several

years to go from I know maybe less than

a minute feedback loop constant

supervision and talking about

completions and talking about

assistance. These are areas where the AI

power is and really been pioneering this

this type of user interaction. Then we

slowly climbed through you know higher

levels of autonomy. So we had the first

version of the agents based on on react.

So we concocted autonomy with a very

simple paradigm on top of LMS. Then

likely AI providers understood that tool

calling was extremely important poured a

lot of effort on that. So we built the

next version of agents with native tool

calling. And then I would say there is a

third generation of agents which I call

autonomous and that's when we started to

break the barrier of say one hour of

autonomy. Basically the the agent being

capable of running on long horizon tasks

and remaining coherent. It happens to be

the case that those are also the

versions of rapid agent that we launched

over the last year. So the B3 is the one

that we launched a couple of months ago

and it has exactly showcases those

properties. So the question for today is

can we actually build fully autonomous

agents and how do we get there.

So I'm going to try to redefine the

definition of autonomy today. I think

that often times we conflate autonomy

with a concept of something in the lungs

for a for a lot of time and usually as a

user you lose control. In reality what

the autonomy that I want to give to

agents can be very specifically scoped

and what I mean by that is especially

with rapid agent tree what we accomplish

is we we make sure that our agent takes

holy technical decisions. Of course,

that could lead to very long gap between

the different user interactions and in

case the agent again runs for several

hours. But this happens if and only if

the scope of the task you're giving to

the agent is really broad. And it turns

out that in reality you can have an

agent that is really autonomous and is

still fast as long as you give it a very

narrow scope for the task, you know, at

hand. So what we can accomplish in this

way is that the user still maintains

control on the aspects that they care

about and a user cares about what

they're building. Especially again our

users, knowledge workers, they don't

care about how something has been built.

They just want to see their goals to be

accomplished. So autonomy should not be

basically conflated with long run times.

And similarly, it shouldn't become a

dity metric. You know, a lot of us are

talking about it as a as a badge of

honor. And it's definitely been exciting

to see in the last few months that you

know many of us broke the barrier of uh

running several hours in a row. But I

think in terms of how to build agents

that are going to be more powerful and

more suitable in the future, we kind of

have to change a bit uh the the target

the metric that we that we keep in mind.

So think about it in this way. Tasks

have a natural level of complexity and

basically what we care about is that

they have a minimum irreducible amount

of work that they express. What agents

do is that they always go through this

loop of planning, implementing and

testing. And of course to make this

happen and to make it work correctly,

you want this work to be happening over

a long quing trajectory. So our goal is

to maximize the reducible runtime of the

agent. By reducible, I mean having a

span of time where the user doesn't have

to make any technical decisions and the

agent can accomplish the task again in

full autonomy. This is especially

important for us because I can't trust

our users to make technical decisions.

So they they need a proper technical

collaborator by their side. I want to

abstract away as much complexity as

possible from the process of software

creation. And last but not least, I want

the users to feel in control of what

they're creating without startling their

creativity because they have also to

think about the technical decision that

the agent is making.

So now what are the pillars of autonomy?

How are we making this happen? I would

say there are three pillars that are

extremely important to think about. The

first one is of course the capabilities

of frontier models like the baseline IQ

that we inject in the main agentic loop.

I'm going to leave this as an exercise

to the reader and to other people in the

room. I'm really glad a lot of you are

building amazing models that you know we

use all the time at Rabbit. So this is

the pillar number one. The second pillar

is verification. It's very important

that we test for local correctness of

our agent at every step that it takes

and the reason is fairly intuitive. If

you are building on very shaky

foundations, eventually the castle will

topple down. So we brought verification

in the loop to make sure that in a sense

you are having you know nines or

reliability whereing the compounding

errors that an agent will make

unavoidably if you know you don't put

any control on it. And last but not

least, you heard it on stage even

earlier. I'm sure are going to be

hearing this you know the entire day or

the entire duration of the conference.

Uh the importance of context management.

So on one end you want to have an agent

that is capable of being globally

coherent. So it's align with the intent

of the user the expectation of the user

but at the same time it is also to be

capable of managing both the high level

goal and the single task that the agent

is working on. I think we made amazing

progress in the last months on context

management. But I'm also excited to see

you know where we're going as a field.

Let's start from the first pillar that

we work actively at rapid which is verification.

verification.

So why did we focus on this? Over the

know last year we realize something that

I think each one of you has experienced.

So without testing agents build a lot of

painted doors. In our case the painted

doors are very visible because we create

a lot of web applications. So you end up

basically trying to click on a button

and the handler is not looked up or some

of the data that we're showing is

actually mock data and it's not coming

it's not coming from a database. But in

general this phenomenon spans you know

across every type of component you're

building being it front end or back end

a lot of components are actually not

fully fleshed uh by the agent. So we run

some evaluations internally. We found

out that more than 30% of the individual

features happen to be broken know the

first time that are cooked by the agent.

And that also means that almost every

applications have at least one broken

feature or painted door. They're hard to

find. The reason is users are not going

to spend time testing every single

button, every single field. And this is

also probably one of the reasons why a

lot of our users, especially the

nontechnical ones, still can't trust

coding agents very much. They are

shocked when they find that there is a

painted door out there. So, how do we

solve this problem?

Fundamentally, we need an agent must

gather all the feedback that they need

from their environment, right? It's

easier said than done. Um again

nontechnical users not only cannot make

technical decisions but also they cannot

provide the technical feedback that you

know an agent is required to make

progress and most what they can do is

basic you know quality assurance

testing. They can literally go around

the UI click interact with the

application. I'm I'm sure you have tried

it in your life. This is extremely

tedious to do and it leads to a very bad

user experience. And even though we

relied on that with our first release of

the agent last year, quickly we found

out that users don't want to spend time

doing testing. So we had to find a

complete, you know, orthogonal solution

to that which is autonomous testing and

it solves several different issues. The

first one is it breaks the feedback

bottleneck. Even if again we ask

feedback to the user, we were not given

enough of that. Now we don't have to

wait anymore for human feedback. we have

a way to elicit as much information as

possible from the app autonomously. We

also want to prevent the accumulation of

small errors. What I was saying before,

we don't want to have compounding errors

while the agent is building. And last

but not least, we have to overcome the

laziness of frontier models. So we need

to verify that whenever a model tells us

that a task has been completed, there is

actually the truth and that result is

not being elucinated.

There is a wide spectrum of code

verification that you know you you can

accomplish. I think we all started from

the very left. You know you have basic

study code analysis with LSPs. We have

been executing the code since we had

basically lams that were capable of

debugging and then we slowly started to

move towards the right. So generating

unit tests and running them it has a

limitation. It's limited only to

functional correctness. Uh unit testing

is not very powerful to do like proper

integration testing by definition. We

started also to do now API testing but

it's only limited to API code. So you

can test endpoint of an applications you

can't really test how a web app

functions and looks like and for this

reason in the last few months has and

other companies are putting a lot of

effort in really creating autonomous

testing based on the browser you know in

case the app that we're building is a

web application. There are two main

categories here. One is computer use.

It's a onetoone mapping with user

interface. So the model is directly

interacting with the application. It

requires screenshots. It tends to be

fairly expensive and fairly slow. I'm

sure you you tested it yourself. A good

way in the middle is browser use where

we simulate the user interface. You can

then interact with the browser and with

the web application and it relies on

basically accessing the DOM through abstractions.

abstractions.

So how do we how do we make this work in

Weblet? Um what we do is that we

generate applications that are amenable

to testing and we sort of merge

everything together from the previous

slides that I showed you. So we allow

the our testing agent to interact with

an application and gather screenshots in

case nothing has worked. So we have a

full back to computer use. But the vast

majority of times what we do is that we

have programmatic interactions with the

applications. So we interact with the

database, we read the logs, we do API

calls, we literally click on the app and

get back all the information that we

need. And by putting all of this

together, we collect enough feedback

that allows our agent both to make

progress and also to fix all the painted

doors that it encounters.

Just a know short technical deep dive on

how we accomplish this. I'm sure you

have seen a lot of the toolbased uh

browser use. There are amazing libraries

out there. First one comes to man stage

and the idea is that you have an agent

that has a few very generic tools

exposed. So know the agent can create a

new tab, can click, can fill forms etc

etc. The limitation here is that it's

difficult to enumerate all the different

type of interactions you could be having

with a browser. The problem of testing

is very similar to the Tesla analogy I

was making before. Maybe this cardality

of tools available is enough for 99% of

the interaction types. But then there is

always a long tail of idiosyncratic

interactions that a user makes with the

with a web application that are hard to

map into these tool these different tool

codes. So what we do uh in our case at

rapid is we directly write playrite code

and playwrite code is first of all very

manable for LLMs. LLMs are kind of

amazing at writing playright. You know

this is the experience that we had uh

since we started to work on this project

is also very powerful and expressive. So

in a sense it's a super set of what you

can express uh on the compared to the

left on the tools uh testing. And last

but not least, there is beauty in

creating playright code because you can

reuse those tests. The moment you write

a test in script, then you can rerun it

as many times as you want. So in a

sense, the moment you created a test,

you're also creating a regression test

suite that you can keep running in the

future. And all these kind of uh tricks

that I explained to you right now, they

helped us to create something that is

roughly a order of magnitude cheaper and

faster compared to computer use. And

we'll go back later on how important

latency is.

The second thing that the second pillar

that I wanted to talk about today of

course is context management. And I'm

going to go very fast here because I

think you're going to be hearing a lot

of talks today about it. The the high

level message here is that long context

models are not needed to work on quer

and long trajectories. Uh from

experience we found that most of the

tasks even the more ambitious one can be

accomplished within the 200,000 tokens.

So we're still not in a world where

working with models that have 10 million

or 100 million uh context windows is

necessary to actually run autonomous

agents. And we accomplish this by means

of learning how to do context management

correctly. So first of all, there are

several different ways to maintain state

which don't imply chucking all the state

into your context window. You can do

that for example by using the codebase

itself to maintain state. So you can

write documentation while the agent is

creating new code. You can also include

the plan description and all the

different task list that the agent is

working on. You can persist them on the

file system. So even there like have a

lot of ways to offload your memories.

And last but not least and this is

something I think you know Antropic has

been uh really evangelizing about um you

can even dump directly your memories in

the file system and then making sure

that your agent decides when to write

them back the moment they become

relevant to your work. So for this

reason we have been seeing a lot of

announcements in the last couple of

months. I just picked this one from

entropic you know with cloth sonet 4.7

so I wish 4.5 uh they have been able to

run uh focus task for more than 30 hours

in a row we have seen similar results

from open AI on the math problems. So I

think we we kind of broke the barrier of

running for long and you know being able

to have querant tasks.

I would say the key ingredient to make

this happen has been how good models

hand as agent builders have become in

doing sub agent orchestration. Subages

basically work by means of they are

invoked in the core loop. So it's a

completely it's starting from a blank

slate uh from a completely fresh

context. You as an agent builder decide

what subset of the context to inject

when this sub agent starts. And it's a

concept that is very similar I think to

everyone who's been writing software you

know in the last decades is separation

of concerns. So you decide what your sub

engine is going to be working on. You

give it the least possible amount of

context. You allow it to run to

completion. You only get the output the

results. You inject them back into the

main loop and you keep running in this

way. Of course it significantly improves

the number of memories per compression.

I just brought this plot from directly

from reputation run in production. The

moment we kicked in our new subvision

orchestrator on the ax on the y-axis you

can see the number of memories per

compression. So we went from roughly 35

to 4550 recently. So big improvement in

terms of how often we are recompressing

our context just because we can offload

a lot of the context pollution by means

of using sub aents.

I'm going to give an example where this

made the difference for us. You know

what I'm showing you here is more kind

of a cost optimization in a sense like

you're compressing less. You also have

separation of concerns which definitely

make your agent a bit smarter. In the

case of testing

working with sub agent was almost

mandatory for us and basically we

started to work on automated testing

even before we were very advanced in

terms of subent orchestration. And what

we found out is of course again as I was

saying before it makes things easier

better cost less pollution but when you

allow the main loop not only to create

code but also to do browser opt browser

actions to put back the observation of

your browser actions into the main loop

you tend to confuse the the hent loop

very much because at this point there is

a lot of heterogeneity in terms of the

action that your main loop is looking

at. So in order to make this work not

only we have to build all the playright

framework that I was showing to you

before but we also have to move our

entire architecture into sub agents. So

at this point you can see very clearly

why there is a separation of concern

here. Get the main agent loop running.

We decide at a certain point that it's

time to verify if the output of the

agent has been correct. We make this

happen all within a sub agent. Then we

scratch the context window of that sub

agent. We just return back the last

observation to the agent loop and then

we keep running in that way. So if

you're having issues today making your

sub agents uh work correctly, this is

one of the reasons why that you want to

take a look at.

So I think we covered the high level of

how to create more and more powerful uh

autonomous agents over time and I only

see us as a field becoming even more

proficient than that in the next months.

There is one additional ingredient

though that is going to make the

difference and it's parallelism. And I

will argue that parallelism is important

not because it's going to make agents

more powerful per se, but rather because

it's going to make the user experience

more exciting. So of course it is great

to have an agent that is capable of

running autonomously for long, but at

the same time it comes with the price of

making the user experience less

thrilling. You are not in the zone

anymore. What you do is that you write a

very long prompt. It's translated into a

task list. Uh and then you go to have

lunch with your colleagues and then you

come back and you hope that the agent is

done. That is not the kind of experience

that most of the productive people want

to have in life. You know, you want to

see as much work as done as possible in

the shortest span of time.

So what we do as a as a field at this

point has been to create parallel

agents. It's a very common trade-off

which by the way doesn't only apply to

agents. it it applies to computing in

general and for parallel agents what you

do is that you you trade off basically

extra compute in exchange for time. Why

there is this trade-off? So first of all

when you're running agents in parallel

you're gathering the same context in

multiple context windows. So every

single parallel agent that you will be

running probably shares say 80% of the

context across the board. So of course

you are just putting more computed work

because you're running those agents in

parallel. There is also another cost

that is kind of intangible for a lot of

you here in the room because I'm sure

you're all expert software developers.

But what do you do with the output of

multiple par agents at the end? Often

times you need to resolve merge

conflicts. So as a reminder, my users

don't even know what's the concept of

merge conflicts. It's something that I

have to figure out on our own. So the

current way in which we think of

parallel agents in in the space doesn't

really apply to rapid. Now at the same

time I still want to very much to

accomplish this. There are so many

interesting features that you can enable

with parallelism. Aside from the fact

that you can get more work done u at

times you want to you want testing to be

running in parallel with the agent that

creates code. Testing no matter how much

we optimize it is still very slow. If an

agent is only spending time on testing

users are not going to be engaging with

your application anymore. Um, at the

same time, it's also great to have a

synchronous process running while your

agent is running because you can inject

useful information back into the main

core loop. And last but not least is a

very common technique that we know boost

performance if you have enough budget to

do so. You should be sampling multiple

trajectories at the same time. So a lot

of perks are coming with parallel

agents. But u the way in which we

implement them today which I call

basically call user has an orchestrator

is the fact that tasks the parallel task

that you want to run are determined by

you by the user and each task is

dispatched in its own thread. So there

is a bit of manual process even the task

de composition in a sense is happening

in your mind while you're thinking about

which agents you want to run and then

the moment you get back all the results

you need to go through the problem of

merge conflicts and often times this is

not trivial at all no matter how many

amazing tools are out there. So what

we're working on today for our next

version of the agent is having the core

loop as the orchestrator. So the key

difference here is the fact that the the

subtask that we're going to be working

on are not determined by the user but

they are determined by the corion loop

and the parallelism is basically decided

on the fly. The agent does the task de

composition on behalf of the user and

this comes with a couple of advantages.

First of all again there's no cognitive

burden to for the user to understand how

they should be decomposing the task. At

the same time also there are ways in

which you can create tasks that sort of

mitigate the problem of merge conflicts.

I'm not claiming that we're going to be

able to mitigate it 100%. There are so

many corner cases in which merge

conflict will still represent a problem

but there are a lot of different

techniques known in software engineering

to make sure that you can try to have

multiple subage and not stepping on each

other toes. So the core loop as an

orchestrator is going to be the our main

bet for the next few months.

And in case you're passionate about

these topics,

[music] I'm always hiring a rabbit.

Thank you. [applause]

From transforming support tickets into

merge requests to helping teams ship

fixes faster than ever, our next

presenter has been at the center of

Zapier's AI agent journey. Please

[music] [applause]

Hello.

I'm so excited to tell you about how at

Zapier we are empowering our support

team to ship code. Before I tell you

about that, has anybody here visited the

Grand Canyon?

It's a good amount. Anybody rafted

through the Grand Canyon?

I see one person. I just got off an

18-day trip rafting through the Grand

Canyon over 200 miles. It was

incredible. No internet, no cell

service. The moment I got off, I found

out I was giving this talk. I didn't

think about uh work at all on the river,

but once I got off, I started thinking

about the parallels between the Grand

Canyon and Zapier. And we have one thing

in common and that is erosion.

Now natural erosion happens over

millions of years with wind, water and

time. It creates the beautiful canyon

that we experience and it's never

stopping, always continuing. At Zapier,

we have over 8,000 integrations built on

third party APIs and they are constantly

changing, which I'm now thinking of as

app erosion.

We've been around for 14 years. Some of

our apps are that old. API changes and

deprecations impact us and create

reliability issues. Again, it never stops.

stops.

So, I like to think of our apps as like

layers in the Grand Canyon, and they

need constant attention.

So, if we were to create our own Zapier

Canyon and our apps would be at the

walls, here's our support team flowing

down the middle watching out for app

erosion. And we have a backlog crisis.

Tickets were coming in faster than we

could handle them.

Creates integration reliability issues,

poor customer experience, even churn. So

to solve for app erosion, we kicked off

two parallel experiments. The first was

moving support from just triaging to

also fixing these bugs. It's experiment

number one. Experiment number two, we

were asking can AI help solve app

erosion faster.

So let's jump into experiment one. This

get kicked off two years ago, but had to

start with the why. We needed to get

that buy in to empower our support team

to ship code.

So apparosion is one of the major

sources of bugs coming through to from

support to engineering. So there's a big

need support is eager [laughter] for

this experience to a lot of them want to

go into engineering eventually and

unofficially many support members were

already helping to maintain our apps.

This moves us into how we started this

out. Put on some guard rails. We started

with just four target apps to uh focus

our fixes on. Engineering was set to

review any merge requests coming from

support and we kept the focus on app fixes.

fixes.

So jumping into experiment 2, this is

what I've been leading for the last

couple of years. How can we use codegen

to help solve for app erosion? And so

fortuitously, the name of this project

is Scout, which ties in so well to the

Grand Canyon experience that I've just

been through.

As any good product manager, we start

with discovery. We did some dog fooding,

so I shipped some app fixes. Uh we

shadowed engineers and support team

members as they were going through the

app fix process. We designed out uh what

are the pain points experienced along

the way, what are the phases of the work

and how much time is spent.

One big discovery we had is how much

time is spent gathering the context

going to the third party AP API docs

even crawling the internet looking for

information about a bug that's emerging

maybe somebody else has already

discovered and solved for it outside of

Zapier. internal context, logs, all of

this is a lot of context to go and

search for as a human uh and a lot to

gro and work through. This is something

we knew we needed to solve for.

where we started with all this great uh

opportunities and pain points is we

started building APIs that we believed

would solve for these individual um pain

points and some of these APIs are using

LLMs to you know for our diagnosis tool

gathering all that context on behalf of

the uh support person engineer and

curating that context building a

diagnosis that's [clears throat] using

an LLM. And then some aren't like we

have a unit test uh or unit test

generator is, but the um test case

finder is simply using a search query to

look for the right test cases to pull in

for your unit test. We built a bunch of

APIs. We had a bunch of great ideas. So

there was a lot for us to test with, but

we ran into some challenges in this

first phase. We had APIs but they were

not embedded into our engineers process.

So our tool I just said they don't like

to go to so many web pages to find all

their context. They would love all this

information to come to them. And yet our

web interface where we've we've created

a playground we call autocode internally

where you can come and play around with

our APIs. And our ask to the teams was

come try out our APIs and give us feedback.

feedback.

Now this is just one more window to go

to. So we didn't get a lot of

engagement. Also because we had shipped

so many uh APIs our team was spread

pretty thin. Cursor launched at the same

time which has gotten great adoption at

Zapier. We're all huge fans of cursor.

But from our side, it made some of our

tools no longer necessary.

But there was one major win in this

phase, which is one of our APIs became a

support darling. It's diagnosis. That

number one pain point of needing to go

out and find all of your context, curate

it for yourself so you can start solving

the problem. We were doing that on uh

the support team's behalf with the

diagnosis API

and support loved it enough that they

decided to embed it into their process.

They asked us to build a zap year

integration on our autocode APIs so they

could embed it into their zap that

creates the jur ticket from the support

issue and now diagnosis is included.

So embedding tools is the key to usage

as we find out. So how can we embed more

of our tools? Well, then MCP spins up

and that solves our problem.

We can now embed these API tools into

our engineers workflow. Specifically,

our engineers are pulling in these MCP

tools as they're using Cursor.

Our builders using Scout MCP tools are

leaving the IDE less, spending more time

in one window.

Still coming into challenges. One of our

uh our our key tool diagnosis

uh is so valuable to pull all that

context and to provide a recommendation,

but it takes a long time to run. Now, we

might run down that runtime. However, as

you're working synchronously on a ticket

in your ID, this was frustrating. We

also weren't keeping up with the

customization needs. Not only did MCP

launch and we started leveraging it, Zap

Your MCP launched too. And some of our

tools, if we weren't keeping up with the

customization needs, our engineers

internally looked to Zap Your MCP, which

is great. We're all on the same team

solving the same problem, but some of

our tools had a dead end. Also adoption

was scattered. We had a whole suite of

tools and we thought there was value in

each of them as it solves for different

problems across the different stages.

Not every engineer was using our tools

and if they were using tools, they're

only using a few of them. So we have

tool usage. We're happy about that. But

we were under the hypothesis that true

value is going to come from tying these

tools together.

So what if we owned orchestration of

these tools rather than saying here's a

suite of tools you use them as you wish

what if we combined them and created an

agent to orchestrate this. So this we

are calling scout agent. We take that

diagnosis run that against a ticket uh

use that information to actually spin up

a codegen tool which will then produce a

merge request using all the right context.

context.

So who would benefit the most from

orchestration? There are several

integration teams at Zapier who are

solving for these app fixes of various

levels of complexity and there's the

support team. So when we're saying who

should be our first customer scout

agent, we're thinking it should probably

be the the team fielding small bugs that

are emergent and coming hot off the

queue which is the support team. And now

our two experiments merge

and we have scout agent. We are building

for the support team.

And this is the flow of how it works.

Support is submitting an issue to scout

agent. We first categorize the issue. We

next assess its fixability.

Not every issue that comes from support

can be fixed. If we thinks it's fixable,

we'll move on to generating a merge

request. At that point, the support

team, this is the first time they're

picking up the ticket. It already has a

merge request attached to it. They'll

review and test. If it's not satisfying

what they believe is the actual solution

or the the what what the solution should

be to best address the customer's need,

they will make a request for an

adjustment that can happen right in

GitLab, which is where we do our work

and Scout will do another pass and

hopefully at that point we've gotten it

right and support can submit that MR for

review from engineering.

How we are running Scout, it's all

kicked off by a Zap. This is a picture

of one of our zaps. There are many zaps

that's run this whole process and it

embeds right into our support team's

zaps. We do a ton of dog fooding at Zapier.

Zapier.

We first run diagnosis and post that

result to the Jira ticket saying what

the categorization is. If we believe

it's fixable and then if we do believe

it's fixable, we then are kicking off a

GitLab CI/CD pipeline.

And we run three phases in that

pipeline. plan, execute and validate to

generate this merge request. The tools

used in this pipeline is Scoutm. So all

those APIs we invested in a year ago now

are really coming together and we're

orchestrating it uh within the GitLab

pipeline and we're also leveraging

cursor SDK.

Once the merge request has been

completed, we attach it to Jira and

support picks it up.

The latest addition to this is doing a

rapid iteration once a um uh once a

ticket has been posted with the merge

request and support team is looking at

it and they say you know it needs some

tweaks to save them more time so they

don't have to go pull that down to their

ID do the fixes and push it back up.

they can simply chat with the uh scout

agent in gitlab that'll kick off another

uh pipeline which does that phase with

that new feedback and posts the new

merge request

on our side we want to make sure scout

agent is working so we ask three

questions the categorization right is

was it actually fixable uh and was the

code fix accurate so far we have two eval

eval

to 75% accuracy for categorization and

fixibility. As we get more feedback and

process more tickets, those become our

test cases and we can move forward

improving scout agent over time. So what

has been scout agents impact on app erosion?

erosion?

40% of supports support teams app fixes

are being generated by scout. So we're

doing more of the work on behalf of the

support team.

This is resulting and for some of our

support team it's doubling their

velocity from one to two tickets per

week which already is amazing. That's

going from a support team that wasn't

shipping any fixes, well unofficially

they were sometimes to now shipping one

to two per week per person to now

shipping three to four with the help of Scout.

Scout.

Another uh process improvement, Scout

puts potentially fixable tickets right

there in the triage flow. takes away a

lot of the friction of looking for

something to grab from the backlog.

It's not just the support who's

benefiting, it's also engineering.

Engineering manager said, uh, it's a

great example of when it works. This

tool allows us to stay focused on the

more complex stuff.

And if you take away anything from this

talk, I hope it is that there is a

really powerful magic between support

and empowering them with codegen and

allowing them to ship fixes because they

have three superpowers. The first they

are the closest to customer pain which

mean they're closest to the context that

really matters for figuring out what's

the problem and how to solve it. They're

also troubleshooting in real time. These

tickets aren't stale. the context is

fresh, the logs aren't missing. You put

this ticket into engineering backlog

months later, you might not get access

to those logs anymore. And then three,

they're best at validation.

You've again you put the same ticket

into an engineering backlog. The

solution an engineer might come up with

may change the behavior and that might

be good for some customers but might not

necessarily be best for that one

customer who wrote in about the problem.

And one other major benefit of this is

uh support team members who have been part of this experiment are now

part of this experiment are now engineers.

engineers. I want to say thank you to the amazing

I want to say thank you to the amazing team who's helped built this process or

team who's helped built this process or built all the tools and the scout agent.

built all the tools and the scout agent. Andy is actually here in the audience.

Andy is actually here in the audience. So shout out to Andy. If you want to

So shout out to Andy. If you want to talk about any of the technical bits,

talk about any of the technical bits, he's here. And I want to impress upon

he's here. And I want to impress upon you two things. or hiring, but mostly if

you two things. or hiring, but mostly if you haven't rafted through the Grand

you haven't rafted through the Grand Canyon, please consider it. It's

Canyon, please consider it. It's lifechanging and you should go with ORS.

lifechanging and you should go with ORS. Thank you very much.

Thank you very much. [applause]

Our next presenters believe that [music] 2026

2026 is the year the IDE died. Please join me

is the year the IDE died. Please join me in welcoming to the stage engineering

in welcoming to the stage engineering leader at Source Graph and AMP, Steve JG

leader at Source Graph and AMP, Steve JG and author and researcher at IT

and author and researcher at IT Revolution, Jean Kim.

Revolution, Jean Kim. [music]

Hey everybody. Um, really happy to be here. I'm going to be talking the first

here. I'm going to be talking the first half. Co-author here, Jean Kim, is going

half. Co-author here, Jean Kim, is going to talk second half. All right. Looking

to talk second half. All right. Looking forward to it. Cheers. All right. Today

forward to it. Cheers. All right. Today I'm going to Well, we're going to talk

I'm going to Well, we're going to talk real fast. This time is going to go down

real fast. This time is going to go down fast. Uh I'm going to talk to you about

fast. Uh I'm going to talk to you about what tools look like next year. Last

what tools look like next year. Last year I was talking to you all about chat

year I was talking to you all about chat and everybody ignored me and now

and everybody ignored me and now everybody's using chat this year and

everybody's using chat this year and it's like we're gonna we're going to fix

it's like we're gonna we're going to fix that right now. All right. So here's

that right now. All right. So here's what it's looked like. I'm going to tell

what it's looked like. I'm going to tell you right now, everyone's in love with

you right now, everyone's in love with Cloud Code. There's probably 40

Cloud Code. There's probably 40 competitors out there. Cloud Code ain't

competitors out there. Cloud Code ain't it.

it. completions wasn't it. I love cloud

completions wasn't it. I love cloud code. I use it 14 hours a day. I mean,

code. I use it 14 hours a day. I mean, come on. But it ain't it. Developers

come on. But it ain't it. Developers aren't adopting it. I'm going to talk

aren't adopting it. I'm going to talk about why in this talk. I'm going to

about why in this talk. I'm going to talk about what you can do about it and

talk about what you can do about it and what to look forward to. But the reason

what to look forward to. But the reason is they're too hard. Okay. Uh cognitive

is they're too hard. Okay. Uh cognitive overhead. Uh they lie, cheat, and steal.

overhead. Uh they lie, cheat, and steal. Gene and I talk a lot about this in our

Gene and I talk a lot about this in our book, all the different ways that they

book, all the different ways that they can lie, cheat, and steal. And uh most

can lie, cheat, and steal. And uh most devs just don't like this.

devs just don't like this. I have come to understand that claude

I have come to understand that claude code is very much like a drill or a saw,

code is very much like a drill or a saw, an electric one, right? How much damage

an electric one, right? How much damage can you do as an untrained person with a

can you do as an untrained person with a drill, right? Or a saw. Yeah. How much

drill, right? Or a saw. Yeah. How much damage can you do as an untrained

damage can you do as an untrained engineer with clawed code? It's real

engineer with clawed code? It's real similar. Yeah. You can cut your foot

similar. Yeah. You can cut your foot off,

off, but you can also be really, really

but you can also be really, really skilled with it and do really precision

skilled with it and do really precision work, right? like a craftsman. The

work, right? like a craftsman. The problem is software is infinitely large.

problem is software is infinitely large. Our ambition is infinitely large. And so

Our ambition is infinitely large. And so the analogy that I want to share with

the analogy that I want to share with you is next year will be the year from

you is next year will be the year from moving from saws and drills to CNC

moving from saws and drills to CNC machines. A CNC machine, you strap a

machines. A CNC machine, you strap a drill on and you give it coordinates and

drill on and you give it coordinates and it moves it around and very precise,

it moves it around and very precise, right? We've been doing this for

right? We've been doing this for centuries and we're not going to stop

centuries and we're not going to stop this year.

One thing I hear people say is, "Well, the models are plateaued." This is real

the models are plateaued." This is real common. Your engineers are probably

common. Your engineers are probably saying this, okay, even if they

saying this, okay, even if they plateaued, we have still discovered

plateaued, we have still discovered steam and electricity and it's going to

steam and electricity and it's going to take us a little time to harness it. But

take us a little time to harness it. But it's strictly an engineering problem at

it's strictly an engineering problem at this point. All code within a year, year

this point. All code within a year, year and a half will be written by giant

and a half will be written by giant grinding machines overseen by engineers

grinding machines overseen by engineers who no longer actually look at the code

who no longer actually look at the code directly anymore.

directly anymore. Weird new world. That is where we are

Weird new world. That is where we are going. Oh my gosh. Yeah. This this

going. Oh my gosh. Yeah. This this slide. So Gene and I talked to Andrew

slide. So Gene and I talked to Andrew Glover who I don't know is he here from

Glover who I don't know is he here from OpenAI and he said that they have this

OpenAI and he said that they have this incredible dichotomy unfolding at OpenAI

incredible dichotomy unfolding at OpenAI where you know some percentage of their

where you know some percentage of their engineers are using codecs and then some

engineers are using codecs and then some other percentage a larger percentage are

other percentage a larger percentage are not using codecs and the difference in

not using codecs and the difference in productivity is so staggering that

productivity is so staggering that they're having now alarms going off at

they're having now alarms going off at performance review time because how do

performance review time because how do you compare these these two engineers

you compare these these two engineers who are the same level, same title, same

who are the same level, same title, same everything and one of them is 10 times

everything and one of them is 10 times as productive as the other one by any

as productive as the other one by any measure.

measure. And the answer is they're freaking out.

And the answer is they're freaking out. They may have to fire 50% of their

They may have to fire 50% of their engineers. And this is unfolding at

engineers. And this is unfolding at other companies, too.

other companies, too. Who is refusing it? It's the senior and

Who is refusing it? It's the senior and staff engineers. How many minutes are we

staff engineers. How many minutes are we at?

at? >> Eight [clears throat] minutes.

>> Eight [clears throat] minutes. >> We're perfect. This is just like what

>> We're perfect. This is just like what happened to the Swiss mechanical watch

happened to the Swiss mechanical watch industry over a couple of Well, it was

industry over a couple of Well, it was built up for a couple of centuries and

built up for a couple of centuries and then courts killed it, you know, within

then courts killed it, you know, within a couple of years. And what happened was

a couple of years. And what happened was the craftsmen were doing the same thing

the craftsmen were doing the same thing our staff engineers are doing today. No

our staff engineers are doing today. No cheap.

cheap. That's word for word, right? That's what

That's word for word, right? That's what they say.

they say. All right. I didn't know where to put

All right. I didn't know where to put this slide. This is this is Claude's

this slide. This is this is Claude's view of what next year looks like. And I

view of what next year looks like. And I I was just like, what do you think it's

I was just like, what do you think it's going to look like? And it actually does

going to look like? And it actually does kind of look like this. Most of the

kind of look like this. Most of the words will be spelled correctly in in

words will be spelled correctly in in next year. But this is a lot prettier

next year. But this is a lot prettier than cloud code.

than cloud code. Yeah, this is what it has to look like.

Yeah, this is what it has to look like. Some form of a UI, not an IDE. This is

Some form of a UI, not an IDE. This is the new IDE. Okay. And people are

the new IDE. Okay. And people are building it. In fact, I think the

building it. In fact, I think the company that's the furthest along in

company that's the furthest along in this is Replet, who just talked to you.

this is Replet, who just talked to you. I think it's amazing what they're doing.

I think it's amazing what they're doing. It's absolutely bravo, right? We should

It's absolutely bravo, right? We should not be all chasing tail lights and

not be all chasing tail lights and building command line interfaces

building command line interfaces anymore. All right. and and more

anymore. All right. and and more importantly, Claude Code and all of its,

importantly, Claude Code and all of its, you know, competitors, they're all doing

you know, competitors, they're all doing it wrong because they're building the

it wrong because they're building the world's biggest ant. Okay, this is my my

world's biggest ant. Okay, this is my my buddy Brendan Hopper at Commonwealth

buddy Brendan Hopper at Commonwealth Bank of Australia, right? He's like,

Bank of Australia, right? He's like, "Nature builds ant swarms and Claude

"Nature builds ant swarms and Claude Code built this huge muscular ant that's

Code built this huge muscular ant that's just going to bite you in half and take

just going to bite you in half and take all your resources, right? I mean, it's

all your resources, right? I mean, it's a serious problem, right? If I say

a serious problem, right? If I say please analyze this codebase, I, you

please analyze this codebase, I, you know, go to the expensive model." If I

know, go to the expensive model." If I say, "Is my git ignore file still

say, "Is my git ignore file still there?" I've also gone to the expensive

there?" I've also gone to the expensive model, right? Everything that you say

model, right? Everything that you say goes to the expensive model. So, what's

goes to the expensive model. So, what's going to happen? Whoa. What happened? Oh

going to happen? Whoa. What happened? Oh gosh,

gosh, my slides are all messed up now.

my slides are all messed up now. Can you guys see them?

Can you guys see them? >> No.

>> No. >> Oh, this always happens to me, man.

>> Oh, this always happens to me, man. There's something going on. All right.

There's something going on. All right. So, I thought of a really cool analogy

So, I thought of a really cool analogy called the diver the diver metaphor,

called the diver the diver metaphor, which is your context window is like an

which is your context window is like an oxygen tank. Okay. This is why these

oxygen tank. Okay. This is why these things are fundamentally wrong because

things are fundamentally wrong because you're sending a diver down into your

you're sending a diver down into your codebase underwater to swim around and

codebase underwater to swim around and take care of stuff for you. One diver

take care of stuff for you. One diver and we're like, we're going to give him

and we're like, we're going to give him a bigger tank. 1 million tokens. He's

a bigger tank. 1 million tokens. He's still going to run out of oxygen. Like

still going to run out of oxygen. Like you don't, right? You should send a

you don't, right? You should send a product manager diver down first

product manager diver down first and then a coding diver, right? And then

and then a coding diver, right? And then a review diver and a test diver and a

a review diver and a test diver and a get merge diver, etc. Right? Nobody's

get merge diver, etc. Right? Nobody's doing this. Everyone's building a bigger

doing this. Everyone's building a bigger diver. I don't know my slides are all

diver. I don't know my slides are all messed up. My my my talk is almost done.

messed up. My my my talk is almost done. But um what we do is as engineers, task

But um what we do is as engineers, task decomposition,

decomposition, successive refinement, components, black

successive refinement, components, black boxes. This is how it's going to be

boxes. This is how it's going to be built in the future. And it's going to

built in the future. And it's going to be built with lots and lots of agents,

be built with lots and lots of agents, not just one agent.

not just one agent. All right. Until then, I think we're out

All right. Until then, I think we're out of time, but so until then, learn cloud

of time, but so until then, learn cloud code. Give up your IDE. Swix told me he

code. Give up your IDE. Swix told me he wants some hot take, so I'll give you

wants some hot take, so I'll give you one. If you're using an IDE starting on,

one. If you're using an IDE starting on, I'll give you till January 1st.

I'll give you till January 1st. You're a bad engineer.

You're a bad engineer. There's your hot take. All right, folks.

There's your hot take. All right, folks. [applause]

All right, cheers. Well, that that was actually my talk. Um [clears throat]

actually my talk. Um [clears throat] uh uh learn coding agents and oh yeah,

uh uh learn coding agents and oh yeah, then there's this guy. Speaking of bad

then there's this guy. Speaking of bad engineers, so this is this is Jordan

engineers, so this is this is Jordan Hubard uh who uh who's at NVIDIA and he

Hubard uh who uh who's at NVIDIA and he tweeted LinkedIn a really nice post on

tweeted LinkedIn a really nice post on how to get the most out of agents and

how to get the most out of agents and this guy responded with this, right?

this guy responded with this, right? This is everyone in your or this is 60%

This is everyone in your or this is 60% of your org right here. This guy's not

of your org right here. This guy's not an outlier. Okay, the backlash is very

an outlier. Okay, the backlash is very real against this. Yeah. And this is

real against this. Yeah. And this is going to be a problem I'm not going to

going to be a problem I'm not going to I'm not going to share with you. I don't

I'm not going to share with you. I don't have time to share how to fix it, but

have time to share how to fix it, but it's something you should be aware of.

it's something you should be aware of. And anyway, I'm going to turn it over to

And anyway, I'm going to turn it over to my co-author, Jean. We had a lot to talk

my co-author, Jean. We had a lot to talk about. He's got a lot to go. So, let's

about. He's got a lot to go. So, let's turn it over to Jean.

turn it over to Jean. >> Yeah. Thank you, Steve.

>> Yeah. Thank you, Steve. >> Hi, buddy. [applause]

>> Hi, buddy. [applause] >> Yeah. By the way, um I have let me start

>> Yeah. By the way, um I have let me start off by introducing myself and then I

off by introducing myself and then I want to share a little bit about like

want to share a little bit about like what it's been working like uh what's

what it's been working like uh what's been like working with Steve on the VIP

been like working with Steve on the VIP coding book. Uh and so just a little bit

coding book. Uh and so just a little bit about myself. I've had the privilege of

about myself. I've had the privilege of studying high performing technology

studying high performing technology organizations for 26 years. And that was

organizations for 26 years. And that was a journey that started when I was a

a journey that started when I was a technical founder uh of a company called

technical founder uh of a company called Tripwire. I was there for 13 years. But

Tripwire. I was there for 13 years. But our mission was really to understand

our mission was really to understand these amazing high performing technology

these amazing high performing technology organizations. They had the best project

organizations. They had the best project due date performance and development,

due date performance and development, the best operational reliability and

the best operational reliability and stability and also the best posture

stability and also the best posture compliance uh security and compliance.

compliance uh security and compliance. So we want to understand how did those

So we want to understand how did those amazing organizations make their good to

amazing organizations make their good to great transformation. So we got

great transformation. So we got understand how to how other

understand how to how other organizations replicate those amazing

organizations replicate those amazing outcomes. And so you can imagine in that

outcomes. And so you can imagine in that 26 year journey there are many

26 year journey there are many surprises. Among the biggest surprise

surprises. Among the biggest surprise was how it took me into the middle of

was how it took me into the middle of the DevOps movement which is so uh

the DevOps movement which is so uh amazing because it reshaped technology

amazing because it reshaped technology organizations. you know, it changed how

organizations. you know, it changed how test and operations worked, information

test and operations worked, information security. Um, and I thought that would

security. Um, and I thought that would be the most exciting adventure I'd be on

be the most exciting adventure I'd be on in my career until I met Steve Yagi in

in my career until I met Steve Yagi in person. And so, I've admired his work

person. And so, I've admired his work for over 11 years. And so, some of you

for over 11 years. And so, some of you may have read this memo of Jeff Bezos's

may have read this memo of Jeff Bezos's most audacious memo of how in early

most audacious memo of how in early 2000s they transformed from a gigantic

2000s they transformed from a gigantic monolith that coupled 3,500 engineers

monolith that coupled 3,500 engineers together, so none of them had

together, so none of them had independent action. And uh he talked

independent action. And uh he talked about how all teams must henceforth

about how all teams must henceforth communicate and coordinate only through

communicate and coordinate only through APIs. No back doors allowed. Right? Uh

APIs. No back doors allowed. Right? Uh anyone who doesn't do this will be

anyone who doesn't do this will be fired. Thank you and have a nice day.

fired. Thank you and have a nice day. And the amazing person who chronicled

And the amazing person who chronicled says number seven is obviously a joke

says number seven is obviously a joke because Bezos doesn't care whether you

because Bezos doesn't care whether you have a good day or not. And this is

have a good day or not. And this is actually enforced by Amazon CIO then

actually enforced by Amazon CIO then Rick Del. And so it turns out this memo

Rick Del. And so it turns out this memo that I've been quoting for 11 years uh

that I've been quoting for 11 years uh was written by Steve Yaggi uh which was

was written by Steve Yaggi uh which was meant to be a private uh memo on Google+

meant to be a private uh memo on Google+ which was made public which landed him

which was made public which landed him on the front page of the Wall Street

on the front page of the Wall Street Journal. Um and so I finally met him in

Journal. Um and so I finally met him in uh June and it turns out that we had

uh June and it turns out that we had many things in common. Uh but one of

many things in common. Uh but one of them was this uh love of AI and this

them was this uh love of AI and this sense that AI was going to shape coding

sense that AI was going to shape coding from underneath us. And so one of our

from underneath us. And so one of our beliefs is that uh the AI will reshape

beliefs is that uh the AI will reshape technology organizations you know maybe

technology organizations you know maybe even 100 times larger than what agile

even 100 times larger than what agile cloud CI/CD and mobile did you know 10

cloud CI/CD and mobile did you know 10 years ago. Um and that these technology

years ago. Um and that these technology breakthroughs not just reshape

breakthroughs not just reshape organizations but they reshape the

organizations but they reshape the entire economy. the entire economy

entire economy. the entire economy rearranges itself to take advantages of

rearranges itself to take advantages of these you know wild new better ways of

these you know wild new better ways of uh uh producing things and and uh so

uh uh producing things and and uh so over the last year and a half we've had

over the last year and a half we've had a chance to look at these case studies I

a chance to look at these case studies I think give us a glimpse of what these uh

think give us a glimpse of what these uh what the shape of technology

what the shape of technology organizations look like and so I'm going

organizations look like and so I'm going to share with that what we've learned

to share with that what we've learned but here's maybe a hint so some of you

but here's maybe a hint so some of you may know the work of Aiden Cochra he was

may know the work of Aiden Cochra he was a cloud architect at Netflix right he

a cloud architect at Netflix right he was what who drove uh the uh entire

was what who drove uh the uh entire Netflix infrastructure from a data

Netflix infrastructure from a data center uh back in 2009 to running

center uh back in 2009 to running entirely in the as cloud and so he wrote

entirely in the as cloud and so he wrote uh some months ago in 2011 some people

uh some months ago in 2011 some people got very upset in uh infrastructure and

got very upset in uh infrastructure and operations because they called it

operations because they called it noopops right and everyone laughed back

noopops right and everyone laughed back then but he said oh don't you know uh

then but he said oh don't you know uh it's happening again this time it might

it's happening again this time it might be called no dev right not so funny now

be called no dev right not so funny now right so it's it's interesting right

right so it's it's interesting right because we heard this amazing

because we heard this amazing presentation from zapier about like how

presentation from zapier about like how support ships and turns out designers

support ships and turns out designers are shipping UX is shipping right anyone

are shipping UX is shipping right anyone who's been frustrated by developers uh

who's been frustrated by developers uh who you know say get in line and you

who you know say get in line and you have to wait quarters or years or maybe

have to wait quarters or years or maybe never, right, are now suddenly in

never, right, are now suddenly in position where you can actually vibe

position where you can actually vibe code your own features into production,

code your own features into production, right? And that reshapes technology

right? And that reshapes technology organizations and it reshapes, you know,

organizations and it reshapes, you know, potentially the entire economy. And so,

potentially the entire economy. And so, uh, uh, Steve and I, we've had the

uh, uh, Steve and I, we've had the privilege of watching what happens, you

privilege of watching what happens, you know, when we change, uh, you know, the

know, when we change, uh, you know, the way we, uh, deploy, right? It wasn't so

way we, uh, deploy, right? It wasn't so long ago and 10 years ago, uh, I wrote a

long ago and 10 years ago, uh, I wrote a book called the Phoenix Project where it

book called the Phoenix Project where it was all about the catastrophic

was all about the catastrophic deployment. Would you believe, uh, that

deployment. Would you believe, uh, that it was, you know, 10 years ago, 15 years

it was, you know, 10 years ago, 15 years ago, most organizations shipped once a

ago, most organizations shipped once a year, right? Right. And so I got to work

year, right? Right. And so I got to work on a project called the state of DevOps

on a project called the state of DevOps research. It was a cross population

research. It was a cross population study that spanned 36,000 respondents uh

study that spanned 36,000 respondents uh from 2013 to 2019. And what we found uh

from 2013 to 2019. And what we found uh this was Dr. Nicole Forsgrren and Jez

this was Dr. Nicole Forsgrren and Jez Humble. Um and what we found was that

Humble. Um and what we found was that these high performers ship multiple

these high performers ship multiple times a day, right? They can ship in one

times a day, right? They can ship in one hour or less. And you know back in 2009,

hour or less. And you know back in 2009, people thought, "Oh my gosh, multiple

people thought, "Oh my gosh, multiple deployments per day, right? That's

deployments per day, right? That's reckless and irresponsible, maybe even

reckless and irresponsible, maybe even immoral, right? What sort of maniac

immoral, right? What sort of maniac would deploy multiple times a day,

would deploy multiple times a day, right? And yet it's very common place

right? And yet it's very common place these days. In fact, if you want to have

these days. In fact, if you want to have great reliability profiles, if you want

great reliability profiles, if you want to have short meantime prepare, you have

to have short meantime prepare, you have to do smaller deployments more

to do smaller deployments more frequently. And I think we're now seeing

frequently. And I think we're now seeing these kind of case studies that show

these kind of case studies that show that this better way of coding, right,

that this better way of coding, right, where you don't type in code by hand

where you don't type in code by hand might be, you know, just a vastly better

might be, you know, just a vastly better way uh to create value. And so our

way uh to create value. And so our definition of vibe coding that we put

definition of vibe coding that we put into the uh V coding book was that it's

into the uh V coding book was that it's basically anything where you don't type

basically anything where you don't type in code by hand. And so for some of

in code by hand. And so for some of those of you who don't understand that,

those of you who don't understand that, that's like sort of a uh typing an ID

that's like sort of a uh typing an ID hunched over, right? And you're actually

hunched over, right? And you're actually moving your fingers, right? That's sort

moving your fingers, right? That's sort of like how some people go into a dark

of like how some people go into a dark room to develop photographs, right?

room to develop photographs, right? Believe it or not, some people still do

Believe it or not, some people still do that. Um and and what I that's a great

that. Um and and what I that's a great definition that we uh loved until uh Dar

definition that we uh loved until uh Dar Amade u uh CEO and co-founder of um

Amade u uh CEO and co-founder of um Anthropic, he gave us an even better

Anthropic, he gave us an even better definition, right? The vibe coding is

definition, right? The vibe coding is really the iterative conversation uh

really the iterative conversation uh that results in AI writing your code.

that results in AI writing your code. And he said it's on one hand a beautiful

And he said it's on one hand a beautiful term because it evokes this different

term because it evokes this different way of coding but he said it's also

way of coding but he said it's also somewhat misleading because it sounds

somewhat misleading because it sounds jokey right uh but he said you know

jokey right uh but he said you know adanthropic there's no other game in

adanthropic there's no other game in town right and I just thought that was

town right and I just thought that was just a beautiful way to evoke you know

just a beautiful way to evoke you know how important uh vibe coding is uh this

how important uh vibe coding is uh this is Dr. Eric Meyer um you he's probably

is Dr. Eric Meyer um you he's probably considered one of the greatest

considered one of the greatest programming language designers of all

programming language designers of all time. Uh he was part of Visual Basic, C

time. Uh he was part of Visual Basic, C link, Haskell. He created the hack

link, Haskell. He created the hack programming language uh that migrated

programming language uh that migrated millions of lines of code at Meta, you

millions of lines of code at Meta, you know, within a year uh bringing static

know, within a year uh bringing static type checking to a bunch of PHP

type checking to a bunch of PHP programmers and he said we are probably

programmers and he said we are probably going to be the last generation of

going to be the last generation of developers uh to write code by hand. So

developers uh to write code by hand. So let's have fun doing it. Um so one of

let's have fun doing it. Um so one of the things and uh when uh Steve and I

the things and uh when uh Steve and I started working on the book last

started working on the book last November was uh watching him spend

November was uh watching him spend hundreds of dollars a day on coding

hundreds of dollars a day on coding agents uh and just seemed so strange

agents uh and just seemed so strange right um you know and so he's maxing out

right um you know and so he's maxing out not just you know the uh the monthly

not just you know the uh the monthly subscriptions right but he's actually

subscriptions right but he's actually you know going way above and beyond that

you know going way above and beyond that and yet uh you know things that we're

and yet uh you know things that we're hearing now is that as an engineer part

hearing now is that as an engineer part of my job is that I need to be spending

of my job is that I need to be spending as much on tokens per day as my salary

as much on tokens per day as my salary right so you know that think about like

right so you know that think about like $500 to $1,000 a day, right? Because

$500 to $1,000 a day, right? Because this is the mechanical advantage, the

this is the mechanical advantage, the cognitive advantage that these tools are

cognitive advantage that these tools are giving us, right? And as an engineer,

giving us, right? And as an engineer, right, I'm going to challenge myself,

right, I'm going to challenge myself, you know, to get that kind of value to

you know, to get that kind of value to deliver value to people who matter. Um,

deliver value to people who matter. Um, and so in the book, we talk about, you

and so in the book, we talk about, you know, why people would do this, right?

know, why people would do this, right? And [snorts] the, uh, acronym we came up

And [snorts] the, uh, acronym we came up with FAFO, right? Uh, the most obvious

with FAFO, right? Uh, the most obvious one is F for faster, right? Yeah, that's

one is F for faster, right? Yeah, that's obviously true, but I think it's the

obviously true, but I think it's the most superficial and um part of why we

most superficial and um part of why we do this because uh the second one is it

do this because uh the second one is it lets us do more ambitious things, right?

lets us do more ambitious things, right? Uh the impossible becomes possible. Uh

Uh the impossible becomes possible. Uh so that's one end of the spectrum. On

so that's one end of the spectrum. On the other end of the spectrum, you know,

the other end of the spectrum, you know, the uh the tedious and small tasks

the uh the tedious and small tasks become free. One of the things I uh the

become free. One of the things I uh the uh interview of the cloud code team that

uh interview of the cloud code team that I just loved was uh I think it was

I just loved was uh I think it was Katherine she said um uh one of the

Katherine she said um uh one of the things we've noticed is that you know

things we've noticed is that you know when customer issues come up uh instead

when customer issues come up uh instead of putting them on a jur backlog and you

of putting them on a jur backlog and you know arguing about it in the grooming

know arguing about it in the grooming sessions and so forth right we just fix

sessions and so forth right we just fix it on the spot right and ship to

it on the spot right and ship to production or whatever um you know

production or whatever um you know within 30 minutes right and so yes it

within 30 minutes right and so yes it gets recorded but you know that whole

gets recorded but you know that whole sort of coordination cost you know just

sort of coordination cost you know just disappears right so again the impossible

disappears right so again the impossible becomes possible right and uh the

becomes possible right and uh the annoying things just become free. The

annoying things just become free. The second A is uh um you know the ability

second A is uh um you know the ability to do things alone or more autonomously,

to do things alone or more autonomously, right? And so um you know there's really

right? And so um you know there's really two coordination costs are being

two coordination costs are being alleviated here. One is you know if you

alleviated here. One is you know if you ever have to wait for a developer or a

ever have to wait for a developer or a team of developers, you know, to do what

team of developers, you know, to do what you need to do, right? You have to

you need to do, right? You have to communicate and coordinate and

communicate and coordinate and synchronize and prioritize and cajul and

synchronize and prioritize and cajul and escalate, you know, do all sorts of

escalate, you know, do all sorts of things to get them to care about the

things to get them to care about the problem just as much as you do, right?

problem just as much as you do, right? And you know now you know with these

And you know now you know with these amazing new miraculous technologies you

amazing new miraculous technologies you can do them by yourself right so that's

can do them by yourself right so that's one coordination co uh tax the other one

one coordination co uh tax the other one is that even if you get someone to uh

is that even if you get someone to uh care about a problem as much as you uh

care about a problem as much as you uh they can't read your mind right and what

they can't read your mind right and what we're finding is that these LLMs are

we're finding is that these LLMs are just amazing intermediation vehicles

just amazing intermediation vehicles right um you know just through an LLM

right um you know just through an LLM you can coordinate with other functional

you can coordinate with other functional specialties right through a markdown

specialties right through a markdown file right that's not the end right but

file right that's not the end right but it's just this amazing way uh to have

it's just this amazing way uh to have these high bandwidth coordination so

these high bandwidth coordination so that you can essentially read each

that you can essentially read each other's minds, you know, because shared

other's minds, you know, because shared outcomes require shared goals and shared

outcomes require shared goals and shared understanding. The second F is fun,

understanding. The second F is fun, right? As that Steve says, vibe coding

right? As that Steve says, vibe coding is addictive. This is so true. I mean, I

is addictive. This is so true. I mean, I cannot I think what I love about the

cannot I think what I love about the book is that it's a story about two guys

book is that it's a story about two guys who both thought their best days of

who both thought their best days of coding were behind them, right? And

coding were behind them, right? And found that, you know, it's entirely the

found that, you know, it's entirely the opposite. Um, and I've had so much fun

opposite. Um, and I've had so much fun and uh, you know, I'm having to force

and uh, you know, I'm having to force myself to go to sleep at night because

myself to go to sleep at night because otherwise I'd be up till 2 or 3 in the

otherwise I'd be up till 2 or 3 in the morning every night. uh and you know so

morning every night. uh and you know so it's not all great but it certainly

it's not all great but it certainly beats being boring or tedious or you

beats being boring or tedious or you know horrible and then optionality you

know horrible and then optionality you know one of the things that uh I love

know one of the things that uh I love about Swiss is that he has a shared love

about Swiss is that he has a shared love of creating option value and he told us

of creating option value and he told us last night that option value is also

last night that option value is also important for poker players right

important for poker players right because you never want to paint yourself

because you never want to paint yourself in a corner so option value is um one of

in a corner so option value is um one of the biggest creators of economic value

the biggest creators of economic value right modularity the reason why it's so

right modularity the reason why it's so powerful is because it creates option

powerful is because it creates option value uh and so just the fact that you

value uh and so just the fact that you can have so many more swings of bat can

can have so many more swings of bat can do so many more parallel experiments,

do so many more parallel experiments, right? This is what v coding allows. So

right? This is what v coding allows. So this is gives us confidence that you

this is gives us confidence that you know this is not just uh this is a very

know this is not just uh this is a very powerful tool. Um uh here's the quote

powerful tool. Um uh here's the quote from Andy Glover that uh Steve Yaggi

from Andy Glover that uh Steve Yaggi said is that you know as um for people

said is that you know as um for people who have this aha moment and and

who have this aha moment and and positioned uh you know I think the

positioned uh you know I think the instinct is how do we elevate everyone's

instinct is how do we elevate everyone's productivity to be as productive as you

productivity to be as productive as you are now being um you know that since

are now being um you know that since you've had your aha moment. So uh let me

you've had your aha moment. So uh let me share with you maybe some of our top

share with you maybe some of our top kind of uh exciting case studies that

kind of uh exciting case studies that kind of give us a hint of the future. So

kind of give us a hint of the future. So uh I've run into this conference called

uh I've run into this conference called the enterprise technology leadership

the enterprise technology leadership summit for uh 11 years now and Swix we

summit for uh 11 years now and Swix we had uh the honor of having Swix there

had uh the honor of having Swix there talking about the rise of the AI

talking about the rise of the AI engineer just this amazing

engineer just this amazing prognostication. Uh this year we had a

prognostication. Uh this year we had a series of amazing uh case studies. One

series of amazing uh case studies. One was uh Bruno Pasos. He spoke this year

was uh Bruno Pasos. He spoke this year uh last year at this conference and he

uh last year at this conference and he presented on uh their in their evolving

presented on uh their in their evolving experiment to elevate developer

experiment to elevate developer productivity across 3,000 developers. Um

productivity across 3,000 developers. Um and this is at Booking.com, the world's

and this is at Booking.com, the world's largest travel agency and they're

largest travel agency and they're finding that they're getting double-

finding that they're getting double- digit increase in productivity, right?

digit increase in productivity, right? Uh mergers are going in quicker, peer

Uh mergers are going in quicker, peer review times are uh smaller and and so

review times are uh smaller and and so forth, right? And so that's just we feel

forth, right? And so that's just we feel like that's a incomplete view of uh what

like that's a incomplete view of uh what people are achieving. Uh this is Shri

people are achieving. Uh this is Shri Balakrishnan. uh he was head of product

Balakrishnan. uh he was head of product and technology at uh Travelopia. Uh so

and technology at uh Travelopia. Uh so they're a $ 1.5 billion a year uh travel

they're a $ 1.5 billion a year uh travel company and one of the things that uh he

company and one of the things that uh he said is that uh you know they were able

said is that uh you know they were able to uh replace a legacy application uh in

to uh replace a legacy application uh in six weeks with a pair of uh with a very

six weeks with a pair of uh with a very small team. In fact, one of his uh

small team. In fact, one of his uh conclusions is that before we would need

conclusions is that before we would need a team of eight people to do something

a team of eight people to do something meaningful, right? Six developers, a UX

meaningful, right? Six developers, a UX person and a product owner. and he said

person and a product owner. and he said maybe these days it might be two a

maybe these days it might be two a developer and you know a a domain expert

developer and you know a a domain expert in other words as Kent Beck said a

in other words as Kent Beck said a person with a problem and a person who

person with a problem and a person who can solve it right maybe maybe a pair of

can solve it right maybe maybe a pair of those teams right and so that's going to

those teams right and so that's going to reshape I think you know how they can go

reshape I think you know how they can go further and faster uh so again maybe

further and faster uh so again maybe just a hint of what teams will look like

just a hint of what teams will look like in the future this is the one that

in the future this is the one that excites me most this is Dr. top pal uh

excites me most this is Dr. top pal uh he helped drive the DevOps move in at

he helped drive the DevOps move in at Capital One um and he's now at uh

Capital One um and he's now at uh Fidelity and so um among other things he

Fidelity and so um among other things he owns an application uh that is the

owns an application uh that is the application you go to ask which

application you go to ask which applications you know the 25,000

applications you know the 25,000 applications there have log 4J right and

applications there have log 4J right and uh it's his team and he's had this

uh it's his team and he's had this vision of what this application should

vision of what this application should look like uh but every time he asked

look like uh but every time he asked like can can we build it his team would

like can can we build it his team would say it would take about five months

say it would take about five months right and we'd hire need to hire a a

right and we'd hire need to hire a a front-end person and he got so

front-end person and he got so frustrated that he spent five days just

frustrated that he spent five days just vibe coding it by himself right uh you

vibe coding it by himself right uh you know directly accessing read only the

know directly accessing read only the Neo4j uh database um and put it into

Neo4j uh database um and put it into production right and so I think we're

production right and so I think we're seeing a world where um you know leaders

seeing a world where um you know leaders even leaders with their own teams are

even leaders with their own teams are frustrated saying hey I can do this uh

frustrated saying hey I can do this uh can I do this better myself not better

can I do this better myself not better just can I prove that it can be done and

just can I prove that it can be done and uh by the way what happened afterwards

uh by the way what happened afterwards um he was looking around who can help me

um he was looking around who can help me maintain my application production and

maintain my application production and all the senior engineers like not

all the senior engineers like not So enter uh Swathy the most junior

So enter uh Swathy the most junior engineer on the team uh who is helping

engineer on the team uh who is helping maintain this application and probably

maintain this application and probably outarning you know everybody in the

outarning you know everybody in the organization

organization uh and interestingly uh he he's also

uh and interestingly uh he he's also getting more headcount because the

getting more headcount because the number of consumers of this application

number of consumers of this application just increased by 10fold right so who

just increased by 10fold right so who saw that coming right um so uh here's

saw that coming right um so uh here's John Rouser he's senior director of

John Rouser he's senior director of engineering at Cisco security and he

engineering at Cisco security and he convinces SVP to um require 100 of the

convinces SVP to um require 100 of the top leaders inside of Cisco security to

top leaders inside of Cisco security to vibe code one feature into production in

vibe code one feature into production in a quarter that ended last month, right?

a quarter that ended last month, right? And so um you know we're actually

And so um you know we're actually getting a chance to be able to survey

getting a chance to be able to survey those people, right? Who finished? Uh

those people, right? Who finished? Uh you know uh how many completed, didn't

you know uh how many completed, didn't complete, partially completed, etc. And

complete, partially completed, etc. And of those who completed, right, what was

of those who completed, right, what was what aha moment did they have as a

what aha moment did they have as a leader? What's the magnitude and

leader? What's the magnitude and direction of what they want to do? And

direction of what they want to do? And so we're going to go in and study that.

so we're going to go in and study that. And I just I my prediction is that we're

And I just I my prediction is that we're going to see parts of that organization

going to see parts of that organization get reshaped as leaders realize kind of

get reshaped as leaders realize kind of what's possible. Everything from

what's possible. Everything from strategy to processes and so forth. And

strategy to processes and so forth. And so let me just share with you one um you

so let me just share with you one um you know thing that really excites me which

know thing that really excites me which is uh I got a chance to uh get back into

is uh I got a chance to uh get back into the state of DevOps research the Dora

the state of DevOps research the Dora study with uh u the Google cloud team

study with uh u the Google cloud team and one of the things that didn't make

and one of the things that didn't make into the report that I just found really

into the report that I just found really exciting was around this. It was like

exciting was around this. It was like what how much do people trust AI? And

what how much do people trust AI? And we're using a very strange definition of

we're using a very strange definition of trust, which is to what degree can I

trust, which is to what degree can I predict how the other party will act and

predict how the other party will act and react, right? Because the more you trust

react, right? Because the more you trust the other party, right, you can give

the other party, right, you can give them bigger requests, you can use fewer

them bigger requests, you can use fewer words, you have less need for feedback,

words, you have less need for feedback, right? It's the whole notion of finger

right? It's the whole notion of finger spits and fuel, right? Like you know,

spits and fuel, right? Like you know, how many of the 10,000 hours that

how many of the 10,000 hours that requires to be good at anything have you

requires to be good at anything have you used to get good at AI? And one of the

used to get good at AI? And one of the stunning findings was that it's this

stunning findings was that it's this line. So on the x-axis is how long have

line. So on the x-axis is how long have you been using AI tools? Y is how much

you been using AI tools? Y is how much do you trust it? Right? And the longer

do you trust it? Right? And the longer you use AI, right, the more you trust

you use AI, right, the more you trust it, right? So every every person who

it, right? So every every person who says, "I tried it and it's terrible at

says, "I tried it and it's terrible at coding," right? On what basis did they

coding," right? On what basis did they make that conclusion after maybe using

make that conclusion after maybe using for an hour or two? And what this shows

for an hour or two? And what this shows us is that uh you know it requires

us is that uh you know it requires practice, right? And this is probably a

practice, right? And this is probably a teachable skill. Um so length of time on

teachable skill. Um so length of time on the x-axis is a very incomplete

the x-axis is a very incomplete expression, right? It's like frequency

expression, right? It's like frequency and intensity and how many hours, but

and intensity and how many hours, but it's there's signal there. So it just

it's there's signal there. So it just shows that uh you know part of your job

shows that uh you know part of your job is to help other people have the aha

is to help other people have the aha moment and then help them you practice

moment and then help them you practice right so they get very very good at it

right so they get very very good at it so they can use every one of these

so they can use every one of these amazing technologies to achieve their

amazing technologies to achieve their goals. So uh I'll leave you with one

goals. So uh I'll leave you with one last kind of vision. Stephen and I we

last kind of vision. Stephen and I we did a vibe coding workshop for leaders

did a vibe coding workshop for leaders um back six weeks ago and what was

um back six weeks ago and what was amazing to me was in the 3 hours we had

amazing to me was in the 3 hours we had a 100% completion rate. Everyone built

a 100% completion rate. Everyone built something, you know, they built a data

something, you know, they built a data visualization tool. In fact, uh one

visualization tool. In fact, uh one person uh built a an iOS app and another

person uh built a an iOS app and another person actually got it into the review

person actually got it into the review queue in the Apple iOS app store, right?

queue in the Apple iOS app store, right? Which is which is absolutely

Which is which is absolutely astonishing. Uh and here's a guy named

astonishing. Uh and here's a guy named Roger Safner. He said, "I used to be a C

Roger Safner. He said, "I used to be a C MVP way back in the day. I haven't coded

MVP way back in the day. I haven't coded in 15 years." Uh and he's showing off an

in 15 years." Uh and he's showing off an app that helped him automate the process

app that helped him automate the process of getting checked in to Southwest

of getting checked in to Southwest Airlines until the bot detection tools

Airlines until the bot detection tools cut him off. But look at look at the

cut him off. But look at look at the expression on his face. And so I think

expression on his face. And so I think uh what we're seeing is like what

uh what we're seeing is like what happens when support ships right and

happens when support ships right and support codes and ships when leaders

support codes and ships when leaders code and ship. And there's no doubt in

code and ship. And there's no doubt in my mind that this will reshape uh

my mind that this will reshape uh technology organizations. If you're one

technology organizations. If you're one of those, Stephen and I want to talk to

of those, Stephen and I want to talk to you, right? Because you are on the

you, right? Because you are on the frontier of something really, really

frontier of something really, really important. I'll share with you a couple

important. I'll share with you a couple quotes. Here's a technology leader. When

quotes. Here's a technology leader. When I told my team that I wrote an app that,

I told my team that I wrote an app that, you know, an AI wrote 60,000 lines of

you know, an AI wrote 60,000 lines of code and I haven't looked at any of it,

code and I haven't looked at any of it, they all looked at me as if they wished

they all looked at me as if they wished I were dead.

I were dead. Um, we've uh we've had these stupid

Um, we've uh we've had these stupid problems in legacy applications that

problems in legacy applications that have been there for over a decade. We

have been there for over a decade. We got a group of senior engineers

got a group of senior engineers together. We used AI to generate a fix

together. We used AI to generate a fix and we submitted PR and the team

and we submitted PR and the team accepted it. Right? Unlike the time when

accepted it. Right? Unlike the time when they said it was AI generated and they

they said it was AI generated and they rejected it as AI slop, right? So this

rejected it as AI slop, right? So this is maybe happening in your

is maybe happening in your organizations. Um, our code velocity is

organizations. Um, our code velocity is so high. Uh, we've concluded that we can

so high. Uh, we've concluded that we can only have one engineer per repo, right?

only have one engineer per repo, right? Because of merge conflicts, right? We

Because of merge conflicts, right? We haven't figured out the coordination

haven't figured out the coordination cost uh mechanism yet. And so like all

cost uh mechanism yet. And so like all these were some of the lessons that went

these were some of the lessons that went into the vibe coding book. Thank you for

into the vibe coding book. Thank you for everyone who were at the signing

everyone who were at the signing yesterday. And uh if you're interested

yesterday. And uh if you're interested in any of the talks we referenced in

in any of the talks we referenced in excerpts of our book in uh basically uh

excerpts of our book in uh basically uh all the links that uh are in this

all the links that uh are in this presentation, just send an email to real

presentation, just send an email to real gene cams.com

gene cams.com subjectline vibe and you'll get an

subjectline vibe and you'll get an automated response in a minute or two.

automated response in a minute or two. So with that, Steve and I thank you for

So with that, Steve and I thank you for your time and we were around all week.

your time and we were around all week. Thanks all. [applause]

[music] >> Ladies and gentlemen, please welcome

>> Ladies and gentlemen, please welcome back to the stage, Alex Lieberman.

back to the stage, Alex Lieberman. [music] Let's give it up again for

[music] Let's give it up again for Steven Jean and also the rest of the

Steven Jean and also the rest of the speakers from the morning session.

speakers from the morning session. Whether you are watching in person or on

Whether you are watching in person or on YouTube or on the AIE site, you've been

YouTube or on the AIE site, you've been breaking a mental sweat. So, we are

breaking a mental sweat. So, we are going to take a 30 minute break, get

going to take a 30 minute break, get some grub, get some coffee, recharge,

some grub, get some coffee, recharge, and we will see you back here at 11.

and we will see you back here at 11. Thanks everyone. Appreciate it.

Thanks everyone. Appreciate it. [applause]

Two flames lit the darkness, burning side by [music and singing] side. Both

side by [music and singing] side. Both sworn to creation. Both relentless in

sworn to creation. Both relentless in their stride. One walked through the

their stride. One walked through the mountains, [music] one soared across the

mountains, [music] one soared across the void. Both chasing the horizon of the

void. Both chasing the horizon of the worlds they would deploy. [music]

worlds they would deploy. [music] But the path is not a straight line. And

But the path is not a straight line. And the future is not flat. Some rules bend

the future is not flat. Some rules bend through [music] space time and some

through [music] space time and some break [singing] on impact. Effort is a

break [singing] on impact. Effort is a kingdom. [music]

kingdom. [music] Leverage is the key. One builds the

Leverage is the key. One builds the throne by hand. One shapes [music]

throne by hand. One shapes [music] reality.

There is a curvature of time. Not [music] a race, not a throne, but a

[music] a race, not a throne, but a shift in the dimension of how progress

shift in the dimension of how progress becomes known. [music] When the universe

becomes known. [music] When the universe is standing to the will inside the mind,

is standing to the will inside the mind, you don't win by moving faster. You win

you don't win by moving faster. You win by [music]

by [music] breaing

breaing time.

Holes of the past try to drag the present [music] down. Systems built on

present [music] down. Systems built on dust [singing] wearing yesterday as

dust [singing] wearing yesterday as [music] crown. Some are pulled beneath

[music] crown. Some are pulled beneath them, [singing] fighting gravity alone.

them, [singing] fighting gravity alone. Others learn to map the edges and escape

Others learn to map the edges and escape events horizons. Not all power [music]

events horizons. Not all power [music] is struggle. [singing] Not all mastery

is struggle. [singing] Not all mastery is pain. The ones who change direction.

is pain. The ones who change direction. Rewrite the laws of the game. You can

Rewrite the laws of the game. You can live your life in [music and singing]

live your life in [music and singing] labor or an impact that compounds. Every

labor or an impact that compounds. Every second can be linear or worth a thousand

second can be linear or worth a thousand rounds.

There is a curvature of time, [music] not a race, not a throne, but a shift in

not a race, not a throne, but a shift in the dimension of how [music] progress

the dimension of how [music] progress becomes known. When the universe is

becomes known. When the universe is bending to the will inside the mind, you

bending to the will inside the mind, you don't win by moving [music] faster. You

don't win by moving [music] faster. You win by rediding

win by rediding [music] time. [singing]

The future isn't [music] distant. It accelerates [singing] for those who

accelerates [singing] for those who wield the tools of power. Instead of

wield the tools of power. Instead of fighting with their goals, mastery is

fighting with their goals, mastery is leverage, [music] not a sentence carved

leverage, [music] not a sentence carved in stone. The horizon does not move

in stone. The horizon does not move [singing] unless you. [music]

There is a car of time [music] where the present multiplies where a lifetime

present multiplies where a lifetime holds a legacy that no clock [music] can

holds a legacy that no clock [music] can quantify. Not by force, not by fury, but

quantify. Not by force, not by fury, but by evolution inside. We become eternal

by evolution inside. We become eternal beings when [music] we synchronize

beings when [music] we synchronize with

footstep. their fade, but they never die. Shadows stretch across the sky.

A whisper grows into a [singing] roar. Do you [music] feel it? Do you want

Do you [music] feel it? Do you want more?

Every heartbeat a stone [music] in the street.

Ripples [music and singing] chasing an endless dream.

What we do in life [music] echoes in eternity.

eternity. Every spark ignit [music]

>> [music] >> Reach out to the [singing] empty air.

>> Reach out to the [singing] empty air. Trace the stars like they're waiting

Trace the stars like they're waiting there.

there. [music]

The clock ticks but the moment stays forever starts in a single [singing]

forever starts in a single [singing] prayer.

>> Every heartbeat [music] stone [singing] in the stream.

Ripples [music and singing] chasing an endless dream.

What we do in life echoes in eternity. There sparking lights a fire that will

There sparking lights a fire that will never see [music]

>> Shadows [music] crawl where the light won't stay.

crawl where the light won't stay. The echo whispers don't [music] look

The echo whispers don't [music] look away.

away. Heartbeat racing louder than my doubt.

Heartbeat racing louder than my doubt. Scream inside. I can't let

Scream inside. I can't let [music and singing] out. But I won't

[music and singing] out. But I won't fall. I won't drown in the storm all

fall. I won't drown in the storm all around.

Fear of the [music] mind. I won't let you in. It seems like a

I won't let you in. It seems like a ghost, [music] but I keep it within

ghost, [music] but I keep it within the mind.

the mind. I'm breaking [music] the chain.

>> [music] >> Cold winds how but they won't define me.

>> Cold winds how but they won't define me. The cracks in my soul let the light find

The cracks in my soul let the light find [music] me. Every step I take the ground

[music] me. Every step I take the ground fights back. But I'm the fire. I'm the

fights back. But I'm the fire. I'm the spark. I'm the attack. [music]

I won't freeze. I won't fade. Through the chaos I've remained.

[music] Fear is a killer. I won't let it win. It

Fear is a killer. I won't let it win. It creeps like a ghost, but I keep it

creeps like a ghost, but I keep it within.

within. Fear is a killer. I'm breaking [music]

Fear is a killer. I'm breaking [music] the chain. Don't for

>> [music] [singing]

[music] >> I hear the static in the [singing]

>> I hear the static in the [singing] night. It calls.

night. It calls. A whisper [music] rising,

A whisper [music] rising, breaking through the walls.

breaking through the walls. [music]

[music] Electric echoes in my veins. Stay home.

Electric echoes in my veins. Stay home. Chasing the shadows where [music] the

Chasing the shadows where [music] the wild ones run.

The air is still the weight is [music and singing] gone. Close your

[music and singing] gone. Close your eyes. The past is done.

Free your mind. Let it go. Let [music] it break the chains. Heat. Heat.

>> Waves come crash against the sky. Friends of a dream. [music]

Friends of a dream. [music] I see them inside.

Gravity is a [music] story. We don't need a weather thunder where the speed.

need a weather thunder where the speed. [music]

[music] The air is the weight is gone. Close

The air is the weight is gone. Close your eyes. The [singing] past is done.

[singing] Free your mind. [music] Let it go. Let

Free your mind. [music] Let it go. Let it break the chain. Heat. Heat.

Heat. [music] Heat.

Heat [music]

[music] they said the stars don't change

they said the stars don't change [singing] their course, but I've been

[singing] their course, but I've been running from their force. A mirror

running from their force. A mirror crack, but still it [music and singing]

crack, but still it [music and singing] shows. The fire is mine. It's mine to

shows. The fire is mine. It's mine to hold. I hear the echo and call [music]

hold. I hear the echo and call [music] my name,

but I'm not the shadow. [music] I'm not the same.

You are who you choose to be. The stars [music]

[music] are the history.

are the history. Every breath, every heartbeat.

Every breath, every heartbeat. [music]

[music] >> of thorns, a sky of glass. I've walked

>> of thorns, a sky of glass. I've walked through both. I've let them [singing]

through both. I've let them [singing] pass. The weight [music] is heavy, but

pass. The weight [music] is heavy, but I've grown. The voice I hear is now my

I've grown. The voice I hear is now my own. I see [music] the light

[music] change.

change. Heat. Heat. Heat.

>> I see the lines [music and singing] drawn in the sand. The map of chaos in

drawn in the sand. The map of chaos in mind.

Every step a [music] choice. Every beat of voice. The clock ticks

Every beat of voice. The clock ticks louder. But I stand. [music]

Close my eyes [singing] and feel it burn. Every [music] failure, every turn.

burn. Every [music] failure, every turn. It's fue for the fire inside. [music]

The air is heavy. It doesn't break. A thousand whispers in it wake.

Each breath [music] a climb. Each fall a sign, but I am more than I

Each fall a sign, but I am more than I can take. [music]

Close my eyes and feel it burn. Every failure, every turn is fueled for the

failure, every turn is fueled for the fire inside.

fire inside. [music]

>> No, no

[music] heat.

[music] >> The clock keeps ticking loud and clear.

I've been waiting for the light. [music]

[music] Holding breath through endless night.

[music] The air is shifting. Feel it break.

Feel it break. A single spark is all it [singing]

A single spark is all it [singing] takes.

takes. It starts today.

It starts today. It starts today.

It starts today. No more [music]

No more [music] running. No delay.

running. No delay. The world is spinning in my hands. It

The world is spinning in my hands. It [music] starts to

[music] starts to start today.

Every choice I made my own. [music] I see the dawn breaking through.

The air is shifting. [music] Feel it rain.

Feel it rain. A single spark is all in.

A single spark is all in. It starts to take it start.

[music] >> Heat up here.

Oh. [music]

>> [music] >> Fire in my chest is burning loud.

>> Fire in my chest is burning loud. Ashes fall, but I won't bow. [music]

Ashes fall, but I won't bow. [music] I've walked [singing] through the smoke.

I've walked [singing] through the smoke. I've tasted the scars. Each step I've

I've tasted the scars. Each step I've taken, lit up the stars. [music]

taken, lit up the stars. [music] Let it blaze, let it break. Feel the

Let it blaze, let it break. Feel the crack the ground sh

I'm forced in [singing] flame [music] I'm falling

I'm falling the pain they call me [music]

the pain they call me [music] deep

again from Heat. Heat. Heat.

from Heat. Heat. Heat. [music]

[music] >> The winds they how but I stand still.

>> The winds they how but I stand still. [music]

[music] The mountains crumble up my will.

The mountains crumble up my will. I'm not the same

I'm not the same I was before. A shadow of fear. I keep

let it blaze. [music and singing] Let it break. Feel the cracks. The ground will

break. Feel the cracks. The ground will shake.

I'm forged in flame. [music] Heat. Heat. Heat.

Heat. [music]

[music] Heat.

Shadows melt in the growing light. Time bends and twists. We feel it star.

A pulse [music] a spark [singing] an open heart.

open heart. Do you feel it? Feel it rise.

The weightless fire in the sky.

>> [music] >> has come.

>> has come. We're running to the sun.

electric in the trees that

in the trees that [music]

[music] stars collide, but we stay warm.

stars collide, but we stay warm. The past dissolves [music]

The past dissolves [music] like waves on storm.

like waves on storm. We stand together

The rush, [music] the fun, the everything.

the fun, the everything. A new age

A new age has come.

has come. We're running to the [music] sun. No

We're running to the [music] sun. No chains, no walls, just

chains, no walls, just with me. [music]

>> [music] >> Heat up

up [music] here.

>> up [music]

Heat [music]

[music] here.

[music] Heat.

Heat. >> [music]

Heat. [music] Heat.

[music] Heat. Heat.

>> Heat. Heat. [music]

[music] >> Heat. Heat.

>> Ladies and gentlemen, please welcome back to the stage Alex Lieberman.

Let's uh keep it going for the morning speakers. [music] Amazing job from

speakers. [music] Amazing job from everyone who spoke earlier. I asked

everyone who spoke earlier. I asked before who thought they came from the

before who thought they came from the furthest place on Earth to to watch this

furthest place on Earth to to watch this in person. And where's New Zealand

in person. And where's New Zealand again? I don't know. New Zealand. There

again? I don't know. New Zealand. There we go.

we go. >> From Bulgaria.

>> From Bulgaria. >> Bulgaria. Still, I think closer than New

>> Bulgaria. Still, I think closer than New Zealand, but still very far.

Zealand, but still very far. >> Australia via New Zealand.

>> Australia via New Zealand. >> Australia via New Zealand. We just got

>> Australia via New Zealand. We just got someone to one up New Zealand. I have

someone to one up New Zealand. I have another quick question since we just

another quick question since we just came back from a coffee break. Also, if

came back from a coffee break. Also, if you're watching live on YouTube, you can

you're watching live on YouTube, you can comment. Who thinks they're the most

comment. Who thinks they're the most caffeinated right now? Who thinks

caffeinated right now? Who thinks they're the most caffeinated in the

they're the most caffeinated in the room? How many cups of coffee? I'm four

room? How many cups of coffee? I'm four right now. Anyone beat four? Oh, we got

right now. Anyone beat four? Oh, we got four. We got a five, maybe. Wow,

four. We got a five, maybe. Wow, impressive. Well, we are back for an

impressive. Well, we are back for an incredible next block of sessions. We're

incredible next block of sessions. We're going to be covering everything from

going to be covering everything from future proofing uh coding agents to

future proofing uh coding agents to moving away from agile, how to quantify

moving away from agile, how to quantify AI ROI in software engineering, the

AI ROI in software engineering, the state of AI code quality, hype versus

state of AI code quality, hype versus reality, and Miniax M2. But I am so

reality, and Miniax M2. But I am so excited to kick off this next block of

excited to kick off this next block of talks with OpenAI. Please welcome to the

talks with OpenAI. Please welcome to the stage Bill Chen and Brian Fioa from the

stage Bill Chen and Brian Fioa from the Applied AI team at OpenAI. Let's hear it

Applied AI team at OpenAI. Let's hear it for them. [applause]

>> Hello everyone. Um, today we'll be talking about how to build coding

talking about how to build coding agents.

agents. And uh, I'm Bill. I work on the applied

And uh, I'm Bill. I work on the applied AI startups team at OpenAI.

AI startups team at OpenAI. >> And I'm Brian. I work with Bill on the

>> And I'm Brian. I work with Bill on the OpenAI startups team

OpenAI startups team >> and we specifically uh focus on uh

>> and we specifically uh focus on uh building coding agents here at OpenAI.

building coding agents here at OpenAI. Um yeah so why are we talk giving this

Um yeah so why are we talk giving this talk? Why why are we you know u talking

talk? Why why are we you know u talking about coding agents? Well it's really

about coding agents? Well it's really quite interesting because it's been

quite interesting because it's been booming for the the the past year

booming for the the the past year actually. It's just if you think about

actually. It's just if you think about it it's not that much time ago like only

it it's not that much time ago like only been a year or so. the ground keeps

been a year or so. the ground keeps shifting really under the uh harness on

shifting really under the uh harness on on the coding agents. But if you think

on the coding agents. But if you think about it, it's really like why it's

about it, it's really like why it's interesting is because it's really a

interesting is because it's really a signal on how close we are to AGI.

signal on how close we are to AGI. Software engineering can be set as a

Software engineering can be set as a universal medium for problem solving.

universal medium for problem solving. But because the ground is shifting so

But because the ground is shifting so fast, uh we h kept having to rebuild the

fast, uh we h kept having to rebuild the agent on top of the model whenever a

agent on top of the model whenever a model is released. And today we're going

model is released. And today we're going to talk a little bit about how we might

to talk a little bit about how we might be able to get around that.

be able to get around that. So, here's what we're going to go over

So, here's what we're going to go over today. We'll start with the anatomy of a

today. We'll start with the anatomy of a coding agent, especially going into the

coding agent, especially going into the details of models and harnesses and how

details of models and harnesses and how they work together. We'll share some

they work together. We'll share some lessons that we learned from putting

lessons that we learned from putting them together ourselves. And we're

them together ourselves. And we're specifically going to talk about codeex

specifically going to talk about codeex here, which is our own coding agent.

here, which is our own coding agent. We'll talk a little bit about emerging

We'll talk a little bit about emerging patterns that we're seeing from all of

patterns that we're seeing from all of you for using agents like Codeex in your

you for using agents like Codeex in your own products. And lastly, we'll talk a

own products. And lastly, we'll talk a little bit about what to expect from

little bit about what to expect from Codeex in the future so that you can

Codeex in the future so that you can build along with us if you want to.

To start, let's talk a little bit about what makes a coding agent an agent as a

what makes a coding agent an agent as a whole. Um, it really is quite simple. I

whole. Um, it really is quite simple. I think, you know, people kind of over

think, you know, people kind of over complicate things a little bit these

complicate things a little bit these days. It's made out of three parts. It's

days. It's made out of three parts. It's a user interface. It has a model. It's a

a user interface. It has a model. It's a harness, right? Uh the interface quite

harness, right? Uh the interface quite self-explanatory could be a computer uh

self-explanatory could be a computer uh like a CLI tool or it could be a uh

like a CLI tool or it could be a uh integrated developer environment could

integrated developer environment could be also cloud or background agent. Um,

be also cloud or background agent. Um, models also very quite self-explanatory

models also very quite self-explanatory are, you know, the things like the

are, you know, the things like the latest and greatest, the GPD 5.1 codeex

latest and greatest, the GPD 5.1 codeex uh max that we just released yesterday

uh max that we just released yesterday uh or the GPD 5.1 series of models or

uh or the GPD 5.1 series of models or other uh models from other providers as

other uh models from other providers as well. And the harness uh is a little bit

well. And the harness uh is a little bit more of an interesting part. This is the

more of an interesting part. This is the part that directly interacts with the

part that directly interacts with the model uh in the most reductive way. You

model uh in the most reductive way. You can sort of think of it as a collection

can sort of think of it as a collection of prompts and tools combined in a core

of prompts and tools combined in a core agent loop which provides input and

agent loop which provides input and outputs uh from a model. Uh the last

outputs uh from a model. Uh the last part will be our focus for today.

As touched on a bit earlier, coding is one of the most active frontiers in

one of the most active frontiers in applied AI and uh how models are

applied AI and uh how models are constantly getting released and we're

constantly getting released and we're not making the problem uh easier for

not making the problem uh easier for everybody

everybody is that people have to constantly adapt

is that people have to constantly adapt uh the agents to the new models.

So, um, Bill's done a great job of giving us an overview of coding agents,

giving us an overview of coding agents, what they're made up of. So, let's zoom

what they're made up of. So, let's zoom in a little bit on the harness. Um,

in a little bit on the harness. Um, turns out that's a little bit tricky.

turns out that's a little bit tricky. So, what is a harness? A harness is

So, what is a harness? A harness is really the interface layer to the model.

really the interface layer to the model. It's the surface area the model uses to

It's the surface area the model uses to talk to users and the code and perform

talk to users and the code and perform actions with tools. It's made up of all

actions with tools. It's made up of all of the pieces that the model needs to

of the pieces that the model needs to work over many turns, call tools, and

work over many turns, call tools, and and really write code for you and

and really write code for you and interpret what the user is actually

interpret what the user is actually asking. [snorts] Um, for some, the

asking. [snorts] Um, for some, the harness might actually be the special

harness might actually be the special sauce of the product. But as we're going

sauce of the product. But as we're going to go into a little bit more, it's

to go into a little bit more, it's really challenging work to build a good

really challenging work to build a good harness. And we'll talk about how we did

harness. And we'll talk about how we did that.

that. So let's see what are some of these

So let's see what are some of these challenges. Um just to name a few, AV is

challenges. Um just to name a few, AV is one. Um your [laughter]

one. Um your [laughter] um your brand new innovative custom tool

um your brand new innovative custom tool that you're giving to your agent might

that you're giving to your agent might not actually be something the model is

not actually be something the model is using is used to using. It may not have

using is used to using. It may not have ever seen that tool before in trading.

ever seen that tool before in trading. And even if it is, you need to spend

And even if it is, you need to spend time tuning your prompt to that

time tuning your prompt to that particular model and the habits that it

particular model and the habits that it comes with.

comes with. And new models are coming out all the

And new models are coming out all the time. What about latency? Like does the

time. What about latency? Like does the model take a while to think about

model take a while to think about certain things? Which things do you

certain things? Which things do you prompt it not to? How do you expose the

prompt it not to? How do you expose the UX of what a thinking model is doing

UX of what a thinking model is doing while it's thinking? Is it communicating

while it's thinking? Is it communicating with you while it's thinking or do you

with you while it's thinking or do you have to summarize it? Managing the

have to summarize it? Managing the context window and compaction can be

context window and compaction can be really challenging. We just launched

really challenging. We just launched Codeex Max that does that out of the box

Codeex Max that does that out of the box for you. you don't have to worry about

for you. you don't have to worry about compaction and context window

compaction and context window management. It's really hard to do. Um,

management. It's really hard to do. Um, and so if you were to do it yourself,

and so if you were to do it yourself, have fun. Um, and then also like the

have fun. Um, and then also like the APIs keep changing, right? So we have

APIs keep changing, right? So we have completions, we have responses, we have

completions, we have responses, we have whatever else is coming in the future.

whatever else is coming in the future. What does the model know how to use and

What does the model know how to use and get to get the most intelligence out of

get to get the most intelligence out of the box?

the box? And so

And so this is the interesting part. Fitting a

this is the interesting part. Fitting a model into a harness takes a lot of

model into a harness takes a lot of prompting.

prompting. It turns out that how the model is

It turns out that how the model is trained has side effects.

trained has side effects. I like to think about it this way.

I like to think about it this way. Intelligence plus habit. Intelligence.

Intelligence plus habit. Intelligence. What is the model good at? What

What is the model good at? What languages does it know really well? What

languages does it know really well? What is what is its capabilities in terms of

is what is its capabilities in terms of like how well it can write code in

like how well it can write code in certain frameworks? And then what habits

certain frameworks? And then what habits did it learn to to use to solve those

did it learn to to use to solve those problems? We've trained our models to

problems? We've trained our models to have habits of like planning a solution,

have habits of like planning a solution, looking around, gathering context, and

looking around, gathering context, and and thinking about a problem before

and thinking about a problem before diving in and writing code, and then

diving in and writing code, and then testing its work at the end.

testing its work at the end. Developing a feel for these habits is

Developing a feel for these habits is how you become a good prompt engineer.

how you become a good prompt engineer. If you don't instruct the model in ways

If you don't instruct the model in ways that it's familiar with, you can have

that it's familiar with, you can have problems. We saw this when we launched

problems. We saw this when we launched GPD5. A lot of people who weren't used

GPD5. A lot of people who weren't used to using our models encoding tried to

to using our models encoding tried to take prompts that existed for other

take prompts that existed for other models and put them into their harness

models and put them into their harness and have GPD5 follow those instructions.

and have GPD5 follow those instructions. And it turned out that we taught our

And it turned out that we taught our model to do some of the things that the

model to do some of the things that the other models didn't really do out of the

other models didn't really do out of the box. And so when they were prompting

box. And so when they were prompting them to look really hard at the context

them to look really hard at the context and like examine every single file

and like examine every single file before making a a code edit, our model

before making a a code edit, our model was being very kind of thorough about

was being very kind of thorough about that and it was taking a really long

that and it was taking a really long time and they weren't seeing the best

time and they weren't seeing the best performance. And so we figured out that

performance. And so we figured out that if you let the model just do the

if you let the model just do the behaviors that it's used to and don't

behaviors that it's used to and don't overprompt it, it'll actually perform

overprompt it, it'll actually perform really better. We found out by asking. I

really better. We found out by asking. I was literally like, "Hey, like I like

was literally like, "Hey, like I like the solution, but it took you a long

the solution, but it took you a long time to get there. What can I do

time to get there. What can I do differently in your instructions to help

differently in your instructions to help you get there faster next time?" And

you get there faster next time?" And literally it said, "Uh, you're telling

literally it said, "Uh, you're telling me to go look at everything and I don't

me to go look at everything and I don't really need to. So that's what's taking

really need to. So that's what's taking forever."

And so you can actually see the advantages of building both the model

advantages of building both the model and the harness together because you

and the harness together because you just like know all of that while you're

just like know all of that while you're building it. And that's why Codex is

building it. And that's why Codex is both a model and a harness combined.

both a model and a harness combined. So let's dig deeper into Codeex and what

So let's dig deeper into Codeex and what it can actually do.

it can actually do. So we built Codex to be an agent for

So we built Codex to be an agent for everywhere that you code. It's a VS Code

everywhere that you code. It's a VS Code plugin. It's a CLI. You can call it in

plugin. It's a CLI. You can call it in the cloud from the VS Code plugin or

the cloud from the VS Code plugin or from chatgbt from your phone. Um, and

from chatgbt from your phone. Um, and it's very basic. You can use it to turn

it's very basic. You can use it to turn your specs into runnable code starting

your specs into runnable code starting from a prompt. Um, having a plan. It

from a prompt. Um, having a plan. It navigates your repo to edit files. It

navigates your repo to edit files. It runs commands, executes tasks, and you

runs commands, executes tasks, and you can call it from Slack or you can have

can call it from Slack or you can have it review PRs and GitHub. So, all of the

it review PRs and GitHub. So, all of the things that you would expect.

things that you would expect. And that means that the that codec um

And that means that the that codec um the harness of codec needs to be able to

the harness of codec needs to be able to do a lot of really complex things. Uh

do a lot of really complex things. Uh when I talked to a member of the codeex

when I talked to a member of the codeex team about this slide and what should be

team about this slide and what should be on it, he was like it's way harder than

on it, he was like it's way harder than you think. [laughter]

you think. [laughter] You have to manage parallel tool calls

You have to manage parallel tool calls like thread merging and all of the

like thread merging and all of the things involved in that. Think about all

things involved in that. Think about all the security considerations you have

the security considerations you have with sandboxing, prompt forwarding,

with sandboxing, prompt forwarding, permissions, uh, port management. Um,

permissions, uh, port management. Um, compaction is a whole thing. Um, and

compaction is a whole thing. Um, and doing that well is really complex. When

doing that well is really complex. When do you trigger compaction? When do you

do you trigger compaction? When do you reingject? How do you worry about uh

reingject? How do you worry about uh cache optimization during that MCP,

cache optimization during that MCP, right? Like all of the uh plumbing you

right? Like all of the uh plumbing you have to build for MCP support into the

have to build for MCP support into the harness. Uh, and then not even

harness. Uh, and then not even mentioning images and what's the

mentioning images and what's the resolution that you need to compress

resolution that you need to compress them to to send them to the model. All

them to to send them to the model. All this all of this is like work that you

this all of this is like work that you have to do if you're going to build this

have to do if you're going to build this from scratch and keep it updated as new

from scratch and keep it updated as new features come online.

So since we've bundled all of these features together for you in an agent

features together for you in an agent that can safely write its own tools to

that can safely write its own tools to solve new problems that it encounters.

solve new problems that it encounters. Oops.

Oops. Uh we actually have here uh a computer

Uh we actually have here uh a computer use agent for the terminal.

Wow, that sounds quite a bit powerful than just plain old coding agent,

than just plain old coding agent, doesn't it? Um but just think about it

doesn't it? Um but just think about it again. Well, before browser and graphic

again. Well, before browser and graphic user interface was a thing, wasn't that

user interface was a thing, wasn't that how we always operate a computer?

how we always operate a computer? they're writing code and chaining them

they're writing code and chaining them together in a command line interface. So

together in a command line interface. So that means if you can express your tasks

that means if you can express your tasks in command line as well as files tasks

in command line as well as files tasks codeex will be able to know what to do.

codeex will be able to know what to do. Um the example is I like to use codeex

Um the example is I like to use codeex to organize a lot of the photos from my

to organize a lot of the photos from my desktop into a folder and that's a very

desktop into a folder and that's a very simple use case but what it can also do

simple use case but what it can also do is it can analyze huge amounts of CSV

is it can analyze huge amounts of CSV files inside of a folder uh doing data

files inside of a folder uh doing data analysis it does not have to be a coding

analysis it does not have to be a coding task and if it can be accomplished by

task and if it can be accomplished by running tools from command line you can

running tools from command line you can use codeex

use codeex so now that we see codeex is such a cool

so now that we see codeex is such a cool harness um I want to also share a little

harness um I want to also share a little a bit about how you can use it to build

a bit about how you can use it to build your own agents. And what you can do is

your own agents. And what you can do is you can use codeex the agent inside of

you can use codeex the agent inside of your own agent.

your own agent. Um, how does that work? Well, if you

Um, how does that work? Well, if you want to build uh a coding uh the next

want to build uh a coding uh the next coding startup, we don't really have all

coding startup, we don't really have all the answers, but we do have a few

the answers, but we do have a few patterns uh that we thought uh might

patterns uh that we thought uh might help you having worked with some of the

help you having worked with some of the top coding customers uh like cursor and

top coding customers uh like cursor and VS code. Uh one of those patterns is uh

VS code. Uh one of those patterns is uh harness becoming the new abstraction

harness becoming the new abstraction layer. The benefits of this is quite

layer. The benefits of this is quite obvious. Um, you no longer have to care

obvious. Um, you no longer have to care about prioritize uh optimizing the

about prioritize uh optimizing the prompt and tools with every model

prompt and tools with every model upgrade.

upgrade. >> But, um, does that mean you're just

>> But, um, does that mean you're just building a wrapper?

building a wrapper? >> Well, I disagree with that take.

>> Well, I disagree with that take. [snorts] I disagree. I was disagreeing

[snorts] I disagree. I was disagreeing with my colleague here. Um, just like

with my colleague here. Um, just like how building rappers on top of models I

how building rappers on top of models I think is really reductive on uh on the

think is really reductive on uh on the whole value prop of the infrastructure

whole value prop of the infrastructure layer. Sorry, I used to be a VC.

layer. Sorry, I used to be a VC. [laughter]

[laughter] >> Focusing most of your efforts on

>> Focusing most of your efforts on differentiating your product is what

differentiating your product is what this pattern allows you to do. And

this pattern allows you to do. And that's where most of the value lies.

Exactly. Okay. So, let's look at some of these patterns that we've seen and

these patterns that we've seen and actually have helped our customers build

actually have helped our customers build um along with them. Codeex is an SDK. It

um along with them. Codeex is an SDK. It can be called through a TypeScript

can be called through a TypeScript library. You can call it

library. You can call it programmatically in a Python exec.

programmatically in a Python exec. There's a GitHub action that you can

There's a GitHub action that you can plug into to have it merge merge

plug into to have it merge merge conflicts on PRs that everybody hates

conflicts on PRs that everybody hates doing. Then uh you can also add it to

doing. Then uh you can also add it to the agents SDK and give it MCP

the agents SDK and give it MCP connectors back to your product. So now

connectors back to your product. So now you have an agent. I like to say we

you have an agent. I like to say we started with chat bots that you can talk

started with chat bots that you can talk to. Then we gave the chatbots tools to

to. Then we gave the chatbots tools to use and then now you can give uh a tool

use and then now you can give uh a tool to your chatbot that can make other

to your chatbot that can make other tools that it doesn't have. And so now

tools that it doesn't have. And so now you can actually build out enterprise

you can actually build out enterprise software that does it that writes its

software that does it that writes its own plug-in connectors to the API level

own plug-in connectors to the API level for each customer on the spot. That's

for each customer on the spot. That's something that a professional services

something that a professional services team used to have to do. Um, so you have

team used to have to do. Um, so you have fully customizable software that can now

fully customizable software that can now talk back to itself. Um, I made a conbon

talk back to itself. Um, I made a conbon board for Devday that can actually fix

board for Devday that can actually fix its own bugs. Um, it's pretty fun. And

its own bugs. Um, it's pretty fun. And then lastly, um, you can actually do

then lastly, um, you can actually do something like what Zed has done. They

something like what Zed has done. They have just decided to wrap codeex inside

have just decided to wrap codeex inside of a layer and give it an interface to

of a layer and give it an interface to the IDE for talking back and forth for

the IDE for talking back and forth for the user and making code edits. And now

the user and making code edits. And now they don't actually have to do all the

they don't actually have to do all the work of staying on top of all of the

work of staying on top of all of the things that we're good at doing and they

things that we're good at doing and they can focus on building like the best code

can focus on building like the best code editor.

Uh so our top coding partners like GitHub has used this uh to great effect

GitHub has used this uh to great effect and well uh we've created an SDK for it

and well uh we've created an SDK for it that they used to directly integrate uh

that they used to directly integrate uh with codeex. You can also use the SDK to

with codeex. You can also use the SDK to uh control codecs as part of your CI/CD

uh control codecs as part of your CI/CD pipeline as well as use it as an agent

pipeline as well as use it as an agent that directly interacts with your own

that directly interacts with your own agent as well. Uh if you really want to

agent as well. Uh if you really want to customize the agent layer, you can do it

customize the agent layer, you can do it too. As an example of this, we worked

too. As an example of this, we worked with closely with the cursor team to get

with closely with the cursor team to get the best performance out of the codecs.

the best performance out of the codecs. The model, not the agent, we're bad at

The model, not the agent, we're bad at naming things. The model is different

naming things. The model is different from the agent. They did so by aligning

from the agent. They did so by aligning their tools to be in distribution with

their tools to be in distribution with how the model is trained and they did so

how the model is trained and they did so by aligning uh their harness with our

by aligning uh their harness with our open- source uh implementation of codeex

open- source uh implementation of codeex CLI. All of this is publicly available.

CLI. All of this is publicly available. Uh you can fork the repo, you can use

Uh you can fork the repo, you can use our source code, you can use it. Uh go

our source code, you can use it. Uh go nuts.

So what does the future hold for Codeex? It hasn't even been out for a year. Um

It hasn't even been out for a year. Um and especially with the lo la la la la

la la la la la la la la la la la la la la la la la la la la la la la la la la

la la la la la la la la la la la la la la la la la la la la la la launch of

la la la la la la la la la launch of codeex match yesterday like things are

codeex match yesterday like things are really changing fast. Uh it's the

really changing fast. Uh it's the fastest growing model in usage now

fastest growing model in usage now serving dozens of trillions of tokens

serving dozens of trillions of tokens per week which has actually doubled

per week which has actually doubled since dev day.

since dev day. It's always good to build where the

It's always good to build where the models are going. It's safe to assume

models are going. It's safe to assume that the models will get better. They'll

that the models will get better. They'll be able to get to work on much longer

be able to get to work on much longer horizon tasks unsupervised.

horizon tasks unsupervised. New models will raise the trust ceiling.

New models will raise the trust ceiling. I trust these models now to do some way

I trust these models now to do some way harder work than I would have six months

harder work than I would have six months ago. And that's going to keep

ago. And that's going to keep increasing. The future is about

increasing. The future is about sprawling code bases and non-standard

sprawling code bases and non-standard libraries and knowing how to work in

libraries and knowing how to work in closed source environments, matching

closed source environments, matching existing templates and practices

existing templates and practices and the models uh and and and so you can

and the models uh and and and so you can imagine that the SDK will evolve to

imagine that the SDK will evolve to better support these model capabilities,

better support these model capabilities, letting the model learn as it goes and

letting the model learn as it goes and not repeat mistakes and generally

not repeat mistakes and generally provide more surface area for an agent

provide more surface area for an agent that writes code and uses a terminal to

that writes code and uses a terminal to solve whatever problems it encounters.

solve whatever problems it encounters. counters and you can use that in your

counters and you can use that in your products via the SDK.

products via the SDK. So, what have we learned? Harnesses are

So, what have we learned? Harnesses are really complicated and take a lot of

really complicated and take a lot of work to maintain, especially with all

work to maintain, especially with all the new models coming out. So, we've

the new models coming out. So, we've built one for you inside of Codeex that

built one for you inside of Codeex that you can use off the shelf or look at the

you can use off the shelf or look at the source if you want to and you can use it

source if you want to and you can use it to build new things outside of coding

to build new things outside of coding and let us do all of the work making

and let us do all of the work making sure that you have the most capable

sure that you have the most capable computer agent.

computer agent. And we're really excited to see what you

And we're really excited to see what you craft.

Our [applause] [music]

next presenters believe that most enterprises are failing to unlock real

enterprises are failing to unlock real value from AI because the systems in

value from AI because the systems in which they operate are [music] stuck in

which they operate are [music] stuck in the past. Here to share how agents are

the past. Here to share how agents are reshaping software delivery are McKenzie

reshaping software delivery are McKenzie partners Martin Harrison and Natasha

partners Martin Harrison and Natasha Mania.

>> All right, good morning. Hello everyone. It's really great to be here. Uh so I'm

It's really great to be here. Uh so I'm Martin and I'm here with my colleague

Martin and I'm here with my colleague Natasha. Uh we're from a part of

Natasha. Uh we're from a part of Mckenzie you may may not be as familiar

Mckenzie you may may not be as familiar with. We have a practice called software

with. We have a practice called software X and we work with uh mostly enterprise

X and we work with uh mostly enterprise clients on how to build better software

clients on how to build better software products which has messed mostly using

products which has messed mostly using AI uh in the in the past couple of

AI uh in the in the past couple of years.

years. Uh and so what our talk is about today

Uh and so what our talk is about today is really more focused on the people and

is really more focused on the people and the operating model aspects of

the operating model aspects of leveraging AI for software development

leveraging AI for software development and and that we believe that that has

and and that we believe that that has changed quite significantly and and

changed quite significantly and and that's what we're excited to talk to you

that's what we're excited to talk to you about.

about. If I take a quick step back uh in in

If I take a quick step back uh in in time and we just uh you know think

time and we just uh you know think through some of these the major

through some of these the major technology breakthroughs that we've seen

technology breakthroughs that we've seen in the last few decades uh they tend to

in the last few decades uh they tend to always come with a paradigm shift in

always come with a paradigm shift in also how we develop software and so I

also how we develop software and so I still recall uh almost 20 years ago now

still recall uh almost 20 years ago now I started working as a software engineer

I started working as a software engineer an entry- level developer um in a tech

an entry- level developer um in a tech company and the company I was working

company and the company I was working for was just switching to to agile we

for was just switching to to agile we were using camb boards we were doing uh

were using camb boards we were doing uh standups and and other ceremonies. This

standups and and other ceremonies. This was a big change. It was a massive

was a big change. It was a massive change for the for the company. And now

change for the for the company. And now with everything that is happening

with everything that is happening happening in AI, we're at the precipice

happening in AI, we're at the precipice of another such paradigm shift.

of another such paradigm shift. And

And um

um if we think about some of the um some of

if we think about some of the um some of the things that are happening um with AI

the things that are happening um with AI and software development that we've seen

and software development that we've seen at this um at this conference, there's

at this um at this conference, there's no doubt that this is a new paradigm

no doubt that this is a new paradigm that is about us. And so we'll talk

that is about us. And so we'll talk about two things. Uh we'll first touch a

about two things. Uh we'll first touch a little bit about how do you go from

little bit about how do you go from these things that we're seeing at

these things that we're seeing at individual productivity to scaling that

individual productivity to scaling that to the whole team and what that what

to the whole team and what that what type of changes we think that implies

type of changes we think that implies and then we'll talk a little bit uh

and then we'll talk a little bit uh about how do you scale that across uh a

about how do you scale that across uh a whole organization and to really get get

whole organization and to really get get value

um if if you sort of I I'm talking to an

if if you sort of I I'm talking to an audience here which is using AI agents

audience here which is using AI agents all the time and I if I if I asked you

all the time and I if I if I asked you about some examples. I'm sure you could

about some examples. I'm sure you could rattle off, you know, 10 different ones

rattle off, you know, 10 different ones where you would say, "Look, there was

where you would say, "Look, there was this thing that I used to do. It it used

this thing that I used to do. It it used to take uh maybe even days and and and

to take uh maybe even days and and and hours that are now taking only minutes,

hours that are now taking only minutes, right? There's no shortage of those

right? There's no shortage of those those stories. And you can go over to

those stories. And you can go over to the expo and and talk to any of the

the expo and and talk to any of the companies there about all these all

companies there about all these all these great use cases. It really shows

these great use cases. It really shows that these tools work and they can be

that these tools work and they can be really impactful." And so yet despite

really impactful." And so yet despite seeing you know some of these uh

seeing you know some of these uh improvement uh improvements

improvement uh improvements uh we've done some research to gauge you

uh we've done some research to gauge you know where are our clients at the

know where are our clients at the moment. We we recently surveyed about

moment. We we recently surveyed about 300 uh companies uh mostly enterprises

300 uh companies uh mostly enterprises around what are they seeing in terms of

around what are they seeing in terms of productivity improvements. So you have

productivity improvements. So you have this and then they would say uh on

this and then they would say uh on average we're often seeing only 5 10 15%

average we're often seeing only 5 10 15% improvements overall as as a company. So

improvements overall as as a company. So we're in a place where there's a bit of

we're in a place where there's a bit of a disconnect between this this big

a disconnect between this this big potential uh around AI as uh from the

potential uh around AI as uh from the reality.

reality. And so we we think that um there is this

And so we we think that um there is this gap because as we've started

gap because as we've started implementing AI whether it's um you know

implementing AI whether it's um you know coding assistance or whether it's now

coding assistance or whether it's now using you know you just heard about uh

using you know you just heard about uh you know how open AI is using agents and

you know how open AI is using agents and more complex uh workflows. What has

more complex uh workflows. What has started to emerge is a is a set of

started to emerge is a is a set of bottlenecks uh that that were not

bottlenecks uh that that were not necessarily there before. Like for for

necessarily there before. Like for for example, as we now start moving much

example, as we now start moving much faster in certain in certain aspects of

faster in certain in certain aspects of the work, uh we haven't really changed

the work, uh we haven't really changed how we collaborate among people and and

how we collaborate among people and and team members. That's not quite keeping

team members. That's not quite keeping up.

up. We started generating way more more

We started generating way more more code, but we're it's still being

code, but we're it's still being reviewed in a in a pretty manual way in

reviewed in a in a pretty manual way in in many companies. Then we also have

in many companies. Then we also have this this theme which was recently

this this theme which was recently highlighted in in even a research report

highlighted in in even a research report from from Carnegie Melon uh about how

from from Carnegie Melon uh about how all the new code that is being generated

all the new code that is being generated is also amplifying uh the generation of

is also amplifying uh the generation of tech debt in some in some cases and

tech debt in some in some cases and actually generating complexity. And so

actually generating complexity. And so there are these bottlenecks. They're not

there are these bottlenecks. They're not impossible to overcome but this is what

impossible to overcome but this is what we believe is limiting uh many companies

we believe is limiting uh many companies from seeing the the the real value that

from seeing the the the real value that that they should be seeing.

that they should be seeing. Let me talk about maybe just a couple of

Let me talk about maybe just a couple of examples to to make that uh come to life

examples to to make that uh come to life a little bit more. One of the things

a little bit more. One of the things that we see as a big rate limiter at the

that we see as a big rate limiter at the moment is around how work is allocated.

moment is around how work is allocated. And so what what we've learned over the

And so what what we've learned over the last couple of years is that the impact

last couple of years is that the impact from AI and agents is highly uneven.

from AI and agents is highly uneven. There are some tasks which where it

There are some tasks which where it works amazingly well today and you see

works amazingly well today and you see uh huge improvements and there are

uh huge improvements and there are others where it it's not as effective

others where it it's not as effective and so you have that variability. You

and so you have that variability. You also have variability among people. Some

also have variability among people. Some have have uh lots of experience now

have have uh lots of experience now using these tools and and know how to

using these tools and and know how to pick that up and others uh are less

pick that up and others uh are less experienced right now and so what that

experienced right now and so what that means for for team leaders for

means for for team leaders for engineering managers and so on is it's

engineering managers and so on is it's very highly non-trivial to know how to

very highly non-trivial to know how to allocate work and resources in in a good

allocate work and resources in in a good way and this is creating a lot of

way and this is creating a lot of inefficiencies.

inefficiencies. Another example uh is is around how work

Another example uh is is around how work is being reviewed. So agents are often

is being reviewed. So agents are often giving given pretty uh fuzzy uh you know

giving given pretty uh fuzzy uh you know stories that are written in pros with

stories that are written in pros with pretty fussy acceptance criteria. Uh

pretty fussy acceptance criteria. Uh which which means that the code that

which which means that the code that comes back is not always what it was

comes back is not always what it was intended to be and and for many

intended to be and and for many companies the only mechanism to control

companies the only mechanism to control that is is often manual review. So

that is is often manual review. So you've automated some things but we've

you've automated some things but we've generated more manual review. So these

generated more manual review. So these are some of the some of the examples of

are some of the some of the examples of uh these bottlenecks that we that we see

uh these bottlenecks that we that we see coming up.

And as mentioned what what has that has resulted in so far is that most most

resulted in so far is that most most large companies today uh are are stuck a

large companies today uh are are stuck a little bit in in a world of relatively

little bit in in a world of relatively marginal gains. Uh they're working in

marginal gains. Uh they're working in ways that was developed with constraints

ways that was developed with constraints that we had in the past paradigm of

that we had in the past paradigm of human development. So you have you you

human development. So you have you you know if you go out to most companies you

know if you go out to most companies you see 8 to 10 person teams you see working

see 8 to 10 person teams you see working in two week sprints you have all these

in two week sprints you have all these these elements that were largely parts

these elements that were largely parts of like an of an agile operating uh

of like an of an agile operating uh model and that is and that is uh putting

model and that is and that is uh putting in some some limits to what they can

in some some limits to what they can see. Over the past year, we've been

see. Over the past year, we've been working with lots of clients to to sort

working with lots of clients to to sort of break that model a bit uh and develop

of break that model a bit uh and develop new ways of of working in smaller teams

new ways of of working in smaller teams in new roles uh in with shorter cycles.

in new roles uh in with shorter cycles. And when you do that, we see really

And when you do that, we see really great performance improvements and

great performance improvements and that's what gives you gives us this uh

that's what gives you gives us this uh path to where we see things are going to

path to where we see things are going to improve.

So we realized that rewiring the PDLC is not just a one-sizefits-all solution.

not just a one-sizefits-all solution. For example, different types of

For example, different types of engineering functions across the

engineering functions across the enterprise along the product life cycle

enterprise along the product life cycle may require different operating models

may require different operating models based on how humans and agents best

based on how humans and agents best collaborate. So if we take the example

collaborate. So if we take the example of modernizing legacy code bases, this

of modernizing legacy code bases, this task requires a high context of

task requires a high context of potentially the entire codebase but also

potentially the entire codebase but also has clearly well- definfined outputs. So

has clearly well- definfined outputs. So an example operating model could look

an example operating model could look like a factory of agents where humans

like a factory of agents where humans provide an initial spec and final review

provide an initial spec and final review with minimal intervention.

with minimal intervention. For new features for green field and

For new features for green field and brownfield projects, the operating model

brownfield projects, the operating model may look like an iterative loop because

may look like an iterative loop because they may benefit from the

they may benefit from the non-deterministic outputs and increased

non-deterministic outputs and increased variation where agents act as

variation where agents act as co-creators um providing more options to

co-creators um providing more options to facilitate faster feedback loops.

facilitate faster feedback loops. So, as we mentioned, we did a survey

So, as we mentioned, we did a survey among 300 enterprises globally to

among 300 enterprises globally to understand what sets these top

understand what sets these top performers apart. We found that they are

performers apart. We found that they are seven times more likely to have AI

seven times more likely to have AI native workflows which meant scaling

native workflows which meant scaling over four use cases across the software

over four use cases across the software development life cycle rather than just

development life cycle rather than just having point solutions for just code

having point solutions for just code review or for just code dov. They were

review or for just code dov. They were also six times more likely to have AI

also six times more likely to have AI native roles which meant having smaller

native roles which meant having smaller pods with different skill sets and new

pods with different skill sets and new roles.

roles. To enable these shifts, these

To enable these shifts, these organizations were investing in

organizations were investing in continuous and hands-on upskilling,

continuous and hands-on upskilling, impact measurement, and also incentive

impact measurement, and also incentive structures to incentivize developers and

structures to incentivize developers and PMs to adopt AI.

PMs to adopt AI. This led to five to six times increase

This led to five to six times increase in time to market and delivery speed as

in time to market and delivery speed as well as higher quality and more

well as higher quality and more consistent artifacts.

consistent artifacts. So when we talk about AI native

So when we talk about AI native workflows we mean that these enterprises

workflows we mean that these enterprises are moving away from quarterly planning

are moving away from quarterly planning to continuous planning and also um the

to continuous planning and also um the unit of work is moving from storydriven

unit of work is moving from storydriven to spec driven development so that these

to spec driven development so that these PMs are iterating on the specs with

PMs are iterating on the specs with agents rather than iterating on long

agents rather than iterating on long PRDs.

PRDs. On the talent side, AI native roles

On the talent side, AI native roles essentially means that we're moving away

essentially means that we're moving away from the two pizza structure to one

from the two pizza structure to one pizza pods of three to five individuals.

pizza pods of three to five individuals. Instead of having separate QA front-end

Instead of having separate QA front-end and back-end engineers, there are more

and back-end engineers, there are more consolidated roles where product

consolidated roles where product builders are managing and orchestrating

builders are managing and orchestrating agents with full stack fluency and also

agents with full stack fluency and also a better understanding of the full

a better understanding of the full architecture of their codebase. PMS are

architecture of their codebase. PMS are starting to create direct um prototypes

starting to create direct um prototypes in code rather than iterating on these

in code rather than iterating on these long PRDs.

long PRDs. And one example um that we've described

And one example um that we've described in our article, we've studied some AI

in our article, we've studied some AI native startups and realized that

native startups and realized that they've actually implemented all of

they've actually implemented all of these shifts to accelerate their

these shifts to accelerate their outcomes. And in our article, we've

outcomes. And in our article, we've described how cursor actually operates

described how cursor actually operates internally.

internally. But if you're a large enterprise

But if you're a large enterprise predicated on the agile model, what are

predicated on the agile model, what are some steps you can take? So in in a

some steps you can take? So in in a recent client study with a leading

recent client study with a leading international bank, we tested some team

international bank, we tested some team level interventions to address the

level interventions to address the bottlenecks previously mentioned before

bottlenecks previously mentioned before mainly around the sequencing of steps

mainly around the sequencing of steps within the agile ceremony and how uh to

within the agile ceremony and how uh to define the roles of agents and humans

define the roles of agents and humans within the sprint cycle. So let's walk

within the sprint cycle. So let's walk through some examples.

through some examples. First, team leads would assign sprint

First, team leads would assign sprint stories using agents based on the data

stories using agents based on the data of the team velocity and delivery

of the team velocity and delivery history. And then they would create

history. And then they would create co-create multiple prototypes and

co-create multiple prototypes and iterate with agents on the acceptance

iterate with agents on the acceptance criteria around security and

criteria around security and observability needs to have more

observability needs to have more consistent artifacts across teams. This

consistent artifacts across teams. This prevents downstream rework that was

prevents downstream rework that was mentioned before so that developers

mentioned before so that developers don't have to constantly be iterating

don't have to constantly be iterating with the agents during during the code

with the agents during during the code process. The squads were also

process. The squads were also reorganized by workflow. So there would

reorganized by workflow. So there would be one which would be focused on um

be one which would be focused on um small bug fixes and another focused on

small bug fixes and another focused on green field development. In the

green field development. In the background agents would be used to look

background agents would be used to look and impact uh look at um the potential

and impact uh look at um the potential cross repository impacts um to prevent

cross repository impacts um to prevent debugging time for developers.

debugging time for developers. And another example is that instead of

And another example is that instead of re for reducing the collaboration

re for reducing the collaboration overhead and meetings that happen within

overhead and meetings that happen within this sprint cycle um instead of waiting

this sprint cycle um instead of waiting for data scientist input PMS would

for data scientist input PMS would directly be observing the real-time

directly be observing the real-time customer feedback to rep prioritize

customer feedback to rep prioritize these features

these features and this would lead to an acceleration

and this would lead to an acceleration in the backlog within the same amount of

in the backlog within the same amount of time.

time. So we studied the um impact of these

So we studied the um impact of these interventions and found high promising

interventions and found high promising results. For example, not just the

results. For example, not just the increase in agent consumption by over 60

increase in agent consumption by over 60 times, but there was also an increase in

times, but there was also an increase in the delivery speed that was tied

the delivery speed that was tied directly to the business priorities for

directly to the business priorities for this bank. There was a 51% increase in

this bank. There was a 51% increase in code mergers, but also a decrease in um

code mergers, but also a decrease in um an increase in efficiency.

an increase in efficiency. The other aspect of this is is uh around

The other aspect of this is is uh around the different roles and and and the

the different roles and and and the talent model. And so one of the biggest

talent model. And so one of the biggest differentiators that we saw as mentioned

differentiators that we saw as mentioned was around what you have actually

was around what you have actually changed the roles that uh that are

changed the roles that uh that are involved in software development. And so

involved in software development. And so you know what what you all are seeing is

you know what what you all are seeing is that engineers are moving away from

that engineers are moving away from execution and and just simply writing

execution and and just simply writing code to being more of orchestrators and

code to being more of orchestrators and and thinking through more how to divide

and thinking through more how to divide up work to agents. for example. And we

up work to agents. for example. And we also heard some examples of how the role

also heard some examples of how the role of the product manager is changing. And

of the product manager is changing. And so while this this may sound, you know,

so while this this may sound, you know, pretty straightforward to many of you

pretty straightforward to many of you here who are who are working with these

here who are who are working with these tools like day-to-day that you have to

tools like day-to-day that you have to change what you do, the reality is that

change what you do, the reality is that about 70% of the companies that we that

about 70% of the companies that we that we survey have have not changed the

we survey have have not changed the roles at all. Right? And so you have

roles at all. Right? And so you have this background expectation that people

this background expectation that people are going to do things differently but

are going to do things differently but the the role is still defined in the

the the role is still defined in the same way and it's the same understanding

same way and it's the same understanding uh as it was you know a couple of years

uh as it was you know a couple of years ago.

ago. Um but we are starting to see you know

Um but we are starting to see you know some companies changing this. So this is

some companies changing this. So this is another example from a from another

another example from a from another recent recent client. They were set up

recent recent client. They were set up in a in a way that is, you know, pretty

in a in a way that is, you know, pretty common for for u many companies and a

common for for u many companies and a kind of typical two pizza uh team model

kind of typical two pizza uh team model with with the types of roles that you

with with the types of roles that you would be familiar with. Um the we ran a

would be familiar with. Um the we ran a bunch of experiments and front runners

bunch of experiments and front runners and and tested new models that were had

and and tested new models that were had much smaller pods uh that had uh new

much smaller pods uh that had uh new roles which consolidated some of the

roles which consolidated some of the tasks that were previously done with

tasks that were previously done with different roles.

different roles. And and so by doing that we could we

And and so by doing that we could we could create basically more pods or more

could create basically more pods or more teams uh with with the same number of

teams uh with with the same number of people uh but retaining the expectation

people uh but retaining the expectation that each pod is is uh is um performing

that each pod is is uh is um performing at about the same level as as they were

at about the same level as as they were before.

before. And so so we also see really uh really

And so so we also see really uh really positive results from that uh with with

positive results from that uh with with uh maintaining and even improving in

uh maintaining and even improving in some is the quality of the code that was

some is the quality of the code that was generated. In particular there was a

generated. In particular there was a there was a high speed up in in terms of

there was a high speed up in in terms of uh the output from from the different

uh the output from from the different teams and you can see some of the

teams and you can see some of the metrics uh here.

Let's shift gears a little bit and and and go from talking about just the team

and go from talking about just the team level. So how does this now scale uh

level. So how does this now scale uh across a big organization?

across a big organization? The reality is that many many companies

The reality is that many many companies don't just have like one or two of these

don't just have like one or two of these these teams but often hundreds of teams

these teams but often hundreds of teams even and thousands or even tens of

even and thousands or even tens of thousands of people who are working in

thousands of people who are working in this way. And uh this is where one of

this way. And uh this is where one of the biggest differences that we that we

the biggest differences that we that we saw between those that are stuck a bit

saw between those that are stuck a bit in the um in in getting only 10% or so

in the um in in getting only 10% or so change improvements from those who are

change improvements from those who are seeing outsized improvements is around

seeing outsized improvements is around how you manage that how you manage that

how you manage that how you manage that change and change management I is like

change and change management I is like one of these is a little bit of an often

one of these is a little bit of an often catch or elusive term for uh for a lot

catch or elusive term for uh for a lot of different things but but I think in

of different things but but I think in some ways it's not a bad way to think

some ways it's not a bad way to think about Right. I I usually say that the

about Right. I I usually say that the change management is about getting a lot

change management is about getting a lot of like small things right. And so the

of like small things right. And so the crux to like actually scaling this is

crux to like actually scaling this is often about getting 20 30 or even more

often about getting 20 30 or even more things right at the same time that

things right at the same time that involve the way you communicate uh what

involve the way you communicate uh what this means, the way you incentivize

this means, the way you incentivize people, uh the way you upskill them, and

people, uh the way you upskill them, and it all has to come together.

it all has to come together. Um and when it when it's not, we we we

Um and when it when it's not, we we we see what happens. And so this is an

see what happens. And so this is an example from from another tech company

example from from another tech company that we worked with um where initially

that we worked with um where initially we're rolling out new AI tools for them

we're rolling out new AI tools for them that that hit different parts of the

that that hit different parts of the product development life cycle. Um we we

product development life cycle. Um we we rolled we rolled out the tools there was

rolled we rolled out the tools there was some usage but often it dropped off. It

some usage but often it dropped off. It was either not used or it was um it was

was either not used or it was um it was sort of um used in very suboptimal ways.

sort of um used in very suboptimal ways. So that's the sort of jagged part that

So that's the sort of jagged part that you're seeing on the on the left hand

you're seeing on the on the left hand side here. despite kind of adding more

side here. despite kind of adding more users uh the overall impact did not

users uh the overall impact did not change at all. So we had to do a quite a

change at all. So we had to do a quite a reset and and um start over effectively

reset and and um start over effectively reset the expectations. What should what

reset the expectations. What should what what does this mean if you're a

what does this mean if you're a developer dayto-day? What does it mean

developer dayto-day? What does it mean for a PM? Uh we had much more hands-on

for a PM? Uh we had much more hands-on upskilling. There was could bring your

upskilling. There was could bring your own code. there were, you know, coaches

own code. there were, you know, coaches available, especially those first like

available, especially those first like few sprints before you get make this a

few sprints before you get make this a habit and work it into the way that you

habit and work it into the way that you develop software dayto-day. It's a very

develop software dayto-day. It's a very critical time and that's when when this

critical time and that's when when this matters a lot. Um, and having a bit of a

matters a lot. Um, and having a bit of a a measurement system as well, so you

a measurement system as well, so you know what's changing and and you're able

know what's changing and and you're able to to see what's uh what's what what's

to to see what's uh what's what what's improving.

improving. Another example just to put this alive a

Another example just to put this alive a little bit as mentioned like this is

little bit as mentioned like this is about getting a lot of things um right

about getting a lot of things um right and it's each one of these individually

and it's each one of these individually may not seem like it's the biggest deal

may not seem like it's the biggest deal uh but put together they really make a

uh but put together they really make a make a huge difference like this is for

make a huge difference like this is for this is some of the top uh interventions

this is some of the top uh interventions that another client had to go through

that another client had to go through for them it really helped having you

for them it really helped having you know setting up code labs for example

know setting up code labs for example really you know instituting a new set of

really you know instituting a new set of certifications that help motivate and

certifications that help motivate and and drive people to to change what they

and drive people to to change what they do day dayto-day. And these these things

do day dayto-day. And these these things really added up to uh the change they

really added up to uh the change they needed.

needed. >> But building a robust measurement system

>> But building a robust measurement system that prioritizes outcomes and not just

that prioritizes outcomes and not just adoption is important not just to

adoption is important not just to monitor progress but also pinpoint

monitor progress but also pinpoint issues and course correct quickly. So,

issues and course correct quickly. So, one surprising result from the survey

one surprising result from the survey was that these enterprises that were

was that these enterprises that were bottom performers were not even

bottom performers were not even measuring speed and only 10% were

measuring speed and only 10% were measuring productivity.

measuring productivity. But our goal is to make our clients top

But our goal is to make our clients top performing organizations. So, we've

performing organizations. So, we've worked with them to create a holistic

worked with them to create a holistic measurement system that captures impact

measurement system that captures impact all the way down to inputs. So for

all the way down to inputs. So for inputs this would include the investment

inputs this would include the investment into coding tools and other AI tools but

into coding tools and other AI tools but also the time and resources in

also the time and resources in upskilling and change management. These

upskilling and change management. These inputs would lead to direct outputs but

inputs would lead to direct outputs but a lot of organizations are just focusing

a lot of organizations are just focusing on how the increased breath and depth of

on how the increased breath and depth of adoption with of AI tools is leading to

adoption with of AI tools is leading to increased velocity and capac capacity

increased velocity and capac capacity increase. However, it's also important

increase. However, it's also important to understand how developers have uh

to understand how developers have uh different uh NPS scores and if they're

different uh NPS scores and if they're enjoying their craft more um rather than

enjoying their craft more um rather than feeling more frustrated. And it's also

feeling more frustrated. And it's also important to understand whether the code

important to understand whether the code is becoming more secure and have has

is becoming more secure and have has better quality but also more resilient.

better quality but also more resilient. And one proxy for resiliency that we

And one proxy for resiliency that we used for our client was the meantime to

used for our client was the meantime to resolve priority bugs.

resolve priority bugs. Now if we look at economic outcomes

Now if we look at economic outcomes which is priority for um the seauite

which is priority for um the seauite executives they look into what is the

executives they look into what is the time to revenue target. What is the

time to revenue target. What is the increased price differential for higher

increased price differential for higher quality features or expanding the number

quality features or expanding the number of customers to meet the feature demand

of customers to meet the feature demand and also what is the cost reduction per

and also what is the cost reduction per pod for reduced human labor.

pod for reduced human labor. In aggregate, having these larger

In aggregate, having these larger economic outcomes can also lead um to

economic outcomes can also lead um to for organizations to understand how

for organizations to understand how there is an increased reinvestment in

there is an increased reinvestment in green field and brownfield development.

green field and brownfield development. But as these tools evolve, the proxies

But as these tools evolve, the proxies for these metrics will also evolve. But

for these metrics will also evolve. But hopefully this provides a MECI framework

hopefully this provides a MECI framework as an initial starting point.

as an initial starting point. So what's next? The future of course is

So what's next? The future of course is difficult to predict, let alone in the

difficult to predict, let alone in the next 5 years. But we hope that with our

next 5 years. But we hope that with our vision of a new software development

vision of a new software development model, even as agents increase in their

model, even as agents increase in their intelligence and humans become more

intelligence and humans become more fluent in AI, that this model still

fluent in AI, that this model still stands. So hopefully this model that

stands. So hopefully this model that includes um shorter sprints, smaller

includes um shorter sprints, smaller teams, but large u smaller but larger

teams, but large u smaller but larger number of teams will set enterprises up

number of teams will set enterprises up for success in the long term.

for success in the long term. >> So just leave you with some some key

>> So just leave you with some some key takeaways. um start now. I would say to

takeaways. um start now. I would say to to our our clients, this is a human

to our our clients, this is a human change and it takes some times and it's

change and it takes some times and it's a big change and and it's going to be a

a big change and and it's going to be a journey and so I think um this is

journey and so I think um this is something that everyone needs to go on.

something that everyone needs to go on. I think it's also important to figure

I think it's also important to figure out which model works for you and set a

out which model works for you and set a really bold ambition and with that say

really bold ambition and with that say thank you so much for listening to us

thank you so much for listening to us and and uh we have an article here if

and and uh we have an article here if you're more interested in in the

you're more interested in in the research that we've conducted. Thank you

research that we've conducted. Thank you so much for having us. Our [applause]

next presenter is a researcher at Stanford who studies how AI impacts over

Stanford who studies how AI impacts over 100,000 developers in the real world.

100,000 developers in the real world. Please welcome Jaor Dennis Blanch.

So companies spend millions on AI tools for software engineering. But do we

for software engineering. But do we actually know how well these tools work

actually know how well these tools work in the enterprise or are these tools

in the enterprise or are these tools just all hype? to answer this and for

just all hype? to answer this and for the past two years we've been

the past two years we've been researching the impact of AI on software

researching the impact of AI on software engineering productivity and our

engineering productivity and our research is time series because we look

research is time series because we look at get historical data meaning we can go

at get historical data meaning we can go back in time and it's also

back in time and it's also cross-sectional because we cut across

cross-sectional because we cut across companies and the way we use to measure

companies and the way we use to measure most of the of the impact is by a

most of the of the impact is by a machine learning model that replicates a

machine learning model that replicates a panel of human experts. The way this

panel of human experts. The way this works is that imagine you have a

works is that imagine you have a software engineer who writes a code

software engineer who writes a code commit and this code commit would be

commit and this code commit would be evaluated by multiple panels or of 10

evaluated by multiple panels or of 10 and 15 independent experts who would

and 15 independent experts who would evaluate that code commit across

evaluate that code commit across implementation time maintainability and

implementation time maintainability and complexity and then produce an output

complexity and then produce an output evaluation. So we took the labels of

evaluation. So we took the labels of these panels across you know millions of

these panels across you know millions of of kind of evaluations and then trained

of kind of evaluations and then trained a model to replicate this panel of

a model to replicate this panel of experts meaning that we can deploy this

experts meaning that we can deploy this at scale and if there's ever any doubts

at scale and if there's ever any doubts around the models output you can always

around the models output you can always kind of assemble your own panel and see

kind of assemble your own panel and see that it correlates pretty well with

that it correlates pretty well with reality.

reality. Today we'll talk about four things.

Today we'll talk about four things. We'll start off with looking at some of

We'll start off with looking at some of the things that are driving AI

the things that are driving AI productivity gains in software. Then

productivity gains in software. Then we'll look at a AI practices benchmark

we'll look at a AI practices benchmark that we developed. We'll then look at

that we developed. We'll then look at how we propose to measure AI return on

how we propose to measure AI return on investment in software engineering. And

investment in software engineering. And lastly, we'll finish things off with a

lastly, we'll finish things off with a case study.

case study. So here we took 46 teams that were using

So here we took 46 teams that were using AI and we matched them with 46 similar

AI and we matched them with 46 similar teams that were not using AI and we

teams that were not using AI and we measured their net productivity gains

measured their net productivity gains from AI quarterly. And the shaded area

from AI quarterly. And the shaded area is the middle 50% of the data and the

is the middle 50% of the data and the dark blue line is the median which as of

dark blue line is the median which as of July of this year stands at about 10%

July of this year stands at about 10% for this cohort.

for this cohort. I'd like to direct your attention to the

I'd like to direct your attention to the fact that the discrepancy between the

fact that the discrepancy between the top performers and the bottom ones is

top performers and the bottom ones is increasing. There's a widening gap. And

increasing. There's a widening gap. And so if we very unscientifically and very

so if we very unscientifically and very illustratively project this forward, we

illustratively project this forward, we might get something like this, right?

might get something like this, right? Where uh you can have these top

Where uh you can have these top performers being part of this the rich

performers being part of this the rich gets richer effect where they these

gets richer effect where they these successful early AI adopters might

successful early AI adopters might compound their gains while these

compound their gains while these strugglers could fall further behind. At

strugglers could fall further behind. At some point this is going to converge and

some point this is going to converge and this is very directional. But my point

this is very directional. But my point here is that if you're a leader in a

here is that if you're a leader in a company, you definitely need to know in

company, you definitely need to know in which cohort you are right now so that

which cohort you are right now so that you can course correct. And without

you can course correct. And without measuring the impact of AI on your

measuring the impact of AI on your engineers, you're not going to be able

engineers, you're not going to be able to do this.

to do this. So we started investigating what are

So we started investigating what are some of the factors that drive these top

some of the factors that drive these top teams to perform better. And the first

teams to perform better. And the first thing we looked at is AI usage or

thing we looked at is AI usage or basically token spent. In this graph you

basically token spent. In this graph you have the same kind of on the vertical

have the same kind of on the vertical axis the productivity increase and then

axis the productivity increase and then on the horizontal one you have the token

on the horizontal one you have the token usage per engineer per month on a

usage per engineer per month on a logarithmic scale. And what you can see

logarithmic scale. And what you can see is that the correlation is quite loose

is that the correlation is quite loose 20 or so linearly. And there is a bit of

20 or so linearly. And there is a bit of a death valley effect around the 10

a death valley effect around the 10 million uh token mark whereby com teams

million uh token mark whereby com teams that were using that amount of tokens

that were using that amount of tokens seem to be doing worse than teams that

seem to be doing worse than teams that were using a bit less tokens. It's very

were using a bit less tokens. It's very directional but interesting

directional but interesting nevertheless.

nevertheless. The conclusion here might be that AI

The conclusion here might be that AI usage quality matters more than AI usage

usage quality matters more than AI usage value.

value. We dug deeper and we said well does the

We dug deeper and we said well does the environment in which the engineers work

environment in which the engineers work impact the productivity from AI and we

impact the productivity from AI and we came up with an environment cleaniness

came up with an environment cleaniness index index. It's quite experimental.

index index. It's quite experimental. It's a composite score that looks at

It's a composite score that looks at tests looks at uh types at documentation

tests looks at uh types at documentation and at modularity and at code quality.

and at modularity and at code quality. And that index is on the bottom axis

And that index is on the bottom axis here from 0 to one. And then on the

here from 0 to one. And then on the vertical axis once again you have the

vertical axis once again you have the kind of productivity lift relative to

kind of productivity lift relative to teams not using AI. And so what you can

teams not using AI. And so what you can see is that there's a40 R squar meaning

see is that there's a40 R squar meaning a pretty decent correlation around

a pretty decent correlation around environment cleanliness and gains from

environment cleanliness and gains from uh AI or productivity gains from using

uh AI or productivity gains from using AI. And so the takeaway here is to

AI. And so the takeaway here is to invest in codebased hygiene to unlock

invest in codebased hygiene to unlock these AI productivity gains.

these AI productivity gains. We dug deeper to illustrate this

We dug deeper to illustrate this concept. And here we have on this graph

concept. And here we have on this graph on the vertical axis the percentage of

on the vertical axis the percentage of tasks that might uh be able to be

tasks that might uh be able to be completed by AI based on three colors.

completed by AI based on three colors. And so green means that AI can do most

And so green means that AI can do most of the work for that task in that

of the work for that task in that sprint. Yellow means that AI can help

sprint. Yellow means that AI can help someone and red uh means that AI is not

someone and red uh means that AI is not very useful. And this is quite

very useful. And this is quite illustrative but it it conveys the

illustrative but it it conveys the point. And so then any code base at any

point. And so then any code base at any point in time sits on a vertical line

point in time sits on a vertical line across this graphic. And what you can

across this graphic. And what you can see is that clean code amplifies AI

see is that clean code amplifies AI gains.

gains. Secondly is that you need to manage your

Secondly is that you need to manage your codebase entropy, right? Your codebase

codebase entropy, right? Your codebase tech debt because if you just use AI

tech debt because if you just use AI unchecked, this is going to accelerate

unchecked, this is going to accelerate this entropy which is going to push and

this entropy which is going to push and degrade your cleaniness to the left kind

degrade your cleaniness to the left kind of right. And then you as as a human

of right. And then you as as a human need to push on the other side to kind

need to push on the other side to kind of improve or maintain that cleanliness

of improve or maintain that cleanliness to keep reaping the benefits from AI.

to keep reaping the benefits from AI. Thirdly is that it's important that

Thirdly is that it's important that engineers need to know when to use AI

engineers need to know when to use AI and when not to use AI. And what happens

and when not to use AI. And what happens when they don't is this kind of line on

when they don't is this kind of line on the left whereby you have AI AI outputs

the left whereby you have AI AI outputs that are rejected or need heavy

that are rejected or need heavy rewriting which then leads to engineers

rewriting which then leads to engineers losing trust in AI saying okay this just

losing trust in AI saying okay this just doesn't work. I'm not going to use it.

doesn't work. I'm not going to use it. Which then further collapses your AI

Which then further collapses your AI gains.

Now, we said, can we find out whether we can look not only at usage but at how

can look not only at usage but at how are these companies and these engineers

are these companies and these engineers using AI? And we came up with an AI

using AI? And we came up with an AI engineering practices benchmark. The way

engineering practices benchmark. The way this works is that we can scan your

this works is that we can scan your codebase and detect these AI

codebase and detect these AI fingerprints or artifacts. Basically,

fingerprints or artifacts. Basically, traces of how your team is using AI.

traces of how your team is using AI. It's quite directional at this point,

It's quite directional at this point, but evolving. And we can quantify this

but evolving. And we can quantify this based on the percentage of your active

based on the percentage of your active engineering work that uses each AI

engineering work that uses each AI pattern. And then we kind of repeat this

pattern. And then we kind of repeat this monthly using get history. And the way

monthly using get history. And the way this works is more or less you have kind

this works is more or less you have kind of a few levels. And level zero might be

of a few levels. And level zero might be how humans are just not using AI and

how humans are just not using AI and write all of the code. Level one is kind

write all of the code. Level one is kind of like personal use where engineers are

of like personal use where engineers are not sharing prompts across the team or

not sharing prompts across the team or not versioning them. Level two is team

not versioning them. Level two is team use whereby teams are are sharing these

use whereby teams are are sharing these kind of prompts and rules. And then

kind of prompts and rules. And then level three is even more sophisticated.

level three is even more sophisticated. It's where AI autonomously does specific

It's where AI autonomously does specific tasks maybe not the entire workflow. And

tasks maybe not the entire workflow. And level four is you know agentic

level four is you know agentic orchestration which is where AI just

orchestration which is where AI just runs the entire process. And so this is

runs the entire process. And so this is going to be an open- source tool which

going to be an open- source tool which you can leverage if you sign up on the

you can leverage if you sign up on the sweeper research portal.

sweeper research portal. We applied this benchmark to one of the

We applied this benchmark to one of the companies in our research data set and

companies in our research data set and we saw this this company had two

we saw this this company had two business units with equal access to AI

business units with equal access to AI tools, right? Same licenses, same spend,

tools, right? Same licenses, same spend, same tools, same everything. But the

same tools, same everything. But the adoption rate and the usage rate was

adoption rate and the usage rate was very different by business unit. On the

very different by business unit. On the left, the first business unit, you can

left, the first business unit, you can as you can see in the area in the blue,

as you can see in the area in the blue, seemed to be using AI a lot more for

seemed to be using AI a lot more for almost 40% of their work. Whereas on the

almost 40% of their work. Whereas on the on the uh right, the second business

on the uh right, the second business unit seem to struggle behind a bit more.

unit seem to struggle behind a bit more. And so the takeaway here is that access

And so the takeaway here is that access to AI and even AI usage doesn't mean or

to AI and even AI usage doesn't mean or doesn't guarantee that that AI is going

doesn't guarantee that that AI is going to be used in the same way across a

to be used in the same way across a company.

company. As a leader, you really want to be

As a leader, you really want to be understanding not just whether they're

understanding not just whether they're using but also how your engineers are

using but also how your engineers are using AI.

Great. Now let's dive into how do we actually measure AI return on investment

actually measure AI return on investment in software engineering.

Oh uh there we go. Okay. So here ideally we would be measuring this based on

we would be measuring this based on business outcomes, right? I give my AI

business outcomes, right? I give my AI engineer my engineers AI and then I make

engineer my engineers AI and then I make more money, more revenue, net revenue

more money, more revenue, net revenue retention, whatever business KPI you

retention, whatever business KPI you want to track. The problem is that

want to track. The problem is that there's too much noise between the

there's too much noise between the treatment right giving AI and the result

treatment right giving AI and the result which is the business outcome. And on

which is the business outcome. And on top of this there's confounding

top of this there's confounding variables such as your sales execution,

variables such as your sales execution, the macro environment, your product

the macro environment, your product strategy and therefore although that

strategy and therefore although that would be ideal unfortunately uh I think

would be ideal unfortunately uh I think we need to find alternative paths and

we need to find alternative paths and the most logical one is to simply look

the most logical one is to simply look at the engineering outcomes because

at the engineering outcomes because there is a clear signal right but here

there is a clear signal right but here we need to go beyond measuring AI usage

we need to go beyond measuring AI usage into measuring engineering outcomes.

into measuring engineering outcomes. There's a few caveats and this topic is

There's a few caveats and this topic is quite heavily discussed and so I want to

quite heavily discussed and so I want to mention some of them.

mention some of them. The first one is that this is assuming

The first one is that this is assuming that our product function can properly

that our product function can properly direct that increased capacity into

direct that increased capacity into something that generates value. And if

something that generates value. And if they aren't directing that, then it's a

they aren't directing that, then it's a product problem, which although sits

product problem, which although sits quite close to engineering, it's

quite close to engineering, it's slightly different, right?

slightly different, right? The second caveat is that this assumes

The second caveat is that this assumes that engineering is a meaningful

that engineering is a meaningful bottleneck for value which frankly it

bottleneck for value which frankly it typically is and that you can guard

typically is and that you can guard against good hards law by using a

against good hards law by using a balanced set of metrics and also by

balanced set of metrics and also by having a good company culture that

having a good company culture that doesn't weaponize these metrics.

doesn't weaponize these metrics. And thirdly is that AI is still very new

And thirdly is that AI is still very new and measuring proxy metrics is still

and measuring proxy metrics is still better than not measuring. There's going

better than not measuring. There's going to be winners and losers in this AI

to be winners and losers in this AI race. And progress is better than

race. And progress is better than perfection here. And so metrics don't

perfection here. And so metrics don't need to be flawless to be useful is what

need to be flawless to be useful is what I want to illustrate.

So then um here we have uh two parts which you need to do to get the ROI from

which you need to do to get the ROI from AI, right? You kind of need to measure

AI, right? You kind of need to measure usage and then you need to measure

usage and then you need to measure engineering outcomes. And so let's start

engineering outcomes. And so let's start with usage.

with usage. There's really two buckets for

There's really two buckets for enterprises. There's kind of more in a

enterprises. There's kind of more in a research environment, but to make it

research environment, but to make it simple, there's access based and there's

simple, there's access based and there's usage based. Accessbased is basically

usage based. Accessbased is basically looking at when did people get access to

looking at when did people get access to the tool. And here we have you can kind

the tool. And here we have you can kind of do a pilot group, give that group AI

of do a pilot group, give that group AI and then compare it to a similar group

and then compare it to a similar group without AI or you can measure the same

without AI or you can measure the same team across time. The problem is that

team across time. The problem is that access based is noisy and the gold

access based is noisy and the gold standard is really usage based which uh

standard is really usage based which uh uses telemetry from APIs from these

uses telemetry from APIs from these coding assistants right to uh give you

coding assistants right to uh give you the right data to know who's using AI

the right data to know who's using AI and and where and the caveat here is

and and where and the caveat here is that the vendor API is different

that the vendor API is different unfortunately tools like GitHub copilot

unfortunately tools like GitHub copilot aggregate the data and other tools like

aggregate the data and other tools like cursor give you more granular data

cursor give you more granular data the big takeaway is that you can measure

the big takeaway is that you can measure impact of um retroactively by using get

impact of um retroactively by using get history. And so you don't need to set up

history. And so you don't need to set up an experiment now and wait 6 months. You

an experiment now and wait 6 months. You can actually if you've already adopted

can actually if you've already adopted AI, you can go back in time and and and

AI, you can go back in time and and and do this. It's quite easy.

do this. It's quite easy. Now we've seen usage. Let's look into

Now we've seen usage. Let's look into how do we actually measure engineering

how do we actually measure engineering outcomes? What are some of the metrics

outcomes? What are some of the metrics we propose?

Here we have um our framework which we proposed which is using a primary metric

proposed which is using a primary metric and a guardrail metric. And so here um

and a guardrail metric. And so here um the primary metric is engineering

the primary metric is engineering output. It's not lines of code. It's not

output. It's not lines of code. It's not PR counts and it's not dura. And it's

PR counts and it's not dura. And it's basically based on this machine learning

basically based on this machine learning model that replicates the panel of

model that replicates the panel of experts, right? And the second set of

experts, right? And the second set of metrics are the guard ones which you

metrics are the guard ones which you want to maintain at a healthy level but

want to maintain at a healthy level but you don't want to maximize. It doesn't

you don't want to maximize. It doesn't make sense to maximize them truly. And

make sense to maximize them truly. And so then there's three categories within

so then there's three categories within the guardrail ones rework and

the guardrail ones rework and refactoring quality tech and risk and

refactoring quality tech and risk and then people and devops. The third bucket

then people and devops. The third bucket is important to highlight that these are

is important to highlight that these are not productivity metrics. They're useful

not productivity metrics. They're useful but you cannot just kind of use them

but you cannot just kind of use them like maximize them to maximize developer

like maximize them to maximize developer productivity. They kind of fall off at

productivity. They kind of fall off at some point. And so the goal here might

some point. And so the goal here might be to keep your guardrail metrics

be to keep your guardrail metrics healthy while increasing the primary

healthy while increasing the primary metric to whatever degree possible.

metric to whatever degree possible. Now let's dive into a case study. Here

Now let's dive into a case study. Here we worked with

we worked with a company that uh large enterprise. We

a company that uh large enterprise. We took a team of uh 350 people under a

took a team of uh 350 people under a vice president and we measured pull

vice president and we measured pull requests. The reason we did this is to

requests. The reason we did this is to illustrate that you cannot measure pull

illustrate that you cannot measure pull requests to understand whether AI is

requests to understand whether AI is helping you. And so here this team

helping you. And so here this team adopted um AI in May of this year and we

adopted um AI in May of this year and we measured the four months before four

measured the four months before four months after. We saw a 14% increase.

months after. We saw a 14% increase. Great. That's fantastic. But what about

Great. That's fantastic. But what about reviewer burden? What about code

reviewer burden? What about code quality? So we measured code quality.

quality? So we measured code quality. And here what we saw is um I mean

And here what we saw is um I mean firstly actually code quality think of

firstly actually code quality think of it as maintainability scale from 0 to

it as maintainability scale from 0 to 10. And uh there's kind of these bands.

10. And uh there's kind of these bands. Uh it uses our our methodology. You can

Uh it uses our our methodology. You can read it online. But basically what you

read it online. But basically what you see is that in the preAI period their

see is that in the preAI period their code quality was quite stable and

code quality was quite stable and consistent. And once they adopted AI,

consistent. And once they adopted AI, two things happened. Code quality

two things happened. Code quality decreased and then code quality became

decreased and then code quality became more erratic.

Next, we took a look at our metric, which is engineering output. It's not

which is engineering output. It's not lines of code. And here for every month,

lines of code. And here for every month, you see the sigma, the sum of the output

you see the sigma, the sum of the output delivered for that month, broken down

delivered for that month, broken down into four buckets. Rework and

into four buckets. Rework and refactoring. So rework is when you're

refactoring. So rework is when you're changing or editing code that was it's

changing or editing code that was it's still kind of fresh, so it's recent.

still kind of fresh, so it's recent. refactoring is when you're changing code

refactoring is when you're changing code that's a bit older and uh what uh then

that's a bit older and uh what uh then like added and removed it's pretty

like added and removed it's pretty self-explanatory and then also you can

self-explanatory and then also you can see these kind of benchmarks so we can

see these kind of benchmarks so we can benchmark this company against similar

benchmark this company against similar companies in their industry and here AI

companies in their industry and here AI usage had two effects firstly is that

usage had two effects firstly is that rework went up by 2.5 times which is

rework went up by 2.5 times which is really bad and effective output which is

really bad and effective output which is kind of like a proxy for productivity or

kind of like a proxy for productivity or so didn't really change

so didn't really change and so then what's the conclusion here

and so then what's the conclusion here let's do a recap app. So we saw that PRs

let's do a recap app. So we saw that PRs went up by 14%. But this is inconclusive

went up by 14%. But this is inconclusive because more PRs doesn't mean better. We

because more PRs doesn't mean better. We saw that code quality decreased by 9%

saw that code quality decreased by 9% which is problematic. We saw that

which is problematic. We saw that effective output didn't increase

effective output didn't increase meaningfully. And then we saw that

meaningfully. And then we saw that rework increased by a lot. And so then

rework increased by a lot. And so then the question here is what is the ROI of

the question here is what is the ROI of this AI adoption, right? It might be

this AI adoption, right? It might be negative. And what I want to point out

negative. And what I want to point out here is that had this company not

here is that had this company not measured this more thoroughly and simply

measured this more thoroughly and simply measured PR counts, they would have

measured PR counts, they would have thought, hey, we're doing great. We

thought, hey, we're doing great. We increased our productivity by 14%. Let's

increased our productivity by 14%. Let's run from the numbers. That's how many

run from the numbers. That's how many million lots of millions of dollars. And

million lots of millions of dollars. And does this offset the AI license? Sure

does this offset the AI license? Sure thing it does, right? The other thing is

thing it does, right? The other thing is that I don't think this company should

that I don't think this company should abandon AI. They should simply use this

abandon AI. They should simply use this data to understand what they're doing

data to understand what they're doing wrong. How can they improve? Because AI

wrong. How can they improve? Because AI is here to stay. It's a tool that's

is here to stay. It's a tool that's going to transform how engineers are are

going to transform how engineers are are working, right? and you can just um kind

working, right? and you can just um kind of like abandon it or yourself.

of like abandon it or yourself. Great. So, this concludes our insights

Great. So, this concludes our insights for today. If you've enjoyed this uh

for today. If you've enjoyed this uh talk and you would like similar insights

talk and you would like similar insights for your company, I invite you to

for your company, I invite you to participate in our research. Everything

participate in our research. Everything you've seen today can uh be accessed

you've seen today can uh be accessed through kind of participating in our

through kind of participating in our research, some of them through live

research, some of them through live dashboards in our research portal. And

dashboards in our research portal. And especially I'd like to invite companies

especially I'd like to invite companies that have access to Cursor Enterprise to

that have access to Cursor Enterprise to participate because we have a high need

participate because we have a high need for this so we can publish papers around

for this so we can publish papers around the granularity of using AI um in

the granularity of using AI um in software engineering. You can sign up at

software engineering. You can sign up at software engineering

software engineering productivity.stanford.edu.

productivity.stanford.edu. Thank you so much. [applause]

[music] next speaker will separate hype from reality on AI code quality using

from reality on AI code quality using realworld data to show when AI generated

realworld data to show when AI generated code can be trusted in production.

code can be trusted in production. Please welcome CEO of Kodto, Edidomar

Please welcome CEO of Kodto, Edidomar Freriedman.

It will grow. It will grow one or two more months. I'm really excited being

more months. I'm really excited being here. So many so much pragmatic and

here. So many so much pragmatic and insight and suggestions. I was sitting

insight and suggestions. I was sitting there uh just just before. So I'm Edmar

there uh just just before. So I'm Edmar Freiedman, the CEO and co-founder of

Freiedman, the CEO and co-founder of Kodto. Codto stands for quality of

Kodto. Codto stands for quality of development and I'm going to share uh

development and I'm going to share uh our reports and other companies reports

our reports and other companies reports about state of AI code quality. uh you

about state of AI code quality. uh you know trying to uh talk about the hype

know trying to uh talk about the hype versus reality which was uh like one of

versus reality which was uh like one of the uh points that were discussed here

the uh points that were discussed here quite a lot which is awesome. So in the

quite a lot which is awesome. So in the last three weeks, four weeks, we saw

last three weeks, four weeks, we saw like three outages in the clouds

like three outages in the clouds unfortunately, right? And these are

unfortunately, right? And these are coming from companies that really care

coming from companies that really care about moving fast, right? They're

about moving fast, right? They're they're they're saying themselves that

they're they're saying themselves that they're using AI to generate code 10%,

they're using AI to generate code 10%, 30%, 50%, at the same time, they care

30%, 50%, at the same time, they care about quality. So how did that happen?

about quality. So how did that happen? And is it is it related? I don't know.

And is it is it related? I don't know. But let's have some I'm going to share

But let's have some I'm going to share some guess. So by the way 60% of

some guess. So by the way 60% of developers say that the like quarter of

developers say that the like quarter of their code is either generated by AI or

their code is either generated by AI or in in like uh uh shaped by I and 15% say

in in like uh uh shaped by I and 15% say that even more than 80 80% of their code

that even more than 80 80% of their code uh is basically generated or or shaped

uh is basically generated or or shaped by AI. Now people are using AI to do

by AI. Now people are using AI to do vibe coding but actually they're even

vibe coding but actually they're even doing it for vibe checking vibe

doing it for vibe checking vibe reviewing. This is the command of cloud.

reviewing. This is the command of cloud. This is the prompt for the command of

This is the prompt for the command of claude code for security review. It was

claude code for security review. It was hyped like two months ago. Do you know

hyped like two months ago. Do you know what I'm talking about now? It says

what I'm talking about now? It says there, I don't know if you see it. Uh,

there, I don't know if you see it. Uh, you are a senior security engineer.

you are a senior security engineer. Good. And then like somewhere there uh

Good. And then like somewhere there uh down the line, it says please exclude

down the line, it says please exclude denial of service. Don't don't uh catch

denial of service. Don't don't uh catch denial of service issues. Maybe that's

denial of service issues. Maybe that's part of the part of the reason like

part of the part of the reason like we're we're having uh cloud outages.

we're we're having uh cloud outages. probably not just that, but you get the

probably not just that, but you get the point. Like we need to be rigorous about

point. Like we need to be rigorous about how we deal with quality. It's not just

how we deal with quality. It's not just like vibe quality or or so like we're

like vibe quality or or so like we're doing vibe coding sometimes. Uh let's go

doing vibe coding sometimes. Uh let's go to another example. Okay, cursor I guess

to another example. Okay, cursor I guess like or or pilot most of you use rules,

like or or pilot most of you use rules, right? We're going to talk about it. You

right? We're going to talk about it. You invest in code generation. After a

invest in code generation. After a [snorts] while, you understand if you

[snorts] while, you understand if you invest, you'll get more out of it. And

invest, you'll get more out of it. And uh we we asked like a bunch of of

uh we we asked like a bunch of of developers and I'm asking you as well

developers and I'm asking you as well think for a second for all the

think for a second for all the developers there in the audience like

developers there in the audience like when you write cursor rules or copilot

when you write cursor rules or copilot rules etc. Do you feel they're

rules etc. Do you feel they're completely followed or it's like mostly

completely followed or it's like mostly followed? Do you know how much they're

followed? Do you know how much they're followed? And what extent are they

followed? And what extent are they followed? It's rigorously like how

followed? It's rigorously like how technical deep they're they're being

technical deep they're they're being followed. So the what we get back like

followed. So the what we get back like the answer from what you see here on the

the answer from what you see here on the screen is mostly like B, C, and D. They

screen is mostly like B, C, and D. They are followed but they're not completely

are followed but they're not completely followed. Okay. So that means like we

followed. Okay. So that means like we are generating code trying to push it to

are generating code trying to push it to the standards but it's not necessarily

the standards but it's not necessarily still like getting to the quality we

still like getting to the quality we wanted. I'm going to share a bit more

wanted. I'm going to share a bit more statistics and and information and some

statistics and and information and some insight from three reports. One done by

insight from three reports. One done by Codo, another by done by Sonar, another

Codo, another by done by Sonar, another by far and all of them are are focused

by far and all of them are are focused on code code quality review etc. The

on code code quality review etc. The sample size is thousands of developers

sample size is thousands of developers in some cases even more millions of pull

in some cases even more millions of pull requests and and a billion of of lines

requests and and a billion of of lines lines of code that were uh uh being

lines of code that were uh uh being checked. Like for example, if you think

checked. Like for example, if you think about uh Sonar, this is a company, yeah,

about uh Sonar, this is a company, yeah, a bit like coming from pre-AI, but they

a bit like coming from pre-AI, but they see code at scale and you they're doing

see code at scale and you they're doing like a lot of checks in code that are

like a lot of checks in code that are not necessarily AI focused, but are

not necessarily AI focused, but are necessary in order to check uh your your

necessary in order to check uh your your software from all possible direction.

software from all possible direction. And that's why their scaling and the

And that's why their scaling and the scale of the code that you're seeing is

scale of the code that you're seeing is is immense. Okay. So for example, we

is immense. Okay. So for example, we took information from from their report

took information from from their report and eventually my purpose here is to

and eventually my purpose here is to break down the different dimension of

break down the different dimension of what uh code quality means and give you

what uh code quality means and give you some share some stats and and insights.

some share some stats and and insights. I want to start with the end. Okay, this

I want to start with the end. Okay, this is the takeaway I want you all all like

is the takeaway I want you all all like to take from from the next 13 minutes

to take from from the next 13 minutes that I have. We started with code

that I have. We started with code generation. We like out of the box use

generation. We like out of the box use it autocomplete etc. and you invest in

it autocomplete etc. and you invest in it and you can get more out of it. But

it and you can get more out of it. But there's the glass ceiling for how much

there's the glass ceiling for how much productivity you can get from code

productivity you can get from code generation. And then we move to the

generation. And then we move to the agent code generation, right? Let's call

agent code generation, right? Let's call it gen 2.0. And that's a higher glass

it gen 2.0. And that's a higher glass ceiling. It could do much more

ceiling. It could do much more productivity and especially if you

productivity and especially if you invest in it, for example, rules, etc.

invest in it, for example, rules, etc. Then with AI breaking outside of the

Then with AI breaking outside of the IDE, we can start using AI also for code

IDE, we can start using AI also for code for agentic quality workflows. It could

for agentic quality workflows. It could be inside the ID, but the the truth is

be inside the ID, but the the truth is that if you think about all the

that if you think about all the workflows you have in your organization,

workflows you have in your organization, especially if you're more than 100

especially if you're more than 100 developers or so, you probably have a

developers or so, you probably have a lot of workflows that you related to

lot of workflows that you related to quality that you need to auto automate.

quality that you need to auto automate. And that's where you start like breaking

And that's where you start like breaking through the glass ceiling of

through the glass ceiling of productivity. if you invest in it. And

productivity. if you invest in it. And finally, I I claim that you need those

finally, I I claim that you need those agentic workflows. Keep learning. And we

agentic workflows. Keep learning. And we might touch a little bit of that like

might touch a little bit of that like later later on. Okay? Like because

later later on. Okay? Like because quality is something dynamic. So you'll

quality is something dynamic. So you'll only finally break break the glass

only finally break break the glass ceiling if if you really have those

ceiling if if you really have those quality workflows and rules and standard

quality workflows and rules and standard being dynamic. And then then you will

being dynamic. And then then you will see the promised 2x let alone the 10x

see the promised 2x let alone the 10x that you were promised the hyped and and

that you were promised the hyped and and you you heard from McKenzie and from

you you heard from McKenzie and from Stanford you're not getting that. I

Stanford you're not getting that. I don't need to tell you the 2x 10x for

don't need to tell you the 2x 10x for the entire software development uh life

the entire software development uh life cycle. So a bit about more about the the

cycle. So a bit about more about the the market adoption. Uh one of the report

market adoption. Uh one of the report says that 82% of adoption already for AI

says that 82% of adoption already for AI dev tools are being used daily or

dev tools are being used daily or weekly. uh some people at 60 60% 59

weekly. uh some people at 60 60% 59 report that they're using more than

report that they're using more than three and 20% saying that they're using

three and 20% saying that they're using more than five code generation tools. If

more than five code generation tools. If you think about it for a second uh don't

you think about it for a second uh don't only take like cursor compil

only take like cursor compil etc. Sorry if I'm insulting anyone in

etc. Sorry if I'm insulting anyone in the that I forgot their tool but there's

the that I forgot their tool but there's also the lovable etc. They also generate

also the lovable etc. They also generate code and by the way you're going to get

code and by the way you're going to get to 10 I'm count on me you're going to

to 10 I'm count on me you're going to get to 10 tools in two three years that

get to 10 tools in two three years that generate code for you okay come to talk

generate code for you okay come to talk to me about later I'll try to convince

to me about later I'll try to convince you and and the thing is that it it's

you and and the thing is that it it's coming from bottom up like 50% of the

coming from bottom up like 50% of the usage is coming from less than 10 teams

usage is coming from less than 10 teams that are less than 10 developers but it

that are less than 10 developers but it is propagating also to the enterprise

is propagating also to the enterprise again I'm sure you know I mean talk

again I'm sure you know I mean talk propagating to the enterprise at scale

propagating to the enterprise at scale like not just like five developers in

like not just like five developers in the last year we're seeing like more and

the last year we're seeing like more and more enterprise using co code

more enterprise using co code generation. Uh so if like an in average

generation. Uh so if like an in average with within reports we saw 82 to 92%

with within reports we saw 82 to 92% using weekly to a monthly uh code

using weekly to a monthly uh code generation tools and in some cases maybe

generation tools and in some cases maybe extreme maybe not we're going to talk

extreme maybe not we're going to talk about it we saw 3x productivity boost in

about it we saw 3x productivity boost in writing code okay but that doesn't mean

writing code okay but that doesn't mean that if you have uh 3x productivity in

that if you have uh 3x productivity in writing code that you actually guarantee

writing code that you actually guarantee any quality like I presented before so

any quality like I presented before so actually 67% of the developer that we as

actually 67% of the developer that we as asked have serious equality concerns

asked have serious equality concerns about all the AI generated all the

about all the AI generated all the generated code uh uh the code generated

generated code uh uh the code generated by AI or influenced by AI and they're

by AI or influenced by AI and they're claiming that they're missing the

claiming that they're missing the framework how to deal with quality how

framework how to deal with quality how to measure quality it's a big question

to measure quality it's a big question what is quality I'm going to talk about

what is quality I'm going to talk about it in the next few slides okay think

it in the next few slides okay think about it for a second before I break

about it for a second before I break break it down what what is quality

break it down what what is quality um so what we're actually saying that

um so what we're actually saying that the crisis with VIP coding uh viable

the crisis with VIP coding uh viable coding we're seeing it shifting and

coding we're seeing it shifting and evolving is that you're getting like

evolving is that you're getting like more task being done like 20 some report

more task being done like 20 some report 20% more task you know velocity and like

20% more task you know velocity and like 97 more% or so of PRs being opened and

97 more% or so of PRs being opened and eventually it takes more time to review

eventually it takes more time to review PR like 90% more time to review PR and

PR like 90% more time to review PR and by the way like there's a lot of

by the way like there's a lot of statistics about AI generating code at

statistics about AI generating code at least there's not less amount amount of

least there's not less amount amount of bugs per line of code. I'm not claiming

bugs per line of code. I'm not claiming that there are more, but even if there's

that there are more, but even if there's not less bugs per line of code, you have

not less bugs per line of code, you have much more bugs because there are much

much more bugs because there are much more PRs, much more code being

more PRs, much more code being generated, etc. Right? So that that's a

generated, etc. Right? So that that's a problem for the reviewer. So it's

problem for the reviewer. So it's somebody surprised it takes more time to

somebody surprised it takes more time to review these, especially in the age of

review these, especially in the age of agents, right? When five minutes calling

agents, right? When five minutes calling to cloud code, I have 1,000 line of code

to cloud code, I have 1,000 line of code after 5 minutes. Once upon a time, it

after 5 minutes. Once upon a time, it took me like hours to write 10 proper

took me like hours to write 10 proper lines of code. Right? Now let's zoom out

lines of code. Right? Now let's zoom out for a second. Code generation is

for a second. Code generation is magnificent. Okay? Like it it's a

magnificent. Okay? Like it it's a gamecher when you're talking about green

gamecher when you're talking about green field. You saw people talk about it a

field. You saw people talk about it a few slides a few minutes before me. Uh

few slides a few minutes before me. Uh it it revolutionized how we do p proof

it it revolutionized how we do p proof of concept uh project etc. But when

of concept uh project etc. But when you're dealing with heavyduty software

you're dealing with heavyduty software then you you like it or not we are

then you you like it or not we are dealing with a lot of things when uh

dealing with a lot of things when uh when you serve millions of clients you

when you serve millions of clients you have financial transactions when you're

have financial transactions when you're doing transportation you're dealing with

doing transportation you're dealing with code integrity if you like code

code integrity if you like code governance uh review standards testing

governance uh review standards testing relability etc. That's what we need to

relability etc. That's what we need to uh uh to deal with. Now let's break that

uh uh to deal with. Now let's break that under the surface part of the glacier

under the surface part of the glacier into two dimensions. This is one

into two dimensions. This is one dimension you can look on the quality

dimension you can look on the quality issues in throughout the software

issues in throughout the software development life cycle like planning and

development life cycle like planning and then development writing code review

then development writing code review code review is a bit of a process but

code review is a bit of a process but like what you're like checking quality

like what you're like checking quality that's part of the process of code

that's part of the process of code review testing which is another part of

review testing which is another part of of quality and and deployment and I know

of quality and and deployment and I know I didn't cover the entire like uh

I didn't cover the entire like uh software development life cycle but just

software development life cycle but just to give you an example and each one of

to give you an example and each one of them like possess like introduce new

them like possess like introduce new problems that are coming because you're

problems that are coming because you're using more and more AI generated code.

using more and more AI generated code. Um now another dimension to look at it

Um now another dimension to look at it is actually code level problems and

is actually code level problems and process level problems. Okay, I'm not

process level problems. Okay, I'm not I'm not opening the you know list of

I'm not opening the you know list of functional just opening the list of

functional just opening the list of non-functional. You're talking about

non-functional. You're talking about security inefficiency that are not

security inefficiency that are not necessarily uh functional. Use I'll show

necessarily uh functional. Use I'll show you some statistics about that. And then

you some statistics about that. And then process level is for example learning.

process level is for example learning. Hey if you will have a

Hey if you will have a a a bad outage because of AI generated

a a bad outage because of AI generated code who is responsible is it the AI or

code who is responsible is it the AI or or the team that own that okay like you

or the team that own that okay like you need to learn and own the code

need to learn and own the code eventually that's a process that needs

eventually that's a process that needs to be done verification porting

to be done verification porting guardrails standards uh etc. So, so all

guardrails standards uh etc. So, so all of those issues when they are introduced

of those issues when they are introduced to thousands of developer that we asked

to thousands of developer that we asked them do you think like actually AI

them do you think like actually AI helped to reduce with those problems or

helped to reduce with those problems or or actually made more like more

or actually made more like more challenging 42 people reported that they

challenging 42 people reported that they spend 42 more of the development time on

spend 42 more of the development time on solving issues on fixing bugs etc and

solving issues on fixing bugs etc and and they saw 35 uh% project delays

and they saw 35 uh% project delays we're talking But we're talking about

we're talking But we're talking about maybe games they're talking about like

maybe games they're talking about like delays. Okay, there's some bias. We told

delays. Okay, there's some bias. We told them we talked about problem with

them we talked about problem with quality and what's the impact etc. Um

quality and what's the impact etc. Um but that's what they they they present

but that's what they they they present uh to when they they answer uh when when

uh to when they they answer uh when when you're talking about like when you're

you're talking about like when you're mass using AI code AI generated code and

mass using AI code AI generated code and we see reports uh some of the reports

we see reports uh some of the reports talking about 3x more security inc

talking about 3x more security inc incidents. By the way, it makes sense.

incidents. By the way, it makes sense. You remember we had a slide saying 3x

You remember we had a slide saying 3x more writing code. So 3x more security

more writing code. So 3x more security incidents like the same amount of line

incidents like the same amount of line of code the same amount of uh uh

of code the same amount of uh uh problems correlation. So what to do with

problems correlation. So what to do with that? Like I talked about problems and

that? Like I talked about problems and problems and problems. Okay, help help

problems and problems. Okay, help help me deal with it. Like let's let's spend

me deal with it. Like let's let's spend a few minutes on on that. So one one

a few minutes on on that. So one one suspect of course is testing and

suspect of course is testing and actually really interesting. We asked a

actually really interesting. We asked a couple of question about testing and one

couple of question about testing and one really relevant saying that people said

really relevant saying that people said that when they heavily use AI to on

that when they heavily use AI to on testing use AI to do testing they

testing use AI to do testing they actually double their trust in the AI

actually double their trust in the AI generated code. Okay, that's one thing.

generated code. Okay, that's one thing. The ne next suspect to help us with the

The ne next suspect to help us with the quality is code review. What really

quality is code review. What really interesting about code review that it's

interesting about code review that it's a process that helps almost with all the

a process that helps almost with all the process level and the code level like

process level and the code level like issues. For example, you can set your AI

issues. For example, you can set your AI code review tool to tell you block this

code review tool to tell you block this PR if it doesn't cover certain level of

PR if it doesn't cover certain level of test coverage. So through the PR, you

test coverage. So through the PR, you take care of the testing process

take care of the testing process problem. Okay. So code like code review

problem. Okay. So code like code review with AI is actually one of one of the

with AI is actually one of one of the major things you you you can do and

major things you you you can do and people that are developers that are

people that are developers that are using AI code review tool they're saying

using AI code review tool they're saying that they're saying they're seeing

that they're saying they're seeing double the quality gain and they're

double the quality gain and they're saying that actually it's it helps them

saying that actually it's it helps them to uh uh improve improve 47% in

to uh uh improve improve 47% in productivity of writing code. Okay. Now

productivity of writing code. Okay. Now a bit statistics from our own uh AI code

a bit statistics from our own uh AI code review tool. We scan a million of PRs a

review tool. We scan a million of PRs a month and we took one mill million of

month and we took one mill million of those PRs and we noticed that 17%

those PRs and we noticed that 17% include like high severity issues. By

include like high severity issues. By the way, we're now analyzing uh before

the way, we're now analyzing uh before and after using AI. I don't have that

and after using AI. I don't have that statistics yet, but we are noticing

statistics yet, but we are noticing since we're starting uh most of the

since we're starting uh most of the companies we serve, they use AI

companies we serve, they use AI generated code. So that's why uh I don't

generated code. So that's why uh I don't have before. We need to go scan

have before. We need to go scan backwards. Uh and that's like a really

backwards. Uh and that's like a really big a big number. Another thing I want

big a big number. Another thing I want to talk to you like about uh when you're

to talk to you like about uh when you're trying to improve on quality is is the

trying to improve on quality is is the foundation of having the right context

foundation of having the right context that is brought to the uh code

that is brought to the uh code generation tool that is brought to the

generation tool that is brought to the AI code review tool. Better context

AI code review tool. Better context better quality across the board wherever

better quality across the board wherever you're using AI. Uh so when we asked

you're using AI. Uh so when we asked developers when when you h when you

developers when when you h when you don't trust AI generated code like you

don't trust AI generated code like you remember like 67% sa like are really

remember like 67% sa like are really worried about that they said 80 80% of

worried about that they said 80 80% of the time they don't trust the context

the time they don't trust the context that the LLM have okay and and and uh

that the LLM have okay and and and uh when we asked developers what would you

when we asked developers what would you like to be improved in your AI generated

like to be improved in your AI generated code in your AI code review tool they

code in your AI code review tool they said the number one was context it was

said the number one was context it was number one was 33% they could choose

number one was 33% they could choose among many things to to improve. So

among many things to to improve. So context is extremely important. I can

context is extremely important. I can tell you that as codto one of our

tell you that as codto one of our technology moes uh is is around context

technology moes uh is is around context and when you connect our context engine

and when you connect our context engine we're seeing it as the number one tool

we're seeing it as the number one tool that is being used like 60% of code

that is being used like 60% of code generator or code review tools 60% of

generator or code review tools 60% of their calls to an MCP would be to a

their calls to an MCP would be to a context MCP. Okay. And just to tell you

context MCP. Okay. And just to tell you the context doesn't necessarily need to

the context doesn't necessarily need to include only your code. It could also

include only your code. It could also include context to your standards, your

include context to your standards, your best practices. We're seeing in our AI

best practices. We're seeing in our AI code review that 8% of the context usage

code review that 8% of the context usage is actually from files that are related

is actually from files that are related to standards and and best practices etc.

to standards and and best practices etc. Okay, I have to CEO of Kodo like

Okay, I have to CEO of Kodo like marketing will be mad on me if I don't

marketing will be mad on me if I don't brag a little bit. Right? So this is uh

brag a little bit. Right? So this is uh kind of like our architecture of our

kind of like our architecture of our context engine being presented by Jensen

context engine being presented by Jensen on GTC keynote. And he notice he didn't

on GTC keynote. And he notice he didn't talk about our qu co code review

talk about our qu co code review capabilities about our testing

capabilities about our testing capabilities. He talked about our

capabilities. He talked about our context engine that Nvidia checked

context engine that Nvidia checked because there's a realization that AI

because there's a realization that AI quality AI generated whatever review

quality AI generated whatever review testing will come from bringing the

testing will come from bringing the right context. So invest in that you

right context. So invest in that you need to build your context. Buy a

need to build your context. Buy a solution and invest in it. Build your

solution and invest in it. Build your solution uh etc. And the context needs

solution uh etc. And the context needs to include code uh uh versioning PR

to include code uh uh versioning PR history uh organization logs etc. That's

history uh organization logs etc. That's where all the context sits. It's not

where all the context sits. It's not just in the last branch of your

just in the last branch of your codebase. Okay. So I'm I'm zooming out

codebase. Okay. So I'm I'm zooming out starting to talk about like

starting to talk about like recommendations and uh and like uh

recommendations and uh and like uh takeaways. So what what what's next? So

takeaways. So what what what's next? So automated uh quality gateways invest in

automated uh quality gateways invest in that. People talked throughout the

that. People talked throughout the morning about parallel agents. You know

morning about parallel agents. You know what I'm talking about? Like background

what I'm talking about? Like background agents. You can use a lot of those like

agents. You can use a lot of those like tools and capabilities to build build

tools and capabilities to build build your quality gates. Uh use intelligent

your quality gates. Uh use intelligent code review testing and you need a li

code review testing and you need a li living and breathing like documentation

living and breathing like documentation and and what documentation means is is a

and and what documentation means is is a story by itself. Uh I'm not going to

story by itself. Uh I'm not going to double click on it. And and this is how

double click on it. And and this is how I present for three years now and I

I present for three years now and I think I'm going to go all the way until

think I'm going to go all the way until age of 60 with this slide of how I think

age of 60 with this slide of how I think the future of software development looks

the future of software development looks like. Okay. So basically you have your

like. Okay. So basically you have your specification and you have your code

specification and you have your code right and you have multiple agents

right and you have multiple agents parallel agents that are helping you to

parallel agents that are helping you to improve your spec write your spec

improve your spec write your spec improve your code transfer transfer from

improve your code transfer transfer from your spec to your to your code uh make

your spec to your to your code uh make tests which are executable specs right

tests which are executable specs right uh and and then you're going to have

uh and and then you're going to have your context engine the software

your context engine the software development database and you will build

development database and you will build your tools especially MCPs around

your tools especially MCPs around quality and verification and you will

quality and verification and you will Make sure you have environments, stable,

Make sure you have environments, stable, secured sandboxes where those agents can

secured sandboxes where those agents can run and and run validation and quality

run and and run validation and quality uh workflows. So don't don't forget like

uh workflows. So don't don't forget like the path forward is quality is your

the path forward is quality is your competitive edge over your uh

competitive edge over your uh competition. AI is a tool. It's not it's

competition. AI is a tool. It's not it's not a solution. Okay? And don't like

not a solution. Okay? And don't like only think about code generation as the

only think about code generation as the only thing. Look on the entire SDLC or

only thing. Look on the entire SDLC or product development life cycle. I saw

product development life cycle. I saw one of the uh people talked um speakers

one of the uh people talked um speakers and it iterate with everything we talked

and it iterate with everything we talked about today. I have uh I want to tell

about today. I have uh I want to tell you that you will gain value from it.

you that you will gain value from it. We're seeing in the reports people

We're seeing in the reports people seeing like security availability being

seeing like security availability being reduced faster code review you we just

reduced faster code review you we just got a hit on that because of AI

got a hit on that because of AI generated code and test coverage in a

generated code and test coverage in a month can can triple depends on on the

month can can triple depends on on the project etc. with with the last minute I

project etc. with with the last minute I want to show you like a really small

want to show you like a really small piece of what you can do with codo. uh

piece of what you can do with codo. uh you can go into codto and define your

you can go into codto and define your own rule for example almost the same

own rule for example almost the same rule you'll put on cursor of I don't

rule you'll put on cursor of I don't like nested ifs if this is a problem

like nested ifs if this is a problem that you have but then codto will look

that you have but then codto will look on your context build the good example

on your context build the good example the bad example and then start giving

the bad example and then start giving like building a workflow that is

like building a workflow that is specifically to catch that issue and

specifically to catch that issue and give you statistics over time when it's

give you statistics over time when it's being accepted and when not so you can

being accepted and when not so you can adjust that rule and really know and

adjust that rule and really know and have visibility to to your standards.

have visibility to to your standards. Okay. So when a PR is written with a few

Okay. So when a PR is written with a few ifs and else although it was written

ifs and else although it was written with cursor copilot that had a rule do

with cursor copilot that had a rule do not do nested ifs etc. then eventually

not do nested ifs etc. then eventually when you open a PR you will get uh codo

when you open a PR you will get uh codo uh uh catching that and giving a

uh uh catching that and giving a suggestion according to the good and the

suggestion according to the good and the bad example. COD will also make a graph,

bad example. COD will also make a graph, give you a CLI checks like check each

give you a CLI checks like check each one of the rules and eventually tell you

one of the rules and eventually tell you the nested if and then we'll record and

the nested if and then we'll record and learn what you did or did not do with

learn what you did or did not do with that suggestion in order to adapt the

that suggestion in order to adapt the standard and of the of the quality. Um

standard and of the of the quality. Um there will also automated like

there will also automated like suggestion. You don't need to write your

suggestion. You don't need to write your own. It learns your your your standards

own. It learns your your your standards and quality and offer that to you. And

and quality and offer that to you. And that's it. I'm I'm really really excited

that's it. I'm I'm really really excited about like breaking the glass ceiling,

about like breaking the glass ceiling, okay, with what we did with code

okay, with what we did with code generation and then a jet to code

generation and then a jet to code generation. Now we're turning into the

generation. Now we're turning into the era of putting AI into work and through

era of putting AI into work and through the entire SDLC. The most important part

the entire SDLC. The most important part is related to quality. You would need to

is related to quality. You would need to invest in that. It's not out of the box.

invest in that. It's not out of the box. Okay. And then you would see eventually

Okay. And then you would see eventually the promised 2x that that that probably

the promised 2x that that that probably promised to the CEO or something like

promised to the CEO or something like that once they give you the budget for

that once they give you the budget for for the relevant tools. Thank you so

for the relevant tools. Thank you so much. [applause]

much. [applause] [music]

Our next speaker is introducing Miniaax's latest model and how it powers

Miniaax's latest model and how it powers nextG experiences for code generation.

nextG experiences for code generation. Please welcome to the stage senior

Please welcome to the stage senior researcher at Miniaax, Olive Song.

researcher at Miniaax, Olive Song. [music]

Hi. Hi everyone. Um, I'm Olive. It's my great honor here today to present on our

great honor here today to present on our new model Mini Max M2. Um, I actually

new model Mini Max M2. Um, I actually lived in New York City for six years, so

lived in New York City for six years, so it feels great to come back. Um, but

it feels great to come back. Um, but with a different role. Um, I currently

with a different role. Um, I currently study reinforcement learning and model

study reinforcement learning and model evaluation at Miniax. Um, let me just

evaluation at Miniax. Um, let me just get a quick sense of the room. Who here

get a quick sense of the room. Who here has heard or have tried of Miniax

has heard or have tried of Miniax before? Oh, a couple of there. Yeah, not

before? Oh, a couple of there. Yeah, not everybody, but I guess yeah, but here's

everybody, but I guess yeah, but here's the value, right, of me standing here

the value, right, of me standing here today. Um so we are a global company

today. Um so we are a global company that works on both foundation models and

that works on both foundation models and applications. We develop multi modality

applications. We develop multi modality models including text um vision language

models including text um vision language models our video generation model hyo

models our video generation model hyo and speech generation music generation

and speech generation music generation stuff and we also have um many

stuff and we also have um many applications including agents and stuff

applications including agents and stuff um inhouse. So that that's the specific

um inhouse. So that that's the specific thing that's different from the other

thing that's different from the other labs for other companies. So we both

labs for other companies. So we both develop foundation models um and

develop foundation models um and applications. So we have research and

applications. So we have research and developers sitting uh sitting side by

developers sitting uh sitting side by side working on things. Um so our

side working on things. Um so our difference would be that we have

difference would be that we have firsthand experience from our um

firsthand experience from our um in-house developers into developing

in-house developers into developing models that developers would really need

models that developers would really need in the community. And here I want to

in the community. And here I want to introduce our Miniax M2 um which is an

introduce our Miniax M2 um which is an openweight model very small with only 10

openweight model very small with only 10 billion active parameters um that was

billion active parameters um that was designed specifically for coding

designed specifically for coding workplace agentic tasks. It's very

workplace agentic tasks. It's very costefficient.

costefficient. Um let me just go over the benchmark

Um let me just go over the benchmark performance because people care about

performance because people care about it. So uh we rank very top in both um

it. So uh we rank very top in both um intelligence benchmarks and also agent

intelligence benchmarks and also agent benchmarks. Uh we I think we're on the

benchmarks. Uh we I think we're on the top of the open source models. But then

top of the open source models. But then numbers don't tell everything because

numbers don't tell everything because sometimes you get those super high

sometimes you get those super high number models you plug into them um into

number models you plug into them um into your environment and they suck, right?

your environment and they suck, right? So we really care about the dynamics in

So we really care about the dynamics in the community and in our first week we

the community and in our first week we had the most downloads

had the most downloads and also we climbed up to top three

and also we climbed up to top three token usage on open router. So we're

token usage on open router. So we're very glad that people in the community

very glad that people in the community are really loving our model um into

are really loving our model um into their development cycle.

their development cycle. So today what I want to share is how we

So today what I want to share is how we actually shape these men model

actually shape these men model characteristics that made M2 so good in

characteristics that made M2 so good in your coding experience. And I'm going to

your coding experience. And I'm going to present to you um the training be behind

present to you um the training be behind it that supports each one of them from

it that supports each one of them from coding experience to long horizon state

coding experience to long horizon state tracking tasks um to robust

tracking tasks um to robust generalization to different scaffolds to

generalization to different scaffolds to multi- aent uh scalability.

multi- aent uh scalability. So first let's talk about code

So first let's talk about code experience which we sc uh which we

experience which we sc uh which we supported with um scaled environments

supported with um scaled environments and scaled experts.

and scaled experts. So um developers need a model that can

So um developers need a model that can actually work in the language they use

actually work in the language they use and across the workflow that they deal

and across the workflow that they deal with every day. So which means that we

with every day. So which means that we need to utilize the real data from from

need to utilize the real data from from the internet and then um scale the

the internet and then um scale the number of environments so that the model

number of environments so that the model when during training for example during

when during training for example during reinforcement learning it can actually

reinforcement learning it can actually um reacts to the uh environment. it can

um reacts to the uh environment. it can actually target verifiable coding goals

actually target verifiable coding goals and to learn from it. So that's why we

and to learn from it. So that's why we scaled both the number uh of

scaled both the number uh of environments and also our um

environments and also our um infrastructure so that we can perform

infrastructure so that we can perform those training very efficiently.

those training very efficiently. So um with data construction and

So um with data construction and reinforcement learning we were able to

reinforcement learning we were able to train the model so that it's very strong

train the model so that it's very strong um it's full stack multilingual

um it's full stack multilingual and what I want to mention here is that

and what I want to mention here is that besides scaling environment that

besides scaling environment that everybody talks about we actually scale

everybody talks about we actually scale something called expert developers um as

something called expert developers um as reward models. So as I mentioned before

reward models. So as I mentioned before uh we have a ton of um super expert

uh we have a ton of um super expert developers in house that could give us

developers in house that could give us feedback to our model's performance. So

feedback to our model's performance. So they participated closely into the model

they participated closely into the model development and training cycle including

development and training cycle including problem definition for example um bugs

problem definition for example um bugs bug fixing for example um repo

bug fixing for example um repo refactoring and stuff like that. And

refactoring and stuff like that. And also they identify the model behaviors

also they identify the model behaviors that developers enjoy and they identify

that developers enjoy and they identify what's reliable and uh what developers

what's reliable and uh what developers would trust

would trust and they give precise reward and

and they give precise reward and evaluation to the model's behaviors to

evaluation to the model's behaviors to the final um deliverables so that um it

the final um deliverables so that um it is a model that developers really want

is a model that developers really want to work with and that can adds

to work with and that can adds efficiency to the developers.

So with that we were able to lead in many um languages in real use.

many um languages in real use. And the second characteristic that

And the second characteristic that Miniax M2 has is it it performs good in

Miniax M2 has is it it performs good in those long horizon tasks. Uh those long

those long horizon tasks. Uh those long tasks that require interacting with

tasks that require interacting with complex environments that requiring um

complex environments that requiring um using multiple tools with reasoning.

using multiple tools with reasoning. And we supported that with the interled

And we supported that with the interled thinking pattern um and reinforcement

thinking pattern um and reinforcement learning.

learning. So what is interled thinking? Um so with

So what is interled thinking? Um so with a normal reasoning model that can use

a normal reasoning model that can use tools, it it normally works like this.

tools, it it normally works like this. You have the tools information given to

You have the tools information given to it. You have the system prompts. Um you

it. You have the system prompts. Um you have user prompts and then the model

have user prompts and then the model would think and then it calls tools. It

would think and then it calls tools. It can be a couple of tools at the same

can be a couple of tools at the same time. And then they get the tool

time. And then they get the tool response from the environment and then

response from the environment and then it performs a final thinking and deliver

it performs a final thinking and deliver a final content. But but here's the

a final content. But but here's the truth, right? In real world, the

truth, right? In real world, the environments are often noisy and

environments are often noisy and dynamic. You can't really perform this

dynamic. You can't really perform this one test just by once. You can get um

one test just by once. You can get um tool errors for example. You can get um

tool errors for example. You can get um unexpected results from the environment

unexpected results from the environment and stuff like that. So um what we did

and stuff like that. So um what we did is that we imagine how humans interact

is that we imagine how humans interact with the world. We we we look at

with the world. We we we look at something we get feedbacks and then we

something we get feedbacks and then we think about it. We think if the feedback

think about it. We think if the feedback is good or not and then we make other

is good or not and then we make other actions, make other decisions. And

actions, make other decisions. And that's why we did the same thing with

that's why we did the same thing with our M2 model. So if we look at this um

our M2 model. So if we look at this um chart over a diagram on the right. So

chart over a diagram on the right. So instead of just stopping um after one

instead of just stopping um after one round of tool calling, it actually

round of tool calling, it actually thinks again and reacts to the uh reacts

thinks again and reacts to the uh reacts to the environments to see if the

to the environments to see if the information is enough for it to uh get

information is enough for it to uh get what it wants. So basically we call the

what it wants. So basically we call the interle thinking or people call it

interle thinking or people call it interle thinking because it interle

interle thinking because it interle thinking with tool calling. um a couple

thinking with tool calling. um a couple of time it can be you know uh tens to a

of time it can be you know uh tens to a hundred um turns of tool calling within

hundred um turns of tool calling within just one user interaction term

just one user interaction term so it helps um adaptation to environment

so it helps um adaptation to environment noise for example uh just like what I

noise for example uh just like what I mentioned the environment is it's it's

mentioned the environment is it's it's not stable all the time and then

not stable all the time and then something is suboptimal and then it can

something is suboptimal and then it can choose to use other tools or do other

choose to use other tools or do other decisions it can focus on long horizon

decisions it can focus on long horizon has um can automate your workflow um

has um can automate your workflow um using for example Gmails, notions um

using for example Gmails, notions um terminal all at the same time. You just

terminal all at the same time. You just need to maybe make one model call

need to maybe make one model call without minim with minimal um human

without minim with minimal um human intervention. It can do it all by

intervention. It can do it all by itself. And here's a cool illustration

itself. And here's a cool illustration on the right because it's New York City.

on the right because it's New York City. I feel the vibe of you know trading and

I feel the vibe of you know trading and marketing. Um so you can see that there

marketing. Um so you can see that there was some um there was some pertabbations

was some um there was some pertabbations in the stock market uh I think last week

in the stock market uh I think last week and then our model was able to keep it

and then our model was able to keep it stable. So just like I said there's like

stable. So just like I said there's like environment noise there's no new

environment noise there's no new information there's like yeah news it

information there's like yeah news it looks like there there's like other

looks like there there's like other trading policies and stuff like that but

trading policies and stuff like that but our model was able to uh to perform

our model was able to uh to perform pretty stably in these kind of

pretty stably in these kind of environments.

environments. And the third characteristic is our

And the third characteristic is our robust um generalization to many agent

robust um generalization to many agent scaffolds which was supported by our

scaffolds which was supported by our perturbations in the data pipeline.

perturbations in the data pipeline. So we want our agent to generalize. But

So we want our agent to generalize. But what is agent generalization?

what is agent generalization? At first we thought it was just tool

At first we thought it was just tool scaling. We train the model with enough

scaling. We train the model with enough tools, various tools kind of new tools.

tools, various tools kind of new tools. we invent tools um and then it will just

we invent tools um and then it will just perform good on unseen tools. Well, that

perform good on unseen tools. Well, that was kind of the truth. It worked at

was kind of the truth. It worked at first. Uh but then we soon realized that

first. Uh but then we soon realized that if we perturb the environment a little

if we perturb the environment a little bit, for example, we change another

bit, for example, we change another agent scaffold, then it doesn't

agent scaffold, then it doesn't generalize. So what is agent

generalize. So what is agent generalization?

generalization? Well, we conclude that um it's

Well, we conclude that um it's adaptation to perturbations across the

adaptation to perturbations across the model's entire uh operational space.

model's entire uh operational space. If we uh think back what's the model's

If we uh think back what's the model's um operational space that we talked

um operational space that we talked about it can be tool information it can

about it can be tool information it can be system prompts it can be user prompts

be system prompts it can be user prompts they can all all be different they can

they can all all be different they can be the chat template they can be the

be the chat template they can be the environment they can be the tool

environment they can be the tool response. So what we did is that we

response. So what we did is that we designed and maintain perturbation

designed and maintain perturbation pipelines of our data so that um our

pipelines of our data so that um our model can actually gen generalized to a

model can actually gen generalized to a lot of agent scaffolds.

lot of agent scaffolds. And the fourth characteristic that I

And the fourth characteristic that I want to mention is the multi- aent

want to mention is the multi- aent scalability

scalability um which is very possible with M2

um which is very possible with M2 because it's very small and cost

because it's very small and cost effective.

I have a couple of videos here. Um, this is M2 powered by our own Miniax agent uh

is M2 powered by our own Miniax agent uh app. We actually have the QR code

app. We actually have the QR code downside. So, if you want it, you can

downside. So, if you want it, you can just scan and try it. So, it's like an

just scan and try it. So, it's like an agent app we we developed. And here we

agent app we we developed. And here we can see different copies of M2, right?

can see different copies of M2, right? It can do research. um it can write the

It can do research. um it can write the write the research results and analyze

write the research results and analyze it and put it in a re report. It can put

it and put it in a re report. It can put it in some kind of front-end

it in some kind of front-end illustration and they can work in

illustration and they can work in parallel. So because it is so small um

parallel. So because it is so small um and so cost effective, it can really um

and so cost effective, it can really um support those long run agentic tasks and

support those long run agentic tasks and tasks that maybe um require some kind of

tasks that maybe um require some kind of parallelism.

So what's next right for Miniax M2 from what I've introduced we gathered

what I've introduced we gathered environments um algorithms data expert

environments um algorithms data expert values model architecture inference

values model architecture inference evaluation all these stuff to build a

evaluation all these stuff to build a model um that was you know fast that was

model um that was you know fast that was uh intelligent that could use tools that

uh intelligent that could use tools that generalizes

generalizes what's next

what's next for um M2.1 1 and M3 were in the future.

for um M2.1 1 and M3 were in the future. We thinks of better coding, maybe memory

We thinks of better coding, maybe memory work, context management, proactive AI

work, context management, proactive AI for workplace, vertical experts, and

for workplace, vertical experts, and because we have those great audio

because we have those great audio generation, video generation models,

generation, video generation models, maybe we can integrate them. But all our

maybe we can integrate them. But all our mission is that we're committed to bring

mission is that we're committed to bring all these resources, whatever is on the

all these resources, whatever is on the screen and maybe more. Yeah. and values

screen and maybe more. Yeah. and values and put them all together to develop

and put them all together to develop models for uh the community to use. So

models for uh the community to use. So um we really need feedback from the

um we really need feedback from the community if possible because we want to

community if possible because we want to build this together and you know this is

build this together and you know this is kind of a race that everyone needs to

kind of a race that everyone needs to participate and then um we com we are

participate and then um we com we are committed to share it with the

committed to share it with the community. Yeah.

community. Yeah. And that's all the insights for today.

And that's all the insights for today. Um, we really hope again we really hope

Um, we really hope again we really hope you to try the model because it's pretty

you to try the model because it's pretty good. And then we can contact contact us

good. And then we can contact contact us up there. You can try the models by

up there. You can try the models by scanning the QR code. Yeah, basically

scanning the QR code. Yeah, basically that's it. Thank you all for listening.

that's it. Thank you all for listening. [applause]

Ladies [music] and gentlemen, please welcome back to the stage, Alex

welcome back to the stage, Alex Lieberman.

Lieberman. Let's give it up again for Olive [music]

Let's give it up again for Olive [music] and all the other speakers from the

and all the other speakers from the morning. [applause]

morning. [applause] It is time for lunch. Very exciting. Uh,

It is time for lunch. Very exciting. Uh, one thing I want to say before we head

one thing I want to say before we head out for lunch and we're it's going to be

out for lunch and we're it's going to be downstairs in the expo. check out all

downstairs in the expo. check out all the boos, talk to people, have food is,

the boos, talk to people, have food is, you know, my own experience with going

you know, my own experience with going to conferences is even though I come up

to conferences is even though I come up talk on stage a lot, I find it very

talk on stage a lot, I find it very difficult to engage in conversation with

difficult to engage in conversation with people when there's like these little

people when there's like these little small group settings. I don't know like

small group settings. I don't know like can I go and chat with people? Can I

can I go and chat with people? Can I not? This is a kind of awkward. I give

not? This is a kind of awkward. I give you all permission to butt into

you all permission to butt into conversations, introduce yourself. Ben

conversations, introduce yourself. Ben and Swix have done an incredible job of

and Swix have done an incredible job of cultivating such a high quality

cultivating such a high quality community here. And the most value you

community here. And the most value you will get is not just from these

will get is not just from these incredible presentations. It's from

incredible presentations. It's from meeting other folks in the crowd. So

meeting other folks in the crowd. So please, you have my permission butt into

please, you have my permission butt into conversations. Introduce yourself. Share

conversations. Introduce yourself. Share what you've learned with folks. And if

what you've learned with folks. And if you need any sort of uh ice breakers to

you need any sort of uh ice breakers to get the conversation going, I have two

get the conversation going, I have two for you. One is just go into a group and

for you. One is just go into a group and share your hottest take on uh the state

share your hottest take on uh the state of AI today. It's a great way to get off

of AI today. It's a great way to get off to a good start with someone. The

to a good start with someone. The second, a little less intense, is is a

second, a little less intense, is is a hot dog a sandwich? Is cereal uh in milk

hot dog a sandwich? Is cereal uh in milk a soup? That is how you're going to

a soup? That is how you're going to start the conversations with folks.

start the conversations with folks. Everyone enjoy lunch. We'll see you back

Everyone enjoy lunch. We'll see you back in an hour and uh thanks so much for

in an hour and uh thanks so much for your time.

Heat. Heat. [music]

Heat. [music]

[music] >> Heat up here.

Heat up

here. [music]

>> Heat up here.

here. [music]

Heat. [music] Heat.

[music] Heat.

[music] here.

>> Heat [music]

[music] up here.

>> Heat up here.

up here. [music]

>> Heat. Heat. [music]

Heat. [music]

[music] >> Heat.

[music] Heat.

Heat. [music]

[music] Heat. Heat.

Heat. Heat. [music]

>> Heat. [music] Heat.

[music] Heat.

Heat. Heat [music]

Heat up

Heat up here. [music]

Heat [music] up here.

Heat. [music]

>> [music] >> Heat. Heat.

here. [music]

[music] Heat

[music] Heat.

Heat [music]

>> Heat. Heat. [music]

[music] Heat. Heat.

here. [music]

[music] Heat.

Heat up here. [music]

>> Heat up here.

up here. [music]

>> [music] >> Heat. Heat.

here. Heat. Heat.

[music] >> Heat. Heat.

[music] Heat.

>> Heat up here.

>> Heat. [music]

[music] Heat.

>> up [music] here.

[music] Heat.

>> [music] >> Heat up

>> Heat up here.

here. [music]

>> [music] >> up

here. Heat. Heat.

>> Heat up

Heat. Heat. [music]

Heat up here.

[music] Heat.

[music] >> Heat up here.

>> [music] >> Heat up here.

Heat. [music]

Heat. Heat.

>> Heat. Heat.

[music] Heat. Heat.

[music] Hey. Hey.

>> Heat. Heat.

Heat. Heat. Heat. [music]

Heat. Heat. [music]

>> Heat up here.

here. [music]

[music] Heat. Heat.

[music] up

up here.

Heat. [music]

>> Heat up here. [music]

>> Heat [music]

up here. Heat. Heat.

[music] Heat. Heat.

>> Heat up [music]

[music] Heat.

Heat. [music]

>> Heat. Heat. [music]

Heat. Heat. [music]

>> up [music]

[music] Heat.

[music] Heat. Heat.

>> Heat >> [music]

[music] >> Heat.

Heat. [music]

>> [music] >> Heat up here.

Heat. [music]

>> [music] >> Heat up here.

Heat >> [music]

Heat [music]

[music] up here.

>> How's everyone How you doing? Good lunch. [music]

lunch. [music] Excited for the afternoon sessions. Out

Excited for the afternoon sessions. Out of curiosity, did anyone have the hot

of curiosity, did anyone have the hot dog conversation? Does anyone think Who

dog conversation? Does anyone think Who thinks that a hot dog's a sandwich?

thinks that a hot dog's a sandwich? We got one. We got two.

We got one. We got two. Uh, anyone think a hot dog isn't a

Uh, anyone think a hot dog isn't a sandwich? Most of the crowd. That is

sandwich? Most of the crowd. That is that is usually the consensus. Uh, one

that is usually the consensus. Uh, one other question. Who thinks that they

other question. Who thinks that they have the hottest take on the state of AI

have the hottest take on the state of AI or AI engineering right now in the room?

or AI engineering right now in the room? Anyone think they have the hottest take?

Anyone think they have the hottest take? Well, I I'll I'll give you uh a tea up

Well, I I'll I'll give you uh a tea up for later. My co-founder Arman is

for later. My co-founder Arman is speaking around four and I would say he

speaking around four and I would say he has one of the hotter takes I've seen,

has one of the hotter takes I've seen, which is he thinks all engineers should

which is he thinks all engineers should be paid like salespeople based on

be paid like salespeople based on output. That is going to attract a lot

output. That is going to attract a lot of debate and I give you full permission

of debate and I give you full permission to debate him after his talk. Well, are

to debate him after his talk. Well, are you guys ready to jump into the next

you guys ready to jump into the next group of sessions?

group of sessions? >> Let's do it. We will be diving into

>> Let's do it. We will be diving into proactive agents from Google labs,

proactive agents from Google labs, building Gen BI at a Fortune 100

building Gen BI at a Fortune 100 business, deploying AI within

business, deploying AI within Bloomberg's engineering org, lessons

Bloomberg's engineering org, lessons learned building an AI browser, and

learned building an AI browser, and developer experience in the age of AI

developer experience in the age of AI coding agents. With that, please join me

coding agents. With that, please join me in welcoming our next speaker, Kath

in welcoming our next speaker, Kath Corvec, director of product at Google

Corvec, director of product at Google Labs. Let's give it to her.

>> Hi everybody. I'm so excited to be here. I love New

I'm so excited to be here. I love New York and I love meeting everybody here.

York and I love meeting everybody here. And I am Kath Corbec. I'm from Google

And I am Kath Corbec. I'm from Google Labs and I work on this little team

Labs and I work on this little team called ADA. And I'm going to be talking

called ADA. And I'm going to be talking about some of the stuff that we've been

about some of the stuff that we've been doing on this project called Jewels. So,

doing on this project called Jewels. So, a few months ago in my household, our

a few months ago in my household, our dishwasher broke. And while it was being

dishwasher broke. And while it was being repaired, my husband decided that he was

repaired, my husband decided that he was going to do all the dishes. And so, he

going to do all the dishes. And so, he told me he was going to do this. But

told me he was going to do this. But every single night, I found myself

every single night, I found myself reminding him to do the dishes. And you

reminding him to do the dishes. And you can imagine that got old pretty fast.

can imagine that got old pretty fast. And I realized that even though I wasn't

And I realized that even though I wasn't physically washing the dishes, I was

physically washing the dishes, I was still carrying this mental load. I know

still carrying this mental load. I know a lot of you can probably relate to

a lot of you can probably relate to this. I was keeping track of whether or

this. I was keeping track of whether or not that task was done, following up,

not that task was done, following up, making sure that things kept moving. And

making sure that things kept moving. And I realized in that moment that that's

I realized in that moment that that's exactly where we are with asynchronous

exactly where we are with asynchronous agents today. They can handle some of

agents today. They can handle some of the work, but we're still the ones as

the work, but we're still the ones as developers carrying that mental load and

developers carrying that mental load and monitoring them. So here's the truth.

monitoring them. So here's the truth. Humans, we are serial processors, not

Humans, we are serial processors, not parallel ones. We can juggle multiple

parallel ones. We can juggle multiple goals, but we execute them in sequence,

goals, but we execute them in sequence, not all at once. When you manually kick

not all at once. When you manually kick off a task in jewels, you're usually

off a task in jewels, you're usually waiting to be able to move on. And it's

waiting to be able to move on. And it's that pause, it's that gap in attention

that pause, it's that gap in attention where we really lose momentum. And this

where we really lose momentum. And this is actually backed up by science where

is actually backed up by science where uh humans actually think we think we're

uh humans actually think we think we're multitaskers, but we're actually

multitaskers, but we're actually executing many tasks very rapidly. But

executing many tasks very rapidly. But switching between these tasks comes with

switching between these tasks comes with a huge cost. It can cost up to 40% of

a huge cost. It can cost up to 40% of your productive time. So that's like

your productive time. So that's like half a day lost to switching contexts

half a day lost to switching contexts and reloading. So if humans are uniters,

and reloading. So if humans are uniters, what's the solution here with agents? So

what's the solution here with agents? So for async agents, in order in order for

for async agents, in order in order for them to succeed, developers can't be

them to succeed, developers can't be expected to babysit them.

expected to babysit them. We've all seen that post on Twitter of

We've all seen that post on Twitter of 16 different cla tasks running in

16 different cla tasks running in parallel on 16 different terminals on

parallel on 16 different terminals on three different huge browsers or huge

three different huge browsers or huge monitors. And when I first saw this, I

monitors. And when I first saw this, I thought, god forbid that is the DevX of

thought, god forbid that is the DevX of the future. I want to I don't want to

the future. I want to I don't want to manage work. I don't want to manage my

manage work. I don't want to manage my agents. I want to be a coder. I want to

agents. I want to be a coder. I want to build. And so we need to think we need

build. And so we need to think we need uh uh collaborators in our system that

uh uh collaborators in our system that we can trust. agents that really

we can trust. agents that really understand context, can anticipate our

understand context, can anticipate our needs, and they know really when to step

needs, and they know really when to step in. And then uh I think finally, we're

in. And then uh I think finally, we're reaching that point with models where

reaching that point with models where they're getting better and better at

they're getting better and better at executing end to end as long as they

executing end to end as long as they understand what our goals are clearly.

understand what our goals are clearly. And that's where trust really becomes

And that's where trust really becomes this unlock where you can trust the

this unlock where you can trust the system to know what's missing, to fill

system to know what's missing, to fill in the gaps, and to really keep progress

in the gaps, and to really keep progress moving forward while you manage on

moving forward while you manage on something else where where while you

something else where where while you focus on what matters most. And

focus on what matters most. And essentially, we want jewels to do the

essentially, we want jewels to do the dishes without being asked.

dishes without being asked. So most AI developer tools today are

So most AI developer tools today are fundamentally reactive. you open up your

fundamentally reactive. you open up your CLI or your ID and you ask the agent to

CLI or your ID and you ask the agent to do something and it responds or it waits

do something and it responds or it waits for you to start typing and then it

for you to start typing and then it autocompletes a suggestion. And there's

autocompletes a suggestion. And there's a benefit to this model. It's very

a benefit to this model. It's very efficient. It only uses compute when you

efficient. It only uses compute when you explicitly ask for it. But the real

explicitly ask for it. But the real question I'm asking myself is, is this

question I'm asking myself is, is this how I want to manage AI? And if you

how I want to manage AI? And if you think about in the future, imagine a

think about in the future, imagine a world where compute is not a limiting

world where compute is not a limiting factor anymore. Instead of a single

factor anymore. Instead of a single reactive assistant for instructions, you

reactive assistant for instructions, you could have dozens of small proactive

could have dozens of small proactive agents working with you in parallel,

agents working with you in parallel, quietly looking for patterns, noticing

quietly looking for patterns, noticing friction, and taking on the boring tasks

friction, and taking on the boring tasks that you don't want to do before you

that you don't want to do before you even ask. It can do things like fixing

even ask. It can do things like fixing authentication bugs that you've been

authentication bugs that you've been avoiding. uh updating configs, flagging

avoiding. uh updating configs, flagging potential order uh errors, preparing uh

potential order uh errors, preparing uh migrations and all of this can happen in

migrations and all of this can happen in the background triggered off of things

the background triggered off of things in my natural workflow. So I really

in my natural workflow. So I really think there are four essential

think there are four essential ingredients that make up proactive

ingredients that make up proactive systems today. There's observation. The

systems today. There's observation. The agent has to really continually

agent has to really continually understand what is happening and of what

understand what is happening and of what your code changes are, what your

your code changes are, what your patterns are, what your workflow is,

patterns are, what your workflow is, etc. to get context about your entire

etc. to get context about your entire project. And then there's

project. And then there's personalization. And this one's

personalization. And this one's difficult. It has to learn how you work,

difficult. It has to learn how you work, what you care about, what you tend to

what you care about, what you tend to ignore, what your preferences are, the

ignore, what your preferences are, the code that you absolutely don't want to

code that you absolutely don't want to ever touch. And then it has to be timely

ever touch. And then it has to be timely as well. If it comes in too soon, it's

as well. If it comes in too soon, it's going to interrupt you. And if it's too

going to interrupt you. And if it's too late, then the moment is lost. And it

late, then the moment is lost. And it also has to work seamlessly across your

also has to work seamlessly across your workflow. It has to insert itself into

workflow. It has to insert itself into spaces where you naturally work already

spaces where you naturally work already in your terminal, in your repository, in

in your terminal, in your repository, in your IDE, not forcing you to go

your IDE, not forcing you to go somewhere else to some application

somewhere else to some application that's secret or that you forgot about.

that's secret or that you forgot about. So bringing all these tools together,

So bringing all these tools together, you can imagine, is not trivial.

>> So is running this presentation. Um, and uh, you you want to be able to ask your

uh, you you want to be able to ask your agent to understand your workflow and

agent to understand your workflow and anticipate your needs and then intervene

anticipate your needs and then intervene at exactly the right moment without

at exactly the right moment without breaking your workflow.

breaking your workflow. And that's when it really starts to feel

And that's when it really starts to feel like magic. The interesting thing is pro

like magic. The interesting thing is pro these proactive systems, they're all

these proactive systems, they're all around us today. One of my favorite

around us today. One of my favorite examples is Google Nest where you put it

examples is Google Nest where you put it in your house, you install it, and then

in your house, you install it, and then you configure it and then it starts to

you configure it and then it starts to learn your habits as you leave the

learn your habits as you leave the house, as you come back, uh, as you go

house, as you come back, uh, as you go to sleep, as you wake up in the morning.

to sleep, as you wake up in the morning. And then pretty soon, you don't have to

And then pretty soon, you don't have to think about climate control in your

think about climate control in your house anymore because it's learned what

house anymore because it's learned what your habits are. Another one is your own

your habits are. Another one is your own body. your heart rate elevates as you go

body. your heart rate elevates as you go for a run or start to work out or it

for a run or start to work out or it anticipates that you're about to fall

anticipates that you're about to fall and so it reacts before you consciously

and so it reacts before you consciously think I'm going to put my hand out. So

think I'm going to put my hand out. So when you look at it like that

when you look at it like that proactivity is actually not that

proactivity is actually not that proactivity for AI is actually not that

proactivity for AI is actually not that futuristic. It's very familiar and it is

futuristic. It's very familiar and it is very human and that's exactly the point.

very human and that's exactly the point. What we're building is tools that behave

What we're building is tools that behave more like a good collaborator and less

more like a good collaborator and less like command line utilities. So we're

like command line utilities. So we're already doing this in this tool called

already doing this in this tool called jewels which is this uh proactive

jewels which is this uh proactive asynchronous autonomous coding agent

asynchronous autonomous coding agent from Google labs. And we're doing this

from Google labs. And we're doing this in kind of three levels of of uh

in kind of three levels of of uh proactivity. Level one is where a

proactivity. Level one is where a collaboration really starts to emerge.

collaboration really starts to emerge. And this is how Jules works today where

And this is how Jules works today where it can detect things like missing tests,

it can detect things like missing tests, unused dependencies, unsafe patterns,

unused dependencies, unsafe patterns, and then it starts to automatically fix

and then it starts to automatically fix those things as it's doing other other

those things as it's doing other other tasks that you've asked it to do. This

tasks that you've asked it to do. This is sort of like this attentive sue chef

is sort of like this attentive sue chef in your workflow where it's keeping the

in your workflow where it's keeping the kitchen clean, the knives sharp, the

kitchen clean, the knives sharp, the kitchen uh stocked so that you can focus

kitchen uh stocked so that you can focus on what comes next. And that's the

on what comes next. And that's the beginning of proactive software. At

beginning of proactive software. At level two, the agent becomes more

level two, the agent becomes more contextually aware of the entire

contextually aware of the entire project. It observes how you work, the

project. It observes how you work, the code you write. If you're a back-end

code you write. If you're a back-end engineer, maybe you need help with

engineer, maybe you need help with React. If you're a designer, maybe it

React. If you're a designer, maybe it wants you to may maybe it'll help uh uh

wants you to may maybe it'll help uh uh write the database schema. And then it

write the database schema. And then it learns what your frameworks are and what

learns what your frameworks are and what your deployment style is, etc. And this

your deployment style is, etc. And this is the kitchen manager. This is the

is the kitchen manager. This is the person in your workflow keeping the

person in your workflow keeping the rhythm and anticipating what you need

rhythm and anticipating what you need next. And then comes level three. And

next. And then comes level three. And this is what we're working on pretty

this is what we're working on pretty hard right now going into December. And

hard right now going into December. And I'll show you a little bit of what we're

I'll show you a little bit of what we're what we're going to be shipping in

what we're going to be shipping in December in a minute. But level three is

December in a minute. But level three is where things start to converge around

where things start to converge around that context. It's where the agent

that context. It's where the agent starts to understand not just context,

starts to understand not just context, but also consequence. How these choices

but also consequence. How these choices are actually affecting the users of your

are actually affecting the users of your products, the performance, and the

products, the performance, and the outcomes. And at that level, we have

outcomes. And at that level, we have this thing jewels. We also have an agent

this thing jewels. We also have an agent called Stitch, which is a design agent.

called Stitch, which is a design agent. and another one we're building called

and another one we're building called insights which is a data agent and

insights which is a data agent and they're all coming together to build

they're all coming together to build this collective intelligence across your

this collective intelligence across your application tools can see what's

application tools can see what's breaking in the software stitch

breaking in the software stitch understands how users are interacting

understands how users are interacting with it and insights connects behaviors

with it and insights connects behaviors from real world signals like analytics

from real world signals like analytics telemetry and conversion rates and then

telemetry and conversion rates and then together they can propose improvements

together they can propose improvements across boundaries of how the system all

across boundaries of how the system all works together doing things like

works together doing things like performance fixes to improve UX and then

performance fixes to improve UX and then design changes to prevent regressions

design changes to prevent regressions and then all of that is organized based

and then all of that is organized based on live data. So the trick here is that

on live data. So the trick here is that the human stays firmly in the loop.

the human stays firmly in the loop. You're observing what the agents are

You're observing what the agents are doing. You're refining when you when

doing. You're refining when you when they when you need to intervene and then

they when you need to intervene and then you're redirecting it when it has when

you're redirecting it when it has when it has been misdirected. So level three

it has been misdirected. So level three isn't really about autonomy anymore.

isn't really about autonomy anymore. It's actually about alignment to your

It's actually about alignment to your project. a a agents and humans

project. a a agents and humans collaborating together across the full

collaborating together across the full life cycle of your project.

life cycle of your project. So right now Jules is focused on this

So right now Jules is focused on this code awareness piece that understands

code awareness piece that understands the environment, the frameworks and the

the environment, the frameworks and the project structures and we're moving

project structures and we're moving towards more of that system awareness.

towards more of that system awareness. So things that we're introducing in

So things that we're introducing in Jules now, we've added something called

Jules now, we've added something called memory which I'm sure a lot of you are

memory which I'm sure a lot of you are familiar with. It's the ability for

familiar with. It's the ability for Jules to write its own memories and you

Jules to write its own memories and you can edit them and interact with them. it

can edit them and interact with them. it can edit them and it understands that

can edit them and it understands that and builds this memory and context and

and builds this memory and context and knowledge of of your project as you work

knowledge of of your project as you work with it. We've added a critic agent

with it. We've added a critic agent which works adversarially with Jules to

which works adversarially with Jules to make sure that the code is is high

make sure that the code is is high quality but then also does a full code

quality but then also does a full code review. And then we've added

review. And then we've added verification where Jules will write a

verification where Jules will write a playwright script, take a screenshot,

playwright script, take a screenshot, and then put that back into the

and then put that back into the trajectory for you to validate. And then

trajectory for you to validate. And then we're also doing things like adding uh a

we're also doing things like adding uh a to-do bot that will look through your

to-do bot that will look through your code and look through your repository

code and look through your repository and pick up on anything that where

and pick up on anything that where you've said this is a to-do I want to

you've said this is a to-do I want to get to in the future and it will start

get to in the future and it will start to proactively work on those things with

to proactively work on those things with that context. We're also adding in

that context. We're also adding in things like best practices where Jules

things like best practices where Jules will understand best practices and start

will understand best practices and start to suggest those and also environment

to suggest those and also environment setup. We have an environment agent that

setup. We have an environment agent that we use internally for running evals and

we use internally for running evals and we're extending that externally to

we're extending that externally to better understand how environment how

better understand how environment how your environments work and and set those

your environments work and and set those up for you. And then we also are adding

up for you. And then we also are adding something called a just in time context.

something called a just in time context. It's like a jewels cheat sheet where if

It's like a jewels cheat sheet where if it's doing something very specific it

it's doing something very specific it can and gets stuck it can just

can and gets stuck it can just immediately look at that cheat sheet

immediately look at that cheat sheet instead of reaching out to you. So, this

instead of reaching out to you. So, this is all moving Jules very close to being

is all moving Jules very close to being that proactive teammate, not just this

that proactive teammate, not just this reactive assistant. Okay, so this

reactive assistant. Okay, so this morning I was talking to my team back in

morning I was talking to my team back in San Francisco and I was thinking, okay,

San Francisco and I was thinking, okay, I'm going to do a live demo, but the

I'm going to do a live demo, but the live demo gods did not align with me

live demo gods did not align with me this morning. We still have CLS that are

this morning. We still have CLS that are being pushed to staging right now. So,

being pushed to staging right now. So, I'm going to walk you through a little

I'm going to walk you through a little bit of this. And if you know Jed, he's

bit of this. And if you know Jed, he's going to, I think, be talking tomorrow.

going to, I think, be talking tomorrow. We're gonna um affectionately try to fix

We're gonna um affectionately try to fix Jed's code here. Um, so this is a view

Jed's code here. Um, so this is a view of of proactivity and this is this is

of of proactivity and this is this is Jules where you prompt it and the first

Jules where you prompt it and the first thing you that you do when you configure

thing you that you do when you configure and enable proactivity is Jules will

and enable proactivity is Jules will index your entire uh codebase. It'll

index your entire uh codebase. It'll index your directory and start looking

index your directory and start looking for things that it can do and then it'll

for things that it can do and then it'll that'll show up on the screen. So right

that'll show up on the screen. So right here we're looking at a little bit more

here we're looking at a little bit more in this um in this repository ADK Python

in this um in this repository ADK Python and uh and it's indexed the repository

and uh and it's indexed the repository and it's found a bunch of to-dos. It's

and it's found a bunch of to-dos. It's found a bunch of best practices that it

found a bunch of best practices that it can update and it's giving me some

can update and it's giving me some signal about what it's finding. And so

signal about what it's finding. And so you can see the signal is high

you can see the signal is high confidence, medium confidence, and low.

confidence, medium confidence, and low. And so it's actually telling me what it

And so it's actually telling me what it thinks it can achieve based on what's in

thinks it can achieve based on what's in my code and what it wants to do. And

my code and what it wants to do. And that's so it has high confidence in

that's so it has high confidence in green, medium and purple, low in yellow

green, medium and purple, low in yellow way down at the bottom. Um, and so I can

way down at the bottom. Um, and so I can go through this and I can manually click

go through this and I can manually click these and say I want to start these. And

these and say I want to start these. And so I don't have to think about the

so I don't have to think about the prompt. I don't have to look at the

prompt. I don't have to look at the code. I don't I I can do kind of less

code. I don't I I can do kind of less cognitive load here. We're working on

cognitive load here. We're working on something to just start these

something to just start these automatically. And so that's coming in

automatically. And so that's coming in the future. But I can also delete these.

the future. But I can also delete these. I can say, "Hey, this one isn't isn't

I can say, "Hey, this one isn't isn't for me. Isn't good." And so once it gets

for me. Isn't good." And so once it gets started on a task, I can kind of drill

started on a task, I can kind of drill into it and see a little bit more. I can

into it and see a little bit more. I can peek into the code that it is suggesting

peek into the code that it is suggesting uh that uh it's suggesting it work on. I

uh that uh it's suggesting it work on. I can find the location of that code. And

can find the location of that code. And it also gives me some rationale about

it also gives me some rationale about why it wants to work on that code, why

why it wants to work on that code, why what it's doing, etc. And so it's giving

what it's doing, etc. And so it's giving me a lot more context and helping me

me a lot more context and helping me trust that it knows what to do here.

trust that it knows what to do here. Okay. So that's proactivity that's

Okay. So that's proactivity that's coming in December and hopefully we'll

coming in December and hopefully we'll be able to give that to everybody here.

be able to give that to everybody here. We're very excited about it and I want

We're very excited about it and I want to tell you a little story about uh

to tell you a little story about uh something my husband and I were working

something my husband and I were working on just to kind of set set wrap things

on just to kind of set set wrap things up. We uh tinker a bunch with hardware

up. We uh tinker a bunch with hardware and we live on this slow street in the

and we live on this slow street in the middle of San Francisco in Haydashberry

middle of San Francisco in Haydashberry district. And so on Halloween we get a

district. And so on Halloween we get a lot of people walking by our house and

lot of people walking by our house and so we are trying to take advantage of

so we are trying to take advantage of that with our Halloween decorations. And

that with our Halloween decorations. And so we built this six-foot animatronic

so we built this six-foot animatronic head that sits in the front of our

head that sits in the front of our house. It's this old Victorian house.

house. It's this old Victorian house. And he sculpted it out of foam, epoxy,

And he sculpted it out of foam, epoxy, and fiberglass. And then I our our kids

and fiberglass. And then I our our kids also called this lovingly the bald head.

also called this lovingly the bald head. And it's based off of if you ever see

And it's based off of if you ever see saw Peewee Herman from the 80s. It's

saw Peewee Herman from the 80s. It's based off of the Peewee Herman Peewee's

based off of the Peewee Herman Peewee's Big Adventures head. Um so while my

Big Adventures head. Um so while my husband was doing this I was spending my

husband was doing this I was spending my time working with Jules on updating the

time working with Jules on updating the firmware, controlling the stepper

firmware, controlling the stepper motors, working on the um on the LEDs

motors, working on the um on the LEDs and the sensors. And for me that's the

and the sensors. And for me that's the fun part for me is like really getting

fun part for me is like really getting creative with what the LEDs are doing.

creative with what the LEDs are doing. So I wanted to focus on that, the LED

So I wanted to focus on that, the LED animations, but I ended up spending most

animations, but I ended up spending most of my time actually fixing bugs and

of my time actually fixing bugs and swapping libraries and doing things like

swapping libraries and doing things like that. So what I would do is I would

that. So what I would do is I would prompt Jules, I'd wait 10 minutes and

prompt Jules, I'd wait 10 minutes and then I would repeat. And I found that

then I would repeat. And I found that process very very tedious. And what I

process very very tedious. And what I wanted was actually Jules to do the

wanted was actually Jules to do the research. I wanted it to handle the the

research. I wanted it to handle the the ugly parts where it was researching how

ugly parts where it was researching how to fix a bug. Uh doing the debugging

to fix a bug. Uh doing the debugging itself. And I wanted it to do this so

itself. And I wanted it to do this so that I could focus on the creative

that I could focus on the creative parts. I wanted the eyes to move and

parts. I wanted the eyes to move and like follow people as they walk down the

like follow people as they walk down the street and like have lasers coming in

street and like have lasers coming in out of its eyes and stuff like I

out of its eyes and stuff like I mentioned it was Halloween. It was very

mentioned it was Halloween. It was very scary. Uh and and this but but I

scary. Uh and and this but but I couldn't really do as much of that. and

couldn't really do as much of that. and I ended up actually not shipping as much

I ended up actually not shipping as much as I wanted to with this animatronic

as I wanted to with this animatronic bald head. And so it's that gap that we

bald head. And so it's that gap that we actually want to close. It's the space

actually want to close. It's the space between with jewels, it's the space

between with jewels, it's the space between that tool friction and creative

between that tool friction and creative freedom that we're trying to unlock with

freedom that we're trying to unlock with these kinds of proactive agents.

these kinds of proactive agents. So what I really want you guys to take

So what I really want you guys to take away from it, I give this advice to the

away from it, I give this advice to the the folks on on the Jules team a lot is

the folks on on the Jules team a lot is that the product we build today actually

that the product we build today actually won't be the project the products that

won't be the project the products that we have in the future and I think a lot

we have in the future and I think a lot of us know that but in reality I want

of us know that but in reality I want everybody in this room and everyone

everybody in this room and everyone building working with AI to be able to

building working with AI to be able to take those big steps. I think the

take those big steps. I think the patterns that we rely on today git uh

patterns that we rely on today git uh your your idees even the code how we

your your idees even the code how we think about the code itself might not

think about the code itself might not exist a year from now might not exist

exist a year from now might not exist six months from now and that's the

six months from now and that's the exciting part for me it's sort of we get

exciting part for me it's sort of we get to invent the future right now we get to

to invent the future right now we get to describe and decide how software is made

describe and decide how software is made and built uh kind of all the people in

and built uh kind of all the people in this room so my my challenge to you is

this room so my my challenge to you is to not be afraid to question the old

to not be afraid to question the old ways of how you're building software

ways of how you're building software because really the future is coming

because really the future is coming faster than any of us know. It's

faster than any of us know. It's probably already here and the cool thing

probably already here and the cool thing is we get to build it together. Thank

is we get to build it together. Thank you.

you. [applause]

[applause] [music]

[music] Our

next talk is a case study from the enterprise on incremental rollout of AI.

enterprise on incremental rollout of AI. Here to provide us with a blueprint for

Here to provide us with a blueprint for making AI transformation fundable,

making AI transformation fundable, governable and real inside large risk

governable and real inside large risk averse organizations is engineering

averse organizations is engineering leader at Northwestern Mutual, ASAF

leader at Northwestern Mutual, ASAF board.

>> [music] [applause]

>> Doesn't this look like something's going to drop from the ceiling? Like a ground

to drop from the ceiling? Like a ground zero type thing? [snorts] Be honest.

zero type thing? [snorts] Be honest. Like, who has a buzzer that if I'm I

Like, who has a buzzer that if I'm I really suck, they press it and

really suck, they press it and everything falls down through the trap

everything falls down through the trap door? No.

door? No. >> Be careful.

>> Be careful. >> Yeah. Okay. Who was it? Okay. you tell

>> Yeah. Okay. Who was it? Okay. you tell me if I'm doing okay or if I should take

me if I'm doing okay or if I should take a couple steps back. Right. So, hi

a couple steps back. Right. So, hi everyone. I'm Assaf. Um, and I'm here to

everyone. I'm Assaf. Um, and I'm here to talk about Genbi. And kind of first

talk about Genbi. And kind of first disclaimer, this presentation was not

disclaimer, this presentation was not created with Genai. Um, to be honest, I

created with Genai. Um, to be honest, I actually started doing it uh with uh

actually started doing it uh with uh GPT03 back in August. Uh, [snorts] and

GPT03 back in August. Uh, [snorts] and then I did kind of a first draft and

then I did kind of a first draft and then a couple of weeks back I wanted to

then a couple of weeks back I wanted to come in and refresh it before the

come in and refresh it before the conference and then GPT5 took over

conference and then GPT5 took over completely messed up my slide so I ended

completely messed up my slide so I ended up doing it manually kind of

up doing it manually kind of oldfashioned. So if I'm missing like an

oldfashioned. So if I'm missing like an M dash somewhere in the middle let me

M dash somewhere in the middle let me know after. Okay. [snorts] Uh, so first

know after. Okay. [snorts] Uh, so first of all a bit of housekeeping. What's

of all a bit of housekeeping. What's GenBI? So it's a fusion of Gen AI and

GenBI? So it's a fusion of Gen AI and BI. It's basically an agent that helps

BI. It's basically an agent that helps people answer business questions with

people answer business questions with data like a a business intelligence

data like a a business intelligence person would do in real life. Uh the

person would do in real life. Uh the reason that we're pursuing GenBI is

reason that we're pursuing GenBI is really because of the data

really because of the data democratization that it can bring.

democratization that it can bring. Right? So having access to data at your

Right? So having access to data at your fingertips without having to be reliant

fingertips without having to be reliant on a BI team that helps you find a

on a BI team that helps you find a report, figure out what it means, uh

report, figure out what it means, uh understand your world before they can

understand your world before they can even give you any kind of input. Uh, so

even give you any kind of input. Uh, so that's Genbi. Uh, a bit about

that's Genbi. Uh, a bit about Northwestern Mutual. That's where I

Northwestern Mutual. That's where I work. So, we're a financial services,

work. So, we're a financial services, life insurance, and wealth management.

life insurance, and wealth management. Been around for 160 years. Uh, [snorts]

Been around for 160 years. Uh, [snorts] some very impressive numbers there. But

some very impressive numbers there. But first of all, I want to say why is

first of all, I want to say why is Northwestern Mutual a great place to do

Northwestern Mutual a great place to do Gen AI? We got a lot of data, we got a

Gen AI? We got a lot of data, we got a lot of money, we got a lot of use cases,

lot of money, we got a lot of use cases, and we got access to some of the best

and we got access to some of the best talent uh, anyone can dream of. really

talent uh, anyone can dream of. really truly humbled by the people that I get

truly humbled by the people that I get to work with. Um, but on the flip side,

to work with. Um, but on the flip side, why is it hard to do Gen AI at

why is it hard to do Gen AI at Northwestern Mutual? Because it is a

Northwestern Mutual? Because it is a very riskaverse company, right? If you

very riskaverse company, right? If you think about it, our main motto is

think about it, our main motto is generational responsibility. I call it

generational responsibility. I call it don't f up. Uh, because what we end

don't f up. Uh, because what we end up selling to people is a decadesl long

up selling to people is a decadesl long commitment, right? you buy life

commitment, right? you buy life insurance now,

insurance now, uh, if you stay with us until it comes

uh, if you stay with us until it comes to term, so to speak, that can be 20,

to term, so to speak, that can be 20, 40, 80 years down the line, depending on

40, 80 years down the line, depending on when you buy it and how long you get to

when you buy it and how long you get to live. And so stability is something

live. And so stability is something that's very important for us because

that's very important for us because it's important for our clients. So, how

it's important for our clients. So, how do we balance stability with innovation?

do we balance stability with innovation? That's what I want to talk about today.

That's what I want to talk about today. Um, and really the four main challenges

Um, and really the four main challenges that we had when we even came up with

that we had when we even came up with the idea kind of a pie in the sky Genbi

the idea kind of a pie in the sky Genbi concept. Uh, first [snorts] of all, no

concept. Uh, first [snorts] of all, no one's done it before, right? Truly, no

one's done it before, right? Truly, no one's done GenBI in this fashion in the

one's done GenBI in this fashion in the past. Uh, secondly, and this was really

past. Uh, secondly, and this was really a preference for us, we wanted to use

a preference for us, we wanted to use actual data that's messy because we knew

actual data that's messy because we knew that those were that's where the real

that those were that's where the real challenges are going to be, right?

challenges are going to be, right? understanding actual messy data for 160

understanding actual messy data for 160 year old company and how can we perform

year old company and how can we perform well within that ecosystem. Um the third

well within that ecosystem. Um the third was kind of a blind trust bias. So um

was kind of a blind trust bias. So um the bi the trust that we had to build

the bi the trust that we had to build was both with the users but also with

was both with the users but also with the leadership of the company, right?

the leadership of the company, right? How can we bring accurate information,

How can we bring accurate information, accurate answers to people when uh all

accurate answers to people when uh all of these things that we know about and

of these things that we know about and everyone's talked about is is just out

everyone's talked about is is just out there, right? No one's blind to the

there, right? No one's blind to the trust barriers. No one's blind to the

trust barriers. No one's blind to the accuracy barriers. So, how do we

accuracy barriers. So, how do we convince that this is actually something

convince that this is actually something that we can trust in the company? And

that we can trust in the company? And lastly,

lastly, um but really firstly, when we go to

um but really firstly, when we go to approach this from an enterprise

approach this from an enterprise perspective, budget impact, right? How

perspective, budget impact, right? How do we convince someone in a leadership

do we convince someone in a leadership uh organization where risk averseman is

uh organization where risk averseman is ingrained in the DNA to even invest in

ingrained in the DNA to even invest in something like this that no one's done

something like this that no one's done before? We don't really know how we

before? We don't really know how we would do it. Uh we're not even sure how

would do it. Uh we're not even sure how it would look like when it comes to

it would look like when it comes to term.

term. Uh so I'll start kind of one by one uh

Uh so I'll start kind of one by one uh and first of all really talk about why

and first of all really talk about why we chose to use actual data uh and not

we chose to use actual data uh and not synthesized data or cleanse data. Uh

synthesized data or cleanse data. Uh [snorts] so really it's about making

[snorts] so really it's about making sure that we understand the actual

sure that we understand the actual complexities that we will have to face

complexities that we will have to face when we eventually want to go to

when we eventually want to go to production right we know that you know

production right we know that you know building uh PC's and demos is so easy

building uh PC's and demos is so easy but the gap from PC to production is so

but the gap from PC to production is so broad uh especially in this genai space

broad uh especially in this genai space especially because we don't know upfront

especially because we don't know upfront how to design the system what we would

how to design the system what we would expect it to behave like so making sure

expect it to behave like so making sure that we operate with real data just gave

that we operate with real data just gave us that extra confidence that when

us that extra confidence that when something works in the it's very likely

something works in the it's very likely to also work in reality. Uh but also and

to also work in reality. Uh but also and maybe not uh in the least less important

maybe not uh in the least less important is that we got to work with actual

is that we got to work with actual people who work with the data day in and

people who work with the data day in and day out and that gave us two things.

day out and that gave us two things. Okay, first of all subject matter

Okay, first of all subject matter expertise which are super critical for

expertise which are super critical for us to be able to validate that the

us to be able to validate that the system is actually working gave us a lot

system is actually working gave us a lot of real life examples of what people are

of real life examples of what people are actually asking in a corporate and what

actually asking in a corporate and what people have answered to them. So

people have answered to them. So basically the eval right and all the

basically the eval right and all the testing and stuff uh but at the end of

testing and stuff uh but at the end of the day it also brought the business to

the day it also brought the business to be a part of the research project itself

be a part of the research project itself and they became kind of bought into the

and they became kind of bought into the idea as part of the process. So we

idea as part of the process. So we didn't just test something in the lab

didn't just test something in the lab and then had to convince someone to go

and then had to convince someone to go ahead and use it. The end users were

ahead and use it. The end users were part of the research process itself. And

part of the research process itself. And so when eventually it matured enough so

so when eventually it matured enough so we can take some of that to production,

we can take some of that to production, they were already there and they

they were already there and they actually were pulling that. They told us

actually were pulling that. They told us we want to take this, how can we wrap

we want to take this, how can we wrap it? How can we package it uh quickly

it? How can we package it uh quickly enough so we can put it into practice?

Uh and the next part was really about building trust. Uh so this is about

building trust. Uh so this is about building trust first of all with our

building trust first of all with our management team. right now. I don't know

management team. right now. I don't know about you, but last time that I got a

about you, but last time that I got a million dollar to do a research project

million dollar to do a research project that I wanted in a pie sky idea, I woke

that I wanted in a pie sky idea, I woke up from the dream and I realized that

up from the dream and I realized that this is not how things work in reality.

this is not how things work in reality. You don't just get a million dollars and

You don't just get a million dollars and go ahead and try something out. Uh you

go ahead and try something out. Uh you had to show that you know what you're

had to show that you know what you're doing. And part of what we did, it's

doing. And part of what we did, it's kind of listed out here, but obviously,

kind of listed out here, but obviously, you know, we did all the regular stuff,

you know, we did all the regular stuff, right? We worked in a sandbox

right? We worked in a sandbox environment. We made sure that we're not

environment. We made sure that we're not using actual client data. We made sure

using actual client data. We made sure to put in all the security risks aside,

to put in all the security risks aside, but uh one of the first approaches that

but uh one of the first approaches that we said we're going to take is we're not

we said we're going to take is we're not just going to build a tool that's going

just going to build a tool that's going to be uh released to everyone, right? We

to be uh released to everyone, right? We understood very quickly that um how

understood very quickly that um how people interact with the tool, their

people interact with the tool, their ability to verify that what they're

ability to verify that what they're getting is right and also give us

getting is right and also give us feedback changes dramatically depending

feedback changes dramatically depending on their expertise and understanding of

on their expertise and understanding of the data. So we took that crawl, walk,

the data. So we took that crawl, walk, run approach that basically said we're

run approach that basically said we're first going to release it to actual BI

first going to release it to actual BI experts, right? People that would be

experts, right? People that would be able to do it on their own and know what

able to do it on their own and know what good looks like when they get it. and

good looks like when they get it. and we're just going to expedite the process

we're just going to expedite the process for them. Kind of like a GitHub

for them. Kind of like a GitHub co-pilot. The next phase would be to

co-pilot. The next phase would be to bring it to business managers. And

bring it to business managers. And again, people who are closer to the BI

again, people who are closer to the BI team, but when they see a mistake, they

team, but when they see a mistake, they can pretty much figure out that what

can pretty much figure out that what they're seeing is wrong because they're

they're seeing is wrong because they're used to seeing that on day-to-day basis.

used to seeing that on day-to-day basis. Um, and they will might be less

Um, and they will might be less sensitive to these types of mistakes and

sensitive to these types of mistakes and be more inclined to give us that

be more inclined to give us that feedback instead of just, you know,

feedback instead of just, you know, dumping it aside and never using it

dumping it aside and never using it again. giving this type of tool to

again. giving this type of tool to executives in the company. I don't even

executives in the company. I don't even know when we're going to get there,

know when we're going to get there, right? Like an executive, they want

right? Like an executive, they want clear, concise answers that they know

clear, concise answers that they know they can trust. We're definitely not

they can trust. We're definitely not there yet. I think that's the vision uh

there yet. I think that's the vision uh at some point in time, but the system is

at some point in time, but the system is not accurate enough for us to get there.

not accurate enough for us to get there. Maybe it never will be.

Maybe it never will be. Um, [snorts] another way that we another

Um, [snorts] another way that we another liver that we kind of used to build

liver that we kind of used to build inherent transit the system is that we

inherent transit the system is that we said well in the get-go we're not going

said well in the get-go we're not going to even try to build SQLs right this is

to even try to build SQLs right this is very complex this is very hard even for

very complex this is very hard even for a person so we said step number one

a person so we said step number one let's just bring information that is

let's just bring information that is already in the ecosystem that's already

already in the ecosystem that's already verified right we have a lot of uh

verified right we have a lot of uh certified reports and dashboards um and

certified reports and dashboards um and actually in the conversation we had with

actually in the conversation we had with some of the BI teams that we worked

some of the BI teams that we worked with. They told us guys like 80% of the

with. They told us guys like 80% of the work that we do is basically sending

work that we do is basically sending people to the right report and helping

people to the right report and helping them figure out how to use it. So the

them figure out how to use it. So the report is already there. Um and that

report is already there. Um and that again built some inherent trust into how

again built some inherent trust into how we architected the system because we

we architected the system because we said we're not going to make up

said we're not going to make up information. We're just going to deliver

information. We're just going to deliver you the same asset that you would have

you the same asset that you would have gotten anyway just in a much faster much

gotten anyway just in a much faster much more interactive way. uh and that was

more interactive way. uh and that was the alignment of expectations that we

the alignment of expectations that we did very upfront with the uh users and

did very upfront with the uh users and also with the management team.

also with the management team. Now [clears throat]

Now [clears throat] the biggest um

the biggest um process or kind of the most important

process or kind of the most important approach that we took when uh

approach that we took when uh approaching our leadership team and

approaching our leadership team and convincing them that we want to do this

convincing them that we want to do this was to create a very gradual incremental

was to create a very gradual incremental process that gave them a lot of

process that gave them a lot of visibility and control. [snorts] Uh and

visibility and control. [snorts] Uh and it was very important for us to build

it was very important for us to build incremental deliveries throughout that

incremental deliveries throughout that process so that uh not only did they

process so that uh not only did they have the the visibility into what are we

have the the visibility into what are we funding now, what do we get out of it,

funding now, what do we get out of it, they actually had business deliverables

they actually had business deliverables they could realize potential from

they could realize potential from throughout the process and at any point

throughout the process and at any point in time they could pull the plug right

in time they could pull the plug right and say okay like it's not working well

and say okay like it's not working well or we got enough out of it or you know

or we got enough out of it or you know the next phase is so you know unknown

the next phase is so you know unknown and long that we don't want further

and long that we don't want further invest in it. And this is how we

invest in it. And this is how we basically broke it down. So phase one

basically broke it down. So phase one was just pure research, right? We kind

was just pure research, right? We kind of did the shift from natural language

of did the shift from natural language to SQL. We figured out how to write

to SQL. We figured out how to write responses. We figure out how to

responses. We figure out how to understand questions that coming in.

understand questions that coming in. Just kind of setting the stage. Phase

Just kind of setting the stage. Phase [snorts] two was about really

[snorts] two was about really understanding, okay, so what does good

understanding, okay, so what does good metadata and good context look like in

metadata and good context look like in the perspective of a BI agent, right? It

the perspective of a BI agent, right? It looks very different if you're just

looks very different if you're just chatting with something or if you're

chatting with something or if you're trying to do a rag with you know

trying to do a rag with you know unstructured data like documents and uh

unstructured data like documents and uh business knowledge and stuff like that.

business knowledge and stuff like that. And this phase on its own already had uh

And this phase on its own already had uh impact on the business because when we

impact on the business because when we define what good metadata looks like for

define what good metadata looks like for an an LLM uh we could immediately apply

an an LLM uh we could immediately apply that also to just the ecosystem of data

that also to just the ecosystem of data users across the enterprise. Um, and by

users across the enterprise. Um, and by understanding how to extract LLM from

understanding how to extract LLM from the information, we could also how to

the information, we could also how to extract metadata. Sorry, here's where

extract metadata. Sorry, here's where the [snorts] trap door comes into play,

the [snorts] trap door comes into play, right? Um, we could also project that on

right? Um, we could also project that on how or what good metadata looks like for

how or what good metadata looks like for humans interacting with the data. We

humans interacting with the data. We have another initiative around semantic

have another initiative around semantic layer going on which tries to model

layer going on which tries to model exactly that and this provided a very

exactly that and this provided a very valuable input to that initiative as

valuable input to that initiative as well. But the immediate next step was

well. But the immediate next step was basically just doing this kind of uh

basically just doing this kind of uh multicontext semantic search, right?

multicontext semantic search, right? People coming in asking different

People coming in asking different questions and having the system figure

questions and having the system figure out what's the right context, what's the

out what's the right context, what's the right information we uh bring them. And

right information we uh bring them. And this is something that could already be

this is something that could already be packaged as its own product and

packaged as its own product and delivered uh and basically just do kind

delivered uh and basically just do kind of a data finder and data owner finder

of a data finder and data owner finder which is something that could take

which is something that could take anywhere between two to maybe four weeks

anywhere between two to maybe four weeks in an enterprise like Northwestern

in an enterprise like Northwestern Mutual just finding what data exists and

Mutual just finding what data exists and who owns it so I can start talk uh the

who owns it so I can start talk uh the conversation with them.

conversation with them. Um and the next layer was really about

Um and the next layer was really about pulling in information and trying to do

pulling in information and trying to do some light pivoting around the data. Um

some light pivoting around the data. Um each one of these steps as you can see

each one of these steps as you can see also created an input to the step to the

also created an input to the step to the following step so that the research

following step so that the research itself was kind of self u

itself was kind of self u self-propelling and there were

self-propelling and there were incremental outcomes coming out of each

incremental outcomes coming out of each one of these phases. Uh the next one is

one of these phases. Uh the next one is more kind of setting it up for

more kind of setting it up for enterprise level usage. So understanding

enterprise level usage. So understanding roles of in uh of different users coming

roles of in uh of different users coming in what they may be asking about what

in what they may be asking about what type of access we want to give them etc

type of access we want to give them etc and eventually and this is still some

and eventually and this is still some ways to go ahead uh building kind of a

ways to go ahead uh building kind of a fullyfledged NBI agent which doesn't

fullyfledged NBI agent which doesn't only quote information from existing

only quote information from existing reports but I can actually run SQL

reports but I can actually run SQL queries on its own uh pull in more data

queries on its own uh pull in more data do more sophisticated joints between

do more sophisticated joints between different data so it can answer more

different data so it can answer more complex questions so that's the road map

complex questions so that's the road map right That's kind of the high level

right That's kind of the high level plan. Now, why did that work? Well, kind

plan. Now, why did that work? Well, kind of quickly summarize them. We talked

of quickly summarize them. We talked about uh so we get value uh early and we

about uh so we get value uh early and we get value often. Each one of this was a

get value often. Each one of this was a six week sprint at the end of which we

six week sprint at the end of which we had had a very tangible deliverable

had had a very tangible deliverable coming back to the business that we

coming back to the business that we could decide to productize. Uh and at

could decide to productize. Uh and at any point in time, we could decide how

any point in time, we could decide how we want to move forward. There was

we want to move forward. There was transparent progress. There was

transparent progress. There was incremental business value. Uh each one

incremental business value. Uh each one of these steps allowed us to learn

of these steps allowed us to learn something that helped feed the next

something that helped feed the next step.

step. And maybe the most important part and

And maybe the most important part and that's the bottom line here and that's

that's the bottom line here and that's the part that executives really look at.

the part that executives really look at. How do we control the risk in continuing

How do we control the risk in continuing to invest in this type of research

to invest in this type of research project and this is really about

project and this is really about eliminating things like some cost bias

eliminating things like some cost bias right? We already paid you know you know

right? We already paid you know you know whatever a million dollar let's just get

whatever a million dollar let's just get through the project see what we get at

through the project see what we get at the end. This eliminates the uh uh fear

the end. This eliminates the uh uh fear of of competitors coming in and maybe we

of of competitors coming in and maybe we don't need to continue investing in this

don't need to continue investing in this right so everyone in the industry is

right so everyone in the industry is researching Genbi and there are

researching Genbi and there are solutions like data bricks genie that

solutions like data bricks genie that are coming up and they're getting better

are coming up and they're getting better and better maybe at some point in time

and better maybe at some point in time it's better for us as an organization to

it's better for us as an organization to actually adopt data bricks genie but at

actually adopt data bricks genie but at that point again first it's much easier

that point again first it's much easier for us to pull the plug and the funding

for us to pull the plug and the funding but we already have a good understanding

but we already have a good understanding of what good looks like we have

of what good looks like we have benchmarks that we used for ourselves

benchmarks that we used for ourselves when testing our own system that we can

when testing our own system that we can test a third party solution with. And we

test a third party solution with. And we know what to expect, right? We know what

know what to expect, right? We know what works, we know what doesn't. We know

works, we know what doesn't. We know what a kind of fluffy demo from a vendor

what a kind of fluffy demo from a vendor would look like. And we know where to

would look like. And we know where to drill in to ask the tough questions.

So let's see kind of what it looks like under the hood and how we productize

under the hood and how we productize different elements uh of this

different elements uh of this architecture. Uh and maybe kind of very

architecture. Uh and maybe kind of very quickly, why can't we just do it with uh

quickly, why can't we just do it with uh chat GPT? So you know [snorts] just

chat GPT? So you know [snorts] just dumping a schema into chat GPT doesn't

dumping a schema into chat GPT doesn't work. Usually schemas are very messy.

work. Usually schemas are very messy. It's not uh easy to understand the

It's not uh easy to understand the context and the meaning of things. Uh

context and the meaning of things. Uh [snorts] and eventually governance is

[snorts] and eventually governance is super important. So there was a lot of

super important. So there was a lot of governance built into the architecture

governance built into the architecture that was very hard to apply on Czech GPD

that was very hard to apply on Czech GPD from the outside but even solutions like

from the outside but even solutions like you know data bricks genius third party

you know data bricks genius third party much harder to govern from the outside

much harder to govern from the outside than from the inside. But still TBD.

than from the inside. But still TBD. Uh so the stack kind of looks like this.

Uh so the stack kind of looks like this. Uh we have a data and metadata layer

Uh we have a data and metadata layer that we produced. We have four different

that we produced. We have four different agents that are running across the

agents that are running across the pipeline. A metadata agent that

pipeline. A metadata agent that understands the context. A rag agent

understands the context. A rag agent that finds the different reports. An SQL

that finds the different reports. An SQL agent that can pull more data if we need

agent that can pull more data if we need that. And then eventually what we call a

that. And then eventually what we call a BI agent that takes all that information

BI agent that takes all that information and delivers an answer to the question

and delivers an answer to the question that was asked. On top of that, we slap

that was asked. On top of that, we slap governance and trust and orchestration

governance and trust and orchestration and eventually some kind of a contextual

and eventually some kind of a contextual UI. Um and this is how the flow goes. So

UI. Um and this is how the flow goes. So when a business question comes in we uh

when a business question comes in we uh push it into the orchestrator and

push it into the orchestrator and basically decides how to facilitate the

basically decides how to facilitate the process. The first thing that we do is

process. The first thing that we do is understanding the context. So that's

understanding the context. So that's where that metadata agent comes in works

where that metadata agent comes in works with the catalog works with all the

with the catalog works with all the documentation that we have across the

documentation that we have across the system to understand what we're being

system to understand what we're being asked about and what's the relevant

asked about and what's the relevant information to share. Then we go to the

information to share. Then we go to the rag agent which tries to find an

rag agent which tries to find an existing report again out of a list of

existing report again out of a list of certified reports that we know are

certified reports that we know are allowed for people to use and people

allowed for people to use and people have spent a lot of time fine-tuning

have spent a lot of time fine-tuning them and making them as accurate as

them and making them as accurate as possible.

possible. If we can't find the report or if it's

If we can't find the report or if it's not exactly what we need to um to use,

not exactly what we need to um to use, that's where we go to the SQL agent that

that's where we go to the SQL agent that basically tries to create a more um

basically tries to create a more um exact query or a more elaborate query.

exact query or a more elaborate query. And even if the report that we have is

And even if the report that we have is not usable as is, it gives us that

not usable as is, it gives us that initial seed of a query that we can then

initial seed of a query that we can then expand on rather than having to build

expand on rather than having to build one from scratch. So it's kind of like a

one from scratch. So it's kind of like a fewot uh example, but in this case the

fewot uh example, but in this case the example that we give is very very close

example that we give is very very close to the actual result that we're

to the actual result that we're expecting to get. We then execute it

expecting to get. We then execute it against the database pull and push it

against the database pull and push it into the BI agent which gen with which

into the BI agent which gen with which gen uh translate that to a business

gen uh translate that to a business answer and not just dumping data back on

answer and not just dumping data back on the user and this is what goes into the

the user and this is what goes into the final answer. Now there's obviously some

final answer. Now there's obviously some kind of a loop that says if I'm in the

kind of a loop that says if I'm in the same conversation I'm probably talking

same conversation I'm probably talking about the same data so we don't have to

about the same data so we don't have to talk about this or do this again and

talk about this or do this again and again. Now

again. Now each one of these three components, each

each one of these three components, each one of these three agents can be

one of these three agents can be packaged as its own product and

packaged as its own product and delivered to production with a very

delivered to production with a very tangible and actual impact on business

tangible and actual impact on business metrics. Okay. And that's the kind of

metrics. Okay. And that's the kind of beauty of this uh approach that after we

beauty of this uh approach that after we productize each one of these, we could

productize each one of these, we could have basically said stop or let's move

have basically said stop or let's move forward.

forward. uh and just some giving bottom line

uh and just some giving bottom line numbers around some of these. So just

numbers around some of these. So just the rag agent that pulls the right

the rag agent that pulls the right report uh allowed us to take about 20%

report uh allowed us to take about 20% of the overall capacity of the BI team

of the overall capacity of the BI team that basically said uh all we do is just

that basically said uh all we do is just share the right report with the right

share the right report with the right person. So we were able to automate

person. So we were able to automate around 80% out of those uh 20% and we're

around 80% out of those uh 20% and we're talking about a team of 10 people. So

talking about a team of 10 people. So roughly two people full-time job all

roughly two people full-time job all they do is find the right report and

they do is find the right report and send it to the right person.

send it to the right person. uh the metadata understandings that we

uh the metadata understandings that we got from learning how to interact with

got from learning how to interact with the data through an LLM allowed us to

the data through an LLM allowed us to run AB test in a in the semantic layer

run AB test in a in the semantic layer project that we did and that allowed us

project that we did and that allowed us to prove back again to the senior

to prove back again to the senior leadership in the company that there is

leadership in the company that there is value and tangible value measurable

value and tangible value measurable value in enriching metadata. And we did

value in enriching metadata. And we did that basically by running uh a a battery

that basically by running uh a a battery of questions um against a database that

of questions um against a database that had good metadata and one that didn't

had good metadata and one that didn't have good metadata. And we show how much

have good metadata. And we show how much better an LLM performs when having the

better an LLM performs when having the right metadata in place. So basically

right metadata in place. So basically proving the value of something that can

proving the value of something that can be very fluffy like hey let's bring in

be very fluffy like hey let's bring in more documentation into the code. Uh

more documentation into the code. Uh right now we're experimenting with the

right now we're experimenting with the data pivoting bot. Uh so once you have a

data pivoting bot. Uh so once you have a dashboard or a report be able to change

dashboard or a report be able to change the time horizon some of the views some

the time horizon some of the views some of the segmentations and the groupings

of the segmentations and the groupings of the data again kind of real time

of the data again kind of real time without having a person do that for uh a

without having a person do that for uh a business stakeholder and some of the

business stakeholder and some of the next steps is really evaluating the

next steps is really evaluating the tools that are out there for uh Genbi

tools that are out there for uh Genbi like data bricks genie for example and

like data bricks genie for example and we're going to go into a much more

we're going to go into a much more rigorous process of enriching our

rigorous process of enriching our catalog with metadata and documentation

catalog with metadata and documentation and that's also going to come out of a

and that's also going to come out of a lot of the learnings that we got from uh

lot of the learnings that we got from uh the research that we've done. So even if

the research that we've done. So even if we don't end up writing a GenBI agent

we don't end up writing a GenBI agent full-fledged end to end, we already got

full-fledged end to end, we already got a lot of value back from this and this

a lot of value back from this and this is really what allowed our senior

is really what allowed our senior leadership team to continuously invest

leadership team to continuously invest in this project quarter over quarter.

in this project quarter over quarter. One thing that I want to wrap up with is

One thing that I want to wrap up with is just a couple of thoughts I had about

just a couple of thoughts I had about the future. So um I think we talk a lot

the future. So um I think we talk a lot about how to prepare data. I think

about how to prepare data. I think that's going to be a huge area in the

that's going to be a huge area in the market and they're going to be probably

market and they're going to be probably a lot of companies and tools that are

a lot of companies and tools that are going to help us with that. Uh building

going to help us with that. Uh building very specific task specific models and

very specific task specific models and applications. I think a lot of startups

applications. I think a lot of startups and companies are going to come up from

and companies are going to come up from that area. Uh co-pilots is really making

that area. Uh co-pilots is really making sure that we meet the users where they

sure that we meet the users where they are. Uh and securing of models obviously

are. Uh and securing of models obviously a very big thing. The last thing is the

a very big thing. The last thing is the one the the one I want to focus on the

one the the one I want to focus on the most because that's kind of a recent

most because that's kind of a recent thought that came to me a couple of

thought that came to me a couple of weeks ago. How we do pricing of SAS in

weeks ago. How we do pricing of SAS in the Gen AI era. Uh this is really about

the Gen AI era. Uh this is really about the fact that one individual person

the fact that one individual person today can be 10x more effective uh than

today can be 10x more effective uh than they used to be in the past. And then do

they used to be in the past. And then do we price uh software based on seats or

we price uh software based on seats or do we price software based on how much

do we price software based on how much they used it or do we price software

they used it or do we price software based on the value that they got out of

based on the value that they got out of it. Uh Salesforce is already

it. Uh Salesforce is already experimenting with that. So the the data

experimenting with that. So the the data cloud product at Salesforce is starting

cloud product at Salesforce is starting to be uh usage priced and not seats

to be uh usage priced and not seats priced. And I think this is going to

priced. And I think this is going to have a big impact on just the uh kind of

have a big impact on just the uh kind of SAS economics worldwide.

SAS economics worldwide. uh and it it doesn't even matter if the

uh and it it doesn't even matter if the product itself is genai. It's really

product itself is genai. It's really about what does the person using the

about what does the person using the product can do and what can they do in

product can do and what can they do in their other time uh and whether it still

their other time uh and whether it still makes sense to price it by how many

makes sense to price it by how many employees you have or how much work you

employees you have or how much work you get done with the employees that you

get done with the employees that you have.

have. That is me and thank you very much for

That is me and thank you very much for listening and thanks for not opening the

listening and thanks for not opening the door on me.

door on me. [applause]

[applause] Our

next presenter [music] is the head of technology infrastructure engineering at

technology infrastructure engineering at Bloomberg. He's here to tell us what

Bloomberg. He's here to tell us what they learned deploying AI within

they learned deploying AI within Bloomberg's engineering organization.

Bloomberg's engineering organization. Please join me in welcoming to the stage

Please join me in welcoming to the stage Le Jang.

>> All right. I don't have a joke about the dot. I don't have a joke about the uh

dot. I don't have a joke about the uh hot dog either. So, I would just jump to

hot dog either. So, I would just jump to the topic right away. Um so, my name is

the topic right away. Um so, my name is Lei. Um I lead the uh department of

Lei. Um I lead the uh department of technology infrastructure in Bloommer.

technology infrastructure in Bloommer. So, we're basically a group of

So, we're basically a group of technologists focused on global

technologists focused on global infrastructure. Think data centers

infrastructure. Think data centers connectivities

connectivities um developer productivities. uh think

um developer productivities. uh think SCS tooling and also uh reliability

SCS tooling and also uh reliability solutions think telemetry and instant

solutions think telemetry and instant responses right so um depends on the

responses right so um depends on the audience sometimes uh you know you're

audience sometimes uh you know you're familiar with what Bloomer is sometimes

familiar with what Bloomer is sometimes you don't so I thought it might be good

you don't so I thought it might be good idea to talk a little bit about about

idea to talk a little bit about about our company

our company um so there's no better way to talk

um so there's no better way to talk about our company by sharing some

about our company by sharing some numbers I want to highlight a few

numbers I want to highlight a few numbers we have more than 9,000

numbers we have more than 9,000 engineers and most of them are software

engineers and most of them are software engineers. Uh we handle a lot of market

engineers. Uh we handle a lot of market techs uh which in the billions and 600

techs uh which in the billions and 600 billions I believe and um we also have

billions I believe and um we also have tons of folks uh focus on AI research

tons of folks uh focus on AI research and engineering. So we have a more than

and engineering. So we have a more than you know really today's 500 plus

you know really today's 500 plus employees focus on AI products uh for um

employees focus on AI products uh for um sort of our customers. So takeaway here

sort of our customers. So takeaway here is we are I guess you know building a

is we are I guess you know building a lot of software and use a lot of data to

lot of software and use a lot of data to empower our flagship product which is

empower our flagship product which is called the bloomer terminal and to

called the bloomer terminal and to really support our users to make the

really support our users to make the most important f decisions for them to

most important f decisions for them to do their job um the best. Um in the

do their job um the best. Um in the technical lens um a lot of time kind of

technical lens um a lot of time kind of to explain that we actually have one of

to explain that we actually have one of the largest private network uh in the

the largest private network uh in the whole world. We also have one of the

whole world. We also have one of the largest JavaScript codebase um in the

largest JavaScript codebase um in the world. Um we because the domain we're in

world. Um we because the domain we're in uh so terminal is really you can think

uh so terminal is really you can think of a um software that supports thousands

of a um software that supports thousands of different applications. uh we call

of different applications. uh we call them functions right um email is a

them functions right um email is a function uh news is a group of functions

function uh news is a group of functions um let's say fixed income price to yield

um let's say fixed income price to yield calculation to spread calculation is

calculation to spread calculation is another function um trading workflows is

another function um trading workflows is another group of functions so there's

another group of functions so there's many many many different type of

many many many different type of functions as you can imagine we kind of

functions as you can imagine we kind of have to utilize different technologies

have to utilize different technologies to really support those uh

to really support those uh functionalities

functionalities uh we also been

uh we also been increasingly more than use but also

increasingly more than use but also contribute to open source communities.

contribute to open source communities. um for this audience I guess I want to

um for this audience I guess I want to call out you know we kind of helped

call out you know we kind of helped creation of the case envoy AI gateways

creation of the case envoy AI gateways and among many many other things that

and among many many other things that that we deploy inhouse and support the

that we deploy inhouse and support the communities again in summary there's a

communities again in summary there's a lot of software there's a lot of data uh

lot of software there's a lot of data uh we kind of have to um figure out how to

we kind of have to um figure out how to make the best of AI tooling to support

make the best of AI tooling to support us to do our engineering work all right

us to do our engineering work all right so get to what is AI for coding Um we

so get to what is AI for coding Um we started about two years ago maybe a

started about two years ago maybe a little bit more than that. Um and as I

little bit more than that. Um and as I guess the rest of the world we look at

guess the rest of the world we look at the toolings provided and you know I

the toolings provided and you know I apologize if if your logos are not here.

apologize if if your logos are not here. Um but as you can imagine it's kind of

Um but as you can imagine it's kind of like overwhelming right there's so many

like overwhelming right there's so many things and every day there's news about

things and every day there's news about this is great this is great. Um so at

this is great this is great. Um so at the time we actually didn't know what

the time we actually didn't know what all the AI solutions can help us to

all the AI solutions can help us to uh boost our productivities as well as

uh boost our productivities as well as stability. But one thing we knew at the

stability. But one thing we knew at the time is um unless we deploy and try we

time is um unless we deploy and try we wouldn't know what's the best way to

wouldn't know what's the best way to benefit from all the awesome work and

benefit from all the awesome work and and you know a lot of folks are

and you know a lot of folks are contributing to. So at the time uh we

contributing to. So at the time uh we quickly form a team people start

quickly form a team people start kind of like release um kind a set of

kind of like release um kind a set of capabilities so that people start

capabilities so that people start iterating on um utilizing the toolings

iterating on um utilizing the toolings and then of course you know we are data

and then of course you know we are data company so kind of want to get a sense

company so kind of want to get a sense of how we measure the impact and um what

of how we measure the impact and um what we can do from the capability we provide

we can do from the capability we provide right so we look at the typical

right so we look at the typical developer productivity measurements we

developer productivity measurements we run a you survey. Uh it was very obvious

run a you survey. Uh it was very obvious that people felt like there's much

that people felt like there's much quicker uh proof concept, people roll

quicker uh proof concept, people roll out tests, uh there's a lot of one time

out tests, uh there's a lot of one time use scripts being generated and then the

use scripts being generated and then the measurements dropped actually pretty

measurements dropped actually pretty quickly when you

quickly when you go beyond all the green field type of

go beyond all the green field type of thing, right? And then then we start

thing, right? And then then we start thinking like okay so what are the

thinking like okay so what are the things that we should really be doing

things that we should really be doing using all those wonderful things so that

using all those wonderful things so that we can really make a dent um in the in

we can really make a dent um in the in the space and then at this time we also

the space and then at this time we also kind of like also be thoughtful of um

kind of like also be thoughtful of um unleash a very powerful tooling right uh

unleash a very powerful tooling right uh the the benefits is it's very fast the

the the benefits is it's very fast the challenge is also it's very fast, right?

challenge is also it's very fast, right? U for any of you who actually dealt with

U for any of you who actually dealt with hundreds of millions of lines code, you

hundreds of millions of lines code, you probably understand the system

probably understand the system complexity is a at least

complexity is a at least um exponential or at least polinomial as

um exponential or at least polinomial as function of your line of code or

function of your line of code or software assets, right? So at some point

software assets, right? So at some point you kind of want to be very careful uh

you kind of want to be very careful uh what you do with your software assets.

what you do with your software assets. And what we thought so maybe we should

And what we thought so maybe we should look at some of the basics. One idea we

look at some of the basics. One idea we had is

had is um all right so AI for coding there's

um all right so AI for coding there's narrow definition of what coding is but

narrow definition of what coding is but there's also a broader definition of

there's also a broader definition of what software engineering right and then

what software engineering right and then maybe we can also look into some of the

maybe we can also look into some of the work our developers don't really prefer

work our developers don't really prefer to do for instance

to do for instance um some men work some of the migration

um some men work some of the migration work some of the I don't know

work some of the I don't know maintenance work and stuff like that so

maintenance work and stuff like that so I want to give some examples of the

I want to give some examples of the things that we been trying and we think

things that we been trying and we think there's pretty good return on

there's pretty good return on investment.

investment. So the question we ask ourselves is how

So the question we ask ourselves is how do we evolve our codebase right the

do we evolve our codebase right the first one is all right wouldn't it be

first one is all right wouldn't it be cool uh the day you get a ticket say hey

cool uh the day you get a ticket say hey you know this piece of software is

you know this piece of software is patched and at the same time you have a

patched and at the same time you have a pull request with the fix with a patch

pull request with the fix with a patch and also with thinking why the patch

and also with thinking why the patch happened that way right so it's kind of

happened that way right so it's kind of like we're trying to uh broadly deploy

like we're trying to uh broadly deploy something called uplift agents

something called uplift agents um broadly scan through our codebase and

um broadly scan through our codebase and figure out what the patch would be

figure out what the patch would be applicable and be able to apply those

applicable and be able to apply those patch step back a little bit. We did

patch step back a little bit. We did have a reg based refraction tool. Um it

have a reg based refraction tool. Um it works to some extent but it's limited

works to some extent but it's limited right now with um LMS and other tooling.

right now with um LMS and other tooling. So we are able to uh see very much

So we are able to uh see very much better results from the um uplift

better results from the um uplift agents. So there are a few challenges in

agents. So there are a few challenges in case you also plan to deploy such

case you also plan to deploy such capabilities. The first one is

capabilities. The first one is I guess any AI or ML it would be really

I guess any AI or ML it would be really nice if there's some detistic

nice if there's some detistic verification capability. uh oftentimes

verification capability. uh oftentimes it's not so easy especially if you have

it's not so easy especially if you have test cases you don't have good llinter

test cases you don't have good llinter if you don't have good verification the

if you don't have good verification the the patch can sometimes be uh uh

the patch can sometimes be uh uh difficult to to to be applied

difficult to to to be applied and uh one thing we also realized when

and uh one thing we also realized when we deploy AI tooling is the average open

we deploy AI tooling is the average open pull requests increased and time to

pull requests increased and time to merge also increased uh because you're

merge also increased uh because you're spinning a lot of new code and then

spinning a lot of new code and then still we have to review the code and

still we have to review the code and merge the code right so time to merge

merge the code right so time to merge become a challenge sometimes. And the

become a challenge sometimes. And the last one is um I think it applies to any

last one is um I think it applies to any gen is the shift becomes what do we want

gen is the shift becomes what do we want to achieve rather than how we want to

to achieve rather than how we want to achieve. Right? So the second example

achieve. Right? So the second example that I I want to share is uh the other

that I I want to share is uh the other area that people kind of like sometimes

area that people kind of like sometimes really impact our productivity in a

really impact our productivity in a negative way or impact our stability in

negative way or impact our stability in negative way is how we handle instance.

negative way is how we handle instance. So we're trying to develop and then

So we're trying to develop and then deploy um in response agents. Um now

deploy um in response agents. Um now the importance of this is if you really

the importance of this is if you really think about GI tools it's really really

think about GI tools it's really really fast and it's also unbiased right

fast and it's also unbiased right instance it can go through your codebase

instance it can go through your codebase really quickly. It can go through your

really quickly. It can go through your telemetry system very quickly. It can go

telemetry system very quickly. It can go through your feature flags very quickly.

through your feature flags very quickly. it can go through your um I don't know

it can go through your um I don't know call trace very quickly and in I

call trace very quickly and in I unbiased lens when we do troubleshooting

unbiased lens when we do troubleshooting sometimes we have this biased views it

sometimes we have this biased views it must be this it turns out to be not the

must be this it turns out to be not the case so there's many many interesting

case so there's many many interesting benefits um by uh deploying agents from

benefits um by uh deploying agents from this perspective

this perspective and then the second question is become

and then the second question is become interesting is imagine you have

interesting is imagine you have organization of 10,000 pe um let's say

organization of 10,000 pe um let's say 9,000 people as I described a lot of

9,000 people as I described a lot of people trying to fix those problems,

people trying to fix those problems, right? And you can have 10 teams who

right? And you can have 10 teams who wants to build a pull request review

wants to build a pull request review bots. You have 20 teams who wants to

bots. You have 20 teams who wants to build a instant response agents, right?

build a instant response agents, right? They become very quickly chaotic and

They become very quickly chaotic and sometimes can have duplications.

sometimes can have duplications. So before I talk about the pay pass, I'm

So before I talk about the pay pass, I'm going to give example of the uh instance

going to give example of the uh instance response agent. So basically this is

response agent. So basically this is what you know a in response agent will

what you know a in response agent will look like. Um the key part is we're

look like. Um the key part is we're going to need to build a lot of MCP

going to need to build a lot of MCP servers to connect to the uh the metrics

servers to connect to the uh the metrics and logs dashboards you have connect to

and logs dashboards you have connect to the topology you have whether it's

the topology you have whether it's network topology or it's the um your

network topology or it's the um your service dependency topology uh your

service dependency topology uh your alarms your triggers right your SLOs's

alarms your triggers right your SLOs's and then we kind of don't want people

and then we kind of don't want people just start building MCP servers uh

just start building MCP servers uh without a pay pass so we created a pay

without a pay pass so we created a pay pass in partnership with our AI

pass in partnership with our AI organization and I will talk a little

organization and I will talk a little bit what that means.

bit what that means. Before that

Before that um I do want to explain a little bit

um I do want to explain a little bit some of the platform principles.

some of the platform principles. Some company allow teams to be have a

Some company allow teams to be have a lot of freedom as at the same time

lot of freedom as at the same time responsibility in the sense a business

responsibility in the sense a business unit can build whatever infrastructure

unit can build whatever infrastructure whatever platform.

whatever platform. um some organization

um some organization have a very very strong tight

have a very very strong tight abstraction of the service

abstraction of the service infrastructure and typically kind of

infrastructure and typically kind of have to use their platforms right so

have to use their platforms right so Bloomberg is kind of in the middle if

Bloomberg is kind of in the middle if you look at the golden ones we kind of

you look at the golden ones we kind of believe in provide a golden path

believe in provide a golden path um with enablement teams so my team is

um with enablement teams so my team is really a en enabling team and one of the

really a en enabling team and one of the guiding principle for us is we want to

guiding principle for us is we want to make easy is extremely easy to do. Uh

make easy is extremely easy to do. Uh sorry, the right thing is extremely easy

sorry, the right thing is extremely easy to do and we want to make sure the wrong

to do and we want to make sure the wrong thing is ridiculously hard to do. So

thing is ridiculously hard to do. So that's the guiding principle here.

that's the guiding principle here. Now move on. So what is the pay path

Now move on. So what is the pay path here? So the pay path is

here? So the pay path is uh we have a gateway so that teams can

uh we have a gateway so that teams can easily figure out which model works the

easily figure out which model works the best. They can do quick experiments.

best. They can do quick experiments. they can um we can have visibility of

they can um we can have visibility of what kind of models being used and we

what kind of models being used and we can also guide through the teams which

can also guide through the teams which model should is a better fit for the for

model should is a better fit for the for the problem they want to solve. uh we

the problem they want to solve. uh we have a two discovery uh basically MCP

have a two discovery uh basically MCP directory via hub so that let's say team

directory via hub so that let's say team A wants to do something they will go to

A wants to do something they will go to the hub okay someone is building MCB

the hub okay someone is building MCB server already maybe I should partner

server already maybe I should partner with them to build it together right

with them to build it together right uh tool creation and deployment is via a

uh tool creation and deployment is via a pass it's basically a um you know a

pass it's basically a um you know a standard platform service where you can

standard platform service where you can do your SLC and and we provide runtime

do your SLC and and we provide runtime environment for you as well taking care

environment for you as well taking care of all and side of things as well so it

of all and side of things as well so it really reduce friction of for for teams

really reduce friction of for for teams to to deploy um their MT MCP servers.

to to deploy um their MT MCP servers. And then this is kind of interesting is

And then this is kind of interesting is we want to make demo very easy so that

we want to make demo very easy so that or I really say proof concept very easy

or I really say proof concept very easy so that people can try have ideal

so that people can try have ideal generation uh because we believe in

generation uh because we believe in creativity come from some freedom of try

creativity come from some freedom of try different new things but we also want to

different new things but we also want to make sure the production requires some

make sure the production requires some quality control. um

quality control. um because at the end of the day stability

because at the end of the day stability and system reliability is at the core of

and system reliability is at the core of our business. So this is sort of the

our business. So this is sort of the path that we deployed um and enabled the

path that we deployed um and enabled the rest of engineering really the 9,000

rest of engineering really the 9,000 software engineers to do their job.

software engineers to do their job. Okay.

Okay. And um with all this and then we start

And um with all this and then we start maybe okay yes we got path uh path we

maybe okay yes we got path uh path we have some good ideas of how to evolve

have some good ideas of how to evolve our codebase.

our codebase. help out our people right um now this is

help out our people right um now this is where I find that

where I find that any new things any adoption of new

any new things any adoption of new things provide opportunity to leverage

things provide opportunity to leverage the strength you have and also identify

the strength you have and also identify the some of the weakness that you may

the some of the weakness that you may have so um in Bloomberg we have a

have so um in Bloomberg we have a wellestablished training program uh it's

wellestablished training program uh it's more than 20 years so there's on

more than 20 years so there's on boarding training depends on entry level

boarding training depends on entry level it depends on senior level um so we have

it depends on senior level um so we have this whole training program to prepare

this whole training program to prepare folks to before they join a team. And

folks to before they join a team. And what we did is we just incorporate AI

what we did is we just incorporate AI coding in on boarding training program

coding in on boarding training program and also show them how to best utilize

and also show them how to best utilize them with our principles and our

them with our principles and our technologies right there's a huge

technologies right there's a huge benefits here because um if any of you

benefits here because um if any of you run into the challenge of adoption

run into the challenge of adoption somehow run into a chasm right the rest

somehow run into a chasm right the rest of is not uh adopt as quick as possible.

of is not uh adopt as quick as possible. Whenever we have folks join a company,

Whenever we have folks join a company, they learn how to do things in new way.

they learn how to do things in new way. When they go back to their team, they

When they go back to their team, they were like, "Hey, why don't we do that?"

were like, "Hey, why don't we do that?" Right? They're going to challenge the

Right? They're going to challenge the some of the senior folks as well to say,

some of the senior folks as well to say, "Hey, there's a new way to do this type

"Hey, there's a new way to do this type of things. Why don't we do that?" So, we

of things. Why don't we do that?" So, we actually find this program extremely

actually find this program extremely effective uh to be a change agent for

effective uh to be a change agent for anything I want to push out.

anything I want to push out. And then bunch of results. There's a lot

And then bunch of results. There's a lot more familiarity and comfortable with

more familiarity and comfortable with the tooling. Um and also the important

the tooling. Um and also the important part is there's lot more nuance insights

part is there's lot more nuance insights of where it's at value right

of where it's at value right the second one is um often times we run

the second one is um often times we run organization to push uh new initiatives

organization to push uh new initiatives so within bloomer we have something

so within bloomer we have something called um a champ program and a guild

called um a champ program and a guild program that's basically a cross

program that's basically a cross organization or tech communities where

organization or tech communities where people have similar interest and similar

people have similar interest and similar passion they get together and get stuff

passion they get together and get stuff done so Um we had this for more than 10

done so Um we had this for more than 10 years now. Uh we sort of bootstrapped

years now. Uh we sort of bootstrapped engineer AI productivity community two

engineer AI productivity community two years back leveraged the community we

years back leveraged the community we have already and then have some few

have already and then have some few results uh because we have this pretty

results uh because we have this pretty much everyone passionate about this and

much everyone passionate about this and will be in that community. So

will be in that community. So organically it dduplicates efforts and

organically it dduplicates efforts and there's shared learning uh shared

there's shared learning uh shared learning happening

learning happening and it also helps to boost inner source

and it also helps to boost inner source contributions and then visit engineer

contributions and then visit engineer idea right often times team A wants to

idea right often times team A wants to do something team B let's say a platform

do something team B let's say a platform team have different prioritization and

team have different prioritization and the way we solve this is via inner

the way we solve this is via inner source or via visit engineer we just

source or via visit engineer we just move someone over the team work for six

move someone over the team work for six months a year get it done and then we

months a year get it done and then we can move on Um the last one is

can move on Um the last one is interesting. So our data shows

interesting. So our data shows individual contributors have a much

individual contributors have a much better stronger adoption than our

better stronger adoption than our leadership team. Now if you think about

leadership team. Now if you think about this a lot of software TLS and managers

this a lot of software TLS and managers in the age of AI they kind of don't

in the age of AI they kind of don't really have

really have um enough experience to truly guide

um enough experience to truly guide their teams to build software right so

their teams to build software right so often times the stuff that they learned

often times the stuff that they learned before might not be exactly applicable

before might not be exactly applicable it's still very valuable but there's

it's still very valuable but there's some missing piece there to make sure

some missing piece there to make sure they can continue to guide the team to

they can continue to guide the team to do the right thing. So, we're rolling

do the right thing. So, we're rolling out leadership workshops to make sure

out leadership workshops to make sure our leaders are equipped with whatever

our leaders are equipped with whatever knowledge they need to have to drive the

knowledge they need to have to drive the techn um innovation.

techn um innovation. So, um I'm going to close my part and to

So, um I'm going to close my part and to share with you what uh the part I'm I

share with you what uh the part I'm I feel most excited about. The part I feel

feel most excited about. The part I feel most excite most excited about is that

most excite most excited about is that with a lot of um creativity and

with a lot of um creativity and innovation in the geni space, it

innovation in the geni space, it actually changes the cost function of

actually changes the cost function of software engineering.

software engineering. Meaning

Meaning the trade-off decision of whether we do

the trade-off decision of whether we do something versus we don't do something

something versus we don't do something actually changed because some of the

actually changed because some of the work become a lot cheaper to do and some

work become a lot cheaper to do and some work become a lot more expensive to do.

work become a lot more expensive to do. I tend to think it is a great

I tend to think it is a great opportunity for engineers and

opportunity for engineers and engineering leaders to get back to some

engineering leaders to get back to some of the uh basic principles and sort of

of the uh basic principles and sort of ask a soul searching question. What is a

ask a soul searching question. What is a high quality soft engineering and how

high quality soft engineering and how can we use a tool for that purpose? So

can we use a tool for that purpose? So that's it. Thank you very much.

that's it. Thank you very much. [applause]

[applause] [music]

[music] Our

next speaker helped to reimagine a beloved browser from Arcadia by

beloved browser from Arcadia by rebuilding it around AI native

rebuilding it around AI native experiences.

experiences. Please welcome to the stage head of AI

Please welcome to the stage head of AI engineering at the browser company Samir

engineering at the browser company Samir Motti. [music]

Hey everyone. Oh wow. How's it going? Um, my name is Samir and I'm the head of

Um, my name is Samir and I'm the head of AI engineering at the browser company of

AI engineering at the browser company of New York. And today I'm going to talk a

New York. And today I'm going to talk a little bit about how we transitioned

little bit about how we transitioned from building ARC to DIA and the lessons

from building ARC to DIA and the lessons we learned in building an AI browser.

we learned in building an AI browser. But first, a little about the browser

But first, a little about the browser company.

company. So we started with a mission to rethink

So we started with a mission to rethink how people use the internet. At its

how people use the internet. At its core, we believe that the browser is one

core, we believe that the browser is one of the most important pieces of software

of the most important pieces of software in your life and it wasn't getting the

in your life and it wasn't getting the attention it deserved.

attention it deserved. Simply put, the way we've used a browser

Simply put, the way we've used a browser has changed over the last couple

has changed over the last couple decades, but the browser itself hadn't.

decades, but the browser itself hadn't. And think about this, we we started this

And think about this, we we started this company in 2019. Um, and so this is a

company in 2019. Um, and so this is a screen cap of Josh, our CEO, sharing a

screen cap of Josh, our CEO, sharing a little bit about our idea on the

little bit about our idea on the internet a few years ago, which we

internet a few years ago, which we endearingly called the internet

endearingly called the internet computer. So our mission has been to

computer. So our mission has been to build a browser that reflects how people

build a browser that reflects how people use the internet today and how we think

use the internet today and how we think the browser should be used tomorrow.

the browser should be used tomorrow. So through years of discovery, trial and

So through years of discovery, trial and error, and some ups and some downs, we

error, and some ups and some downs, we shipped our first browser, Arc, in 2022.

shipped our first browser, Arc, in 2022. It was a browser we felt was an

It was a browser we felt was an improvement over the browsers of that

improvement over the browsers of that time. It made the internet more

time. It made the internet more personal, more organized, and to us a

personal, more organized, and to us a little more delightful with a little

little more delightful with a little more craft.

more craft. And it was a browser that was loved by

And it was a browser that was loved by many. It still is by millions. many of

many. It still is by millions. many of whom are probably in this audience

whom are probably in this audience today. I've gotten a lot of questions

today. I've gotten a lot of questions about ARC today. Um, and it's great, but

about ARC today. Um, and it's great, but um, if we took a step back, we felt that

um, if we took a step back, we felt that ARC was still just an incremental

ARC was still just an incremental improvement over the browsers of that

improvement over the browsers of that time. And it didn't really hit the

time. And it didn't really hit the vision that we set out to create. And

vision that we set out to create. And so, uh, we kept building. And then in

so, uh, we kept building. And then in 2022, we got access to LLMs like the GPT

2022, we got access to LLMs like the GPT models. And so we started like we always

models. And so we started like we always do with prototyping. We started trying

do with prototyping. We started trying new ideas um and eventually shipped a

new ideas um and eventually shipped a few of them in ARC. But what started as

few of them in ARC. But what started as a you know a basic exploration turned

a you know a basic exploration turned into a fully formed thesis. In the

into a fully formed thesis. In the beginning of 2024, uh, our company put

beginning of 2024, uh, our company put out what we called act two, a video on

out what we called act two, a video on YouTube where we shared that thesis that

YouTube where we shared that thesis that we believe that AI is going to transform

we believe that AI is going to transform how people use the internet and in turn

how people use the internet and in turn fundamentally change the browser itself.

fundamentally change the browser itself. And so with that, we started building

And so with that, we started building again, but this time we built a new

again, but this time we built a new browser with AI speed and security in

browser with AI speed and security in mind and from the ground up. And later

mind and from the ground up. And later or sorry earlier this year we shipped

or sorry earlier this year we shipped DIA our AI native browser.

DIA our AI native browser. It allows you to have an assistant

It allows you to have an assistant alongside you in all the work you do in

alongside you in all the work you do in the browser. It gets to know you,

the browser. It gets to know you, personalizes, helps you get work done

personalizes, helps you get work done with your tabs and effectively get more

with your tabs and effectively get more work done through the apps you use. And

work done through the apps you use. And while it hasn't achieved our vision yet,

while it hasn't achieved our vision yet, we fully believe it's well on the way

we fully believe it's well on the way too.

So it is not easy to build a product. You all know that. Let alone two. The

You all know that. Let alone two. The latter of which an AI native one. We've

latter of which an AI native one. We've had a lot of years of iteration, trial

had a lot of years of iteration, trial and error. And through that we've

and error. And through that we've learned a lot. And I'm going to just

learned a lot. And I'm going to just talk about a few of those things uh here

talk about a few of those things uh here today.

today. The first I want to talk about is

The first I want to talk about is optimizing your tools and process for

optimizing your tools and process for faster iteration. From the beginning,

faster iteration. From the beginning, browser company has believed that we're

browser company has believed that we're not going to win unless we build the

not going to win unless we build the tools, the process, the platform, and

tools, the process, the platform, and the mindset to iterate, build, ship, and

the mindset to iterate, build, ship, and learn faster than everyone else. And

learn faster than everyone else. And that of course holds true today. But the

that of course holds true today. But the form it takes with AI and an AI native

form it takes with AI and an AI native product has changed.

product has changed. So even as a small company, where are we

So even as a small company, where are we investing in tooling these days? First

investing in tooling these days? First is prototyping for AI product features.

is prototyping for AI product features. Second is building and running evals.

Second is building and running evals. Third is collecting data for training

Third is collecting data for training and for evals. And uh last but

and for evals. And uh last but definitely not least automation for hill

definitely not least automation for hill climbing.

climbing. So let's start with tools. Initially uh

So let's start with tools. Initially uh as we always do we built some tools. The

as we always do we built some tools. The first was a very rudimentary uh prompt

first was a very rudimentary uh prompt editor and it was only in dev builds.

editor and it was only in dev builds. What did what did this mean for us? Well

What did what did this mean for us? Well it meant a few things. One limited

it meant a few things. One limited access as only engineers were able to

access as only engineers were able to access this. two slow iteration speeds

access this. two slow iteration speeds and three none of your personal context

and three none of your personal context and as you all know with an AI product

and as you all know with an AI product the context is what matters and what's

the context is what matters and what's gives you the feel of whether product is

gives you the feel of whether product is good or not. So we evolved and since

good or not. So we evolved and since then we built all of our tools into our

then we built all of our tools into our product. the product that we as a

product. the product that we as a company internally use every day and

company internally use every day and that includes the prompts, the tools,

that includes the prompts, the tools, the context, the models, every

the context, the models, every parameter. Um, which has not only

parameter. Um, which has not only allowed us to 10x our speed of ideating,

allowed us to 10x our speed of ideating, iterating and refining our products, but

iterating and refining our products, but has also widened the number of people

has also widened the number of people who can access and iterate on our

who can access and iterate on our products themselves from our CEO to our

products themselves from our CEO to our newest hire can ideate and create a new

newest hire can ideate and create a new product in DIA and also refine an

product in DIA and also refine an existing one all with their full

existing one all with their full context.

context. And this holds true with all of our

And this holds true with all of our major product protocols. We have tools

major product protocols. We have tools for optimizing our memory knowledge

for optimizing our memory knowledge graph which all of us use and we have

graph which all of us use and we have tools for creating iterating on our

tools for creating iterating on our computer use mechanism. We actually

computer use mechanism. We actually tried tens of different types of

tried tens of different types of computer use strategies before landing

computer use strategies before landing on one before even building it into the

on one before even building it into the product itself.

product itself. And I'll say and I'll end this part with

And I'll say and I'll end this part with uh it actually is a lot of fun. People

uh it actually is a lot of fun. People don't talk about that a lot but uh

don't talk about that a lot but uh actually building these tools into our

actually building these tools into our product has enabled so much creativity.

product has enabled so much creativity. It has enabled our PMs, our designers,

It has enabled our PMs, our designers, uh customer service and strategy and ops

uh customer service and strategy and ops to try out new ideas that are tailored

to try out new ideas that are tailored to their use cases. And that ultimately

to their use cases. And that ultimately is what we're trying to do.

is what we're trying to do. The next thing I want to talk about is

The next thing I want to talk about is how we evolve and optimize our prompts

how we evolve and optimize our prompts through a mechanism called Jeepa. This

through a mechanism called Jeepa. This for us is very nent but an important

for us is very nent but an important learning nevertheless.

learning nevertheless. How we hill climb and refine our AI

How we hill climb and refine our AI products is just as important as

products is just as important as ideating them in the first place. So

ideating them in the first place. So we're investing in mechanisms to help

we're investing in mechanisms to help with this to enable faster hill climbing

with this to enable faster hill climbing and one of those being Jeepa and this is

and one of those being Jeepa and this is based on a paper from earlier this year

based on a paper from earlier this year from a few smart folks.

from a few smart folks. So the key motivation here is simple.

So the key motivation here is simple. It's a sample efficient way to improve a

It's a sample efficient way to improve a complex LLM system without having to

complex LLM system without having to leverage RL or other fine-tuning

leverage RL or other fine-tuning techniques. And for us as a small

techniques. And for us as a small company that's hugely critical.

company that's hugely critical. And how it works is you're able to seed

And how it works is you're able to seed the system with a set of prompts, then

the system with a set of prompts, then execute it across a set of tasks and

execute it across a set of tasks and score them. Then leverage a mechanism

score them. Then leverage a mechanism called PA selection to select the best

called PA selection to select the best ones. And then leverage an LLM on top of

ones. And then leverage an LLM on top of that to reflect on what went well and

that to reflect on what went well and what didn't and then generate new

what didn't and then generate new prompts and then repeat with the key

prompts and then repeat with the key innovations here being around that

innovations here being around that reflective prompt mutation technique.

reflective prompt mutation technique. the selection process which allows you

the selection process which allows you to explore more of the space of

to explore more of the space of prompting rather than one avenue and the

prompting rather than one avenue and the ability to tune text and not weights.

ability to tune text and not weights. And here's a modest uh example of this

And here's a modest uh example of this at work for us. You know, you can

at work for us. You know, you can provide it a very simple uh a simple

provide it a very simple uh a simple simple prompt and run it through JPA and

simple prompt and run it through JPA and it's able to optimize it uh along the

it's able to optimize it uh along the metrics and scoring mechanisms that we

metrics and scoring mechanisms that we uh created to refine that prompt.

And so if I take a step back and talk about kind of how we build uh for

about kind of how we build uh for certain types of features, I would buck

certain types of features, I would buck it into a couple different phases. The

it into a couple different phases. The first is that prototyping and ideation

first is that prototyping and ideation phase where we have widened the breadth

phase where we have widened the breadth of number of ideas at the top of the

of number of ideas at the top of the funnel um and lower the threshold on who

funnel um and lower the threshold on who can build them and how. And so we try

can build them and how. And so we try out a bunch of ideas every week, every

out a bunch of ideas every week, every day from all types of people and we dog

day from all types of people and we dog food those. And if we feel like there's

food those. And if we feel like there's actually real utility there, it's

actually real utility there, it's solving a real problem for us and there

solving a real problem for us and there is a path towards actually hitting the

is a path towards actually hitting the quality threshold that we believe we

quality threshold that we believe we need to hit, then we'll move on to this

need to hit, then we'll move on to this next phase where we collect and refine

next phase where we collect and refine eval to clarify product requirements and

eval to clarify product requirements and then hill climb through code through

then hill climb through code through prompting and automated techniques like

prompting and automated techniques like Jeepa and then dog food as we always do

Jeepa and then dog food as we always do internally and then chip.

internally and then chip. And I do want to kind of double down on

And I do want to kind of double down on these phases. The ideation phase is

these phases. The ideation phase is extremely important just as much as that

extremely important just as much as that refinement phase.

And our goal is to enable faster ideation and a more efficient path to

ideation and a more efficient path to shipping because with all these AI

shipping because with all these AI advancements every week, new

advancements every week, new possibilities are unlocked in DIA. And

possibilities are unlocked in DIA. And it's up to us as a browser, as a product

it's up to us as a browser, as a product to get as many at bats with these new

to get as many at bats with these new ideas and try out as many of them and

ideas and try out as many of them and explore as many of them as possible. At

explore as many of them as possible. At the same time though not underestimating

the same time though not underestimating the path it takes to ship some of these

the path it takes to ship some of these ideas to productions as a high quality

ideas to productions as a high quality experience.

Next uh I want to talk about treating model behavior as a craft and

model behavior as a craft and discipline.

discipline. So what is model behavior to us? It's

So what is model behavior to us? It's the function that defines evaluates and

the function that defines evaluates and ships the desired behavior models. It's

ships the desired behavior models. It's turning principles into product

turning principles into product requirements, prompts and evals, and

requirements, prompts and evals, and ultimately shaping the behavior and the

ultimately shaping the behavior and the personality of our LLM products and

personality of our LLM products and ultimately for us our DIA assistant.

ultimately for us our DIA assistant. So, I'd buck it into a few different

So, I'd buck it into a few different areas. First, it's that behavior design

areas. First, it's that behavior design defining the product experience we

defining the product experience we actually want, the style, the tone, the

actually want, the style, the tone, the shape of responses in some cases. Then,

shape of responses in some cases. Then, it's collecting that data for

it's collecting that data for measurement and training, clarifying

measurement and training, clarifying those product requirements through eval.

those product requirements through eval. And last but not least, it's the model

And last but not least, it's the model steering. It's the building of the

steering. It's the building of the product itself. It's the prompting. It's

product itself. It's the prompting. It's the model selection. It's defining the

the model selection. It's defining the what's in the context window, the

what's in the context window, the parameters, etc. Um, and so much more.

parameters, etc. Um, and so much more. And to us, that that process is

And to us, that that process is iterative, very iterative. We build,

iterative, very iterative. We build, refine, we create evals, and then we

refine, we create evals, and then we ship, and then we collect more feedback

ship, and then we collect more feedback and feed that into our iterative

and feed that into our iterative building process. That could be internal

building process. That could be internal feedback, and that could be also uh

feedback, and that could be also uh external feedback.

external feedback. And so I move on for a second. One

And so I move on for a second. One analogy we've thought about uh is for

analogy we've thought about uh is for model behaviors that to product design

model behaviors that to product design through the evolution of the internet.

through the evolution of the internet. At first websites were functional. They

At first websites were functional. They got the job done. But over time that

got the job done. But over time that evolved as we tried to achieve more on

evolved as we tried to achieve more on the internet and technology advanced. Uh

the internet and technology advanced. Uh product design and the craft of the

product design and the craft of the internet itself grew as well as well as

internet itself grew as well as well as the complexity.

the complexity. And so what might that be for model

And so what might that be for model behavior? Well, at first it was

behavior? Well, at first it was functional. We had prompts, we had

functional. We had prompts, we had evals, we had instructions in and output

evals, we had instructions in and output out. Now we frame it through agent

out. Now we frame it through agent behaviors. It's goal- directed

behaviors. It's goal- directed reasoning, the shaping of autonomous

reasoning, the shaping of autonomous tasks, selfcorrection in learning, and

tasks, selfcorrection in learning, and even shaping the personality of the LLM

even shaping the personality of the LLM models themselves.

models themselves. And so, what might the future hold? I'm

And so, what might the future hold? I'm excited to see. But what we believe is

excited to see. But what we believe is that we are in the early days of

that we are in the early days of building AI products and model behavior

building AI products and model behavior will continue to evolve and into a

will continue to evolve and into a specialized and prevalent function of

specialized and prevalent function of its own even at product companies.

its own even at product companies. And the last thing I'll leave you with

And the last thing I'll leave you with here is that the best people for it

here is that the best people for it might just surprise you. One of my

might just surprise you. One of my favorite stories about building DIA

favorite stories about building DIA these last couple years has been uh the

these last couple years has been uh the formation of actually this model

formation of actually this model behavior team. As I mentioned earlier,

behavior team. As I mentioned earlier, uh engineers were writing the prompts at

uh engineers were writing the prompts at first and then we built these prompt

first and then we built these prompt tools to enable more people at the

tools to enable more people at the company to actually prompt and iterate.

company to actually prompt and iterate. And there was a person on our team on

And there was a person on our team on the strategy and ops team. And he

the strategy and ops team. And he actually leveraged these prompt tools

actually leveraged these prompt tools one weekend to rewrite all our prompts.

one weekend to rewrite all our prompts. And he came in on a Monday morning and

And he came in on a Monday morning and dropped a Loom video sharing what he

dropped a Loom video sharing what he did, how he did it, and why. And a set

did, how he did it, and why. And a set of prompts. And those prompts alone

of prompts. And those prompts alone unlocked a new level of capability and

unlocked a new level of capability and quality and experience in our product.

quality and experience in our product. And consequentially uh it was the

And consequentially uh it was the formation of our model behavior team.

formation of our model behavior team. And so one thing I'd emphasize to you

And so one thing I'd emphasize to you all is to think about who are those

all is to think about who are those people at the company agnostic of their

people at the company agnostic of their role who can help shape your product and

role who can help shape your product and help shape and steer the model itself.

help shape and steer the model itself. It might not be an engineer or it might

It might not be an engineer or it might be it could also be someone on the

be it could also be someone on the strategy and ops team.

strategy and ops team. Next, I want to talk about AI security

Next, I want to talk about AI security as an emergent property of product

as an emergent property of product building. And today, I'm going to focus

building. And today, I'm going to focus specifically on prompt injections.

specifically on prompt injections. So, what is a prompt injection? Well,

So, what is a prompt injection? Well, it's a prompt attack in which a third

it's a prompt attack in which a third party can override the instructions of

party can override the instructions of an LLM to cause harm. That might be data

an LLM to cause harm. That might be data exfiltration, the execution of malicious

exfiltration, the execution of malicious commands, or ignoring safety rules.

commands, or ignoring safety rules. And so here's an example in which you

And so here's an example in which you give uh the context of a website to an

give uh the context of a website to an LLM and instruct it to summarize it.

LLM and instruct it to summarize it. Little did you know that there was a

Little did you know that there was a prompt injection hidden in that

prompt injection hidden in that website's uh HTML. So instead of

website's uh HTML. So instead of actually summarizing the web page, the

actually summarizing the web page, the LM actually gets directed to open a new

LM actually gets directed to open a new website, extracting your personal

website, extracting your personal information and embedding it as get

information and embedding it as get parameters in the website's URL,

parameters in the website's URL, effectively excfiltrating that data.

effectively excfiltrating that data. So, as a browser, prompt injections are

So, as a browser, prompt injections are extremely crucial for us to prevent.

extremely crucial for us to prevent. They're critical to prevent

They're critical to prevent because browsers sit at the middle of

because browsers sit at the middle of what we can call a lethal trifecta.

what we can call a lethal trifecta. It has access to your private data. It

It has access to your private data. It has exposure to untrusted content and it

has exposure to untrusted content and it has the ability to externally

has the ability to externally communicate. And for us, that means

communicate. And for us, that means opening websites, sending emails,

opening websites, sending emails, scheduling events, etc. So, how to

scheduling events, etc. So, how to prevent this? Well, there's some

prevent this? Well, there's some technical strategies we can try. First

technical strategies we can try. First is wrapping that untrusted context in

is wrapping that untrusted context in tax. You can tell the LM, listen to

tax. You can tell the LM, listen to these instructions around these tags and

these instructions around these tags and don't listen to the content around these

don't listen to the content around these tags. But this is easily escapable and

tags. But this is easily escapable and quite trivy, an attacker could still uh

quite trivy, an attacker could still uh leverage a prompt injection on your

leverage a prompt injection on your browser.

browser. Well, another solution we could try is

Well, another solution we could try is separating that data and that

separating that data and that instructions. We can assign uh the

instructions. We can assign uh the operating instructions to a system role

operating instructions to a system role and we can assign a user role for the

and we can assign a user role for the content of the third party and even

content of the third party and even layer on randomly generated tags to wrap

layer on randomly generated tags to wrap that user content to be extra sure that

that user content to be extra sure that the LM listens to the instructions and

the LM listens to the instructions and not the content. And while this can

not the content. And while this can help, there are no guarantees and prompt

help, there are no guarantees and prompt injections will still happen.

injections will still happen. So what do we do? Well, it's on us to

So what do we do? Well, it's on us to design a product with that in mind. We

design a product with that in mind. We have to blend technology approaches and

have to blend technology approaches and user experience and design into a

user experience and design into a cohesive story that actually builds them

cohesive story that actually builds them from the ground up and solves it

from the ground up and solves it together.

together. So, what that might what that excuse me

So, what that might what that excuse me what might that be for a feature in DIA?

what might that be for a feature in DIA? Well, let's take the autofill tool in

Well, let's take the autofill tool in DIA. The autofill tool allows you to

DIA. The autofill tool allows you to leverage an LLM with context, memory,

leverage an LLM with context, memory, and your details to fill forms on the

and your details to fill forms on the internet. It's extremely powerful, but

internet. It's extremely powerful, but as you can imagine, it has some

as you can imagine, it has some vulnerabilities. A prompt injection here

vulnerabilities. A prompt injection here could extract your data and put it on a

could extract your data and put it on a form, and once it's on that form, it's

form, and once it's on that form, it's out of your hands.

out of your hands. So, we try to build with that in mind.

So, we try to build with that in mind. In this case, before the form is written

In this case, before the form is written to, we actually let the user read and

to, we actually let the user read and confirm that data in plain text. This

confirm that data in plain text. This doesn't prevent a prompt injection, but

doesn't prevent a prompt injection, but it gives the user control, awareness,

it gives the user control, awareness, and trust in what is happening. And this

and trust in what is happening. And this is a framing we carry throughout our

is a framing we carry throughout our product and how we build every single

product and how we build every single feature. So here are some examples.

feature. So here are some examples. Scheduling events in DIA, we have a

Scheduling events in DIA, we have a similar confirmation step. Writing

similar confirmation step. Writing emails India, we also have a similar

emails India, we also have a similar confirmation step.

So I've talked about three different things here today. First is optimizing

things here today. First is optimizing your tools and process for fast

your tools and process for fast iteration. Second, treating model

iteration. Second, treating model behavior as a craft and discipline. And

behavior as a craft and discipline. And third, AI security as an emergent

third, AI security as an emergent property of building products.

property of building products. But uh the last thing I want to leave

But uh the last thing I want to leave you with, when we started on this

you with, when we started on this journey to building DIA, we recognized a

journey to building DIA, we recognized a technology shift and we sought to evolve

technology shift and we sought to evolve our product of ARC. We initially came at

our product of ARC. We initially came at it from a hey, how can we leverage AI to

it from a hey, how can we leverage AI to make ARC better, make the browser

make ARC better, make the browser better. But what we quickly learned and

better. But what we quickly learned and adapted to was that it wasn't just a

adapted to was that it wasn't just a product evolution. It was a company one

product evolution. It was a company one and today I shared a glimpse of that.

and today I shared a glimpse of that. How we build and how it's changed a team

How we build and how it's changed a team we've literally created around this and

we've literally created around this and how we think about security for AI

how we think about security for AI products. But really it's so much more.

products. But really it's so much more. It goes beyond that. It's how we train

It goes beyond that. It's how we train everyone here. It's how we hire. It's

everyone here. It's how we hire. It's how we communicate. It's how we

how we communicate. It's how we collaborate and so much more. And if

collaborate and so much more. And if there's one thing I'll leave you all

there's one thing I'll leave you all with, if there's one thing we've learned

with, if there's one thing we've learned over the last couple years, it's that

over the last couple years, it's that when when you recognize that technology

when when you recognize that technology shift, you have to embrace it. And you

shift, you have to embrace it. And you have to embrace it with conviction.

have to embrace it with conviction. Thank you.

Thank you. [applause]

Our next speaker [music] draws on over 20 years in enterprise

draws on over 20 years in enterprise developer experience to ask what will

developer experience to ask what will still matter when AI coding agents are

still matter when AI coding agents are everywhere. Please welcome to the stage

everywhere. Please welcome to the stage executive distinguished engineer at

executive distinguished engineer at Capital 1, Max Canet Alexander.

Capital 1, Max Canet Alexander. [music]

[music] [applause]

Hey, how's everybody doing? Still awake? Okay, great. So

Okay, great. So like the robot voice said, I have been

like the robot voice said, I have been doing developer experience for a very

doing developer experience for a very long time and I have never in my life

long time and I have never in my life seen anything like the last 12 months.

seen anything like the last 12 months. The you know about every two to three

The you know about every two to three weeks software engineers been making

weeks software engineers been making this face on the screen.

this face on the screen. Okay. And if you work in developer

Okay. And if you work in developer experience the problem is even worse.

experience the problem is even worse. You're like this guy on the screen every

You're like this guy on the screen every few weeks. You're like, "Oh yeah, yeah,

few weeks. You're like, "Oh yeah, yeah, yeah, yeah, yeah. Here's the new

yeah, yeah, yeah. Here's the new hotness." And then somebody else comes

hotness." And then somebody else comes up and they're like, "Well, can I use

up and they're like, "Well, can I use the the new new hotness?" And you know,

the the new new hotness?" And you know, people have been doing that for years.

people have been doing that for years. I've been working in developer

I've been working in developer experience for a long time. Everybody

experience for a long time. Everybody always shows up and they're like, "Oh,

always shows up and they're like, "Oh, can I use this tool that came out

can I use this tool that came out yesterday?" And you're like, "No, of

yesterday?" And you're like, "No, of course not." And now we're like, "Uh,

course not." And now we're like, "Uh, maybe yes." Right? And what this leads

maybe yes." Right? And what this leads to overall is the future is super hard

to overall is the future is super hard to predict right now. So

to predict right now. So a I think a lot of people a lot of CTO's

a I think a lot of people a lot of CTO's a lot of people who work in developer

a lot of people who work in developer experience to people who care about

experience to people who care about helping developers are asking themselves

helping developers are asking themselves this question

this question are all of my investments going to go to

are all of my investments going to go to waste like what could I invest in now

waste like what could I invest in now that if I look back at the end of 2026

that if I look back at the end of 2026 I'll be like I sure am glad that I

I'll be like I sure am glad that I invested in that for my developers and I

invested in that for my developers and I think a lot of people have just decided

think a lot of people have just decided well I don't know I guess it's just

well I don't know I guess it's just coding agents and I guess they'll fix

coding agents and I guess they'll fix every single thing about my entire

every single thing about my entire company by themselves.

company by themselves. they're amazing.

The first one is how can we use our understanding of the principles of

understanding of the principles of developer experience to know what's

developer experience to know what's going to be valuable no matter what

going to be valuable no matter what happens. Okay. And what do we need to do

happens. Okay. And what do we need to do to get the maximum possible value from

to get the maximum possible value from AI agents? like what would we need to

AI agents? like what would we need to fix at all levels outside of the agents

fix at all levels outside of the agents in order to make sure that the agents

in order to make sure that the agents and our developers can be as effective

and our developers can be as effective as possible. And this isn't like a minor

as possible. And this isn't like a minor question. These are the sorts of things

question. These are the sorts of things that could make or break you as a

that could make or break you as a software business going into the future.

software business going into the future. So let's talk about what some of those

So let's talk about what some of those things are that I think are no regrets

things are that I think are no regrets investments that will help both our

investments that will help both our human beings and our agents. So the in

human beings and our agents. So the in general one of the framings that I think

general one of the framings that I think about here is things that are inputs to

about here is things that are inputs to the agents things around the agents that

the agents things around the agents that help them be more effective. And one of

help them be more effective. And one of the biggest one is the development

the biggest one is the development environment. What are the tools that you

environment. What are the tools that you use to build your code? What package

use to build your code? What package manager do you use? What llinters do you

manager do you use? What llinters do you run? Those sorts of things. You want to

run? Those sorts of things. You want to use the industry standard tools in the

use the industry standard tools in the same way the industry uses them and

same way the industry uses them and ideally in the same way the outside

ideally in the same way the outside world uses them because that's what's in

world uses them because that's what's in the training set. And look, yes, you can

the training set. And look, yes, you can write instruction files and you can try

write instruction files and you can try your best to try to fight the training

your best to try to fight the training set and make it do something unnatural

set and make it do something unnatural and unholy with some crazy amalgamation

and unholy with some crazy amalgamation that or modification that you've made of

that or modification that you've made of those developer tools. Like you might be

those developer tools. Like you might be you invented your own package manager.

you invented your own package manager. You probably should not do that. you

You probably should not do that. you probably should undo that and try to go

probably should undo that and try to go back to the way the outside world does

back to the way the outside world does software development because then you

software development because then you are not fighting the training set. Um,

are not fighting the training set. Um, and also it means it means things like

and also it means it means things like you can't use obscure programming

you can't use obscure programming languages anymore. Look, I'm a

languages anymore. Look, I'm a programming language nerd. I love those

programming language nerd. I love those things. I do not use them anymore in my

things. I do not use them anymore in my day-to-day agentic software development

day-to-day agentic software development work. as an enthusiast, I do come

work. as an enthusiast, I do come sometimes go and I code on, you know,

sometimes go and I code on, you know, frontline uh software engineering

frontline uh software engineering languages, but not in my like real work

languages, but not in my like real work anymore.

anymore. So, what people ask me sometimes, does

So, what people ask me sometimes, does that mean like we're never going to ever

that mean like we're never going to ever have any new tools again because we're

have any new tools again because we're always going to be dependent on the

always going to be dependent on the tools that the model already knows?

tools that the model already knows? Probably not because like I said,

Probably not because like I said, there's still going to be enthusiasts.

there's still going to be enthusiasts. And also, but like I would like to make

And also, but like I would like to make a point. The thing that I'm talking

a point. The thing that I'm talking about has always been a real problem.

about has always been a real problem. Like there's always some developer at

Like there's always some developer at the company has always come up to you

the company has always come up to you and be like, "Can I use this technology

and be like, "Can I use this technology that came out last week and has never

that came out last week and has never been vetted in an enterprise to run my

been vetted in an enterprise to run my like 100,000 queries per second service

like 100,000 queries per second service that serves a billion users?" And I'm

that serves a billion users?" And I'm like, "No, you can't do that now and you

like, "No, you can't do that now and you can't do that yesterday. It's still the

can't do that yesterday. It's still the same."

same." Uh, another one is

Uh, another one is in order to take action today, agents

in order to take action today, agents need either a CLI or an API to take that

need either a CLI or an API to take that action. Yes, there's computer use. Yes,

action. Yes, there's computer use. Yes, you can make them write playright and

you can make them write playright and orchestrate a browser. But why? Like if

orchestrate a browser. But why? Like if you could have a CLI that the agent can

you could have a CLI that the agent can just execute natively in its normal

just execute natively in its normal format that it understands the most

format that it understands the most natively, which is text interaction, why

natively, which is text interaction, why why would you choose to do something

why would you choose to do something else, especially in an area where

else, especially in an area where accuracy matters dramatically and where

accuracy matters dramatically and where that accuracy dramatically influences

that accuracy dramatically influences the effectiveness of the agent?

the effectiveness of the agent? One of the most important things that

One of the most important things that you can invest in is validation. So any

you can invest in is validation. So any kind of objective deterministic

kind of objective deterministic validation that you give an agent will

validation that you give an agent will increase its capabilities. So yes,

increase its capabilities. So yes, sometimes you can create this with the

sometimes you can create this with the agent. I'm going to talk about that in a

agent. I'm going to talk about that in a second. But it doesn't really matter how

second. But it doesn't really matter how you get it or where you get it from. You

you get it or where you get it from. You just need to think about how do I have

just need to think about how do I have high quality validation that produces

high quality validation that produces very clear error messages. This is the

very clear error messages. This is the same thing you always wanted by the way

same thing you always wanted by the way in your tests and your llinters, right?

in your tests and your llinters, right? But it's even more important for the

But it's even more important for the agents because the agents cannot divine

agents because the agents cannot divine what you mean by 500 internal error with

what you mean by 500 internal error with no other message, right? Like they need

no other message, right? Like they need a way to actually understand what the

a way to actually understand what the problem was and what they should do

problem was and what they should do about it.

about it. However, there is a problem here. So,

However, there is a problem here. So, you know, you think, okay, I'll just get

you know, you think, okay, I'll just get the agent to do it. They'll write my

the agent to do it. They'll write my tests and then I'll be fine. But have

tests and then I'll be fine. But have you ever asked an agent to write a test

you ever asked an agent to write a test on a completely untestable codebase?

on a completely untestable codebase? They do kind of what it's like is

They do kind of what it's like is happening on the screen here. They will

happening on the screen here. They will write a test that says, "Hey boss, I

write a test that says, "Hey boss, I pushed the button and the button pushed

pushed the button and the button pushed successfully. Test passed."

successfully. Test passed." Um, like so there is a sort of a a

Um, like so there is a sort of a a larger problem that a lot of enterprises

larger problem that a lot of enterprises have in particular, which is there's a

have in particular, which is there's a lot of legacy code bases that either

lot of legacy code bases that either were not designed with testing in mind

were not designed with testing in mind or were not designed with like high

or were not designed with like high quality testing in mind. like maybe they

quality testing in mind. like maybe they just have like some very high level

just have like some very high level endto-end tests and they don't have like

endto-end tests and they don't have like great unit tests that the agent can

great unit tests that the agent can actually run iteratively in a loop and

actually run iteratively in a loop and that will produce actionable and useful

that will produce actionable and useful errors.

errors. So another thing that you can invest in

So another thing that you can invest in that will can be perennially valuable

that will can be perennially valuable both to humans and to agents is

both to humans and to agents is structure of your systems and structure

structure of your systems and structure of your code bases. Agents work better

of your code bases. Agents work better on better structured code bases. And for

on better structured code bases. And for those of you who have never worked in a

those of you who have never worked in a large enterprise and seen very old

large enterprise and seen very old legacy codebases, you might not be

legacy codebases, you might not be familiar with what I'm talking about.

familiar with what I'm talking about. But for those who have, you know that

But for those who have, you know that there are code bases that no human being

there are code bases that no human being could reason about in any kind of

could reason about in any kind of successful way because the information

successful way because the information necessary to reason about that codebase

necessary to reason about that codebase isn't in the codebase and the structure

isn't in the codebase and the structure of the codebase makes the codebase

of the codebase makes the codebase impossible to reason about looking at

impossible to reason about looking at it. Yes, you the agents can do the same

it. Yes, you the agents can do the same thing human beings do in that case,

thing human beings do in that case, which is sort of go through an iterative

which is sort of go through an iterative process of trying to run the thing and

process of trying to run the thing and see what breaks, but that decreases the

see what breaks, but that decreases the capability of the agent so much compared

capability of the agent so much compared to just it having the ability to just

to just it having the ability to just look at the code and reason about it the

look at the code and reason about it the exact same way the human capability is

exact same way the human capability is decreased. And of course, like I said,

decreased. And of course, like I said, that all has to lead up to being

that all has to lead up to being testable. If the only thing I can do

testable. If the only thing I can do with your codebase is push a button and

with your codebase is push a button and know if the button pushed successfully

know if the button pushed successfully and not see the explosion behind it,

and not see the explosion behind it, like if if if there's no way to get that

like if if if there's no way to get that information out of the codebase from the

information out of the codebase from the test, then the agent's not going to be

test, then the agent's not going to be able to do that either unless it it goes

able to do that either unless it it goes and refactors it or you go and refactor

and refactors it or you go and refactor it first.

And you know, there's a lot of talk about documentation. There's always been

about documentation. There's always been a lot of talk about documentation in the

a lot of talk about documentation in the field of developer experience, in the

field of developer experience, in the field of improving things. And there's

field of improving things. And there's people go back and forth about it.

people go back and forth about it. Engineers hate writing documentation.

Engineers hate writing documentation. Uh, and the value of it is often debated

Uh, and the value of it is often debated like what kind of documentation you want

like what kind of documentation you want or don't want, do or don't want. But

or don't want, do or don't want. But here's the thing. The agent, let's just

here's the thing. The agent, let's just take this in the context of the agent.

take this in the context of the agent. The agent cannot read your mind. It did

The agent cannot read your mind. It did not attend your verbal meeting that had

not attend your verbal meeting that had no transcript.

no transcript. Okay? Now there are many companies in

Okay? Now there are many companies in the world that depend on that sort of

the world that depend on that sort of tribal knowledge to understand what the

tribal knowledge to understand what the requirements are for the system. Why the

requirements are for the system. Why the code is being written? What is the

code is being written? What is the specification that we're we're writing

specification that we're we're writing towards if things are not written down?

towards if things are not written down? And that sounds like blatantly obvious

And that sounds like blatantly obvious but like there are a lot of things that

but like there are a lot of things that are fundamentally written like if the

are fundamentally written like if the code is comprehensible like all the

code is comprehensible like all the other steps are in that we've gotten to

other steps are in that we've gotten to so far. you don't need to reexplain

so far. you don't need to reexplain what's in the code. So, there's actually

what's in the code. So, there's actually probably a whole class of documentation

probably a whole class of documentation that we may not need anymore where you

that we may not need anymore where you can just ask the agent like, "Hey, tell

can just ask the agent like, "Hey, tell me about the structure of this codebase

me about the structure of this codebase overall and it'll just do it, but it

overall and it'll just do it, but it won't be able to ever know why you wrote

won't be able to ever know why you wrote it unless that's written down

it unless that's written down somewhere." or things that happen

somewhere." or things that happen outside of the program like what is the

outside of the program like what is the shape of the data that comes in from

shape of the data that comes in from this URL parameter as an example like if

this URL parameter as an example like if you have already written the code

you have already written the code there's a validator and that does

there's a validator and that does explain it but if you haven't written

explain it but if you haven't written the code yet it doesn't know what comes

the code yet it doesn't know what comes in from the outside world so basically

in from the outside world so basically anything that can't be in the code or

anything that can't be in the code or isn't in the code needs to somehow be

isn't in the code needs to somehow be written somewhere that the agent can

written somewhere that the agent can access

now we've covered sort of a few technical aspects of things that we need

technical aspects of things that we need to improve improve. But there's a point

to improve improve. But there's a point about software development in general

about software development in general and it that's always been true and one

and it that's always been true and one of and that's you've heard this we spend

of and that's you've heard this we spend more time reading code than writing it.

more time reading code than writing it. The difference today is that writing

The difference today is that writing code has become reading code. So even

code has become reading code. So even now when we are writing code we spend

now when we are writing code we spend more time reading it than actually

more time reading it than actually typing things into the terminal.

typing things into the terminal. And what that means is

And what that means is every software engineer becomes a code

every software engineer becomes a code reviewer as basically their primary job.

reviewer as basically their primary job. In addition, as anybody who has worked

In addition, as anybody who has worked in a in a shop that has deeply adopted

in a in a shop that has deeply adopted agentic coding, we generate far more PRs

agentic coding, we generate far more PRs than ever before, which has led to code

than ever before, which has led to code review itself, the like the big scale

review itself, the like the big scale code review being a bottleneck.

code review being a bottleneck. So one of the things that we need to do

So one of the things that we need to do is we need to figure out how to improve

is we need to figure out how to improve code review velocity both for the big

code review velocity both for the big code reviews that we like where we you

code reviews that we like where we you send a PR and somebody like you know

send a PR and somebody like you know writes comments on it and you go back

writes comments on it and you go back and forth and also just the iterative

and forth and also just the iterative process of working with the agent. How

process of working with the agent. How do you speed up a person's ability to

do you speed up a person's ability to look at code and know what to do with

look at code and know what to do with it? So

it? So the principles are pretty similar for

the principles are pretty similar for both of those, but the exact way you

both of those, but the exact way you implement them is a little bit

implement them is a little bit different. What you care about the most

different. What you care about the most is making each individual response fast.

is making each individual response fast. You don't actually want to shorten the

You don't actually want to shorten the whole timeline of code review generally

whole timeline of code review generally because code review is a quality

because code review is a quality process. It's the same thing with agent

process. It's the same thing with agent iteration. Like what you want with agent

iteration. Like what you want with agent iteration is you want to get to the

iteration is you want to get to the place where you've got the right result.

place where you've got the right result. You don't want to like just be like,

You don't want to like just be like, "Well, I guess I've hit my five minute

"Well, I guess I've hit my five minute time limit, so I'm going to check in

time limit, so I'm going to check in this garbage that doesn't work, right?

this garbage that doesn't work, right? You you But what you do want is you want

You you But what you do want is you want the iterations to be fast." Not just the

the iterations to be fast." Not just the agents iterations, but the human

agents iterations, but the human response time to the agent to be fast.

response time to the agent to be fast. And in order to do that, they have to

And in order to do that, they have to get very good at doing code reviews or

get very good at doing code reviews or knowing what the next step is to do with

knowing what the next step is to do with a lot of code. At the big code of review

a lot of code. At the big code of review level, one thing that I see that I think

level, one thing that I see that I think is sort of a social disease that has

is sort of a social disease that has infected a lot of companies is when

infected a lot of companies is when people want PR reviews, they just send a

people want PR reviews, they just send a Slack message to a team channel and say,

Slack message to a team channel and say, "Hey, could one of the 10 of you review

"Hey, could one of the 10 of you review my PR?" And what and you know what that

my PR?" And what and you know what that means is one person does all those

means is one person does all those reviews. That's what really happens.

reviews. That's what really happens. There there's like when you look at the

There there's like when you look at the code of review stats of teams like that

code of review stats of teams like that there's one person who has like 50 and

there's one person who has like 50 and the other person have like three two

the other person have like three two five seven because there's just one

five seven because there's just one person is like super responsive. So but

person is like super responsive. So but what that means is if you start

what that means is if you start generating dramatically more PRs that

generating dramatically more PRs that one person cannot handle the load you

one person cannot handle the load you have to distribute it and really the

have to distribute it and really the only way to distribute it is to assign

only way to distribute it is to assign it to specific individuals have a system

it to specific individuals have a system that distributes it among those

that distributes it among those individuals and then set SLOs's that

individuals and then set SLOs's that have some mechanism of enforcement. And

have some mechanism of enforcement. And another thing is like that GitHub, for

another thing is like that GitHub, for example, is not very good at today is

example, is not very good at today is making it clear whose turn it is to take

making it clear whose turn it is to take action. Like I left a bunch of comments

action. Like I left a bunch of comments on your PR. Uh,

on your PR. Uh, you now responded to one of my comments.

you now responded to one of my comments. Should I come back again now? Oh, wait.

Should I come back again now? Oh, wait. No, no, now you pushed a no change.

No, no, now you pushed a no change. Should I come back now? Okay. No, no,

Should I come back now? Okay. No, no, now you've responded to more comments.

now you've responded to more comments. What I rely on mostly is people telling

What I rely on mostly is people telling me in Slack, I'm ready for you to review

me in Slack, I'm ready for you to review my PR again. Which is a terrible and

my PR again. Which is a terrible and inefficient system.

And another thing you got to think about a lot is the quality of code reviews.

a lot is the quality of code reviews. And I mean this once again both for the

And I mean this once again both for the individual developers doing it with the

individual developers doing it with the agent and the people doing it in the

agent and the people doing it in the code review pipeline.

code review pipeline. You have to keep holding a high bar. I

You have to keep holding a high bar. I know that people have other opinions

know that people have other opinions about this. And yes, depending on the

about this. And yes, depending on the timeline that you expect your software

timeline that you expect your software to live, you might not need as much

to live, you might not need as much software design. Like look, it's

software design. Like look, it's software design is not the goal of

software design is not the goal of perfection. It's a goal of good enough

perfection. It's a goal of good enough and better than you had before, right?

and better than you had before, right? But sometimes good enough for a very

But sometimes good enough for a very long lived system is a much higher bar

long lived system is a much higher bar than people expect it to be.

than people expect it to be. And if you don't have a process that is

And if you don't have a process that is capable of rejecting things that

capable of rejecting things that shouldn't go in, you will very likely

shouldn't go in, you will very likely actually see decreasing productivity

actually see decreasing productivity gains from your agentic coders over time

gains from your agentic coders over time as the system becomes harder and harder

as the system becomes harder and harder for both the agent and the human to work

for both the agent and the human to work with.

with. The problem is this. In many companies,

The problem is this. In many companies, we have the people who are the best code

we have the people who are the best code reviewers not doing any of their time

reviewers not doing any of their time doing code review. They are spending all

doing code review. They are spending all their times in meetings doing highle

their times in meetings doing highle reviews doing strategy. And so we aren't

reviews doing strategy. And so we aren't teaching junior engineers to be better

teaching junior engineers to be better software engineers and to be better code

software engineers and to be better code reviewers. So we have to have some

reviewers. So we have to have some mechanism that allows the people who are

mechanism that allows the people who are the best at this to do this through

the best at this to do this through apprenticeship. If somebody else has a

apprenticeship. If somebody else has a better way of doing this than doing code

better way of doing this than doing code reviews with people, I would love to

reviews with people, I would love to know because in the 20 plus years that

know because in the 20 plus years that I've been doing this, I have never found

I've been doing this, I have never found a way to teach people to be good code

a way to teach people to be good code reviewers other than doing good code

reviewers other than doing good code reviews with them.

Now, if you do if you don't do all the things that I talked about, what is the

things that I talked about, what is the danger?

danger? The danger is you take a bad codebase

The danger is you take a bad codebase with a confusing environment. You give

with a confusing environment. You give it to an agent or a developer working

it to an agent or a developer working with that agent. The agent produces

with that agent. The agent produces relative levels of nonsense

relative levels of nonsense and the developer experiences more or

and the developer experiences more or less frustration. And depending on how

less frustration. And depending on how persistent they are, at some point they

persistent they are, at some point they give up and they just send their PR off

give up and they just send their PR off for review. They're like, I think it

for review. They're like, I think it works. Right? And then if you have

works. Right? And then if you have low-quality code reviews or code

low-quality code reviews or code reviewers who are overwhelmed, they go,

reviewers who are overwhelmed, they go, I don't know. I don't know what to do

I don't know. I don't know what to do with this. I guess it's okay. And you

with this. I guess it's okay. And you just have lots and lots and lots of bad

just have lots and lots and lots of bad rubber stamp PRs that keep going in and

rubber stamp PRs that keep going in and you get into a vicious cycle where what

you get into a vicious cycle where what I expect to occur and what my prediction

I expect to occur and what my prediction is is if you are in this cycle, uh, your

is is if you are in this cycle, uh, your agent productivity will decrease

agent productivity will decrease consistently through the year. On the

consistently through the year. On the other hand, we live in an amazing time

other hand, we live in an amazing time where if we increase the ability of the

where if we increase the ability of the agents to help us be productive, then

agents to help us be productive, then they can actually help us be more

they can actually help us be more productive and we actually get into a

productive and we actually get into a virtuous cycle instead where we actually

virtuous cycle instead where we actually accelerate more and more and more and

accelerate more and more and more and more. And yes, some of these things

more. And yes, some of these things sound like very expensive fundamental

sound like very expensive fundamental investments, but I think now is the time

investments, but I think now is the time to make them because now is one of the

to make them because now is one of the times you're going to have the biggest

times you're going to have the biggest differentiation in your business in

differentiation in your business in terms of software engineering velocity

terms of software engineering velocity if you can do these things versus other

if you can do these things versus other in industries or companies that can't

in industries or companies that can't structurally do these things.

structurally do these things. So to summarize, here's a few things.

So to summarize, here's a few things. Not literally everything in the world

Not literally everything in the world you can do that's no regrets, but you

you can do that's no regrets, but you can standardize your development

can standardize your development environments. You can make CLIs or APIs

environments. You can make CLIs or APIs for anything that needs a CLI or API.

for anything that needs a CLI or API. Those CLIs or APIs have to run at

Those CLIs or APIs have to run at development time. By the way, too,

development time. By the way, too, another big thing that people miss is

another big thing that people miss is sometimes they have things that only run

sometimes they have things that only run in CI. If you're CI takes 15, 20 minutes

in CI. If you're CI takes 15, 20 minutes and you know, agents are like way more

and you know, agents are like way more persistent and patient than a human

persistent and patient than a human being is. So like, but they're also more

being is. So like, but they're also more errorprone than human beings are. So

errorprone than human beings are. So like they will run the thing and then

like they will run the thing and then run your test and then run a thing and

run your test and then run a thing and then run your test and then run a thing

then run your test and then run a thing and then run your test and they'll do it

and then run your test and they'll do it like five times in a row. If that takes

like five times in a row. If that takes 20 minutes, your developers productivity

20 minutes, your developers productivity is going to be shot to heck. Whereas, if

is going to be shot to heck. Whereas, if it takes 30 seconds, you're going to

it takes 30 seconds, you're going to have a they're going to have a much

have a they're going to have a much better experience. You can improve

better experience. You can improve validation. You can refactor for both

validation. You can refactor for both testability and the ability to reason

testability and the ability to reason about the codebase. You can make sure

about the codebase. You can make sure all the external context and your

all the external context and your intentions, the why is written down. You

intentions, the why is written down. You can make every response during code

can make every response during code review faster. And you can raise the bar

review faster. And you can raise the bar on code review quality. But if you look

on code review quality. But if you look at all of these things, there's one

at all of these things, there's one lesson and one principle that we take

lesson and one principle that we take away from all these things that covers

away from all these things that covers even more things than this. And it's

even more things than this. And it's basically that what's good for humans is

basically that what's good for humans is good for AI. And the great thing about

good for AI. And the great thing about this, one second. The great thing about

this, one second. The great thing about this is that it means that when we

this is that it means that when we invest in this thing, we will help our

invest in this thing, we will help our developers no matter what. Even if

developers no matter what. Even if sometimes we miss on helping the agent,

sometimes we miss on helping the agent, we are guaranteed to help the humans.

we are guaranteed to help the humans. Thank you very much. [applause]

>> Ladies and gentlemen, please welcome back to the stage, Alex Lieberman.

back to the stage, Alex Lieberman. [music] Let's give it up again for Max.

[music] Let's give it up again for Max. [applause]

[applause] We have one more break now and then the

We have one more break now and then the last block of sessions where we'll have

last block of sessions where we'll have speakers talking about AI. uh

speakers talking about AI. uh consultancies uh paying engineers like

consultancies uh paying engineers like salespeople and how to make your company

salespeople and how to make your company AI native. So be back here at 4 o'clock

AI native. So be back here at 4 o'clock or if you're watching the live stream be

or if you're watching the live stream be back online at four o'clock and we'll

back online at four o'clock and we'll see you then. Thanks everyone.

Heat. [music]

>> Heat. Heat. [music]

[music] >> Heat. Heat.

>> Heat [music]

[music] >> Heat.

>> Heat. Heat.

>> Down no [music]

no [music] down.

Heat up [music]

>> Heat. Heat.

>> Heat. Heat. [music]

Heat >> [music]

>> up here. [music]

Heat. [music]

[music] >> Heat. Heat.

[music] Heat.

up [music]

Heat. [music]

>> [music] >> Heat. Heat.

Heat. [music]

up Heat.

>> [music] >> Heat.

>> Heat. Heat.

Heat. Heat. [music]

>> Heat up [music] here.

Heat [music]

up here.

>> Heat. Heat. [music]

[music] Heat. Heat.

Heat [music]

Heat. Heat. [music]

[music] Heat

[music] >> up here.

>> [music] >> Heat. Heat.

[music] Heat.

How we doing? We are officially 7 hours in. How's the energy level? 7 hours in.

in. How's the energy level? 7 hours in. Let's hear it. There we go. There we go.

Let's hear it. There we go. There we go. So this is our last block of sessions

So this is our last block of sessions before you all get to enjoy the graphite

before you all get to enjoy the graphite afterparty. More coming on that in a

afterparty. More coming on that in a few. And for this block, we're going to

few. And for this block, we're going to cover a lot. AI consulting in practice,

cover a lot. AI consulting in practice, paying engineers like salespeople, as I

paying engineers like salespeople, as I mentioned earlier, leadership in AI

mentioned earlier, leadership in AI assisted engineering, and how to build

assisted engineering, and how to build an AI native company. You guys ready for

an AI native company. You guys ready for this?

this? >> Oh, come on. Let's go. So [applause]

>> Oh, come on. Let's go. So [applause] with that, please join me in welcoming

with that, please join me in welcoming our next speaker and one of last year's

our next speaker and one of last year's MC's to talk about helping organizations

MC's to talk about helping organizations transform with AI. Let's hear it for

transform with AI. Let's hear it for NLW.

[music] All right. Great to be back here, guys.

All right. Great to be back here, guys. Uh for those of you who are here in

Uh for those of you who are here in February, I had the privilege of MCing.

February, I had the privilege of MCing. Uh and today I'm excited to talk about

Uh and today I'm excited to talk about something a little bit different. So

something a little bit different. So right now uh there's been the last

right now uh there's been the last couple of months have been an

couple of months have been an interesting time in AI. There's been a

interesting time in AI. There's been a sort of surge in the air uh the

sort of surge in the air uh the narrative of an AI bubble. A lot of it

narrative of an AI bubble. A lot of it driven by dubious studies uh like the

driven by dubious studies uh like the MIT report. And so what I wanted to do

MIT report. And so what I wanted to do today is get into not so much the

today is get into not so much the practice of consulting and transforming

practice of consulting and transforming but what organizations are actually

but what organizations are actually finding value in right now. So for those

finding value in right now. So for those of you who don't know me, uh there's

of you who don't know me, uh there's kind of two contexts I bring to this

kind of two contexts I bring to this conversation. The first is as the host

conversation. The first is as the host of the AI daily brief which is a daily

of the AI daily brief which is a daily uh news analysis podcast about AI. The

uh news analysis podcast about AI. The second is as the CEO of super

second is as the CEO of super intelligent which is an AI planning

intelligent which is an AI planning platform. And so the different

platform. And so the different perspectives are sort of very high level

perspectives are sort of very high level macro thinking about the news that's

macro thinking about the news that's happening and then a much more kind of

happening and then a much more kind of ground level view where we're spending a

ground level view where we're spending a ton of time interviewing executives

ton of time interviewing executives about what's going on inside their

about what's going on inside their organizations. And what we're going to

organizations. And what we're going to talk about is sort of one kind of

talk about is sort of one kind of briefly in the first part just the

briefly in the first part just the status of enterprise adoption uh as it

status of enterprise adoption uh as it currently stands. And two, um, and the

currently stands. And two, um, and the more interesting part is we've been live

more interesting part is we've been live with a study in in the market for about

with a study in in the market for about a month now collecting self-reported

a month now collecting self-reported information about ROI around different

information about ROI around different use cases. And this will be the first

use cases. And this will be the first time uh, this week was the first time I

time uh, this week was the first time I did some analysis on it. And so I'm

did some analysis on it. And so I'm going to share what people have uh, what

going to share what people have uh, what people have told us around the first

people have told us around the first kind of 2500 or so use cases that

kind of 2500 or so use cases that they've shared. Um, so it should be

they've shared. Um, so it should be pretty pretty interesting stuff. talking

pretty pretty interesting stuff. talking about kind of enterprise AI adoption

about kind of enterprise AI adoption first. I'll go through this pretty

first. I'll go through this pretty quickly because it's um pretty

quickly because it's um pretty well-known stuff. Uh the short of it is

well-known stuff. Uh the short of it is enterprises are adopting AI uh in in a

enterprises are adopting AI uh in in a growing fashion. Um pretty much everyone

growing fashion. Um pretty much everyone is using it at least a little bit. Uh

is using it at least a little bit. Uh and increasingly they're using it a lot.

and increasingly they're using it a lot. Uh this year I will need to tell none of

Uh this year I will need to tell none of you that there is a major inflection

you that there is a major inflection around um specifically adoption in the

around um specifically adoption in the uh coding and software engineering.

uh coding and software engineering. Right? You saw a huge huge uptick in

Right? You saw a huge huge uptick in this. Um there's a lot that's

this. Um there's a lot that's interesting about that from an

interesting about that from an enterprise perspective because it wasn't

enterprise perspective because it wasn't just with the software engineering

just with the software engineering organizations. Other parts of the

organizations. Other parts of the organization are also now thinking about

organization are also now thinking about how they can communicate with code,

how they can communicate with code, build things with code. Uh but that's a

build things with code. Uh but that's a huge huge theme of this year [snorts]

huge huge theme of this year [snorts] coming into 2025. One of the big sort of

coming into 2025. One of the big sort of thoughts that many people had was that

thoughts that many people had was that this would be the year of agents inside

this would be the year of agents inside the enterprise, right? That big chunks

the enterprise, right? That big chunks of work would get automated away. And on

of work would get automated away. And on the one hand, I think it's pretty clear

the one hand, I think it's pretty clear that we didn't see some sort of mass

that we didn't see some sort of mass shift towards automation uh at large

shift towards automation uh at large across different functions in the

across different functions in the organization. But when you dig into the

organization. But when you dig into the numbers, there has been actually pretty

numbers, there has been actually pretty significant uh shifts in the patterns of

significant uh shifts in the patterns of of agent adoption. So this is from

of agent adoption. So this is from KPMG's quarterly pulse survey. And it's

KPMG's quarterly pulse survey. And it's a measure of how many enterprises that

a measure of how many enterprises that are a part of their survey, which is all

are a part of their survey, which is all companies over a billion dollars in

companies over a billion dollars in revenue, have uh actual sort of full

revenue, have uh actual sort of full production agents in deployment. So this

production agents in deployment. So this isn't pilots, this isn't experiments.

isn't pilots, this isn't experiments. This is where they consider uh some

This is where they consider uh some agent that's actually doing doing kind

agent that's actually doing doing kind of work in a in a full way. And it's

of work in a in a full way. And it's jumped from 11% in Q1 of this year to

jumped from 11% in Q1 of this year to 42% in their most recent study for Q3.

42% in their most recent study for Q3. So you actually are seeing pretty

So you actually are seeing pretty meaningful uptake of of agents inside

meaningful uptake of of agents inside the enterprise. In fact, I would argue

the enterprise. In fact, I would argue based on our conversations that people

based on our conversations that people have that it's moved more quickly

have that it's moved more quickly through the pilot or experimental phase

through the pilot or experimental phase than people might have thought. um so

than people might have thought. um so much so that you're actually seeing now

much so that you're actually seeing now a big shift in the emphasis around kind

a big shift in the emphasis around kind of the human side of agents and how

of the human side of agents and how humans are going to interact with agents

humans are going to interact with agents and it's involving a shift in upskilling

and it's involving a shift in upskilling and and uh and enablement work. Um

and and uh and enablement work. Um you're seeing a decrease in the sort of

you're seeing a decrease in the sort of resistance to agents as people start to

resistance to agents as people start to actually dig in with them. You're seeing

actually dig in with them. You're seeing more experiments like these sandboxes

more experiments like these sandboxes where people can interact with agents.

where people can interact with agents. So this is a big theme even if it wasn't

So this is a big theme even if it wasn't necessarily the dominant theme that some

necessarily the dominant theme that some thought it might be coming into this

thought it might be coming into this year. At the same time, it is absolutely

year. At the same time, it is absolutely the case that many many if not most

the case that many many if not most enterprises are broadly speaking stuck

enterprises are broadly speaking stuck inside sort of pilot and experimental

inside sort of pilot and experimental phases. There is a lot of challenge

phases. There is a lot of challenge around moving from some of those first

around moving from some of those first exciting experiments to something that's

exciting experiments to something that's more scaled. Um, so this is from

more scaled. Um, so this is from McKenzie state of AI study which came

McKenzie state of AI study which came out I think a couple weeks ago now and

out I think a couple weeks ago now and you can see only 7% of the organizations

you can see only 7% of the organizations that they talk to claim or sort of see

that they talk to claim or sort of see themselves as as fully at scale with

themselves as as fully at scale with with AI and agents. and it's something

with AI and agents. and it's something like 62% are either still experimenting

like 62% are either still experimenting or piloting.

or piloting. Interestingly, big organizations are on

Interestingly, big organizations are on in general a little bit ahead in terms

in general a little bit ahead in terms of uh the organizations that are scaling

of uh the organizations that are scaling as compared to small organizations. This

as compared to small organizations. This has been a a thing that we've noticed

has been a a thing that we've noticed kind of throughout the trajectory of uh

kind of throughout the trajectory of uh of AI um adoption over the last couple

of AI um adoption over the last couple of years that you would think that

of years that you would think that perhaps smaller, more nimble companies

perhaps smaller, more nimble companies uh would be more kind of quick to adopt

uh would be more kind of quick to adopt these things, but in fact, it's often

these things, but in fact, it's often been the opposite with the biggest

been the opposite with the biggest organizations making the biggest

organizations making the biggest efforts. You can also see from the chart

efforts. You can also see from the chart on the bottom that there's very sort of

on the bottom that there's very sort of jagged patterns of adoption, right?

jagged patterns of adoption, right? you're starting to see uh from you know

you're starting to see uh from you know last year if you looked there's very

last year if you looked there's very similar kind of rates of experimentation

similar kind of rates of experimentation across lots of different departments

across lots of different departments you're starting to see some pretty big

you're starting to see some pretty big breakouts now uh with for example you

breakouts now uh with for example you know IT operations kind of jumping out

know IT operations kind of jumping out ahead of other functions

ahead of other functions I won't spend too much time on this sort

I won't spend too much time on this sort of high performer piece but I think the

of high performer piece but I think the thing to note because it comes back in

thing to note because it comes back in and in and some of the stuff that we

and in and some of the stuff that we found with our ROI study is that you are

found with our ROI study is that you are also starting to see a pretty

also starting to see a pretty significant bifurcation between leaders

significant bifurcation between leaders and laggers when it comes to AI

and laggers when it comes to AI adoption. And one of the things that

adoption. And one of the things that tends to distinguish the companies that

tends to distinguish the companies that are leading is that they are just doing

are leading is that they are just doing more of it and they are thinking more

more of it and they are thinking more comprehensively and systematically about

comprehensively and systematically about AI and agent adoption. So they are not

AI and agent adoption. So they are not just sort of doing spot experiments.

just sort of doing spot experiments. They're thinking about their strategy as

They're thinking about their strategy as a whole. They're doing multiple things

a whole. They're doing multiple things at once. And importantly, they're not

at once. And importantly, they're not just thinking about sort of the very

just thinking about sort of the very kind of first tier time savings or

kind of first tier time savings or productivity types of use cases. They're

productivity types of use cases. They're also thinking about how do we grow

also thinking about how do we grow revenue? How do we create new

revenue? How do we create new capabilities? How do we create new

capabilities? How do we create new product lines?

product lines? Overall, it's very clear that despite

Overall, it's very clear that despite what is sort of, you know, the the the

what is sort of, you know, the the the concerns in the media that spend is

concerns in the media that spend is going to do nothing but increase on

going to do nothing but increase on this. Um, so the bottom is the KPMG

this. Um, so the bottom is the KPMG pulse survey again, and this is a an

pulse survey again, and this is a an estimation of the amount of money that

estimation of the amount of money that these organizations intend to spend on

these organizations intend to spend on AI over the next 12 months. The

AI over the next 12 months. The beginning of the year was 114, which by

beginning of the year was 114, which by the way was up from like 88 in Q4 of

the way was up from like 88 in Q4 of last year. It's now up to in their last

last year. It's now up to in their last study 130 million is what they expect to

study 130 million is what they expect to spend uh in the in the year ahead which

spend uh in the in the year ahead which obviously the the total magnitude

obviously the the total magnitude doesn't matter as much as the change. Um

doesn't matter as much as the change. Um you also see the green charts are from

you also see the green charts are from Deote and you can see 90% plus of

Deote and you can see 90% plus of organizations intend to increase their

organizations intend to increase their spend uh on AI in the next 12 months and

spend uh on AI in the next 12 months and as part of that I think that you're

as part of that I think that you're going to see a much more determined

going to see a much more determined conversation around impact and ROI uh

conversation around impact and ROI uh which is a particularly thorny topic but

which is a particularly thorny topic but interestingly

interestingly there has been an increase in optimism

there has been an increase in optimism over the course of this year around the

over the course of this year around the realization of AI. So this is from a

realization of AI. So this is from a different KPMG study, their annual CEO

different KPMG study, their annual CEO survey, which interviews tons and tons

survey, which interviews tons and tons of CEOs. And if you look at the 2024

of CEOs. And if you look at the 2024 numbers, 63% of those pled thought that

numbers, 63% of those pled thought that it would take between 3 and 5 years to

it would take between 3 and 5 years to realize ROI from their AI investments.

realize ROI from their AI investments. 20% said 1 to three and 16% said more

20% said 1 to three and 16% said more than five. This year in that same

than five. This year in that same survey, the number that said 1 to 3

survey, the number that said 1 to 3 years had gone up to 67%. There were now

years had gone up to 67%. There were now 19% who said 6 months to one year. uh

19% who said 6 months to one year. uh and 3 to 5 years was down to just 12%.

and 3 to 5 years was down to just 12%. So huge huge kind of pull forward of

So huge huge kind of pull forward of expectations of of ROI realization. The

expectations of of ROI realization. The challenge is that ROI is really tough.

challenge is that ROI is really tough. So this is back to the pulse survey. 78%

So this is back to the pulse survey. 78% of those pled in that in that survey

of those pled in that in that survey said that they thought that ROI was

said that they thought that ROI was going to basically become a bigger

going to basically become a bigger consideration in the year to come. Uh

consideration in the year to come. Uh but also 78% said that traditional

but also 78% said that traditional impact metrics and measures were having

impact metrics and measures were having a very hard time keeping up with the

a very hard time keeping up with the with the new reality that we were living

with the new reality that we were living in. And this is something that I've

in. And this is something that I've heard constantly over and over from CIOS

heard constantly over and over from CIOS and other people who are in charge of

and other people who are in charge of these investments that the the the ways

these investments that the the the ways that we have measured impact of previous

that we have measured impact of previous technologies and just previous

technologies and just previous initiatives are kind of falling flat

initiatives are kind of falling flat with AI. And so that got us thinking

with AI. And so that got us thinking about the the the overall need that we

about the the the overall need that we have to just have more information. I'm

have to just have more information. I'm not even talking about good systematic

not even talking about good systematic information, just more information

information, just more information around what ROI looks like, what impact

around what ROI looks like, what impact looks like, and you know, I've got this

looks like, and you know, I've got this great podcast audience. They're super

great podcast audience. They're super engaged. And so, we just decided, screw

engaged. And so, we just decided, screw it. We're going to ask them, we're just

it. We're going to ask them, we're just going to ask them to report on what ROI

going to ask them to report on what ROI they're finding from their use cases.

they're finding from their use cases. So, this went up at the very end of

So, this went up at the very end of October. Uh like I said as of this

October. Uh like I said as of this morning or when I looked last looked

morning or when I looked last looked we've had over a thousand submissions uh

we've had over a thousand submissions uh a thousand individual organizations

a thousand individual organizations rather submit something like 3500 use

rather submit something like 3500 use cases and um this is uh some some of the

cases and um this is uh some some of the first observations that we had around um

first observations that we had around um kind of the first 2500.

kind of the first 2500. So the impact categories the way that we

So the impact categories the way that we divided things was into sort of eight

divided things was into sort of eight broad categories of impact um which will

broad categories of impact um which will all I think be very intuitive to you

all I think be very intuitive to you guys. time savings, increased output,

guys. time savings, increased output, improvement in quality, new

improvement in quality, new capabilities, improved decision- making,

capabilities, improved decision- making, cost savings, increased revenue, and

cost savings, increased revenue, and risk reduction. So, basically, it was

risk reduction. So, basically, it was trying to think of like kind of a a

trying to think of like kind of a a broad simple heristic for uh for for

broad simple heristic for uh for for kind of dividing or subdividing the

kind of dividing or subdividing the different the different ways that people

different the different ways that people are thinking about ROI. And TLDDR is

are thinking about ROI. And TLDDR is that people are finding uh ROI right

that people are finding uh ROI right now. Um, now again, the caveats are that

now. Um, now again, the caveats are that this is a highly infranchised audience.

this is a highly infranchised audience. they're listening to a daily AI podcast

they're listening to a daily AI podcast and they are voluntarily sharing this.

and they are voluntarily sharing this. So, I think that, you know, there's

So, I think that, you know, there's there's some caveing there, but you have

there's some caveing there, but you have 44.3% saying that they're seeing modest

44.3% saying that they're seeing modest ROI right now. And then you have another

ROI right now. And then you have another 37.6% seeing high ROI. For the purposes

37.6% seeing high ROI. For the purposes of a lot of these stats, high ROI will

of a lot of these stats, high ROI will be significant plus transformational. Uh

be significant plus transformational. Uh only 5% or so are seeing negative ROI.

only 5% or so are seeing negative ROI. And keep in mind, negative ROI doesn't

And keep in mind, negative ROI doesn't mean that they think programs are

mean that they think programs are failing. It just means they haven't

failing. It just means they haven't they've spent more than they've gained

they've spent more than they've gained uh in terms of how their their

uh in terms of how their their perception is. More [snorts] than that,

perception is. More [snorts] than that, expectations are absolutely skyhigh. 67%

expectations are absolutely skyhigh. 67% think over the next year they will see

think over the next year they will see uh increased and high growth in their

uh increased and high growth in their ROI. So we have really optimistic sense

ROI. So we have really optimistic sense from the ground view of where ROI is

from the ground view of where ROI is going to be in AI. Um you even have the

going to be in AI. Um you even have the teams that are currently experiencing

teams that are currently experiencing negative ROI. 53% say that they're going

negative ROI. 53% say that they're going to see high growth. So very very

to see high growth. So very very optimistic. Um as [snorts] you might

optimistic. Um as [snorts] you might imagine, time savings is the default.

imagine, time savings is the default. It's the starting point for so many

It's the starting point for so many organizations. It represents about 35%

organizations. It represents about 35% of the use cases. After that, increasing

of the use cases. After that, increasing output, quality improvement, basically

output, quality improvement, basically all those things that you would imagine

all those things that you would imagine around productivity are sort of like the

around productivity are sort of like the dominant categories when it comes to

dominant categories when it comes to these uh when it comes to these use

these uh when it comes to these use cases. When it comes to the specifics

cases. When it comes to the specifics around time savings, you see a real

around time savings, you see a real cluster between 1 and 10 hours,

cluster between 1 and 10 hours, especially right around 5 hours. And I

especially right around 5 hours. And I think this is interesting to call out

think this is interesting to call out because it's so obvious to all of us who

because it's so obvious to all of us who are inside building these things uh

are inside building these things uh whether you are a developer or an

whether you are a developer or an entrepreneur or just someone sort of in

entrepreneur or just someone sort of in and around it how the the vast breadth

and around it how the the vast breadth of opportunity that AI represents new

of opportunity that AI represents new capabilities things unimagined yet. It's

capabilities things unimagined yet. It's hard to or it's easy to forget that if

hard to or it's easy to forget that if you save 5 hours a week or 10 hours a

you save 5 hours a week or 10 hours a week you're talking about winning back 7

week you're talking about winning back 7 to 10 work weeks a year. Uh and that's

to 10 work weeks a year. Uh and that's very very powerful. And when it comes to

very very powerful. And when it comes to a lot of these enterprises, that is a

a lot of these enterprises, that is a very meaningful thing, even if it's not

very meaningful thing, even if it's not what they're ultimately in it for.

what they're ultimately in it for. Interestingly though, it's very clear

Interestingly though, it's very clear that the story, although it might be uh

that the story, although it might be uh have a concentration in time savings, is

have a concentration in time savings, is about much more than time savings. So

about much more than time savings. So this is the ROI distribution category uh

this is the ROI distribution category uh ROI distribution by organization size.

ROI distribution by organization size. And this starts to get really

And this starts to get really interesting where you can see that there

interesting where you can see that there are some differences in where different

are some differences in where different size organizations are focused. So for

size organizations are focused. So for example, the organization size between

example, the organization size between 200 and a,000 people has a higher

200 and a,000 people has a higher portion of their use cases concentrated

portion of their use cases concentrated in increasing output. Now we haven't

in increasing output. Now we haven't taken the time yet to really figure out

taken the time yet to really figure out exactly what this means or even

exactly what this means or even speculate on on what this means. But I

speculate on on what this means. But I think it's interesting that this is a

think it's interesting that this is a category of organization that has often

category of organization that has often reached a certain scale but is still

reached a certain scale but is still very much striving for more and so seems

very much striving for more and so seems to be focused more on use cases that

to be focused more on use cases that expand their capabilities.

expand their capabilities. Same thing with uh when you start to

Same thing with uh when you start to divide things by role you see real kind

divide things by role you see real kind of variance where for example seuitees

of variance where for example seuitees and leaders uh are less focused on those

and leaders uh are less focused on those time savings use cases and more focused

time savings use cases and more focused on other things like increased output

on other things like increased output and uh and new capabilities

and uh and new capabilities in general we're finding that sea

in general we're finding that sea leaders uh and just sort of seuite and

leaders uh and just sort of seuite and and leaders in general are even more

and leaders in general are even more optimistic and excited and seeing

optimistic and excited and seeing transformational impact than people who

transformational impact than people who are in more junior positions. Now, some

are in more junior positions. Now, some of this might be sort of selection bias

of this might be sort of selection bias in terms of um what types of use cases

in terms of um what types of use cases you are focused on. If you are in that

you are focused on. If you are in that seuite, you're thinking about things

seuite, you're thinking about things that inherently if they work are more

that inherently if they work are more transformational. Uh but it is notable

transformational. Uh but it is notable that 17% of uh of the use cases that

that 17% of uh of the use cases that that people in those leadership

that people in those leadership positions have submitted uh they say

positions have submitted uh they say have transformational impact and ROI

have transformational impact and ROI already. [snorts] Uh I'm going to skip

already. [snorts] Uh I'm going to skip this because there's we don't have time

this because there's we don't have time for too much. um you're seeing

for too much. um you're seeing interestingly uh a concentration um

interestingly uh a concentration um where the smallest organizations are

where the smallest organizations are getting more of that transformational

getting more of that transformational benefit early. Um one of the things that

benefit early. Um one of the things that I want to do following this study is

I want to do following this study is maybe do a sort of second round where we

maybe do a sort of second round where we dig into what this 1 to 50 person uh

dig into what this 1 to 50 person uh size really looks like. I actually think

size really looks like. I actually think that whereas there might be a lot of

that whereas there might be a lot of similarity between a 1000 and a 2,000

similarity between a 1000 and a 2,000 person organization, there could be a

person organization, there could be a wild difference between a threeperson,

wild difference between a threeperson, you know, small company and a 40 person

you know, small company and a 40 person company. And so I'd really like to dig

company. And so I'd really like to dig into that more. But you are definitely

into that more. But you are definitely seeing a a lot of impact in those sort

seeing a a lot of impact in those sort of more small nimble moving

of more small nimble moving organizations.

organizations. Uh as you might expect, coding and uh

Uh as you might expect, coding and uh and software related or uh use cases

and software related or uh use cases have a higher ROI than average and a

have a higher ROI than average and a lower negative ROI than average. Um one

lower negative ROI than average. Um one really interesting kind of you know

really interesting kind of you know pulling on a specific category of use

pulling on a specific category of use cases. Risk reduction is our lowest

cases. Risk reduction is our lowest category in terms of the percentage of

category in terms of the percentage of use cases that that that was their

use cases that that that was their primary benefit. So when you're filling

primary benefit. So when you're filling out the survey, which is by the way at

out the survey, which is by the way at ROI survey.ai AI if you want to check it

ROI survey.ai AI if you want to check it out. Uh you basically only get to pick a

out. Uh you basically only get to pick a primary ROI benefit. We didn't want it

primary ROI benefit. We didn't want it to be super sort of um we wanted you to

to be super sort of um we wanted you to pick and and hone in on the thing that

pick and and hone in on the thing that was uh seemed most important or most

was uh seemed most important or most significant. And so only 3.4% have risk

significant. And so only 3.4% have risk reduction as their primary benefit uh in

reduction as their primary benefit uh in terms of ROI categories. But it is by

terms of ROI categories. But it is by far those use cases are by far the most

far those use cases are by far the most likely to have transformational impact

likely to have transformational impact as as the as as their outcome. It's at

as as the as as their outcome. It's at 25%. So a full quarter of those uh have

25%. So a full quarter of those uh have transformational ROI. And interestingly,

transformational ROI. And interestingly, I was having this conversation with a

I was having this conversation with a couple of my friends who work in sort of

couple of my friends who work in sort of back office and compliance and risk

back office and compliance and risk functions, and this has been their

functions, and this has been their experience as well, where there are a

experience as well, where there are a lot of uh a lot of the the the

lot of uh a lot of the the the challenges for those organizations

challenges for those organizations involve sheer volume and quantity uh in

involve sheer volume and quantity uh in ways that that AI can be really helpful

ways that that AI can be really helpful for.

for. We also are finding some interesting

We also are finding some interesting patterns among organizations. And again,

patterns among organizations. And again, this is where we get into some of the

this is where we get into some of the limits of this just being a whoever

limits of this just being a whoever walks through the door of my listeners.

walks through the door of my listeners. We have a pretty heavy concentration

We have a pretty heavy concentration among technology, as you might expect,

among technology, as you might expect, industries and among professional

industries and among professional services, but we still have fairly

services, but we still have fairly decent sample sizes for some others. And

decent sample sizes for some others. And in both healthcare and manufacturing,

in both healthcare and manufacturing, the use cases are meaningfully higher

the use cases are meaningfully higher impact on average uh than the average

impact on average uh than the average across all organizations. Um, which I

across all organizations. Um, which I think is uh it was kind of worthy of

think is uh it was kind of worthy of further study.

further study. Last sort of part of this as I wrap up,

Last sort of part of this as I wrap up, you know, a lot of these use cases as

you know, a lot of these use cases as you saw have to do with that sort of

you saw have to do with that sort of first tier that most enterprises are

first tier that most enterprises are going to be in. Uh, increasing the

going to be in. Uh, increasing the amount of content that you output,

amount of content that you output, increasing the quality of that content,

increasing the quality of that content, just finding ways to win back, you know,

just finding ways to win back, you know, your 5 hours a week. Um but increasingly

your 5 hours a week. Um but increasingly there are automation and agentic use

there are automation and agentic use cases and we are absolutely seeing that

cases and we are absolutely seeing that where those are the the focus where

where those are the the focus where those use cases mention certain types of

those use cases mention certain types of automation or they mention agents they

automation or they mention agents they wildly outperform in terms of the

wildly outperform in terms of the self-reported ROI from them that's both

self-reported ROI from them that's both on automation and it's on agents and I

on automation and it's on agents and I think that that's sort of a a trend

think that that's sort of a a trend towards where we're headed with sort of

towards where we're headed with sort of the next layer of more advanced use

the next layer of more advanced use cases.

cases. The last thing that uh from this sort of

The last thing that uh from this sort of first first look of observations is

first first look of observations is there is clearly benefits and this goes

there is clearly benefits and this goes back to to what we saw with that

back to to what we saw with that Mackenzie study as well of thinking

Mackenzie study as well of thinking about AI and agentic transformation in

about AI and agentic transformation in systematic cross-organizational

systematic cross-organizational cross-disciplinary types of terms. um

cross-disciplinary types of terms. um effectively pretty much uh directly the

effectively pretty much uh directly the more use cases that a person or an

more use cases that a person or an organization submitted the the better

organization submitted the the better they tended to see uh ROI for. Now

they tended to see uh ROI for. Now there's lots of reasons for that but I

there's lots of reasons for that but I do think it speaks to that that core

do think it speaks to that that core idea that once you move beyond kind of

idea that once you move beyond kind of your single spot experiments there's a

your single spot experiments there's a lot of opportunity uh to to sort of grow

lot of opportunity uh to to sort of grow grow the impact of the organization. So,

grow the impact of the organization. So, like I said, that is the the first look.

like I said, that is the the first look. Uh, it's kind of the first twothirds of

Uh, it's kind of the first twothirds of these uh of these use cases. We'll be

these uh of these use cases. We'll be open for another week and then we'll

open for another week and then we'll have the full study out at the beginning

have the full study out at the beginning of December. Um, I'm really excited, I

of December. Um, I'm really excited, I think, heading into next year to see how

think, heading into next year to see how we move from sort of generic

we move from sort of generic conversations about impact uh and our

conversations about impact uh and our gut senses about impact to a lot more

gut senses about impact to a lot more random experiments like this to figure

random experiments like this to figure out where the impact really is and uh

out where the impact really is and uh and where we go next. So, look at that.

and where we go next. So, look at that. I'm going to end 27 seconds early and

I'm going to end 27 seconds early and really throw off the time, but

really throw off the time, but appreciate you guys all being here. Uh,

appreciate you guys all being here. Uh, and again, if you want to check this

and again, if you want to check this out, it's roicervey.ai.

As AI [music] changes our business and engineering landscape, do we need to

engineering landscape, do we need to rethink how we incentivize and

rethink how we incentivize and compensate engineers? Here to provide us

compensate engineers? Here to provide us with a case study for scaling output,

with a case study for scaling output, not overhead, is the co-founder and

not overhead, is the co-founder and managing partner at 10X, Arman Hezarki.

How's everybody feeling? It's been uh 7 and 1/2 hours. We doing what? Are we

and 1/2 hours. We doing what? Are we doing okay?

doing okay? >> Awesome. I'm Arman. Uh like the voice of

>> Awesome. I'm Arman. Uh like the voice of God apparently. That's what they're

God apparently. That's what they're called, Voice of God. Apparently. Uh so

called, Voice of God. Apparently. Uh so my name's Armon. I'm one of the

my name's Armon. I'm one of the co-founders and managing partners at a

co-founders and managing partners at a company called 10X. Uh my co-founder is

company called 10X. Uh my co-founder is Alex who's been uh kindly announcing

Alex who's been uh kindly announcing everybody all day. We do a lot of cool

everybody all day. We do a lot of cool work. We uh we help companies with their

work. We uh we help companies with their AI transformation. We have incredible

AI transformation. We have incredible clients all over the world. But I'm not

clients all over the world. But I'm not going to talk about any of that today.

going to talk about any of that today. I'm going to talk about something much

I'm going to talk about something much more niche. I'm going to talk about how

more niche. I'm going to talk about how we pay engineers. And we pay engineers

we pay engineers. And we pay engineers like salespeople. Earlier I was just in

like salespeople. Earlier I was just in the green room with a bunch of

the green room with a bunch of distinguished engineers that I've grown

distinguished engineers that I've grown to uh respect for my entire career. And

to uh respect for my entire career. And we were talking and I was telling them

we were talking and I was telling them that we pay engineers based on the story

that we pay engineers based on the story points that they complete. And we had a

points that they complete. And we had a lot of people roll their eyes and and

lot of people roll their eyes and and laugh. And they asked, "What do you

laugh. And they asked, "What do you mean?" And I said, "Clients pay us for

mean?" And I said, "Clients pay us for the number of story points that we

the number of story points that we deliver and we pay engineers based on

deliver and we pay engineers based on the number of story points that they

the number of story points that they complete." And similar to the looks that

complete." And similar to the looks that I'm getting from some of you, there was

I'm getting from some of you, there was skepticism.

skepticism. And I know this sounds crazy, but it's

And I know this sounds crazy, but it's working. We've been able to hire

working. We've been able to hire incredible engineers, many of whom have

incredible engineers, many of whom have started and exited uh companies before

started and exited uh companies before this. We have been able to hire

this. We have been able to hire worldclass machine learning and AI

worldclass machine learning and AI researchers. We've hired rocket

researchers. We've hired rocket scientists from NASA. We are shipping

scientists from NASA. We are shipping code incredibly quickly, and it's

code incredibly quickly, and it's maintainable and high quality code. Of

maintainable and high quality code. Of course, that is everyone's dream.

course, that is everyone's dream. Everybody wants to hire great people.

Everybody wants to hire great people. Everyone wants to deliver really uh fast

Everyone wants to deliver really uh fast code.

code. So, my goal here is not to convince you

So, my goal here is not to convince you all to adopt our model. My goal is to

all to adopt our model. My goal is to show you what compensation looks like in

show you what compensation looks like in AI and hopefully provide a new

AI and hopefully provide a new perspective on the fact that things

perspective on the fact that things might change as we introduce this

might change as we introduce this technology. Before I jump in though, I

technology. Before I jump in though, I want to talk about uh how we got here.

want to talk about uh how we got here. So, I'm a software engineer by training.

So, I'm a software engineer by training. I went to Carnegie Melon and then I

I went to Carnegie Melon and then I taught there in their school of computer

taught there in their school of computer science. After that, I went to Google

science. After that, I went to Google and I helped them scale their AI, cloud,

and I helped them scale their AI, cloud, and mobile practices internationally

and mobile practices internationally before starting a few ventureback

before starting a few ventureback startups. And in my last startup, I

startups. And in my last startup, I would work out of a weiwork. And I was

would work out of a weiwork. And I was sitting in this uh 33 Irving Weiwork. If

sitting in this uh 33 Irving Weiwork. If any of you are from New York, you you

any of you are from New York, you you might have worked out of that we work.

might have worked out of that we work. And they have these big tables and there

And they have these big tables and there were 12 of 12 of us kind of sitting

were 12 of 12 of us kind of sitting around. No one's talking. Everyone has

around. No one's talking. Everyone has their headphones in. And I look to my

their headphones in. And I look to my left and I see I see somebody with

left and I see I see somebody with Visual Studio Code open, right? I'm

Visual Studio Code open, right? I'm like, "Okay, I have a fellow engineer to

like, "Okay, I have a fellow engineer to my left." And I see that he was typing,

my left." And I see that he was typing, but I didn't see a chat window. This

but I didn't see a chat window. This person was typing into the code editor.

person was typing into the code editor. They were typing fo

They were typing fo like a caveman. This this poor person

like a caveman. This this poor person was typing like with their little

was typing like with their little chopstick fingers individual characters.

chopstick fingers individual characters. I I I couldn't believe it. On my

I I I couldn't believe it. On my computer, I had 45 agents. Three were

computer, I had 45 agents. Three were ordering me lunch. Two were writing

ordering me lunch. Two were writing code. One was doing research. Just

code. One was doing research. Just different worlds were happening on my

different worlds were happening on my computer versus this person's computer.

computer versus this person's computer. And I felt bad. I thought maybe we

And I felt bad. I thought maybe we should do a GoFundMe or something. But I

should do a GoFundMe or something. But I I I tried to look deeply at what is

I I tried to look deeply at what is actually causing this difference. Why am

actually causing this difference. Why am I using AI in the way that I am? And why

I using AI in the way that I am? And why is this person not?

is this person not? There are different ways that that

There are different ways that that people try AI and there are different

people try AI and there are different reasons why people don't use it. We've

reasons why people don't use it. We've all heard people who have tried it and

all heard people who have tried it and have said it's not as good as me. We've

have said it's not as good as me. We've all heard people who have not tried it

all heard people who have not tried it because they don't want to. But

because they don't want to. But regardless, my belief is that this is an

regardless, my belief is that this is an incentive issue. For me, I was a founder

incentive issue. For me, I was a founder and I wanted to squeak out every bit of

and I wanted to squeak out every bit of incremental value and and efficiency

incremental value and and efficiency that I could. And so I would sit on

that I could. And so I would sit on Twitter and LinkedIn and read blog posts

Twitter and LinkedIn and read blog posts and try to understand what is the

and try to understand what is the cutting edge in software engineering and

cutting edge in software engineering and what's going to give me the ability to

what's going to give me the ability to output more code higher quality faster.

output more code higher quality faster. And because of that, I was using all

And because of that, I was using all these all these different agents. But

these all these different agents. But this person probably worked at a

this person probably worked at a startup, probably had a base salary with

startup, probably had a base salary with an annual bonus and some equity. And

an annual bonus and some equity. And that was supposed to be the model that

that was supposed to be the model that incentivized people to be innovated, to

incentivized people to be innovated, to be innovative, and to work smarter and

be innovative, and to work smarter and faster and harder. But it wasn't

faster and harder. But it wasn't working. And so, in order to understand

working. And so, in order to understand how we got to where we are, I'm going to

how we got to where we are, I'm going to do a brief uh history of compensation.

do a brief uh history of compensation. And this is by no means accurate. I'm

And this is by no means accurate. I'm making a lot of things up here. It's all

making a lot of things up here. It's all illustrative. Okay. So, back in the day,

illustrative. Okay. So, back in the day, we had some cavemen who were writing

we had some cavemen who were writing code. We were we're uh probably

code. We were we're uh probably inscribing C in a in a tablet somewhere

inscribing C in a in a tablet somewhere and we were paying people hourly, right?

and we were paying people hourly, right? This makes sense. I look at somebody

This makes sense. I look at somebody sitting in a chair and I'm going to pay

sitting in a chair and I'm going to pay them some amount of dollars for some

them some amount of dollars for some amount of time. That makes sense for me

amount of time. That makes sense for me and it makes sense for the for the

and it makes sense for the for the engineer. But why is that broken? I

engineer. But why is that broken? I actually I want to hear from people. Why

actually I want to hear from people. Why is hourly broken?

>> It's slow output. >> No upside.

>> No upside. >> There's no upside. there's no reason to

>> There's no upside. there's no reason to work faster, right? And in fact, there's

work faster, right? And in fact, there's a disincentive to work faster. And so,

a disincentive to work faster. And so, what if I I notice this as the buyer of

what if I I notice this as the buyer of this technology and I say, "Okay, how

this technology and I say, "Okay, how long is it going to take you? It's going

long is it going to take you? It's going to take you five hours. Okay, so I'll

to take you five hours. Okay, so I'll pay you 500 bucks, right? Hourly $100.

pay you 500 bucks, right? Hourly $100. Multiply that by five." And then you as

Multiply that by five." And then you as the engineer, if you work faster, great.

the engineer, if you work faster, great. You get to keep the $500. And if you

You get to keep the $500. And if you work slower, that's on you. As

work slower, that's on you. As engineers, we're really, really bad at

engineers, we're really, really bad at estimating how long things are going to

estimating how long things are going to take. And so because of that, I'm not

take. And so because of that, I'm not going to say it's going to take five

going to say it's going to take five hours. I'm going to say it's going to

hours. I'm going to say it's going to take 15 hours, 20 hours, so that I have

take 15 hours, 20 hours, so that I have no downside. And so again, as the buyer,

no downside. And so again, as the buyer, I don't want to pay you based on the

I don't want to pay you based on the project.

project. So what if we hire people on salary and

So what if we hire people on salary and give them a bonus, right? Well, we in

give them a bonus, right? Well, we in the startup community know what happens

the startup community know what happens in that when when this is the case.

in that when when this is the case. People punch in at five, leave or nine,

People punch in at five, leave or nine, leave at five. And so I'm Larry Paige. I

leave at five. And so I'm Larry Paige. I notice this and I see why am I working

notice this and I see why am I working so hard at Google? Why am I putting my

so hard at Google? Why am I putting my blood, sweat, and tears into this? It's

blood, sweat, and tears into this? It's because I have some of the upside. I own

because I have some of the upside. I own the company, right? And so when we exit

the company, right? And so when we exit for for many, many dollars, I'm going to

for for many, many dollars, I'm going to see that. So what if I can share that

see that. So what if I can share that with my employees? And that's when

with my employees? And that's when equity comes in. And and this has

equity comes in. And and this has worked. This has worked for many many

worked. This has worked for many many years to incentivize employees. This is

years to incentivize employees. This is this is the foundation of the startup

this is the foundation of the startup community that we all know and are a

community that we all know and are a part of. It's incredible.

part of. It's incredible. But

But the not every company is Google. In

the not every company is Google. In fact, for every one Google, there are

fact, for every one Google, there are many many failures. And software

many many failures. And software engineers know this, right? For those

engineers know this, right? For those who want to take the risk, many will

who want to take the risk, many will just go to YC or or start their own

just go to YC or or start their own company. And for the ones who don't want

company. And for the ones who don't want the risk, they're opting for cash over

the risk, they're opting for cash over equity. Many of us who've hired

equity. Many of us who've hired engineers know that the cash is

engineers know that the cash is non-negotiable. Equity. Yeah, sure. I'll

non-negotiable. Equity. Yeah, sure. I'll take some upside.

take some upside. And so my contention is that this model

And so my contention is that this model needs to be reinvented in the age of AI.

needs to be reinvented in the age of AI. We need to directly incentivize people

We need to directly incentivize people to use these tools and to use them well

to use these tools and to use them well and to still maintain really high

and to still maintain really high quality standards of code. And so here's

quality standards of code. And so here's how it works for us. So we basically

how it works for us. So we basically just to take a step back, we do two

just to take a step back, we do two types of work at 10X. One is road

types of work at 10X. One is road mapping and one is execution. So

mapping and one is execution. So companies come to us and they say, "Hey,

companies come to us and they say, "Hey, we want AI." That's generally the

we want AI." That's generally the request. Sometimes it's more specific.

request. Sometimes it's more specific. It's like, hey, I want my customer

It's like, hey, I want my customer service team to have 10% more uh output

service team to have 10% more uh output using AI, right? But but generally they

using AI, right? But but generally they come to us with a request. We do a bunch

come to us with a request. We do a bunch of studying and learning and then we

of studying and learning and then we output a road map and based on that road

output a road map and based on that road map, they can take it and work on it on

map, they can take it and work on it on their own or we can do it. For a lot of

their own or we can do it. For a lot of things, we're taking off-the-shelf

things, we're taking off-the-shelf tools, but a lot of what we do is custom

tools, but a lot of what we do is custom builds and that's where the story point

builds and that's where the story point model comes in. So we will build a

model comes in. So we will build a roadmap for a lot of our clients. But

roadmap for a lot of our clients. But once they see that then they're putting

once they see that then they're putting in requests on their own as well. And we

in requests on their own as well. And we have two roles in the company that are

have two roles in the company that are client facing. One is the strategist and

client facing. One is the strategist and the other is the AI engineer. The

the other is the AI engineer. The strategists are all are mostly

strategists are all are mostly technical. And so we've have we have

technical. And so we've have we have former PMs, we have former engineers.

former PMs, we have former engineers. They are doing PM type work, consulting

They are doing PM type work, consulting type work. They're the ones that are

type work. They're the ones that are taking the product requirements and

taking the product requirements and distilling that down with the client.

distilling that down with the client. Then they hand that over to the engineer

Then they hand that over to the engineer and the engineer puts together an

and the engineer puts together an architecture design document. They spend

architecture design document. They spend a lot of time doing that. In fact, that

a lot of time doing that. In fact, that is where most of our engineering time

is where most of our engineering time goes. Then they write code and they

goes. Then they write code and they start implementing that that

start implementing that that architecture design document includes

architecture design document includes tickets and each ticket is graded on

tickets and each ticket is graded on some number of story points. This is a

some number of story points. This is a very traditional method of doing work,

very traditional method of doing work, right?

right? And when that ticket is accepted, the

And when that ticket is accepted, the engineer gets paid a a fee per story

engineer gets paid a a fee per story point that they complete. Our engineers

point that they complete. Our engineers have a flat base that they're paid and

have a flat base that they're paid and then every quarter we round up based on

then every quarter we round up based on the story points that they've completed.

the story points that they've completed. And again, this has led to us being able

And again, this has led to us being able to hire incredible people, but we've

to hire incredible people, but we've also been able to do incredible work.

also been able to do incredible work. So, I'm going to walk through a couple

So, I'm going to walk through a couple of projects that we've done. So, this is

of projects that we've done. So, this is one. This is a billboard company.

one. This is a billboard company. If you go to Times Square right now,

If you go to Times Square right now, you'll see some billboards that they've

you'll see some billboards that they've sold that inventory for.

sold that inventory for. They sell in two ways. One is you can

They sell in two ways. One is you can call them up traditional sales, you can

call them up traditional sales, you can buy that inventory, but the other is

buy that inventory, but the other is they have like an Uber for billboards

they have like an Uber for billboards type of product where you can go online,

type of product where you can go online, you can upload a PNG, you can choose

you can upload a PNG, you can choose where you want this to run and for how

where you want this to run and for how long, similar to like a Facebook or

long, similar to like a Facebook or Google ad. It's very similar to that

Google ad. It's very similar to that experience. And they came to us and they

experience. And they came to us and they said, "Hey, we think that there's some

said, "Hey, we think that there's some opportunities for AI in our product." We

opportunities for AI in our product." We did an analysis and we found a few. One

did an analysis and we found a few. One of them is this. We found that when an

of them is this. We found that when an image is uploaded to their system, it

image is uploaded to their system, it has to go through two rounds of

has to go through two rounds of moderation. One is internal to the

moderation. One is internal to the company and the other is with the

company and the other is with the billboard owner. Internal to their

billboard owner. Internal to their company, they're spending money on that

company, they're spending money on that to actually hire the people to do that

to actually hire the people to do that and there's a lot of inaccuracy

and there's a lot of inaccuracy and it takes a lot of time. So that

and it takes a lot of time. So that costs them money and it costs them

costs them money and it costs them revenue because every moment that the

revenue because every moment that the billboard is not running, they're not

billboard is not running, they're not making money. And so we found what if we

making money. And so we found what if we could build an AI model that can

could build an AI model that can actually do this moderation for them. We

actually do this moderation for them. We scoped that out. We built the

scoped that out. We built the architecture uh design doc. We broke it

architecture uh design doc. We broke it down into tickets and we built this for

down into tickets and we built this for them. We did it in two weeks and we got

them. We did it in two weeks and we got to 96% accuracy when compared to the

to 96% accuracy when compared to the human moderator. We've done a lot of

human moderator. We've done a lot of other projects with this company as

other projects with this company as well. This is another company. This is

well. This is another company. This is they work with retailers all around the

they work with retailers all around the world and currently they have devices in

world and currently they have devices in these retailers and they're low power

these retailers and they're low power devices. And so because of this, they're

devices. And so because of this, they're able to run one AI model on device. And

able to run one AI model on device. And what this model does is does heat

what this model does is does heat mapping. So imagine there's a camera in

mapping. So imagine there's a camera in this room looks down and it can

this room looks down and it can basically generate a heat map of where

basically generate a heat map of where the traffic is throughout the day. And

the traffic is throughout the day. And for retailers, of course, this is very,

for retailers, of course, this is very, very useful. But there's other things

very useful. But there's other things you can do too, right? If we just sit

you can do too, right? If we just sit here for a few minutes, we can probably

here for a few minutes, we can probably come up with a lot of ideas of if you

come up with a lot of ideas of if you have a camera with a chip, you can make

have a camera with a chip, you can make a lot of money from that. You can show

a lot of money from that. You can show really useful information. And so that's

really useful information. And so that's what we did. We we came up with what are

what we did. We we came up with what are some of the things that we could do with

some of the things that we could do with this? If you put a little bit little bit

this? If you put a little bit little bit more power in that chip, if you make the

more power in that chip, if you make the models, if you quantize them so they can

models, if you quantize them so they can run in parallel, what could you do? And

run in parallel, what could you do? And so we gave them this report and then we

so we gave them this report and then we built them five models that can run in

built them five models that can run in parallel. It does everything from heat

parallel. It does everything from heat mapping to Q detection to theft

mapping to Q detection to theft detection and more. And again, we start

detection and more. And again, we start with the product requirement stock. We

with the product requirement stock. We break this down into architecture. Then

break this down into architecture. Then we build it and then we pay engineers

we build it and then we pay engineers based on the output.

This is the big question. What are the risks? Right? I just talked about

risks? Right? I just talked about dandelions and rainbows, right? Uh so I

dandelions and rainbows, right? Uh so I promised you that my goal is not to

promised you that my goal is not to convince you to do this. And part of

convince you to do this. And part of this is showing you what the potential

this is showing you what the potential risks are. These are a few that come up.

risks are. These are a few that come up. One is what if an engineer inflates the

One is what if an engineer inflates the story points, right? What if an engineer

story points, right? What if an engineer says, "Okay, you want me to add a

says, "Okay, you want me to add a button? 45 story points." Right?

button? 45 story points." Right? What if an engineer rushes and quality

What if an engineer rushes and quality drops? You're saying that it took two

drops? You're saying that it took two weeks to do that? Well, was it good? Did

weeks to do that? Well, was it good? Did it work?

it work? And what if engineers get sharp elbowed?

And what if engineers get sharp elbowed? I started this by saying that we

I started this by saying that we compensate engineers like salespeople.

compensate engineers like salespeople. It's not a it's not a culture that we

It's not a it's not a culture that we necessarily want to emulate in software

necessarily want to emulate in software engineering, right? So, how do we how do

engineering, right? So, how do we how do we uh make sure that that's not

we uh make sure that that's not happening?

happening? First of all, I mentioned that we have

First of all, I mentioned that we have two different roles and we compensate

two different roles and we compensate like a counterbalance. So strategists

like a counterbalance. So strategists are compensated based on NR which really

are compensated based on NR which really is like customer happiness

is like customer happiness and every single ticket has to be

and every single ticket has to be approved internally with multiple rounds

approved internally with multiple rounds of QA of which the strategist is

of QA of which the strategist is involved but also by the client and so

involved but also by the client and so there's a counterbalance to every single

there's a counterbalance to every single ticket that is delivered.

ticket that is delivered. Uh, I skipped to the second one. For the

Uh, I skipped to the second one. For the first one, inflating story points. The

first one, inflating story points. The strategists are the ones who scope it.

strategists are the ones who scope it. And again, we have to review all of

And again, we have to review all of that. And for the third, how do you make

that. And for the third, how do you make sure that all of this is correct? And

sure that all of this is correct? And how do you make sure that there's no

how do you make sure that there's no sharp elbows? How do you make sure that

sharp elbows? How do you make sure that everybody is happy and the dandelions

everybody is happy and the dandelions and rainbows are continue throughout

and rainbows are continue throughout this parade of joy? Well, you have to

this parade of joy? Well, you have to hire the right people. And this is what

hire the right people. And this is what I tell everybody.

I tell everybody. We make hiring incredibly difficult for

We make hiring incredibly difficult for ourselves so that everything else is

ourselves so that everything else is easy. And that is a principle that we

easy. And that is a principle that we all know and we all stand true to. And

all know and we all stand true to. And this is incredibly important with AI. My

this is incredibly important with AI. My co-founder, Alex, always says, "AI makes

co-founder, Alex, always says, "AI makes people look like one of those crazy

people look like one of those crazy mirrors where any one of your

mirrors where any one of your attributes, it makes it 10 times

attributes, it makes it 10 times larger." If you're a great engineer, AI

larger." If you're a great engineer, AI makes you great. If you're not, it makes

makes you great. If you're not, it makes you sloppier. And this is the case with

you sloppier. And this is the case with all of these things. You have to start

all of these things. You have to start with hiring.

with hiring. Our belief is that AI gives people

Our belief is that AI gives people superpowers and it makes all of us

superpowers and it makes all of us smarter, faster, and better at what we

smarter, faster, and better at what we do. But my belief is that the current

do. But my belief is that the current way that we compensate people is

way that we compensate people is actually holding them back. And I would

actually holding them back. And I would invite you to think about how can you

invite you to think about how can you compensate people on your team

compensate people on your team differently, whether it's software

differently, whether it's software engineering or anything else. If you

engineering or anything else. If you want to unlock your employees potential,

want to unlock your employees potential, feel free to reach out at arman@10x.co.

feel free to reach out at arman@10x.co. Thank you. [applause]

Thank you. [applause] Our

next presenter [music] is deputy CTO at DX, the engineering intelligence

DX, the engineering intelligence platform designed by leading researchers

platform designed by leading researchers speaking about effective leadership in

speaking about effective leadership in AI enhanced organizations.

AI enhanced organizations. Please join me in welcoming to the stage

Please join me in welcoming to the stage Justin Rio.

>> Hello. Thanks for joining me in one of the

Thanks for joining me in one of the later day sessions. Looks like we we we

later day sessions. Looks like we we we kept a lot of people here. This is a

kept a lot of people here. This is a nice full room. I'm great to see it.

nice full room. I'm great to see it. We're going to go through a lot of

We're going to go through a lot of content in a short amount of time. So,

content in a short amount of time. So, I'm going to get right into it. If you

I'm going to get right into it. If you want to get deeper into any of this

want to get deeper into any of this stuff, we have published this uh AI

stuff, we have published this uh AI strategy playbook for senior executives.

strategy playbook for senior executives. And uh a lot of the content that I'm

And uh a lot of the content that I'm going to go through, I'm not going to

going to go through, I'm not going to have time to get quite as deep, but this

have time to get quite as deep, but this is just a nice PDF copy that you can

is just a nice PDF copy that you can come and refer to later. If you missed

come and refer to later. If you missed this QR code, don't worry. I'll show it

this QR code, don't worry. I'll show it again uh at the end. So, what is the

again uh at the end. So, what is the current impact of Genai?

current impact of Genai? Nobody knows, right? We've got Google on

Nobody knows, right? We've got Google on the one hand telling us that everyone's

the one hand telling us that everyone's 10% more productive. That's interesting.

10% more productive. That's interesting. Now, they're Google. They were already

Now, they're Google. They were already pretty productive to begin with. But we

pretty productive to begin with. But we have this sort of now infamous meter MER

have this sort of now infamous meter MER study which has some flaws in the way

study which has some flaws in the way that study was put together that showed

that study was put together that showed actually a 19% decrease in productivity

actually a 19% decrease in productivity using codec assistance. So there's a lot

using codec assistance. So there's a lot of volatility, a lot of variability. Uh

of volatility, a lot of variability. Uh what was really interesting about this

what was really interesting about this study even though I I mentioned there

study even though I I mentioned there were some flaws. Um but every engineer

were some flaws. Um but every engineer that took part in this study felt more

that took part in this study felt more productive but then the data actually

productive but then the data actually bore out that they were less productive.

bore out that they were less productive. Kind of interesting, right? we've got

Kind of interesting, right? we've got this induced flow uh that makes us feel

this induced flow uh that makes us feel really good about what we're doing. So,

really good about what we're doing. So, we need to address this. Dora has put

we need to address this. Dora has put out some really good research on this

out some really good research on this too. But this is based on industry

too. But this is based on industry averages. This is impact based on what

averages. This is impact based on what do we look at when we see a large sample

do we look at when we see a large sample and an average of how certain factors

and an average of how certain factors are being impacted by in this case 25%

are being impacted by in this case 25% increase in AI adoption. We see these

increase in AI adoption. We see these modest but positive leaning indicators.

modest but positive leaning indicators. 7.5% increase in documentation quality

7.5% increase in documentation quality and uh increase in code quality by about

and uh increase in code quality by about 3.4%. At least that's not leaning in the

3.4%. At least that's not leaning in the other direction, right? And when we

other direction, right? And when we started digging through some of DX's

started digging through some of DX's data, we have, you know, we're the

data, we have, you know, we're the developer productivity measurement

developer productivity measurement company. We have lots of aggregate data

company. We have lots of aggregate data that we can look at with this. We found

that we can look at with this. We found the same thing. When we looked at

the same thing. When we looked at averages, we see about a 2.6% 6%

averages, we see about a 2.6% 6% increase in overall uh change

increase in overall uh change confidence, which is a a percentage of

confidence, which is a a percentage of people who answered positively that they

people who answered positively that they feel confident in the changes that

feel confident in the changes that they're putting into production. Uh

they're putting into production. Uh similar positive leaning average when we

similar positive leaning average when we looked at code maintainability, another

looked at code maintainability, another qualitative metric, a1% reduction in

qualitative metric, a1% reduction in change failure rate. uh which when you

change failure rate. uh which when you think about the industry benchmark being

think about the industry benchmark being 4% it's not insignificant

4% it's not insignificant but this is not the full story because

but this is not the full story because this is what we saw when we broke the

this is what we saw when we broke the same studies down per company. Every

same studies down per company. Every company here is a every every bar

company here is a every every bar represents a company right we have some

represents a company right we have some that are seeing 20% increases in change

that are seeing 20% increases in change confidence while others are seeing 20%

confidence while others are seeing 20% decreases. We're seeing extreme

decreases. We're seeing extreme volatility which is why these averages

volatility which is why these averages look so innocuous but they're belying

look so innocuous but they're belying the greater story of variability. See

the greater story of variability. See the same thing with code

the same thing with code maintainability.

maintainability. The same thing with change failure rate.

The same thing with change failure rate. So this is a 2% increase in change

So this is a 2% increase in change failure rate up here at the top. Again

failure rate up here at the top. Again with an industry benchmark of 4%. That

with an industry benchmark of 4%. That means shipping as much as 50% more

means shipping as much as 50% more defects than we were shipping before.

defects than we were shipping before. Right? We want to make sure we're on the

Right? We want to make sure we're on the lower end of this. But how? Like what

lower end of this. But how? Like what should we be doing? Well, we found some

should we be doing? Well, we found some patterns here. We see that some

patterns here. We see that some organizations are seeing positive

organizations are seeing positive impacts to KPIs, but others are

impacts to KPIs, but others are struggling with adoption and even seeing

struggling with adoption and even seeing some of these negative impacts. Top down

some of these negative impacts. Top down mandates are not working, right? Driving

mandates are not working, right? Driving towards, oh, we must have 100% adoption

towards, oh, we must have 100% adoption of AI. Great, I will update my read my

of AI. Great, I will update my read my file every morning and I will be

file every morning and I will be compliant, right? We're not actually

compliant, right? We're not actually moving the needle anywhere when we do

moving the needle anywhere when we do that. We also find that lack of

that. We also find that lack of education and enablement uh has a big

education and enablement uh has a big impact on sort of negatively impacting

impact on sort of negatively impacting this. Some organizations just turn on

this. Some organizations just turn on the tech and expect it to just start

the tech and expect it to just start working and everybody to know the best

working and everybody to know the best ways to use it. Uh and a difficulty

ways to use it. Uh and a difficulty measuring the impact or even knowing

measuring the impact or even knowing what we should be measuring like what

what we should be measuring like what metrics would should we be looking at

metrics would should we be looking at you know does utilization really tell us

you know does utilization really tell us much about the full story of genai

much about the full story of genai impact. This is another graph from Dora.

impact. This is another graph from Dora. uh this is a basian uh posterior

uh this is a basian uh posterior distribution which is an interesting way

distribution which is an interesting way of representing data. Basically you want

of representing data. Basically you want your mass to be on the yellow side of

your mass to be on the yellow side of this line uh the the uh the right side

this line uh the the uh the right side of this line for the audience. Yeah. And

of this line for the audience. Yeah. And you want a sharp peak which is telling

you want a sharp peak which is telling you that we're pretty confident that

you that we're pretty confident that this initiative will have this impact.

this initiative will have this impact. And if we look at some of the topline

And if we look at some of the topline initiatives here, these are things like

initiatives here, these are things like clear AI policies. All right, we want to

clear AI policies. All right, we want to make sure we have that. We want time to

make sure we have that. We want time to learn. Not just giving people materials,

learn. Not just giving people materials, but actually giving them space to

but actually giving them space to experiment, right? Um, and so these

experiment, right? Um, and so these types of factors are the ones that seem

types of factors are the ones that seem to be moving the needle the most. So,

to be moving the needle the most. So, we're going to go over some quick tips

we're going to go over some quick tips on how we can do all of these things.

on how we can do all of these things. And again, the guide will go deeper into

And again, the guide will go deeper into this. We want to integrate across the

this. We want to integrate across the SDLC. All right. For most organizations,

SDLC. All right. For most organizations, writing code has never been the

writing code has never been the bottleneck, right? We can in uh we can

bottleneck, right? We can in uh we can increase productivity a bit by helping

increase productivity a bit by helping with code completion, but our our

with code completion, but our our biggest bottlenecks are elsewhere within

biggest bottlenecks are elsewhere within the SDLC. There's a lot more to creating

the SDLC. There's a lot more to creating software than just writing code. We want

software than just writing code. We want to unblock usage. We can't just say,

to unblock usage. We can't just say, well, we're worried about data

well, we're worried about data xfiltration, so we can't try this thing

xfiltration, so we can't try this thing like no, get creative about it. We've

like no, get creative about it. We've got really good infrastructure out there

got really good infrastructure out there now like bedrock and fireworks AI that

now like bedrock and fireworks AI that can let us run powerful models in safe

can let us run powerful models in safe spaces. We have to have open discussions

spaces. We have to have open discussions about these metrics. We need to

about these metrics. We need to evangelize the wins and we need to let

evangelize the wins and we need to let our engineers know why we're gathering

our engineers know why we're gathering metrics and data. What is it that we're

metrics and data. What is it that we're trying to improve? We have to reduce the

trying to improve? We have to reduce the fear of AI, right? We have to make sure

fear of AI, right? We have to make sure that people understand that this is not

that people understand that this is not a technology that is ready to replace

a technology that is ready to replace engineers. This is a a technology that's

engineers. This is a a technology that's really good at augmenting engineers and

really good at augmenting engineers and increasing the throughput of our

increasing the throughput of our business. We have to establish better

business. We have to establish better compliance and trust and we need to tie

compliance and trust and we need to tie this stuff to employee success. These

this stuff to employee success. These are new skill sets. AI is not coming for

are new skill sets. AI is not coming for your job, but somebody really good at AI

your job, but somebody really good at AI might take your job. And so, as leaders,

might take your job. And so, as leaders, we have the opportunity to help our

we have the opportunity to help our employees become more successful with

employees become more successful with this technology. So, how do we reduce

this technology. So, how do we reduce the fear? Well, first of all, why do we

the fear? Well, first of all, why do we need to do this? Well, there's a lot of

need to do this? Well, there's a lot of good reasons, but I love to point to

good reasons, but I love to point to Google's project Aristotle. This was a

Google's project Aristotle. This was a 2012 study where Google wanted to figure

2012 study where Google wanted to figure out what are the characteristics of

out what are the characteristics of highly performant teams. uh they thought

highly performant teams. uh they thought that the recipe was just going to be

that the recipe was just going to be what Google had this combination of high

what Google had this combination of high performers, experienced managers and

performers, experienced managers and basically unlimited resources and they

basically unlimited resources and they were dead wrong. Overwhelmingly the

were dead wrong. Overwhelmingly the biggest indicator of productivity was

biggest indicator of productivity was psychological safety. Okay. And so that

psychological safety. Okay. And so that very much applies now. We also have data

very much applies now. We also have data like this is SWEBench. I'm sure a lot of

like this is SWEBench. I'm sure a lot of you have seen this and there are some

you have seen this and there are some impressive benchmarks that the agents

impressive benchmarks that the agents can do like a third of the things

can do like a third of the things they're asked to do without any human

they're asked to do without any human intervention. That means that they're

intervention. That means that they're not able to do twothirds of them. Right?

not able to do twothirds of them. Right? Again, we are augmenting. We're not

Again, we are augmenting. We're not replacing. We're not ready. We may never

replacing. We're not ready. We may never be ready. So, we need to be very

be ready. So, we need to be very transparent with what we're doing. We

transparent with what we're doing. We need to set very clear intents. Why, you

need to set very clear intents. Why, you know, are we uh using this to to

know, are we uh using this to to augment, not to replace. We need to be

augment, not to replace. We need to be proactive in the way that we communicate

proactive in the way that we communicate that and not just wait for people to get

that and not just wait for people to get upset and possibly scared. We need to

upset and possibly scared. We need to say, "No, we are here to help you to

say, "No, we are here to help you to give you a better developer experience

give you a better developer experience and to increase the throughput of the

and to increase the throughput of the business." And again we have to have

business." And again we have to have these discussions about metrics. Now

these discussions about metrics. Now what metrics what should we be looking

what metrics what should we be looking at? Well DX again developer experience

at? Well DX again developer experience and productivity measurement company. Um

and productivity measurement company. Um there are two sort of classes of metrics

there are two sort of classes of metrics that we can be looking at really two

that we can be looking at really two levers that matter here and that's speed

levers that matter here and that's speed and quality. Right? We want to increase

and quality. Right? We want to increase PR throughput. We want to increase our

PR throughput. We want to increase our velocity but not by just creating a

velocity but not by just creating a bunch of slop that's going to give us a

bunch of slop that's going to give us a bunch of tech debt later that we're

bunch of tech debt later that we're going to have to deal with and we just

going to have to deal with and we just kick the bottleneck down the road if we

kick the bottleneck down the road if we do that. Right? So we want to be looking

do that. Right? So we want to be looking at things like change failure rate, our

at things like change failure rate, our overall perception of quality, change

overall perception of quality, change confidence, maintainability.

confidence, maintainability. And we have three types of metrics that

And we have three types of metrics that we can be looking at here. We have our

we can be looking at here. We have our telemetry metrics. These are the things

telemetry metrics. These are the things coming out of the API. And they're good

coming out of the API. And they're good for some stuff, but they're not always

for some stuff, but they're not always accurate, right? We know like accept

accurate, right? We know like accept versus suggest was kind of like all the

versus suggest was kind of like all the rage until we realize that engineers

rage until we realize that engineers need to click accept in the IDE in order

need to click accept in the IDE in order for the API to know about it. even if

for the API to know about it. even if they do click accept, who's to say they

they do click accept, who's to say they didn't just go back and rewrite every

didn't just go back and rewrite every line that was suggested, right? So

line that was suggested, right? So that's providing us some context, but we

that's providing us some context, but we also need to do some experience

also need to do some experience sampling. We need to like for instance

sampling. We need to like for instance add a new field to a PR form that says I

add a new field to a PR form that says I used AI to generate this PR or I enjoyed

used AI to generate this PR or I enjoyed using AI to generate this PR and get

using AI to generate this PR and get some data that way. And then

some data that way. And then self-reported data or survey data. We

self-reported data or survey data. We are big on surveys, but let me

are big on surveys, but let me underscore we're big on effective

underscore we're big on effective surveys. 90% plus participation rates

surveys. 90% plus participation rates engineered against questions that treat

engineered against questions that treat developer experience as a systems

developer experience as a systems problem not a people problem because

problem not a people problem because that's what it is W. Edwards Deming 90

that's what it is W. Edwards Deming 90 to 95% of the productivity output of an

to 95% of the productivity output of an organization is determined by the system

organization is determined by the system and not the worker. Okay, so

and not the worker. Okay, so foundational developer experience and

foundational developer experience and developer productivity metrics still

developer productivity metrics still matter the most. Right? Our AI metrics

matter the most. Right? Our AI metrics like utilization and things are telling

like utilization and things are telling us what's happening with the tech, but

us what's happening with the tech, but these core metrics that we've been able

these core metrics that we've been able to trust are telling us whether these

to trust are telling us whether these initiatives are actually working, right?

initiatives are actually working, right? Are we actually moving the needle and

Are we actually moving the needle and having the outcomes that we want to see?

having the outcomes that we want to see? So top companies are looking at

So top companies are looking at different things, right? We are seeing

different things, right? We are seeing like adoption metrics coming out of

like adoption metrics coming out of Microsoft. They've also got this great

Microsoft. They've also got this great metric called a bad developer day. I'm

metric called a bad developer day. I'm not going to go into it, but there's a

not going to go into it, but there's a really good white paper that shows like

really good white paper that shows like all the different telemetry that they

all the different telemetry that they can look at to determine what makes a

can look at to determine what makes a bad developer day. Dropbox is looking at

bad developer day. Dropbox is looking at similar stuff. Adoption like weekly

similar stuff. Adoption like weekly active users, daily active users, that

active users, daily active users, that sort of thing, but also looking at

sort of thing, but also looking at quality metrics like change failure

quality metrics like change failure rate. And booking is looking at similar

rate. And booking is looking at similar stuff as well. And so we built a

stuff as well. And so we built a framework around this. We were first to

framework around this. We were first to market with what we call our DXAI

market with what we call our DXAI measurement framework. And this is very

measurement framework. And this is very much inspired by things like Dora space

much inspired by things like Dora space framework, DevX just like our core four

framework, DevX just like our core four metric set which you can ask me about

metric set which you can ask me about later. Uh and we take these metrics and

later. Uh and we take these metrics and we uh normalize them into these three

we uh normalize them into these three dimensions of utilization, impact and

dimensions of utilization, impact and cost. And you can kind of think about

cost. And you can kind of think about this as a maturity curve too. A lot of

this as a maturity curve too. A lot of people start just figuring out okay

people start just figuring out okay what's happening? who's using the tech,

what's happening? who's using the tech, what's the percentage of pull requests

what's the percentage of pull requests that we're getting that are AI assisted

that we're getting that are AI assisted maybe through experience sampling? How

maybe through experience sampling? How many tasks are being assigned to agents?

many tasks are being assigned to agents? But then we can mature that perspective

But then we can mature that perspective a little bit and we can correlate that

a little bit and we can correlate that utilization to impact. What is this

utilization to impact. What is this actually doing to velocity? What is this

actually doing to velocity? What is this actually doing to quality? And this is

actually doing to quality? And this is when we start getting more mature in our

when we start getting more mature in our picture of our impact. And then finally,

picture of our impact. And then finally, cost. Although I like to joke that we're

cost. Although I like to joke that we're 15 years past the last hype cycle, which

15 years past the last hype cycle, which was cloud, and we still have new

was cloud, and we still have new companies spinning up that are teaching

companies spinning up that are teaching us how to understand and optimize our

us how to understand and optimize our cloud costs. So, we will see if we get

cloud costs. So, we will see if we get there. Although, I also hear horror

there. Although, I also hear horror stories about people burning through

stories about people burning through 2,000 tokens at $2,000 worth of tokens a

2,000 tokens at $2,000 worth of tokens a day. So, we probably do need to hit that

day. So, we probably do need to hit that as well. What about compliance and

as well. What about compliance and trust? What can we do to ensure that the

trust? What can we do to ensure that the output uh that that's being generated is

output uh that that's being generated is something that can be trusted by our

something that can be trusted by our engineers? We have a lot of levers to

engineers? We have a lot of levers to pull here, but one of the ones that I'd

pull here, but one of the ones that I'd like to talk about is setting up a

like to talk about is setting up a feedback loop for our system prompts. So

feedback loop for our system prompts. So these could be called system prompts,

these could be called system prompts, cursor rules, agent markdown. Pretty

cursor rules, agent markdown. Pretty much all of the mainstream solutions

much all of the mainstream solutions have something like this where you can

have something like this where you can go and provide a set of rules uh to

go and provide a set of rules uh to control how these models behave. Uh and

control how these models behave. Uh and I won't get too much into the technical

I won't get too much into the technical details here. We have an example where

details here. We have an example where like the uh models have been providing

like the uh models have been providing outdated Spring Boot uh stuff. We want

outdated Spring Boot uh stuff. We want Spring Boot 3. It's It's been sending us

Spring Boot 3. It's It's been sending us Spring Boot 2 stuff. The big takeaway

Spring Boot 2 stuff. The big takeaway here is to have the feedback loop. Have

here is to have the feedback loop. Have a gatekeeper, right? Have somebody or a

a gatekeeper, right? Have somebody or a group in the organization that can

group in the organization that can receive this feedback that understand

receive this feedback that understand how to maintain and continuously improve

how to maintain and continuously improve these system prompts, right? And that

these system prompts, right? And that way we're always maintaining the way

way we're always maintaining the way that these assistants or models or

that these assistants or models or agents affect the whole business. It

agents affect the whole business. It also pays to understand the way that uh

also pays to understand the way that uh temperature works, especially when we're

temperature works, especially when we're building agents, right? we do have some

building agents, right? we do have some control over the determinism and

control over the determinism and nondeterminism of these models. Uh again

nondeterminism of these models. Uh again like when a model is predicting a next

like when a model is predicting a next token, it doesn't just have like one

token, it doesn't just have like one token. It has a matrix of tokens and

token. It has a matrix of tokens and those are associated with a certain

those are associated with a certain probability of that being like the right

probability of that being like the right token. And so we have this setting

token. And so we have this setting called temperature, which is heat, which

called temperature, which is heat, which is entropy, which is randomness that can

is entropy, which is randomness that can control the amount of randomness

control the amount of randomness involved in actually picking that token.

involved in actually picking that token. This is sometimes called increasing the

This is sometimes called increasing the creativity of the model. And it's a

creativity of the model. And it's a number between 0 and one. For those

number between 0 and one. For those reasons I just mentioned, don't use zero

reasons I just mentioned, don't use zero or don't use one. Weird things will

or don't use one. Weird things will happen. But you want some decimal in

happen. But you want some decimal in between zero and one. When we have a

between zero and one. When we have a lower temperature, like we're seeing

lower temperature, like we're seeing here, 0.001,

here, 0.001, we give it the same task twice, and it

we give it the same task twice, and it gives us the exact same output character

gives us the exact same output character for character. When we set that

for character. When we set that temperature higher, this is an example

temperature higher, this is an example of 0.9. I'm asking the agent to create a

of 0.9. I'm asking the agent to create a gradient for me, uh, simple task. It's

gradient for me, uh, simple task. It's giving me two relatively valid

giving me two relatively valid solutions. I did ask it for a JavaScript

solutions. I did ask it for a JavaScript method and this is the only one that's

method and this is the only one that's giving me a JavaScript method. But the

giving me a JavaScript method. But the point is they are wildly different

point is they are wildly different approaches to the same problem when I've

approaches to the same problem when I've increased the creativity of that model.

increased the creativity of that model. So we need to think about like use case-

So we need to think about like use case- wise where should we have more

wise where should we have more creativity and where should we have more

creativity and where should we have more determinism and temperature is another

determinism and temperature is another setting that we have that can help

setting that we have that can help control this. You can experiment with

control this. You can experiment with all this using like docker model runner

all this using like docker model runner lama lm studio that sort of thing. How

lama lm studio that sort of thing. How can we tie this to better employee

can we tie this to better employee success? We had to provide both

success? We had to provide both education and adequate time to learn. So

education and adequate time to learn. So we put together a study where we sampled

we put together a study where we sampled a bunch of uh developers that were

a bunch of uh developers that were saving at least an hour a day uh uh

saving at least an hour a day uh uh excuse me an hour a week and we asked

excuse me an hour a week and we asked them to stack rank their top five most

them to stack rank their top five most valuable use cases. And we built a guide

valuable use cases. And we built a guide around that. A guide that effectively

around that. A guide that effectively goes through code examples, prompting

goes through code examples, prompting examples uh of what we determined using

examples uh of what we determined using the sort of data approach where we

the sort of data approach where we should get more reflexive about our best

should get more reflexive about our best practice and about uh the use cases that

practice and about uh the use cases that we're becoming reflexive in in our use

we're becoming reflexive in in our use of AI. And so that's what this guide was

of AI. And so that's what this guide was about. And uh we've had this become

about. And uh we've had this become required reading in certain engineering

required reading in certain engineering groups and uh proud of that. And this is

groups and uh proud of that. And this is another way that we can help educate.

another way that we can help educate. But we need to give time. Uh we don't

But we need to give time. Uh we don't have time to go through all of this. I

have time to go through all of this. I do think it's interesting that the

do think it's interesting that the number one use case for this was stack

number one use case for this was stack trace analysis, right? So, not a

trace analysis, right? So, not a generative use case, actually more of an

generative use case, actually more of an interpretive use case. And we see some

interpretive use case. And we see some other ones here that are not too

other ones here that are not too surprising. And there's examples of each

surprising. And there's examples of each of these. What about unblocking usage?

of these. What about unblocking usage? How can we make sure that we can

How can we make sure that we can creatively ensure that engineers can

creatively ensure that engineers can take the most advantage of this? Well,

take the most advantage of this? Well, leverage self-hosted and private models.

leverage self-hosted and private models. That's getting easier and easier to do.

That's getting easier and easier to do. Partner with compliance on day one,

Partner with compliance on day one, right? Make sure that what you're doing

right? Make sure that what you're doing is in line with your organization's

is in line with your organization's compliance. You may find that you're

compliance. You may find that you're making a lot of assumptions about things

making a lot of assumptions about things that you don't think you can do that you

that you don't think you can do that you can actually do, right? And then think

can actually do, right? And then think creatively around various barriers.

creatively around various barriers. Finally, how can we integrate across the

Finally, how can we integrate across the SDLC? What should we think about doing

SDLC? What should we think about doing there? You know, and I'm a big Ellie

there? You know, and I'm a big Ellie Gold theory of constraints fan. Probably

Gold theory of constraints fan. Probably have some others in the audience. An

have some others in the audience. An hour saved on something that isn't the

hour saved on something that isn't the bottleneck is worthless. And when we

bottleneck is worthless. And when we look at data across in this case almost

look at data across in this case almost 140,000 engineers, we find that there

140,000 engineers, we find that there are definitely good like annualized time

are definitely good like annualized time savings with AI that are being eclipsed

savings with AI that are being eclipsed by sources of context switching and

by sources of context switching and interruption, meeting heavy days, these

interruption, meeting heavy days, these other things that it's like, yeah, we

other things that it's like, yeah, we can save time here, but we're losing so

can save time here, but we're losing so much more time over there. So find the

much more time over there. So find the bottleneck, fix the bottleneck, right?

bottleneck, fix the bottleneck, right? Morgan Stanley's been very public about

Morgan Stanley's been very public about their uh building this thing called Dev

their uh building this thing called Dev Gen AI that looks at a bunch of legacy

Gen AI that looks at a bunch of legacy code, Cobalt, mainframe natural. I hate

code, Cobalt, mainframe natural. I hate to admit Pearl because I'm an old school

to admit Pearl because I'm an old school Pearl developer. Uh but apparently

Pearl developer. Uh but apparently that's legacy now, too. And basically

that's legacy now, too. And basically creating specs uh for developers that

creating specs uh for developers that can just be handed to developers to

can just be handed to developers to start modernizing the code without

start modernizing the code without having to do all that reverse

having to do all that reverse engineering, right? And they're saving

engineering, right? And they're saving about 300,000 hours annually right now

about 300,000 hours annually right now doing this. There's a Wall Street

doing this. There's a Wall Street Journal journal article about this,

Journal journal article about this, Business Insider article about it. Uh

Business Insider article about it. Uh they're very public about that. Zapier,

they're very public about that. Zapier, Zapier should be the example for

Zapier should be the example for everyone. They have a whole series of

everyone. They have a whole series of bots and agents that are doing things

bots and agents that are doing things like assisting with onboarding. They can

like assisting with onboarding. They can now make engineers effective in two

now make engineers effective in two weeks. Industry benchmark on the good

weeks. Industry benchmark on the good side is like a month. On the medium side

side is like a month. On the medium side is like 90 days. And uh because they're

is like 90 days. And uh because they're able to increase the effectiveness of

able to increase the effectiveness of the engineers that they're h that

the engineers that they're h that they've bringing into the organization,

they've bringing into the organization, they realized that they should be hiring

they realized that they should be hiring more, right? As opposed to trying to

more, right? As opposed to trying to maintain status quo by like cutting

maintain status quo by like cutting headcount and trying to make individual

headcount and trying to make individual engineers more productive. They said,

engineers more productive. They said, "No, we could get more value out of a

"No, we could get more value out of a single engineer. We should be hiring

single engineer. We should be hiring faster than ever." And they are, and

faster than ever." And they are, and it's really increasing their competitive

it's really increasing their competitive edge. I think that's the right attitude.

edge. I think that's the right attitude. Spotify has been helping out their S

Spotify has been helping out their S surres by pulling together context when

surres by pulling together context when incidents uh are detected and then

incidents uh are detected and then taking things like run but steps and and

taking things like run but steps and and other areas of context and documentation

other areas of context and documentation and pushing them directly into S sur

and pushing them directly into S sur channels so that those critical minutes

channels so that those critical minutes of trying to get to the bottom of what's

of trying to get to the bottom of what's actually happening and what we should do

actually happening and what we should do do to resolve the incident uh they just

do to resolve the incident uh they just eliminated that time right it's

eliminated that time right it's significantly increased their MTTR so

significantly increased their MTTR so let's get creative about areas in the

let's get creative about areas in the STLC that are our actual bottlenecks

STLC that are our actual bottlenecks All right, next steps. Uh, distribute

All right, next steps. Uh, distribute this guide as a reference for

this guide as a reference for integrating AI into the development

integrating AI into the development workflows that you have. Uh, determine a

workflows that you have. Uh, determine a method for measuring and evaluating

method for measuring and evaluating Genai impact. It's really important to

Genai impact. It's really important to make sure that we're not on the bad

make sure that we're not on the bad sides of those graphs that I showed you

sides of those graphs that I showed you earlier and then track and measure AI

earlier and then track and measure AI adoption and and see how that correlates

adoption and and see how that correlates to overall impact metrics and iterate on

to overall impact metrics and iterate on best practices and use cases. And here's

best practices and use cases. And here's a guide. Again, thank you so much Our

a guide. Again, thank you so much Our [applause]

closing presentation will teach us how to build an AI native company even if

to build an AI native company even if that company is 50 years old. Please

that company is 50 years old. Please join me in welcoming to the stage the

join me in welcoming to the stage the founder of Every, Dan [music] Shipper.

founder of Every, Dan [music] Shipper. >> Hello.

>> Hello. [applause]

How's it going everybody? I'm the last speaker of the day, so I'm just between

speaker of the day, so I'm just between you and dinner or drinks. So, I'm going

you and dinner or drinks. So, I'm going to try to make this fun and hopefully a

to try to make this fun and hopefully a little bit short.

So, first of all, I just want to say I'm very glad to see everybody and I'm

very glad to see everybody and I'm actually kind of surprised to see so

actually kind of surprised to see so many people here um because I've been I

many people here um because I've been I live here, but I've been traveling. I

live here, but I've been traveling. I was in Portugal uh last week and I was

was in Portugal uh last week and I was on Twitter and someone said that

on Twitter and someone said that everyone was moving to San Francisco.

everyone was moving to San Francisco. Uh but it's great to have everybody here

Uh but it's great to have everybody here instead because I love New York.

instead because I love New York. [laughter]

[laughter] Come on. Come on.

Come on. Come on. >> [applause]

>> [applause] >> Um, so I'm supposed to talk uh today

>> Um, so I'm supposed to talk uh today about uh how to build an a playbook for

about uh how to build an a playbook for how to build an AI native company. And

how to build an AI native company. And um I actually don't have one

um I actually don't have one unfortunately. Um

unfortunately. Um and that's because I think the playbook

and that's because I think the playbook is actually being invented right now. So

is actually being invented right now. So we're doing it at the company that I run

we're doing it at the company that I run every but all of you are doing it here

every but all of you are doing it here today as well. and and and so I don't

today as well. and and and so I don't want to do this talk from the

want to do this talk from the perspective of I have all the answers

perspective of I have all the answers and I'm going to tell you the framework

and I'm going to tell you the framework and the playbook and all that kind of

and the playbook and all that kind of stuff. Um but um I do think it is

stuff. Um but um I do think it is helpful when we're in this beginning

helpful when we're in this beginning stage of

stage of uh learning how to use AI to do

uh learning how to use AI to do engineering to build companies uh to

engineering to build companies uh to share like the the personal experiences

share like the the personal experiences that we're having inside of our

that we're having inside of our companies um and uh and sort of

companies um and uh and sort of collaboratively figure out the playbook

collaboratively figure out the playbook together. So I think the best that I can

together. So I think the best that I can offer is really just sort of dispatches

offer is really just sort of dispatches from the future. Uh notes on what I've

from the future. Uh notes on what I've figured out um and the work that we've

figured out um and the work that we've done inside of every um and I think the

done inside of every um and I think the the first big thing the first the first

the first big thing the first the first big thing I really noticed is that there

big thing I really noticed is that there is definitely a huge there's a 10x

is definitely a huge there's a 10x difference between an org where 90% of

difference between an org where 90% of the engineers are using AI versus an org

the engineers are using AI versus an org where 100% of the engineers are using

where 100% of the engineers are using AI. It's it's it's totally different.

AI. It's it's it's totally different. Um, I think the I think the big thing is

Um, I think the I think the big thing is if even 10% of your company is

if even 10% of your company is uh is using a more traditional

uh is using a more traditional engineering method, you you sort of have

engineering method, you you sort of have to lean all the way back over into that

to lean all the way back over into that world. Um, and so it it prevents you

world. Um, and so it it prevents you from doing some of the things that you

from doing some of the things that you might do if everyone was uh not typing

might do if everyone was uh not typing into a code editor all the time. Um,

into a code editor all the time. Um, and I know this because this is what we

and I know this because this is what we do at every um, which is the company

do at every um, which is the company that I run and it has totally

that I run and it has totally transformed what we are able to do as a

transformed what we are able to do as a small company. Um, and so I think of us

small company. Um, and so I think of us as like a little bit of a lab for what's

as like a little bit of a lab for what's possible that I I'm excited to share

possible that I I'm excited to share with you. So for people who don't know,

with you. So for people who don't know, I run every um, inside of every we have

I run every um, inside of every we have six business units. We have four

six business units. We have four software products. We run four software

software products. We run four software products with just 15 people, which is

products with just 15 people, which is kind of crazy. Um, and these software

kind of crazy. Um, and these software products are not toys. We've grown at

products are not toys. We've grown at every we've grown MR by double digits

every we've grown MR by double digits every month for the last 6 months. We

every month for the last 6 months. We have over 7,000 paying subscribers and

have over 7,000 paying subscribers and over 100,000 free subscribers. Um, and

over 100,000 free subscribers. Um, and we've done this in a very capital-like

we've done this in a very capital-like way. We've only raised about a million

way. We've only raised about a million dollars in total. Um and very

dollars in total. Um and very importantly for for this audience and

importantly for for this audience and for this discussion um 99% of our code

for this discussion um 99% of our code is written by AI agents. Uh no one is

is written by AI agents. Uh no one is handwriting code. No one is writing code

handwriting code. No one is writing code at all. Um it's all done with cloud

at all. Um it's all done with cloud code, codeex, Droid, what have you. Um

code, codeex, Droid, what have you. Um uh coding agent of your of your choice.

uh coding agent of your of your choice. Um, and also really importantly for the

Um, and also really importantly for the size of team we are, each one of our

size of team we are, each one of our apps is built by a single developer,

apps is built by a single developer, which is crazy. And these are not like

which is crazy. And these are not like uh little apps. Uh, here here's an

uh little apps. Uh, here here's an example. This is Kora, which is a um AI

example. This is Kora, which is a um AI email management app. Um, it's sort of

email management app. Um, it's sort of an it's an it is it's an assistant for

an it's an it is it's an assistant for your email. It on on the left over here,

your email. It on on the left over here, it summarizes all of your all of your

it summarizes all of your all of your emails that come in. So, you can kind of

emails that come in. So, you can kind of read your email that way. This is what

read your email that way. This is what my inbox looks like. on the right is a

my inbox looks like. on the right is a um email assistant that you can ask

um email assistant that you can ask questions like I asked where's when's my

questions like I asked where's when's my AI engineer talk um today and it gave me

AI engineer talk um today and it gave me just gave me the answer um and this is

just gave me the answer um and this is built primarily by one engineer um that

built primarily by one engineer um that he's got one or two contractors that

he's got one or two contractors that have helped in in certain ways but like

have helped in in certain ways but like almost all of this is built by one guy

almost all of this is built by one guy same thing for um

same thing for um uh this app which is another one that we

uh this app which is another one that we we make called monologue which is a

we make called monologue which is a speechto text app It's sort of like

speechto text app It's sort of like Super Whisper or Whisper Flow if you

Super Whisper or Whisper Flow if you know of those. Um, again, one guy,

know of those. Um, again, one guy, thousands of users. Um, I I love it.

thousands of users. Um, I I love it. It's a it's a it's just a beautifully

It's a it's a it's just a beautifully done app and it's not it's not simple.

done app and it's not it's not simple. It's complicated. There's a lot of stuff

It's complicated. There's a lot of stuff to it. Same thing for this app called

to it. Same thing for this app called Spiral. You can see there's it's it's

Spiral. You can see there's it's it's big. Um, and again, one engineer.

big. Um, and again, one engineer. So, obviously, this would not have been

So, obviously, this would not have been possible um a few years ago. it would

possible um a few years ago. it would not have been possible even a year ago.

not have been possible even a year ago. And I think the big change that happened

And I think the big change that happened that we're all starting to catch up to

that we're all starting to catch up to is um it started with cloud code, the

is um it started with cloud code, the sort of like terminal UI that gets rid

sort of like terminal UI that gets rid of the code editor

of the code editor really push pushed us into a place where

really push pushed us into a place where um we are delegating tasks to these

um we are delegating tasks to these agents. We are and and that allows us to

agents. We are and and that allows us to uh work in parallel and do much more

uh work in parallel and do much more than we would have ordinarily. Um,

than we would have ordinarily. Um, so some of the things that some of the

so some of the things that some of the things that I've noticed that we can do

things that I've noticed that we can do that I I assume people in this room are

that I I assume people in this room are starting to see but um [snorts] I think

starting to see but um [snorts] I think is is sort of important to put put our

is is sort of important to put put our finger on is uh the reason we can go

finger on is uh the reason we can go much faster is we can work on multiple

much faster is we can work on multiple multiple features and bugs in parallel.

multiple features and bugs in parallel. And I think that there's a um

And I think that there's a um there's like a little bit of a meme of

there's like a little bit of a meme of the vibe coder on Twitter that is oh

the vibe coder on Twitter that is oh [snorts] like they they have um they

[snorts] like they they have um they have four panes open but they're not

have four panes open but they're not actually doing any work. And I actually

actually doing any work. And I actually you can do it that way and I think there

you can do it that way and I think there are also definitely engineers and I know

are also definitely engineers and I know that they are because they work at every

that they are because they work at every that are productively using four panes

that are productively using four panes of agents at the same time. Um, and

of agents at the same time. Um, and that's that's crazy and that that

that's that's crazy and that that contributes a lot to the um ability for

contributes a lot to the um ability for a single developer to build and run a

a single developer to build and run a production application. Um, another like

production application. Um, another like really important thing about this, a

really important thing about this, a really big um unlock is because code is

really big um unlock is because code is cheap, you can prototype risky ideas and

cheap, you can prototype risky ideas and that allows you to do more experiments

that allows you to do more experiments than you would ordinarily. And that lets

than you would ordinarily. And that lets you make way more progress because the

you make way more progress because the starting energy to try something is so

starting energy to try something is so much lower because you just like say,

much lower because you just like say, "Oh, go do this. go do some research on

"Oh, go do this. go do some research on this like big refactor I might want to

this like big refactor I might want to do and then you go off and do something

do and then you go off and do something else. And that's a really big deal.

Um, and another really interesting thing that I love about this stuff that I'

that I love about this stuff that I' I've noticed in inside of inside of our

I've noticed in inside of inside of our organization is we move we're moving a

organization is we move we're moving a bit more toward a demo culture where um

bit more toward a demo culture where um instead of you know previously if you

instead of you know previously if you wanted to make something you'd have to

wanted to make something you'd have to be like maybe write a memo or do a do a

be like maybe write a memo or do a do a deck or um or you know convince a bunch

deck or um or you know convince a bunch of people that it was a good idea to

of people that it was a good idea to spend time on because you can vibe code

spend time on because you can vibe code something uh in a couple hours that sort

something uh in a couple hours that sort of shows the thing that you're uh that

of shows the thing that you're uh that you want to make. It it allows you to

you want to make. It it allows you to show everybody and uh I think that being

show everybody and uh I think that being a being a sort of de democultural allows

a being a sort of de democultural allows you to do weirder things that you only

you to do weirder things that you only get if you can feel it. Um which is I

get if you can feel it. Um which is I think really amazing

think really amazing and beyond just like sort of the basic

and beyond just like sort of the basic productivity unlocks.

productivity unlocks. um

um AI has and the way that we use it has

AI has and the way that we use it has caused us to sort of invent an entirely

caused us to sort of invent an entirely new set of engineering primitives and

new set of engineering primitives and processes which I'm sure that everybody

processes which I'm sure that everybody in this room is starting to do already.

in this room is starting to do already. I think everyone is sort of approaching

I think everyone is sort of approaching the same things from different angles

the same things from different angles and a lot of them definitely do echo

and a lot of them definitely do echo engineering processes from the past but

engineering processes from the past but I think it's really helpful to try to

I think it's really helpful to try to put our finger on okay what is the new

put our finger on okay what is the new way of programming if we're moving up a

way of programming if we're moving up a level of the stack and and we're moving

level of the stack and and we're moving from you know Python and JavaScript and

from you know Python and JavaScript and scripting languages up into um up into

scripting languages up into um up into English and the uh the the name that

English and the uh the the name that we've given to this process is

we've given to this process is compounding engineering

compounding engineering Um, and the way that I talk about

Um, and the way that I talk about compounding engineering is in

compounding engineering is in traditional engineering, each feature

traditional engineering, each feature makes the next feature harder to build.

makes the next feature harder to build. In compounding engineering, your goal is

In compounding engineering, your goal is to make sure that each feature makes the

to make sure that each feature makes the next feature easier to build. Um, and we

next feature easier to build. Um, and we do that in this loop.

do that in this loop. Um, the loop has four steps. The first

Um, the loop has four steps. The first one is plan. And if you're you've been

one is plan. And if you're you've been here today, you've been paying

here today, you've been paying attention, you know how important it is

attention, you know how important it is when you're working with agents to make

when you're working with agents to make a really really detailed plan. So I

a really really detailed plan. So I think everyone is doing that. Second

think everyone is doing that. Second step is delegate. Just like go tell the

step is delegate. Just like go tell the agent to do it. Everyone's doing that

agent to do it. Everyone's doing that too. Third step is assess. And we have

too. Third step is assess. And we have tons and tons of ways to um assess

tons and tons of ways to um assess whether the work that the agent did is

whether the work that the agent did is any good. There's tests, there's trying

any good. There's tests, there's trying it, there's having the agent uh figure

it, there's having the agent uh figure it out. There's there's code review,

it out. There's there's code review, there's agent code review, there's all

there's agent code review, there's all this types of stuff. And then the last

this types of stuff. And then the last step which is I think the most

step which is I think the most interesting one is codify. And this is

interesting one is codify. And this is kind of like the the money step which is

kind of like the the money step which is where you compound

where you compound everything that you've learned from the

everything that you've learned from the planning stage, the delegation stage,

planning stage, the delegation stage, the assessment stage back into prompts

the assessment stage back into prompts that go into your, you know, your cloud

that go into your, you know, your cloud MD file or your um your sub aents or

MD file or your um your sub aents or your slash commands and you start to um

your slash commands and you start to um basically create this library. You take

basically create this library. You take all the tacet knowledge that you pick up

all the tacet knowledge that you pick up um that all your engineers are picking

um that all your engineers are picking up um as they find bugs, fix plans, um

up um as they find bugs, fix plans, um delegate work, and you um you make it

delegate work, and you um you make it into an explicit collection of prompts

into an explicit collection of prompts that you can spread for your entire

that you can spread for your entire organization.

organization. And um when you do that really well,

And um when you do that really well, there's a lot of like really interesting

there's a lot of like really interesting um second order effects that are are not

um second order effects that are are not I think that well understood or or that

I think that well understood or or that commonly talked about that I think would

commonly talked about that I think would be interesting to to bring here because

be interesting to to bring here because my guess is that um some people are

my guess is that um some people are already seeing this, but like maybe it

already seeing this, but like maybe it needs to be pushed on a little bit more

needs to be pushed on a little bit more to like really be brought out and some

to like really be brought out and some people uh it might be an interesting way

people uh it might be an interesting way to get more of your organization to buy

to get more of your organization to buy into using these tools. tools 100% of

into using these tools. tools 100% of the time.

the time. Um so the first thing that you notice if

Um so the first thing that you notice if you sort of if you set up this process

you sort of if you set up this process and you and you're like 100% bought in

and you and you're like 100% bought in on something like compounding

on something like compounding engineering um is that tacet code

engineering um is that tacet code sharing

sharing becomes much easier. So uh we have we

becomes much easier. So uh we have we have multiple products at every a lot of

have multiple products at every a lot of a lot of products a lot of times need to

a lot of products a lot of times need to implement similar things even if they

implement similar things even if they use different technologies or imple

use different technologies or imple implementing similar things like a

implementing similar things like a team's feature or a certain type of ooth

team's feature or a certain type of ooth or whatever. Um

or whatever. Um previously in order to share code you'd

previously in order to share code you'd have to like abstract out whatever you

have to like abstract out whatever you did into a library and then like allow

did into a library and then like allow someone else to download and it it'd be

someone else to download and it it'd be hard to do or you'd have to talk about

hard to do or you'd have to talk about it.

it. With agents, um you can just point your

With agents, um you can just point your Cloud Code instance at um the repo from

Cloud Code instance at um the repo from the developer sitting next to you and

the developer sitting next to you and learn the process that they went through

learn the process that they went through to build the feature that they that you

to build the feature that they that you need to reimplement and reimplement it

need to reimplement and reimplement it yourself in your own tech stack in your

yourself in your own tech stack in your own framework and in your own way. Um,

own framework and in your own way. Um, and that's really really cool to kind of

and that's really really cool to kind of have this the more developers you have

have this the more developers you have working on different things inside of

working on different things inside of the org, the more you can um share

the org, the more you can um share without any extra cost because AI can

without any extra cost because AI can just go read all the code and and um and

just go read all the code and and um and use it. Um, another really cool thing

use it. Um, another really cool thing that I've noticed is that new hires are

that I've noticed is that new hires are productive on their first day because

productive on their first day because you've taken all of the things that

you've taken all of the things that you've learned about like, okay, how do

you've learned about like, okay, how do I set up an environment and what does a

I set up an environment and what does a good commit look like and all this kind

good commit look like and all this kind of stuff and on the first day they have

of stuff and on the first day they have all that set up in their in in their,

all that set up in their in in their, you know, cloud MD files or their cursor

you know, cloud MD files or their cursor files or uh codeex files or whatever and

files or uh codeex files or whatever and um the agent just sets up their local

um the agent just sets up their local environment and knows how write a good

environment and knows how write a good PR. That's really cool. It also helps if

PR. That's really cool. It also helps if you um want to hire like expert

you um want to hire like expert freelancers. Like there's some there's

freelancers. Like there's some there's one guy there's one person who just is

one guy there's one person who just is really good at this one specific thing.

really good at this one specific thing. You can have them come in for a day and

You can have them come in for a day and like do that thing. It's I think of it a

like do that thing. It's I think of it a little bit like um like a DJ or whatever

little bit like um like a DJ or whatever can like go in on like a couple bars of

can like go in on like a couple bars of a song. Like you can just sort of drop

a song. Like you can just sort of drop in and that's really helpful. it's it

in and that's really helpful. it's it would ordinarily be like too hard to

would ordinarily be like too hard to collaborate because the the startup cost

collaborate because the the startup cost is too high, but you can do that a lot

is too high, but you can do that a lot better now.

better now. Um, another thing that I've noticed

Um, another thing that I've noticed which is really cool too is um

which is really cool too is um developers inside of every commit to um

developers inside of every commit to um other products. So, uh you know, we have

other products. So, uh you know, we have four products that run internally.

four products that run internally. Everybody uses all the products. If

Everybody uses all the products. If someone uh runs into a bug or a paper

someone uh runs into a bug or a paper cutter, like a little minor quality of

cutter, like a little minor quality of life thing that they want, they will um

life thing that they want, they will um often just um they will often just uh

often just um they will often just uh just submit a poll request for it to

just submit a poll request for it to other GM of the app um because it's very

other GM of the app um because it's very easy for them to go download the repo

easy for them to go download the repo and figure out uh or have really have

and figure out uh or have really have Claude or Codex figure out, okay, this

Claude or Codex figure out, okay, this is how we fix the bug or this is how we

is how we fix the bug or this is how we fix the paper cut. Um and that's really

fix the paper cut. Um and that's really really cool because you have this

really cool because you have this much um much easier way of collaborating

much um much easier way of collaborating across apps that I I think over the next

across apps that I I think over the next couple years. I imagine that you will

couple years. I imagine that you will also be able to let customers do this to

also be able to let customers do this to some extent. Like if you run into a bug,

some extent. Like if you run into a bug, um this is, you know, speculative, but

um this is, you know, speculative, but if you run into a bug, you can have your

if you run into a bug, you can have your little agent fix it um and submit it as

little agent fix it um and submit it as pull request. It's a weird open source

pull request. It's a weird open source thing, but um yeah, this is really

thing, but um yeah, this is really really cool and and definitely is

really cool and and definitely is happening a lot inside of our company.

happening a lot inside of our company. Um,

Um, another really cool thing is um we we

another really cool thing is um we we have not this may get different as we as

have not this may get different as we as we scale but um we have not yet had to

we scale but um we have not yet had to standardize onto a particular stack or

standardize onto a particular stack or language. We instead let everyone who's

language. We instead let everyone who's building different products like pick

building different products like pick the thing that they like best and the

the thing that they like best and the reason is because it makes it AI makes

reason is because it makes it AI makes it much easier to translate between

it much easier to translate between them. Um and it makes it much easier to

them. Um and it makes it much easier to to jump into any language and framework

to jump into any language and framework and environment and be productive. And

and environment and be productive. And so it we don't uh it's easier for us to

so it we don't uh it's easier for us to let people just do the thing that that

let people just do the thing that that they like and let AI kind of like handle

they like and let AI kind of like handle the translation in between.

the translation in between. Um and the last thing which is my

Um and the last thing which is my favorite but like is also the horror I

favorite but like is also the horror I think of of some developers and to some

think of of some developers and to some degree maybe the horror of my team um is

degree maybe the horror of my team um is that managers can commit code. um if

that managers can commit code. um if you're technical uh even the CEO and

you're technical uh even the CEO and um for for me like I have no business

um for for me like I have no business committing code because we've got four

committing code because we've got four products we've got 15 people we're

products we've got 15 people we're growing really fast um I'm doing tons

growing really fast um I'm doing tons and tons of other things but I can and I

and tons of other things but I can and I I have like committed production code

I have like committed production code over the last couple months and the

over the last couple months and the reason for that is AI allows um

reason for that is AI allows um engineers to work with fractured

engineers to work with fractured attention so previously you might have

attention so previously you might have needed like a 3 or 4 hour block of focus

needed like a 3 or 4 hour block of focus time in order to like get anything done.

time in order to like get anything done. Um, but with cloud code, you can kind of

Um, but with cloud code, you can kind of like get out of meeting and say, "Hey,

like get out of meeting and say, "Hey, like I want you to investigate this

like I want you to investigate this bug." And then go do something else and

bug." And then go do something else and then come back and you have like a a

then come back and you have like a a plan or like a um root cause fix and

plan or like a um root cause fix and then you can submit a PR. And it's not

then you can submit a PR. And it's not easy. It's not magic, but it is actually

easy. It's not magic, but it is actually possible. And I think that's a that's

possible. And I think that's a that's just a totally new way of thinking how

just a totally new way of thinking how thinking of thinking about how managers

thinking of thinking about how managers interact with the products that they

interact with the products that they make.

So, um, just to just to summarize, um, there's a I really think there's a 10x

there's a I really think there's a 10x difference in how things work when you

difference in how things work when you hit 100% AI adoption. I think, um, from

hit 100% AI adoption. I think, um, from what we've seen, a single engineer

what we've seen, a single engineer should be able to build and maintain a

should be able to build and maintain a complex production product. what we call

complex production product. what we call compounding engineering, but I think

compounding engineering, but I think what all of us are are sort of pointing

what all of us are are sort of pointing to um is I I think really works to make

to um is I I think really works to make each feature easier to build and then

each feature easier to build and then creates all of these sort of nonobvious

creates all of these sort of nonobvious second order effects that makes it

second order effects that makes it easier for the entire organization to

easier for the entire organization to collaborate together.

collaborate together. And very importantly, many people in San

And very importantly, many people in San Francisco don't know this yet. Um so

Francisco don't know this yet. Um so you're you're the first to hear it. Um

you're you're the first to hear it. Um so that is my talk. So, if you're

so that is my talk. So, if you're interested in um in what we do, uh I run

interested in um in what we do, uh I run every uh Every is the only subscription

every uh Every is the only subscription you need to stay at the edge of AI. You

you need to stay at the edge of AI. You can find us at every.to. Um we uh we

can find us at every.to. Um we uh we have a daily newsletter about AI. So, we

have a daily newsletter about AI. So, we do ideas, apps, and training. We have a

do ideas, apps, and training. We have a on the ideas side, we have a daily

on the ideas side, we have a daily newsletter. We review all the new models

newsletter. We review all the new models when they come out and all the new

when they come out and all the new products when they come out. The apps,

products when they come out. The apps, you already saw, we've a bundle of all

you already saw, we've a bundle of all these apps and then we do training and

these apps and then we do training and consulting with big companies to help

consulting with big companies to help them use AI and it's all bundled into

them use AI and it's all bundled into one subscription. So you get everything

one subscription. So you get everything for one price and that's it. Thank you

for one price and that's it. Thank you very much. [applause]

very much. [applause] [music]

[music] >> Ladies and gentlemen, please welcome

>> Ladies and gentlemen, please welcome back to the stage Alex Lieberman.

Okay, 8 hours in. We did it. Um I have some housekeeping. We have to finish the

some housekeeping. We have to finish the day with housekeeping. First of all, I

day with housekeeping. First of all, I want to thank you all. It has been

want to thank you all. It has been phenomenal to be on this journey with

phenomenal to be on this journey with you all. But uh let's give a a shout out

you all. But uh let's give a a shout out just to you all for being here, going

just to you all for being here, going through a full day listening to the

through a full day listening to the programming. So, round of applause for

programming. So, round of applause for everyone in the crowd, everyone

everyone in the crowd, everyone [applause] online who's been watching.

[applause] online who's been watching. Let's also keep it going for all the

Let's also keep it going for all the team in production behind the scenes

team in production behind the scenes making this possible. I watched them

making this possible. I watched them work tirelessly throughout the day to

work tirelessly throughout the day to make this happen. And then finally,

make this happen. And then finally, let's give a huge shout out to Swix and

let's give a huge shout out to Swix and Ben who made this whole thing happen.

Ben who made this whole thing happen. >> [applause]

>> So get comfortable for a second. I have some housekeeping. Make sure everyone

some housekeeping. Make sure everyone knows where to go. And then we have one

knows where to go. And then we have one final speaker who's going to chat uh

final speaker who's going to chat uh right after I hop off stage. So let's

right after I hop off stage. So let's just dive in for a sec. Uh tomorrow is

just dive in for a sec. Uh tomorrow is the engineering session day. I will not

the engineering session day. I will not be your MC. So you will be taken care of

be your MC. So you will be taken care of by Jed who works at Google. I spent the

by Jed who works at Google. I spent the day with Jed. He is incredible. He's

day with Jed. He is incredible. He's just like a taller, better looking

just like a taller, better looking version of me. and he's actually an

version of me. and he's actually an engineer. So, you get a true engineer

engineer. So, you get a true engineer tomorrow. Uh, if you have a bundle pass,

tomorrow. Uh, if you have a bundle pass, your ticket includes tomorrow's track.

your ticket includes tomorrow's track. So, we'll see you tomorrow at 8:00 a.m.

So, we'll see you tomorrow at 8:00 a.m. here. If you have the leadership pass

here. If you have the leadership pass only, your ticket does not include

only, your ticket does not include access to the sessions or the venue

access to the sessions or the venue tomorrow. However, we have organized an

tomorrow. However, we have organized an off-site brunch for you on us at a

off-site brunch for you on us at a restaurant not far from here. So, check

restaurant not far from here. So, check your calendar for the invite and the

your calendar for the invite and the location. But right now we are headed

location. But right now we are headed into the afterparty. And not only is

into the afterparty. And not only is there an afterparty, but there are after

there an afterparty, but there are after afterparties. There's a lot of side

afterparties. There's a lot of side events. So your entire night is planned

events. So your entire night is planned for you. And we have Graphite to thank

for you. And we have Graphite to thank for sponsoring the afterparty. So here

for sponsoring the afterparty. So here to give us the last word for a brief

to give us the last word for a brief message is the co-founder and CEO of

message is the co-founder and CEO of Graphite, Mel Lutzky.

[applause] >> [music]

>> Good evening everyone. My name is Mel Lutzky and I'm the co-founder and CEO of

Lutzky and I'm the co-founder and CEO of Graphite. Uh we're the AI powered code

Graphite. Uh we're the AI powered code review platform for this new age of

review platform for this new age of agentic software development. Now I know

agentic software development. Now I know you guys heard a lot today about agents

you guys heard a lot today about agents and how to make them as effective as

and how to make them as effective as possible in generating code and building

possible in generating code and building features faster than ever. And they're

features faster than ever. And they're incredible at this. But I think

incredible at this. But I think everybody who's who's built software in

everybody who's who's built software in a professional environment knows that

a professional environment knows that writing the code is only the first part

writing the code is only the first part of the story. Every code change then

of the story. Every code change then needs to be tested. It needs to be

needs to be tested. It needs to be reviewed, merged, deployed. And

reviewed, merged, deployed. And oftentimes that second half of the

oftentimes that second half of the process takes just as long if not longer

process takes just as long if not longer than actually generating the code. And

than actually generating the code. And that's what we do with graphite. We're

that's what we do with graphite. We're applying AI to the entire development

applying AI to the entire development process and making code review as

process and making code review as quickly as as quick as possible. Uh we

quickly as as quick as possible. Uh we have an agent that's integrated fully

have an agent that's integrated fully into our pull request page. Um it's like

into our pull request page. Um it's like reviewing code in 2025 and not you it

reviewing code in 2025 and not you it doesn't feel like 2015 anymore. U that's

doesn't feel like 2015 anymore. U that's what we build. Um we're super excited

what we build. Um we're super excited about it. Uh if you want to come check

about it. Uh if you want to come check it out, we have our booth uh in the expo

it out, we have our booth uh in the expo hall and also we're going to be around

hall and also we're going to be around all day tomorrow. We're the official

all day tomorrow. We're the official sponsors of tonight's afterparty and

sponsors of tonight's afterparty and also tomorrow's event at public records.

also tomorrow's event at public records. So we wanted you all you guys who came

So we wanted you all you guys who came from out of town out of town, we wanted

from out of town out of town, we wanted to show you good time in New York. Uh,

to show you good time in New York. Uh, so we have two events for you, uh, to

so we have two events for you, uh, to make sure that you have you have a good

make sure that you have you have a good time and, uh, see what New York is all

time and, uh, see what New York is all about. Uh, want to give a big shout out

about. Uh, want to give a big shout out to Swix and Ben and the whole AIG team

to Swix and Ben and the whole AIG team for organizing and we're excited to see

for organizing and we're excited to see you guys all at the party tonight. Thank

you guys all at the party tonight. Thank you very much.

you very much. [applause]

taking place in the halls on both doors. Expo

>> [music] >> Heat. Heat.

คลิกที่ข้อความหรือไทม์สแตมป์ใดก็ได้เพื่อข้ามไปยังช่วงเวลานั้นในวิดีโอ

แชร์:

คำบรรยายส่วนใหญ่พร้อมภายใน 5 วินาที

คัดลอกด้วยคลิกเดียว125+ ภาษาค้นหาเนื้อหาข้ามไปยังไทม์สแตมป์

วาง YouTube URL

ใส่ลิงก์วิดีโอ YouTube ใดก็ได้เพื่อรับคำบรรยายฉบับเต็ม

คำบรรยายส่วนใหญ่พร้อมภายใน 5 วินาที

ติดตั้ง Chrome Extension ของเรา

ดึงคำบรรยายได้ทันทีโดยไม่ต้องออกจาก YouTube ติดตั้ง Chrome Extension เพื่อเข้าถึงคำบรรยายวิดีโอใดก็ได้ด้วยคลิกเดียวบนหน้าวิดีโอโดยตรง

เพิ่มลงใน Chrome — ฟรี

รองรับ YouTube, Coursera, Udemy และแพลตฟอร์มการเรียนรู้อื่นๆ อีกมากมาย

รับคำบรรยายทันที: แก้ไขโดเมนในแถบที่อยู่ของคุณได้เลย!

YouTube

←

→

↻

https://www.youtube.com/watch?v=UF8uR6Z6KLc

YoutubeToText

←

→

↻

https://youtubetotext.net/watch?v=UF8uR6Z6KLc

คำบรรยาย YouTubeกำลังเตรียมผลลัพธ์ให้คุณ…

คำบรรยาย YouTube:AIE CODE 2025: AI Leadership ft Anthropic, OpenAI, McKinsey, Bloomberg, Google Deepmind, and Tenex

AutoDub

คำบรรยายวิดีโอ

Summary

Core Theme

วาง YouTube URL

แบบฟอร์มดึงคำบรรยาย

ติดตั้ง Chrome Extension ของเรา

รับคำบรรยายทันที: แก้ไขโดเมนในแถบที่อยู่ของคุณได้เลย!

คำบรรยาย YouTube:
AIE CODE 2025: AI Leadership ft Anthropic, OpenAI, McKinsey, Bloomberg, Google Deepmind, and Tenex