Building Frontier AI Products with Fin x Cognition x Harvey AI x Perplexity
Okay, welcome everyone. Thank you so
much for joining us online and in person
here in San Francisco. You want to come
start taking your seats? We'll get
started. Uh we'll just get started in a
minute. Um we've got a great evening for
you tonight. We've got talks and
technical presentations and a panel
discussion with leaders from Harvey,
Cognition, Perplexity, and Fin, all on
the theme of building great frontier AI
products. Um, I'm Jordan Neil, SVP of
engineering from Intercom. I'm going to
be your MC tonight. But to get us
started, I'm going to intro uh
Thank you, Jordan. And good evening to
everyone. Welcome to all of you here in
SF and to the thousands on the live stream as well. I'm Des, uh, co-founder of Intercom. We're the company behind Fin. I'm guessing you
gathered that or I'm hoping you've got
that by now. Um we have a really fun
evening ahead. We've got pretty much the
entire, uh, leadership team of Intercom, CEO, CPO, CTO, all the Cs basically, are
here along with me. Um we wanted to
gather a group of great companies and
great people to talk about frontier AI
products. What really what we want to
talk about is like what it means to be
working on the actual edge, like really
pushing things forward. This wave of AI
that we're in is still kind of quite
young and it's very fast moving.
Companies are blowing up in like the
good way and blowing up in the bad way,
too. Um, for us, we're betting really
hard on AI. Uh, it's basically the
future of our entire business. Fin, at its pretty young age, already has over 6,000 paying, happy customers.
It resolves over a million conversations
a week. By all the data we have, it's
the highest performing agent that's out
there. And it's also the fastest growing
thing that any of us in the company have
ever worked on, ever by a massive
margin. So yeah, we're all locked in on
AI. We're all locked in on Fin. When we look at the leading AI companies, of which we'd humbly submit Fin as one, what
we see is this like innovation at all of
the levels, right? In software, so many
of us are so used to this world where
like if you're working on something, it
shows up in the product as UI that you
can look at and point at and click and
play with. We're far less used to
talking about these like subterranean
improvements, these ways in which the
product gets a lot better, but you don't
see anything change. What happens is the
users realize after, you know, one
update or whatever, this thing's
working really well now. And those
improvements tend to be slightly more
invisible because they're happening at
the AI layer or any of the layers
beneath. That's where so much of the
magic of products like Fin, Harvey, Devin, Perplexity, that's where it
happens. It's at this AI layer through
optimizations, through rearchitectures
or even deeper again at the actual model
layer itself. That's where groups like
our AI group, which is about 50 strong. A lot of the folks here doing the poster sessions are part of it as
well. That's where they spend their
time, right down at all the levels,
finding all of the edges, all of the
ways to make a truly great AI product.
So, as I said, we wanted to pull
together a great group of people, a
great group of companies and a crowd to
basically talk about what it means to go
further, to go harder, and to go deeper
when you're building AI to really push
the envelope. That's what tonight's
about. And to kick us off with an
opening keynote, I'm really excited to
hand over to our chief AI officer, Mr.
Thanks very much Dez and thank you so
much everybody for coming here today. Uh
so my name is Fergal, um, as Des said I kind of head up AI at Intercom and I'm here to
talk about creating value at the AI
layer and kind of share how we think
about this important topic because you
know AI product strategy is hard right
it's a very dynamic space the rules
change all the time you think you're
building something that's really
valuable and then something changes at a
layer below you and suddenly you have to
revisit all your
assumptions and so you know you don't
want to spend time building things that
aren't valuable. And so, you know, we
really think about this idea that like,
hey, there's an AI layer. And you're
probably familiar with the idea of very
commonly, you know, there's an
application layer, right? There's the UI
and the UX of the the product people
use. And of course, there's the model
layer, the the LLMs deep down underneath
that tend to power things. But really,
there's a lot to do at the AI layer that
we spend a lot of time on, too. You
know, prompts, orchestration, logic,
context. And we really think it's
interesting to sort of look at different
products and kind of taxonomize them
using this lens. And you know, back when ChatGPT first came out two years ago, there was a lot of talk about thin wrappers. And really, the first ChatGPT experience itself was kind of a thin wrapper, where I would say the application layer is pretty thin. There's an AI layer, but it's also thin. Really, you're very nakedly talking to the model when you play with that kind of ChatGPT version one, and of course this has changed over time. I
think now you know another way of
taxonomizing things or another set of
products is sort of the AI-enabled application, right? And a good example of that might be, you know, the early version of Copilot in VS Code, for example, right? A big application that people have been building for decades, but then with, like, a pretty thin AI integration, and, you know, with a very small AI layer above the model there. And I
think you know as time has passed you
know two years in you're starting to see
more and more things that are are what
maybe we might call an AI native
application right where there's an
application layer and maybe it's a
little bit relatively smaller and
there's a big AI layer there's a lot of
engineering around the models and you
know Fin, our product today, uh, would probably fall into this category, where there's a very complex kind of RAG layer above the models that I'll talk about, uh, in a minute. And as AI has kind of matured, I think we're seeing more and
more products that look a little bit
like this. And and that's great, right?
If you're building a product, this is a
very nice clean narrative to be able to
tell. But I think something interesting
happened recently, uh, with Claude Code, which some of you may have seen. In case anyone hasn't seen Claude Code, it's sort of a very kind of command-line, terminal-based, um, kind of application that people use to write code. And it
was it was just quite interesting
because it was really the model there
kind of striking back, right? It really
looks like, you know, a a relatively
thin application and a relatively thin
AI there and the model doing a lot.
Looks a lot like the thin wrappers of two years ago. And this is kind of interesting. And you know, Claude Code has
a sweet spot. It doesn't do everything,
but it's influential. Suddenly it
spawned many clones and it's quite quick
to clone because it's relatively thin at
the layers above the model. Um, and I
often think, like, what does someone like, you know, JetBrains, who have made IDEs for decades, think when they see
something like this, right? A part of
their application surface has suddenly
been, you know, very quickly
commoditized and very quickly made thin
by something like this. I think it's
very dangerous for application companies
to to look at this sort of trend and if
this goes and so obviously we have a
thesis here, right? We think that like
you know there's generally a lot of work
to do above the model layer but we kind
of think that AI companies they can't
ignore the risk of the model layer sort
of coming up quite suddenly and you know
making some of their investment um
suddenly outdated.
So really we think that anyone building
deep AI applications you have to have a
plan to build durable value around the
models and at the AI layer. And we we'll
argue today that it's it's possible to
do this and share a little bit about
what we're investing in and why we think
this is this is something that companies
like us can deliver on. But first I want
to share a little bit about our context
and our history because it's it's kind
of important for for the direction we're
going. Uh, so Intercom, we make a customer support platform, uh, an inbox where, you
know humans go and answer customer
support questions. And you know back in
2018 we started to build this product we called Resolution Bot, which was, like, you know, a previous-generation, I guess, AI chatbot. Um, you know, humans would have to go and, like, set up and configure intents and then define, like, how it
would answer a question. We did a lot of
work to make this as seamless as
possible. Um, you know, back then our tech stack was very, uh, you know, BM25 meets word2vec. This is like a slide from way back when. It's like a hand-engineered sort of, you know, um, information retrieval, word2vec thing, um, that we actually wrote. And you know, this used to be the previous generation, uh, 2019.
We were an early adopter of AI, and I went and did some spelunking through our GitHub for this, and I think there's one thing I want to highlight here, which is, um, you know, we were putting neural networks in production. We put MUSE in production pretty soon after it came out, and we really ended up building sort of a proto vector DB, with cosine similarity and everything, back then, to kind of power our previous-generation bots.
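To make the idea concrete, a proto vector DB of this kind reduces to storing embeddings and ranking them by cosine similarity. The sketch below is illustrative only; the class and method names are hypothetical and this is not Intercom's actual implementation.

```python
# Minimal sketch of a "proto vector DB": store document embeddings and
# retrieve the nearest ones by cosine similarity. Purely illustrative --
# names and shapes are hypothetical, not Intercom's actual code.
import numpy as np

class TinyVectorStore:
    def __init__(self):
        self.vectors = []   # list of np.ndarray embeddings
        self.payloads = []  # the documents they correspond to

    def add(self, vector, payload):
        self.vectors.append(np.asarray(vector, dtype=np.float32))
        self.payloads.append(payload)

    def search(self, query_vector, top_k=5):
        query = np.asarray(query_vector, dtype=np.float32)
        matrix = np.stack(self.vectors)
        # Cosine similarity = dot product of L2-normalized vectors.
        matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
        query = query / np.linalg.norm(query)
        scores = matrix @ query
        best = np.argsort(-scores)[:top_k]
        return [(self.payloads[i], float(scores[i])) for i in best]
```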
So we were kind of in this space, and we were well equipped when the world changed with ChatGPT, um, you know, to start building Fin. So I'll quickly touch on the story of Fin. We'd kind of been watching this space for a while. This is an internal memo I wrote about Google LaMDA, which a lot of us had seen at the
time and was sort of a I guess a
precursor to, uh, ChatGPT. And you know, we were kind of looking at this and watching, and then suddenly ChatGPT came out, and we moved very fast on it and we built Fin, which we considered sort of a breakthrough AI agent. We launched Fin powered by GPT-4 on GPT-4 launch day. Um,
it was RAG from the start, and we had already started investing in RAG, um, I guess from maybe January, uh, 2023. Um, one of the first production RAG systems in customer experience, and maybe ever, you know, I'm not sure, and we didn't know it was called RAG at the time. And really, what we invested in in '23 to '24: we invested a ton in prompt engineering, right? I think, like a lot of people at the time, you had to do a lot of work with the models of 2023 in order to get good results out of them, so you do a lot of prompt engineering. We invested a ton in our RAG pipeline, in sort of, like, trying out different types of retrieval strategies, different types of chunking, things like that. And then we
did a whole lot of testing and
optimization. And over time, through
experimentation and gradual refinement,
we ended up with an architecture that looked a little bit like this. This is kind of the Fin architecture from a few months ago. Um, I'm not going to talk you through it all, don't worry. But it's just to show that this is where we got to, optimizing and A/B testing bit by bit, um, to build quite a complex, industrial-strength product with a whole lot of different pieces. You know, we were summarizing issues, uh, doing custom retrieval, etc., etc.
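For intuition, the overall shape of such a pipeline, stripped of all the production pieces, looks roughly like the sketch below. Every component is passed in as a placeholder callable; none of this is Fin's actual code.

```python
# Rough shape of a RAG answer pipeline: canonicalize the user's issue,
# retrieve and rerank knowledge chunks, then ask an LLM to synthesize an
# answer grounded in them. All callables are placeholders, not Fin's code.
def answer_question(conversation, summarize, retrieve, rerank, generate, top_k=8):
    issue = summarize(conversation)                 # issue summary step
    candidates = retrieve(issue, 50)                # broad first-pass retrieval
    chunks = rerank(issue, candidates)[:top_k]      # order matters to the LLM
    prompt = (
        "Answer the question using only the documents below.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuestion: {issue}"
    )
    return generate(prompt)
```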
And really, at that point, in that sort of 2023 to 2024 timeline, you know, what we were really investing in was this experimental culture: always A/B test everything in production. And it was always deeply unintuitive, um, whether a new thing we were trying was actually going to improve the product or degrade it. And so we had to, like, A/B test everything. And of course we built a bunch of product features around core Fin, kind of helping the rest of the Intercom org to do that. And, uh, there was a lot of pain and suffering there, as we were a big SaaS company and we were sort of, like, learning how to build AI with probabilistic systems, and Molly is going to give a little bit of a talk about pitfalls there. But overall, you know, we're proud of where we got to.
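The statistical core of "A/B test everything" can be as simple as a two-proportion z-test on resolution rate. The sketch below is standard statistics, not Intercom's actual experimentation tooling; the numbers in the example are invented.

```python
# Two-proportion z-test on resolution rate between a control (A) and a
# treatment (B) variant. Standard statistics, not Intercom's actual stack.
from math import sqrt
from statistics import NormalDist

def resolution_ab_test(resolved_a, total_a, resolved_b, total_b):
    p_a, p_b = resolved_a / total_a, resolved_b / total_b
    pooled = (resolved_a + resolved_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
    return p_b - p_a, p_value

# e.g. is a ~1-point lift significant at this sample size?
lift, p = resolution_ab_test(5600, 10000, 5700, 10000)
```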
It worked out pretty well. This is probably the chart that I am most proud of, um, in my time, um, at Intercom building Fin, which is the chart of Fin's resolution rate over time. And very weirdly, it kind of grows almost in a Moore's-law-style way. Each month we're like, we've got a bunch of things we can try, and probably it won't work, probably we're asymptoting out in terms of the quality, but each month we get, on average, about a percentage point of end-user-defined resolution improvement. And, um, it's working pretty well. We've got about $50 million of ARR at the moment just from Fin, and we're on a good trajectory, if it holds, to 100 million in a couple of quarters. So, you know, Fin today, as I was saying, it really has a sort of a mature AI layer powering it, right? There's the models, but then there's a ton of stuff around the models.
But how do we take that to the next
level? And this is really what I'd like
to talk about today, which is what we've
been doing over the last sort of six
months. And again, we're wary about the
model there coming up. We really want to
spend our time building durable
differentiation. It it's easier today to
build something competitive with Finn
than it was a year ago and certainly
than it was two years ago. So what do we
do? How do we take it to the next level?
One thing we spend a lot of time
thinking about is this quote from Alan Kay, right? People who are serious about software should make their own hardware. It's quite an influential quote at Apple. It's quite a cool quote.
How does this apply to the AI era?
Right. And you know, we spent a lot of
time thinking about this and I think as
Dez said earlier, we really have an
emerging thesis that you've kind of got
to go quite deep into the AI layer to
build the best products. And so I'm
going to talk about the kind of the
results of our deep investment here. And
again, you know, we still do of course use LLMs from the frontier labs: Anthropic, for example, an excellent model; we partner with OpenAI for voice. But I'd like to tell you about some of the work we've been doing ourselves, because we think this is, well, we speculate. We don't know, right? We
don't have a crystal ball, but we think
this is a template for what a lot of AI
applications will do over time. I'm
going to talk at a high level about some
of the things we've been doing recently.
Um, if you want to get more technical
details, we're releasing a whole lot of
blog posts today where we're really
sharing a lot of technical information
about the work we've been doing over the
last six months. And of course, we've
got poster presentations here, too. So
one of the first things we set out to do
is is to build a custom reranker. I'm
going to share a little bit about our
journey doing that. So what's a
reranker? Um, you know, Fin, as a RAG application, um, you know, goes and tries to answer a question using a whole bunch of chunked knowledge, uh, typically from your help center or your other sources of documents. And we've discovered over
time that the performance of an LLM in actually answering a question is super sensitive to the exact set of documents that we retrieve, and even, weirdly, the order in which we retrieve them and present them. And so
building a reranker can be really
impactful on the performance of the
system overall. And so, you know, we've ended up building our own custom reranker over the last six months, using ModernBERT as a building block and, uh, training it on a whole bunch of data from Fin. And I guess one thing we're sharing that, you know, somewhat surprised us is that, for our use case, um, our own reranker here has outperformed the previous best-in-class reranker we were using, which is Cohere Rerank 3.5, and improved answer quality, and it reduced our costs by a lot. And this is going to be a consistent theme: I'm going to talk a lot about, you know, how moving to custom models has improved performance but also decreased cost a lot, increased efficiency, and decreased latency.
We sort of think that this is an emerging pattern that probably a lot of people are going to follow. Um, some back-test results here: really quite surprisingly large gains, um, from moving to, uh, our own model trained on our own data.
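For readers unfamiliar with rerankers, the inference side of a cross-encoder reranker looks roughly like the sketch below: score each (query, chunk) pair with an encoder plus a classification head, then sort. The checkpoint name and the single-logit head are assumptions; Fin's model is additionally fine-tuned on its own resolution data, which this sketch omits.

```python
# Cross-encoder reranking sketch: score (query, chunk) pairs and keep the
# top-k. The checkpoint name is an assumption (a recent transformers version
# is needed for ModernBERT); the fine-tuning step is not shown here.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "answerdotai/ModernBERT-base"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)

def rerank(query: str, chunks: list[str], top_k: int = 8) -> list[str]:
    inputs = tokenizer([query] * len(chunks), chunks,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)  # one relevance score per pair
    order = torch.argsort(scores, descending=True)[:top_k].tolist()
    return [chunks[i] for i in order]
```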
Um, we've also built a custom retrieval model over the last while. We call it the FinX retriever. Um, you know, again, this was initially trained on 300,000 real user queries, each of which had a hard resolution. And a hard resolution is
when someone affirmatively says yes
thank you Finn this has positively
answered my question. And, um, you know, our ability to have, you know, a high-performing AI application, we think, gives us an advantage that's proving out when it comes to training our own models here. And, um, our retriever model, which was a fine-tune of a Snowflake model that had a pretty good, uh, performance-cost envelope, um, has performed very well. It has outperformed the previous competitive retrieval models that we were using. And I highlight this
because I think there's something
interesting here in this sort of uh
second, um, box, which is that we did an experiment here where we said, like, hey, how good is the retrieval model for the applications it's trained on versus how good is it cross-application? You know, what is it learning in terms of learning to be a retrieval model for customer experience generally? And, obviously, within application it's the best, but it also does generalize pretty well out of sample across apps, which was, uh, better than we expected. Um, you know, I guess
the theme here is that you know there's
still a lot of room, uh, for optimizing and for improving by training on very broad, um, you know, application-specific data.
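One way to picture how hard resolutions become retriever training data: each confirmed-resolved conversation yields (query, helpful chunk) positive pairs, with other chunks available as negatives. The field names below are hypothetical; the actual log schema and training recipe are not shown.

```python
# Turn hard-resolution logs into (query, positive_chunk) pairs for retriever
# fine-tuning. Field names are hypothetical, not the real log schema.
def build_training_pairs(conversations):
    pairs = []
    for convo in conversations:
        if not convo.get("hard_resolution"):
            continue  # keep only conversations the end user confirmed as resolved
        query = convo["issue_summary"]
        for chunk in convo["retrieved_chunks"]:
            if chunk["cited_in_answer"]:
                pairs.append((query, chunk["text"]))  # positive pair
            # non-cited chunks can serve as hard negatives during training
    return pairs
```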
We've also built a custom issue summary model that powers a part of Fin called Fin CX Summary. I think this is a good example of the emerging small-LM hypothesis, essentially that small language models can sometimes perform pretty well when trained for a specific task. Nvidia wrote a paper about it recently that's been quite influential. So, you know, within Fin, one
thing we've always done is we have
summarized the end user's query before going to do the RAG thing, because sometimes end users ask very, uh, very strange things. You know, there can be a lot of noise and a lot of weirdness before they actually ask a question, and exposing that raw to your RAG system has not been as performant for us as first canonicalizing that query. And so we've always had an issue summary layer, and this has been our attempt to kind of build our own model. We had an LLM, typically GPT-4.1, for that, as a kind of fast, performant model. Um, but we wanted to experiment with, uh, with improving it.
And one thing that's always been tricky
about issue summary is, if you say to an LLM, hey, summarize this issue, and there's no issue there, uh, they often will get confused. And so we've always had a few-shot approach to solving this problem, where we kind of say, "Hey, if there's just a greeting, or if it's just a goodbye, or if it's just, like, a negative reaction, please don't summarize the issue." We've kind of always prompted the LLM in a few-shot way to improve its performance. But I
think as anyone who's run a production
LLM at scale for a while knows, this runs into a problem: every time you battle-test an LLM and few-shot it in this way, you end up with many, many few-shot examples, and, you know, your latency goes up and your performance goes down. And so this is an
area we experimented with kind of a
custom approach. And our sort of key insight on this problem was to split the task: first, train a classifier that does a good job of figuring out whether there actually is an issue there or not. And then, if that classifier said, "Yeah, there is still an issue here," then to, uh, use a small LM to then summarize the issue. And so, uh, you know,
this is a kind of a schematic of the
classifier piece. Again, ModernBERT's been a great building block here, um, trained on, you know, data from our experience. And then we have, uh, a LoRA fine-tuned, um, Qwen 14B, uh, which was good enough to, uh, equal, in our evaluations, uh, GPT-4.1 for this summarization task. And, um, then, you know, we get a more performant model. But kind of more importantly, um, you know, there's only a slight increase in resolutions here, and a cost reduction, but more importantly, it improved the quality of our product. Um,
moving something from the LLM to a proprietary system that you control, you don't do it lightly, but it does enable you to get more fine-grained control and more fine-grained tuning of it.
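The split-task idea can be pictured as a two-stage pipeline: a small classifier gates whether there is an issue at all, and only then does a small LM produce the summary. The function below is a sketch with placeholder callables standing in for the fine-tuned classifier and the LoRA-tuned model; it is not the production code.

```python
# Two-stage issue summary sketch: classify first, summarize only if needed.
# classify_has_issue and small_lm_summarize are placeholders for the
# fine-tuned ModernBERT classifier and the LoRA-tuned small LM.
def summarize_if_issue(message, classify_has_issue, small_lm_summarize):
    label = classify_has_issue(message)   # e.g. "issue", "greeting", "goodbye", "reaction"
    if label != "issue":
        return None                       # nothing to summarize, skip the LM call entirely
    return small_lm_summarize(message)    # canonical issue summary for the RAG pipeline
```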
Um, I've got two more to go, and then I'm going to tie this into an overall narrative and then take questions. We built a custom escalation detection model for Fin as well. Um, escalation is when somebody is getting fed up with Fin and they want to talk to a human; it's just not helping them. It's very important for delivering a high-quality end-user experience. But it's a difficult thing to build a machine learning model for,
because our different customers provide
guidance in the form of free text input
that kind of defines a policy for when
Fin should escalate and when it shouldn't. And, um, sometimes, based on the guidance, the guidance will be like, definitely escalate if this happens. Sometimes it'll be like, definitely just give an answer. And then sometimes there's a gray area in between. And so we spent a while working on this. We tried, you know, a Gemma fine-tune. We tried Qwen models of different sizes, and that was good, but, um, we kept pushing to see if we could find a smaller,
better model. We actually ended up, uh, training, sort of, again, um, a custom model using an encoder backbone, using ModernBERT as a building block, with multi-class classification on top of it: multiple different, uh, classification heads. And, um, this worked out really well for us. In the end, we got a resolution rate increase, uh, latency decreased by about half a second, cost per resolution decreased by 3%, and we got finer-grained control. And it's like,
you know, each one of these models is is
incremental. It's like 3% here, 5%
there. But it all adds up. You know,
when you do it at scale, h you start to
end up with an application that starts
to look a bit differentiated. And the
last thing I have to talk about today is
we also built a feedback model. Right? Feedback is tough, right? If you want to have a product like Fin and it's dealing with real users, it's not easy to extract feedback from that. So, you know, an example here that can kind of build that intuition: Fin might say, "Oh, yeah, to cancel your subscription, you need to do X, Y, and Z. Was that helpful?" And the kind of thing real people say back is, like, "Yes, but apparently my email is also wrong. What should I do next?" And this
is a really hard challenge for a machine
learning system. It's getting easier,
but traditionally it's hard. We ended up
building sort of a multitask
architecture to do that where we had
like three different classification
heads. One was for like feedback. Is
there no feedback? Is it positive and
negative? Another one, has the user got
a follow-on question? And a third
classification head was, you know, have
they ended the conversation or not? With sort of a shared ModernBERT layer. And, uh, this worked really well for us. We once again got, like, a smaller, more efficient model with really high overall accuracy, and we have several other initiatives like this in flight.
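The multi-task shape described here, one shared encoder with three small classification heads, looks roughly like the sketch below. The checkpoint name, pooling choice, and head sizes are assumptions for illustration.

```python
# Multi-task feedback model sketch: shared encoder, three classification heads.
# Checkpoint name, pooling, and head sizes are illustrative assumptions.
import torch.nn as nn
from transformers import AutoModel

class FeedbackModel(nn.Module):
    def __init__(self, backbone="answerdotai/ModernBERT-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        self.feedback_head = nn.Linear(hidden, 3)  # none / positive / negative
        self.followup_head = nn.Linear(hidden, 2)  # follow-on question or not
        self.ended_head = nn.Linear(hidden, 2)     # conversation ended or not

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]       # first-token ([CLS]-style) pooling
        return (self.feedback_head(pooled),
                self.followup_head(pooled),
                self.ended_head(pooled))
```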
So that's sort of a whistle-stop tour of some of our investments. Um, but I guess, like, why am I sharing this? There's a thesis here that I want to deliver: in our experience, you can get really good
performance by replacing LLM calls with
more special-purpose models. We still use Anthropic for our hardest LLM task of, like, actually synthesizing an answer to a question from the RAG content. But all the other
pieces of Fin, like, they all work
together to add up to a good product
experience. We've been able to improve
our business metrics, improve our
resolution rate, improve our margin
substantially, and then also get much
more fine grained control in terms of
like the quality product uh the quality
tradeoffs and metrics. That's one
takeaway. We're pretty happy with how
this has worked out. Data from a high
performing Frontier application has
turned out to be a very valuable
building block for us, more so than we
anticipated, right? And I guess that's
one thing we're sharing is like, hey,
um, our production data turned out to let us make a relatively small investment and get really massive returns. Like, it really surprised us that our model, the reranker in particular, turned out to be better than Cohere. Um, and you know, Cohere is still a great model. Um, but, uh, certainly for Fin, and even cross-customer, even out of sample on a per-customer basis, um, it worked really well. So we kind of think that there's
an emerging pattern here that we would
suggest to anyone with a deep AI
product. You know, start out build your
product expensively, get the quality
really great, stabilize the product, and
then go and optimize it. And there's a
lot of room for optimizing over time.
And this is how we believe in in trying
to add sustainable value um at the AI
layer. And that's really what we're
doing. And we're sharing a lot of this information for the first time today, because in the past we've had a habit of building really good technology and, like, not talking about it much, and we're trying to change that and talk about it a lot. And so we have just published a series of blogs, if you want to get into a lot of technical detail on each one of these things I've talked about, um, available on the Fin AI research site, and obviously we have technical
presentations here. We're really
thankful to such a great audience for
coming out and I would love to answer
any of your questions uh briefly. Thank you.
I think we have roving mics if anyone
wants to uh put up their hand and ask a
>> Yeah, thanks for the talk. Uh, you mentioned using log data for training rerankers and retrieval models. Can you talk a bit more about your experiments with LLMs as teacher models? Because for things like relevance, I assume they're still very powerful, and you can distill down to ModernBERT.
>> Yeah. Um I I think LLMs as teacher
models can work, obviously. Um, you know, you have to pay careful attention to the terms of service of the LLM that you're using. Uh, there are open LLMs that are very powerful these days, and they can work very well as teacher models, and, uh, you know, definitely, we think there's a lot to be done there. We think there's an awful lot of things that people are using in production where they have a big, heavy LLM, and that can be a great way to get started. But, uh, you know, it's been surprising to us how well the ModernBERT-style encoder models can work, uh, with a teacher model, perhaps a large open-weight teacher model. So, yeah, I think that's a great direction. I'm very bullish on that direction of investment. Yeah,
maybe one or two more. Um have a mic for
Hi, thank you for taking my question and
great presentation.
>> Thank you.
>> Uh, I was curious how you are attributing acquisition of new customers to the advancements that you've made to Fin.
Um you know meaning have you gotten
customer feedback that the new um
improvements uh have led to expansion?
It could also be just a coincidence. I
was just curious.
>> Sure. No, absolutely. So, um, you know, we compete a lot on the quality of our product, and we bill based on a successful resolution. So, we were a very early adopter of outcome-based pricing. We bill a dollar when Fin successfully answers the question. And so, there is a sense in which improvements in the core product directly add to revenue for us.
But also we find that customers,
especially sophisticated customers, they
run Fin in head-to-head trials against
other competing products. And we really
encourage that. And the gold standard is
an A/B test. Sometimes people do before-and-after tests, and sometimes they do an A/B test. And, um, so we really feel that that's our differentiator, and that's something we compete very hard on. That's really the reason why we do this, and that's kind of the single biggest thing that we have deeply, deeply invested in, um, from a technology perspective. And it does work for us. It does help us win head-to-heads and convince customers to come to Fin. We're very proud of that.
So yeah, and maybe last question here if
that's okay. This uh lady here in the uh
pale shirt.
>> Sweet. Thanks. Um, I understand the value of data as, like, a core asset to build differentiation, but how do you think about integrations and context, when most users outside of, like, core B2B likely have a horizontal LLM up on their window as, like, a split screen? And how do you build, like, that platform OS?
>> right? So, I mean that is a hard
question. Um so you know
integrations and context, obviously there's a lot going on; the space is moving really fast. Uh, MCP is a huge, big change to the space, to help people pull in context from lots of different, um, you know, platforms. We have a procedures and a tasks product. We spend a lot of time helping people integrate Fin, when they use it, with their other systems, um, you know, calling APIs and being able to handle complex queries like that.
But look, I think your question is even
broader than that. It's a very evolving
field. Um there's a lot of different
players and there's a lot of people
trying to solve this problem of like I
have an AI system, but to make it really
valuable, I need to integrate it with
all these other business systems and
there's a lot of hard slog there.
There's a lot of hard work to do that.
Uh we have like teams that will partner
with a customer and try and help them do
that integration, but it's it's a lot of
leg work. It's still hard to do. MCP is
is really changing it. Yeah.
>> Is that a driver of differentiation?
>> Um, is it a driver of differentiation for us? Um, it's definitely something we're
good at and we invest heavily in. Um I
think everybody is running around trying
to connect the AI systems to the other
systems of business. Um so I I think
it's it's a valuable thing and if you
connect it to an application that does a
great job um it'll do a good job. I I'm
not sure if it's a differentiator or
not. Okay, I'd better leave it at that. Um, I want to hand over to, uh, Brett Chen, who has very kindly agreed to come and talk to us today: a tech lead, a member of technical staff at Perplexity. He's going to give a keynote on scaling intelligence. And Brett has, uh, done some great work in the past, including writing a book on lifelong machine learning. I had a great chat with him before.
Okay. Um, hello everyone. I'm Brett. Uh, today I'm going to share our firsthand experience of building AI agents and models to serve millions of users and hundreds of millions of queries at Perplexity. And I want to pass on some lessons and takeaways learned in the past and help you avoid similar pitfalls.
So yeah, at a high level: I suppose some people have probably heard about Perplexity and maybe use Perplexity, but for those who haven't, I'll talk a little bit about Perplexity as a company, and then I'll showcase some of our recent agentic products, and then I'll go into the AI agents, and then wrap up talking about some of the post-training models. So, Perplexity: it started three years ago; actually, last month we just had the three-year anniversary. Uh, right now it's valued at $18 billion. We have about 300 employees globally, and we have our office headquarters here in San Francisco, just, like, three blocks away, and Palo Alto, New York City, uh, Austin, and some other places.
Um, so this is our data from three months ago, public data. Um, and our recent data has been even stronger. But let's talk about, yeah, the data: in May we had 780 million, uh, queries, and it's 20% monthly growth, and we have 22 million active users, um, and, uh, ARR is 100 million. And more recently, our app has been ranked number three in the US App Store for productivity, and our end-of-year query target goal is 1 billion queries per week.
And the reason I said the data has been, uh, even stronger recently is because of our, uh, recently launched browser, called, uh, Comet. Um, so this is an AI-native browser that can actually help you do a lot of things, like summarize, research, automate stuff. Um, it can have a seamless interaction with you as the user and do a lot of comparison stuff, right, like book meetings, like compare products across different tabs, and research topics, without you switching back and forth between different apps. And it has your context, it has your session, and then it can adapt to your workflow and preferences, and it also stores data locally and keeps your data private and secure. And another very interesting one is we have voice integration, and we really believe that's the future of interaction with the browser, because nowadays we mostly use the keyboard and mouse, because that's just how we click the website or type stuff, right? But with an AI assistant like that, we don't actually need to type to the AI system; we can just talk to it, just like you talk to another person. So, um, yeah, so let's watch a
Pull up the clip of Jensen demoing
Perplexity Labs.
I've pulled up a YouTube video showing
Jensen demoing Perplexity Labs at GTC
Paris. It should be at that moment to
formulate what is now agentic AI. Let's
take a look at one example. Let me show
[Laughter] [Music]
>> Yeah. By the way, something I'm always amazed by: our marketing team is great at building videos. Uh, okay. So another product, uh, also a recent agentic product, is called Deep Research and Labs, and it's focusing on those long-running tasks that would take a human being hours, if not days, and we can finish them for you in minutes. Uh, it can deliver an in-depth and cited analysis report, and aggregate different web sources and documents, and, uh, remove duplicates as well as resolve conflicts, and it will summarize that and then present it for you to make the best decisions. And then you can also produce the reports in different formats so that you can easily share with others. So, for example, with the output we can build some dashboards or mini apps that you can use, uh, yourself, uh, you can create slides, and then you can also export into different formats of documents, and again iterate on them. Some
other products include like different
verticals, like Perplexity Finance, where you can look into the stock market and economic data and really tailor that to, um, the market or the, uh, events that you care about. Similarly for sports: you can go and, uh, search particular leagues, teams, and players. And then we also have, like, our Discover product, which is a feed system that, uh, curates and provides articles tailored to your interests and your needs. Um, and there are many, many other products. The reason I want to bring up these products is to give you a sense of Perplexity's scale, like the
building and that connects to the next
point of AI agent because it poses a
unique challenge to AI agent that we
want to build these centralized AI
agents that actually work for all these
products and meets all these products
needs. So okay let's go into the meat
and potatoes of this talk. AI agents. Um
So I'm going to first start with what production-level AI agents means, right, like what kind of considerations we have there, and then I'm going to talk about prompting, I'll talk about evals and personalization, and then... Okay. So, like, this is, uh, just as mentioned earlier: in the early days of LLMs, it's just a simple application talking to an LLM, right, like single-step integrations, things were really simple. And then we have the AI agent, which is the layer between apps and models, and the AI agents are, like, sort of the foundation of this layer, right, doing the orchestration, all kinds of workflow. And at Perplexity we are not just talking about AI agents as a communication between one model and one application; actually, we have access to dozens of models from different providers, and then we use those models to build this AI agent and empower many more products and applications. So here we're talking about both external models as well as our in-house models. So that's the kind of component I'm going to talk about, and that's where I lead my team, building these, uh, AI agent architectures and workflows to work for different cases.
So yeah, so let me, um, first of all talk about what kind of problem this is. So, just to abstract a little bit: instead of talking about certain components or some details, I want to frame it as a multi-dimensional optimization problem, and there are different dimensions, different objectives and constraints here. The first one is quality, and that's
what people talk about all the time,
right? Especially when the new model
comes out, right? It's like I'm I'm best
at doing this task, I'm doing that. Like
my score is X% higher than the other
ones. Um and obviously there are many
ways to evaluate it when it comes to
like accuracy when it comes to
relevance, coherence, hallucination. Um
so this is something I think most people
are familiar with. The second one is
latency. And this is something I I think
again many of you are care about like we
want the model to be fast. So we don't
want a user to wait forever, right? And actually, interestingly, when I talk with people within the company as well as outside the company, these are usually the only two dimensions they think about, and that's it, right? They just say, oh, I want a high-quality model and I want the model to be fast, and I always ask, do you have other considerations? They say no. Okay, but actually, when it comes to production-grade agents, there are other key factors.
So the third one here is actually
reliability and availability. So this is
actually the key difference in my
opinion that would distinguish
your product from others. Like from my
experience it's very easy to build a
demo or something that can achieve 80 I
would say 50 60% of success rate but
it's much harder to get to 90% 95% or
even 99%. Right? And that's where you go
from an average product to a great
product. And that's reliability. So if there's just one message you can take away from this talk, it's that reliability is what distinguishes your product. And reliability here, again, comes down to the error rate and the success rate, the uptime and all kinds of things. And when it comes to using different models, it's even more challenging, because then we need to do all kinds of load balancing. The last one, and obviously
it's still also top of mind is the cost
right we want the best model we want to
like um best quality but still they come
with a cost so there's a balance here
and so for each product we need to care
about these and there are obviously
other things as well like security all
kind of other things but these are
usually the top four that we think about
And at Perplexity we also need to think about these across different applications, because it's not just one product. Like, we cannot just feed one model to all products, right? We need to load balance; we need to figure out what model works best for each product.
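One way to picture weighing those four dimensions per product is a simple scoring function over candidate models. The weights and numbers below are invented for illustration; real routing at this scale also handles load balancing, fallbacks, and per-product constraints.

```python
# Toy multi-objective model selection across quality, latency, reliability,
# and cost. All figures and weights are made up for illustration.
def pick_model(candidates, weights):
    def score(m):
        return (weights["quality"] * m["quality"]
                - weights["latency"] * m["p95_latency_s"]
                + weights["reliability"] * m["success_rate"]
                - weights["cost"] * m["usd_per_1k_queries"])
    return max(candidates, key=score)

candidates = [
    {"name": "big-frontier-model", "quality": 0.93, "p95_latency_s": 6.0,
     "success_rate": 0.97, "usd_per_1k_queries": 9.0},
    {"name": "small-fast-model", "quality": 0.86, "p95_latency_s": 1.2,
     "success_rate": 0.99, "usd_per_1k_queries": 0.8},
]
weights = {"quality": 10, "latency": 0.5, "reliability": 5, "cost": 0.2}
best = pick_model(candidates, weights)
```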
Um, okay, so let me go into, uh, touch on some specific points. Uh, I just want to, again, share our experience, and I hope you can learn something and avoid the pitfalls. So I'm going to start with prompt engineering, and you may think, hey Brett, it's 2025, right, why are you still talking about prompting, right? It's something that happened in the early days, you know, at the beginning of LLMs. But in fact, believe it or not, even at Perplexity, as we've been doing AI from the very beginning, prompting is something that we still spend a lot of time on, and it actually requires us to refactor and redesign our system all the time, because this is also a field that is moving fast. So when
it comes to prompting, again, the concept is very simple: there are just some messages, usually of three types, system, user, and assistant, right? Like, you put the information into one of the types, put them together, and make a call. Okay, but when it actually comes to the production level, there are a lot of considerations. For example, a single prompt versus multiple prompts: what does that mean? Previously we had this mindset that, oh, we have a new product, let's create a prompt; we have another model, let's create a prompt, right? Why not? But then we got an exponential number of prompts, and then no one can manage it. Like, let's say I want to update one prompt; I don't know if I need to update the others, and these prompts all look very different from each other. So then we started doing all the refactoring, making sure we have modules, making sure we share templates, right? Like, people just cannot create random prompts for their products; we have some guidance there.
And then there's an interesting question about, okay, who actually owns the prompt, right, when it comes to different teams? Should it be the product team, who actually know the product better, or should it be, say, the AI team, who actually know the prompting better, but may not know the products as well as the product teams? So again, it's a balance; it's some sort of, um, middle ground. But again, that's something that, when it comes to different teams and products, is very tricky to figure out. Um,
another thing: context engineering. I think some people asked about context before. Like, some people say context is everything; in some sense I agree. Like, context is what makes your quality go to the next level. But then, when it comes to what context, like what kind of context you want to put in, do you want to just put in as much as you can? Um, that obviously comes with its side effects, and that connects to the next one: prompt caching. And this is actually a pretty big one. Uh, like, this is one of the reasons that we did a lot of redesign of our system, because previously we just didn't pay attention to it, and we were not following the best practices, and we had people just inject, uh, inject fields into the prompt as they wanted, and, like, okay, that breaks the prompt caching, and we just leave free money on the
table. So, one rule of thumb here: if you're working on multi-turn, uh, agents, um, you try to get your prompt cache hit rate up to 80% or more, but if you constantly see the rate going down below 50%, that's something you should take a look at.
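The practical upshot of that rule of thumb is to keep the large, stable parts of the prompt as an identical prefix on every request and append the fast-changing parts last. A hedged sketch of that message layout follows; the exact roles and fields are illustrative, not Perplexity's implementation.

```python
# Sketch of a cache-friendly prompt layout: stable system instructions and
# tool definitions first (byte-for-byte identical across requests), slowly
# changing context next, and the fast-changing user turn last, so the
# provider's prefix cache keeps hitting. Field layout is illustrative.
def build_messages(system_prompt, tool_definitions, memory, user_turn):
    return [
        # Stable prefix: identical on every call -> cacheable.
        {"role": "system", "content": system_prompt + "\n\nTOOLS:\n" + tool_definitions},
        # Slowly-changing context (per-user memory) comes next.
        {"role": "system", "content": "USER MEMORY:\n" + memory},
        # Fast-changing content last, so it never breaks the cached prefix.
        {"role": "user", "content": user_turn},
    ]
```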
Um, and then there's also being eval-driven, right? Like, um, again, the thing we did before was, the product engineer just, um, received a bug, did some vibing, and came up with some queries, and then they changed the prompt. They did what we call vibe checking, and if things looked good, okay, we shipped it. Um, that works sometimes, but oftentimes it didn't, because as individual people change a prompt, no one actually manages it and makes sure this prompt still works for all the cases.
And so I'm going to briefly touch on eval. When it comes to evals, I'm mostly talking about LLM-as-judge evals; the rest is a different piece, so I will skip it here. So when it comes to the answer, there are the things I mentioned before, but what's interesting right now is it's not just an answer, it's not just text, right? We're talking about more advanced output, like mini apps or slides, right? With slides, you don't just have content; you have the images, you have the flow, right? How do you make sure that makes sense? And when it comes to things like the browser, you have actions, right? You want to click certain things, you want to type certain things, you want to, like, book certain things. How do you know it actually works or not? So
that's some very interesting ones. And
then format and styles. That's another
interesting one that people just, you
know, some people prefer paragraphs,
some people prefer bullet list, some
people just want a short answers, some
people want a long answers. So how do we
figure out and that connects to the
personalization and so personalization
eval is definitely another green field
opportunity. So let me also briefly talk
about personalization and memory. Um, so at Perplexity we really believe personalization is what makes your AI product stand out, because that's what makes the user feel like, oh, this product, this AI, actually understands me, they can actually solve my needs. So we treat personalization memory as a first-class citizen in the AI agent, and what that means is that we do a lot of work in figuring out what exactly should be stored as memory for users, and that includes, like, short-term and long-term, and that includes how we can actually do real-time updates. And this is
actually a very interesting one that we
build an entire infrastructure just to
make sure that if you tell me, I like reading books, and you ask me right away, what do I like, I will tell you right away, you like reading books. And while this sounds simple, again, when it comes to LLMs, things are slow, things are brittle, so you actually need a very, um, sophisticated infrastructure to enable it.
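The "tell me you like reading books, then ask right away" behavior boils down to memory writes that are immediately readable on the next turn. A toy sketch follows; it is nothing like the real infrastructure, which has to handle slow, brittle LLM calls around it.

```python
# Toy real-time memory store: writes are visible to the very next read,
# rather than waiting on an offline pipeline. Purely illustrative.
from collections import defaultdict
import time

class MemoryStore:
    def __init__(self):
        self._facts = defaultdict(list)  # user_id -> list of (timestamp, fact)

    def remember(self, user_id, fact):
        self._facts[user_id].append((time.time(), fact))  # real-time write

    def recall(self, user_id, limit=20):
        # Most recent facts first, available immediately after remember().
        return [f for _, f in sorted(self._facts[user_id], reverse=True)[:limit]]

store = MemoryStore()
store.remember("u1", "likes reading books")
store.recall("u1")  # -> ["likes reading books"]
```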
Um, and then on the product side, obviously you want to have transparent memory management, as well as privacy: for sensitive information, there are certain things we don't want to store in memory.
Okay. So, um let's also uh briefly talk
about the um MCP and the tools. So, so
this is again a fairly new field, right?
Like MCP is sort of started gaining I
would say industry traction probably
early this year, right? So, it's still
new a lot of ideas out there. So, I want
to share like what what we have been um
trying. So one thing we found is that instead of having universal tools, MCP, meaning you just feed as many MCPs to the model as possible, that just doesn't work. Like, you need to figure out what the high-impact ones are, what the ones are that actually make a difference, right? I think probably people say search, right, that's one of them, and maybe coding; so these are some common ones. But for your product, what is actually needed, what kind of MCPs are needed, and then also make sure of reliability.
Um, something we always experience is that these tools and MCPs are not reliable. Uh, many of them are not reliable because it's so new, right? People just rushed to build them, and then they have a lot of limitations. And we have also been thinking internally about how we build an ecosystem, right? As there are more and more MCPs out there, how do we actually figure out how to integrate them, how to leverage them, how to figure out, given a particular user request, what are the best MCPs to use?
And another interesting one, related to MCP, related to the model, is how to actually manage state. Because as the tasks get longer and longer, we want, with the browser, to spend minutes and even, later, like, an hour to help you achieve something. The longer it takes, the more problems you're going to get, right? The model may be broken; things can really, uh, get worse. So how do we actually make sure that we can recover and backtrack if things don't go our way?
Okay. So let me, uh, wrap up quickly with some post-training, uh, stuff. So I'll talk about, uh, two items: the system and the reinforcement learning. So there are different challenges we have been facing when it comes to post-training. Uh, one of them is just scale, right? As the scale grows, the model gets more powerful, but that comes with all kinds of, uh, challenges, especially when it comes to the infrastructure. So we have been spending a lot of time building this internally; we call it Lotus, um, a learning, optimization, and tuning system. It's an all-in-house post-training system that supports large scale and is really simple to understand and hack, right? Like, researchers come in and, like, try different configs, different algorithms, different models, and quickly come up with results. And then we also enable different state-of-the-art algorithms. Um, and on this side, this is the architecture. So
happy to discuss that offline. So
Another quick thing, another quick challenge: again, when you come to AI agents, it's not like the previous chat-based, right, or just question-answer. We want to do tool calls, we want to do MCP, we want to do things, uh, beyond just, uh, text, so that comes with different challenges. It's very noisy; uh, we don't know when to stop, right? If a user asks the AI to do things that may take hours, should we actually do it, or should we actually, like, help the user manage expectations? So we have been training our own, uh, model through reinforcement learning with our own agent and environment. For example, in this case we have our browser as the environment that, uh, takes the user actions, and we use that to train our own, uh, tool-call models. Um, again, happy to chat more about it offline. Um, yeah. Okay. With that said, that's the end of my talk.
>> Thank you very much, Brett. We are big fans of Perplexity at Intercom. Uh, Comet went viral inside Intercom; we were all comparing, all showing each other what we did with it. Um, between Fergal's and Brett's talks, I think we're, like, preaching to the choir here, but the delta between a vibe-coded demo in a weekend and an at-scale, performant system is orders of magnitude of effort, uh, and there's historically not been enough, uh, sharing of the knowledge that we're all learning about this. So I love an event like this, where we're getting deep into the details and exposing it to people to learn from each other. We're going to take a short break now. Uh, grab a drink, stretch your legs, uh, and we'll come back again in 10 minutes, uh, for our last session. Thanks.
[Music plays during the break]
Okay, welcome back. Thanks everybody. The technical papers will be open at the end of the event as well. Hope you got to enjoy the break. Uh, hope you're enjoying the event so far. We have two sessions left. The first is Molly Mahar from Fin, who is going to talk to us about some of the lessons that we have learned, the hard truths we've learned building AI products. And then we're going to have a panel discussion, uh, with amazing leaders from Cognition and Harvey and Fin talking about their own lessons. Um, so please put your hands together and welcome Molly to the stage.
So, so far tonight you've been hearing about the technical challenges of
about the technical challenges of building AI. Um, I want to talk from a
building AI. Um, I want to talk from a different angle. I want to talk a bit
different angle. I want to talk a bit about the people and the org challenges
about the people and the org challenges of building AI products. So, as Fergle
of building AI products. So, as Fergle mentioned, intercom has been around for
mentioned, intercom has been around for a while. Um, so we have habits, right?
a while. Um, so we have habits, right? We have processes. And two and a half
We have processes. And two and a half years ago, we had to become an AI
years ago, we had to become an AI company. So we had to redesign, rethink
company. So we had to redesign, rethink how we design, how we ship, how we
how we design, how we ship, how we organize ourselves, right? So that that
organize ourselves, right? So that that pain of cultural change that that Ferggo
pain of cultural change that that Ferggo mentioned is the process of us like
mentioned is the process of us like doing things poorly, failing, picking
doing things poorly, failing, picking ourselves up and doing it again and
ourselves up and doing it again and again and again, right? Because a
again and again, right? Because a company is just a group of people with
company is just a group of people with their own habits, with their own
their own habits, with their own incentives, with their own expectations.
incentives, with their own expectations. And so when you're pivoting into AI, you
And so when you're pivoting into AI, you are asking all of these people to change
are asking all of these people to change the way that they work. And I don't know
the way that they work. And I don't know about you all, but like I find it very
about you all, but like I find it very hard to change other people, right?
hard to change other people, right? So these people challenges can sneak up
So these people challenges can sneak up on you if you're not prepared. And so
on you if you're not prepared. And so tonight, I wanted to share um five
tonight, I wanted to share um five painful truths that that I've
painful truths that that I've experienced at least working on AI
experienced at least working on AI products with the hope that they're at
products with the hope that they're at least on your radar um if you're
least on your radar um if you're building things uh might make your lives
building things uh might make your lives a little bit easier. So, let's get to
a little bit easier. So, let's get to it. Um, my truth number one, demos are
Truth number one: demos are dangerous. Product orgs love to share early, share often. But when you're doing AI, that gets really risky. A shiny demo hides brittleness, it hides hallucinations, it hides integration gaps, right? When you're demoing something, you're making a promise about the quality of what you're building. And if you demo too early, you get product leaders who set marketing launches before anything's done. You get teams aligning around this thing that you haven't even built; they're prematurely rationalizing it into the product. We've seen that at Intercom. We've seen product teams outside of the AI group with some small AI feature that's kind of a thin wrapper, so they think, "Yeah, this will be easy to build ourselves." They do a happy-path demo, people get really excited, they get buy-in, then they start to build it, and it all kind of collapses in on itself as they meet the unhappy path and they need our help. So here are some ways we've figured out to deal with this situation. We, the AI group, provide advice and testing resources to other product teams so we can set them up for success. We keep a lot of projects secret, or at least on the down low, until we're ready and we feel they're good enough. And when we do finally think they're good enough to demo more widely, we hedge those demos: this doesn't work, this is unstable, here's where we are, here's how far we have to go. We did that with the Finn alpha: we were working in secret, and then when we finally demoed, we were just very clear about what's risky and what's still unknown. So the takeaway there, I think, is demo only what you're willing to be accountable for, and be really clear about what's risky and what's unstable, because as soon as you show something shiny, that polish communicates a stability that your product does not have yet.
That takes us to truth number two: polish is a trap. Do you hear that much from a designer? There's a normal tension in product teams: do we ship fast, or do we ship high quality? But when you're working in AI, you have this new element, which is: is this thing even feasible to build at all? So you've got designers who really want to polish something, you've got product leaders who want brand consistency, and then you have ML teams who need to get something into users' hands fast. Balancing that is really hard. Intercom has always had this strong critique and feedback culture. We have the concept of curious, minor, and major feedback that we give. We also disagree and commit a lot. But even those healthy processes have not stopped us from falling into this death spiral, I'll call it, where you're working on the design for something and your design is shaping what the output of the model needs to be. So if you're revving on design too much, you're slowing down the progress of the model, and then you're not actually able to make it robust, and you can't design in response to that, and it goes around and around and around. You get stuck, and it's super painful and super frustrating. So how do we try to handle that? We've kept the feedback culture; that part's actually great, the transparency and clarity we have between teams. The new thing is my role here, the AI designer role. We sit between the design team and the ML team and act as a bridge, so we join a project from day one, we ensure the system quality from a UX point of view, and we balance that with polish. So we're actually in the weeds with the scientists. As an example, when we were building the Finn alpha, I was working on that and felt there were gaps in the quality of Finn's answers, and we were really close to going to launch, and I thought these were not quite good enough. So I went in and started writing my own prompts: doing prompt engineering, writing my own variations, doing offline evals and testing, getting a sense of how the model works and what its limits are, coming up with a good proposal to make to Fergal and to the scientists, convincing them that my version was actually better, that it was a better experience for the users, and then handing that off to the scientists to make it robust. And that's what ended up launching.
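To make the "offline evals" idea concrete, here is a minimal sketch of the kind of prompt-variant comparison described above: run a few logged questions through each candidate prompt and compare scores. The prompts, test cases, and the model and grading stubs are all hypothetical stand-ins, not Intercom's actual tooling.

```python
# A minimal offline-eval sketch: compare two prompt variants on logged questions.
# Everything here (prompts, test cases, generate(), grade()) is a hypothetical stub.
from statistics import mean

PROMPT_VARIANTS = {
    "baseline": "Answer the customer using the provided help articles.",
    "proposed": "Answer concisely, cite the article used, and admit uncertainty when unsure.",
}

TEST_CASES = [
    {"question": "How do I reset my password?",
     "reference": "use the forgot password link on the sign in page"},
    {"question": "Can I export my data?",
     "reference": "yes from settings under data export"},
]

def generate(system_prompt: str, question: str) -> str:
    # Stand-in for a real model call (an API or an internal model).
    return f"placeholder answer about: {question.lower()}"

def grade(answer: str, reference: str) -> float:
    # Crude token-overlap score; in practice this might be an LLM judge or human review.
    ref = set(reference.split())
    return len(ref & set(answer.lower().split())) / len(ref)

for name, prompt in PROMPT_VARIANTS.items():
    scores = [grade(generate(prompt, c["question"]), c["reference"]) for c in TEST_CASES]
    print(f"{name}: mean score {mean(scores):.2f} over {len(scores)} cases")
```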
So that's how we act as a bridge, and it helps keep this instinct to polish in check. The takeaway there, I think, is to resist the urge to polish before feasibility and value are actually proven, because debates about value and about polish sound like they're about craft, but they're really about focus. In AI, your focus sometimes has to shift overnight: there's a breakthrough, there's a dead end, and suddenly your roadmap is out the window. That's truth number three: roadmaps will fail you. Any static plans you have are just going to collapse when the models surprise you. It just does not work; you have to be reprioritizing all the time. Intercom used to work in these six-week product cycles. I've seen the nice schedules the teams used to have: each week they knew exactly what they were working on for the next six weeks, and they had cross-team alignment. That is no more; that's totally gone. Instead, what we do now is this flexible workstream model where people and tasks can get reallocated on demand, so the shape can always be shifting and we can be really responsive to the needs of any project at any time. As we do weekly planning, we ask: are there any big bets that we're not making that we need to make right now? We also ask: what are the items we most need to derisk this week? So we're kind of like a multi-armed bandit: we're exploring big bets that we haven't made, and we're exploiting the things that we know have value and need to build deeper on.
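The multi-armed bandit framing above is an analogy for planning rather than a description of Intercom's tooling, but for readers who haven't met the term, here is a minimal epsilon-greedy sketch of the explore/exploit trade-off it refers to. The workstream names and payoff numbers are invented.

```python
# Epsilon-greedy illustration of the explore/exploit trade-off behind the bandit analogy.
# The workstreams and their hidden payoff odds are made up for illustration.
import random

workstreams = {"big bet A": 0.2, "big bet B": 0.5, "known winner": 0.8}
estimates = {name: 0.0 for name in workstreams}   # running payoff estimates
pulls = {name: 0 for name in workstreams}
EPSILON = 0.2  # fraction of weeks spent exploring instead of exploiting

for week in range(100):
    if random.random() < EPSILON:
        choice = random.choice(list(workstreams))     # explore: a bet we haven't made
    else:
        choice = max(estimates, key=estimates.get)    # exploit: the best-known bet
    reward = 1.0 if random.random() < workstreams[choice] else 0.0
    pulls[choice] += 1
    estimates[choice] += (reward - estimates[choice]) / pulls[choice]  # update running average

print(estimates, pulls)
```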
And that rhythm feels natural to ML folks, but it feels like total chaos to product teams who are used to working in those six-week cycles. So there are hidden costs to having to work this way. There's relationship management: because people are always shifting around, you have to keep earning people's trust, and you're always renegotiating everything. That's tough, and it's hard for people to do over and over and over again. But ultimately survival is about this ruthless, constant reprioritization, so you just have to deal with it. Because customers, execs, other teams, they all want something from you, and that's why no is non-negotiable. No is necessary. That's my fourth truth of the night. Customers are great. I love customers. They have a lot of expectations, they have a lot of requests. How do you choose what to build for them? We are lucky: we get a lot of good feedback that's grounded in real workflows. When we get negative feedback, like "this will not work for me," that's great; we can really trust that. But when we get positive feedback, like "I think this would be really cool," it's a lot harder to know whether that's a real need they have or whether they just saw something shiny on someone else's demo. So parsing out what you should actually work on can be very, very hard, and it's also very scary. It's scary as a person when you say no to a customer and they're threatening to churn, or you say no to your execs who want something specific, or you say no to five other teams who want something from you, and you just feel like a dirtbag for saying no all the time. We just make Fergal say it all the time, so that's easier for me. But saying no lets you work on the bets that you are making. So what do we do to manage this? Well, we use usage data and honest feedback to try to separate the shiny stuff from the real stuff. We look really hard at whether something we're thinking about making is a good business decision or if it's just a really expensive API call in disguise. And generally our default answer is actually no. It gets easier the more you do it, because the demands will overwhelm you. Saying no is not failure; it's focus. But one of the hardest things we've gone through lately is that no only works if people have the authority to make it stick.
So the last truth is: ownership can sink you. Products live and die by who's the DRI, the directly responsible individual. If you have the wrong owner at the wrong time, you can totally sink your product, because people have their own agendas. The ownership model Intercom had before was this triad model, with a PM, a designer, and an engineering manager, and they made decisions collaboratively. That's great for working together and having a lot of agreement, but it's a lot slower; it dilutes decision speed. So we've tried new things and new models as we work on AI products. We tried just PME teams, but two things we've noticed: marketing pressure tends to creep in there and push to launch too early, and it can be hard sometimes to say no to a lot of demands from some big customer that you might be trying to build for. We've also tried MLE teams, and that can be hard too: you build it, it's great quality, but then you have to hand off to a product team to own it, and they might feel a lack of ownership, or a lack of vision in what you've built, and so you might not get enough investment afterwards. So what we do now is have a PM as the DRI, and then a strong technical ML lead and a design lead who advocate for our positions. But with the PM you have a single decision maker, so you can move faster, and we've found that to be pretty smooth. It's not perfect; we're still working things out. One of the tough things is that it doesn't necessarily work the same way for every project, because you've got different people, and people are not totally interchangeable across different projects, so things work differently. But the takeaway, I think, is that you have to scope decision rights as carefully as you're scoping your features, because if your ownership fails, all those nos you said mean nothing: your roadmap still collapses, polish doesn't matter, and your demos were all false promises, because your AI product is just off track. So I think competing in AI means you have to live these truths every week. You can't just ask, is our model ready? You have to ask, is our company ready to handle all this pressure? Am I ready to hold the line in a tough situation? I think one half of survival is: do you have a really great model, do you have really great tech? But the other half is: do you have a bunch of people who are willing to deal with really uncomfortable situations and hard stuff, maybe go cry in the bathroom, and then come out, work together, and leave at the end of the day with a smile on their face, happy to come back the next day? Because I think you need both of those parts to really be successful at building AI products.
Thank you. That's it.
Thank you so much, Molly. Truly hard-fought lessons. Over the last couple of years, as we've transformed Intercom from a historical SaaS company to an AI-first company, it has been blood, sweat, and tears. Everything we thought we could take for granted has changed. It's been really, really fun. Okay, our last session: we're going to do a panel with amazing leaders from these companies. We have Nico Grupin from Harvey, the AI for law firms and the Fortune 500. We have Silas Alberti from Cognition, the team behind Devin and Windsurf. And a person called Fergal Reid, behind a product called Finn you might have heard of. Please welcome all to the stage.
[Applause]
Okay, thank you very much for doing this, folks. Our thesis tonight is that building great AI products turns out to be more than just a thin wrapper around an LLM, to say the least, and that there's durable advantage in that. My softball is: do you agree, and what does that mean for you? Maybe start with you, Nico.
>> Yeah, we can go in a number of different directions with this one. First of all, I think the way you frame the question absolutely aligns with my mental model, which is the product as the focal point. I think our story actually starts even one step earlier than that, which is partnering with our customers, in this case law firms: embedding ourselves, immersing ourselves in their workflows, and understanding their core problems. Something we take a lot of inspiration from at Harvey is unreasonable hospitality. This is of course a reference to Will Guidara and his book, and to Eleven Madison Park, the restaurant in New York, and the operation they've been able to spin up quite successfully, in large part due to their unique approach to services-based work. And what this means from a product development standpoint, what we're trying to convey by taking that as inspiration, is that it actually starts with deep customer obsession. Who are your users and what are their core problems? Only after you understand that can you work your way backwards to the product and product experience needed to solve those problems, and then you work your way backwards to the AI models and systems needed to support that product experience. In my experience, it's really challenging to try to push AI functionality in the other direction, and given how quickly the ecosystem is moving, there are a lot of incentives and external pressures to do that. And then, when it comes to actually building a product, I get asked this question all the time: what's challenging about building at the application layer? I think the reality is, and I really don't think enough people talk about this, that it's all challenging. There's quite literally no easy part. There are certainly frameworks you can use to scope or frame the difficulty of a problem. In the AI world, the things I'm thinking of are things like: how much domain expertise is required to solve the problem? How verifiable are your outcomes? How much does the problem space rely on manual processes or tribal knowledge? And that helps you frame the AI problem, but it's just one in a basket of other problems, including infrastructure, integrations, security and privacy, which are table stakes for enterprise use cases, and of course intuitive UX and design. So the main takeaway for me is: when you sign up to build at the application layer, you're signing up to solve this whole problem, and you have to master each of the individual components to deliver a valuable product to your customers.
>> And it feels fractal, right? It feels like getting to five nines of reliability or something: the deeper you go into it, the harder and harder it gets. And you start with, the way I interpret it, domain expertise; before you even get to the AI, you really need to deeply understand what problem you're trying to solve.
>> Silas, I've heard something similar about Devin. Something that's very compelling about the way I've heard Devin framed is that it's not a coding agent, it's trying to solve the job of software engineering. Do I have that right?
>> That's correct. Yeah. I think I also very much agree with the overall thesis here. I feel like two years ago everybody was talking about how, oh yeah, the labs are going to eat everything, these are just thin wrappers around them. And we started out being this applied AI lab, not really sure initially what was going to be the bulk of the stack that we would own. And then we started actually talking to customers and building stuff for them, and noticed that the problem of actually delivering value to real engineering organizations is a very deep product problem. So it started with all the infrastructure for actually enabling software engineering agents to work in real enterprise environments: from virtual machines to run the code, to all the plumbing to connect them to your AWS and your Jira and your Linear and your GitHub, and also all the different interfaces you want to build, whether it's a web app, integrating with Slack, and even now the IDE with Windsurf. And I think the other deep product problem that we think a lot about is interfaces. On the one hand, obviously the IDE is a pretty big interface for software engineering, but we all kind of believe that might not be the interface in five years. We also don't think it's just going to be a chat. A lot of what we think about is: what is the real future interface for how people write code? We think it's a pretty challenging design problem that also involves co-designing ML systems and even models, and I think it hasn't been solved yet.
>> No, it definitely hasn't. The idea of, well, we'll get into the model stuff, but certainly the idea of interacting. There's something in my mind about this, and I'd love to understand the equivalent in the law use case, but we think about it a lot with Finn: if you're trying to solve, for us, the job of replacing what a customer service rep does, it isn't just answering questions. They have to interact with the rest of the team. And Devin, Silas, the surfaces, how it interacts with the ecosystem it's in: what kind of interaction patterns and services are you imagining might be in the design envelope there?
>> A lot of things. I mean, it starts with the source of the task, right? The task might come from right in the developer seat in the IDE, but it could also come from some customer bug report in Slack where somebody tags Devin right in the thread. It could also come from some issue tracking system like Linear. And we even imagine a lot of other sources of tasks that are already possible with MCP integrations, like Datadog alerts triggering automatic agent triage.
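As a concrete illustration of that "alert becomes an agent task" pattern, here is a minimal sketch of a webhook receiver that turns an incoming monitoring alert into a task for a coding agent. The payload fields and the create_agent_task helper are hypothetical stand-ins, not Devin's or Datadog's actual APIs.

```python
# Minimal sketch of the "alert becomes an agent task" pattern described above.
# The payload fields and create_agent_task() are hypothetical; real monitoring
# tools and agent platforms each have their own webhook and API shapes.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def create_agent_task(title: str, context: str) -> None:
    # Stand-in for calling an agent platform or issue tracker API.
    print(f"queued agent task: {title}\ncontext: {context}")

class AlertWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        alert = json.loads(body or b"{}")
        # Turn the alert into a triage task with enough context for the agent to act on.
        create_agent_task(
            title=f"Triage alert: {alert.get('monitor', 'unknown monitor')}",
            context=alert.get("message", ""),
        )
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), AlertWebhook).serve_forever()
```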
>> Or Finn. I'm looking forward to the day where Finn opens a task for Devin in Linear and then gives Devin a hard time: hey, it's been two weeks and the customer's asking for an update, what's going on, Devin? It's going to happen, right?
>> That would be sick. Yeah, right from customer bug report in Finn to PR.
>> Yeah, absolutely. And the law use case: what is the shape of the interfaces you might have in Harvey?
>> Yeah. Well, first of all, I think Silas's point on interfaces is spot on. In fact, we call the discipline applied research at Harvey very intentionally, and what that's intended to convey is that our responsibility here is actually equal parts AI and HCI, human-computer interaction. On the HCI side, it's all about what is the right mode of interaction with the models and with these AI systems, not just generally, but for our specific users, who are legal professional services practitioners. The biggest transition we've seen here is that there are some extremely complex tasks these folks are taking on on a day-to-day, week-to-week basis. Imagine something like fund formation: if you're a private equity firm, you're raising a new fund. This is something that will take multiple weeks, potentially multiple months. There are negotiations between a number of parties, correspondence between lawyers, between lawyers and clients, with LPs to negotiate specific terms and carveouts and side letter agreements. An incredibly complex process, right? It's not clear, and in fact I'd go so far as to say it's not going to cut it, to have a light, multi-turn interaction or a chat interface for that. Really what our users are craving, and the direction we're starting to steer our product, is towards a persistent workspace that houses all of the data and information and work product that you need. So if you're going to raise a new fund, you can show up to this workspace. It has all of your historical precedent from deals you've done in the past, intermediate work product completed by the legal team as you go, all of the correspondence, email threads back and forth, attachments with legal counsel and clients, and then eventually the finalized, polished work product that you end up using to sign and close the deal. All of this is self-contained in one workspace. And then as the process unfolds in these phases, you can delegate tasks to agents to complete along the way, you can delegate tasks to humans to complete along the way, you can delegate tasks to human-agent teams to complete along the way. And we're already getting asked for this to be collaborative and shared between law firms and their clients. So we see a transition from these lightweight, almost ephemeral interactions to something that's persistent, has memory, and is self-contained.
>> Sorry, what I was going to prime Fergal on was: this is the amount of depth required in really not just executing a task but solving a job end to end. That's the thing that's occurring to me in both of these cases.
>> Yeah, I just think it's fascinating to listen to this, because there's all this narrative around AI getting more and more general: will it solve all these problems really, really quickly, is it going to be six months and it's doing everything? And there's just so much complexity to any given task. I mean, we see that in customer service. We often think of customer service as the exception handler or the system integrator of last resort, right? It's the thing you go to when you need a human to navigate your org and make something happen because the system couldn't do it automatically. And so you bump into the same thing, which is that there's just an insane amount of complexity to actually doing real tasks, not in a lab or a back test but in a real, messy environment. It's fascinating to hear that across the different disciplines and across the different domains.
>> The messiness of all the different interactions, the idea of escalating to another AI agent or escalating a problem to a human, getting a human to go do something for you; there's a lot of domain complexity there. But even, and this is what's interesting, even if it's just executing a task that requires looking up data and integrating with an API,
>> getting a complex process to execute reliably is exceptionally hard. Any lessons, maybe starting with Fergal, things we've learned from that idea that keeping errors low on a complex task is a hard problem?
>> Yeah, we've learned it's really, really hard. You know, we have this tasks product where you'll go and try to do something like issue a refund or something like that. And it's very easy to get a demo that works now and again. But it's very difficult, if you have six different steps that need to be completed reliably, to not have an error creep in. And so our approach at the moment, with where the technology is, is to try and build a tool set to help our customers factor that big complex workflow they're trying to do into subcomponents that an LLM can reliably execute on. That's the direction we're going.
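To make the "error creep" point concrete: if each of six steps independently succeeds, say, 95% of the time, the whole workflow only completes cleanly about 73% of the time. A quick sketch of that arithmetic, with made-up per-step reliabilities:

```python
# Illustrative arithmetic for how per-step errors compound across a workflow.
# The 95% and the step counts are made up for illustration, not Finn's real figures.
step_success = 0.95
for steps in (1, 3, 6, 10):
    print(f"{steps:>2} steps at {step_success:.0%} each -> {step_success ** steps:.1%} end-to-end")
# 6 steps at 95% each -> about 73.5% end-to-end, which is why factoring workflows
# into smaller, individually reliable subcomponents matters so much.
```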
But yeah, it's really hard, you know; this is definitely the frontier. I think it's very easy to make a demo, but it's an awful lot of work to actually complete this long, multi-step running process in a messy, unconstrained world. I think that's still very frontier and maybe some distance away, certainly from LLMs doing it out of the box. I think you're going to need product scaffolding and building blocks and everything around that for a long time.
>> Yeah.
>> That's at least our thesis.
>> Yeah. No, I totally agree. We have basically the exact equivalent of that problem. Obviously, a common legal task is large-scale document review, so we have a product called Vault that's intended to handle these sorts of use cases. Lawyers can upload a hundred thousand files at a time, and we're in the process of increasing that to a million files at a time. And for those of you who have interacted with lawyers, theirs are not your typical documents. An actually really common use case we have for our Vault product is to upload, analyze, and extract key terms from credit agreements and loan agreements. If anyone here has worked with a loan agreement, a single one of these things can be 400,000 tokens in length, which is, for those who are wondering, longer than the Dune novel, which is enough content for two movies. Two of these things is the Lord of the Rings trilogy, right? And lawyers don't just have one or two of these lying around; there are thousands of them lying around. So you need to be able to handle that process. And so I've said from day one: AI is kind of the star of the show right now, but the real heroes of the application layer are those who are sorting out AI infrastructure, because it's all novel infrastructure as well.
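For context on why a 400,000-token agreement is an infrastructure problem and not just a prompt, here is a minimal sketch of the usual workaround: split the document into overlapping chunks that fit a model's context window, extract terms per chunk, then merge. The chunk sizes, the extract_terms stub, and the merge rule are illustrative assumptions, not a description of Harvey's Vault pipeline.

```python
# Minimal sketch of chunked key-term extraction over a document far longer than a
# model's context window. Chunk sizes, extract_terms(), and the merge rule are
# illustrative assumptions, not any vendor's actual pipeline.
from typing import Iterator

CHUNK_TOKENS = 8_000      # how much of the document each model call sees
OVERLAP_TOKENS = 500      # overlap so clauses split across a boundary aren't lost

def chunk(tokens: list[str]) -> Iterator[list[str]]:
    step = CHUNK_TOKENS - OVERLAP_TOKENS
    for start in range(0, len(tokens), step):
        yield tokens[start:start + CHUNK_TOKENS]

def extract_terms(chunk_text: str) -> dict[str, str]:
    # Stand-in for a model call that returns key terms found in this chunk.
    return {}

def review(document_text: str) -> dict[str, str]:
    tokens = document_text.split()   # crude whitespace "tokenizer", just for the sketch
    merged: dict[str, str] = {}
    for piece in chunk(tokens):
        # First mention wins here; a real pipeline would reconcile conflicts more carefully.
        for key, value in extract_terms(" ".join(piece)).items():
            merged.setdefault(key, value)
    return merged
```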
And then one thing I want to hit on, which I think you hit on, which is really unique, especially as agents are taking center stage right now: what we're seeing is infrastructure for long-running asynchronous agents to complete increasingly sophisticated tasks. That is one mode of completing complex work, but you're not guaranteed to have the same input every time or the same output every time. There's some variance, some stochasticity baked in. So what we're seeing is that you need a complementary product that can handle repeatable, deterministic units of work, so you can identify the agent trajectories that go well and then map them to building blocks that users can interact with and execute over and over and over again.
>> Yeah, that's something we find: to do these things well you need to be able to mix generative, stochastic processes with deterministic, reliable things, and the blend of that is interesting.
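A minimal sketch of what that blend can look like in practice: deterministic, checkable steps (validation, the actual side effect) wrapped around the one step that genuinely needs a generative model. The helper names and the refund scenario are hypothetical, chosen only to echo the earlier refund example.

```python
# Minimal sketch of blending a stochastic LLM step with deterministic steps.
# All helper names and the refund scenario are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class RefundRequest:
    order_id: str
    amount: float

def draft_refund_request(conversation: str) -> RefundRequest:
    # The one genuinely generative step: a model reads the conversation and
    # proposes structured output. Stubbed here with a fixed value.
    return RefundRequest(order_id="A-1001", amount=25.0)

def validate(req: RefundRequest, order_total: float) -> None:
    # Deterministic guardrails: cheap, repeatable checks on the model's proposal.
    if not req.order_id or req.amount <= 0 or req.amount > order_total:
        raise ValueError(f"refusing refund proposal: {req}")

def issue_refund(req: RefundRequest) -> None:
    # Deterministic side effect, e.g. a payments API call. Stubbed here.
    print(f"refunded {req.amount} on order {req.order_id}")

def handle(conversation: str, order_total: float) -> None:
    proposal = draft_refund_request(conversation)   # stochastic
    validate(proposal, order_total)                 # deterministic
    issue_refund(proposal)                          # deterministic

handle("customer says the mug arrived broken", order_total=30.0)
```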
of that is interesting. Okay. So, uh, essential domain complexity, you need to
essential domain complexity, you need to understand the domain. Really hard to
understand the domain. Really hard to like once you're in that domain and
like once you're in that domain and interacting with the rest of the world,
interacting with the rest of the world, getting it to actually do things
getting it to actually do things reliably is really hard. One of the
reliably is really hard. One of the things we, you know, we're talking about
things we, you know, we're talking about today, announcing today is that we're
today, announcing today is that we're tuning our own models. Like, that's the
tuning our own models. Like, that's the next bit. Even if you do that all
next bit. Even if you do that all perfectly, there is leverage, durable
perfectly, there is leverage, durable value in in tuning your own models for
value in in tuning your own models for parts of the system. Silus, I think
parts of the system. Silus, I think that's something that the cognition has
that's something that the cognition has done a lot of like that's part of like
done a lot of like that's part of like the was that part of the appeal of
the was that part of the appeal of acquiring wind surf like
acquiring wind surf like >> yeah so we um we think about training
>> Yeah, so we think about training models in a certain way. First of all, I do think there's this interesting new development of application-layer companies getting into model training, as Finn is as well. I do think the philosophy is a little bit different. At a large foundation model lab like OpenAI or Anthropic, there are these separate research orgs that do long-term research, sometimes quite far away from product. For us, the product has always been the primary goal. So we try to go backwards from the product and figure out where a custom model would actually lift some user metric or enable a new experience. And there are quite a few places across the stack where we found this to be the case. It is true that at Windsurf there had been quite a few of these. For example, most famously, the tab model. We all know Copilot back in the day pioneered this, but actually Windsurf, before it was Windsurf, was called Codeium and had one of the early products in that segment as well, which later evolved to doing multi-line edits and tab-to-jump. That continues to be a very big focus for us. But we also have SWE-1, which is basically our frontier coding agent model. It was released in May, is still one of our most popular models, and is basically powered by reinforcement learning on software engineering tasks. And this is, for us, just the beginning; there's a lot more to come on that front. Besides that, we also see a lot of potential for training specialized models for certain verticals. For example, we released the Kevin model. It's a small open-source research project; we wrote a blog post and a paper about it, where we trained a model on a specific coding vertical, which in this case was CUDA kernel writing. But there are many more of these verticals that we work on with our enterprise customers.
work on with our enterprise customers. And the other specialization that we see
And the other specialization that we see is around speed. So very often um coding
is around speed. So very often um coding agents take minutes or even tens of
agents take minutes or even tens of minutes. And sometimes this is fine if
minutes. And sometimes this is fine if you just are like in a purely like
you just are like in a purely like delegation um mode and you maybe come
delegation um mode and you maybe come back half an hour later and review. But
back half an hour later and review. But very often also there is this desire to
very often also there is this desire to be in the loop and actually drive the um
be in the loop and actually drive the um the iteration of the agent. And for
the iteration of the agent. And for these cases we find that the difference
these cases we find that the difference between waiting like 45 seconds or 10
between waiting like 45 seconds or 10 seconds can be the difference between
seconds can be the difference between switching to another tab and scrolling
switching to another tab and scrolling Twitter or actually uh waiting for the
Twitter or actually uh waiting for the agent to be done. Um,
agent to be done. Um, >> so these are the these are the trails
>> so these are the these are the trails Brad was talking about like the the
Brad was talking about like the the different dimensions like the the one
different dimensions like the the one for us is um the latency budget we have
for us is um the latency budget we have on voice is wildly different the latency
on voice is wildly different the latency budget we have on email and we can do
budget we have on email and we can do very different things in terms of
very different things in terms of accuracy and cost trade-offs there. Um,
accuracy and cost trade-offs there. Um, and it's a it's a really hard problem
and it's a it's a really hard problem like yeah the fine tune things like one
like yeah the fine tune things like one of the things I personally get most
of the things I personally get most excited about and the stuff that
excited about and the stuff that Vertical was presenting was the latency
Vertical was presenting was the latency improvements and our ability to when we
improvements and our ability to when we have the model we can control the
have the model we can control the latency a lot more directly as well. Um,
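As a toy illustration of what a per-channel latency budget can mean in practice (the channels, budgets, and model tiers below are invented for the example):

```python
# Illustrative sketch of per-channel latency budgets driving model choice.
# Voice and email tolerate very different latencies, so they can be served
# by very different model tiers.
LATENCY_BUDGET_MS = {
    "voice": 800,      # conversational turn-taking: sub-second or it feels broken
    "chat": 5_000,
    "email": 120_000,  # minutes are fine; spend them on accuracy
}

MODEL_TIERS = [
    # (name, typical_latency_ms, relative_quality)
    ("small-distilled", 400, 0.85),
    ("medium", 3_000, 0.93),
    ("large-reasoning", 45_000, 1.00),
]

def pick_model(channel: str) -> str:
    """Pick the highest-quality tier that fits the channel's budget."""
    budget = LATENCY_BUDGET_MS[channel]
    eligible = [m for m in MODEL_TIERS if m[1] <= budget]
    return max(eligible, key=lambda m: m[2])[0]

assert pick_model("voice") == "small-distilled"
assert pick_model("email") == "large-reasoning"
```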
Is Harvey also bought into this idea that, hey, there's leverage in tuning your own models?
>> Yeah, so we've been doing this since I joined two and a half years ago, when the company was six months old. I think you bring up a good point, actually, which is that RFT is super popular right now, but distillation is still a very viable option for taking the reasoning capabilities of larger models and distilling them into smaller models that can do the same task, but much cheaper and much faster.
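A minimal sketch of sequence-level distillation, assuming you already have a teacher you can sample from and a supervised fine-tuning routine for the student (both are stand-ins here, not Harvey's pipeline):

```python
# Sequence-level distillation sketch: the large model labels the task
# distribution you care about, and the small model is fine-tuned to imitate it.
def teacher(prompt: str) -> str:
    """Large reasoning model; slow and expensive."""
    raise NotImplementedError

def finetune(base_model: str, examples: list[dict]) -> str:
    """Supervised fine-tuning job; returns the new student checkpoint id."""
    raise NotImplementedError

def distill(prompts: list[str], student_base: str = "small-model") -> str:
    # 1. Have the big model label the prompts you actually care about.
    dataset = [{"prompt": p, "completion": teacher(p)} for p in prompts]
    # 2. Train the small model to imitate those outputs.
    return finetune(student_base, dataset)
```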
We've certainly had our own model training journey here. Actually, when I joined, the state-of-the-art approach for customizing models at the time was continued pre-training, or mid-training. So what we did is we literally took all of US case law, which is somewhere between 10 and 12 billion tokens, and we did next-token prediction over it, in the hopes that we would see step-change legal reasoning capabilities emerge, in the same way that these other reasoning capabilities emerge from the models.
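Mechanically, continued pre-training is just the usual next-token prediction objective run over the domain corpus. A PyTorch-flavored sketch, assuming a model that returns logits directly and a tokenized batch of in-domain text:

```python
# Continued pre-training = next-token prediction over the domain corpus.
import torch
import torch.nn.functional as F

def continued_pretraining_step(model, batch_token_ids, optimizer):
    """One step of next-token prediction over in-domain text (e.g. case law)."""
    inputs = batch_token_ids[:, :-1]            # tokens 0 .. n-2
    targets = batch_token_ids[:, 1:]            # tokens 1 .. n-1 (shifted by one)
    logits = model(inputs)                      # assumed shape: (batch, seq, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),    # flatten positions
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```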
Long story short, it worked in part; I don't think it was enough to move the needle. This was around the same time that RLHF came along. Really, the thing we want to optimize for at the end of the day is lawyer preference over outputs, and specifically partner preference over outputs. If you can gather a few thousand examples, you can train a reward model, or do DPO directly on the preference judgments, and you're off and running.
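For reference, the DPO objective on such preference pairs looks roughly like this. The sketch assumes you can compute sequence log-probabilities under the policy and a frozen reference model; beta and the data format are illustrative:

```python
# DPO on (chosen, rejected) preference pairs.
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta: float = 0.1):
    """Each argument: tensor of per-example sequence log-probabilities."""
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # Push the policy to prefer the output the reviewer preferred.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# A preference example might look like:
# {"prompt": "...", "chosen": "partner-preferred draft", "rejected": "other draft"}
```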
I think today we're much more focused on RFT. One thing we're really interested in: if you imagine this sort of emerging AI-native stack, where you're doing inference, then model systems and tools, and then agents that operate on top of them, an area of investment for us is actually beginning to simulate some of these legal tasks end to end. So imagine the fund formation process that I described earlier. If you can simulate it in a sandboxed environment, you can actually begin to train agents to complete subtasks, or the entire process, end to end. The bottleneck there has been, and continues to be, strong verifier models.
>> I have to wrap us up.
>> Nico, I'm just really curious there: the simulation, is that using an LLM as a simulator, or is it more like a reinforcement-learning-style playground sort of simulator?
>> Yeah. So, what we're envisioning: I described the workspace concept just a few moments ago. You can allow an agent to take actions within that workspace, including the choice to interact with other humans via email. Right? So we have tools, essentially, for research, tools for drafting, and tools for human interaction. That is essentially an action space for an RL agent. The application in this workspace is the environment. And we're looking at ways that we can scale up the simulation to make it work in practice. I would say it's still forward-looking, but it's something we're investing a lot in.
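A hedged sketch of the shape such a sandboxed setup can take, where the workspace is the environment, the tools are the action space, and a verifier model scores the finished work product (every name below is hypothetical, not Harvey's actual system):

```python
# Hypothetical workspace-as-environment sketch for an RL agent.
from dataclasses import dataclass, field

@dataclass
class Action:
    tool: str        # one of: "research", "draft", "email_human", "submit"
    payload: str

@dataclass
class WorkspaceEnv:
    task: str
    documents: list[str] = field(default_factory=list)
    done: bool = False

    def step(self, action: Action) -> tuple[str, float, bool]:
        """Returns (observation, reward, done). Reward only on submission."""
        if action.tool == "research":
            self.documents.append(f"research notes on: {action.payload}")
            return self.documents[-1], 0.0, False
        if action.tool == "draft":
            self.documents.append(f"draft: {action.payload}")
            return self.documents[-1], 0.0, False
        if action.tool == "email_human":
            return "simulated reply from counterparty", 0.0, False
        if action.tool == "submit":
            self.done = True
            return "submitted", verifier_score(self.task, self.documents), True
        raise ValueError(f"unknown tool {action.tool}")

def verifier_score(task: str, documents: list[str]) -> float:
    """Stand-in for a verifier model grading the end-to-end work product."""
    raise NotImplementedError
```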
>> It's so exciting. I remember learning about reinforcement learning way back when, at a reinforcement learning startup in 2014 or something like that. Back then it was multi-armed bandits for website optimization. Now an entire line-of-business task can be the environment, or "software engineer" is an environment. It's just wild. It's exciting.
>> I'm going to wrap us up now. The thing that I keep going back to is the Alan Kay quote. I'm just thinking about the depth at all the layers of this conversation: the application layer, the AI layer, and now the model layer. Being able to have the expertise and the ability to own all of that unlocks these huge capabilities in the products that we're all building.
>> Okay. Please put your hands together for a great panel.
>> And thank you very much for coming this evening in general. Please give another round of applause for all of our great speakers: Brett, Molly, Fergal, Silas, and Nico.
And thank all of you, online and here in person, for joining us and giving us your time. It's really great. If you're interested in learning more about the technical work we've been showing today, it's at fin.ai/research online, and in person here we have technical poster sessions where you can talk to the scientists and engineers who worked on this stuff. And maybe one favor: if you can go back and talk to your customer leaders about using Finn, that would be really nice. Please hang around, have a drink. It would be great to talk to you all. Thank you very much.
[Music]