A Practical and Tactical Approach to Temporal and AI | Replay 2024
Hello, Replay attendees — glad to see you today. As you can probably tell by my accent, I'm not from around here; I'm from Belarus, which is quite a different place. But the reason I'm here today is that five years ago a single tweet actually changed the whole direction of how I think about and perceive software implementation and software architecture, and today I'd like to tell you the story from that tweet to the moment we started implementing AI workflows in the applications we build for our customers.
My name is Anton, and I'm the CTO of Spiral Scout. We have been providing software development services for customers around the globe for around 15 years. As the person maintaining the team and tasked with making sure we do a good job, as a tech leader I always have to make sure that the tools we use are optimal and that we don't spend extra time writing typical bootstrap code or anything else we don't want to write. As a passionate coder, I love to mitigate that by creating my own tools, and over the span of my career I've created a number of open source and closed source instruments — everything from frameworks to ORMs to database layers, templating engines, DSLs and so on. But as time passed and our client pool grew, and the complexity grew with it, we soon realized that even though we had our own toolkit and a team that knew how to use it — back in the day it was mostly PHP — we were lacking one very large abstraction that seemed hard to get. And that abstraction, as you know from this presentation and this conference, is a workflow engine.
So what is the first logical solution every engineer reaches for when they cannot get the instrument they want in their stack? Let's build it ourselves — very smart idea. So we started doing the research and looking into ways we could implement a workflow engine in our products. We used to work with Amazon SWF, and it looked like a very nice solution for many things, but it was still quite proprietary and hard to use in ecosystems like open source or outside of Amazon.
Once I had the first prototypes, we soon realized that the number of edge cases we uncovered by running this engine just grew exponentially; every day, every moment, we saw more and more problems arise from things we expected to just work. That's the moment I decided to step back, do additional research, and see whether there were new tools on the market, new solutions, or better patterns around. Around that time I found a similarly experienced guy on Twitter who was talking all about workflows and durable execution — the power they bring, applications that can run for days and months, and so on. So I thought to myself: okay, he has his solution, I have my own stack, why not try to talk to him and see if we can collaborate to bring it in? I wrote a Twitter message, and to my surprise this person said, "yeah, let's talk." So five years ago, a conversation with Maxim Fateev kicked off a quite long collaboration in which we created the Temporal PHP SDK, and we began to use and adopt Temporal for our own products and for the products of our customers.
At this moment everything looks very nice and cozy: we have one stack, we have a powerful workflow engine — what else can you dream about, what else do you want? And that's about the moment GPT-3 dropped on the market. Once you see this model, and once you realize what a state-of-the-art LLM can do — it can interpret your users' requests, write haiku, make jokes, or help you process any information — it becomes very obvious that there is immense potential in using these solutions to build something more complex.

Yet we saw, while implementing these solutions and building our first pipelines — summarizing tweets, doing pull request reviews, and so on — the same pattern over and over: even with this powerful technology, backed by state-of-the-art models built by the most powerful companies in the world, the actual process of implementation is not that different from 20 years ago. You still go through the planning phase, the design phase, implementation and iteration. So we found ourselves in a situation where we have the keys to a Lamborghini but we only use it to drive to Costco. Why? We still have these powerful models that we can't actually use to enhance our main work. These days, obviously, we have Copilot and many other tools,
but we decided to come back to the drawing board and challenge ourselves with a slightly different question: can we create software that is not only programmed ahead of time by engineers, but software that can actually program itself and expand its own functionality as it goes, in collaboration with the user — and, why not, maybe by itself, just trying to see how it can be optimized? By this moment we clearly knew that this was going to be a very challenging task, a very complex architecture spanning many domains and many parts of the system that have to collaborate seamlessly — which only works if you have some nice engine that helps you cope with this complexity. And this engine is obviously Temporal, since I'm speaking here today. So let's dive in and see what we can do in terms of LLM payloads and LLM workflows within your Temporal application.
The first thing we have to do to talk about that is to properly define the boundaries: how do we actually define LLM calls within our workflows? Surprisingly, in terms of the actual workflow implementation and the actual data flow, an LLM can be defined quite easily: it's a black box, and many engineers actually treat it as a black box as well. It's a very powerful and magical abstraction — you put some data in, you get some data out. Sometimes this data is good, sometimes this data is just garbage; that's something we have to live with. At the same time, as you know if you use LLMs, while this solution is extremely powerful and extremely versatile, it's also quite unreliable: you will see everything from failures on API calls, to timeouts, to the plain situation where the AI just says, "you know what, I don't want to do this job." So what can you do about that? If you take a look at this implementation pattern, on one side of the equation you have an extremely powerful abstraction which is highly non-deterministic and highly unreliable; on the other side of the equation you have an engine that was designed to mitigate exactly things like that — to write deterministic, very durable workflows and implement them in a quite easy fashion. So it makes total sense to combine them: you use one engine to mitigate the issues created by the other.
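As a minimal sketch of that idea in the PHP SDK — the CompletionActivity interface and all names here are illustrative, not from the talk — the unreliable call lives in an activity, and Temporal's timeout and retry policy absorb the failures:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Common\RetryOptions;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Hypothetical activity wrapping the LLM client: data in, data out.
#[ActivityInterface(prefix: 'llm.')]
interface CompletionActivity
{
    #[ActivityMethod]
    public function complete(string $prompt): string;
}

#[WorkflowInterface]
class SummarizeWorkflow
{
    #[WorkflowMethod]
    public function run(string $text)
    {
        // The unreliable black box becomes just another activity call;
        // Temporal's timeouts and retry policy absorb API failures,
        // timeouts, and the occasional refusal to answer.
        $llm = Workflow::newActivityStub(
            CompletionActivity::class,
            ActivityOptions::new()
                ->withStartToCloseTimeout(CarbonInterval::minutes(2))
                ->withRetryOptions(RetryOptions::new()->withMaximumAttempts(5))
        );

        return yield $llm->complete("Summarize the following text:\n" . $text);
    }
}
```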
If you're seeking to implement an LLM-powered application, you're most likely going to start with two quite simplistic patterns, which in many cases will probably cover 80 or 90% of your whole LLM workload. You're going to start with RAG pipelines — pipelines designed to go to some data source, maybe a vector database, maybe an external website or anything else, gather the information most relevant to the user query, and return that information in a way the user, or maybe another AI, can comprehend and act on. On the other side you have the type of workload which does much the same thing; the only main difference is that now, instead of returning text to the user, you actually perform some kind of arbitrary action based on a decision made by the LLM on behalf of the user. It's as easy as sending an email asking to cancel your order — and then any order gets cancelled and the account deleted. Well, be careful what you wish for when you work with LLMs.
Looking deeper into RAG pipelines — and in pretty much every paper you'll find about RAG pipelines — you will see they have some distinctive steps. You always have the parts that collect, aggregate, normalize and chunk the data, embed it into a vector store, maybe reshuffle or cluster it; and on the other side you have the parts responsible for retrieving that data and pushing the answer to the user. But the most curious part about RAG pipelines, if you look at how they are displayed in these papers and in pretty much every article people write, is that they all have distinctive steps, they all have these blocks with arrows between them — which, surprisingly, looks exactly like what we need: it is simply a data workflow with data passing.

I'll be showing examples today in PHP, but I do that mostly for visual purposes; it can easily be done in any language you love — Python, Node.js — Temporal allows you to switch stacks quite easily. If you're going to implement a RAG pipeline, the very simplistic approach will most likely look like this. It doesn't require much thinking: it's just a number of steps. Some of the steps use the LLM, say, to summarize the query; some of the steps go to an external source to find the information and push it back into the pipeline. They can span many, many actions and have some branching or some additional conditions.
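The slide itself isn't reproduced in this transcript, but a stripped-down sketch of such a pipeline workflow in the PHP SDK might look roughly like this; the SearchActivity interface, its method names and the step granularity are illustrative, not the talk's code:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Illustrative retrieval activities: rewrite the query, search the vector
// store, and let the LLM compose an answer from the retrieved context.
#[ActivityInterface(prefix: 'search.')]
interface SearchActivity
{
    #[ActivityMethod]
    public function rewriteQuery(string $question): string;

    #[ActivityMethod]
    public function findRelevantChunks(string $query, int $limit): array;

    #[ActivityMethod]
    public function answerFromContext(string $question, array $chunks): string;
}

#[WorkflowInterface]
class RagPipelineWorkflow
{
    #[WorkflowMethod]
    public function answer(string $question)
    {
        $search = Workflow::newActivityStub(
            SearchActivity::class,
            ActivityOptions::new()->withStartToCloseTimeout(CarbonInterval::minutes(2))
        );

        // Step 1: let the LLM normalize / summarize the user query.
        $query = yield $search->rewriteQuery($question);

        // Step 2: go to the data source (vector store, website, ...) for context.
        $chunks = yield $search->findRelevantChunks($query, 5);

        // Step 3: ask the LLM to compose the final answer from the retrieved context.
        return yield $search->answerFromContext($question, $chunks);
    }
}
```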
Action pipelines, once again, are not that much different from a Temporal perspective. The only major difference is that instead of giving the response back to the user, you try to act based on that response. And Temporal makes this approach quite simple, because when you're trying to act, you're trying to execute something within your environment — and Temporal already connects to all of your environment, so gluing that to your activities and calling one of your services is extremely simple.
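A hedged sketch of that shape — the DecisionActivity and OrderActivity interfaces and the decision format are invented for illustration — differs from the RAG pipeline mainly in the last step, where the model's decision is mapped onto one of your own services:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Illustrative LLM-backed decision activity; it parses and validates the
// model output internally and returns a structured decision array.
#[ActivityInterface(prefix: 'decide.')]
interface DecisionActivity
{
    #[ActivityMethod]
    public function decideOrderAction(string $emailBody): array;
}

// Illustrative activity that talks to one of your existing services.
#[ActivityInterface(prefix: 'orders.')]
interface OrderActivity
{
    #[ActivityMethod]
    public function cancelOrder(string $orderId, string $reason): void;
}

#[WorkflowInterface]
class CancelOrderRequestWorkflow
{
    #[WorkflowMethod]
    public function handle(string $emailBody)
    {
        $options = ActivityOptions::new()
            ->withStartToCloseTimeout(CarbonInterval::minutes(2));

        $decide = Workflow::newActivityStub(DecisionActivity::class, $options);
        $orders = Workflow::newActivityStub(OrderActivity::class, $options);

        // Ask the model, on behalf of the user, which action should be taken.
        $decision = yield $decide->decideOrderAction($emailBody);

        // Instead of returning text, act within your own environment.
        if ($decision['action'] === 'cancel_order') {
            yield $orders->cancelOrder($decision['orderId'], 'requested by customer');
        }
    }
}
```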
If you take a look at LLM activities — and this will become important in a lot of slides — you will also notice something right away: every time you make an LLM call, the first step is to assemble the context that will be sent to the LLM, or the prompt, as we call it. In simple terms this context can be represented as a template: you have a number of variables, things you found in a knowledge base, something you got from the user, maybe something from the internet, who knows. You put them all together, you send them to the AI, you wait for the response, and then you interpret the result into some structured form. The first thing we noticed while building pipelines and actions like that is that it is actually extremely important to validate the AI response within a single activity. Generally speaking, you could get the AI response, send it back to Temporal, and then do the execution in a different activity — but the problem is that you can't actually trust the AI. So what will start happening is that in some cases your activity executes successfully — everything is okay, the activity is done — but the payload it generated is completely invalid, and your workflow just gets stuck: you cannot execute the next activity at all. So it does make sense to combine them, to make sure you never leave an activity with invalid data generated by the AI.
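A sketch of what such an activity implementation could look like — the prompt template, the LlmClient interface and the expected JSON shape are assumptions, not code from the talk; throwing on a bad payload keeps invalid data inside the activity and lets Temporal retry it:

```php
<?php

declare(strict_types=1);

// Illustrative LLM client abstraction; swap in whatever client you use.
interface LlmClient
{
    public function complete(string $prompt): string;
}

// Hypothetical implementation of the DecisionActivity interface from the
// previous sketch, registered on a worker as a regular activity handler.
class DecisionActivityHandler
{
    public function __construct(
        private readonly LlmClient $client,
    ) {
    }

    public function decideOrderAction(string $emailBody): array
    {
        // 1. Assemble the context (the prompt) from a simple template.
        $prompt = <<<PROMPT
            You are an order assistant. Read the email below and answer with JSON
            of the form {"action": "cancel_order"|"none", "orderId": string|null}.

            Email:
            {$emailBody}
            PROMPT;

        // 2. Send it to the model and wait for the response.
        $raw = $this->client->complete($prompt);

        // 3. Interpret and validate the result *inside the same activity*.
        //    If the payload is garbage, throw: the activity fails and Temporal
        //    retries it, so invalid data never leaks into the workflow.
        $decision = json_decode($raw, true);

        if (!is_array($decision)
            || !in_array($decision['action'] ?? null, ['cancel_order', 'none'], true)
            || ($decision['action'] === 'cancel_order' && empty($decision['orderId']))
        ) {
            throw new \RuntimeException('LLM returned an invalid decision payload: ' . $raw);
        }

        return $decision;
    }
}
```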
So far, if you take a look at these workflows, they don't pose any threat to any engineer. They are quite linear; in some cases it's a DAG; in some cases you can even describe them in some DSL. But at the end of the day they are just Temporal workflows: the only thing you do is replace some of the actions inside the pipeline from normal activities to activities that go to the LLM, and it just works. There is no additional magic, and there are no additional things you have to do beyond assembling the workflow.
The problems start arising once you make these workflows long enough and complex enough to process more and more information, because modern LLM models are quite hungry for tokens — some models can take in up to a million tokens, which is a lot of pages of text. So if your workflow keeps growing and information keeps passing through it, remember that Temporal stores all the payloads that go in and out of your activities in the workflow history. This will cause a very nasty problem later on, because you will never have full confidence that your workflow won't die simply because some LLM decided to write a poem instead of giving you the correct action.
So here is how we decided to solve it, and how you can solve it — you have multiple options. Option number one is to do nothing: just write smaller pipelines. In many cases, when you're doing something very simple, it just works; you don't necessarily care, and you can always retry or just ask the AI to be a bit shorter. In other cases you can be a bit smarter and try implicit data referencing, where you implement your own data converter and your own interceptor layer that detects that a payload is larger than you want and uploads it to an external data store to be used later. But what we found works best for us — and that's why I wanted you to remember how prompts work — is to use explicit referencing. Because at the end of the day, all the information you feed to the AI, all the information the AI is trying to act on, is only needed at the moment you compile your prompt. You don't actually need any of this data, or any of the user's PII, inside your workflow — so don't put it there at all. Keep it outside and use references: links, IDs, or database keys. This becomes handy when you're trying to assemble information from multiple systems, because by implementing a universal referencing mechanism you can combine information from multiple parts of your application and then resolve it all in one distinct place — the place where you actually send the information to the AI. This way your workflows are completely free of any user information, and yet they still orchestrate the whole process.
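A rough sketch of the explicit-referencing idea — the reference format, the DocumentStore interface and the method names are hypothetical — is that the workflow only ever passes opaque IDs, and the content is resolved inside the activity right before the prompt is compiled:

```php
<?php

declare(strict_types=1);

// Hypothetical store that can resolve "doc://..." style references to text.
interface DocumentStore
{
    public function fetch(string $reference): string;
}

// Activity handler: the workflow passes only references, so the real content
// (and any user PII) never enters the workflow history. Uses the same
// illustrative LlmClient interface as in the earlier sketch.
class AnswerFromReferencesHandler
{
    public function __construct(
        private readonly DocumentStore $store,
        private readonly LlmClient $client,
    ) {
    }

    /**
     * @param string[] $references e.g. ["doc://kb/123", "crm://customer/42/notes"]
     */
    public function answer(string $question, array $references): string
    {
        // Resolve references from the various systems only at prompt-compile time.
        $context = '';
        foreach ($references as $reference) {
            $context .= $this->store->fetch($reference) . "\n---\n";
        }

        $prompt = "Answer the question using only the context below.\n"
            . "Context:\n{$context}\nQuestion: {$question}";

        return $this->client->complete($prompt);
    }
}
```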
Okay — so we have RAG workflows, we have action workflows, we did the data referencing; probably there is nothing else we want to do, users are happy, right? No. Users don't want just a button where they click on something and expect something back; they actually want to talk to the AI, because that's how many users on the market perceive AI today. What you see in the picture is a GIF of one of the sessions we had with one of our agents, which, based on the user request, performs additional actions, runs some of the activities, and pulls information in to give the correct answer. The implementation of these workflows might look somewhat complex at the start, until you realize it's actually not that complex, because the Temporal model allows you to write not only linear workflows that begin and end, but also workflows in which you can implement such a thing as a main loop. By making a main loop, running the LLM activity in it, and populating the loop with the information the workflow receives through signals, you can implement a quite sophisticated system that actually lives side by side with the user and answers their questions in real time. At the same time you maintain the whole state, and you maintain whole control of the process: you can see how many tokens the LLM has already consumed, you can see how fast it responds, and you can take actions based on that. The implementation, once again, can be done in any language, and it fits on a screen — it's not that large. Temporal makes it so easy because, by exposing the code level to you, you can simply implement this loop, and voilà, it just works.
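The slide code isn't reproduced in this transcript, but a minimal sketch of such a main loop with the PHP SDK — the ChatActivity interface, the signal names and the message format are assumptions — could look like this:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;
use Temporal\Workflow\QueryMethod;
use Temporal\Workflow\SignalMethod;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Illustrative chat activity: takes the running conversation, returns the reply.
#[ActivityInterface(prefix: 'chat.')]
interface ChatActivity
{
    #[ActivityMethod]
    public function reply(array $conversation): string;
}

#[WorkflowInterface]
class ChatAgentWorkflow
{
    /** @var string[] Incoming user messages, fed in via signals. */
    private array $inbox = [];

    /** @var array<array{role: string, text: string}> */
    private array $conversation = [];

    private bool $closed = false;

    #[WorkflowMethod]
    public function run()
    {
        $llm = Workflow::newActivityStub(
            ChatActivity::class,
            ActivityOptions::new()->withStartToCloseTimeout(CarbonInterval::minutes(2))
        );

        // The main loop: wait for user input, call the LLM, keep the state.
        while (!$this->closed) {
            yield Workflow::await(fn() => $this->inbox !== [] || $this->closed);

            while ($this->inbox !== []) {
                $message = array_shift($this->inbox);
                $this->conversation[] = ['role' => 'user', 'text' => $message];

                $reply = yield $llm->reply($this->conversation);
                $this->conversation[] = ['role' => 'assistant', 'text' => $reply];
            }
        }

        return $this->conversation;
    }

    #[SignalMethod]
    public function say(string $message): void
    {
        $this->inbox[] = $message;
    }

    #[SignalMethod]
    public function close(): void
    {
        $this->closed = true;
    }

    #[QueryMethod]
    public function history(): array
    {
        return $this->conversation;
    }
}
```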
By doing that you also get a lot of benefits from Temporal's composability model, which means that from the user's perspective, the user sends a message and gets a response back, but it doesn't necessarily mean you have to do a single action by going to the AI — you can do something else. Specifically, before the message is sent to the AI you can enrich it with additional context: replace that block with your pipeline that connects to your knowledge source, say, information about your product, and voilà, you have a customer support bot that now talks to you specifically about your product.
If you're trying to have long conversations, or conversations that span days and months — maybe it's an email thread — sooner or later you're going to hit the situation where the context of your agent is overfilled and the agent won't be able to act. Once again, because you run this whole process inside Temporal, inside a main loop, it is exceptionally easy to detect this moment, see how many tokens the AI has consumed, and use that to offload the past conversation and restart a new LLM session with the conversation history. In essence, all you do is summarize the past messages, put the summary back into the history — or the context, or the prompt — and run again. The user won't even notice; from the agent's perspective, however, it starts from a blank slate, just knowing something from the past conversation.
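As a hedged fragment of how that check could slot into the main loop sketched above — the token estimate, the threshold and the summarize() activity are invented for illustration:

```php
// Inside ChatAgentWorkflow::run(), after each LLM reply (sketch, not the talk's code).
// $history is an illustrative activity stub exposing estimateTokens() and summarize().
$tokens = yield $history->estimateTokens($this->conversation);

if ($tokens > 80_000) {
    // Compress the past conversation into a short summary...
    $summary = yield $history->summarize($this->conversation);

    // ...and restart the LLM session from a blank slate that only carries the summary.
    $this->conversation = [
        ['role' => 'system', 'text' => 'Summary of the conversation so far: ' . $summary],
    ];
}
```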
Okay, so we can talk to it; now let's see what it can do. And that's the next thing you're probably going to learn when you work with a lot of models: most of them now expose a new way for the model to communicate with your environment, and this is called tool calling. On the screen you can see the agent creating a tool on demand, which is later executed to run some analytical query based on the user request. But what do you essentially do to make a tool call inside Temporal? Well, again, it's easy — that's probably going to be the keyword of today's presentation. Once you tell the AI which types of functions it can call, and once you get those function calls back as a result from an activity, all you have to do is map them to one of your activities, or one of your workflows — why not — get the results, and push them back into the queue. But be careful: what you want to do is make sure that a message the user sends cannot slip in between these tool calls, otherwise the LLM model will choke — they all want to get the response immediately, without anything intercepting it. So use a blocking mechanism and implement it inside your signal method. It's not that complex; the code, once again, is quite straightforward. All you have to do is receive the list of tools the model wants to call, map them to parts of your system — activities, maybe other workflows, maybe something else — and get the results back. You can collect the results sequentially or in parallel; Temporal provides you abstractions to do that in every language. Get the results back, push them back into the message queue — easy. The next call the user makes, or the AI invokes, will receive the responses, and the AI will be able to act based on them.
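A minimal sketch of that dispatch step, layered on top of the chat loop above — the tool-call shape and the idea of mapping tool names straight onto activity methods are assumptions, not the talk's code:

```php
<?php

declare(strict_types=1);

use Temporal\Promise;

// Sketch: the LLM activity returned a list of tool calls instead of a text reply,
// e.g. [['name' => 'searchOrders', 'args' => ['query' => '...']], ...].
// Each tool name is mapped onto a method of an activity stub (or it could start
// another workflow), results are gathered in parallel and handed back as messages.
// Call it from the workflow's main loop as:
//   $messages = yield from dispatchToolCalls($calls, $tools);
// While this runs, a simple "busy" flag checked in the signal method can buffer
// incoming user messages so nothing slips in between the tool calls.
function dispatchToolCalls(array $toolCalls, object $tools): \Generator
{
    $promises = [];

    foreach ($toolCalls as $call) {
        // $tools is an activity stub; each tool method accepts its args as an array.
        $promises[] = $tools->{$call['name']}($call['args']);
    }

    // Run all tool calls in parallel and wait for every result.
    $results = yield Promise::all($promises);

    $messages = [];
    foreach ($toolCalls as $i => $call) {
        $messages[] = ['role' => 'tool', 'name' => $call['name'], 'text' => $results[$i]];
    }

    return $messages;
}
```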
So if you have that — you have tool calling, you have models you can talk to, models you can communicate with, that can look up information, that can execute arbitrary actions and in some cases even do retries by themselves (in many cases the AI will notice that a tool call did not work and try it once again) — you might be asking: so what's next, what can you do with these patterns? And here is the question you can ask yourself: do we even need a user? When you run these workflows, they open up — along with many challenges, such as hallucinated tool calls or skipped tool calls — a huge ability to run workflows, or agents, in our case agentic workflows, that execute by themselves, autonomously, gathering information and writing the solution as they go.
The main problem you're going to have in this case is that while you communicate with the agent directly as a user, you can supervise it: you can say, "you know what, you're doing this wrong, please try something else, don't call this tool, give me information from a different part of the system." When you run agents autonomously, you don't have the user. So what should you do? You should replace the user. And what can replace the user inside Temporal workflows? Another workflow. So in this setup you are going to create your own supervision layer, which essentially plays the role of the user. This supervision layer is responsible for receiving the command — you still need some kind of trigger, either a webhook or a user or something else — and based on this command it will automatically form the first prompt, or the first message, and task the agent to execute it. The tricky part here is how to evaluate whether the agent actually did any valuable work.
The first thing you might notice in applications like that — and this is a very nasty thing to see — is that agents love to loop. The moment an agent makes a mistake and tries to correct it in a very different, but still incorrect, way, it has made two failing calls — and "okay, I'm an agent, I made two failing calls, they're in my context, so what should I do next? Probably make another call," because it seems so logical. So what you might see in some cases is the agent calling your tools over and over and over again — especially when the tools have been created dynamically and eventually fail because of their self-destruct — until it simply overpopulates its context and gets offloaded, and then you just can't do anything. Thankfully, because you run Temporal, you orchestrate and collect all the information about all the tools the AI calls, all the payloads and all the errors, so you can implement many mechanisms to detect that the AI is not doing what you want it to do. You can do that programmatically, by simply looking for patterns in the tool calls and spotting loops where something happens over and over; or you can do something more complex and use another AI model, or another AI agent, to look at the result and decide whether this agent is faulty or the result does not serve the purpose.
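A rough sketch of the programmatic variant — the repeat threshold and the way tool calls are recorded are assumptions:

```php
<?php

declare(strict_types=1);

// Sketch of a deterministic loop detector that the supervising workflow can run
// over the tool-call history it already collects. If the same tool is invoked
// with the same arguments too many times in a row, the agent is considered stuck.
final class ToolLoopDetector
{
    public function __construct(
        private readonly int $maxRepeats = 3,
    ) {
    }

    /**
     * @param array<array{name: string, args: array}> $toolCalls history, oldest first
     */
    public function isLooping(array $toolCalls): bool
    {
        $repeats = 1;
        $previous = null;

        foreach ($toolCalls as $call) {
            $fingerprint = $call['name'] . ':' . json_encode($call['args']);

            $repeats = ($fingerprint === $previous) ? $repeats + 1 : 1;
            if ($repeats >= $this->maxRepeats) {
                return true;
            }

            $previous = $fingerprint;
        }

        return false;
    }
}
```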
From there you're going to be creating deeper and deeper chains, which kind of leads us to the next question: if you can have one agent, why can't you have many agents? Can we use them in collaboration, or can we use them by embedding them into much deeper chains of decisions and using them to run more and more sophisticated workflows inside your system? Well, the answer is obviously yes, because, again, we are at a Temporal conference and there is nothing impossible inside Temporal. You will have to use mostly signals and workflows to compose applications like that, but the composition of an application that runs multiple agents in parallel, or has a collaboration factor, is not that complex and not that different. In this video we are seeing a single agent that communicates and delegates tasks to other sub-agents, which execute tools written by other agents, in order to execute some arbitrary command and return the result back to the hub the user communicates with. To implement a pattern like that — here is what we found works best for us, and I'm pretty sure there are a lot of patterns for composing applications like this — you create a common supervision layer, or as we call it, an agentic pool. It is a single place inside your system — a workflow — that essentially orchestrates the commands between multiple child workflows, your agents. You delegate one of these workflows to be essentially the hub, the arbiter agent you communicate with from outside — that's your entry point, or maybe the one that communicates with the user — and you let this agent communicate with the other agents.
So how can you do that? Well, tool calling. From the perspective of your hub agent, the delegation of a task is not that much different from calling a single activity inside your system: all you have to do is take the payload the AI decided to attach to this delegated task and send it to the other agent. And that's another place where Temporal is going to help you tremendously. Temporal's architecture, and especially the way you write workflows, allows you to say that this tool call is not an activity — this tool call is a signal. You can use this signal to send the command to the parent supervisor loop, the pool, which will automatically spawn the child workflow — your agent — delegate the task to that child agent, and wait for the resulting signal containing the resulting payload; it then takes this payload and sends it back to your hub agent. You get the ability to delegate tasks while your hub agent doesn't even actually know how it works: it just thinks it did a tool call that was very smart and did some very good work inside.
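A compressed sketch of such an agentic pool — the signal names, the hypothetical SubAgentWorkflow and the way the result is delivered back to the hub are all assumptions; the real supervision layer described here is considerably richer:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Workflow;
use Temporal\Workflow\ChildWorkflowOptions;
use Temporal\Workflow\SignalMethod;
use Temporal\Workflow\WorkflowExecution;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
class AgenticPoolWorkflow
{
    /** @var array<array{hubId: string, task: string}> */
    private array $pending = [];

    private bool $shutdown = false;

    #[WorkflowMethod]
    public function run()
    {
        while (!$this->shutdown) {
            yield Workflow::await(fn() => $this->pending !== [] || $this->shutdown);

            while ($this->pending !== []) {
                ['hubId' => $hubId, 'task' => $task] = array_shift($this->pending);

                // Spawn a sub-agent as a child workflow and wait for its result.
                // 'SubAgentWorkflow' stands for an agent loop similar to the one
                // sketched earlier, driven by the task instead of a user.
                $result = yield Workflow::executeChildWorkflow(
                    'SubAgentWorkflow',
                    [$task],
                    ChildWorkflowOptions::new()
                        ->withWorkflowExecutionTimeout(CarbonInterval::hours(1))
                );

                // Deliver the result back to the hub agent; from its point of
                // view this is just a tool call that finally returned.
                $hub = Workflow::newUntypedExternalWorkflowStub(
                    new WorkflowExecution($hubId)
                );
                yield $hub->signal('toolResult', [$task, $result]);
            }
        }
    }

    // The hub agent's "delegate" tool call ends up here as a signal.
    #[SignalMethod]
    public function delegate(string $hubId, string $task): void
    {
        $this->pending[] = ['hubId' => $hubId, 'task' => $task];
    }

    #[SignalMethod]
    public function shutdown(): void
    {
        $this->shutdown = true;
    }
}
```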
Another thing you can do — and this is something we experimented with a lot — is to start composing these agents, and composing them in combination with more deterministic and simpler functions. You might create a process for code generation or code analysis which spans a very long time. Some parts of this process are very deterministic — let's say, doing a git pull; some parts of this process are very simple — a plain LLM analysis; we don't really need agents for those. But by composing them together, and using Temporal's ability to converge a few of these simple yet very powerful abstractions into one common system inside a workflow, you can start creating deeper and deeper networks that are able to execute much more complex commands. And yet, while you're doing that, you still retain all the visibility: you still know every step the agent took, every step that has been delegated, where an error happened, and you can correlate that and compensate for it. Once again, if you're trying to implement this, at the end of the day all you do is create a number of processes that depend on other processes, which depend on other processes. Doing that classically is possible — you can do it in many languages, and some languages, like maybe Erlang, are specifically designed for it — but Temporal makes it easy to use in any stack, and it makes it durable as well, because even if you shut down your worker, even if you kill your agents, it will still complete, thanks to the model.
So how do we use solutions like that? How do we use them for our own purposes? We create applications, for ourselves and for our customers, that are able to solve arbitrary tasks that previously would have required engineering time which nobody actually wants to spend. Do you really want your senior engineer building yet another mapper for Excel every week because you received a new form from your vendor? In this space we found there is a huge number of patterns, and a huge number of parts of applications, which we don't actually want to build by hand — so let's ask agents to do them. We can ask agents to create them, validate them, execute them and test them as they go, spending not weeks but minutes to get a working result.
So why do we think Temporal is the best solution for AI? Well, if the presentation didn't say it explicitly: the two are made at completely opposite ends of the spectrum. This very powerful abstraction that allows you to run NLP — generally speaking, thinking — in order to execute some action is kind of pointless to run on its own; you need to embed it into something, and Temporal provides a very rich environment that makes this embedding so easy, so simple, and at the same time so durable. Combining them together allows you to create very complex chains and very complex applications that can pull information from many sources over the span of minutes, hours, maybe days, and then execute on it to give you the result. I could talk about this for days, maybe weeks, but if you want to chat more, please visit us at our booth, or let's grab a drink later today. Thank you.

I guess — any questions?
Q: Have you run it, and how much does it cost?
A: A lot.
Q: But how much engineering time does it cost? I'm just curious.
A: A significant amount for a medium-sized company, but we can now iterate at a pace we have never seen before. In many cases we can deliver a working PoC in 20 minutes on a call with stakeholders — a process that would have involved five people back in the day.
We don't want to use AI for everything, because in many cases we kind of don't trust it. But there are always use cases in your work and your processes that you just don't care that much about — mappings, API calls, data transformations — things you can easily verify and see with your own eyes.
Q: Sorry — how do you evaluate an agentic pipeline?
A: Well, that's a beautiful part about Temporal. From the Temporal perspective, an agentic pipeline is a huge workflow which, yes, you can test step by step and make sure all of the steps work from the workflow perspective. From the user's perspective, however, when you send the command it's just a function, so you evaluate the result by evaluating the quality of the function's output. If you're doing something that fetches information from your database, like a RAG pipeline, there is a bunch of solutions on the market that can run it and see how well the answer correlates with the actual information. So you evaluate the result without actually evaluating all the steps taken inside — you kind of don't even worry about them; the agent is going to do what it thinks is best.
Q: Why would we choose to use a child agent?
A: Good question. The reason is that the context window of each agent is limited — it's quite large, but still limited. So if you have to perform a simple action that can only be done based on information collected from many different parts of the system, just collecting that information is already going to over-pollute the memory of the agent, and it's going to work much slower, much harder and become much more expensive. So instead of that, you want to isolate this process and only get the result back.
Q: Where do you run it — can it be in a different language?
A: How do we run what, sorry? The generated tools? We run them right in Temporal. The thing I didn't say in the presentation is that the referencing layer, which was a single slide, is actually where we spent most of our time, because when an agent defines a tool, we define it as part of our system, and we use Temporal as the syncing layer to sync it to our runtimes, which makes it immediately available for the AI to use. So basically, by creating a tool inside the system, the AI automatically declares it and makes it available to any AI it's been connected to in the declaration of that agent. It can be any language at the end of the day, and we think that eventually the language you use when you work with the application probably isn't going to matter.
Q: Are there any other Temporal-specific limitations or things you ran into that you didn't expect?
A: There are a few, but they're not that large, and not that different from what you'd run into in Temporal anyway. If you run a very long decision chain — an agent that can span many files and run many iterations — you're eventually going to reach the point where your workflow just has to be restarted, and restarting a workflow that potentially has hundreds of child workflows in a tree is quite a challenge. So you might need to implement your own mechanism to properly collapse all these workflows and restart them in the next iteration.
Q: You mentioned at the beginning that — [question partially inaudible]
A: Well, right now we validate it by user observation: you just test it right in the mix, so you see whether it works or not. We're not trying to create huge application servers using these tool calls; we just create simple integrations, which are much easier to test. But at the end of the day you can actually feed a tool back into the agent, and that's another property of the referencing layer we created: every tool an agent creates actually becomes part of the knowledge base that agents can use to learn to create new tools, or to read existing tools and analyze whether they work correctly — or they can just generate the tests.
We have one more question. [question inaudible]
A: Well, you can move the LLM calls to a separate task queue and have a rate limit on that task queue — that's about it. In our case, though, we actually have our own backend that encapsulates all the LLM calls, where we have an additional priority queue with additional rate limiting. It's a simple side effect of the fact that we allow multiple organizations to use the same model, while still being able to split the model usage between organizations so they never collide in this regard. But at the end of the day, even if you don't have that and a call fails, well, it's just going to be retried.
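As a small sketch of that first option — routing all LLM activities to their own task queue so the workers polling it can be scaled and throttled independently of everything else; the queue name is illustrative, and the fragment assumes it sits inside a workflow method:

```php
// Inside a workflow method: send every LLM call to a dedicated task queue.
// A separate worker pool polls "llm-calls", and that pool can be sized and
// rate-limited on its own without affecting the rest of your activities.
$llm = Workflow::newActivityStub(
    CompletionActivity::class, // the illustrative interface from the first sketch
    ActivityOptions::new()
        ->withTaskQueue('llm-calls')
        ->withStartToCloseTimeout(CarbonInterval::minutes(2))
);
```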