A Practical and Tactical Approach to Temporal and AI | Replay 2024
Hello, Replay attendees — glad to see you today. As you can probably tell by my accent, I'm not from around here; I'm from Belarus, which is quite a different place. But the reason I'm here today is that five years ago a single tweet actually changed the whole direction of how I think about and perceive software implementation and software architecture, and today I'd like to tell you the story from that tweet to the moment we started implementing AI workflows in the applications we build for our customers.
My name is Anton, and I'm the CTO of Spiral Scout. We have been providing software development services for customers around the globe for around 15 years. As the person maintaining the team and tasked with making sure we do a good job, as a tech leader I always have to make sure that the tools we use are optimal and that we don't spend extra time writing typical bootstrap code or anything else we don't want to write. As a passionate coder, I love to mitigate that by creating my own tools, and over the span of my career I've created a number of open source and closed source instruments — everything from frameworks to ORMs to database layers, templating engines, DSLs and so on. But as time passed and our client pool grew, and the complexity grew with it, we soon realized that even though we had our own toolkit and a team that knew how to use it — back in the day it was mostly PHP — we were lacking one very large abstraction that seemed hard to get. And that abstraction, as you know from this presentation and this conference, is a workflow engine.
So what is the first logical solution every engineer reaches for when they cannot get the instrument they want in their stack? Let's build it ourselves — very smart idea. So we started doing the research and looking into ways we could implement a workflow engine in our products. We used to work with Amazon SWF, and it looked like a very nice solution for many things, but it was still quite proprietary and hard to use in ecosystems like open source or outside of Amazon.
Once I had the first prototypes, we soon realized that the number of edge cases we uncovered by running this engine just grew exponentially; every day, every moment, we saw more and more problems arise from things we expected to just work. That's the moment I decided to step back, do additional research, and see whether there were new tools on the market, new solutions, or better patterns around. Around that time I found a similarly experienced guy on Twitter who was talking all about workflows and durable execution — the power they bring, applications that can run for days and months, and so on. So I thought to myself: okay, he has his solution, I have my own stack, why not try to talk to him and see if we can collaborate to bring it in? I wrote a Twitter message, and to my surprise this person said, "yeah, let's talk." So five years ago, a conversation with Maxim Fateev kicked off a quite long collaboration in which we created the Temporal PHP SDK, and we began to use and adopt Temporal for our own products and for the products of our customers.
At this moment everything looks very nice and cozy: we have one stack, we have a powerful workflow engine — what else can you dream about, what else do you want? And that's about the moment GPT-3 dropped on the market. Once you see this model, and once you realize what a state-of-the-art LLM can do — it can interpret your users' requests, write haiku, make jokes, or help you process any information — it becomes very obvious that there is immense potential in using these solutions to build something more complex.

Yet we saw, while implementing these solutions and building our first pipelines — summarizing tweets, doing pull request reviews, and so on — the same pattern over and over: even with this powerful technology, backed by state-of-the-art models built by the most powerful companies in the world, the actual process of implementation is not that different from 20 years ago. You still go through the planning phase, the design phase, implementation and iteration. So we found ourselves in a situation where we have the keys to a Lamborghini but we only use it to drive to Costco. Why? We still have these powerful models that we can't actually use to enhance our main work. These days, obviously, we have Copilot and many other tools,
but we decided to come back to the drawing board and challenge ourselves with a slightly different question: can we create software that is not only programmed ahead of time by engineers, but software that can actually program itself and expand its own functionality as it goes, in collaboration with the user — and, why not, maybe by itself, just trying to see how it can be optimized? By this moment we clearly knew that this was going to be a very challenging task, a very complex architecture spanning many domains and many parts of the system that have to collaborate seamlessly — which only works if you have some nice engine that helps you cope with this complexity. And this engine is obviously Temporal, since I'm speaking here today. So let's dive in and see what we can do in terms of LLM payloads and LLM workflows within your Temporal application.
The first thing we have to do to talk about that is to properly define the boundaries: how do we actually define LLM calls within our workflows? Surprisingly, in terms of the actual workflow implementation and the actual data flow, an LLM can be defined quite easily: it's a black box, and many engineers actually treat it as a black box as well. It's a very powerful and magical abstraction — you put some data in, you get some data out. Sometimes this data is good, sometimes this data is just garbage; that's something we have to live with. At the same time, as you know if you use LLMs, while this solution is extremely powerful and extremely versatile, it's also quite unreliable: you will see everything from failures on API calls, to timeouts, to the plain situation where the AI just says, "you know what, I don't want to do this job." So what can you do about that? If you take a look at this implementation pattern, on one side of the equation you have an extremely powerful abstraction which is highly non-deterministic and highly unreliable; on the other side of the equation you have an engine that was designed to mitigate exactly things like that — to write deterministic, very durable workflows and implement them in a quite easy fashion. So it makes total sense to combine them: you use one engine to mitigate the issues created by the other.
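As a minimal sketch of that idea in the PHP SDK — the CompletionActivity interface and all names here are illustrative, not from the talk — the unreliable call lives in an activity, and Temporal's timeout and retry policy absorb the failures:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Common\RetryOptions;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Hypothetical activity wrapping the LLM client: data in, data out.
#[ActivityInterface(prefix: 'llm.')]
interface CompletionActivity
{
    #[ActivityMethod]
    public function complete(string $prompt): string;
}

#[WorkflowInterface]
class SummarizeWorkflow
{
    #[WorkflowMethod]
    public function run(string $text)
    {
        // The unreliable black box becomes just another activity call;
        // Temporal's timeouts and retry policy absorb API failures,
        // timeouts, and the occasional refusal to answer.
        $llm = Workflow::newActivityStub(
            CompletionActivity::class,
            ActivityOptions::new()
                ->withStartToCloseTimeout(CarbonInterval::minutes(2))
                ->withRetryOptions(RetryOptions::new()->withMaximumAttempts(5))
        );

        return yield $llm->complete("Summarize the following text:\n" . $text);
    }
}
```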
If you're seeking to implement an LLM-powered application, you're most likely going to start with two quite simplistic patterns, which in many cases will probably cover 80 or 90% of your whole LLM workload. You're going to start with RAG pipelines — pipelines designed to go to some data source, maybe a vector database, maybe an external website or anything else, gather the information most relevant to the user query, and return that information in a way the user, or maybe another AI, can comprehend and act on. On the other side you have the type of workload which does much the same thing; the only main difference is that now, instead of returning text to the user, you actually perform some kind of arbitrary action based on a decision made by the LLM on behalf of the user. It's as easy as sending an email asking to cancel your order — and then any order gets cancelled and the account deleted. Well, be careful what you wish for when you work with LLMs.
Looking deeper into RAG pipelines — and in pretty much every paper you'll find about RAG pipelines — you will see they have some distinctive steps. You always have the parts that collect, aggregate, normalize and chunk the data, embed it into a vector store, maybe reshuffle or cluster it; and on the other side you have the parts responsible for retrieving that data and pushing the answer to the user. But the most curious part about RAG pipelines, if you look at how they are displayed in these papers and in pretty much every article people write, is that they all have distinctive steps, they all have these blocks with arrows between them — which, surprisingly, looks exactly like what we need: it is simply a data workflow with data passing.

I'll be showing examples today in PHP, but I do that mostly for visual purposes; it can easily be done in any language you love — Python, Node.js — Temporal allows you to switch stacks quite easily. If you're going to implement a RAG pipeline, the very simplistic approach will most likely look like this. It doesn't require much thinking: it's just a number of steps. Some of the steps use the LLM, say, to summarize the query; some of the steps go to an external source to find the information and push it back into the pipeline. They can span many, many actions and have some branching or some additional conditions.
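The slide itself isn't reproduced in this transcript, but a stripped-down sketch of such a pipeline workflow in the PHP SDK might look roughly like this; the SearchActivity interface, its method names and the step granularity are illustrative, not the talk's code:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Illustrative retrieval activities: rewrite the query, search the vector
// store, and let the LLM compose an answer from the retrieved context.
#[ActivityInterface(prefix: 'search.')]
interface SearchActivity
{
    #[ActivityMethod]
    public function rewriteQuery(string $question): string;

    #[ActivityMethod]
    public function findRelevantChunks(string $query, int $limit): array;

    #[ActivityMethod]
    public function answerFromContext(string $question, array $chunks): string;
}

#[WorkflowInterface]
class RagPipelineWorkflow
{
    #[WorkflowMethod]
    public function answer(string $question)
    {
        $search = Workflow::newActivityStub(
            SearchActivity::class,
            ActivityOptions::new()->withStartToCloseTimeout(CarbonInterval::minutes(2))
        );

        // Step 1: let the LLM normalize / summarize the user query.
        $query = yield $search->rewriteQuery($question);

        // Step 2: go to the data source (vector store, website, ...) for context.
        $chunks = yield $search->findRelevantChunks($query, 5);

        // Step 3: ask the LLM to compose the final answer from the retrieved context.
        return yield $search->answerFromContext($question, $chunks);
    }
}
```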
Action pipelines, once again, are not that much different from a Temporal perspective. The only major difference is that instead of giving the response back to the user, you try to act based on that response. And Temporal makes this approach quite simple, because when you're trying to act, you're trying to execute something within your environment — and Temporal already connects to all of your environment, so gluing that to your activities and calling one of your services is extremely simple.
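A hedged sketch of that shape — the DecisionActivity and OrderActivity interfaces and the decision format are invented for illustration — differs from the RAG pipeline mainly in the last step, where the model's decision is mapped onto one of your own services:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Illustrative LLM-backed decision activity; it parses and validates the
// model output internally and returns a structured decision array.
#[ActivityInterface(prefix: 'decide.')]
interface DecisionActivity
{
    #[ActivityMethod]
    public function decideOrderAction(string $emailBody): array;
}

// Illustrative activity that talks to one of your existing services.
#[ActivityInterface(prefix: 'orders.')]
interface OrderActivity
{
    #[ActivityMethod]
    public function cancelOrder(string $orderId, string $reason): void;
}

#[WorkflowInterface]
class CancelOrderRequestWorkflow
{
    #[WorkflowMethod]
    public function handle(string $emailBody)
    {
        $options = ActivityOptions::new()
            ->withStartToCloseTimeout(CarbonInterval::minutes(2));

        $decide = Workflow::newActivityStub(DecisionActivity::class, $options);
        $orders = Workflow::newActivityStub(OrderActivity::class, $options);

        // Ask the model, on behalf of the user, which action should be taken.
        $decision = yield $decide->decideOrderAction($emailBody);

        // Instead of returning text, act within your own environment.
        if ($decision['action'] === 'cancel_order') {
            yield $orders->cancelOrder($decision['orderId'], 'requested by customer');
        }
    }
}
```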
If you take a look at LLM activities — and this will become important in a lot of slides — you will also notice something right away: every time you make an LLM call, the first step is to assemble the context that will be sent to the LLM, or the prompt, as we call it. In simple terms this context can be represented as a template: you have a number of variables, things you found in a knowledge base, something you got from the user, maybe something from the internet, who knows. You put them all together, you send them to the AI, you wait for the response, and then you interpret the result into some structured form. The first thing we noticed while building pipelines and actions like that is that it is actually extremely important to validate the AI response within a single activity. Generally speaking, you could get the AI response, send it back to Temporal, and then do the execution in a different activity — but the problem is that you can't actually trust the AI. So what will start happening is that in some cases your activity executes successfully — everything is okay, the activity is done — but the payload it generated is completely invalid, and your workflow just gets stuck: you cannot execute the next activity at all. So it does make sense to combine them, to make sure you never leave an activity with invalid data generated by the AI.
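A sketch of what such an activity implementation could look like — the prompt template, the LlmClient interface and the expected JSON shape are assumptions, not code from the talk; throwing on a bad payload keeps invalid data inside the activity and lets Temporal retry it:

```php
<?php

declare(strict_types=1);

// Illustrative LLM client abstraction; swap in whatever client you use.
interface LlmClient
{
    public function complete(string $prompt): string;
}

// Hypothetical implementation of the DecisionActivity interface from the
// previous sketch, registered on a worker as a regular activity handler.
class DecisionActivityHandler
{
    public function __construct(
        private readonly LlmClient $client,
    ) {
    }

    public function decideOrderAction(string $emailBody): array
    {
        // 1. Assemble the context (the prompt) from a simple template.
        $prompt = <<<PROMPT
            You are an order assistant. Read the email below and answer with JSON
            of the form {"action": "cancel_order"|"none", "orderId": string|null}.

            Email:
            {$emailBody}
            PROMPT;

        // 2. Send it to the model and wait for the response.
        $raw = $this->client->complete($prompt);

        // 3. Interpret and validate the result *inside the same activity*.
        //    If the payload is garbage, throw: the activity fails and Temporal
        //    retries it, so invalid data never leaks into the workflow.
        $decision = json_decode($raw, true);

        if (!is_array($decision)
            || !in_array($decision['action'] ?? null, ['cancel_order', 'none'], true)
            || ($decision['action'] === 'cancel_order' && empty($decision['orderId']))
        ) {
            throw new \RuntimeException('LLM returned an invalid decision payload: ' . $raw);
        }

        return $decision;
    }
}
```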
So far, if you take a look at these workflows, they don't pose any threat to any engineer. They are quite linear; in some cases it's a DAG; in some cases you can even describe them in some DSL. But at the end of the day they are just Temporal workflows: the only thing you do is replace some of the actions inside the pipeline from normal activities to activities that go to the LLM, and it just works. There is no additional magic, and there are no additional things you have to do beyond assembling the workflow.
The problems start arising once you make these workflows long enough and complex enough to process more and more information, because modern LLM models are quite hungry for tokens — some models can take in up to a million tokens, which is a lot of pages of text. So if your workflow keeps growing and information keeps passing through it, remember that Temporal stores all the payloads that go in and out of your activities in the workflow history. This will cause a very nasty problem later on, because you will never have full confidence that your workflow won't die simply because some LLM decided to write a poem instead of giving you the correct action.
So here is how we decided to solve it, and how you can solve it — you have multiple options. Option number one is to do nothing: just write smaller pipelines. In many cases, when you're doing something very simple, it just works; you don't necessarily care, and you can always retry or just ask the AI to be a bit shorter. In other cases you can be a bit smarter and try implicit data referencing, where you implement your own data converter and your own interceptor layer that detects that a payload is larger than you want and uploads it to an external data store to be used later. But what we found works best for us — and that's why I wanted you to remember how prompts work — is to use explicit referencing. Because at the end of the day, all the information you feed to the AI, all the information the AI is trying to act on, is only needed at the moment you compile your prompt. You don't actually need any of this data, or any of the user's PII, inside your workflow — so don't put it there at all. Keep it outside and use references: links, IDs, or database keys. This becomes handy when you're trying to assemble information from multiple systems, because by implementing a universal referencing mechanism you can combine information from multiple parts of your application and then resolve it all in one distinct place — the place where you actually send the information to the AI. This way your workflows are completely free of any user information, and yet they still orchestrate the whole process.
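A rough sketch of the explicit-referencing idea — the reference format, the DocumentStore interface and the method names are hypothetical — is that the workflow only ever passes opaque IDs, and the content is resolved inside the activity right before the prompt is compiled:

```php
<?php

declare(strict_types=1);

// Hypothetical store that can resolve "doc://..." style references to text.
interface DocumentStore
{
    public function fetch(string $reference): string;
}

// Activity handler: the workflow passes only references, so the real content
// (and any user PII) never enters the workflow history. Uses the same
// illustrative LlmClient interface as in the earlier sketch.
class AnswerFromReferencesHandler
{
    public function __construct(
        private readonly DocumentStore $store,
        private readonly LlmClient $client,
    ) {
    }

    /**
     * @param string[] $references e.g. ["doc://kb/123", "crm://customer/42/notes"]
     */
    public function answer(string $question, array $references): string
    {
        // Resolve references from the various systems only at prompt-compile time.
        $context = '';
        foreach ($references as $reference) {
            $context .= $this->store->fetch($reference) . "\n---\n";
        }

        $prompt = "Answer the question using only the context below.\n"
            . "Context:\n{$context}\nQuestion: {$question}";

        return $this->client->complete($prompt);
    }
}
```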
Okay — so we have RAG workflows, we have action workflows, we did the data referencing; probably there is nothing else we want to do, users are happy, right? No. Users don't want just a button where they click on something and expect something back; they actually want to talk to the AI, because that's how many users on the market perceive AI today. What you see in the picture is a GIF of one of the sessions we had with one of our agents, which, based on the user request, performs additional actions, runs some of the activities, and pulls information in to give the correct answer. The implementation of these workflows might look somewhat complex at the start, until you realize it's actually not that complex, because the Temporal model allows you to write not only linear workflows that begin and end, but also workflows in which you can implement such a thing as a main loop. By making a main loop, running the LLM activity in it, and populating the loop with the information the workflow receives through signals, you can implement a quite sophisticated system that actually lives side by side with the user and answers their questions in real time. At the same time you maintain the whole state, and you maintain whole control of the process: you can see how many tokens the LLM has already consumed, you can see how fast it responds, and you can take actions based on that. The implementation, once again, can be done in any language, and it fits on a screen — it's not that large. Temporal makes it so easy because, by exposing the code level to you, you can simply implement this loop, and voilà, it just works.
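The slide code isn't reproduced in this transcript, but a minimal sketch of such a main loop with the PHP SDK — the ChatActivity interface, the signal names and the message format are assumptions — could look like this:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;
use Temporal\Workflow\QueryMethod;
use Temporal\Workflow\SignalMethod;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Illustrative chat activity: takes the running conversation, returns the reply.
#[ActivityInterface(prefix: 'chat.')]
interface ChatActivity
{
    #[ActivityMethod]
    public function reply(array $conversation): string;
}

#[WorkflowInterface]
class ChatAgentWorkflow
{
    /** @var string[] Incoming user messages, fed in via signals. */
    private array $inbox = [];

    /** @var array<array{role: string, text: string}> */
    private array $conversation = [];

    private bool $closed = false;

    #[WorkflowMethod]
    public function run()
    {
        $llm = Workflow::newActivityStub(
            ChatActivity::class,
            ActivityOptions::new()->withStartToCloseTimeout(CarbonInterval::minutes(2))
        );

        // The main loop: wait for user input, call the LLM, keep the state.
        while (!$this->closed) {
            yield Workflow::await(fn() => $this->inbox !== [] || $this->closed);

            while ($this->inbox !== []) {
                $message = array_shift($this->inbox);
                $this->conversation[] = ['role' => 'user', 'text' => $message];

                $reply = yield $llm->reply($this->conversation);
                $this->conversation[] = ['role' => 'assistant', 'text' => $reply];
            }
        }

        return $this->conversation;
    }

    #[SignalMethod]
    public function say(string $message): void
    {
        $this->inbox[] = $message;
    }

    #[SignalMethod]
    public function close(): void
    {
        $this->closed = true;
    }

    #[QueryMethod]
    public function history(): array
    {
        return $this->conversation;
    }
}
```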
By doing that you also get a lot of benefits from Temporal's composability model, which means that from the user's perspective, the user sends a message and gets a response back, but it doesn't necessarily mean you have to do a single action by going to the AI — you can do something else. Specifically, before the message is sent to the AI you can enrich it with additional context: replace that block with your pipeline that connects to your knowledge source, say, information about your product, and voilà, you have a customer support bot that now talks to you specifically about your product.
If you're trying to have long conversations, or conversations that span days and months — maybe it's an email thread — sooner or later you're going to hit the situation where the context of your agent is overfilled and the agent won't be able to act. Once again, because you run this whole process inside Temporal, inside a main loop, it is exceptionally easy to detect this moment, see how many tokens the AI has consumed, and use that to offload the past conversation and restart a new LLM session with the conversation history. In essence, all you do is summarize the past messages, put the summary back into the history — or the context, or the prompt — and run again. The user won't even notice; from the agent's perspective, however, it starts from a blank slate, just knowing something from the past conversation.
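As a hedged fragment of how that check could slot into the main loop sketched above — the token estimate, the threshold and the summarize() activity are invented for illustration:

```php
// Inside ChatAgentWorkflow::run(), after each LLM reply (sketch, not the talk's code).
// $history is an illustrative activity stub exposing estimateTokens() and summarize().
$tokens = yield $history->estimateTokens($this->conversation);

if ($tokens > 80_000) {
    // Compress the past conversation into a short summary...
    $summary = yield $history->summarize($this->conversation);

    // ...and restart the LLM session from a blank slate that only carries the summary.
    $this->conversation = [
        ['role' => 'system', 'text' => 'Summary of the conversation so far: ' . $summary],
    ];
}
```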
Okay, so we can talk to it; now let's see what it can do. And that's the next thing you're probably going to learn when you work with a lot of models: most of them now expose a new way for the model to communicate with your environment, and this is called tool calling. On the screen you can see the agent creating a tool on demand, which is later executed to run some analytical query based on the user request. But what do you essentially do to make a tool call inside Temporal? Well, again, it's easy — that's probably going to be the keyword of today's presentation. Once you tell the AI which types of functions it can call, and once you get those function calls back as a result from an activity, all you have to do is map them to one of your activities, or one of your workflows — why not — get the results, and push them back into the queue. But be careful: what you want to do is make sure that a message the user sends cannot slip in between these tool calls, otherwise the LLM model will choke — they all want to get the response immediately, without anything intercepting it. So use a blocking mechanism and implement it inside your signal method. It's not that complex; the code, once again, is quite straightforward. All you have to do is receive the list of tools the model wants to call, map them to parts of your system — activities, maybe other workflows, maybe something else — and get the results back. You can collect the results sequentially or in parallel; Temporal provides you abstractions to do that in every language. Get the results back, push them back into the message queue — easy. The next call the user makes, or the AI invokes, will receive the responses, and the AI will be able to act based on them.
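A minimal sketch of that dispatch step, layered on top of the chat loop above — the tool-call shape and the idea of mapping tool names straight onto activity methods are assumptions, not the talk's code:

```php
<?php

declare(strict_types=1);

use Temporal\Promise;

// Sketch: the LLM activity returned a list of tool calls instead of a text reply,
// e.g. [['name' => 'searchOrders', 'args' => ['query' => '...']], ...].
// Each tool name is mapped onto a method of an activity stub (or it could start
// another workflow), results are gathered in parallel and handed back as messages.
// Call it from the workflow's main loop as:
//   $messages = yield from dispatchToolCalls($calls, $tools);
// While this runs, a simple "busy" flag checked in the signal method can buffer
// incoming user messages so nothing slips in between the tool calls.
function dispatchToolCalls(array $toolCalls, object $tools): \Generator
{
    $promises = [];

    foreach ($toolCalls as $call) {
        // $tools is an activity stub; each tool method accepts its args as an array.
        $promises[] = $tools->{$call['name']}($call['args']);
    }

    // Run all tool calls in parallel and wait for every result.
    $results = yield Promise::all($promises);

    $messages = [];
    foreach ($toolCalls as $i => $call) {
        $messages[] = ['role' => 'tool', 'name' => $call['name'], 'text' => $results[$i]];
    }

    return $messages;
}
```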
So if you have that — you have tool calling, you have models you can talk to, models you can communicate with, that can look up information, that can execute arbitrary actions and in some cases even do retries by themselves (in many cases the AI will notice that a tool call did not work and try it once again) — you might be asking: so what's next, what can you do with these patterns? And here is the question you can ask yourself: do we even need a user? When you run these workflows, they open up — along with many challenges, such as hallucinated tool calls or skipped tool calls — a huge ability to run workflows, or agents, in our case agentic workflows, that execute by themselves, autonomously, gathering information and writing the solution as they go.
The main problem you're going to have in this case is that while you communicate with the agent directly as a user, you can supervise it: you can say, "you know what, you're doing this wrong, please try something else, don't call this tool, give me information from a different part of the system." When you run agents autonomously, you don't have the user. So what should you do? You should replace the user. And what can replace the user inside Temporal workflows? Another workflow. So in this setup you are going to create your own supervision layer, which essentially plays the role of the user. This supervision layer is responsible for receiving the command — you still need some kind of trigger, either a webhook or a user or something else — and based on this command it will automatically form the first prompt, or the first message, and task the agent to execute it. The tricky part here is how to evaluate whether the agent actually did any valuable work.
The first thing you might notice in applications like that — and this is a very nasty thing to see — is that agents love to loop. The moment an agent makes a mistake and tries to correct it in a very different, but still incorrect, way, it has made two failing calls — and "okay, I'm an agent, I made two failing calls, they're in my context, so what should I do next? Probably make another call," because it seems so logical. So what you might see in some cases is the agent calling your tools over and over and over again — especially when the tools have been created dynamically and eventually fail because of their self-destruct — until it simply overpopulates its context and gets offloaded, and then you just can't do anything. Thankfully, because you run Temporal, you orchestrate and collect all the information about all the tools the AI calls, all the payloads and all the errors, so you can implement many mechanisms to detect that the AI is not doing what you want it to do. You can do that programmatically, by simply looking for patterns in the tool calls and spotting loops where something happens over and over; or you can do something more complex and use another AI model, or another AI agent, to look at the result and decide whether this agent is faulty or the result does not serve the purpose.
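A rough sketch of the programmatic variant — the repeat threshold and the way tool calls are recorded are assumptions:

```php
<?php

declare(strict_types=1);

// Sketch of a deterministic loop detector that the supervising workflow can run
// over the tool-call history it already collects. If the same tool is invoked
// with the same arguments too many times in a row, the agent is considered stuck.
final class ToolLoopDetector
{
    public function __construct(
        private readonly int $maxRepeats = 3,
    ) {
    }

    /**
     * @param array<array{name: string, args: array}> $toolCalls history, oldest first
     */
    public function isLooping(array $toolCalls): bool
    {
        $repeats = 1;
        $previous = null;

        foreach ($toolCalls as $call) {
            $fingerprint = $call['name'] . ':' . json_encode($call['args']);

            $repeats = ($fingerprint === $previous) ? $repeats + 1 : 1;
            if ($repeats >= $this->maxRepeats) {
                return true;
            }

            $previous = $fingerprint;
        }

        return false;
    }
}
```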
From there you're going to be creating deeper and deeper chains, which kind of leads us to the next question: if you can have one agent, why can't you have many agents? Can we use them in collaboration, or can we use them by embedding them into much deeper chains of decisions and using them to run more and more sophisticated workflows inside your system? Well, the answer is obviously yes, because, again, we are at a Temporal conference and there is nothing impossible inside Temporal. You will have to use mostly signals and workflows to compose applications like that, but the composition of an application that runs multiple agents in parallel, or has a collaboration factor, is not that complex and not that different. In this video we are seeing a single agent that communicates and delegates tasks to other sub-agents, which execute tools written by other agents, in order to execute some arbitrary command and return the result back to the hub the user communicates with. To implement a pattern like that — here is what we found works best for us, and I'm pretty sure there are a lot of patterns for composing applications like this — you create a common supervision layer, or as we call it, an agentic pool. It is a single place inside your system — a workflow — that essentially orchestrates the commands between multiple child workflows, your agents. You delegate one of these workflows to be essentially the hub, the arbiter agent you communicate with from outside — that's your entry point, or maybe the one that communicates with the user — and you let this agent communicate with the other agents.
So how can you do that? Well, tool calling. From the perspective of your hub agent, the delegation of a task is not that much different from calling a single activity inside your system: all you have to do is take the payload the AI decided to attach to this delegated task and send it to the other agent. And that's another place where Temporal is going to help you tremendously. Temporal's architecture, and especially the way you write workflows, allows you to say that this tool call is not an activity — this tool call is a signal. You can use this signal to send the command to the parent supervisor loop, the pool, which will automatically spawn the child workflow — your agent — delegate the task to that child agent, and wait for the resulting signal containing the resulting payload; it then takes this payload and sends it back to your hub agent. You get the ability to delegate tasks while your hub agent doesn't even actually know how it works: it just thinks it did a tool call that was very smart and did some very good work inside.
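A compressed sketch of such an agentic pool — the signal names, the hypothetical SubAgentWorkflow and the way the result is delivered back to the hub are all assumptions; the real supervision layer described here is considerably richer:

```php
<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Workflow;
use Temporal\Workflow\ChildWorkflowOptions;
use Temporal\Workflow\SignalMethod;
use Temporal\Workflow\WorkflowExecution;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
class AgenticPoolWorkflow
{
    /** @var array<array{hubId: string, task: string}> */
    private array $pending = [];

    private bool $shutdown = false;

    #[WorkflowMethod]
    public function run()
    {
        while (!$this->shutdown) {
            yield Workflow::await(fn() => $this->pending !== [] || $this->shutdown);

            while ($this->pending !== []) {
                ['hubId' => $hubId, 'task' => $task] = array_shift($this->pending);

                // Spawn a sub-agent as a child workflow and wait for its result.
                // 'SubAgentWorkflow' stands for an agent loop similar to the one
                // sketched earlier, driven by the task instead of a user.
                $result = yield Workflow::executeChildWorkflow(
                    'SubAgentWorkflow',
                    [$task],
                    ChildWorkflowOptions::new()
                        ->withWorkflowExecutionTimeout(CarbonInterval::hours(1))
                );

                // Deliver the result back to the hub agent; from its point of
                // view this is just a tool call that finally returned.
                $hub = Workflow::newUntypedExternalWorkflowStub(
                    new WorkflowExecution($hubId)
                );
                yield $hub->signal('toolResult', [$task, $result]);
            }
        }
    }

    // The hub agent's "delegate" tool call ends up here as a signal.
    #[SignalMethod]
    public function delegate(string $hubId, string $task): void
    {
        $this->pending[] = ['hubId' => $hubId, 'task' => $task];
    }

    #[SignalMethod]
    public function shutdown(): void
    {
        $this->shutdown = true;
    }
}
```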
Another thing you can do — and this is something we experimented with a lot — is to start composing these agents, and composing them in combination with more deterministic and simpler functions. You might create a process for code generation or code analysis which spans a very long time. Some parts of this process are very deterministic — let's say, doing a git pull; some parts of this process are very simple — a plain LLM analysis; we don't really need agents for those. But by composing them together, and using Temporal's ability to converge a few of these simple yet very powerful abstractions into one common system inside a workflow, you can start creating deeper and deeper networks that are able to execute much more complex commands. And yet, while you're doing that, you still retain all the visibility: you still know every step the agent took, every step that has been delegated, where an error happened, and you can correlate that and compensate for it. Once again, if you're trying to implement this, at the end of the day all you do is create a number of processes that depend on other processes, which depend on other processes. Doing that classically is possible — you can do it in many languages, and some languages, like maybe Erlang, are specifically designed for it — but Temporal makes it easy to use in any stack, and it makes it durable as well, because even if you shut down your worker, even if you kill your agents, it will still complete, thanks to the model.
So how do we use solutions like that? How do we use them for our own purposes? We create applications, for ourselves and for our customers, that are able to solve arbitrary tasks that previously would have required engineering time which nobody actually wants to spend. Do you really want your senior engineer building yet another mapper for Excel every week because you received a new form from your vendor? In this space we found there is a huge number of patterns, and a huge number of parts of applications, which we don't actually want to build by hand — so let's ask agents to do them. We can ask agents to create them, validate them, execute them and test them as they go, spending not weeks but minutes to get a working result.
So why do we think Temporal is the best solution for AI? Well, if the presentation didn't say it explicitly: the two are made at completely opposite ends of the spectrum. This very powerful abstraction that allows you to run NLP — generally speaking, thinking — in order to execute some action is kind of pointless to run on its own; you need to embed it into something, and Temporal provides a very rich environment that makes this embedding so easy, so simple, and at the same time so durable. Combining them together allows you to create very complex chains and very complex applications that can pull information from many sources over the span of minutes, hours, maybe days, and then execute on it to give you the result. I could talk about this for days, maybe weeks, but if you want to chat more, please visit us at our booth, or let's grab a drink later today. Thank you.

I guess — any questions?
Q: Have you run it, and how much does it cost?
A: A lot.
Q: But how much engineering time does it cost? I'm just curious.
A: A significant amount for a medium-sized company, but we can now iterate at a pace we have never seen before. In many cases we can deliver a working PoC in 20 minutes on a call with stakeholders — a process that would have involved five people back in the day.
We don't want to use AI for everything, because in many cases we kind of don't trust it. But there are always use cases in your work and your processes that you just don't care that much about — mappings, API calls, data transformations — things you can easily verify and see with your own eyes.
Q: Sorry — how do you evaluate an agentic pipeline?
A: Well, that's a beautiful part about Temporal. From the Temporal perspective, an agentic pipeline is a huge workflow which, yes, you can test step by step and make sure all of the steps work from the workflow perspective. From the user's perspective, however, when you send the command it's just a function, so you evaluate the result by evaluating the quality of the function's output. If you're doing something that fetches information from your database, like a RAG pipeline, there is a bunch of solutions on the market that can run it and see how well the answer correlates with the actual information. So you evaluate the result without actually evaluating all the steps taken inside — you kind of don't even worry about them; the agent is going to do what it thinks is best.
Q: Why would we choose to use a child agent?
A: Good question. The reason is that the context window of each agent is limited — it's quite large, but still limited. So if you have to perform a simple action that can only be done based on information collected from many different parts of the system, just collecting that information is already going to over-pollute the memory of the agent, and it's going to work much slower, much harder and become much more expensive. So instead of that, you want to isolate this process and only get the result back.
Q: Where do you run it — can it be in a different language?
A: How do we run what, sorry? The generated tools? We run them right in Temporal. The thing I didn't say in the presentation is that the referencing layer, which was a single slide, is actually where we spent most of our time, because when an agent defines a tool, we define it as part of our system, and we use Temporal as the syncing layer to sync it to our runtimes, which makes it immediately available for the AI to use. So basically, by creating a tool inside the system, the AI automatically declares it and makes it available to any AI it's been connected to in the declaration of that agent. It can be any language at the end of the day, and we think that eventually the language you use when you work with the application probably isn't going to matter.
Q: Are there any other Temporal-specific limitations or things you ran into that you didn't expect?
A: There are a few, but they're not that large, and not that different from what you'd run into in Temporal anyway. If you run a very long decision chain — an agent that can span many files and run many iterations — you're eventually going to reach the point where your workflow just has to be restarted, and restarting a workflow that potentially has hundreds of child workflows in a tree is quite a challenge. So you might need to implement your own mechanism to properly collapse all these workflows and restart them in the next iteration.
Q: You mentioned at the beginning that — [question partially inaudible]
A: Well, right now we validate it by user observation: you just test it right in the mix, so you see whether it works or not. We're not trying to create huge application servers using these tool calls; we just create simple integrations, which are much easier to test. But at the end of the day you can actually feed a tool back into the agent, and that's another property of the referencing layer we created: every tool an agent creates actually becomes part of the knowledge base that agents can use to learn to create new tools, or to read existing tools and analyze whether they work correctly — or they can just generate the tests.
We have one more question. [question inaudible]
A: Well, you can move the LLM calls to a separate task queue and have a rate limit on that task queue — that's about it. In our case, though, we actually have our own backend that encapsulates all the LLM calls, where we have an additional priority queue with additional rate limiting. It's a simple side effect of the fact that we allow multiple organizations to use the same model, while still being able to split the model usage between organizations so they never collide in this regard. But at the end of the day, even if you don't have that and a call fails, well, it's just going to be retried.
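As a small sketch of that first option — routing all LLM activities to their own task queue so the workers polling it can be scaled and throttled independently of everything else; the queue name is illustrative, and the fragment assumes it sits inside a workflow method:

```php
// Inside a workflow method: send every LLM call to a dedicated task queue.
// A separate worker pool polls "llm-calls", and that pool can be sized and
// rate-limited on its own without affecting the rest of your activities.
$llm = Workflow::newActivityStub(
    CompletionActivity::class, // the illustrative interface from the first sketch
    ActivityOptions::new()
        ->withTaskQueue('llm-calls')
        ->withStartToCloseTimeout(CarbonInterval::minutes(2))
);
```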