YouTube Transcript:
A Practical and Tactical Approach to Temporal and AI | Replay 2024
Video Transcript
Hello, Replay attendees, glad to see you today. As you can probably tell by my accent, I'm not from around here; I'm from Belarus, which is quite a different place. But the reason I'm here today is that five years ago a single tweet changed the whole direction of how I think about and perceive software implementation and software architecture, and today I'd like to tell you the story from that tweet to the moment we started implementing AI workflows in our applications and in applications for our customers.

My name is Anton, and I'm the CTO of Spiral Scout. We have provided software development services for customers around the globe for around 15 years. As the person maintaining the team and tasked with making sure we do a good job, as a tech leader I always have to make sure the tools we use are optimal and that we don't spend extra time writing typical bootstrap code or anything else we don't want to write. As a passionate coder, I love to mitigate that by creating my own tools, and over the span of my career I created a number of open-source and closed-source instruments: everything from frameworks to ORMs to database layers, template engines, DSLs, and so on.

But as time passed, our client pool grew, and the complexity grew with it, and we soon realized that even though we had our own toolkit and a team that knew how to use it (back in the day it was mostly PHP), we were lacking one very large abstraction that seemed hard to get. That abstraction, as you know from this presentation and this conference, is a workflow engine. So what is the first logical solution every engineer reaches for when he cannot get the instrument he wants in his stack? Let's build it ourselves. Very smart idea. We started doing research and looking into ways we could implement a workflow engine in our products. We used to work with Amazon SWF, and it looked like a very nice solution for many things, but it was still quite proprietary and hard to use in ecosystems like open source or outside of Amazon. Around the moment I had the first prototypes, we realized that the number of edge cases we uncovered by running this engine just grew exponentially; every day and every moment we saw more and more problems arise from things we expected to just work. That's when I decided to step back, do additional research, and see whether there were new tools on the market, new solutions, or a better pattern. Around that time I found a seemingly very experienced guy on Twitter who was talking all about workflows and durable executions, the power they bring, and applications that can run for the span of days and months. So I thought to myself: okay, he has his solution, I have my own stack, why not try to talk to him and see if we can collaborate to bring it in? So I wrote a Twitter message, and to my surprise this person said, yeah, let's talk. Five years ago, that conversation with Maxim Fateev kicked off a quite long collaboration in which we created the Temporal PHP SDK, and we began to use and adopt Temporal for our own products and for the products of our customers. At this moment everything looks very nice and cozy: we have one stack, we have a powerful workflow engine. What else can you dream about? What else do you want?
And that's about the moment when GPT-3 dropped on the market. Once you see this model, and once you realize what a state-of-the-art LLM can do — it can interpret your user requests, it can write haiku, make jokes, or help you process any information — it becomes very obvious that there is immense potential in using these solutions to build something more complex. Yet while implementing them and building our first pipelines (summarizing tweets, making pull request reviews, and so on) we kept seeing the same pattern over and over: even if you have this powerful technology, powered by state-of-the-art models built by the most powerful companies in the world, the actual implementation process is not that different from 20 years ago. You still go through the planning phase, the design phase, implementation, and iteration. So we were in a situation where we had the keys to a Lamborghini but used it just to drive to Costco. Why do we still have these powerful models that we cannot actually use to enhance our main work? Obviously we have Copilot and many other tools, but we decided to come back to the drawing board and challenge ourselves with a slightly different question: can we create software that is not only programmed ahead of time by engineers, but software that can actually program itself and expand its own functionality as it goes, in collaboration with the user — and why not, maybe by itself, just trying to see how it can be optimized? By this point we clearly knew this was going to be a very challenging task: a very complex architecture spanning many domains and many parts of the system that have to collaborate seamlessly. Well, that only works if you have a nice engine that helps you cope with this complexity — and this engine is obviously Temporal, since I'm speaking here today.

So let's dive in and see what we can do in terms of LLM payloads and LLM workflows within your Temporal application. The first thing we have to do is properly define the boundaries: how do we actually define the LLM calls within our workflows? Surprisingly, in terms of the actual workflow implementation and the actual data flow, the LLM can be defined quite easily: it's a black box, and many engineers actually think of it as a black box as well. It's a very powerful and magical abstraction: you put some data in, you get some data out. Sometimes this data is good; sometimes this data is just garbage. Well, that's something we have to live with. At the same time, as you know when you use LLMs, while this solution is extremely powerful and extremely versatile, it's also quite unreliable. You will see everything from failures on API calls, to timeouts, to the plain situation where the AI just says: you know what, I don't want to do this job. What can you do about that? If you take a look at this implementation pattern, you'll see that on one side of the equation you have an extremely powerful abstraction which is highly non-deterministic and highly unreliable; on the other side of the equation you have an engine that was designed to mitigate exactly that — to write deterministic and very durable workflows and implement them in a quite easy fashion. So it makes total sense to combine them: you use one engine to mitigate the issues created by the other.
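As a rough sketch of that combination (not something shown on the slides), here is how an LLM call could be wrapped as a Temporal activity in PHP so that the engine owns retries and timeouts; `LlmGatewayInterface` and its `complete()` method are hypothetical names I'm using for illustration, not part of the Temporal SDK:

```php
<?php

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;
use Temporal\Activity\ActivityOptions;
use Temporal\Common\RetryOptions;
use Temporal\Workflow;

// Hypothetical activity contract: the LLM call is just a black box behind an interface.
#[ActivityInterface]
interface LlmGatewayInterface
{
    // Sends a prompt to the model and returns the raw completion text.
    #[ActivityMethod]
    public function complete(string $prompt): string;
}

#[Workflow\WorkflowInterface]
class SummarizeWorkflow
{
    #[Workflow\WorkflowMethod]
    public function summarize(string $text)
    {
        // Temporal, not our code, deals with API failures, timeouts and refusals:
        // the activity is simply retried until it produces a usable answer.
        $llm = Workflow::newActivityStub(
            LlmGatewayInterface::class,
            ActivityOptions::new()
                ->withStartToCloseTimeout(CarbonInterval::minutes(2))
                ->withRetryOptions(
                    RetryOptions::new()->withMaximumAttempts(5)
                )
        );

        return yield $llm->complete("Summarize the following text:\n" . $text);
    }
}
```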
If you're setting out to implement an LLM-powered application, you're most likely going to start with two quite simple patterns, which in many cases will probably cover 80 or 90% of your whole LLM workload. You're going to start with RAG pipelines: pipelines designed to go to some data source — maybe a vector database, maybe an external website, anything — gather the information most relevant to the user's query, and return it in a way the user, or maybe another AI, can comprehend and act on. On the other side you have the type of workload that does pretty much the same thing; the only major difference is that instead of returning text to the user, you perform some arbitrary action on behalf of the user, based on a decision made by the LLM. It's as easy as sending an email asking to cancel your order — and, well, an order gets cancelled and an account gets deleted. Be careful what you wish for when you work with LLMs.

Looking deeper into RAG pipelines: in pretty much every paper you'll find about them, you'll see they have distinctive steps. You always have parts that collect, aggregate, normalize, chunk, and embed data into a vector store, maybe reshuffling or clustering it; on the other side you have the parts responsible for retrieving that data and pushing the answer to the user. But the most curious thing about RAG pipelines, if you look at how they are drawn in these papers and in pretty much every article people write, is that they all have distinctive steps — blocks and arrows between them — which, surprisingly, looks exactly like what we need. It is simply a data workflow and data passing. I'll be showing examples today in PHP, but I do that mostly for visual purposes; it can easily be done in any language you love — Python, Node.js — Temporal allows you to switch stacks quite easily. If you're going to implement a RAG pipeline, the very simplistic approach will most likely look like this: it doesn't require much thinking, it's just a number of steps. Some of the steps use an LLM, for example to summarize the query; some of the steps go to an external source to find information and push it back into the pipeline. They can span many, many actions, with branching or additional conditions. The action pipelines, once again, are not that different from the Temporal perspective; the only major difference is that instead of giving the response back to the user, you act based on that response. Temporal makes this approach quite simple, because when you act, you execute something within your environment — and Temporal already connects to all of your environment, so gluing that to your activities and calling one of your services is extremely simple.
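To make that concrete, here is a hedged sketch of such a simplistic RAG-style workflow with the Temporal PHP SDK, reusing the hypothetical `LlmGatewayInterface` from the previous sketch; `RetrieverActivityInterface` and its `search()` method are likewise illustrative names, not anything prescribed by the talk:

```php
<?php

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;

#[Workflow\WorkflowInterface]
class AnswerQuestionWorkflow
{
    #[Workflow\WorkflowMethod]
    public function answer(string $question)
    {
        $options = ActivityOptions::new()
            ->withStartToCloseTimeout(CarbonInterval::minutes(2));

        // Illustrative activity interfaces: one talks to the vector store,
        // the other wraps the LLM call (defined elsewhere).
        $retriever = Workflow::newActivityStub(RetrieverActivityInterface::class, $options);
        $llm = Workflow::newActivityStub(LlmGatewayInterface::class, $options);

        // Step 1: let the model rewrite the question as a search query.
        $searchQuery = yield $llm->complete("Rewrite as a search query: " . $question);

        // Step 2: fetch the most relevant chunks from the knowledge source.
        // Note: these chunks pass through workflow history here; the referencing
        // approach discussed later avoids that.
        $chunks = yield $retriever->search($searchQuery, 5);

        // Step 3: assemble the prompt from a simple template and ask the model.
        $prompt = "Answer the question using only the context below.\n"
            . "Context:\n" . \implode("\n---\n", $chunks) . "\n"
            . "Question: " . $question;

        return yield $llm->complete($prompt);
    }
}
```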
If you take a look at LLM activities — and this will become important in a lot of slides — you will also notice a recurring pattern inside them: every time you make an LLM call, the first step is to assemble the context that will be sent to the LLM, or the prompt, as we call it. In simplistic terms this context can be represented as a simple template: you have a number of variables — things you found in a knowledge base, something from the user, maybe something from the internet, who knows — you put them all together and send them to the AI. You wait for the response from the AI, and then you interpret the result in some structured form.

The first thing we noticed while building pipelines and actions like that is that it is actually extremely important to validate the AI response within a single activity. Generally speaking, you could get the AI response, send it back to Temporal, and then do the execution in a different activity — but the problem is that you can't actually trust the AI. What starts happening is that in some cases your activity executes successfully — everything is okay, the activity is done — but the payload it generated is completely invalid, and your workflow is just stuck: you cannot execute the next activity at all. So it makes sense to combine generation and validation, to make sure you never leave an activity with invalid data generated by the AI.

So far, if you look at these workflows, they don't pose any threat to any engineer. They're quite linear; in some cases it's a DAG; in some cases you can even describe them in some DSL. At the end of the day they are just Temporal workflows — the only thing you do is replace some actions inside the pipeline from normal activities to activities that call the LLM, and it just works. There is no additional magic, and nothing extra you have to do beyond assembling the workflow. The problems start arising once you make the workflow long enough and complex enough to process more and more information, because modern LLM models are quite hungry for tokens — some models can comprehend up to one million tokens, which is a lot of pages of text. If your workflow grows and more information passes through it, remember that Temporal stores all the payloads that pass in and out of your activities in the Temporal history. This causes a very nasty problem later on, because you will never be confident your workflow won't die simply because some LLM decided to write a poem instead of giving you the correct action.

The way we decided to solve it — and the ways you can solve it — you have multiple options. Option number one is to do nothing: just write smaller pipelines. In many cases, when you're doing something very simple, it just works; you don't necessarily care, and you can always retry or ask the AI to be a bit shorter. In other cases you can be a bit smarter and use implicit data referencing: you implement your own data converter and your own interceptor layer that detects that a payload is larger than you want and uploads it to an external data store to be used later. But what we found works best for us — and this is why I wanted you to remember how prompts work — is to use explicit referencing. At the end of the day, all the information you send to the AI, all the information the AI acts on, is only needed at the moment you compile your prompt. You don't actually need any of this data, or any of the user's PII, inside your workflow — so don't put it there at all. Keep it outside and use references: links, IDs, or database keys. This becomes handy when you're assembling information from multiple systems, because by implementing a universal referencing mechanism you can combine information from multiple parts of your application and then resolve it all in one distinct place, the place where you actually send the information to the AI.
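To illustrate those two points together — compiling the prompt from explicit references inside the activity, and validating the model's answer before the activity completes — here is a hedged sketch; `DocumentStore`, `LlmClient`, and the expected JSON shape are assumptions made for illustration, not APIs from the talk or the Temporal SDK:

```php
<?php

use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;

#[ActivityInterface]
class ProposeActionActivity
{
    public function __construct(
        private DocumentStore $documents, // hypothetical storage behind the references
        private LlmClient $llm,           // hypothetical LLM client
    ) {
    }

    /**
     * The workflow only passes opaque reference IDs; the actual content (and any PII)
     * is resolved here, at the moment the prompt is compiled, and never enters history.
     */
    #[ActivityMethod]
    public function propose(array $documentIds, string $instruction): array
    {
        $context = [];
        foreach ($documentIds as $id) {
            $context[] = $this->documents->fetch($id);
        }

        $prompt = "Context:\n" . \implode("\n---\n", $context)
            . "\nTask: " . $instruction
            . "\nRespond with JSON: {\"action\": string, \"arguments\": object}";

        $raw = $this->llm->complete($prompt);
        $decoded = \json_decode($raw, true);

        // Validate inside the same activity: if the model produced garbage,
        // fail here so Temporal retries the call instead of leaving the
        // workflow stuck with an unusable payload.
        if (!\is_array($decoded) || !isset($decoded['action'])) {
            throw new \RuntimeException('LLM returned an invalid action payload');
        }

        return $decoded;
    }
}
```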
This way your workflows are completely free of any user information, yet they are still the thing orchestrating the whole process.

Okay, so we have RAG workflows, we have action workflows, we did the dereferencing — probably there's nothing else we want to do, users are happy, right? No. Users don't want just a button they click on and expect something back; they actually want to talk to the AI, because that's how many users in the market perceive AI today. What you see in the picture is a GIF of one of the sessions we had with one of the agents, which, based on the user request, performed additional actions, ran some activities, and pulled in information to give the correct answer. The implementation of these workflows might look fairly complex at first, until you realize it's actually not that complex, because Temporal's model allows you to write not only linear workflows that begin and end, but also workflows in which you implement something like a main loop. By creating a main loop, running the LLM activity inside it, and populating the loop with the information the workflow receives via signals, you can implement a quite sophisticated system that actually lives alongside the user and answers their questions in real time. At the same time you maintain the whole state, and you maintain full control over the process: you can see how many tokens the LLM has already consumed, you can see how fast it responds, and you can act based on that. The implementation, once again, can be done in any language, and it fits on a screen — it's not that large. Temporal makes it so easy, because by exposing everything at the code level you can simply write this loop, and voilà, it just works.

Also, by doing that you get a lot of benefits from Temporal's composability model. From the user's perspective they send a message and get a response back, but that doesn't mean you have to do a single action of going to the AI — you can do something else. Specifically, before the message is sent to the AI, you can enrich it with additional context: replace that block with your pipeline that connects to your knowledge source — say, information about your product — and voilà, you have a customer support bot that now talks to you about your product specifically. If you're doing long conversations, or conversations that span days and months (maybe an email thread), sooner or later you'll hit the situation where your agent's context is overfilled and the agent can't act anymore. Once again, because you run this whole process inside Temporal, inside the main loop, it is exceptionally easy to detect that moment, see how many tokens the AI has consumed, and use that to offload the past conversation and restart a new LLM session with the conversation history. In essence, all you do is summarize the past messages, put the summary back into the history (or context, or prompt), and run again. The user won't even notice; from the agent's perspective, however, it starts from a blank slate, knowing only a summary of the past conversation.
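A hedged sketch of such a main loop with the Temporal PHP SDK follows; the `chat()` activity method and the signal names are illustrative assumptions, not the talk's actual code:

```php
<?php

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;

#[Workflow\WorkflowInterface]
class ChatAgentWorkflow
{
    private array $pending = [];   // messages received via signal, not yet answered
    private array $history = [];   // running conversation history
    private bool $closed = false;

    #[Workflow\WorkflowMethod]
    public function run()
    {
        $llm = Workflow::newActivityStub(
            LlmGatewayInterface::class,
            ActivityOptions::new()->withStartToCloseTimeout(CarbonInterval::minutes(2))
        );

        while (!$this->closed) {
            // Sleep until the user sends something or the session is closed.
            yield Workflow::await(fn() => $this->pending !== [] || $this->closed);

            while ($this->pending !== []) {
                $message = \array_shift($this->pending);
                $this->history[] = ['role' => 'user', 'content' => $message];

                // Hypothetical chat() call that takes the whole conversation.
                $reply = yield $llm->chat($this->history);
                $this->history[] = ['role' => 'assistant', 'content' => $reply];
            }
        }
    }

    #[Workflow\SignalMethod]
    public function say(string $message): void
    {
        $this->pending[] = $message;
    }

    #[Workflow\SignalMethod]
    public function close(): void
    {
        $this->closed = true;
    }
}
```

When the history grows too large, this same loop is the natural place to summarize it and start a fresh LLM session, as described above.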
Okay, so we can talk, and we can see what we can do. And that's the next thing you're going to learn when working with most of the models: they now expose a new way for the model to communicate with your environment, and this is called tool calling. On screen you can actually see the agent creating a tool on demand, which is later executed to run some analytical query based on the user's request. So what do you essentially have to do to make a tool call inside Temporal? Well, again, it's so easy — that's probably the keyword of today's presentation. Once you tell the AI which functions it can call, and once you get those function calls back as the result of an activity, all you have to do is map them to one of your activities or one of your workflows — why not — get the result, and push it back onto the queue. But be careful: you want to make sure that a message the user sends cannot slip in between these tool calls, otherwise the LLM will break — models want their tool responses immediately, without anything injected in between. So use a blocking mechanism and implement it inside your signal method. It's not that complex; the code, once again, is quite straightforward. All you have to do is receive the list of tools the model wants to call, map them to parts of your system — activities, maybe other workflows, maybe something else — get the results back (you can do that sequentially or in parallel; Temporal provides abstractions for both in every language), and push the results back into the message queue. Easy. The next call the user makes, or the AI invokes, will receive the responses, and the AI will be able to act on them.
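A hedged sketch of that dispatching step, as a helper inside the agent workflow from the previous sketch; the tool names, the `$this->orders` activity stub, and `ReportWorkflowInterface` are assumptions for illustration:

```php
<?php

use Temporal\Promise;
use Temporal\Workflow;

#[Workflow\WorkflowInterface]
class ToolCallingAgentWorkflow
{
    private array $history = [];
    private object $orders; // activity stub, assumed to be created in the workflow method

    // ... main loop as in the previous sketch; when the LLM activity returns a list of
    // tool calls such as [['id' => 't1', 'name' => 'searchOrders', 'args' => [...]]],
    // the loop delegates to this helper via `yield from $this->executeToolCalls($calls)`.

    private function executeToolCalls(array $toolCalls): \Generator
    {
        $promises = [];
        foreach ($toolCalls as $call) {
            // Every tool name maps onto a real part of the system:
            // an activity, or even a whole child workflow.
            $promises[$call['id']] = match ($call['name']) {
                'searchOrders' => $this->orders->search($call['args']),
                'cancelOrder'  => $this->orders->cancel($call['args']),
                'runReport'    => Workflow::newChildWorkflowStub(ReportWorkflowInterface::class)
                    ->run($call['args']),
                default => throw new \RuntimeException('Unknown tool: ' . $call['name']),
            };
        }

        // Execute the tool calls in parallel and wait for all of them.
        yield Promise::all($promises);

        // Push the results back into the conversation so the next LLM call sees them.
        foreach ($promises as $id => $promise) {
            $this->history[] = [
                'role'         => 'tool',
                'tool_call_id' => $id,
                'content'      => \json_encode(yield $promise),
            ];
        }
    }
}
```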
So now you have tool calling, and you have models you can talk to — models that can communicate, look up information, execute arbitrary actions, and in some cases even do retries by themselves; in many cases the AI will notice that a tool call didn't work and say, let me try it again. You might be asking: so what's next? What can you do with these patterns? And the question you can ask yourself is: do we even need a user? When you run these workflows, while they open up many challenges, such as hallucinated tool calls or skipped tool calls, they also open up a huge ability to run workflows — or agents, in our case agentic workflows — that execute by themselves, autonomously gathering information and building the solution as they go. The main problem you're going to have is that while you communicate with an agent directly as a user, you can supervise it: you can say, you know what, you're doing this wrong, please try something else, don't call this tool, get me information from a different part of the system. When you run agents autonomously, you don't have the user. So what should you do? You should replace the user. And what can replace the user inside Temporal workflows? Another workflow. In this setup you create your own supervision layer, which essentially plays the role of the user. This supervision layer is responsible for receiving the command — you still need some kind of trigger, either a webhook or a user or something else — but based on this command it automatically forms the first prompt, the first message, and tasks the agent with executing it. The tricky part is how to evaluate whether the agent actually did valuable work. The first thing you might notice in applications like that — and this is a very nasty thing to see — is that agents love to loop. The moment an agent makes a mistake and tries to correct it in a different but still incorrect way, it now has two error calls sitting in its context. Well, what should it do next? Probably make another call, because that seems so logical. So what you might see in some cases is the agent calling your tools over and over and over again — especially when the tools have been created dynamically and eventually fail or self-destruct — until it simply overpopulates its context and gets offloaded, and you just can't do anything about it. Thankfully, because you run Temporal, you orchestrate and collect all the information about all the tools the AI calls, all the payloads and all the errors, so you can implement many mechanisms to detect that the AI is not doing what you want. You can do it automatically, by simply looking for patterns in the tool calls and spotting loops where something happens over and over, or you can do something more complex and use another AI model or another AI agent to look at the result and decide whether the agent is faulty or the result doesn't serve the purpose.

You're going to be creating deeper and deeper chains, which leads us to the next question: if you can have one agent, why can't you have many agents? Can we use them in collaboration, or embed them into much deeper chains of decisions and use them to run more and more sophisticated workflows inside your system? Well, the answer is obviously yes — we are at a Temporal conference, after all, and there is nothing impossible inside Temporal. You will mostly use signals and child workflows to compose applications like that, but the composition of an application that runs multiple agents in parallel, or has some collaboration factor, is not that complex and not that different. In this video we're seeing a single agent that communicates and delegates tasks to other sub-agents, which execute tools written by yet other agents, in order to carry out some arbitrary command and return the result back to the hub the user communicates with. To implement a pattern like that, what we found works best for us — and I'm pretty sure there are a lot of patterns for composing applications like this — is to create a common supervision layer, or as we call it, an agentic pool. It's a single place inside your system: a workflow that essentially orchestrates the commands between multiple child workflows, your agents. You designate one of these workflows to be essentially the hub, or arbiter, agent — the one you communicate with from outside, your entry point, or maybe the one that talks to the user — and you let this agent communicate with the other agents. So how can you do that? Well, tool calling. From the perspective of your hub agent, delegating a task is not that different from calling a single activity inside your system: all you have to do is take the payload the AI decided to put into this delegated task and send it to the other agent. And that's another place where Temporal helps you tremendously. Temporal's architecture, and especially the way you write workflows, allows you to say that this tool call is not an activity — this tool call is a signal. You can use this signal to send the command to the parent supervisor loop, the pool, which will automatically spawn the child workflow — your agent — delegate the task to it, and wait for the resulting signal containing the resulting payload; then you take this payload and send it back to your hub agent. So you have the ability to delegate tasks while your hub agent doesn't even actually know how it all works — it just thinks it made a tool call that was very smart and did some very good work inside.
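A hedged sketch of such an agentic pool, assuming made-up names (`AgentPoolWorkflow`, `SubAgentWorkflowInterface`, the `delegate` signal); for brevity the result is only stored here, whereas a full setup would also signal the payload back to the hub agent workflow as its tool-call response:

```php
<?php

use Temporal\Workflow;
use Temporal\Workflow\ChildWorkflowOptions;

#[Workflow\WorkflowInterface]
class AgentPoolWorkflow
{
    /** @var array<string, mixed> task id => result payload (null while pending) */
    private array $results = [];
    private bool $stopped = false;

    #[Workflow\WorkflowMethod]
    public function run()
    {
        // The pool itself just waits; all the delegation happens in the signal
        // handler below and in the child agent workflows it spawns.
        yield Workflow::await(fn() => $this->stopped);
    }

    /**
     * The hub agent turns a "delegate" tool call into this signal instead of an activity.
     */
    #[Workflow\SignalMethod]
    public function delegate(string $taskId, string $instruction)
    {
        $this->results[$taskId] = null;

        // Spawn a child agent workflow and hand it the task.
        $child = Workflow::newChildWorkflowStub(
            SubAgentWorkflowInterface::class,
            ChildWorkflowOptions::new()->withWorkflowId('agent-' . $taskId)
        );

        // Wait for the sub-agent to finish and keep its result; the hub agent
        // receives it later as if it were an ordinary tool-call response.
        $this->results[$taskId] = yield $child->execute($instruction);
    }

    #[Workflow\SignalMethod]
    public function stop(): void
    {
        $this->stopped = true;
    }
}
```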
Another thing you can do — and something we experimented with a lot — is to start composing these agents, and composing them in combination with more deterministic and more simplistic functions. You might create a code-generation or code-analysis process that spans a very long time. Some parts of this process are very deterministic — say, do a git pull; some parts are very simplistic — a simple LLM analysis, no agent needed for that. But by composing them together, and using Temporal's ability to converge a few abstractions — very powerful abstractions — into one common system inside a workflow, you can start creating deeper and deeper networks that are able to execute much more complex commands. And yet, while you're doing that, you still retain all the visibility: you still know every step the agent took, every step that was delegated, where the error happened, and you can correlate that and compensate for it. Once again, if you try to implement this, at the end of the day all you do is create a number of processes that depend on other processes that depend on other processes. Doing that is classically possible — you can do it in many languages, some of them specifically designed for it, like maybe Erlang — but Temporal makes it easy to use in any stack, and Temporal makes it durable as well, because even if you shut down your worker, even if you kill your agents, it will still complete, thanks to the model.

So how do we use solutions like that for our own purposes? We create applications for ourselves and our customers that are able to solve arbitrary tasks that previously would have required engineering time nobody actually wants to spend. Do you really want your senior engineer building yet another Excel mapper every week because you received a new form from your vendor? We found there is a huge number of these tasks, and a huge number of parts of our applications, that we don't actually want to build by hand — so let's ask agents to do them. We can ask agents to create them, validate them, execute them, and test them as they go, spending not weeks but minutes to get a working result.

So why do we think Temporal is the best solution for AI? Well, if the presentation didn't say it explicitly: the two sit on completely opposite ends of the spectrum. This very powerful abstraction, which lets you run NLP and, generally speaking, "thinking" in order to execute some action, is kind of pointless to run on its own; you need to embed it into something, and Temporal provides a very rich environment that makes this embedding so easy, so simple, and at the same time so durable. Combining them allows you to create very complex chains and very complex applications that can pull information from many sources over the span of minutes, hours, maybe days, and then execute on it to give you the result. I could talk about this for days, maybe weeks, but if you want to chat more, please visit us at our booth, or let's grab a drink later today. Thank you.

I guess — any questions? [Audience: Have you run it, and how much does it cost?] A lot. [How much engineering time did it cost? I'm just curious.] A significant amount for a medium-sized company, but we can now iterate at a pace we have never seen before.
In many cases we can receive a working PoC in 20 minutes, on a call with stakeholders — a process that back in the day would involve five people. We don't want to use AI for everything, because in many cases we kind of don't trust it, but there are always use cases in your work and your process that you just don't care about that much: mappings, API calls, data transformations — things you can easily verify and see with your own eyes.

[Audience: How do you evaluate an agentic pipeline?] Well, that's a beautiful part about Temporal. From the Temporal perspective, an agentic pipeline is a huge workflow, which yes, you can test step by step and make sure all of the steps work from the workflow perspective. But from the user's perspective, when you send the command, it's just a function — so you evaluate the result by evaluating the quality of the function's result. If you're doing something that fetches information from your database, like a RAG pipeline, there is a bunch of solutions on the market that can run it and see how well it correlates with the actual information. So you evaluate the result without actually evaluating all the steps taken inside; you kind of don't even worry about them — the agent will do what it thinks is best.

[Audience: Why would you choose to use a child agent?] Good question. The reason is that the context window of each agent is limited — it's quite large, but still limited. If you have to perform a simple action that can only be done based on information collected from many different parts of the system, then just collecting this information will already over-pollute the agent's memory, and it's going to work much slower, much harder, and become much more expensive. So instead of that, you want to isolate this process and only get the result back.

[Audience: How do you run the generated tools — can it be a different language?] We run them right in Temporal. The thing I didn't say in the presentation is that the referencing layer, which was a single slide, is actually where we spent most of our time, because when the agent defines a tool, we actually define it as part of our system; we use Temporal as the syncing layer to sync it to our runtimes, which makes it immediately available for the AI to use. So basically, by the AI creating the tool inside the system, it is automatically declared and made available to the AI, provided it's been connected in the declaration of that agent. It can be any language at the end of the day, and we think that eventually the language you use when you work with the application is probably going to matter less and less.

[Audience: Are there any other Temporal-specific limitations or things you ran into that you didn't expect?] There are a few, but they're not that large, and they're not that different from what you would run into in Temporal anyway. If you run a very long decision chain — an agent that can span many files and run many iterations — you're eventually going to get to the point where your workflow just has to be restarted, and restarting a workflow that potentially has hundreds of child workflows in a tree is quite a challenge. So you might need to implement your own mechanism to properly collapse all these workflows and restart them in the next iteration. As I mentioned at the beginning, right now we validate by user observation — you just test it in the mix and see whether it works or not.
We are not trying to create huge application servers using these tool calls; we just create simple integrations, which are much easier to test. But at the end of the day you can actually feed the tool back into the agent — and that's another property of the reference layer we created: every tool the agent creates becomes part of the knowledge base the agent can use to learn, to create new tools, or to read existing tools and analyze whether they work correctly, or it can just generate the tests.

We have one more question. — Well, you can move the LLM call to a separate task queue and have a rate limit on that task queue; that's about it. In our case we actually have our own backend that encapsulates all the LLM calls, where we have an additional priority queue with additional rate limiting. It's a simple side effect of the fact that we allow multiple organizations to use the same model, while at the same time we can split model usage between organizations so they never collide. But at the end of the day, even if you don't have that and a call fails — well, it's just going to be retried.
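As a hedged illustration of the first option mentioned above, an LLM activity can be routed to its own task queue so that a dedicated, independently throttled worker picks it up; the queue name and the activity interface are illustrative assumptions:

```php
<?php

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;

// Inside the workflow: send all LLM activities to a dedicated task queue.
// A separate worker polls 'llm-calls' and can be rate-limited on its own,
// so heavy LLM traffic never competes with the rest of the system.
$llm = Workflow::newActivityStub(
    LlmGatewayInterface::class,
    ActivityOptions::new()
        ->withTaskQueue('llm-calls')
        ->withStartToCloseTimeout(CarbonInterval::minutes(2))
);
```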