0:03 we've already seen that prompting a
0:05 large language model can take you quite
0:07 far but there's a technique called
0:11 retrieval augmented generation or RAG
0:12 that can significantly expand what you
0:15 can get an LLM to do by giving it
0:17 additional knowledge beyond what it may
0:19 have learned from data on the Internet
0:22 or other open sources let's take a look
0:24 if you were to ask a general purpose chat
0:27 system such as one of the ones on the
0:28 internet a question like is there
0:31 parking for employees it might answer
0:33 something like I need more specific
0:34 information about your workplace because
0:36 it doesn't know what the parking
0:39 policy is for your company but RAG or
0:41 retrieval augmented generation we'll see
0:44 is a technique that can give the LLM
0:47 additional information so that if you
0:49 ask it if there's parking it can refer
0:52 to policies specific to your company how
0:56 does it work RAG has three steps step
0:59 one is given a question is there parking
1:01 for employees it'll first look through a
1:03 collection of documents that may have
1:06 the answer for example if your company
1:09 has different documents on the benefits
1:11 offered to employees and the leave policy and
1:13 some documents on the facilities and
1:16 some documents on payroll processes then
1:18 the first step in the RAG system would
1:21 be to have a computer find out which if
1:24 any of these documents is most relevant
1:26 to this question and parking seems like
1:27 a question about the facilities about
1:30 the building that your team works in
1:32 and so hopefully you'll select out the
1:35 facilities document as most relevant the
1:36 second step is then to incorporate the
1:38 retrieved document or the retrieved text
1:40 into an updated prompt so let me
1:42 construct a prompt as follows I'm going
1:45 to say use the following pieces of
1:47 context to answer the question at the end
1:49 and then I'm going to take the relevant
1:52 text from my facilities documentation
1:53 with the parking policy that all
1:55 employees may park on levels one and two
1:58 and so on and put that into my prompt so
2:00 this is now a pretty long prompt because
2:02 it tries to give a lot of context for
2:05 the LLM now remember last week we had
2:08 spoken about limitations to the prompt
2:09 length or the input length for large
2:12 language models that's why in practice
2:15 rather than dumping an entire very long
2:17 document into the prompt you might pull
2:19 out just the part of the document that's
2:21 most relevant to the question and put
2:23 just that into the prompt and then
2:26 finally we add the original question is
2:28 there parking for employees so this is
2:30 called retrieval augmented generation
2:33 or RAG because we're going to generate
2:34 an answer to this but we're going to
2:36 augment how we generate text by
2:38 retrieving the relevant context or the
2:41 relevant information and augmenting the
2:44 prompt with that additional text having
2:46 constructed this prompt the final step
2:49 is to then prompt the LLM with this rich
2:52 prompt and hopefully the LLM will then
2:54 give us a thoughtful answer telling us
2:56 about where we can park in some
2:59 applications using RAG in the output
3:01 shown to the user we would also add a
3:04 link to the original source document
3:06 that led to this answer being generated
3:08 so in this case we might link to that
3:11 facilities documentation so the user can
3:13 if they wish go back and read the
3:15 original source document and double
3:17 check the answer for themselves RAG
3:19 retrieval augmented generation is an
3:22 important technique that is enabling
3:24 many LLMs to have context or to have
3:27 information beyond what they may have
3:30 learned on the open internet here are
3:33 some examples of RAG-based applications
3:35 there are many companies today that are
3:37 offering software that let you chat with
3:40 a PDF file for example if you're reading
3:43 a white paper but you maybe don't have
3:45 time to read the entire thing carefully
3:47 but have a question that you want
3:49 answered based on that white paper there
3:51 are many applications today like Panda
3:54 chat AIO PDF chat PDF and many many
3:57 others that let you upload your PDF file
3:59 and then ask questions and they will use
4:01 RAG to try to generate answers for
4:03 you I find that some of these software
4:06 packages work better and some work worse
4:09 so the results you get may vary but
4:10 there's certainly been a lot of excitement
4:12 and interest about building applications
4:15 to let you chat to your PDF files there
4:17 are also more and more RAG applications
4:20 that will answer questions based on a
4:23 website's articles for example Coursera
4:25 Coach does multiple things but one of
4:28 the things it does is use RAG to try to
4:31 answer questions based on contents on
4:34 the Coursera site itself Snapchat also
4:38 has a chatbot that uses text from Snap
4:40 to try to answer different questions you
4:42 might have about their products and
4:45 HubSpot which is a marketing automation
4:49 company is another example of a company
4:51 that has a chat bot that lets you post
4:53 questions and tries to generate answers
4:56 for you based on content from the
4:58 company or from the website itself so
5:00 these types of chats are becoming an
5:03 alternative way to let users get answers
5:05 to questions that they may have about
5:08 your company's offerings RAG is also
5:10 leading to new forms of web search
5:13 Microsoft Bing has a chat capability
5:16 Google has a generative AI feature as
5:18 well that can generate text in response
5:21 to your queries and the startup You.com
5:24 which was actually started by one of my
5:26 former PhD students Richard Socher is a
5:28 web search engine that was built
5:32 centered on a chat-like interface so RAG
5:35 is used in many applications today and
5:38 excitingly it seems to be transforming
5:41 even web search to wrap up this video
5:44 there's one big idea I'd like to share
5:47 with you which is to think of the LLM not
5:50 as a knowledge store but instead as a
5:52 reasoning engine LLMs may have read a
5:54 lot of text on the internet and so it's
5:56 tempting to think of them as knowing a
5:58 lot of things and they kind of do but
6:00 they don't know everything
6:02 with the RAG approach we provide
6:04 relevant context in the prompt itself
6:07 and we ask the LLM to read that piece of
6:09 text and then to process it to get to
6:11 the answer in other words rather than
6:13 counting on it to have memorized
6:15 enough facts to get us the answer we're
6:18 instead using it as a reasoning engine to
6:20 process information and not as a source
6:23 of information and I find that this way
6:25 of thinking about LLMs as a reasoning
6:28 engine rather than as a way to store and
6:31 retrieve information can expand the set
6:35 of applications that we might brainstorm
6:37 and consider an LLM to be capable of
6:40 doing admittedly LLM technology is early
6:42 and it doesn't always do that well but
6:44 if an LLM isn't just a database that
6:47 stores a lot of information for you but
6:49 it can process and reason through
6:51 information I think that is an exciting
6:53 direction to think about where LLMs
6:55 might go from here even though I've
6:58 talked mostly about RAG in the context
7:00 of building software applications
7:02 this idea can also be useful if you're
7:06 using a web user interface sometimes I
7:08 would take a piece of text and just copy
7:10 it into the prompt of an online web UI
7:13 of an LLM and then tell it to use that
7:15 context to generate an answer for me and
7:19 that too can be an application of RAG I
7:21 found that RAG is useful for many
7:23 different applications and I hope that
7:26 you will too in the next video we'll
7:28 talk about another technique called
7:30 fine-tuning which is another way to
7:33 expand what an LLM can do but before I
7:35 wrap up let me just say I hope you
7:37 enjoyed this video on RAG and that you
7:40 can really clean up with this rag stuff