Retrieval Augmented Generation (RAG) is a technique that significantly enhances Large Language Models (LLMs) by providing them with external, relevant knowledge beyond their training data, enabling more accurate and context-aware responses.
We've already seen that prompting a large language model can take you quite far. But there's a technique called retrieval augmented generation, or RAG, that can significantly expand what you can get an LLM to do by giving it additional knowledge beyond what it may have learned from data on the Internet or other open sources. Let's take a look.
If you were to ask a general-purpose chat system, such as one of the ones on the internet, a question like "Is there parking for employees?", it might answer something like "I need more specific information about your workplace", because it doesn't know what the parking policy is for your company. But RAG, or retrieval augmented generation, as we'll see, is a technique that can give the LLM additional information, so that if you ask it if there's parking, it can refer to policies specific to your company.
How does it work? RAG has three steps. Step one: given a question such as "Is there parking for employees?", it will first look through a collection of documents that may have the answer. For example, if your company has different documents on the benefits offered to employees, the leave policy, the facilities, and payroll processes, then the first step in the RAG system would be to have a computer find out which, if any, of these documents is most relevant to this question. Parking seems like a question about the facilities, about the building that your team works in, and so hopefully it will select the facilities document as most relevant.
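As a rough illustration of step one, here is a minimal sketch in Python. It assumes a tiny in-memory collection of documents and scores each one by how many words it shares with the question. The document names and texts, and the helper names `tokenize` and `retrieve`, are invented for this example; production systems typically use more capable retrieval methods than plain word overlap.

```python
import re

# Hypothetical document collection; names and texts are made up for illustration.
DOCUMENTS = {
    "benefits": "Employees receive health insurance, dental coverage and a retirement plan.",
    "leave_policy": "Employees accrue vacation days and sick leave each month.",
    "facilities": "Parking for employees is available on levels 1 and 2 of the office garage.",
    "payroll": "Salaries are paid on the last business day of each month.",
}


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents):
    """Return the name of the document that shares the most words with the
    question, or None if no document shares any word with it."""
    question_words = tokenize(question)
    best_name, best_score = None, 0
    for name, text in documents.items():
        score = len(question_words & tokenize(text))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


print(retrieve("Is there parking for employees?", DOCUMENTS))  # prints: facilities
```

Returning `None` when nothing matches mirrors the "which, if any" wording: sometimes no document in the collection is relevant to the question.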
The second step is then to incorporate the retrieved document, or the retrieved text, into an updated prompt. So let me construct a prompt as follows. I'm going to say "Use the following pieces of context to answer the question at the end", and then I'm going to take the relevant text from my facilities documentation with the parking policy, that all employees may park on levels one and two and so on, and put that into my prompt. So this is now a pretty long prompt, because it tries to give a lot of context for the LLM. Now remember, last week we had spoken about limitations to the prompt length, or the input length, for a large language model. That's why in practice, rather than dumping an entire very long document into the prompt, you might pull out just the part of the document that's most relevant to the question and put just that into the prompt.
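One simple way to picture "pulling out just the most relevant part" is sketched below. It assumes the document is split into sentences and that relevance is measured by word overlap with the question. The sample text and the helper names `words` and `most_relevant_passage` are hypothetical, and real systems often split and rank passages differently.

```python
import re

# Hypothetical facilities document; the text is invented for illustration.
FACILITIES_DOC = (
    "The office opens at 7am on weekdays. "
    "Parking for employees is available on levels 1 and 2. "
    "The cafeteria on the ground floor serves lunch daily."
)


def words(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def most_relevant_passage(question, document):
    """Split the document into sentences and return the sentence that has
    the most words in common with the question."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    question_words = words(question)
    return max(sentences, key=lambda s: len(question_words & words(s)))


print(most_relevant_passage("Is there parking for employees?", FACILITIES_DOC))
# prints: Parking for employees is available on levels 1 and 2.
```

Only the selected sentence, rather than the whole document, would then go into the prompt, which keeps the prompt within the model's input length limit.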
Finally, we add the original question: "Is there parking for employees?" So this is called retrieval augmented generation, or RAG, because we're going to generate an answer to this, but we're going to augment how we generate text by retrieving the relevant context, or the relevant information, and augmenting the prompt with that additional text.
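The prompt described in step two can be sketched as a simple template. The instruction sentence comes from the example in the video; the "Question:" label, the exact layout, and the names `PROMPT_TEMPLATE` and `build_prompt` are assumptions made for this illustration.

```python
# Template layout is an assumption; only the instruction sentence is from the video.
PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}"""


def build_prompt(context, question):
    """Insert the retrieved text and the user's original question into the template."""
    return PROMPT_TEMPLATE.format(context=context.strip(), question=question.strip())


prompt = build_prompt(
    "All employees may park on levels 1 and 2.",
    "Is there parking for employees?",
)
print(prompt)
```

The retrieved text goes in the middle and the original question comes last, matching the order in which the prompt was assembled above.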
Having constructed this prompt, the final step is to then prompt the LLM with this rich prompt, and hopefully the LLM will then give us a thoughtful answer telling us where we can park. In some applications using RAG, in the output shown to the user we would also add a link to the original source document that led to this answer being generated. So in this case, we might link to that facilities documentation, so the user can, if they wish, go back and read the original source document and double-check the answer for themselves.
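Step three, together with the source link, might look roughly like the sketch below. The model call is replaced by a stand-in function, `fake_llm`, that returns a canned reply so the example runs without any external service. The `Answer` class, the other function names, and the example URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str        # the text generated by the model
    source_url: str  # link to the document the context was retrieved from


def answer_with_source(prompt: str, source_url: str, llm: Callable[[str], str]) -> Answer:
    """Send the augmented prompt to the model and remember which document it drew on."""
    return Answer(text=llm(prompt), source_url=source_url)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it ignores the prompt and returns a canned reply."""
    return "Yes. Employees may park on levels 1 and 2."


def render(answer: Answer) -> str:
    """Format the answer for the user, followed by a link to the source document."""
    return f"{answer.text}\n\nSource: {answer.source_url}"


result = answer_with_source(
    "Use the following pieces of context ...",   # the augmented prompt from step two
    "https://intranet.example.com/facilities",   # hypothetical document link
    fake_llm,
)
print(render(result))
```

Keeping the source link next to the generated text is what lets the user go back to the original document and check the answer.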
RAG, or retrieval augmented generation, is an important technique that is enabling many LLMs to have context, or to have information, beyond what they may have learned on the open internet. Here are some examples of RAG-based applications.
There are many companies today that are offering software that lets you chat with a PDF file. For example, if you're reading a white paper but maybe don't have time to read the entire thing carefully, and have a question that you want answered based on that white paper, there are many applications today, like PandaChat, AskYourPDF, ChatPDF, and many, many others, that let you upload your PDF file and then ask questions, and they will use RAG to try to generate answers for you. I find that some of these software packages work better and some work worse, so the results you get may vary, but there has certainly been a lot of excitement and interest in building applications that let you chat with your PDF files.
There are also more and more RAG applications that will answer questions based on a website's articles. For example, Coursera Coach does multiple things, but one of the things it does is use RAG to try to answer questions based on content on the Coursera site itself. Snapchat also has a chatbot that uses text from Snap to try to answer different questions you might have about their products. And HubSpot, which is a marketing automation company, is another example of a company that has a chatbot that lets you post questions and tries to generate answers for you based on content from the company or from the website itself. So these types of chats are becoming an alternative way to let users get answers to questions that they may have about your company's offerings.
RAG is also leading to new forms of web search.
Microsoft Bing has a chat capability, Google has a generative AI feature as well that can generate text in response to your queries, and the startup You.com, which was actually started by one of my former PhD students, Richard Socher, is a web search engine that was built around a chat-like interface. So RAG is used in many applications today, and excitingly, it seems to be transforming even web search.
To wrap up this video, there's one big idea I'd like to share with you, which is to think of the LLM not as a knowledge store but instead as a reasoning engine. LLMs may have read a lot of text on the internet, and so it's tempting to think of them as knowing a lot of things, and they kind of do, but they don't know everything. With the RAG approach, we provide relevant context in the prompt itself, and we ask the LLM to read that piece of text and then to process it to get to the answer. In other words, rather than counting on it to have memorized enough facts to get us the answer, we're instead using it as a reasoning engine to process information, and not as a source of information. And I find that this way of thinking about LLMs, as a reasoning engine rather than as a way to store and retrieve information, can expand the set of applications that we might brainstorm and consider an LLM to be capable of doing. Admittedly, LLM technology is early and it doesn't always do that well, but if an LLM isn't just a database that stores a lot of information for you, but can process and reason through information, I think that is an exciting direction to think about where LLMs might go from here.
Even though I've talked mostly about RAG in the context of building software applications, this idea can also be useful if you're using a web user interface. Sometimes I would take a piece of text and just copy it into the prompt of an online web UI of an LLM, and then tell it to use that context to generate an answer for me, and that too can be an application of RAG. I've found that RAG is useful for many different applications, and I hope that you will too. In the next video, we'll talk about another technique called fine-tuning, which is another way to expand what an LLM can do. But before I wrap up, let me just say I hope you enjoyed this video on RAG and that you can really clean up with this RAG stuff.