This content explains how to build enterprise-ready Retrieval-Augmented Generation (RAG) systems, emphasizing that RAG is a vital and growing technology, not obsolete, and crucial for securely integrating proprietary data with large language models (LLMs).
want to dive in.
>> Let's go ahead and get started here.
>> All right. Today we are going to be discussing how to build enterprise-ready RAG systems. Before we dive into the actual content, some really quick introductions on our side. My name is Kevin. I'm on the go-to-market team here at Unstructured; I'm the Ryan Seacrest of our webinar series. I'm joined by my better half, Daniel Scoffield. He is a principal solutions architect. Awesome.
The agenda for today's conversation: a little bit of background on who Unstructured is and what problems we solve; what RAG is and what problems it set out to solve; and then how to implement RAG systems successfully across the enterprise.
Some quick housekeeping: make sure that chat goes in the chat box. If you have specific questions you want us to get to at the end of the webinar, there is a separate tab for that. We will send a recording of the webinar after the conversation, with resources to get started on Unstructured. And if you do want to talk with somebody like Daniel about your use cases, go to our website and schedule a call.
All right, a little bit of background on Unstructured. We started three years ago as an open-source product looking to answer the question: why do companies struggle to take unstructured data and convert it into a structured JSON output? Daniel, do you mind going to the next slide?
Flash forward to today, and it's been a pretty incredible ride. We have amazing customers, over 50 million downloads of our open-source product, and some pretty incredible backers who are helping us solve this problem at scale. And without further ado, Daniel, I will turn it over to you.
>> Thank you for that fantastic introduction, Kevin. As Kevin said, my name is Daniel Scoffield. I'm a principal solutions architect at Unstructured, and I've been involved with countless enterprise RAG deployments, so I'm excited to share some of my insights about what it takes to actually build an enterprise-ready RAG system. I'd like to start today's talk with a quick overview of the current state of RAG here in late 2025, and then we'll get into the meat of today's talk. To kick us off, though, I'd like to address a common idea you've probably encountered periodically, especially in the LinkedIn AI space, over the past couple of years, and that is the periodic yet persistent notion that RAG is dead. This idea comes in various forms and variations, such as that RAG is now somehow obsolete or outdated. You might hear something like, "RAG is so 2024."
It's also common whenever you see something like a sizable context window increase. For example, when Google released the first Gemini model with a 1 million token context window, it nearly broke the internet with all the "RAG is dead" chanting from AI influencers. And of course, whenever a new flavor of RAG gets introduced, it's the same thing: RAG is dead, long live corrective RAG. So you can quote me on this: RAG is not going away as long as large language models are still the dominant technology powering the AI landscape. The next time you hear some clickbaity announcement stating that RAG is dead, I want you to remember the many '90s-era rappers who were way ahead of their time and said it best: RAG till we die. You even have this nice graphic to remember that by.
>> Is that what they said?
>> That's exactly what they said, Kevin. Okay.
>> Of course, this is just a fun idea, but I honestly would not be surprised if RAG indeed outlives us all. Looking at the projections for the back half of this decade, the growth of the fundamental RAG pattern actually looks stronger than ever. These are just projections, but according to the well-respected Grand View Research, the RAG market is expected to grow at an impressive near-40% compound annual growth rate going into 2030, reaching over a $10 billion market size. So rest assured, RAG is not only not dead; it would seem that it's actually only just getting started. And why is that? What problems does RAG actually solve for the enterprise? The answer starts with a simple problem: LLMs are stateless. The only realistic way to make an LLM smart about your business data, aside from fine-tuning, which is impractical for a lot of reasons, is to inject that information into its context window at runtime.
Couple that with the fact that there is an explosion of LLM adoption, where every company is in a race to take advantage of AI's capabilities, and there's an urgent demand for making a company's proprietary context available to these models. So if you have this demand, how do you actually meet it securely? This will appeal to the architects out there: if you just fine-tune a model on all your sensitive HR and finance data, you've effectively collapsed all your access controls and traceability. RAG, by contrast, provides auditability and a separation of concerns around the security of data, and because of that security advantage, RAG has become the dominant pattern for enterprise grounding. On top of that, we're at the dawn of the agentic era, where AIs don't just answer questions; they also take actions. Even in this new world order, RAG is still the dominant pattern for grounding these agents with your enterprise context. And within agentic systems, RAG is not only being used to access enterprise knowledge bases; it's also being used to endow the agent itself with memory, so it can recall its past actions and conversations.
But what does a mature, production-grade version of this architecture actually look like here in 2025? Something like this. I assume that this audience is already familiar with RAG, so I won't spend long here, but just to get us all on the same page, here's a quick 30-second tour of a somewhat standard RAG production system in 2025. It has two parts. First, you have the offline ingestion, in the top right here, where we take our data, chunk it, embed it into vectors, and then load it into a vector database. That becomes the enterprise knowledge base. Secondly, we have an online query flow, which can be divided into two steps: the retrieval step on the top here, and the generation step, mostly on the bottom there. In the retrieval step, we analyze the query's intent and its keywords, and then search the vector database to get the top most relevant chunks. Finally, in the generation flow, we don't just dump those chunks in: we rerank them, we maybe run them through an anti-hallucination check, and then only the best context makes its way into the prompt, where it's used to synthesize a final answer. This complete flow creates a sound, trustworthy system.
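To make that 30-second tour concrete, here is a minimal sketch of the two-part blueprint: offline ingestion (chunk, embed, load) and the online query flow (retrieve, rerank, prompt). It is illustrative only; the embed() function is a toy hashing embedder standing in for a real embedding model, the in-memory list stands in for a vector database, and the keyword-overlap rerank stands in for a cross-encoder reranker or anti-hallucination check.

```python
import math
import re

def embed(text: str, dims: int = 64) -> list:
    # Toy bag-of-words hashing embedder (stand-in for a real embedding model).
    vec = [0.0] * dims
    for token in re.findall(r"\w+", text.lower()):
        vec[hash(token) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# Offline ingestion: chunk, embed, load into the knowledge base.
def ingest(documents, chunk_size=400):
    index = []
    for doc in documents:
        for i in range(0, len(doc), chunk_size):
            chunk = doc[i:i + chunk_size]
            index.append({"text": chunk, "vector": embed(chunk)})
    return index

# Online query flow: retrieve top chunks, rerank, assemble the prompt.
def build_prompt(query, index, top_k=5, keep=3):
    qvec = embed(query)
    candidates = sorted(index, key=lambda c: cosine(qvec, c["vector"]), reverse=True)[:top_k]
    def overlap(c):
        return len(set(query.lower().split()) & set(c["text"].lower().split()))
    best = sorted(candidates, key=overlap, reverse=True)[:keep]
    context = "\n---\n".join(c["text"] for c in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = ingest(["Unstructured converts raw documents into structured JSON for RAG pipelines."])
print(build_prompt("What does Unstructured do?", kb))  # this prompt would then go to the LLM
```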
So with this blueprint in mind, let's talk about the larger RAG ecosystem. The most important thing to understand about RAG today is that it's no longer just one thing; it's an entire field of research that has rapidly evolved over the last five years. This is roughly the journey. It all started back in 2020 with Meta's famous paper laying out the approach, which in hindsight has been named naive RAG. Naive RAG is basically the previous slide with a lot of the intermediary steps removed. That quickly evolved into advanced RAG and modular RAG to address a lot of the quality and scalability shortcomings of naive RAG, and this is where most production systems actually start today. This is where we introduce critical components like reranking, query transformations, and hybrid search. And of course, you'll notice what happens: the accuracy jumps, but so do the cost and complexity.
And finally, we're now at self-reflective and agentic RAG, where the system doesn't just retrieve; it analyzes its own results, decides whether they're good enough, and even re-queries if needed. This five-stage evolution isn't just a simple line, though; it's really more of an entire universe of techniques and architectures. So this is what the actual universe looks like today. You don't have to read all 25 of these, but this grid is here to reveal all the different flavors of RAG that you as an architect have at your disposal when choosing an implementation pattern for a given use case. Do you need conversational RAG because you're building a chatbot? If you're synthesizing across multiple complex documents, you might want to look at multi-hop RAG, and so on. But even once you've chosen the right flavor of RAG for a given use case, you're still going to need a strong, ROI-focused business case, as well as robust security, governance, and evaluation frameworks, in order to move that pilot into production. So here is our roadmap for the next 10 to 12 minutes. We're going to cover the remaining four pillars of a production system, the top four here, and then quickly touch on RAG for agentic systems, which is going to be light because we're going to discuss it in more depth in a future webinar.
So with that, we're going to start with the most important question for any enterprise RAG system, and that is ROI. What are we building and why? What is the expected ROI, and what meaningful business outcomes are we aiming to achieve in building this system?
This is actually the step that tends not to receive the rigorous scrutiny and disciplined approach it deserves, and it's reportedly one of the biggest reasons projects fail to move past the pilot stage: either the cost of the system doesn't end up justifying its delivered value, or else it had a weak value proposition from the start. So how do you find a use case that's worth pursuing? The core idea is simple: you want to target high-value friction in the organization. You're not just trying to go around automating tasks; you want to find a process where your most expensive employees are wasting the most time looking for the most important information. It's really that intersection of high value and high friction that's your sweet spot. Of course, this is going to look different based on your business domain, but let's look at a few example use cases. Take customer support: you have support agents toggling between five screens, looking at five different sources of information. That's friction where a RAG system can help. Or take a sales enablement use case, where you have sales reps hunting through scattered internal repositories, searching for answers to a security questionnaire in order to close a high-stakes deal. Kevin can probably attest that he's been there and done that. That's friction. If you find the friction and apply RAG, you get ROI.
And so on. But once you've found your high-friction use case, how do you then prove that it's actually working? That's where success metrics come into play. This is all about measuring what actually matters. The biggest mistake teams make, and we see this again and again, is focusing only on technical metrics like accuracy, or on engagement with the tool. Nobody on the business side of the org actually cares about a BLEU score, though; success for them is measured by business outcomes. So you really need to target the business outcome metrics. These are going to be the KPIs your VP or director already has on a dashboard somewhere. You're not creating new metrics here; you're moving the needle on their metrics. In support, it's things like decreasing average handle time; in sales, it's decreasing sales cycle length. This is the language of ROI, and if you aren't moving the business outcome, the next two don't really matter. Of course, if you are creating that ROI, then technical metrics also become very important, so we'll cover two of those. The first is the end-user experience. How do you actually quantify that? This is all about whether users like and trust the tool and whether it actually creates a positive impact in their workflow. The single most important metric in this area is a simple thumbs up / thumbs down rating that you embed within the application itself; this is your real-world, continuous feedback loop. Finally, we have our system performance metrics. These are the ones that we as engineers love: context relevance (did we find the right stuff?) and answer faithfulness (did the LLM stick to the script?). These are essential, but remember that their only purpose is to serve the two categories above.
Okay. So now that we have a use case in mind with strong ROI potential, we need to shift our mindset to security. A pilot is only going to get off the ground if it doesn't represent a massive security risk to the organization.
So how do you actually secure a RAG system? Here's the most critical part: you cannot treat an LLM-based application that connects to your enterprise data like a standard application. These systems really require a three-stage security posture. First, you have your pre-retrieval guardrails. This is authentication, ensuring the user is who they say they are, as well as authorization, where you're checking whether they should have access to the system at all in the first place. This one is pretty straightforward; you'll see it across many applications. But the next one is very RAG-specific, and that's the retrieval-time guardrails. This is all about securing the data itself, and it comes with some challenges. Security in this step basically means ensuring that both the user and the LLM only see documents, or pieces of documents, that they're allowed to see. And finally, there's post-retrieval, which is more about securing the answer. This is basically filtering out bad results, flagging incorrect information and possibly toxic content before it reaches the end user.
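For readers following along in code, here is a runnable toy sketch of that three-stage posture. The user store, the group-based document ACLs, and the "withhold salary info" rule are all illustrative assumptions, not a prescription for how your guardrails should be configured.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    groups: set

USERS = {"token-123": User("jack", {"sales"})}          # stand-in identity provider
DOCS = [
    {"text": "Q3 sales playbook ...", "allowed_groups": {"sales"}},
    {"text": "HR salary bands ...", "allowed_groups": {"hr"}},
]

def authenticate(token):
    # 1. Pre-retrieval guardrail: authentication + authorization.
    if token not in USERS:
        raise PermissionError("unknown user")
    return USERS[token]

def retrieve(query, user):
    # 2. Retrieval-time guardrail: only surface chunks this user may see.
    return [d for d in DOCS if d["allowed_groups"] & user.groups]

def post_filter(answer):
    # 3. Post-retrieval guardrail: screen the answer before it reaches the user.
    return "[withheld]" if "salary" in answer.lower() else answer

def guarded_answer(token, query):
    user = authenticate(token)
    chunks = retrieve(query, user)
    draft = f"Answer to {query!r} grounded in {len(chunks)} permitted chunk(s)."
    return post_filter(draft)

print(guarded_answer("token-123", "What is in the Q3 sales playbook?"))
```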
Now, of all of these, the most important to get right, and unfortunately the one that comes with the most unique security challenges, is the second one: securing the data. So let's zoom into that one for just a minute.
Within this one, before you write any code, you really want to understand what the security model for the system needs to be. This is often going to be use-case specific, and within that use case you need to ask yourself who the audience of the tool is and what the sensitivity of the data is. If this is a public chatbot or a generic helper, you might be able to get away with setting up a simple new permissions system from scratch, such as a basic RBAC system, which you implement by tagging documents with certain metadata during ingestion and then filtering on those tags during retrieval. That's actually the easy path.
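Here is a toy sketch of that easy path: tag each chunk with a department label during ingestion, then filter on the tag during retrieval. The department labels and the keyword-overlap "search" are illustrative stand-ins; in a real system the tag would live in the vector store's metadata and the similarity search would be done by the vector database.

```python
def ingest_chunk(index, text, department):
    # The department tag rides along as chunk metadata at ingestion time.
    index.append({"text": text, "metadata": {"department": department}})

def search(index, query, allowed_departments, top_k=3):
    # Retrieval-time filter: only chunks whose tag the caller is allowed to see.
    visible = [c for c in index if c["metadata"]["department"] in allowed_departments]
    def overlap(c):
        return len(set(query.lower().split()) & set(c["text"].lower().split()))
    return sorted(visible, key=overlap, reverse=True)[:top_k]

index = []
ingest_chunk(index, "Travel reimbursement policy: submit receipts within 30 days.", "finance")
ingest_chunk(index, "New-hire onboarding checklist and first-week schedule.", "hr")
print(search(index, "reimbursement policy", allowed_departments={"finance"}))
```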
But for a lot of internal enterprise use cases, you often hit what's called the access control mirroring problem, or at least that's what we call it. What is that? It essentially stems from the fact that with a RAG system, you're making a vectorized copy of your data. So if your RAG system contains data from SharePoint, from Salesforce, from Slack, you can't just invent new rules; you should mirror the existing ones. For example, if Jack can't see the HR folder in SharePoint, he probably should not be allowed to query it in your RAG system. So how do you actually achieve that in the wild? You effectively have three architectural choices. First, there's pre-retrieval filtering, where you bake the permissions from the source system's records into the vector metadata. This approach allows for the fastest query times, but if you go this route you often end up with syncing issues and re-indexing of the data, which can have a negative impact on cost and also latency in some ways. The next approach is to filter the retrieved content once it has actually been fetched. In this case, you fetch everything and then check permissions using some sort of external authorization service. This is very secure and easy to keep up to date, but it can add latency and can be very challenging at scale. So in full-blown production systems you typically see a hybrid approach, where you do some sort of coarse filter during pre-retrieval, for example filtering on a metadata tag such as the department, and then couple that with a fine-grained check via the post-retrieval approach against the user's actual permissions.
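A toy sketch of that hybrid model is below: a coarse metadata pre-filter narrows the candidate set cheaply, then a fine-grained per-document check, standing in for a call to an external authorization service or the source system's own ACLs, runs on the retrieved results before they reach the prompt. The ACL table and source IDs are illustrative assumptions.

```python
ACL = {("jack", "sharepoint://sales/q3-plan"): True}     # illustrative permissions record

def authz_allows(user, source_id):
    # Stand-in for a lookup against an external authorization service.
    return ACL.get((user, source_id), False)

def hybrid_retrieve(user, department, chunks):
    # Coarse, cheap pre-retrieval filter on a metadata tag.
    candidates = [c for c in chunks if c["metadata"]["department"] == department]
    # Precise, per-document post-retrieval check against actual permissions.
    return [c for c in candidates if authz_allows(user, c["metadata"]["source_id"])]

chunks = [
    {"text": "Q3 plan ...", "metadata": {"department": "sales",
                                         "source_id": "sharepoint://sales/q3-plan"}},
    {"text": "Salary bands ...", "metadata": {"department": "hr",
                                              "source_id": "sharepoint://hr/salaries"}},
]
print(hybrid_retrieve("jack", "sales", chunks))          # only the permitted sales chunk survives
```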
Baking the permissions into the vector database probably doesn't need much explanation, but for the post-retrieval filtering option you might still be left with a lot of questions. Oh, I actually forgot that I skipped a slide; we'll go ahead and skip it for now and just move on to evaluation. I had a slide in here about the different RAG security models, tool calling versus a permissions graph or ACL table and so forth, but let's go ahead and jump to evaluations.
So this is a critical two-part challenge. You can't just feel whether your RAG system is accurate; you need a practical, two-pronged approach to prove it's working. First is your offline, pre-deployment testing. This is what's baked into your CI/CD, and it operates as your RAG system's safety net. In this step you build what's called a golden set: maybe a hundred to a few hundred Q&A pairs covering your most important questions, the questions you can't get wrong. Then, before you deploy any change, whether that's a new prompt, a new chunking strategy, and so on, you run your new build against this golden set, programmatically scoring it using those technical success metrics we saw earlier: context relevance and answer faithfulness. This is what lets you quantify improvement, so now you can say things like "our new chunking strategy is 8% more effective than the old one."
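Here is a toy sketch of what such an offline golden-set gate could look like in CI. The golden set, the dummy pipeline, and the two scoring functions are illustrative stand-ins; real context-relevance and answer-faithfulness scores are usually computed with an LLM judge or a dedicated evaluation library.

```python
GOLDEN_SET = [
    {"question": "What is the refund window?", "expected": "30 days"},
    {"question": "Who approves travel over $5,000?", "expected": "the CFO"},
]

def context_relevance(question, retrieved):
    # Did we find the right stuff? (toy keyword heuristic)
    hits = sum(1 for c in retrieved if any(w in c.lower() for w in question.lower().split()))
    return hits / max(len(retrieved), 1)

def answer_faithfulness(answer, retrieved):
    # Did the answer stick to the retrieved context? (toy substring heuristic)
    return 1.0 if any(answer.lower() in c.lower() for c in retrieved) else 0.0

def evaluate(rag_pipeline):
    rel, faith = [], []
    for case in GOLDEN_SET:
        answer, retrieved = rag_pipeline(case["question"])
        rel.append(context_relevance(case["question"], retrieved))
        faith.append(answer_faithfulness(answer, retrieved))
    return {"context_relevance": sum(rel) / len(rel),
            "answer_faithfulness": sum(faith) / len(faith)}

def dummy_pipeline(question):                      # stand-in for the real RAG system
    return "30 days", ["Refunds are accepted within 30 days of purchase."]

scores = evaluate(dummy_pipeline)
print(scores)
# A CI job could then block the deploy, e.g.:
# assert all(v >= 0.8 for v in scores.values()), "quality regression - do not deploy"
```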
But you can't think of everything, so you also have to couple that with an online, in-production feedback loop. This is your thumbs up / thumbs down button, and it's the start of your most important workflow: the continuous improvement workflow.
And this is what that looks like. You can basically think of it as the thumbs-down workflow. Whenever your system produces a bad response and a user flags it as such with a thumbs down, that kicks off a task that goes into a review queue, where it gets triaged. The goal of triage is to determine whether the process failed at the retrieval step or the generation step, so that it can be addressed accordingly. If it was a failed retrieval, you want to look at the retrieval components: the ingestion, the chunking, the ranking, and so forth. If it failed at generation, you want to look at the prompt, so you might need to fine-tune the prompt, adjust the logic, and so forth. And then, and this is a critical step, once the fix has been made, that question and its corresponding answer can be added to the golden set, so you're effectively strengthening your test coverage of the system over time.
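A toy sketch of that loop is below: a flagged response becomes a review task, triage labels it a retrieval or generation failure, and resolved cases are folded back into the golden set. The triage heuristic here is purely illustrative; in practice a human reviewer (possibly assisted by an LLM) makes that call.

```python
from dataclasses import dataclass

@dataclass
class FeedbackTask:
    question: str
    answer: str
    retrieved: list
    status: str = "needs_triage"

review_queue, golden_set = [], []

def flag_thumbs_down(question, answer, retrieved):
    review_queue.append(FeedbackTask(question, answer, retrieved))

def triage(task):
    # No relevant context retrieved -> suspect ingestion/chunking/ranking;
    # otherwise suspect the prompt/generation step.
    words = set(task.question.lower().split())
    relevant = any(words & set(c.lower().split()) for c in task.retrieved)
    task.status = "retrieval_failure" if not relevant else "generation_failure"
    return task.status

def resolve(task, corrected_answer):
    # Once fixed, the Q&A pair strengthens the golden set's coverage.
    golden_set.append({"question": task.question, "expected": corrected_answer})
    task.status = "resolved"

flag_thumbs_down("What is the refund window?", "14 days", ["Unrelated shipping policy text."])
print(triage(review_queue[0]))       # -> retrieval_failure
resolve(review_queue[0], "30 days")  # the corrected case now feeds the offline tests
```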
So, finally, that takes us to the last pillar in our discussion today: organizational alignment. This is perhaps the most painful lesson we've seen a lot of companies have to learn, and we've observed it time and again in the field. We've talked about ROI, security, and evaluation, but how do teams actually try to build these systems for their enterprises? From what we've seen, this is what usually happens. We call it the DIY rat's nest. Because RAG is such an emerging field, there aren't many good all-in-one platforms out there that can meet the stringent security requirements of enterprises, so most organizations have opted to build their own little pipelines in house. The problem is that, for enterprises, when every one of their individual teams is doing this, the result is organizational IT chaos. The legal team writes a Python script for PDFs. The sales team finds a hacky way to scrape Salesforce. You end up with custom code for chunking, custom scripts for embedding, and zero standardization. We've talked to organizations with literally thousands of proof-of-concept RAG pilots globally, all using different stacks. You can't align on security because every pipeline is different. You can't align on quality because everyone is using different chunking logic. And I can't tell you how many times we've been called in just to untangle this exact mess. So how do you actually prevent this outcome?
Well, the way that enterprises typically solve this is by arriving at organizational alignment around an internal GenAI stack that can be leveraged across all teams. Instead of 50 custom DIY solutions across 50 different departments, you align the organization around one secure, stable ingestion and RAG ETL layer. That simplifies the architecture, ensuring that the stack is kept secure and up to date and is flexible enough to serve a wide range of use cases. And of course, that's where we like to make a plug for ourselves: we offer just such a platform on the ingestion and RAG ETL layer. What that gives you is that, instead of a tangled nightmare of a mess, you end up with an effortless unstructured-data ETL feeding as many high-value RAG use cases as your organization decides to bite off. The benefit of choosing a platform like Unstructured is that there's no vendor lock-in for data sources or data destinations, or for the large language models and embedding models you want to use in your system, and so on. Plus, it features enterprise-grade security and controls and offers a number of flexible deployment models. What this means is that your developers can stop writing maintenance scripts for their DIY RAG systems and start building actual AI products. It also turns a maintenance nightmare into a predictable, manageable utility. In short, that is how you scale RAG from a single demo to an enterprise-wide capability that is ready for the AI future of tomorrow.
And so that's basically it for the main part of the enterprise-ready overview, but I do want to touch lightly, as I said earlier, on RAG for agentic systems. I'm only going to touch on it lightly because we're going to cover this topic in much more detail in a future webinar. The basic idea here is that you're introducing an intelligent agent between the user and the RAG system. What this looks like in practice is that you typically wrap the RAG system with an MCP server so that it can be called by an agent as one of its tools. The advantage here, and I sort of touched on this earlier with the reflection and agentic RAG slide, is that the agent can review its response and then make subsequent follow-up queries to the RAG system. All of that is basically to ensure that it has all the information it needs to effectively answer the question posed by the user or, if it was solving a task, to use that information and retrieved data to effectively solve that task. Again, we'll cover this in more depth in a future webinar, but since agentic RAG is exploding in popularity, I did want to touch on it here.
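As a teaser for that future webinar, here is a toy sketch of the agentic pattern: the RAG system is exposed to the agent as a tool (in practice often by wrapping it with an MCP server), and the agent reflects on what it has retrieved and issues follow-up queries until it judges the context sufficient. The corpus, the planned queries, and the sufficiency check are all illustrative stand-ins for real agent behavior.

```python
def search_knowledge_base(query):                     # the RAG tool the agent calls
    corpus = {
        "acme renewal date": ["The Acme contract renews on March 1."],
        "acme renewal price": ["Renewal pricing is set at $120k per year."],
    }
    return corpus.get(query, [])

def is_sufficient(context):
    # A real agent would ask the LLM to reflect on coverage; here we simply
    # require at least two supporting snippets before stopping.
    return len(context) >= 2

def agentic_answer(question, max_rounds=3):
    planned_queries = ["acme renewal date", "acme renewal price", "acme termination clause"]
    context = []
    for i in range(min(max_rounds, len(planned_queries))):
        context += search_knowledge_base(planned_queries[i])   # tool call
        if is_sufficient(context):                             # reflection step
            break                                              # enough grounding, stop querying
    return f"Grounded answer to {question!r} using {len(context)} retrieved snippet(s)."

print(agentic_answer("When does the Acme contract renew, and at what price?"))
```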
And with that, that concludes the talk for today. If you enjoyed the discussion, make sure you check out one of our past webinars. The most recent is "Making Your Data Work for You: RAG Strategies that Scale," and AJ and Kevin are back after the break to discuss RAG over evolving enterprise knowledge.
>> Awesome.
>> And with that, we can take some questions.
>> Awesome. Daniel, that was fantastic. Thank you so, so much. We're going to have to make this a rapid-fire round of questions, so we're going to go through these pretty quickly. Thank you to everyone who submitted questions. The first one is: do you have a recommended end-to-end blueprint for integrating Unstructured with common enterprise stacks, Snowflake, Databricks, or data lakes, so that RAG stays in sync with upstream data? So, we are the blueprint, or that's what we are intending to be. We have stable connectors to all of the sources that you mentioned, and we can run incremental syncing, so that you're only syncing updates and deletions, to avoid blowing up your ingress and egress costs. Quickly on to you, Daniel: how do you benchmark an Unstructured pipeline's impact on RAG quality versus a naive ingestion pipeline, e.g., basic text extraction and fixed-size chunking?
>> Yeah, I sort of touched on this in the eval section, and also a little bit in the ROI section. Basically, when you're evaluating the ingestion portion in the context of RAG, you can miss a lot of the picture, because if your golden set is too small, it could be that the system is able to answer all the questions correctly even with a basic extraction. But over time, with that continuous improvement loop and the long tail of questions, as your golden set begins to grow, it will start to reveal cracks in the ingestion and identify those issues. If you want to focus on the ingestion metrics individually, which we recommend, we also just released a new technique for doing so called SCORE. Definitely check it out; it's all over our social channels, you can't miss it. It's a very powerful system for evaluating ingestion on its own.
>> Awesome. We have some additional, really thoughtful slides; I don't know if we're going to be able to get to them because we're coming up on time, but I just want to say thank you so much to everyone who joined. Thank you for the questions. We will respond by email to the individuals who submitted questions ahead of time. And we will