This content explains how to build enterprise-ready Retrieval-Augmented Generation (RAG) systems, emphasizing that RAG is a vital and growing technology, not obsolete, and crucial for securely integrating proprietary data with large language models (LLMs).
want to dive in.
>> Let's go ahead and get started here.
>> All right. Today we are going to be discussing how to build enterprise-ready RAG systems. Before we dive into the actual content, some really quick introductions on our side. My name is Kevin. I'm on the go-to-market team here at Unstructured; I'm the Ryan Seacrest of our webinar series. I'm joined by my better half, Daniel Scoffield. He is a principal solutions architect. Awesome.
The agenda for today's conversation: a little bit of background on who Unstructured is and what problems we solve; what RAG is and what problems it set out to solve; and then how to implement RAG systems successfully across the enterprise.
Some quick housekeeping: make sure that chat goes in the chat box. If you have specific questions you want us to get to at the end of the webinar, there is a separate tab for that. We will send a recording of the webinar after the conversation, with resources to get started on Unstructured. And if you do want to talk with somebody like Daniel about your use cases, go to our website and schedule a call.
All right, a little bit of background on Unstructured. We started three years ago as an open-source product looking to answer the question: why do companies struggle to take unstructured data and convert it into a structured JSON output? Daniel, do you mind going to the next slide?
Flash forward to today, and it's been a pretty incredible ride. We have amazing customers, over 50 million downloads of our open-source product, and some pretty incredible backers who are helping us solve this problem at scale. And without further ado, Daniel, I will turn it over to you.
>> Thank you for that fantastic introduction, Kevin. As Kevin said, my name is Daniel Scoffield. I'm a principal solutions architect at Unstructured, and I've been involved with countless enterprise RAG deployments, so I'm excited to share some of my insights about what it takes to actually build an enterprise-ready RAG system. I'd like to start today's talk with a quick overview of the current state of RAG here in late 2025, and then we'll get into the meat of today's talk. To kick us off, though, I'd like to address a common idea you've probably encountered periodically, especially in the LinkedIn AI space, over the past couple of years, and that is the periodic yet persistent notion that RAG is dead. This idea comes in various forms and variations, such as that RAG is now somehow obsolete or outdated. You might hear something like, "RAG is so 2024."
It's also common whenever you see something like a sizable context window increase. For example, when Google released the first Gemini model with a 1 million token context window, it nearly broke the internet with all the "RAG is dead" chanting from AI influencers. And of course, whenever a new flavor of RAG gets introduced, it's the same thing: RAG is dead, long live corrective RAG. So you can quote me on this: RAG is not going away as long as large language models are still the dominant technology powering the AI landscape. The next time you hear some clickbaity announcement stating that RAG is dead, I want you to remember the many '90s-era rappers who were way ahead of their time and said it best: RAG till we die. You even have this nice graphic to remember that by.
>> Is that what they said?
>> That's exactly what they said, Kevin. Okay.
>> Of course, this is just a fun idea, but I honestly would not be surprised if RAG indeed outlives us all. Looking at the projections for the back half of this decade, the growth of the fundamental RAG pattern actually looks stronger than ever. These are just projections, but according to the well-respected Grand View Research, the RAG market is expected to grow at an impressive near-40% compound annual growth rate going into 2030, reaching over a $10 billion market size. So rest assured, RAG is not only not dead; it would seem that it's actually only just getting started. And why is that? What problems does RAG actually solve for the enterprise? The answer starts with a simple problem: LLMs are stateless. The only realistic way to make an LLM smart about your business data, aside from fine-tuning, which is impractical for a lot of reasons, is to inject that information into its context window at runtime.
Couple that with the fact that there is an explosion of LLM adoption, where every company is in a race to take advantage of AI's capabilities, and there's an urgent demand for making a company's proprietary context available to these models. So if you have this demand, how do you actually meet it securely? This will appeal to the architects out there: if you just fine-tune a model on all your sensitive HR and finance data, you've effectively collapsed all your access controls and traceability. RAG, by contrast, provides auditability and a separation of concerns around the security of data, and because of that security advantage, RAG has become the dominant pattern for enterprise grounding. On top of that, we're at the dawn of the agentic era, where AIs don't just answer questions; they also take actions. Even in this new world order, RAG is still the dominant pattern for grounding these agents with your enterprise context. And within agentic systems, RAG is not only being used to access enterprise knowledge bases; it's also being used to endow the agent itself with memory, so it can recall its past actions and conversations.
But what does a mature, production-grade version of this architecture actually look like here in 2025? Something like this. I assume that this audience is already familiar with RAG, so I won't spend long here, but just to get us all on the same page, here's a quick 30-second tour of a somewhat standard RAG production system in 2025. It has two parts. First, you have the offline ingestion, in the top right here, where we take our data, chunk it, embed it into vectors, and then load it into a vector database. That becomes the enterprise knowledge base. Secondly, we have an online query flow, which can be divided into two steps: the retrieval step on the top here, and the generation step, mostly on the bottom there. In the retrieval step, we analyze the query's intent and its keywords, and then search the vector database to get the top most relevant chunks. Finally, in the generation flow, we don't just dump those chunks in: we rerank them, we maybe run them through an anti-hallucination check, and then only the best context makes its way into the prompt, where it's used to synthesize a final answer. This complete flow creates a sound, trustworthy system.
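To make that 30-second tour concrete, here is a minimal sketch of the two-part blueprint: offline ingestion (chunk, embed, load) and the online query flow (retrieve, rerank, prompt). It is illustrative only; the embed() function is a toy hashing embedder standing in for a real embedding model, the in-memory list stands in for a vector database, and the keyword-overlap rerank stands in for a cross-encoder reranker or anti-hallucination check.

```python
import math
import re

def embed(text: str, dims: int = 64) -> list:
    # Toy bag-of-words hashing embedder (stand-in for a real embedding model).
    vec = [0.0] * dims
    for token in re.findall(r"\w+", text.lower()):
        vec[hash(token) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# Offline ingestion: chunk, embed, load into the knowledge base.
def ingest(documents, chunk_size=400):
    index = []
    for doc in documents:
        for i in range(0, len(doc), chunk_size):
            chunk = doc[i:i + chunk_size]
            index.append({"text": chunk, "vector": embed(chunk)})
    return index

# Online query flow: retrieve top chunks, rerank, assemble the prompt.
def build_prompt(query, index, top_k=5, keep=3):
    qvec = embed(query)
    candidates = sorted(index, key=lambda c: cosine(qvec, c["vector"]), reverse=True)[:top_k]
    def overlap(c):
        return len(set(query.lower().split()) & set(c["text"].lower().split()))
    best = sorted(candidates, key=overlap, reverse=True)[:keep]
    context = "\n---\n".join(c["text"] for c in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = ingest(["Unstructured converts raw documents into structured JSON for RAG pipelines."])
print(build_prompt("What does Unstructured do?", kb))  # this prompt would then go to the LLM
```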
So with this blueprint in mind, let's talk about the larger RAG ecosystem. The most important thing to understand about RAG today is that it's no longer just one thing; it's an entire field of research that has rapidly evolved over the last five years. This is roughly the journey. It all started back in 2020 with Meta's famous paper laying out the approach, which in hindsight has been named naive RAG. Naive RAG is basically the previous slide with a lot of the intermediary steps removed. That quickly evolved into advanced RAG and modular RAG to address a lot of the quality and scalability shortcomings of naive RAG, and this is where most production systems actually start today. This is where we introduce critical components like reranking, query transformations, and hybrid search. And of course, you'll notice what happens: the accuracy jumps, but so do the cost and complexity.
And finally, we're now at self-reflective and agentic RAG, where the system doesn't just retrieve; it analyzes its own results, decides whether they're good enough, and even re-queries if needed. This five-stage evolution isn't just a simple line, though; it's really more of an entire universe of techniques and architectures. So this is what the actual universe looks like today. You don't have to read all 25 of these, but this grid is here to reveal all the different flavors of RAG that you as an architect have at your disposal when choosing an implementation pattern for a given use case. Do you need conversational RAG because you're building a chatbot? If you're synthesizing across multiple complex documents, you might want to look at multi-hop RAG, and so on. But even once you've chosen the right flavor of RAG for a given use case, you're still going to need a strong, ROI-focused business case, as well as robust security, governance, and evaluation frameworks, in order to move that pilot into production. So here is our roadmap for the next 10 to 12 minutes. We're going to cover the remaining four pillars of a production system, the top four here, and then quickly touch on RAG for agentic systems, which is going to be light because we're going to discuss it in more depth in a future webinar.
So with that, we're going to start with the most important question for any enterprise RAG system, and that is ROI. What are we building and why? What is the expected ROI, and what meaningful business outcomes are we aiming to achieve in building this system?
This is actually the step that tends not to receive the rigorous scrutiny and disciplined approach it deserves, and it's reportedly one of the biggest reasons projects fail to move past the pilot stage: either the cost of the system doesn't end up justifying its delivered value, or else it had a weak value proposition from the start. So how do you find a use case that's worth pursuing? The core idea is simple: you want to target high-value friction in the organization. You're not just trying to go around automating tasks; you want to find a process where your most expensive employees are wasting the most time looking for the most important information. It's really that intersection of high value and high friction that's your sweet spot. Of course, this is going to look different based on your business domain, but let's look at a few example use cases. Take customer support: you have support agents toggling between five screens, looking at five different sources of information. That's friction where a RAG system can help. Or take a sales enablement use case, where you have sales reps hunting through scattered internal repositories, searching for answers to a security questionnaire in order to close a high-stakes deal. Kevin can probably attest that he's been there and done that. That's friction. If you find the friction and apply RAG, you get ROI.
And so on. But once you've found your high-friction use case, how do you then prove that it's actually working? That's where success metrics come into play. This is all about measuring what actually matters. The biggest mistake teams make, and we see this again and again, is focusing only on technical metrics like accuracy, or on engagement with the tool. Nobody on the business side of the org actually cares about a BLEU score, though; success for them is measured by business outcomes. So you really need to target the business outcome metrics. These are going to be the KPIs your VP or director already has on a dashboard somewhere. You're not creating new metrics here; you're moving the needle on their metrics. In support, it's things like decreasing average handle time; in sales, it's decreasing sales cycle length. This is the language of ROI, and if you aren't moving the business outcome, the next two don't really matter. Of course, if you are creating that ROI, then technical metrics also become very important, so we'll cover two of those. The first is the end-user experience. How do you actually quantify that? This is all about whether users like and trust the tool and whether it actually creates a positive impact in their workflow. The single most important metric in this area is a simple thumbs up / thumbs down rating that you embed within the application itself; this is your real-world, continuous feedback loop. Finally, we have our system performance metrics. These are the ones that we as engineers love: context relevance (did we find the right stuff?) and answer faithfulness (did the LLM stick to the script?). These are essential, but remember that their only purpose is to serve the two categories above.
Okay. So now that we have a use case in mind with strong ROI potential, we need to shift our mindset to security. A pilot is only going to get off the ground if it doesn't represent a massive security risk to the organization.
So how do you actually secure a RAG system? Here's the most critical part: you cannot treat an LLM-based application that connects to your enterprise data like a standard application. These systems really require a three-stage security posture. First, you have your pre-retrieval guardrails. This is authentication, ensuring the user is who they say they are, as well as authorization, where you're checking whether they should have access to the system at all in the first place. This one is pretty straightforward; you'll see it across many applications. But the next one is very RAG-specific, and that's the retrieval-time guardrails. This is all about securing the data itself, and it comes with some challenges. Security in this step basically means ensuring that both the user and the LLM only see documents, or pieces of documents, that they're allowed to see. And finally, there's post-retrieval, which is more about securing the answer. This is basically filtering out bad results, flagging incorrect information and possibly toxic content before it reaches the end user.
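For readers following along in code, here is a runnable toy sketch of that three-stage posture. The user store, the group-based document ACLs, and the "withhold salary info" rule are all illustrative assumptions, not a prescription for how your guardrails should be configured.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    groups: set

USERS = {"token-123": User("jack", {"sales"})}          # stand-in identity provider
DOCS = [
    {"text": "Q3 sales playbook ...", "allowed_groups": {"sales"}},
    {"text": "HR salary bands ...", "allowed_groups": {"hr"}},
]

def authenticate(token):
    # 1. Pre-retrieval guardrail: authentication + authorization.
    if token not in USERS:
        raise PermissionError("unknown user")
    return USERS[token]

def retrieve(query, user):
    # 2. Retrieval-time guardrail: only surface chunks this user may see.
    return [d for d in DOCS if d["allowed_groups"] & user.groups]

def post_filter(answer):
    # 3. Post-retrieval guardrail: screen the answer before it reaches the user.
    return "[withheld]" if "salary" in answer.lower() else answer

def guarded_answer(token, query):
    user = authenticate(token)
    chunks = retrieve(query, user)
    draft = f"Answer to {query!r} grounded in {len(chunks)} permitted chunk(s)."
    return post_filter(draft)

print(guarded_answer("token-123", "What is in the Q3 sales playbook?"))
```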
Now, of all of these, the most important to get right, and unfortunately the one that comes with the most unique security challenges, is the second one: securing the data. So let's zoom into that one for just a minute.
Within this one, before you write any code, you really want to understand what the security model for the system needs to be. This is often going to be use-case specific, and within that use case you need to ask yourself who the audience of the tool is and what the sensitivity of the data is. If this is a public chatbot or a generic helper, you might be able to get away with setting up a simple new permissions system from scratch, such as a basic RBAC system, which you implement by tagging documents with certain metadata during ingestion and then filtering on those tags during retrieval. That's actually the easy path.
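Here is a toy sketch of that easy path: tag each chunk with a department label during ingestion, then filter on the tag during retrieval. The department labels and the keyword-overlap "search" are illustrative stand-ins; in a real system the tag would live in the vector store's metadata and the similarity search would be done by the vector database.

```python
def ingest_chunk(index, text, department):
    # The department tag rides along as chunk metadata at ingestion time.
    index.append({"text": text, "metadata": {"department": department}})

def search(index, query, allowed_departments, top_k=3):
    # Retrieval-time filter: only chunks whose tag the caller is allowed to see.
    visible = [c for c in index if c["metadata"]["department"] in allowed_departments]
    def overlap(c):
        return len(set(query.lower().split()) & set(c["text"].lower().split()))
    return sorted(visible, key=overlap, reverse=True)[:top_k]

index = []
ingest_chunk(index, "Travel reimbursement policy: submit receipts within 30 days.", "finance")
ingest_chunk(index, "New-hire onboarding checklist and first-week schedule.", "hr")
print(search(index, "reimbursement policy", allowed_departments={"finance"}))
```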
But for a lot of internal enterprise use cases, you often hit what's called the access control mirroring problem, or at least that's what we call it. What is that? It essentially stems from the fact that with a RAG system, you're making a vectorized copy of your data. So if your RAG system contains data from SharePoint, from Salesforce, from Slack, you can't just invent new rules; you should mirror the existing ones. For example, if Jack can't see the HR folder in SharePoint, he probably should not be allowed to query it in your RAG system. So how do you actually achieve that in the wild? You effectively have three architectural choices. First, there's pre-retrieval filtering, where you bake the permissions from the source system's records into the vector metadata. This approach allows for the fastest query times, but if you go this route you often end up with syncing issues and re-indexing of the data, which can have a negative impact on cost and also latency in some ways. The next approach is to filter the retrieved content once it has actually been fetched. In this case, you fetch everything and then check permissions using some sort of external authorization service. This is very secure and easy to keep up to date, but it can add latency and can be very challenging at scale. So in full-blown production systems you typically see a hybrid approach, where you do some sort of coarse filter during pre-retrieval, for example filtering on a metadata tag such as the department, and then couple that with a fine-grained check via the post-retrieval approach against the user's actual permissions.
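A toy sketch of that hybrid model is below: a coarse metadata pre-filter narrows the candidate set cheaply, then a fine-grained per-document check, standing in for a call to an external authorization service or the source system's own ACLs, runs on the retrieved results before they reach the prompt. The ACL table and source IDs are illustrative assumptions.

```python
ACL = {("jack", "sharepoint://sales/q3-plan"): True}     # illustrative permissions record

def authz_allows(user, source_id):
    # Stand-in for a lookup against an external authorization service.
    return ACL.get((user, source_id), False)

def hybrid_retrieve(user, department, chunks):
    # Coarse, cheap pre-retrieval filter on a metadata tag.
    candidates = [c for c in chunks if c["metadata"]["department"] == department]
    # Precise, per-document post-retrieval check against actual permissions.
    return [c for c in candidates if authz_allows(user, c["metadata"]["source_id"])]

chunks = [
    {"text": "Q3 plan ...", "metadata": {"department": "sales",
                                         "source_id": "sharepoint://sales/q3-plan"}},
    {"text": "Salary bands ...", "metadata": {"department": "hr",
                                              "source_id": "sharepoint://hr/salaries"}},
]
print(hybrid_retrieve("jack", "sales", chunks))          # only the permitted sales chunk survives
```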
Baking the permissions into the vector database probably doesn't need much explanation, but for the post-retrieval filtering option you might still be left with a lot of questions. Oh, I actually forgot that I skipped a slide; we'll go ahead and skip it for now and just move on to evaluation. I had a slide in here about the different RAG security models, tool calling versus a permissions graph or ACL table and so forth, but let's go ahead and jump to evaluations.
So this is a critical two-part challenge. You can't just feel whether your RAG system is accurate; you need a practical, two-pronged approach to prove it's working. First is your offline, pre-deployment testing. This is what's baked into your CI/CD, and it operates as your RAG system's safety net. In this step you build what's called a golden set: maybe a hundred to a few hundred Q&A pairs covering your most important questions, the questions you can't get wrong. Then, before you deploy any change, whether that's a new prompt, a new chunking strategy, and so on, you run your new build against this golden set, programmatically scoring it using those technical success metrics we saw earlier: context relevance and answer faithfulness. This is what lets you quantify improvement, so now you can say things like "our new chunking strategy is 8% more effective than the old one."
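Here is a toy sketch of what such an offline golden-set gate could look like in CI. The golden set, the dummy pipeline, and the two scoring functions are illustrative stand-ins; real context-relevance and answer-faithfulness scores are usually computed with an LLM judge or a dedicated evaluation library.

```python
GOLDEN_SET = [
    {"question": "What is the refund window?", "expected": "30 days"},
    {"question": "Who approves travel over $5,000?", "expected": "the CFO"},
]

def context_relevance(question, retrieved):
    # Did we find the right stuff? (toy keyword heuristic)
    hits = sum(1 for c in retrieved if any(w in c.lower() for w in question.lower().split()))
    return hits / max(len(retrieved), 1)

def answer_faithfulness(answer, retrieved):
    # Did the answer stick to the retrieved context? (toy substring heuristic)
    return 1.0 if any(answer.lower() in c.lower() for c in retrieved) else 0.0

def evaluate(rag_pipeline):
    rel, faith = [], []
    for case in GOLDEN_SET:
        answer, retrieved = rag_pipeline(case["question"])
        rel.append(context_relevance(case["question"], retrieved))
        faith.append(answer_faithfulness(answer, retrieved))
    return {"context_relevance": sum(rel) / len(rel),
            "answer_faithfulness": sum(faith) / len(faith)}

def dummy_pipeline(question):                      # stand-in for the real RAG system
    return "30 days", ["Refunds are accepted within 30 days of purchase."]

scores = evaluate(dummy_pipeline)
print(scores)
# A CI job could then block the deploy, e.g.:
# assert all(v >= 0.8 for v in scores.values()), "quality regression - do not deploy"
```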
But you can't think of everything, so you also have to couple that with an online, in-production feedback loop. This is your thumbs up / thumbs down button, and it's the start of your most important workflow: the continuous improvement workflow.
And this is what that looks like. You can basically think of it as the thumbs-down workflow. Whenever your system produces a bad response and a user flags it as such with a thumbs down, that kicks off a task that goes into a review queue, where it gets triaged. The goal of triage is to determine whether the process failed at the retrieval step or the generation step, so that it can be addressed accordingly. If it was a failed retrieval, you want to look at the retrieval components: the ingestion, the chunking, the ranking, and so forth. If it failed at generation, you want to look at the prompt, so you might need to fine-tune the prompt, adjust the logic, and so forth. And then, and this is a critical step, once the fix has been made, that question and its corresponding answer can be added to the golden set, so you're effectively strengthening your test coverage of the system over time.
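A toy sketch of that loop is below: a flagged response becomes a review task, triage labels it a retrieval or generation failure, and resolved cases are folded back into the golden set. The triage heuristic here is purely illustrative; in practice a human reviewer (possibly assisted by an LLM) makes that call.

```python
from dataclasses import dataclass

@dataclass
class FeedbackTask:
    question: str
    answer: str
    retrieved: list
    status: str = "needs_triage"

review_queue, golden_set = [], []

def flag_thumbs_down(question, answer, retrieved):
    review_queue.append(FeedbackTask(question, answer, retrieved))

def triage(task):
    # No relevant context retrieved -> suspect ingestion/chunking/ranking;
    # otherwise suspect the prompt/generation step.
    words = set(task.question.lower().split())
    relevant = any(words & set(c.lower().split()) for c in task.retrieved)
    task.status = "retrieval_failure" if not relevant else "generation_failure"
    return task.status

def resolve(task, corrected_answer):
    # Once fixed, the Q&A pair strengthens the golden set's coverage.
    golden_set.append({"question": task.question, "expected": corrected_answer})
    task.status = "resolved"

flag_thumbs_down("What is the refund window?", "14 days", ["Unrelated shipping policy text."])
print(triage(review_queue[0]))       # -> retrieval_failure
resolve(review_queue[0], "30 days")  # the corrected case now feeds the offline tests
```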
So, finally, that takes us to the last pillar in our discussion today: organizational alignment. This is perhaps the most painful lesson we've seen a lot of companies have to learn, and we've observed it time and again in the field. We've talked about ROI, security, and evaluation, but how do teams actually try to build these systems for their enterprises? From what we've seen, this is what usually happens. We call it the DIY rat's nest. Because RAG is such an emerging field, there aren't many good all-in-one platforms out there that can meet the stringent security requirements of enterprises, so most organizations have opted to build their own little pipelines in house. The problem is that, for enterprises, when every one of their individual teams is doing this, the result is organizational IT chaos. The legal team writes a Python script for PDFs. The sales team finds a hacky way to scrape Salesforce. You end up with custom code for chunking, custom scripts for embedding, and zero standardization. We've talked to organizations with literally thousands of proof-of-concept RAG pilots globally, all using different stacks. You can't align on security because every pipeline is different. You can't align on quality because everyone is using different chunking logic. And I can't tell you how many times we've been called in just to untangle this exact mess. So how do you actually prevent this outcome?
Well, the way that enterprises typically solve this is by arriving at organizational alignment around an internal GenAI stack that can be leveraged across all teams. Instead of 50 custom DIY solutions across 50 different departments, you align the organization around one secure, stable ingestion and RAG ETL layer. That simplifies the architecture, ensuring that the stack is kept secure and up to date and is flexible enough to serve a wide range of use cases. And of course, that's where we like to make a plug for ourselves: we offer just such a platform on the ingestion and RAG ETL layer. What that gives you is that, instead of a tangled nightmare of a mess, you end up with an effortless unstructured-data ETL feeding as many high-value RAG use cases as your organization decides to bite off. The benefit of choosing a platform like Unstructured is that there's no vendor lock-in for data sources or data destinations, or for the large language models and embedding models you want to use in your system, and so on. Plus, it features enterprise-grade security and controls and offers a number of flexible deployment models. What this means is that your developers can stop writing maintenance scripts for their DIY RAG systems and start building actual AI products. It also turns a maintenance nightmare into a predictable, manageable utility. In short, that is how you scale RAG from a single demo to an enterprise-wide capability that is ready for the AI future of tomorrow.
And so that's basically it for the main part of the enterprise-ready overview, but I do want to touch lightly, as I said earlier, on RAG for agentic systems. I'm only going to touch on it lightly because we're going to cover this topic in much more detail in a future webinar. The basic idea here is that you're introducing an intelligent agent between the user and the RAG system. What this looks like in practice is that you typically wrap the RAG system with an MCP server so that it can be called by an agent as one of its tools. The advantage here, and I sort of touched on this earlier with the reflection and agentic RAG slide, is that the agent can review its response and then make subsequent follow-up queries to the RAG system. All of that is basically to ensure that it has all the information it needs to effectively answer the question posed by the user or, if it was solving a task, to use that information and retrieved data to effectively solve that task. Again, we'll cover this in more depth in a future webinar, but since agentic RAG is exploding in popularity, I did want to touch on it here.
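As a teaser for that future webinar, here is a toy sketch of the agentic pattern: the RAG system is exposed to the agent as a tool (in practice often by wrapping it with an MCP server), and the agent reflects on what it has retrieved and issues follow-up queries until it judges the context sufficient. The corpus, the planned queries, and the sufficiency check are all illustrative stand-ins for real agent behavior.

```python
def search_knowledge_base(query):                     # the RAG tool the agent calls
    corpus = {
        "acme renewal date": ["The Acme contract renews on March 1."],
        "acme renewal price": ["Renewal pricing is set at $120k per year."],
    }
    return corpus.get(query, [])

def is_sufficient(context):
    # A real agent would ask the LLM to reflect on coverage; here we simply
    # require at least two supporting snippets before stopping.
    return len(context) >= 2

def agentic_answer(question, max_rounds=3):
    planned_queries = ["acme renewal date", "acme renewal price", "acme termination clause"]
    context = []
    for i in range(min(max_rounds, len(planned_queries))):
        context += search_knowledge_base(planned_queries[i])   # tool call
        if is_sufficient(context):                             # reflection step
            break                                              # enough grounding, stop querying
    return f"Grounded answer to {question!r} using {len(context)} retrieved snippet(s)."

print(agentic_answer("When does the Acme contract renew, and at what price?"))
```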
And with that, that concludes the talk for today. If you enjoyed the discussion, make sure you check out one of our past webinars. The most recent is "Making Your Data Work for You: RAG Strategies that Scale," and AJ and Kevin are back after the break to discuss RAG over evolving enterprise knowledge.
>> Awesome.
>> And with that, we can take some questions.
>> Awesome. Daniel, that was fantastic. Thank you so, so much. We're going to have to make this a rapid-fire round of questions, so we're going to go through these pretty quickly. Thank you to everyone who submitted questions. The first one is: do you have a recommended end-to-end blueprint for integrating Unstructured with common enterprise stacks, Snowflake, Databricks, or data lakes, so that RAG stays in sync with upstream data? So, we are the blueprint, or that's what we are intending to be. We have stable connectors to all of the sources that you mentioned, and we can run incremental syncing, so that you're only syncing updates and deletions, to avoid blowing up your ingress and egress costs. Quickly on to you, Daniel: how do you benchmark an Unstructured pipeline's impact on RAG quality versus a naive ingestion pipeline, e.g., basic text extraction and fixed-size chunking?
>> Yeah, I sort of touched on this in the eval section, and also a little bit in the ROI section. Basically, when you're evaluating the ingestion portion in the context of RAG, you can miss a lot of the picture, because if your golden set is too small, it could be that the system is able to answer all the questions correctly even with a basic extraction. But over time, with that continuous improvement loop and the long tail of questions, as your golden set begins to grow, it will start to reveal cracks in the ingestion and identify those issues. If you want to focus on the ingestion metrics individually, which we recommend, we also just released a new technique for doing so called SCORE. Definitely check it out; it's all over our social channels, you can't miss it. It's a very powerful system for evaluating ingestion on its own.
>> Awesome. We have some additional, really thoughtful slides; I don't know if we're going to be able to get to them because we're coming up on time, but I just want to say thank you so much to everyone who joined. Thank you for the questions. We will respond by email to the individuals who submitted questions ahead of time. And we will