This mini-course provides a comprehensive introduction to Generative AI (Gen AI), covering its fundamentals, the LangChain framework for building Gen AI applications, and practical implementation through two end-to-end projects: an equity news research tool and a retail industry Q&A tool.
Welcome to this generative AI mini course. First we will understand the Gen AI fundamentals, then we will learn LangChain, a Python framework used for building Gen AI applications, and in the end we will build two end-to-end Gen AI projects. The first project uses a commercial GPT model to build an equity news research tool; the second uses an open-source LLM to build a Q&A tool for the retail industry.
Let's start with the definition of Gen AI. AI can be categorized into two sections: generative AI and non-generative AI. With non-generative AI, you are dealing with problems such as: you have a chest X-ray and you want to find out whether the person has pneumonia, or you have data on a person's credit history and you want to figure out whether they should be given a loan. In these problems you are not creating new content; you have data, and based on that data you are making certain decisions. In the case of generative AI, however, you are generating new content. The classic example is ChatGPT, which is a Gen AI application: you can write your resume, plan your trip, or even create an image of the Hulk playing Gujarati dandiya. In summary, generative AI is the category of AI associated with generating new content, and this new content can be text, images, video, audio, and so on. Let's now look at the evolution of Gen AI.
In the early days of AI, the kinds of problems we used to solve were things like predicting a home price based on factors such as the area, the number of bedrooms, the age, and so on. This is called statistical machine learning, and the factors that determine the price of the home (area, bedrooms, age) were called features. These were simple features. When it comes to image recognition, such as identifying whether an image is a cat or a dog, the features are more complex: the whiskers, or the pointy ears that the cat has. The data here is an image, which is just a bunch of pixels, so it is unstructured data, whereas in house price prediction you have structured data (area, bedrooms, etc.). That is why features in an image are called complex features. The cat's face could be at a different angle, the ears could be tucked into a corner of the image, or the legs might not be visible at all; that makes the image detection problem much harder, and sometimes you simply can't use statistical ML for it. Therefore neural networks were invented, and that gave birth to deep learning.
So there was statistical machine learning first, then came deep learning, where neural networks were the main approach, and after that came the recurrent neural network. In a recurrent neural network, say you are trying to translate a sentence from English to Hindi. You feed the first word to your neural network and it gives you a translation; this a1 is nothing but the translation of that first word in Hindi. After that, you feed the second word along with the translation of the previous word to the same network. These are not four different networks; the diagram is just unrolled along a time axis. It's the same network, fed with the translation of the previous word, and that feedback creates a kind of loop within the neural network, which is why it is called a recurrent neural network. RNNs were used for solving problems like language translation.
Then came more sophisticated problems that were generative in nature. Here is an email I received, and when I try to respond in Gmail, it autocompletes my reply: as I type "thanks for reaching out, here's my availability", it is auto-completing all these words. The way it works is that it looks at the content of the email. Say Angela wrote, "Hey, we have a potential collaboration opportunity, do you have time to talk?" The network reads this and then tries to predict the probability of the next word. So when I write "Angela, thanks for", the probability of "contacting me" is high; probabilities are generally between zero and one, so say "contacting me" has a probability of 0.91. "Reaching out" could have an even higher probability, while "finishing the project" would have a lower probability given the email Angela sent me. I am putting these numbers in at random, but you get the idea: if Angela had said, "Hey, I have finished the project, please check it out," then "Angela, thanks for finishing the project" would have the higher probability. Since we're talking about a collaboration, the probability of "finishing the project" is lower. So you get the idea: if you feed in a huge corpus of text and build language understanding from it, you will be able to predict the next word in a sentence. This is called a language model: an AI model that can predict the next word, or set of words, for a given sequence of words.
Here I have some text from Wikipedia, an article on India's freedom movement. To train a neural network we can create training pairs from this paragraph. The problem we set up is filling in a missing word: for example, "first Indian ___ to embrace ...", where the missing word is "nationalist". We can come up with many such training pairs and feed them to the neural network for training. This approach of training a neural network is called self-supervised learning. The nice thing is that to train a language model you don't need a lot of labeled data: you can take Wikipedia text, text from news articles, text from a variety of books, generate these training pairs, and train the neural network. After that, the network will be able to predict the missing word, or the word that comes next in a sentence, just like Gmail autocomplete does.
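For intuition, here is a minimal sketch of how such next-word training pairs could be generated from raw text. The sentence is just an illustration; real language models work on tokens and on far larger corpora with much more sophisticated pipelines.

```python
# Toy illustration of self-supervised training pairs: turn running text into
# (context, next-word) examples with no manual labelling.
text = "India attained independence from British rule in the year 1947"
words = text.split()

training_pairs = []
for i in range(1, len(words)):
    context = " ".join(words[:i])   # everything seen so far
    target = words[i]               # the word the model should predict
    training_pairs.append((context, target))

for context, target in training_pairs[:3]:
    print(f"{context!r} -> {target!r}")
```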
When you feed a huge amount of data to such a neural network, and the network itself is big, with many layers, what you get is a large language model. GPT-4, the model behind ChatGPT, is a large language model; its predecessor GPT-3 has 175 billion parameters. Parameters are nothing but the weights in the neural network: the small network shown here has only 10 parameters, so imagine a huge network with many layers holding 175 billion of them. The critical breakthrough in Gen AI came when the paper called "Attention Is All You Need" was published, which gave rise to a special neural network architecture called the Transformer.
Just to summarize: previously we had statistical machine learning, then came deep learning based on neural networks, then recurrent neural networks, and then came Transformers, which are very powerful. With Transformers we have a variety of architectures and models. For example, Google has a model called BERT, and OpenAI came up with GPT. If you look at your ChatGPT application you will find the model listed; right now it's GPT-4. GPT stands for Generative Pre-trained Transformer. Similar to text models we have image models too, for example DALL-E and Stable Diffusion; the image shown here was generated by Stable Diffusion. If you have the ChatGPT Pro version you can generate images as well, for example an image of a person wearing a mask. I know the person in the result is not actually wearing the mask, but it still does a pretty decent job. To summarize: we have text-to-text models such as BERT and GPT; text-to-image models such as DALL-E and Stable Diffusion, where you give text and it generates an image; and OpenAI's Sora is an example of a text-to-video model. Just Google "OpenAI Sora" and you'll find the demo: you give a text prompt and it generates an entire video. All of this has become possible because of the Transformer architecture.
you have good understanding of variety
of models in geni space let's look at an
analogy based understanding of llm what
I'm trying to do here is clarify all
these Concepts such as llm Vector DB Etc
because these concepts are used heavily
in the field of gen okay so next I'm
going to play an analogy based animated
video video of [Music]
[Music]
Peter Pandey has a curious parrot called Buddy. Buddy has a great mimicking ability and a sharp memory. He listens to all the conversations in Peter's home and can mimic them very accurately. Now, when he hears "feeling hungry, I would like to have some...", the probability of him saying "biryani", "cherries", or "food" is much higher than words such as "bicycle" or "book". Buddy doesn't understand the meaning of biryani or food or cherries the way humans do; all he's doing is using statistical probability, along with some randomness, to predict the next word or set of words purely based on the past conversations he has listened to. We can call Buddy a stochastic parrot; stochastic means a system characterized by randomness or probability. A language model is somewhat like a stochastic parrot: it is a computer program that uses a technology called neural networks to predict the next set of words for a sentence (for a simple explanation of a neural network, please watch the video linked here). Just as Buddy is trained on the dataset of Peter's home conversations, you can have a language model that is trained on, for example, all movie-related articles from Wikipedia, and it will be able to predict the next set of words for a movie-related sentence. Gmail autocomplete is one of the many applications that uses a language model underneath.
Now that we have some understanding of a language model, let's understand what the heck a large language model is. Back to our parrot example: suppose Buddy gets some divine superpower and can now listen to Peter's neighbors' conversations, to conversations happening in schools and universities in his town, in fact not only in his town but in all the towns across the world. With this extra power and knowledge, Buddy can now complete the next set of words on a history subject, give you nutrition advice, or even write a poem. Like our powerful parrot Buddy, large language models are trained on a huge volume of data such as Wikipedia articles, Google News articles, online books, and so on. If you look inside an LLM you will find a neural network containing billions, or even trillions, of parameters that can capture more complex patterns and nuances in a language. ChatGPT is an application that uses an LLM, GPT-3.5 or GPT-4, behind the scenes. Other examples of LLMs are PaLM 2 by Google and Llama by Meta.
On top of statistical predictions, LLMs use another approach called reinforcement learning from human feedback (RLHF). Let's understand this once again with Buddy. One day, while Peter was having a conversation with his cute little two-year-old son, Buddy said: "Son, don't eat too many bananas, else I will punish you with an iron rod." Hearing this, Peter realized that Buddy had been listening to conversations from abusive parents in his town, and what he said was the effect of that. Peter then starts keeping a close eye on what Buddy is saying. For the same question Buddy can produce multiple answers, and all Peter has to do is tell him which one is toxic and which one is not. After this training, Buddy doesn't use any toxic language. While training ChatGPT, OpenAI used a similar approach of human intervention, RLHF: OpenAI used a huge workforce of humans to make ChatGPT less toxic. While LLMs are very powerful, they don't have any subjective experience, emotions, or consciousness the way we humans do; LLMs work purely based on the data they have been trained on.
Now that you have some understanding of LLMs, let's cover two other important topics: embeddings and vector databases. An embedding is nothing but a numeric representation of text in the form of a vector, such that it captures the meaning of that text. Once you create embeddings for a given text, you can do math with words and sentences, such as Paris minus France plus India equals Delhi, or Apple minus Tim Cook plus Satya Nadella equals Microsoft. This sounds crazy, right? But it is actually possible thanks to embeddings. A vector database allows you to store these embeddings and perform efficient search over them. So let's try to understand these concepts in a bit more detail.
We'll start with the startup boom going on in the field of vector databases. There are AI startups that have raised millions of dollars in funding, and they have one product in common: a vector database. So let's try to understand what exactly a vector database is. Today, when you search Google for "calories in apple" versus "employees in apple", Google figures out that the first apple means the fruit and the second one is the company. Have you ever wondered how Google does this? It uses a technique called semantic search. Semantic search means not searching by exact keyword matching, but understanding the intent of a user's query and using the context to perform the search. To do semantic search, it internally uses the concept of embeddings: a word embedding or sentence embedding is nothing but a numerical representation of text.
Let's first understand how exactly an embedding works. Let's figure out how to represent the word "Apple" as numbers, given a particular context. One way is to think about different features or properties of words: is it related to phones, is it a location, and so on, and then assign a value for each of these properties (revenue here means $82 billion). You get a sequence of numbers as a result, and that is nothing but a vector. This vector is the word embedding for the word "Apple" in this particular context. If you're talking about apple the fruit, the embedding will look different, because the values of these properties are different. And once you have embeddings for different words, just by looking at them you can say that the second "apple" and the word "orange" are similar, because most of their values match; of course some values don't match, but compared to the first vector, the second and third vectors are quite similar. In the same way, if you have the word "Samsung", you can represent it numerically: is it related to phones, yes; is it a location, no. Looking at all the vectors again, you can see that the first and fourth vectors are similar. So using these vectors you can measure similarity, and not just similarity; you can actually do complex arithmetic such as the famous example from the NLP domain shown earlier. You can perform this math using a technique called word2vec, which is a technique for representing a word as a numeric vector. I have made a separate video on it, so if you want to know more you can go and watch it; in that video I explain how you can generate handcrafted features for each of these words and do this particular math.
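To make the idea concrete, here is a toy sketch using hand-crafted feature vectors and cosine similarity. This is only for intuition: real embeddings are learned by models such as word2vec, BERT, or the OpenAI embedding API rather than assigned by hand, and the same learned vectors are what make the "Paris minus France plus India" style of arithmetic work.

```python
import numpy as np

# Toy hand-crafted "embeddings". Columns: [related_to_phones, is_location, is_fruit, revenue_scale]
vectors = {
    "apple (company)": np.array([1.0, 0.0, 0.0, 0.8]),
    "apple (fruit)":   np.array([0.0, 0.0, 1.0, 0.0]),
    "orange (fruit)":  np.array([0.0, 0.0, 1.0, 0.0]),
    "samsung":         np.array([1.0, 0.0, 0.0, 0.6]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["apple (fruit)"], vectors["orange (fruit)"]))    # ~1.0: same meaning
print(cosine_similarity(vectors["apple (company)"], vectors["samsung"]))         # high: both phone makers
print(cosine_similarity(vectors["apple (company)"], vectors["apple (fruit)"]))   # 0.0 here: different meanings
```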
Just for intuition I explained everything using handcrafted features, but in reality complex statistical techniques are used to generate these word embeddings; if you are curious, you can watch those two videos, or the video on BERT. For now, the key point is that there are a variety of techniques for representing a word, a sentence, or even a whole document as an embedding, and these are the different techniques in use. In the ChatGPT era, Transformer-based embedding techniques are getting popular, so when you use the OpenAI API for embeddings, you now have an idea of the kind of technique it is using underneath.
When you are building any text-based AI application, you might have thousands or even millions of embedding vectors, and you need to store them somewhere. The first option that comes to mind is a traditional relational database. Say for our use case we have these four articles: the first two are related to apple the fruit, the remaining two to Apple the company. You would first generate the embeddings, say using the OpenAI API, and save them into your SQL database. When a search query comes in, you generate an embedding for it as well, compare it with the stored embeddings, and retrieve the relevant documents; here you would use cosine similarity to find the matching vectors, and then display them in your search results. In theory this works fine.
But in reality you will have millions of records in your database, and that's when things start getting interesting. Think about matching a query vector against all those stored vectors. One approach is linear search: you go one by one, and if the cosine similarity is close to one, you put that vector into your result set, and you keep going. You can already see the problem: if there are millions of stored vector embeddings, the computation is going to be too much; the latency and compute requirements of such a use case are simply not manageable, so you need to do something smarter. How do we handle this in a traditional database? We use an index; a database index helps you search faster. Similarly, in this use case we can use a hashing function. We don't need to go into the details of what that hashing function is, but say it creates buckets of similar-looking embeddings. When a search query comes in, you pass it through the same hashing function, which places it into one of the three buckets, and then within that bucket you do an individual linear search. This way you are only matching against the vectors in bucket one; you don't have to match against buckets two and three, which speeds things up. This technique is called locality sensitive hashing (LSH), and it is one of the techniques that vector databases use. There are many such techniques, and they are outlined in a beautifully written article; I'll provide a link to it so you can read through it.
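Here is a toy sketch of that bucketing idea using random-hyperplane hashing, one simple form of LSH. Production vector databases use far more sophisticated indexes; this only illustrates why searching a single bucket is cheaper than scanning everything.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_planes = 4, 3                        # 3 random hyperplanes -> at most 2**3 buckets
planes = rng.normal(size=(n_planes, dim))

def bucket_of(vec):
    # The sign of the projection onto each hyperplane contributes one bit of the bucket key.
    return tuple((planes @ vec > 0).astype(int))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stored embeddings (tiny stand-ins for millions of real ones), grouped by bucket
documents = {
    "article 1": np.array([0.9, 0.1, 0.0, 0.0]),
    "article 2": np.array([0.8, 0.2, 0.1, 0.0]),
    "article 3": np.array([0.0, 0.1, 0.9, 0.8]),
}
store = {}
for name, vec in documents.items():
    store.setdefault(bucket_of(vec), []).append((name, vec))

query = np.array([0.85, 0.15, 0.05, 0.0])
candidates = store.get(bucket_of(query), [])        # linear search only inside the matching bucket
ranked = sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)
print([name for name, _ in ranked])
```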
So far you have seen that vector databases help you do faster search, and they also help you store embeddings in an optimal way; these are the two big reasons vector databases are gaining popularity. After vector databases, we need to understand the concept of retrieval augmented generation, also known as RAG. In my company, AtliQ Technologies, where we work on multiple AI client projects, one of the common requirements is: "We have this private organizational data, or this public custom dataset. Can we fine-tune ChatGPT on it, or can we build a ChatGPT-like solution on this specific dataset?" The way you can do that is by using RAG.
Let me explain this with an analogy. Say you have a college student called Meera. Meera is a very smart individual, generally good at things, and she is studying computer science. Now we want her to appear in a competitive biology exam based on a book called How Microbes Rule the World. The idea is not to make Meera a biologist; she should just go, appear in that competitive exam, and win the prize. You can do this in two ways. Option one is full-fledged training: she goes to college, attends biology classes for one entire year, and becomes an expert in biology. Option two: if the exam committee allows an open-book exam, Meera can take the book with her. You might be familiar with the open-book exam concept, where you can bring a book into the examination and refer to it while writing your answers. Which option would you go for? Obviously option two, because you save both time and money. Meera doesn't have to wait an entire year to finish her biology studies; instead she takes the book with her, and since she is very good at reading and writing, and generally very smart, she can quickly figure out the answers and write them during the exam.
Similar to this approach, when you are building a RAG-based Gen AI application, you take a question from a user and the large language model refers to your database. This database could be Excel files, PDF documents, a SQL database, anything; it could be your private internal organizational data, or it could be public data. The idea is that you want the LLM to pull the answer from these particular data sources, and the LLM can do that using the concept called RAG.
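As a preview, here is a minimal, framework-free sketch of that flow. The names embed, vector_db, and llm are hypothetical placeholders; the end-to-end projects later in this course implement the same steps with LangChain components (document loaders, a vector store, and an LLM).

```python
# Framework-free sketch of the RAG flow; `embed`, `vector_db`, and `llm` are placeholders.
def answer_with_rag(question, vector_db, embed, llm, k=3):
    # 1. Retrieve: find the k chunks of our data most similar to the question.
    query_vector = embed(question)
    relevant_chunks = vector_db.similarity_search(query_vector, top_k=k)

    # 2. Augment: put those chunks into the prompt as context (the "open book").
    context = "\n\n".join(relevant_chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generate: the LLM writes an answer grounded in the retrieved context.
    return llm(prompt)
```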
We will go over the technical details of RAG when we build our end-to-end project later in this video, but let's first look at the tools we can use for building Gen AI applications. ChatGPT is a Gen AI application that uses GPT-4 as its large language model. This AI model is similar to a human brain, which is trained: if I look at my own brain, it has been trained through my experiences, my college studies, the books I have read, and so on. GPT-4 is a similar model that has been trained on Wikipedia articles, books, and a huge amount of other data. Now, having a brain alone is not enough; you need a body. I have this brain, but if you want to benefit from the knowledge in it, you need the body around it, the eyes and the voice that can interact and solve a variety of problems. Similarly, the GPT-4 model is just like a human brain, and you need a body around it; that body is nothing but a backend server. If you have done software engineering, you might be aware of web servers based on Node.js, Django, FastAPI, and so on. When you are using the ChatGPT application, that server runs in the OpenAI cloud. But say tomorrow you want to build your own application, for example this kind of database Q&A system for a retail store, where you ask a question and it pulls the answer from your internal private database (we are going to build this tool later, so stay tuned). Here the UI, the front end, makes a call to a backend server, which can use the GPT-4 model, and it will use retrieval augmented generation (RAG) to pull the answer from my SQL database. We will see exactly how this is done when we build that project.
I'm just giving you an overview right now. The answer you are looking for, say "what is the total price of inventory for small-size t-shirts", is available in this database, and GPT-4 converts your question into a SQL query and pulls the answer. You can deploy this backend server to the Azure cloud; Azure has a service called Azure OpenAI Service where you can host your private GPT-4 model, so that your data is protected and doesn't leave your environment. If you don't want to use Azure, you can use something like Amazon Bedrock. Bedrock doesn't have GPT models, but it has other foundation models such as Claude, Llama 2, Mistral, Stable Diffusion, and many more. So to summarize, there are commercial models such as GPT, Gemini, etc., and there are open-source models such as Mistral, Llama, etc. Now say you build your application using a GPT model, and tomorrow you can no longer afford the GPT-4 bill (they charge per token) and you want to save cost. What do you do? You use an open-source model such as Llama 2. But you may have spent a lot of time developing your code, say 20,000 lines, and now you would have to redo all of it. Wouldn't it be nice if there were a way to write your application once, plug in your GPT-4 model, and tomorrow, if you can't afford the bill, simply plug in a different model?
That way is LangChain. LangChain is a Python framework used to build Gen AI applications, and it is a very popular framework nowadays. If you are building Gen AI applications, you will want to learn it, because it makes building LLM apps easier. To summarize the tooling for Gen AI: first you need a model, some LLM; it can be a commercial model such as GPT-4 or an open-source model such as Llama or Mistral, and it can be an image model as well. Then you need a cloud service: Azure OpenAI, Amazon Bedrock, Google Cloud; there are many cloud options. Then you need a framework like LangChain; you can also use the Hugging Face Transformers library to access a variety of open-source models. And in terms of deep learning libraries, you can use PyTorch, TensorFlow, and so on.
Now it is time to learn about the LangChain framework. I had a LangChain crash course on my YouTube channel in which I went through the framework, so I'm going to play that now. All right, let's get started with LangChain. LangChain is a framework that allows you to build applications on top of LLMs, or large language models. In this crash course we are going to go over all the basics of LangChain, and then we will build a restaurant idea generator application using Streamlit, where you can input any cuisine (Indian, Mexican, etc.) and it will generate a fancy restaurant name along with menu items.
First, let us understand what LangChain is and what kind of problem it addresses. When you use ChatGPT as an application, internally it makes a call to the OpenAI API, which in turn uses an LLM such as GPT-3.5 or GPT-4. ChatGPT itself is not an LLM; it is an application, whereas GPT-3.5 and GPT-4 are large language models. Now say you want to build a restaurant idea generator application: you give it a cuisine and it generates a fancy name, say a "Curry Palace" for Indian or "Sahara Palace" for Arabic, along with menu items. This is the sample application we are going to build. It is an LLM-based application, and for it we could use the same architecture as ChatGPT: directly call the OpenAI API (I have included a screenshot of their main API), and you get behavior similar to ChatGPT, with GPT-3.5 or GPT-4 used internally. Once again, the restaurant idea generator is an application similar to ChatGPT, but internally you are using the OpenAI API and the LLMs.
Now, there are a couple of limitations to this approach, and the reason I'm telling you this is that nowadays there is a big boom in the industry where every business wants to build its own LLM-based application. You might wonder why they can't just use ChatGPT: because ChatGPT has no access to their internal organizational data. So there is clear demand for applications built on top of LLMs. Why don't businesses use the architecture above directly? There are a couple of things to consider. First of all, calling the OpenAI API has a cost associated with it: for every 1,000 tokens they will charge a small fee (you can check the OpenAI pricing page). If you are a startup with funding issues and a limited budget, this is going to be a bottleneck. Another thing you might have noticed is that ChatGPT doesn't answer the latest questions; its knowledge is limited to September 2021 as of this recording, so if you want to incorporate the latest information, say from Google, Wikipedia, or somewhere else, you can't get it here. The other issue: AtliQ is my own software development and data science company. If I want to know how many employees joined last month, ChatGPT can't answer, because it doesn't have access to my internal organizational data. So if you use this kind of architecture for your application, you will hit some roadblocks, or rather you will have some limitations.
And look, the OpenAI folks are pretty smart; if they wanted, they could address all of this. But their stance is very clear: we will provide the foundational APIs, and building the frameworks on top is something other people should do. And that's what happened. The OpenAI API alone is not enough for building LLM applications. You need some kind of framework where you can call OpenAI's GPT-3 or GPT-4, or, if you want to save cost, call open-source models such as Hugging Face's BLOOM; there are so many models out there, and the framework should provide plug-and-play support, where you can integrate any one of these models and your code mostly remains the same. This framework should also provide integration with Google Search, Wikipedia, or even your own organizational databases, so the application can pull information from these various sources as well. That framework is LangChain; that is exactly what it does. It's a framework that allows you to build applications using LLMs.
Let's install LangChain now and do some initial setup. First, create an account on OpenAI: go to the OpenAI website, click on Log in, and create a login using Google or your email credentials. Once your login is created you will land on a dashboard; click on API, then go to Manage account and API keys, and you will find a key that looks something like "sk-...". It's like a password, and we need to use that key in our LangChain code. You can also create separate keys for separate projects; I have some client projects and YouTube tutorials going on, and I keep a separate key for each, but in your case one key is fine. You can generate a new key here as well. Copy that key to some secure place, because after that you won't be able to view it again; you would have to delete it and create a new one. So let's say you have that key ready with you.
Then you can import the os module and create an environment variable holding that key (your key will be the "sk-..." string). In my case I have stored the key in a separate Python file, secret_key.py, because I don't want to share it with all of you; that file simply contains my key (you can keep any number of keys there), and I import that variable and set the environment variable with it, then hit Ctrl+Enter so it is set. Next, go to the terminal and install a couple of modules: pip install langchain, and the second one, pip install openai.
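For reference, the setup just described looks roughly like this. The secret_key module and openai_key variable are simply the naming convention used in this course; any secure way of loading the key works.

```python
# Run these in a terminal first:
#   pip install langchain openai
import os

# The key is imported from a local secret_key.py that is never shared or committed.
from secret_key import openai_key

os.environ["OPENAI_API_KEY"] = openai_key   # LangChain's OpenAI wrapper reads this variable
```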
Once you have installed those modules, let's import a few important things from LangChain. We are going to import the LLM wrapper called OpenAI. We are using OpenAI because, even though it costs some money, it is the best option; if you want other ones, just hit Tab and it will show you Hugging Face and whatever other LLMs are available. We are happy with OpenAI for now. Then I create my OpenAI model. It has a parameter called temperature, which controls how creative you want your model to be: if the temperature is set to zero it is very safe, it doesn't take any bets, but if it is one it will take risks; it might generate wrong output, but it is very creative at the same time. I tend to set it to 0.6 or 0.7, something like that.
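A minimal sketch of that step, using the langchain 0.0.x-era API shown in the video (newer releases have reorganized these imports):

```python
from langchain.llms import OpenAI

# temperature close to 0 -> safe, predictable answers; close to 1 -> more creative, more risk
llm = OpenAI(temperature=0.7)

name = llm("I want to open a restaurant for Indian food. Suggest a fancy name for this.")
print(name)
```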
Now, to that LLM you can pass any question. Say I want to open a restaurant for Indian food and I want a fancy name for it; I'm not able to come up with a restaurant name idea myself, so let's see what the model does. I typed the same question into ChatGPT too: "I want to open a restaurant for Mexican food" gave me one answer, and Indian food gives something else, so we are using essentially the same concept here. Here it suggests something like "Maharaja Palace Cuisine", and if you say Italian food, the name it produces sounds real, as if it were an actual Italian restaurant. So we imported that OpenAI class, which created an LLM, and we are just passing it a simple text prompt. Now, I don't want to keep changing this same string by hand, so I will go ahead and create something called a prompt template.
From langchain.prompts you can import PromptTemplate. In the prompt template you pass the input variables (here the input variable will be "cuisine") and the template string that you want; I'll just copy-paste it here. All we are doing is replacing "Italian" and so on with the {cuisine} variable. I'm calling this template prompt_template_name, since it is for the restaurant name. Once the template is created, you can call prompt_template_name.format() and pass cuisine as, say, "Mexican", and you get "I want to open a restaurant for Mexican food..."; if you pass Italian, it will say Italian food.
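Roughly, the prompt template step looks like this (same era of the LangChain API):

```python
from langchain.prompts import PromptTemplate

prompt_template_name = PromptTemplate(
    input_variables=["cuisine"],
    template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this.",
)

print(prompt_template_name.format(cuisine="Mexican"))
# -> "I want to open a restaurant for Mexican food. Suggest a fancy name for this."
```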
This is more like Python string formatting, and you might wonder why we don't just use Python string formatting. Let me show you why, using something called a chain. We are going to use this concept of a chain in LangChain, and it is one of the most important objects in the framework; you can figure that out from the name of the framework itself. We import LLMChain, and an LLMChain is essentially a very simple object where you say: my llm is this (whatever we created above), and my prompt is this prompt template; that is my chain. Then you can call chain.run, say for an American restaurant, and you get "The All American Grill and Bar". So now I don't have to pass the whole "I want to open a restaurant for..." sentence; I just pass the cuisine value and it works every time; try Mexican and you'll see.
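Putting it together, the simple chain looks roughly like this:

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.7)
prompt_template_name = PromptTemplate(
    input_variables=["cuisine"],
    template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this.",
)

chain = LLMChain(llm=llm, prompt=prompt_template_name)
print(chain.run("American"))   # e.g. "The All American Grill and Bar"
```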
Internally it is calling the OpenAI API; we made that connection via this module. If you were using Hugging Face instead, you would do the Hugging Face setup and it would call Hugging Face. Here we are paying that API cost, but when you created your OpenAI account you got $5 of free credit, so you should be okay; $5 is more than enough for initial learning and exploration, and after that, if you like it, you can go ahead and pay. So that is the simple chain we've got here.
Now let's look at something called a sequential chain; let me explain the concept first. So far we have a chain that generates a restaurant name. But say that for that restaurant you also want to generate food menu items. You can have a second component, a second chain, where you pass the restaurant name as input and it gives you the menu items to include. If it is an Indian restaurant it will suggest Indian dishes; if it is Mexican it will say quesadillas, burritos, things like that. This setup is called a simple sequential chain, and we'll code it up in a moment. Just to clarify the idea: you have one input and one output, and you can have intermediate steps where the input of the second step is the output of the first step. It is as easy as that.
Here, once again, I'm building everything from scratch. I have created the name chain the same way we did before; it is an exact copy-paste of the previous code, nothing fancy. Then we create another chain (again copy-pasting to save time) where the input is the restaurant name and the prompt says "suggest some food menu items for restaurant {restaurant_name}". It is like the conversation you would have with ChatGPT: you say "I want to open a restaurant for Indian food, suggest a fancy name for this, only one name please"; ChatGPT generates a name; then you say "suggest some food menu items for that restaurant and return it as a comma separated list", and it generates the list. So now you have two chains, and we hit Ctrl+Enter to execute this code. Then, from langchain.chains (it will show you all kinds of chains), you import SimpleSequentialChain, and that simple sequential chain will contain the individual chains we created. By the way, the order matters here: first the restaurant name chain, then the food items chain. That's it; now you say chain.run for, say, Indian food, you get a response, and you print it. Sometimes it takes a little time, so you may have to wait a few seconds, but it will generate the menu items: lamb korma, saag paneer, and so on; yummy, your mouth is probably watering. Let's run Mexican too, for all the Mexican food lovers.
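A sketch of the two chains wired into a SimpleSequentialChain (same era of the API; the prompts are paraphrased from the video):

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.7)

# Chain 1: cuisine -> restaurant name
name_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["cuisine"],
    template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this. Only one name please.",
))

# Chain 2: restaurant name -> menu items
food_items_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["restaurant_name"],
    template="Suggest some menu items for {restaurant_name}. Return it as a comma separated list.",
))

# One input, one output; the order of the chains matters.
chain = SimpleSequentialChain(chains=[name_chain, food_items_chain])
print(chain.run("Indian"))   # only the final output (the menu items) is returned
```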
While this chain looks good and is generating the food menu items, I'm not getting the restaurant name as such: for Mexican food it lists all the items, but what is the restaurant name? That was the intermediate step, and a simple sequential chain gives you just the one final output. I want the restaurant name and the menu items both. For that we have to use a different chain called SequentialChain, which can have multiple inputs and multiple outputs. So I can say, for example, "give me a name for an Indian restaurant which is vegan", and in the output I can ask for both the restaurant name and the menu items.
All right, let's try that. The whole code mostly remains the same; I'm just adding one or two extra things. This is my first chain, and the extra thing I have added is the output key: the output of the first chain is restaurant_name. The second chain looks similar, with output key menu_items. Now let's create the sequential chain: from langchain.chains import SequentialChain (we already used SimpleSequentialChain; this one is a little more generic). What parameters does SequentialChain take? First, chains, which are the two we have; then input_variables, where you can specify all the input variables (I'm not including the vegan part, let's keep things simple); and in output_variables I say that I want restaurant_name and menu_items as my outputs. Let's call this "chain". By the way, when you run this chain you can't just pass "Mexican", because you might have multiple input variables; you need to give it a dictionary, like cuisine: Arabic. Also, .run is not supported here; you just call the chain itself and pass the argument in brackets. And see: hummus with pita bread, falafel, and the name of the restaurant is "The Arabian Bistro". In fact it returns the input as well, so you get the input and both outputs.
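A sketch of the SequentialChain version with named output keys:

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain

llm = OpenAI(temperature=0.7)

name_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["cuisine"],
        template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this.",
    ),
    output_key="restaurant_name",
)

food_items_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["restaurant_name"],
        template="Suggest some menu items for {restaurant_name}. Return it as a comma separated list.",
    ),
    output_key="menu_items",
)

chain = SequentialChain(
    chains=[name_chain, food_items_chain],
    input_variables=["cuisine"],
    output_variables=["restaurant_name", "menu_items"],
)

# SequentialChain is called with a dict of inputs rather than .run()
print(chain({"cuisine": "Arabic"}))
# -> {'cuisine': 'Arabic', 'restaurant_name': '...', 'menu_items': '...'}
```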
Now we will take the code we have written so far and create a Streamlit-based application for the restaurant name generator. I'm in my code directory, where I have created an empty folder called restaurant_name_generator; you can see there are no files in it. I'm going to launch PyCharm, which is a free Python code editor; select Open, go to the code directory, locate restaurant_name_generator, and hit OK. It creates an empty main.py file; I remove its content and start by importing the streamlit library. If you don't know about Streamlit, it is a library that allows data scientists to build proof-of-concept applications, simple applications, very quickly; you don't have to use front-end frameworks such as React, and this library lets you do all of these things very fast. Let me show you: you can create a simple application with just a title, "Restaurant Name Generator". By the way, you have to run pip install streamlit before you start using it, otherwise you'll get an error, so make sure you have run that. This is the simple app with one title; now I can go to the terminal and run "streamlit run main.py", and it opens the application in my browser: a simple app with that title.
Now I can create the picker where you choose the cuisine, and for that I'll use the sidebar. In Streamlit there is something called the sidebar, where you can create a select box and give it a label; so you say "Pick a Cuisine" and give all the options you want in that dropdown. I'll put in a bunch of cuisines: Indian, American, Mexican, and so on. Let me show you how this looks: hit Ctrl+S to save, then click Rerun (or just press the R key), and you get this nice picker. If someone picks an entry, say Mexican, that call returns the selected value, which we store in the variable cuisine. Then you can say "if cuisine:", do something. What do I want to do? I want to generate the fancy restaurant name and the list of menu items here.
For that, let me first write some dummy code. I will call a function, say get_restaurant_name_and_items, supplying the cuisine as input, and it returns, say, a restaurant name. It is always a good idea to write this kind of stub (empty) function so you can check your wiring first and write the actual code in the function later. So say my restaurant name is "Curry Delight", along with some hard-coded menu items, and we get those back as a response. From the response I can show the restaurant name on the right-hand side, somewhere below this header; I will use st.header as the UI control for it. Then for the menu items: the response contains a comma-separated string, and whenever you have a comma-separated string you can call the split function with a comma as the separator, which returns a list. Once you have that list you can iterate over it and write out each item, maybe with a character in front just to indicate it is a list item. You can also write a small header above it that says "Menu Items". Hit save; this code is ready; go back to your UI, rerun, and you can see you're getting the restaurant name and the menu items.
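A sketch of what main.py ends up looking like. The function and option names follow the description above and may differ slightly from the actual repository linked in the video description.

```python
# main.py -- run with: streamlit run main.py
import streamlit as st
import langchain_helper   # our helper module; while wiring things up it can return a hard-coded stub

st.title("Restaurant Name Generator")

cuisine = st.sidebar.selectbox(
    "Pick a Cuisine", ("Indian", "Italian", "Mexican", "Arabic", "American")
)

if cuisine:
    response = langchain_helper.generate_restaurant_name_and_items(cuisine)

    st.header(response["restaurant_name"])

    st.write("**Menu Items**")
    for item in response["menu_items"].split(","):
        st.write("-", item)
```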
Now, when you change the selection, the output is not going to change, because we are obviously returning a hard-coded response. The next step is to take the code we wrote in our Jupyter notebook and put it here. Since I like to modularize my code, I'm going to create a new Python file; let's call it langchain_helper. I will put that code in this file and simply call it from main.py. Now let's focus on langchain_helper. What do we need to do here? The same thing we did in our notebook, so I'm just going to copy-paste some code from my notebook; we don't need to go over it again because we have already written it. I will also create a file for my secret key; I'll call it secret_key, and in that file I place my OpenAI secret key. I'm not going to show you my key, of course, because it's private, but you will type in whatever key you got (remember, you have $5 of credit, which is more than enough to do a lot of things). Then you use that key by importing the variable from that Python file directly. So my key is ready; what else do we need? Again, copy-paste, folks; copy-paste is a boon for any programmer or data scientist. We copy the code for the sequential chain that we wrote in the notebook: here we create the restaurant name chain, here we create the menu items chain, and we just return the response. Folks, this is so straightforward.
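And a sketch of langchain_helper.py, reusing the SequentialChain from earlier; secret_key and openai_key are placeholder names for however you store the key.

```python
# langchain_helper.py -- the notebook's SequentialChain wrapped in a function.
import os
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain

from secret_key import openai_key            # placeholder for your own key storage
os.environ["OPENAI_API_KEY"] = openai_key

llm = OpenAI(temperature=0.7)

def generate_restaurant_name_and_items(cuisine):
    name_chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate(
            input_variables=["cuisine"],
            template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this.",
        ),
        output_key="restaurant_name",
    )
    food_items_chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate(
            input_variables=["restaurant_name"],
            template="Suggest some menu items for {restaurant_name}. Return it as a comma separated list.",
        ),
        output_key="menu_items",
    )
    chain = SequentialChain(
        chains=[name_chain, food_items_chain],
        input_variables=["cuisine"],
        output_variables=["restaurant_name", "menu_items"],
    )
    return chain({"cuisine": cuisine})
```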
I also have this habit of adding a main guard just so that I can test the module directly, so I will say if __name__ == "__main__": and print the result of generating a restaurant name and items for, say, Italian. Now I'll pause the video and put my real secret key in place. All right, my secret key is in, so let's run it and see what happens: we are generating an Italian restaurant name and menu items. Perfect; the restaurant name is some Italian-sounding name, and the menu items are margherita pizza, fettuccine alfredo, lasagna, and all that. One thing I'm noticing is some extra "\n" characters in the output, so we should remove them. The way to do that is to go back to our Streamlit code and, instead of using the restaurant name from the response directly, call the strip function on it, which removes leading and trailing whitespace, including those "\n" characters; use the same thing on the menu items string as well, before calling split. Hit Ctrl+S to save, go back, rerun, and see: Italian food, this is my restaurant name, these are my menu items. Change it to Mexican, change it to whatever, just play with it. And folks, the art of getting skillful at coding is practice; just by watching this video you're not going to learn it, so make sure you're practicing while you watch. All right, our Streamlit application is ready.
As a next step, we are going to look into something called agents, which is a very powerful concept in LangChain (and by the way, for all the code we are writing, check the video description; we will give you all of it). So, agents. What happens when you type this into ChatGPT: "give me two flight options from New York to Delhi on a given date"? Obviously it won't be able to answer, because its knowledge only goes up to September 2021. But if you have a ChatGPT Plus subscription, there is this thing called plugins, and I have installed some of them, especially the Expedia plugin; Expedia is a website that helps you find tickets. When I ask the same question with the plugin enabled, magically it starts working: it goes to the Expedia plugin, pulls the flight information for the given date, source, and destination, and then starts typing out the two options. Option one is, I think, a $518 ticket, which is a pretty good deal by the way; I think I should book it for my next India trip. There is a second option too, and it gives you a link where you can go and book those tickets on Expedia.
So what exactly happened when we enabled this plugin? Let's try to understand. When people think about an LLM, many think it is just a knowledge engine: it has knowledge and tries to give answers based on that knowledge, and that knowledge is limited to September 2021. The thing we miss is that it also has a reasoning component; it is a reasoning engine too. Using that reasoning engine it can figure out what to do when someone types this kind of question. Think about it as a human: if a friend asked you this question and you were the reasoning engine, you would go to Expedia, put New York as the source, Delhi as the destination, and 1st August as the date. You can do that because you have a reasoning engine in your brain. Similarly, the LLM has a reasoning engine with which it can work out from the sentence that the source is this, the destination is that, the date is this, and it will call the Expedia plugin, which returns the response. Let's look at another question: "When was Elon Musk born, and what is his age now, in 2023?" Maybe this one can be answered from the LLM's own knowledge, but suppose you ask about an event that happened in 2022. The model doesn't have knowledge after September 2021, but once again it has reasoning capability, so it will say: in order to answer that question, first I need to find out when Elon Musk was born, and for that it can use something like Wikipedia. Agents essentially do this.
An agent has tools, and using a tool it will try to fetch the answer: Elon Musk was born in 1971. Then there can be another tool that works out 2023 minus 1971; there is a math tool it can use to compute that, and in the end it will say Elon Musk is 52 years old. So this is what agents are: agents connect with external tools and use the LLM's reasoning capability to perform a given task. Let's look at a different question: how much was US GDP in 2022, plus five? It's a silly operation, no one cares, but the LLM doesn't know the US GDP for 2022, because its knowledge stops in 2021. So it will go to Google, find that answer, and then use the math tool to add five. All of these tools, the Google Search tool, the math tool, the Wikipedia tool, are available as part of LangChain, and you can configure your agent with them. Your agent is nothing but these tools plus the LLM's reasoning capability, used together to perform a given task. This agent can be used in our Jupyter notebook, and that's what I'm going to show you next.
what I'm going to show you next so let's first import couple of uh important
first import couple of uh important modules and
modules and classes and once I have imported them I
classes and once I have imported them I will will create tools so I will say
will will create tools so I will say load
load tools and I will give list of tools now
tools and I will give list of tools now if you do Google search On Tools here so
if you do Google search On Tools here so let's say if you do Google search L
let's say if you do Google search L chain agent load tools you'll come here
chain agent load tools you'll come here you will see list of tools see I have
you will see list of tools see I have Wikipedia as a tool I have twio I have
Wikipedia as a tool I have twio I have all these tools that I can use so we're
all these tools that I can use so we're going to use Wikipedia tool here is
going to use Wikipedia tool here is called Wikipedia
called Wikipedia Wikipedia and the math tool is called
Wikipedia and the math tool is called llm math and here you need to provide
llm math and here you need to provide the llm variable is the one which we
the llm variable is the one which we created above somewhere here see this is
created above somewhere here see this is the variable okay so this thing is
the variable okay so this thing is called
called tools and then you can create an agent
tools and then you can create an agent using this initialize agent method okay
using this initialize agent method okay so initialization method will
so initialization method will take two tools it will take llm and it
take two tools it will take llm and it will take agent and in the agent I will
will take agent and in the agent I will give
give this zero now see hold on zero short
this zero now see hold on zero short react uh description react means uh
react uh description react means uh thought and action so when we are
thought and action so when we are reasoning we first have a thought then
reasoning we first have a thought then we figure out where to go and we take an
we figure out where to go and we take an action so it mimics that particular
action so it mimics that particular concept here I will call this an agent
concept here I will call this an agent and and then I will ask the question
and and then I will ask the question agent. run when was Elon mus born and
agent. run when was Elon mus born and what is his AG in
what is his AG in 2023 so let's see what this gives us see
2023 so let's see what this gives us see perfect it says 52 year old in 2023 if
perfect it says 52 year old in 2023 if you want to go uh step by step in the
you want to go uh step by step in the reasoning process you can say verbos is
reasoning process you can say verbos is equal to True uh and it will tell you by
equal to True uh and it will tell you by the way verbos is equal to True is the
the way verbos is equal to True is the variable that you can use here in any
variable that you can use here in any function to kind of figure out the
function to kind of figure out the internal steps that it is taking so here
internal steps that it is taking so here the first step when it Encounters this
the first step when it Encounters this question it knows actually that it has
question it knows actually that it has to go to Wikipedia to get the birth date
to go to Wikipedia to get the birth date of Elon Musk so it went to Elon Musk
of Elon Musk so it went to Elon Musk Wikipedia page and which will have this
Wikipedia page and which will have this particular date here and then um I think
particular date here and then um I think it uses the Matt tool sometimes I don't
it uses the Matt tool sometimes I don't know it should have used the mat tool
again okay I don't know why this is not working but previously I was seeing that
working but previously I was seeing that let me show you a previous snapshot that
let me show you a previous snapshot that I have um here it say okay it went to
I have um here it say okay it went to Wikipedia for Elon mus birth dat and
Wikipedia for Elon mus birth dat and then it use C action as a calculator so
then it use C action as a calculator so there it is using the llm math tool and
there it is using the llm math tool and it is just calculating the final answer
it is just calculating the final answer it is saying it is
it is saying it is 52 year old let's try a different option
52 year old let's try a different option so this time we are going to use U Sur
so this time we are going to use U Sur API so if you don't know about Sur API
API so if you don't know about Sur API it is Google search API whatever you do
it is Google search API whatever you do uh in Google and what results uh it
uh in Google and what results uh it gives you if you want to access those
gives you if you want to access those results programmatically you can use
results programmatically you can use this particular API you can log in using
this particular API you can log in using your Gmail account I have already logged
your Gmail account I have already logged in and when you go to dashboard
in and when you go to dashboard it will give you this API key so this is
it will give you this API key so this is similar to our open AI API key it will
similar to our open AI API key it will be a big big string I have stored that
be a big big string I have stored that API key into my private file that secret
API key into my private file that secret key file that I have and I'm going to
key file that I have and I'm going to initialize some environment variable so
initialize some environment variable so for Ser API you need to initialize this
for Ser API you need to initialize this variable and this is the key which I got
variable and this is the key which I got from there okay so you can just copy
from there okay so you can just copy paste this key if you want to keep
paste this key if you want to keep things simple I don't want to show that
things simple I don't want to show that key publicly here that's why I I have
key publicly here that's why I I have this thing here and once I have this
this thing here and once I have this thing the next steps are kind of similar
thing the next steps are kind of similar so I can just copy paste pretty much
so I can just copy paste pretty much everything here and I will just say Okay
everything here and I will just say Okay initialize the agent sir PPA and llm
initialize the agent sir PPA and llm math are the two tools I'm going to use
math are the two tools I'm going to use and in my agent I will say agent.
and in my agent I will say agent. run and what was the US GDP in
run and what was the US GDP in 2022 and plus + 5 okay and while it is
2022 and plus + 5 okay and while it is executing it let's do Google here so
executing it let's do Google here so when you do Google US GDP 2022 it will
when you do Google US GDP 2022 it will tell you
tell you 25.46 so this Sur API this API will do
25.46 so this Sur API this API will do Google search and it will tell you the
Google search and it will tell you the answer that it is this so check this so
answer that it is this so check this so first it is searching US GDP this and
first it is searching US GDP this and then it is adding F number to this and
then it is adding F number to this and it is giving you uh this particular
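The SerpAPI variant is almost identical; here is a rough sketch, where the key value is a placeholder and the google-search-results package is assumed to be installed for the serpapi tool.

```python
# Sketch of the SerpAPI-based agent; the key below is a placeholder.
import os
os.environ["SERPAPI_API_KEY"] = "your-serpapi-key"   # or load it from your secret key file

from langchain.llms import OpenAI
from langchain.agents import AgentType, initialize_agent, load_tools

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("What was the US GDP in 2022, plus 5?")
```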
We'll talk more about agents in our future videos. One thing I have noticed is that agents are not perfect; sometimes they give silly answers. This whole area is evolving, so it will get better in the future, but for agents this is what we have.
Now we will talk about memory. When you look at any chatbot application such as ChatGPT, you will notice that it remembers the past conversation exchanges. Here I asked who won the first Cricket World Cup, then a totally unrelated question, what is 5 + 5, and then I asked who was the captain of the winning team. Notice that I did not say which match or which game, cricket, football, etc., but it remembers that I'm talking about cricket and gives me a relevant answer. The same thing happens in human conversation: we start a topic, we keep on saying things, but we remember what the topic is about. If you look at the LLMChain, by default these chains do not have memory; they are stateless. If you look at the available attributes that the chain has, you'll find an element called memory, and if you try to print it you don't get anything, because the object is set to None. If you want, you can attach memory to the chain: you have to create an additional memory object and attach it so that it remembers all these conversations.
This is useful especially if you're building a chatbot, let's say for your customer care department; many times they need to save the transcripts of those conversations for legal and compliance reasons. So here I'm going to import a class called ConversationBufferMemory, which is a very common type of memory in the LangChain memory module, and create an object of that class. I will just say this is a ConversationBufferMemory, and then I will create the same chain, but pass memory as an additional argument, and then run the chain one more time for a different question. Now when you look at chain.memory, you can see there is a memory attached to it; we explicitly attached this conversation memory. If you look at the buffer and print it with nice alignment, you see Human, then AI, Human, then AI. This looks good: you can save it into your database as a stored transcript of your customer service center conversation.
transcript of your customer service center conversation uh but one problem
center conversation uh but one problem with this particular object which is
with this particular object which is conversation buffer memories that it
conversation buffer memories that it will keep on growing endlessly so let's
will keep on growing endlessly so let's say you have 100 conversational
say you have 100 conversational exchanges and by that what I mean is one
exchanges and by that what I mean is one question answer pair so one question
question answer pair so one question answer pair is one conversational
answer pair is one conversational exchange this is second so in total this
exchange this is second so in total this is two conversional exchange so here if
is two conversional exchange so here if you have 100 conversational exchange
you have 100 conversational exchange what's going to happen is next time when
what's going to happen is next time when you ask a question to open AI when you
you ask a question to open AI when you say chain run it is going to send all
say chain run it is going to send all this past history to open Ai and open AI
this past history to open Ai and open AI charges you per token so this is one
charges you per token so this is one token second token third four and so on
token second token third four and so on for th000 token they charge like2
for th000 token they charge like2 something based on on the model so your
something based on on the model so your cost is going to go up so if you want to
cost is going to go up so if you want to save the cost and kind of do things in
save the cost and kind of do things in an optimized way uh you need to restrict
an optimized way uh you need to restrict this buffer size you can say just
this buffer size you can say just remember last five conversational
remember last five conversational exchanges okay and this thing can be
exchanges okay and this thing can be done using something called converation
done using something called converation chain so open AI Lang chain provides
chain so open AI Lang chain provides this conversation chain which is just
this conversation chain which is just very simple object so let me just create
very simple object so let me just create that
that conversation chain where you can just
conversation chain where you can just pass
pass llm is equal to open
llm is equal to open AI temperature is equal to
AI temperature is equal to 0.7 and I'll just call it
0.7 and I'll just call it convo and let's check the default prompt
convo and let's check the default prompt that is associated with it see default
that is associated with it see default promp is this uh let me just check the
promp is this uh let me just check the template let me just print the template
template let me just print the template associated with this this is the default
associated with this this is the default template that
template that comes and it says that the following is
comes and it says that the following is a friendly conversation between human
a friendly conversation between human and Ai and there is history and there is
and Ai and there is history and there is input so if you look at this uh
input so if you look at this uh conversation window here there is a
conversation window here there is a history this is a history and the next
history this is a history and the next question you're going to type is the
question you're going to type is the input so input history same way input
input so input history same way input history
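A minimal sketch of that cell, again assuming the classic API used in this course:

```python
# Sketch of creating a ConversationChain and inspecting its default prompt.
from langchain.llms import OpenAI
from langchain.chains import ConversationChain

convo = ConversationChain(llm=OpenAI(temperature=0.7))

# Prints: "The following is a friendly conversation between a human and an AI..."
# with {history} and {input} placeholders.
print(convo.prompt.template)
```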
Now I'm going to ask a couple of questions through this, so I'll just copy paste them here: what is 5 + 5, and then who was the captain of the winning team. When you do convo.memory it won't be empty, because by default the ConversationChain has memory associated with it; this ConversationChain object comes with an inbuilt ConversationBufferMemory. And if you print the buffer, you will see the entire transcript of our conversation. While this looks good, once again think about the OpenAI token cost: if the buffer keeps building endlessly, you might have 5,000 tokens in one conversation, and when you make the next call it is going to send the entire history, the entire conversation, to OpenAI, and that will increase your bill for the API calls. To tackle that problem, you can say: just send the last 10 or 20 conversational exchanges, because that's what I care about; that might be enough based on the use case you're dealing with.
based on the use case that you're dealing with uh and for that there is an
dealing with uh and for that there is an object
object call from Lang chain. memory
call from Lang chain. memory import there is a conversational bu for
import there is a conversational bu for window memory you are restricting the
window memory you are restricting the window you are saying let's
window you are saying let's say my window and K is the parameter
say my window and K is the parameter you're saying just remember only last
you're saying just remember only last one conversational exchange which is one
one conversational exchange which is one question answer pair okay so let's try
question answer pair okay so let's try this out and let's see uh how this goes
this out and let's see uh how this goes so I'm going to copy paste some code
so I'm going to copy paste some code here created a same conversational chain
here created a same conversational chain object asking my first question asking
object asking my first question asking my second question now when I ask this
my second question now when I ask this second question here what is 5 + 5 it
second question here what is 5 + 5 it remembers the previous exchange but when
remembers the previous exchange but when I I asked the third question here it
I I asked the third question here it remembers only what is 5 + 5 it doesn't
remembers only what is 5 + 5 it doesn't remember this it's like a shortterm
remember this it's like a shortterm memory lost like momento or gz movie it
memory lost like momento or gz movie it just forgot what happened here so when I
just forgot what happened here so when I ask this question it will say I'm sorry
ask this question it will say I'm sorry I don't know because it doesn't know
I don't know because it doesn't know which game you're talking about which
which game you're talking about which particular match you're talking about
particular match you're talking about okay so here I know it's probably not
okay so here I know it's probably not the best example uh but the idea is I
the best example uh but the idea is I wanted to demonstrate this K parameter
wanted to demonstrate this K parameter here and Bas on this use case uh you
here and Bas on this use case uh you might see benefit in using
might see benefit in using conversational buffer window memory that
conversational buffer window memory that is all we had in terms of Lang chain
is all we had in terms of Lang chain fundamentals Now using these
fundamentals Now using these fundamentals which you just learned now
fundamentals which you just learned now we are going to build build two end to
we are going to build build two end to end projects first project is in finance
end projects first project is in finance domain second project is in retail
domain second project is in retail domain I previously published these two
domain I previously published these two projects on YouTube and I'm going to use
projects on YouTube and I'm going to use the same two videos because these are
the same two videos because these are very high quality end to endend projects
very high quality end to endend projects Below in the video description the link
Below in the video description the link for the code is also available so let's
for the code is also available so let's start with the first project
Today we will build an end-to-end LLM project that covers a real-life industry use case: equity research analysis. We will build a news research tool where you can give a bunch of news article URLs, and then when you ask a question it will retrieve the answer based on those news articles. In terms of technology we have used LangChain, OpenAI, and Streamlit. To make this project more interesting we have added some fun storytelling as well, so let's take a look at that story first.
What if Rocky lived in the ChatGPT era? How would he invest all his money? Would he use ChatGPT to find the best investments? No way, he would hire someone for that. Rocky Bhai's recruitment team got Peter Pandey, the equity research analyst. Peter read lengthy stock market articles for his research, but Rocky Bhai did not like it. Rocky said... Peter promised to create a chatbot like ChatGPT for his investments. Rocky Bhai liked Peter's grit and said, fasten your seat belt. So get ready folks, we are going to create a chatbot for Rocky Bhai, perhaps the Rocky bot.
Equity research analysts such as Peter Pandey in our Rocky Bhai story do exist in real life. Let me give the example of a mutual fund. You might know about all these mutual funds where you can invest your money. All these three yellow-colored people are the common people like us who are investing their money in the mutual fund, and the mutual fund will eventually invest in individual stocks. Now they need to pick the right stocks, for which they might have a team of research analysts, and the job of this team is to provide research on these companies, let's say Tata Motors and Reliance: how these companies are doing, what their profits are going to be next year, how their management is, whether this is a good stock to buy. They do all this research, and in this research team every individual person might cover a couple of stocks. Let's say Peter Pandey is working for HDFC mutual fund; he might be looking at Tata Motors and Reliance, and his job is to do research on those stocks. Daily he comes to his job and reads a bunch of articles from Moneycontrol or Economic Times, or maybe he has access to a premium product such as the Bloomberg terminal, and he does all his research based on the news articles, the earnings reports, the quarterly financial reports, and so on. Now you can understand that reading news articles from these various websites is a tedious task; there are so many articles, so much information to consume.
there are so many articles so much information to consume see here I'm
information to consume see here I'm showing a pnl of tataa Motors why don't
showing a pnl of tataa Motors why don't we build a tool which looks like this
we build a tool which looks like this where you can put bunch of news article
where you can put bunch of news article on left hand side and I'm showing just
on left hand side and I'm showing just three you can have n number of Articles
three you can have n number of Articles and then when you ask a question okay so
and then when you ask a question okay so see I'm showing all these articles and
see I'm showing all these articles and these are like different articles on
these are like different articles on moneycontrol.com and when you post this
moneycontrol.com and when you post this question it will retrieve the answer
question it will retrieve the answer 6.55 to 8.1 lakh that was the answer and
6.55 to 8.1 lakh that was the answer and it pulled that from this particular
it pulled that from this particular article see and the article link is in
article see and the article link is in the below okay and you can also say okay
the below okay and you can also say okay give give me a summary I mean it's not
give give me a summary I mean it's not it doesn't have to be the number the
it doesn't have to be the number the answer doesn't have to be only one
answer doesn't have to be only one number it can also summarize the entire
number it can also summarize the entire article okay I know about all this
article okay I know about all this because when I was working with
because when I was working with Bloomberg for 12 years uh in Bloomberg
Bloomberg for 12 years uh in Bloomberg terminal we used to get research reports
terminal we used to get research reports from Jeff open Hammer all these
from Jeff open Hammer all these different companies and we would process
different companies and we would process that data and show that data on the
that data and show that data on the terminal all right so I hope you have
terminal all right so I hope you have some understanding of the industry use
some understanding of the industry use case this is a real industry use case
case this is a real industry use case this tool can be used by companies such
this tool can be used by companies such as Jeffrey's open Hammer right folks so
as Jeffrey's open Hammer right folks so this is not some toy toy project let's
Let's think about the technical architecture now. We need to go back to basics in order to build the technical architecture. Whenever you talk about building any LLM app, the first thing that comes to mind is: can I use ChatGPT for this, since ChatGPT is free? Well, actually you can. You type your question in ChatGPT and you say, answer this question based on the article below, do not make things up, and then from that news website you copy paste the article. ChatGPT has that capability, see, EPS is 8.35; it can pull the answer from the given text. So the question is, why do I need to build this tool, why can't I just use ChatGPT? There are three issues with this approach. Number one, copy pasting articles is tedious; equity research analysts are busy folks, they don't have time to go to a website, copy paste, and then get the answer. Second, they also need an aggregate knowledge base, because when they ask a question they don't know where the answer might be. They might have a question like how many Tata Nanos Tata Motors sold in the last quarter; the answer might be in any article, so how do they know which article to pull? Also, some answers might be spread over three or four different articles, so they need some kind of aggregate knowledge base, and ChatGPT can't give that. The third issue is ChatGPT's word limit: you can't copy paste a huge article into ChatGPT, it has a limit on the number of words you can supply. So we need to build a tool that can go to the news websites which our equity research analyst trusts and pull all those articles into some kind of knowledge base.
So here, the database that I'm showing is that kind of knowledge base, and you can build a ChatGPT-like chatbot which pulls data from it. Now let's think about this particular article on Nvidia. Say I have this question: what was Nvidia's operating margin compared to other companies in the semiconductor industry, and give me the answer based on the following article. When I give it the entire article, it answers in a perfectly fine, expected manner. But think about it: we are building a tool here, we are not using ChatGPT, so behind the scenes we will be calling the OpenAI API, and whenever you call the OpenAI API there is a cost associated with it per thousand tokens. If you want to think about tokens in simple layman language, you can roughly treat a token as a word. So for the amount of text you supply there is a cost, and if you supply more text there is more cost. But read the question carefully folks, this is very interesting: the answer to this question is actually in the first paragraph. We don't have to supply the second paragraph, it is not necessary, because see, 17.37%, that's the answer, and therefore you don't need to supply the second paragraph. So is there a way we can smartly figure out that for this question we only need to give this much text? If you do that, you will save a lot of money on your OpenAI bill. So just think about this article as two different paragraphs, and based on the question you can figure out which paragraph to supply in your prompt.
paragraph to supply in your prompt thinking about this in a generic way you
thinking about this in a generic way you might have bunch of Articles let's say
might have bunch of Articles let's say on Nvidia and when you are asking a
on Nvidia and when you are asking a question what's the price of
question what's the price of h00
h00 GPU you want to figure out a relevant
GPU you want to figure out a relevant chunks so let's say the relevant chunks
chunks so let's say the relevant chunks where h00 GPU priz is mentioned are
where h00 GPU priz is mentioned are chunk 4 and chunk two in this case when
chunk 4 and chunk two in this case when you are building a prompt you don't need
you are building a prompt you don't need to give all the chunks one to n into
to give all the chunks one to n into your prompt you can just give chunk two
your prompt you can just give chunk two and chunk four chunk is just a block of
and chunk four chunk is just a block of text which is relevant where the answer
text which is relevant where the answer might be present for your given question
might be present for your given question okay and when you do that it will give
okay and when you do that it will give you the fin answer so the question now
you the fin answer so the question now comes is how do I find relevant chunks
comes is how do I find relevant chunks you can't use direct keyword search I
you can't use direct keyword search I can't say Okay h100 GPU is like contrl f
can't say Okay h100 GPU is like contrl f try to look into all the chunks and
try to look into all the chunks and wherever h100 gpus is pris give me those
wherever h100 gpus is pris give me those chunks okay h100 GPU is probably simple
chunks okay h100 GPU is probably simple example but look at this example when I
example but look at this example when I go to Google and say calories in apple
go to Google and say calories in apple versus revenue of Apple it knows that
versus revenue of Apple it knows that the first one is a fruit and the second
the first one is a fruit and the second one is a company how does it know that
one is a company how does it know that well it uses a concept of semantic
well it uses a concept of semantic search it looks at the context you know
search it looks at the context you know we as a human when I say calorie I I
we as a human when I say calorie I I kind of figure out it's a fruit and
kind of figure out it's a fruit and revenue is of Apple is company similarly
revenue is of Apple is company similarly in any NLP application if you're using
in any NLP application if you're using semantic search it can figure out based
semantic search it can figure out based on the context what is the meaning of
on the context what is the meaning of this word apple is it a fruit or is it a
this word apple is it a fruit or is it a company and we use something called word
company and we use something called word embedding and or sentence embedding and
embedding and or sentence embedding and a vector database for this I have given
a vector database for this I have given a separate video for this so because
a separate video for this so because this explanation might take more time so
this explanation might take more time so I don't want to uh spend time explaining
I don't want to uh spend time explaining all these Concepts if you know this
all these Concepts if you know this already then fine you can move ahead
already then fine you can move ahead otherwise the link of this video is in
otherwise the link of this video is in description below so you can pause the
description below so you can pause the video and watch that one first but let's
video and watch that one first but let's say just for
say just for Simplicity um embeddings and Vector
Simplicity um embeddings and Vector databases allow you to figure out so
databases allow you to figure out so embeddings will allow you to figure
embeddings will allow you to figure figure out a relevant chunk and Vector
figure out a relevant chunk and Vector databases will allow you to kind of
databases will allow you to kind of perform a faster search on that database
perform a faster search on that database and then you can give your prompt to
and then you can give your prompt to open a and get the answer so now
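Just to make that idea concrete, here is a toy sketch of embedding-based chunk selection; the two chunks and the price inside them are invented purely for illustration, and the real project uses a vector database instead of this brute-force loop.

```python
# Toy illustration of picking the most relevant chunk with embeddings.
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

chunks = [
    "Nvidia reported record data center revenue this quarter.",
    "The H100 GPU is priced at roughly $30,000 per unit.",   # invented text for the demo
]
question = "What is the price of the H100 GPU?"

emb = OpenAIEmbeddings()
chunk_vectors = emb.embed_documents(chunks)
question_vector = emb.embed_query(question)

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(question_vector, v) for v in chunk_vectors]
print(chunks[int(np.argmax(scores))])   # the chunk closest in meaning to the question
```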
So now, thinking about the technical architecture: the first component will be some kind of document loader, where you take all your news articles and load them into that object; the second step is splitting that text into multiple chunks and storing them in a vector database, so that when you have a question, say what is the price of the H100 GPU, you can go to the vector database and retrieve the relevant chunks, which are chunk 2 and chunk 4. The vector database allows you to perform that faster search, because it might hold millions and millions of records; the way vector databases are designed, they help you search quickly. Once you have the relevant chunks for your question, you put them into your prompt and you get your answer. In terms of LangChain, we will be using all the classes that I have shown in orange, and we will be building our application with them.
color uh and we will be building our application now for a short term phase
application now for a short term phase one we are building this particular tool
one we are building this particular tool in streamlet when you are doing this
in streamlet when you are doing this project in the industry let's say you
project in the industry let's say you working as a data scientist for Jeff
working as a data scientist for Jeff you're not going to build the whole
you're not going to build the whole project in one go you will first build
project in one go you will first build POC proof of concept where you will
POC proof of concept where you will build this kind of tool in streamlet you
build this kind of tool in streamlet you know left side number of articles URLs
know left side number of articles URLs and the right side you put a question
and the right side you put a question gives answer so that way you get a
gives answer so that way you get a confidence that this approach works and
confidence that this approach works and once we are happy with the result of
once we are happy with the result of this tool for long term the architecture
this tool for long term the architecture may look something like this you need to
may look something like this you need to First build a database inje system it
First build a database inje system it will have two different system one first
will have two different system one first one is database inje system where you go
one is database inje system where you go to your
to your trustworthy news article website you
trustworthy news article website you write a web scrapper and you have it
write a web scrapper and you have it implemented either in Native python or
implemented either in Native python or tool like bright data and then you run
tool like bright data and then you run that on some kind of crown job schedule
that on some kind of crown job schedule let's say this Crown job runs every 2
let's say this Crown job runs every 2 hours or every 1 hour it will pull the
hours or every 1 hour it will pull the data and it will convert that text into
data and it will convert that text into embedding vectors using open a or llama
embedding vectors using open a or llama or B whatever embedding uh you want to
or B whatever embedding uh you want to use then that goes into Vector database
use then that goes into Vector database and for Vector database we can use pine
and for Vector database we can use pine code mil chroma these are like popular
code mil chroma these are like popular ones today we can use any of these
ones today we can use any of these Solutions and that will be your database
Solutions and that will be your database injection system the second component
injection system the second component will be chatboard where in react or some
will be chatboard where in react or some kind of UI framework you will build
kind of UI framework you will build chatboard similar to chat GPD a person
chatboard similar to chat GPD a person types in a question question gets
types in a question question gets converted into embedding once again open
converted into embedding once again open AI or llama whatever embedding you want
AI or llama whatever embedding you want to use and then from Vector database you
to use and then from Vector database you pull relevant chunk so this green and
pull relevant chunk so this green and orange are relevant chunks which matches
orange are relevant chunks which matches with the question what was Q3 2023 EPS
with the question what was Q3 2023 EPS for T Motors and then based on those
for T Motors and then based on those chunks you form your prompt you give it
chunks you form your prompt you give it to your llm and the answer uh you put it
to your llm and the answer uh you put it back into your uh UI for the chat board
back into your uh UI for the chat board so again this is the overall
so again this is the overall architecture that you'll be working with
architecture that you'll be working with uh remember that when you are working in
uh remember that when you are working in Industry as a NLP engineer or data
Industry as a NLP engineer or data scientist you first do brainstorming
scientist you first do brainstorming with your team you come up with this
with your team you come up with this kind of nice technical architecture and
kind of nice technical architecture and then you start uh doing some coding okay
then you start uh doing some coding okay you don't want to go into a wrong
you don't want to go into a wrong direction all right so in the next
All right, so in the next section we'll be talking about text loaders. Before we do, make sure you have watched the LangChain crash course so you have a basic overview of the LangChain library. Assuming you have watched it, the next step is to install LangChain by running pip install langchain; this is the command you run. So let me just show you: you run pip install langchain to install the LangChain library, and once it is installed you can launch a Jupyter notebook and import the class: from langchain.document_loaders import TextLoader. So I have imported a simple text loader; there are multiple types of loaders that LangChain offers and we will look into them one by one. TextLoader allows you to load data from a text file. Here I have an Nvidia news article in one text file called nvda_news_1.txt, so I will just load it here and show you how this works: I pass nvda_news_1.txt, call the result loader, and then do loader.load, and it returns a data object.
will do loader. load and then it returns a data object and if you print a data
a data object and if you print a data object looks something like this it has
object looks something like this it has all the news content inside it and if
all the news content inside it and if you look at it it is actually an array
you look at it it is actually an array okay and array zeroth element is that
okay and array zeroth element is that document which has page content as one
document which has page content as one one of its element so let me just show
one of its element so let me just show you here in a separate set
you here in a separate set so here if you do page
so here if you do page content uh page
content uh page content see it shows you the entire text
content see it shows you the entire text content that it has the other element
content that it has the other element that this class has is metadata so
that this class has is metadata so metadata is your the name of your text
metadata is your the name of your text file now if you Google let's say Lang
file now if you Google let's say Lang chain text loader so let me do Lang
chain text loader so let me do Lang chain actually Lang chain documentation
chain actually Lang chain documentation and if you go to the documentation here
and if you go to the documentation here you will find let's see you will find
you will find let's see you will find the documentation for various loader
the documentation for various loader classes that you have now it is sometime
classes that you have now it is sometime hard to navigate this uh and it it can
hard to navigate this uh and it it can change based on at what time you're
change based on at what time you're looking at this but see document loaders
looking at this but see document loaders python guide will give you all these
python guide will give you all these loaders so this is a text loader the
loaders so this is a text loader the second one they have is a CSV loader so
second one they have is a CSV loader so let me talk about CSV loader real quick
let me talk about CSV loader real quick so I'll just copy paste this thing here
so I'll just copy paste this thing here and I have a CSV loader class here CSV
and I have a CSV loader class here CSV loader obviously I need to pass a CSV
loader obviously I need to pass a CSV file and luckily I have this movies. CSV
file and luckily I have this movies. CSV file which has around nine records you
file which has around nine records you can see nine records where you have
can see nine records where you have movie title industry the revenue and so
movie title industry the revenue and so on I will provide all these files in
on I will provide all these files in video description below so make sure you
video description below so make sure you check it code all the files everything
check it code all the files everything will be provided to you
will be provided to you here I will say movies. CSV that will
here I will say movies. CSV that will give you a loader and loader. loadad
give you a loader and loader. loadad that will give you data
that will give you data okay loader. load and if you look at
okay loader. load and if you look at length of data you will find nine
length of data you will find nine records because in the CSV file we had
records because in the CSV file we had nine records and if you look at the very
nine records and if you look at the very first record once again you're getting
first record once again you're getting that document class okay you can say
that document class okay you can say type here and you will see the the
type here and you will see the the document class from Lang chain library
document class from Lang chain library and that class has two elements right
and that class has two elements right page content which is a page content uh
page content which is a page content uh so let's see what is what is inside page
so let's see what is what is inside page content so page content is entire record
content so page content is entire record okay entire record in your CSV file
okay entire record in your CSV file which has movie ID title and so on
which has movie ID title and so on separated by sln and if you look at
If you look at metadata, it has movies.csv. Now one may argue that for this metadata I would rather have the movie's name or movie ID, and metadata is something we will be using in our project. Remember, in the preview we saw that when you type in a question it not only gives you the answer, it also gives you a source link; how does it reference back to that source link? The answer is: through this metadata, and we'll look into it later. For now, I will set my source column. It could be, say, movie ID or title; looking at the columns I have, there is movie ID and title, so maybe I keep title as the source column. When I keep it as title, what you see in the metadata is KGF2 for the first record, see, KGF2 for the first one, Doctor Strange for the second, and so on. You can view all the records: here is one record and here is its metadata, metadata, metadata, and so on.
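Roughly, that CSVLoader cell looks like this sketch; the column name "title" is taken from the movies.csv file described above, and the exact metadata layout may differ slightly by version.

```python
# Sketch of CSVLoader with a source column.
from langchain.document_loaders import CSVLoader

loader = CSVLoader(file_path="movies.csv", source_column="title")
data = loader.load()

print(len(data))             # 9, one Document per row
print(data[0].page_content)  # the whole first row, fields separated by newlines
print(data[0].metadata)      # {'source': 'KGF2', 'row': 0}
```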
Okay, now let's talk about the unstructured URL loader, because that is what we'll be using in our project. In the project we will go through some news articles, say this particular article on HDFC Bank, and we want to load the text content from that article directly into the Jupyter notebook using a ready-made LangChain class, and that class is UnstructuredURLLoader. This is how you import it, and by the way, you need to install a couple of libraries before that; you install them by running this particular command in the notebook, which installs all of them, or you can copy paste the command into your Git Bash or Windows command shell and install there. It's the usual thing folks, you should know how to install libraries. These are all the libraries you need, and by the way, UnstructuredURLLoader uses a library called unstructured; if you look up the Python unstructured library, that is what it uses underneath to go to the website, look into the DOM object, the HTML structure, and pull the information. So let's create that class; the argument is the list of URLs you want to supply, and the two URLs I'm supplying are just two different articles. That will be your loader. So my loader is this, and the usual step is loader.load; you get data as a return value, and length of data will take some time but will return two, because there are two articles. Look at the first article: again the same thing, page_content, and let's see what is in metadata: in metadata you have the source URL link, so the metadata is the source URL, and we will be using this in our news research tool.
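A rough sketch of that cell; the two URLs are placeholders for whichever articles you want to load, and the unstructured library (plus its helper packages) is assumed to be installed first.

```python
# Sketch of UnstructuredURLLoader with two placeholder article URLs.
from langchain.document_loaders import UnstructuredURLLoader

loader = UnstructuredURLLoader(urls=[
    "https://www.moneycontrol.com/news/your-first-article-url",
    "https://www.moneycontrol.com/news/your-second-article-url",
])
data = loader.load()

print(len(data))           # 2, one Document per URL
print(data[0].metadata)    # {'source': '<the article URL>'}
```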
source URL link and we will be using this in our news research tool after we
this in our news research tool after we load our documents through loader
load our documents through loader classes in Lang chain next step is to do
classes in Lang chain next step is to do Tex splitting and we have character TX
Tex splitting and we have character TX splitter recursive tax splitter these
splitter recursive tax splitter these kind of classes the reason we do this is
kind of classes the reason we do this is because any and llm will have a token
because any and llm will have a token size limit that's why we need to reduce
size limit that's why we need to reduce the big block of tax into smaller chunks
the big block of tax into smaller chunks so that it is within this particular
so that it is within this particular limit and what may happen because of
limit and what may happen because of these classes that we're using in from
these classes that we're using in from Lang chain is individual chunks that
Lang chain is individual chunks that that we get after we do split might not
that we get after we do split might not be very big or it might not be closer to
be very big or it might not be closer to the Token limit which is
the Token limit which is 4,097 let's say the first chunk is 3,000
4,097 let's say the first chunk is 3,000 second chunk is 1,000 it would make
second chunk is 1,000 it would make sense if I merge these two so that it is
sense if I merge these two so that it is closer to the Limit and it kind of work
closer to the Limit and it kind of work more efficiently so we have to perform
more efficiently so we have to perform merge step so first you have a huge
merge step so first you have a huge block of tax you divide things into
block of tax you divide things into smaller chunks chks and then you can
smaller chunks chks and then you can perform merge so that each individual
perform merge so that each individual chunks that you're getting which is in
chunks that you're getting which is in this blue green orange color they're
this blue green orange color they're closer to that limit which which could
closer to that limit which which could be 497 2,000 depends on the llm that
be 497 2,000 depends on the llm that you're using we also want to do some
you're using we also want to do some overlapping so that when you are reading
overlapping so that when you are reading this orange paragraph you need some
this orange paragraph you need some context from the blue paragraph which is
context from the blue paragraph which is which is you know one step ahead so you
which is you know one step ahead so you see part of this blue paragraph goes
see part of this blue paragraph goes into orange also so that is chunk
into orange also so that is chunk overlapping similarly part of this
overlapping similarly part of this orange paragraph goes into this green
orange paragraph goes into this green chunk also you see this this orange
chunk also you see this this orange thing at the top that is called
thing at the top that is called overlapping the chunks all of this can
overlapping the chunks all of this can be done using some simple apis in Lang
be done using some simple apis in Lang chain so let's look at it here I have
chain so let's look at it here I have taken the Wikipedia article of
taken the Wikipedia article of Interstellar movie you might have seen
Interstellar movie you might have seen that science fiction movie
that science fiction movie and we are going to perform Tex
and we are going to perform Tex splitting on this one now when you think
splitting on this one now when you think about text splitting let's say I have a
about text splitting let's say I have a limit of 200 tokens that I'm using in my
limit of 200 tokens that I'm using in my llm how do you split this TX so that
llm how do you split this TX so that each chunk is is of size 200 well the
each chunk is is of size 200 well the obvious thing that comes to your mind is
obvious thing that comes to your mind is why don't we use Simple slice operator
why don't we use Simple slice operator in Python and kind of divide things that
in Python and kind of divide things that way but when I do that you will notice
way but when I do that you will notice that it might cut off the words in
that it might cut off the words in between see
between see M what is M mad demon demon right so
M what is M mad demon demon right so here it is kind of cutting that off and
here it is kind of cutting that off and doesn't look that great you at least
doesn't look that great you at least want to have a complete word so this
want to have a complete word so this simple slice operator is not going to
simple slice operator is not going to work then you'll say oh what's a big
work then you'll say oh what's a big deal I might write a for Loop this kind
deal I might write a for Loop this kind of for Loop where each of the chunks so
of for Loop where each of the chunks so if you look at these chunks each of
if you look at these chunks each of these chunks is less than 200 okay you
these chunks is less than 200 okay you can do that uh but again this writing
can do that uh but again this writing this kind of for Loops is little tedious
this kind of for Loops is little tedious and it can have other issues as well
and it can have other issues as well Lang chain provides a very simple API so
Lang chain provides a very simple API so that you don't have to do all this work
that you don't have to do all this work manually and that API is given through
manually and that API is given through various TX splitter classes so let's try
various text splitter classes. So let's try the first, simple one: from langchain.text_splitter you can import, let's say, CharacterTextSplitter. This CharacterTextSplitter class takes a separator as an argument — separator as in, on which character you want to separate things out. Here, let's say we want to separate on the newline character, \n, which means each line can become one chunk, or several lines can, because it performs a merge step as well. My chunk size is, let's say, 200 — the real limit is more like 4,000-something, but just for simplicity we are saying 200 — and chunk overlap I will keep at zero, to keep things simple. This is my splitter, and that splitter I can use to split the text: I call split_text, pass in my text, and what I get back is the chunks. Let's check the number of chunks — okay, it is nine, and if you look at them, this is the first chunk, second, third, and so on. And if you look at the individual chunk lengths — say, for chunk in chunks, print the length of the chunk — you'll notice that while most of the chunks are less than 200, there are some which are more than 200. See, the chunk size for these chunks is more than 200.
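Putting that narration together, here is a minimal sketch of the CharacterTextSplitter demo, assuming the variable text already holds the Wikipedia article as a string:

# Minimal sketch of the CharacterTextSplitter demo described above.
# Assumes `text` already holds the Interstellar Wikipedia article.
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator="\n",   # split on newline characters
    chunk_size=200,   # target maximum characters per chunk
    chunk_overlap=0,  # no overlap, to keep the example simple
)

chunks = splitter.split_text(text)
print(len(chunks))        # 9 chunks in the demo above
for chunk in chunks:
    print(len(chunk))     # note: a few chunks can still exceed 200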
So why did that happen? Well, let's look at some of those chunks. The last two are kind of big, and if you look at them you will notice that this one is pretty big: in that entire chunk there is no \n, it's one big sentence — or several sentences — without any \n, so obviously it can exceed the size. Maybe you can change the separator from \n to a dot, so that every time a sentence ends it takes that as one fragment — but what if you have a bunch of questions? Then you might not have a dot. Well, you can use a space, but no matter what you use, you will always face one issue or another. For some cases CharacterTextSplitter will work, but we need something a little more advanced, something that can split on multiple separators, where we can have some rules: first divide things by \n\n, then by a single \n, then by a dot, then by a space, things like that. And this is something you can do using a character text splitter of a recursive nature, and it is
called RecursiveCharacterTextSplitter. Okay, so in RecursiveCharacterTextSplitter the arguments are going to be pretty much the same, except that you can provide a list of separators — see, there you provide just one separator, here you can provide a list. So I can say: my first separator is \n\n, the second one is \n, the third one could be a dot or, let's say, a space. Chunk size and chunk overlap I'll just keep the same, and I will call this one r_splitter. We will split our text the same way: you split your text, store the result in chunks, and look at the number of chunks — thirteen in total — and I will also print the sizes of the individual chunks here, and you can see that the majority of them — actually, all of them — are less than 200.
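As a rough sketch of what was just typed (the separator list is my reading of the demo), the recursive splitter looks like this:

# Sketch of the RecursiveCharacterTextSplitter call described above.
# Assumes `text` is the same article string used earlier.
from langchain.text_splitter import RecursiveCharacterTextSplitter

r_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ".", " "],  # tried in order, coarse to fine
    chunk_size=200,
    chunk_overlap=0,
)

chunks = r_splitter.split_text(text)
print(len(chunks))              # 13 chunks in the demo above
print([len(c) for c in chunks]) # every chunk now stays under the 200 limit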
So let's understand how it works under the hood. See, you just make one API call, but internally what it is doing is this: first it will try the first separator, which is \n\n, correct? And it will split things — so we had one big text blob and it split that into three chunks, see, 1, 2, 3; that is what it did. Once again, if you want to print the sizes, I can print them here: all three of them are more than 200 in size, because we are only splitting on \n\n. So then let's think about the first chunk itself. The first chunk is this — I'll just call it first_split, to keep things simple — and if you look at the length of this first split, it is 439. When LangChain detects that — remember, first it separates things out using \n\n and then it checks the individual chunks, so there are three individual chunks and this one is 439 — when it sees that this is more than the specified size of 200, it will further split it using the second separator, which is \n. So internally it says: split this using \n, and it gets three more splits. And if you look at the size of these three splits — let me just assign them, say second_split is equal to this — the first one is less than 200, so it is fine; the second one is 121, also fine; but the third one is 210, so it is definitely more than 200. So what it will do then is go to the next separator, which is the space, separate things out with that, and once it has separated things out it will again merge — we looked at the merge step
earlier, if you remember. See, it also does merging, because each individual piece might be too small, so it just merges things back together. Obviously, when you split things apart using a space — let's do that — each piece will be very small, right? This one is only three characters, this is two characters, one character; we can't have chunks that small, so it will merge those pieces, as I showed you in the slide, in a way that is reasonably optimized. So what it does is this — see, let me just print, for chunk in the second split: the first two it will keep as they are, and the third one it will divide. What is our size? 200. So it will create one chunk of roughly size 200 and another of roughly 10 — it could be plus or minus a bit, because it needs to keep each word intact, it can't break a word apart. That is why, when we use this API, you get 199 and 10 — which is 210 if you count the space character here and there. And you see 105 and 120 versus 106 and 121 — those two are essentially the same — and then there was the 210, which it split into two pieces of 199 and 10, that is 209 plus one space character, so 210 in total. So that is how it is doing the splitting. I hope you got some understanding of the recursive text splitter; this is something that we will be using in our news research tool.
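To make the recursion concrete, here is a toy, pure-Python illustration of that try-finer-separators-then-merge idea — this is just my own sketch of the behaviour described above, not LangChain's actual implementation:

# Toy illustration of the recursion described above (NOT LangChain's real code):
# try separators from coarse to fine, re-splitting any piece that is too big,
# then greedily merge small neighbouring pieces back up to the size limit.
def toy_recursive_split(text, separators, chunk_size):
    sep, rest = separators[0], separators[1:]
    pieces = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size or not rest:
            pieces.append(piece)
        else:
            pieces.extend(toy_recursive_split(piece, rest, chunk_size))
    merged, current = [], ""
    for piece in pieces:
        candidate = (current + sep + piece) if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged

# e.g. chunks = toy_recursive_split(text, ["\n\n", "\n", " "], 200)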
in our news research tool now that you have some understanding of text splitter
have some understanding of text splitter let's look into the next step which is
let's look into the next step which is Vector databases now for Vector
Vector databases now for Vector databases there are lot of options Pine
databases there are lot of options Pine con milver chroma but we are not going
con milver chroma but we are not going to use them in our project we will use
to use them in our project we will use something called phase uh which is a
something called phase uh which is a kind of like a lightweight in memory
kind of like a lightweight in memory Vector database type of thing phase
Vector database type of thing phase stands for Facebook AI similarity search
stands for Facebook AI similarity search it is actually a library that allows you
it is actually a library that allows you to do faster search into set of vectors
to do faster search into set of vectors that you have uh but it can be also used
that you have uh but it can be also used as a vector database if your project is
as a vector database if your project is smaller and if your requirements are
smaller and if your requirements are kind of lightweight requirements okay so
kind of lightweight requirements okay so you can read about phase but I will give
you can read about phase but I will give you a very quick understanding so what
you a very quick understanding so what will happen is once you have set of
will happen is once you have set of chunks that you have created using
chunks that you have created using recursive tax splitter for our project
recursive tax splitter for our project we will convert those into embeddings
we will convert those into embeddings see embedding conversion is a must step
see embedding conversion is a must step we can use either open ey embedding
we can use either open ey embedding hugging face embeddings word to there
hugging face embeddings word to there are so many embeddings out there in the
are so many embeddings out there in the world based on a problem statement we
world based on a problem statement we can use any of them and then we will
can use any of them and then we will store them into a vector database so if
store them into a vector database so if we were using pine con or Milas we would
we were using pine con or Milas we would have stored these into that proper
have stored these into that proper Vector databases but for our project we
Vector databases but for our project we are just going to store them into phase
are just going to store them into phase Index this is like an inmemory structure
Index this is like an inmemory structure which can uh do a faster search on your
which can uh do a faster search on your vectors so let's say if you have an
vectors so let's say if you have an input question called what is the price
input question called what is the price of h00 GPU we will again first convert
of h00 GPU we will again first convert that into a Vector using the same
that into a Vector using the same embedding technique and then we will
embedding technique and then we will give it to phase index and what phase
give it to phase index and what phase index will do is let's say these vectors
index will do is let's say these vectors that we have created out of these chunks
that we have created out of these chunks are let's say I have 1 million Vector
are let's say I have 1 million Vector phase will efficiently perform a search
phase will efficiently perform a search for a given vector and it will tell you
for a given vector and it will tell you out of those 1 million how many of those
out of those 1 million how many of those vectors are similar okay and this I have
vectors are similar okay and this I have explained in detail in this particular
explained in detail in this particular video which I'm going to provide a link
video which I'm going to provide a link in a video descript deson so please
in a video descript deson so please watch it if you haven't seen that but
watch it if you haven't seen that but let me just quickly show you how phase
let me just quickly show you how phase Works uh by using some uh simple code so
Works uh by using some uh simple code so you have to first uh install these two
you have to first uh install these two libraries okay and once these libraries
libraries okay and once these libraries are installed I'm going to import pandas
are installed I'm going to import pandas um and I will just increase the pandas
um and I will just increase the pandas data frame column withd I I'll explain
data frame column withd I I'll explain why I'm doing that later on but I'm
why I'm doing that later on but I'm loading a CSV file which has like eight
loading a CSV file which has like eight records you know so different eight text
records you know so different eight text and their category they are either
and their category they are either Health fashion these type of categories
Health fashion these type of categories so I'm loading that uh into a data frame
so I'm loading that uh into a data frame here and my data frame shape looks
here and my data frame shape looks something like this and my data frame
something like this and my data frame looks something like this now I will
looks something like this now I will convert this text okay these eight
convert this text okay these eight sentences into vectors and the way I'm
sentences into vectors and the way I'm going to do it is using the sentence
going to do it is using the sentence Transformer Library so I will say
Transformer Library so I will say from sentence
from sentence Transformer import sentence Transformer
Transformer import sentence Transformer okay and for this sentence Transformer
okay and for this sentence Transformer I'm going to use a model uh or a
I'm going to use a model uh or a Transformer entity called this L all
Transformer entity called this L all mpet and if you want to read more about
mpet and if you want to read more about this you can just say a hugging face
this you can just say a hugging face sentence Transformer uh you can do
sentence Transformer uh you can do reading and you can kind of figure out
reading and you can kind of figure out how it works but in simple language all
how it works but in simple language all they're doing is is converting this text
they're doing is is converting this text into a vector so how does it do that so
into a vector so how does it do that so I will just say encoder is this and then
I will just say encoder is this and then encoder do encode okay and that encode
encoder do encode okay and that encode expects an array of text and array of
expects an array of text and array of text is DF do text see DF do text when
text is DF do text see DF do text when you give it it gives you that that
you give it it gives you that that entire column and that you can store in
entire column and that you can store in the vectors and let me just print the
the vectors and let me just print the vector shape
vector shape it might take some time by the way if
it might take some time by the way if you're running for the first time just
you're running for the first time just have some patience you can see that
have some patience you can see that there are total eight vectors if you see
there are total eight vectors if you see this it's like a two- dimensional array
this it's like a two- dimensional array okay so first one is this second one is
okay so first one is this second one is this third one is this and so on um so
this third one is this and so on um so meditation and yoga can improve mental
meditation and yoga can improve mental health the vector corresponding to that
health the vector corresponding to that is this one and you see dot dot dot so
is this one and you see dot dot dot so the total size of one vector is 768
the total size of one vector is 768 which I'm going to store it in a in a
which I'm going to store it in a in a variable so see so vectors if you do
variable so see so vectors if you do Vector do shape and one so this is the
Vector do shape and one so this is the size of each vector and we have total
size of each vector and we have total such eight vectors I will store this
such eight vectors I will store this into a parameter called Dimension
into a parameter called Dimension because that Dimension I'm going to use
because that Dimension I'm going to use later on and then I will import the
later on and then I will import the phase Library okay so once phase is
phase Library okay so once phase is imported I will call index flat L2
imported I will call index flat L2 so this is uh the index that uses a ukan
so this is uh the index that uses a ukan distance or L2 distance okay so that's
distance or L2 distance okay so that's the index that we are using once again
the index that we are using once again if you want to know more detail you can
if you want to know more detail you can go to either phase. or go to their
go to either phase. or go to their GitHub page you can do more reading but
GitHub page you can do more reading but as such it is very simple it is just
as such it is very simple it is just creating similar to database index it is
creating similar to database index it is just creating an index that allows you
just creating an index that allows you to do faster search later on okay so
to do faster search later on okay so here I can supply dim actually and that
here I can supply dim actually and that will be my index so I'm creating an
will be my index so I'm creating an index of size 768 here and when you
index of size 768 here and when you print that index you'll see nothing it
print that index you'll see nothing it just created some empty index now in
just created some empty index now in that empty index I can add some vectors
that empty index I can add some vectors correct so when I added it now my Vector
correct so when I added it now my Vector is kind of ready so going back to to
is kind of ready so going back to to that picture again we have total eight
that picture again we have total eight vectors right total eight vectors and
vectors right total eight vectors and the size of each Vector the size of this
the size of each Vector the size of this particular areay 768 we are just adding
particular areay 768 we are just adding that into phase index now phase index
that into phase index now phase index will internally construct some kind of
will internally construct some kind of data structure what the data structure
data structure what the data structure is that is out of the scope of this
is that is out of the scope of this video but some data structure that
video but some data structure that allows you to do fast similarity search
allows you to do fast similarity search so for a given Vector we can find okay
so for a given Vector we can find okay out of these eight Vector which two
out of these eight Vector which two vectors or which three vectors are
vectors or which three vectors are similar Okay so so here now uh once I
similar Okay so so here now uh once I have index I can do index. search and
have index I can do index. search and here I want to supply search Vector but
here I want to supply search Vector but what is the search Vector well we don't
what is the search Vector well we don't have search Vector ready so I will give
have search Vector ready so I will give some input search query and the search
some input search query and the search query is let's say I want to buy a polo
query is let's say I want to buy a polo dessert okay and that search query we
dessert okay and that search query we have to of course encoder do
have to of course encoder do encode if you look at that P picture we
encode if you look at that P picture we need to kind of convert this into a
need to kind of convert this into a vector so that is what I'm doing here
vector so that is what I'm doing here when I say encode encode my search query
when I say encode encode my search query and I get the vector back if you look at
and I get the vector back if you look at the vector shape see it's a simple array
the vector shape see it's a simple array 768 but this search Vector expects two
768 but this search Vector expects two dimensional array so I'm going to use
dimensional array so I'm going to use numai and convert this Vector into two
numai and convert this Vector into two dimensional array so it's simple folks
dimensional array so it's simple folks what I did is something similar to you
what I did is something similar to you know I I put let's say Vector into an
know I I put let's say Vector into an empty array outside so it was one
empty array outside so it was one dimensional array 768 now it become two
dimensional array 768 now it become two dimensional and if you print that see
dimensional and if you print that see Vector was simple one dimensional array
Vector was simple one dimensional array but if you look at this S V now it is
but if you look at this S V now it is same Vector but see there are just there
same Vector but see there are just there is one outer array outside outside that
is one outer array outside outside that and the reason is this particular
and the reason is this particular function expects that format okay uh
function expects that format okay uh okay some argument is missing how many
okay some argument is missing how many uh similar vectors do you want this is
uh similar vectors do you want this is like K nearest neighbor so let's say I
like K nearest neighbor so let's say I want two similar vectors and it gives me
want two similar vectors and it gives me these two vectors okay so it returns a
these two vectors okay so it returns a tle and the first one is the distances
tle and the first one is the distances the second one is the index in our
the second one is the index in our original data frame so in our original
original data frame so in our original data frame locate the rows which has
data frame locate the rows which has index three and two okay so which one is
index three and two okay so which one is three and two
three and two so three and two both are articles
so three and two both are articles related to fashion and you can see that
related to fashion and you can see that I want to buy a polo t-shirt is kind of
I want to buy a polo t-shirt is kind of similar to Fashion okay so that's what
similar to Fashion okay so that's what it did uh I can store them in distances
it did uh I can store them in distances and I I can store them in
and I I can store them in tupple and if you look at
tupple and if you look at I right three and two you can locate
I right three and two you can locate that uh using DF do log so if you do DF
that uh using DF do log so if you do DF do location and do 3 and two it will
do location and do 3 and two it will give you those articles and if you want
give you those articles and if you want to be kind of like in a programmatic way
to be kind of like in a programmatic way you can do it similar thing so I of 0 is
you can do it similar thing so I of 0 is 3 and two only okay so that's what it is
3 and two only okay so that's what it is giving you now one thing you might have
giving you now one thing you might have notic is in this text okay let me print
notic is in this text okay let me print the text here search
the text here search query so in this
query so in this text I want to buy Polo t-t see the exit
text I want to buy Polo t-t see the exit word is not present here see so this is
word is not present here see so this is not like a keyword search this is a
not like a keyword search this is a semantic search which means it is
semantic search which means it is capturing the context or the meaning of
capturing the context or the meaning of this sentence and giving you the similar
this sentence and giving you the similar sentence here if you look at our entire
sentence here if you look at our entire data
data frame see it has meditation and yoga and
frame see it has meditation and yoga and all that but it it give you only fashion
all that but it it give you only fashion related articles you can change a
related articles you can change a sentence so let me say that an apple a
sentence so let me say that an apple a day keeps the doctor away
day keeps the doctor away okay and I will go here and I will say
okay and I will go here and I will say run all the cells below this particular
run all the cells below this particular cells uh okay I think there is some
cells uh okay I think there is some problem okay let me run all the cells
problem okay let me run all the cells here you notice that when I say an apple
here you notice that when I say an apple a day keeps doctor away the search
a day keeps doctor away the search results were related to health once
results were related to health once again you will see that in the similar
again you will see that in the similar vectors which are these two the exact
vectors which are these two the exact words are not matching see an apple they
words are not matching see an apple they keeps the doctor away it is not present
keeps the doctor away it is not present in any of these sentences but if as a
in any of these sentences but if as a human you have to think which are the
human you have to think which are the two similar sentences for an apple a
two similar sentences for an apple a doctor keeps a doctor away out of all
doctor keeps a doctor away out of all these eight you would probably give
these eight you would probably give these two because we're talking about
these two because we're talking about health here and it gave me the health
health here and it gave me the health related articles okay uh you can try
related articles okay uh you can try something else which is looking for a
something else which is looking for a place so let's say looking for a place
place so let's say looking for a place to visit holidays okay and go to sell
to visit holidays okay and go to sell run all and you notice once again it
run all and you notice once again it give you two articles which are related
give you two articles which are related to travel so I'm going to provide all
to travel so I'm going to provide all these individual notebooks by the way in
these individual notebooks by the way in the video description below so just
the video description below so just check it just think about it uh and you
check it just think about it uh and you will get an idea so uh this was just a
will get an idea so uh this was just a quick demo of phase Library uh this is
quick demo of phase Library uh this is something that we will be using in our
something that we will be using in our news research tool project let us now
news research tool project let us now discuss the retrieval QA with sources
discuss the retrieval QA with sources chain once you have stored all your
chain once you have stored all your vectors in a vector database the next
vectors in a vector database the next component will be asking a question and
component will be asking a question and retrieving all the relevant chunks let's
retrieving all the relevant chunks let's say my relevant chunk is chunk number
say my relevant chunk is chunk number two and chunk number four using these
two and chunk number four using these chunks I will form an llm prompt The
chunks I will form an llm prompt The Prompt will be something like I have s00
Prompt will be something like I have s00 GPU what is the price of it give me the
GPU what is the price of it give me the answer based on the below text which is
answer based on the below text which is Chun 2 and Chun 4 and then llm will give
Chun 2 and Chun 4 and then llm will give you the answer the benefit of this is
you the answer the benefit of this is you can tackle the problem of the token
you can tackle the problem of the token limit and also save some Bill on your
limit and also save some Bill on your open API calls so now when you think
open API calls so now when you think about combining this chunk see here what
about combining this chunk see here what we did is whatever chunk you get you put
we did is whatever chunk you get you put all of them in one single prompt now as
all of them in one single prompt now as a result here I got chunk number two and
a result here I got chunk number two and four but actually I I might get more
four but actually I I might get more chunks let's say I got four chunks and
chunks let's say I got four chunks and combined size of this chunks is more
combined size of this chunks is more than the llm token limit so then then
than the llm token limit so then then that is the drawback of this method this
that is the drawback of this method this method is called by the way stuff method
method is called by the way stuff method so you're getting all the similar
so you're getting all the similar looking chunks from your vector database
looking chunks from your vector database then you're forming a prompt and when
then you're forming a prompt and when you give all these chunks together it
you give all these chunks together it may cross that L llm limit token limit
may cross that L llm limit token limit so that is a drawback of this method if
so that is a drawback of this method if you know that that chunks will not cross
you know that that chunks will not cross llm token limit then is fine the stuff
llm token limit then is fine the stuff method will still work it is the
method will still work it is the simplest of all but the better method
simplest of all but the better method especially when the combined chunk size
especially when the combined chunk size is bigger is map reduce in map reduce
is bigger is map reduce in map reduce method what we do is we make individual
method what we do is we make individual llm call per chunk so let's say I have
llm call per chunk so let's say I have these four similar chunks so for my
these four similar chunks so for my question let's say what is my h00 GPU
question let's say what is my h00 GPU price give me the answer based on chunk
price give me the answer based on chunk one then again I'll ask a question what
one then again I'll ask a question what is the price of h00 GPU size give me is
is the price of h00 GPU size give me is the answer based on chunk two chunk
the answer based on chunk two chunk three chunk four so you are asking four
three chunk four so you are asking four different questions and each time you
different questions and each time you pass different context which is chunk
pass different context which is chunk one 2 3 and four obviously you get four
one 2 3 and four obviously you get four answers here there is a type of by the
answers here there is a type of by the way this is fc1 fc2 fc3 and 4 and so on
way this is fc1 fc2 fc3 and 4 and so on so this is like a filter chunk or or an
so this is like a filter chunk or or an individual answer and then you make a
individual answer and then you make a fifth call and you combine all these
fifth call and you combine all these answers together and you say to your llm
answers together and you say to your llm that out of all these four answers just
that out of all these four answers just give me the best answer or just combine
give me the best answer or just combine all these answers together and give me
all these answers together and give me the final answer this way you will teer
the final answer this way you will teer that that uh token size limit but the
that that uh token size limit but the drawback here is you are making five llm
drawback here is you are making five llm calls see 1 2 3 4 and five in the
calls see 1 2 3 4 and five in the previous method you made just one call
previous method you made just one call so that is always a drawback so now
so that is always a drawback so now let's do some coding and try to
let's do some coding and try to understand this thing uh in a little
understand this thing uh in a little deeper fashion I have imported all these
deeper fashion I have imported all these necessary libraries you need to give
necessary libraries you need to give your open API key here
your open API key here okay if you create a free account they
okay if you create a free account they give you like $5 free credit so you can
give you like $5 free credit so you can use that and after this account is
use that and after this account is created you get that key okay we have
created you get that key okay we have covered all of that in our Lang chain
covered all of that in our Lang chain crash course so here I'll will create an
crash course so here I'll will create an llm object and then I will use un
llm object and then I will use un unstructured URL loader this is
unstructured URL loader this is something folks we have already looked
something folks we have already looked into it so I will not go in the in the
into it so I will not go in the in the detail here I'm just loading two
detail here I'm just loading two different articles so first article is
different articles so first article is on Tesla okay Wall Street Rises Tesla
on Tesla okay Wall Street Rises Tesla whatever and the second article is on
whatever and the second article is on tataa motors which is India based
tataa motors which is India based automotive company and here we are
automotive company and here we are loading both of these articles into our
loading both of these articles into our data loader and then we are using the
data loader and then we are using the same recursive text splitter which we
same recursive text splitter which we have looked into before and creating
have looked into before and creating this individual chunks so we created
this individual chunks so we created total 41 individual chunks I mean you
total 41 individual chunks I mean you can check the individual chunks here see
can check the individual chunks here see this is the page content okay 0 1 2 3 4
this is the page content okay 0 1 2 3 4 whatever you can you can check it out
whatever you can you can check it out right like
right like nine and then once that is done you will
nine and then once that is done you will create open API embedding so how do you
create open API embedding so how do you do that well see we created this this
do that well see we created this this particular class here so I will say
particular class here so I will say embeddings is equal to open a embeddings
embeddings is equal to open a embeddings and then uh you will use the phase class
and then uh you will use the phase class that we imported here and call a method
that we imported here and call a method called from documents now see from
called from documents now see from documents method in Phase will accept
documents method in Phase will accept the documents or the chunks that you
the documents or the chunks that you created here and then it will take
created here and then it will take another parameter which will be your
another parameter which will be your embedding so here I'm using open API
embedding so here I'm using open API embedding so I'm giving that you can use
embedding so I'm giving that you can use hugging phas or any other embedding too
hugging phas or any other embedding too and the
and the resulting index will be this Vector
resulting index will be this Vector index which we have and once you have
index which we have and once you have that Vector index so I'm not running
that Vector index so I'm not running this code by the way because I already
this code by the way because I already run this code and I have saved this
run this code and I have saved this Vector index into a file okay and the
Vector index into a file okay and the way I saved it is I can use this code
way I saved it is I can use this code see you have Vector index and then
see you have Vector index and then Vector index you can write it to a file
Vector index you can write it to a file called Vector index uh pickle file so
called Vector index uh pickle file so previously before shooting this tutorial
previously before shooting this tutorial I already ran this code I saved Vector
I already ran this code I saved Vector index into a file and let me show you
index into a file and let me show you that file on my disk see Vector index. P
that file on my disk see Vector index. P pickle file so this is sort of like a
pickle file so this is sort of like a vector database it is a vector database
vector database it is a vector database which I have saved as a pickle file on
which I have saved as a pickle file on my disk and now I can load that pickle
my disk and now I can load that pickle file by running this code see now I can
file by running this code see now I can run this code and my Vector index is
run this code and my Vector index is loaded into a memory so this Vector
loaded into a memory so this Vector index now have knowledge of both of
index now have knowledge of both of these articles now let's create a
these articles now let's create a retrieval QA with sources chain uh class
retrieval QA with sources chain uh class object the first argument that it
object the first argument that it expects is llm so wherever we created
expects is llm so wherever we created our llm which is here you know what just
our llm which is here you know what just to okay I'll just put that in here and
to okay I'll just put that in here and the
the another argument is retriever so
retriever is basically how you're planning to retrieve that Vector
planning to retrieve that Vector database so Vector database you can give
database so Vector database you can give as an argument here and you can just say
as an argument here and you can just say as
as retriever okay this is just a syntax
retriever okay this is just a syntax that we're using and this thing we going
that we're using and this thing we going to call
to call chain okay here I need to save from
llm and you will see that it has created this chain and in the chain you will see
this chain and in the chain you will see interesting prompt which might get you
interesting prompt which might get you very exited which is this use the
very exited which is this use the following portion of long document to
following portion of long document to see if any text is relevant to the
see if any text is relevant to the answer of this question uh this is a
answer of this question uh this is a thing we have discussed before now we
thing we have discussed before now we are going to ask some sample question so
are going to ask some sample question so my simple question is what is the price
my simple question is what is the price of Thiago icng and if you look at the
of Thiago icng and if you look at the article uh the Thiago icng price Thiago
article uh the Thiago icng price Thiago icng price is between this and that this
icng price is between this and that this so you you would want this particular
so you you would want this particular answer uh from our code okay so that is
answer uh from our code okay so that is my expectation I will I will enable the
my expectation I will I will enable the debugging in Lang Lang chain so that I
debugging in Lang Lang chain so that I can see what going underneath and then
can see what going underneath and then you will say chain in the chain the
you will say chain in the chain the question that you want to ask is this
question that you want to ask is this and this is the argument that you give
and this is the argument that you give okay so now let's run this code and see
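That call, roughly, is the following (assuming the chain object built above):

# Enable LangChain's debug output and query the chain.
import langchain

langchain.debug = True   # print intermediate prompts and LLM responses

query = "what is the price of Tiago iCNG?"
result = chain({"question": query}, return_only_outputs=True)
print(result)            # a dict with "answer" and "sources" keys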
Okay, so now let's run this code and see what happens. This will show you some of the internal debugging. When I asked "what is the price of Tiago iCNG", it first retrieved the similar-looking chunks from my vector database — there are four chunks in total, 1, 2, 3 and 4 — and the question is the same for all of them. See, this chunk is "the company also said it introduced the Tiago..." — my actual answer is right there in the first chunk itself, but it still retrieves the four most similar-looking chunks. This one is also similar-looking, but my answer is not in it exactly; this one is also similar-looking, and this one too. All this code is available on GitHub, by the way, so you can run it and go through it yourself. So that is step number one. Then you combine the question with each chunk and ask four individual questions to your LLM. All of these questions go to the LLM: my first prompt is "use the following portion of a long document to see if any of the text is relevant to the answer of this question; return any relevant text verbatim", and this is the paragraph that you give. Similarly, this is the second question, this is the third, and this is the fourth. So you pass all four questions to the LLM, there are four LLM calls, and as a result you get four answers, correct? Let's see the four answers now. The first answer is this — the Tiago iCNG price — so you know this will be the final answer, but it still generates four answers. This is the first answer, this is the second answer — this one doesn't look good, but it still gives you something — this is the third answer, and this is the fourth answer. So FC1, FC2, 3, 4:
these are the four answers that it gave. Now you combine those four answers into a summary chunk and make one more call to your LLM. Where is that? See, these are the four answers, 1, 2, 3, 4, and here is the combined summary for "what is the price of Tiago iCNG": the summaries content — see, the four answers are combined — and the exact prompt is "given the following summaries... give me a final answer." And then, in the end, it gives you the final answer: the Tiago price is between this and this, and this is the source reference. So, once again, it is using this map-reduce method. I have given this notebook with a lot of comments, so you can read it and get an idea, but overall, what happened is that we have looked at all the individual components: we started with the text loader, the splitter, FAISS, and we just talked about retrieval. Now we are going to combine all these pieces together and build our final project. Our final project is not going to take much time, because all the individual pieces are ready and we have also understood the fundamentals behind the scenes — see, learning only the API is not enough, you need to understand how it works underneath; only then can you become a great data scientist or AI engineer. So far we have cleared all our fundamentals, our individual pieces are ready, we just need to assemble them, and it's not going to take much time. I'm super excited to move on to the next section.
Now we are going to use all the components that we have built so far; we will assemble them and build our entire project. I have this directory where I will be keeping my project: in the notebooks folder you will see all the individual notebooks, and here, outside, I will do the main project coding. Right now you're seeing two files, requirements.txt and a .env file. If you look at requirements.txt, it has all the libraries we are using, so you can run pip install -r requirements.txt to install all of them. In the .env file I have the OpenAI API key, so you will put your own key here — the one you got with that $5 free credit on OpenAI, or, if you have a paid account, just use that. Now let me create a main.py file here, so I will say main.py, and here I'm going to import all the necessary libraries — since the list is pretty long, I'm just going to copy-paste it. The very first thing I will do is load that OpenAI API key. So far we were using os.environ, but that's a slightly clumsy way of doing it; there is a better way, which is the python-dotenv module. If you look at this particular Python module, you have to install it first, and then you can just use these two lines — I'll tell you what they actually do. They take all the environment variables from the .env file and load them into the environment. So if you look at, say, this example .env file, it will set a DOMAIN environment variable to this, ROOT_URL to this, and so on. It's just one call, and within that one call we have loaded our API key, and the API key is not visible in the code, so it's a cleaner way — it's the standard practice nowadays for loading things like this.
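The two lines being referred to are presumably python-dotenv's loader — a minimal sketch:

# Load variables from the .env file next to main.py (e.g. OPENAI_API_KEY)
# into the process environment, so the key never appears in the code.
from dotenv import load_dotenv

load_dotenv()  # looks for a .env file in the current directory by default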
Now let me just write some basic UI. I will say st.title — the title of the application is "News Research Tool", so that's what I'll put here — and then I will create the sidebar. If I show you the UI of our tool: on the left-hand side it will have three URL boxes — this is URL 1, this is URL 2, this is URL 3 — and I will have a Process button below them; the title I'm setting will be visible here; and on the right-hand side I will have the actual question, and below that there will be an answer.
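Before walking through it line by line, here is a rough sketch of the Streamlit layout being described (variable names are my own):

# Rough sketch of the sidebar UI for the news research tool.
import streamlit as st

st.title("News Research Tool")
st.sidebar.title("News Article URLs")

urls = []
for i in range(3):
    url = st.sidebar.text_input(f"URL {i+1}")   # three URL boxes on the left
    urls.append(url)

process_url_clicked = st.sidebar.button("Process URLs")

if process_url_clicked:
    pass  # loading, splitting, embedding and indexing will go here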
will be an answer okay so in the sidebar I'm going to say Side by sidebar
title news article URLs let's say that's my sidebar uh title and I will have
my sidebar uh title and I will have three URLs so I will say for I in
three URLs so I will say for I in range let's say
range let's say three okay and
three okay and St do
St do sidebar dot text input so I will take
sidebar dot text input so I will take the URL from text input and I will use a
the URL from text input and I will use a format string here I will say
format string here I will say URL I + 1 because it starts with zero so
URL I + 1 because it starts with zero so I'll say URL 1 URL 2 and URL 3 okay and
I'll say URL 1 URL 2 and URL 3 okay and below that there will be a button so the
below that there will be a button so the button will be
button will be called process URLs maybe okay and when
called process URLs maybe okay and when you click that button that value I will
you click that button that value I will get in process URL
clicked and when I say process URL clicked here when I press that button
clicked here when I press that button the flow will go here okay let's just
the flow will go here okay let's just run whatever we have so far bar and see
run whatever we have so far bar and see what happens okay so the way to run this
The way to run this is streamlit run main.py. When you run that it will show you this kind of UI: three URL inputs and the News Research Tool title (we'll add the question box later). You can enter your URLs here, and when you hit the Process URLs button the flow goes into this if condition.
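Here is a minimal sketch of that layout; the label text and variable names are my own choices, and the file is assumed to be main.py:

```python
# a minimal sketch of the Streamlit layout described above
# run with:  streamlit run main.py
import streamlit as st

st.title("News Research Tool")
st.sidebar.title("News Article URLs")

urls = []
for i in range(3):
    url = st.sidebar.text_input(f"URL {i+1}")  # labels: URL 1, URL 2, URL 3
    urls.append(url)

process_url_clicked = st.sidebar.button("Process URLs")

if process_url_clicked:
    # data loading, splitting, embedding and indexing go here
    pass
```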
Okay, so let's write some code inside this if condition. Here we will use UnstructuredURLLoader and give it all the URLs. You need the URLs, right? So I create a urls list and build that array up in the sidebar loop: whenever you enter a URL it gets appended to the array, and that array is passed to the loader. Then you call loader.load(). This should be very apparent to you by now; this first step is loading the data.
After you load the data, you all know the next step is splitting it, and for that you will use RecursiveCharacterTextSplitter, which takes arguments such as separators and chunk_size. I'm not worrying about chunk_overlap that much, although you can play with it. So this is my text splitter; I call split_documents on it, pass in my documents, and get back the individual chunks.
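A rough sketch of these two steps, assuming the older LangChain module layout used at the time of this project (the import paths have moved around between versions):

```python
# a minimal sketch, assuming these classes live under
# langchain.document_loaders and langchain.text_splitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

urls = ["https://example.com/article-1"]   # hypothetical; really the three sidebar inputs

loader = UnstructuredURLLoader(urls=urls)
data = loader.load()                        # step 1: load the articles

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ".", ","],    # tried in order, coarse to fine
    chunk_size=1000,                        # illustrative value
)
docs = text_splitter.split_documents(data)  # step 2: split into chunks
```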
Okay, and for those chunks I will then create embeddings. I'm going a little fast because we have already covered all these things before. So I create the embeddings object, and then you call FAISS.from_documents, where the first argument is the documents and the second argument is the embeddings; that gives you the FAISS index, which we store in a variable. Then we are going to save this in-memory FAISS index to disk in a pickle format. The file path can be any file name; I'm just going to call it faiss_store_openai.pkl, so that is where it is stored.
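A sketch of the embedding and indexing step, again assuming the LangChain wrappers of that era; note that newer FAISS wrappers offer save_local/load_local, which is preferable to pickling the whole object:

```python
# a minimal sketch, assuming langchain's OpenAIEmbeddings and FAISS wrappers
# as they existed when this project was recorded
import pickle
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()   # reads OPENAI_API_KEY from the environment
vectorstore_openai = FAISS.from_documents(docs, embeddings)   # docs from the splitter

file_path = "faiss_store_openai.pkl"
with open(file_path, "wb") as f:
    pickle.dump(vectorstore_openai, f)   # persist the in-memory index to disk
```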
Just to show progress while we are processing the URLs, when you enter a bunch of URLs and hit Process URLs, below the News Research Tool title I want to show a status message, something like "loading the data", "splitting the data" and so on. For that we use a main placeholder: I will say main_placeholder = st.empty(), which creates an empty UI element, and while loading the data you can call .text() on it and say "Data loading started". I'll just copy-paste that and put this kind of status text before split_documents and before building the embeddings as well. All right, so far I think my code looks good.
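A small sketch of that placeholder pattern (the wording of the status messages is mine; loader, text_splitter and embeddings are the objects from the sketches above):

```python
# a minimal sketch: one st.empty() placeholder reused for status updates
import streamlit as st

main_placeholder = st.empty()

main_placeholder.text("Data loading... started")
data = loader.load()

main_placeholder.text("Text splitting... started")
docs = text_splitter.split_documents(data)

main_placeholder.text("Building embedding vectors... started")
vectorstore_openai = FAISS.from_documents(docs, embeddings)
```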
I can rerun this and see how it goes, so I'm going to rerun; you can click on Rerun or just press the R key. My code is rerun now and I'm going to give it these three articles: this is my first article, this is the second, and the third one is this; all three are articles related to Tata Motors. See, it says "data loading started", so now it is going to these three URLs with UnstructuredURLLoader and loading the data, then it is splitting them into chunks, then it is building embeddings using OpenAI API calls, and then it is using FAISS to build an index and save it to disk. When you look at our folder you will find this file, faiss_store_openai.pkl; this is our vector database. Now that the vector database is ready, the next step is to add a question box, so let's do that coding now.
I'm going to say main_placeholder.text_input, and the text input label will be "Question". Whatever query you get from it, you check with if query, so when you type in a question and hit Enter the flow will come here. Now, what is the first thing you'll be doing here? Well, you will be loading the vector database. So you say: if the file exists (there could be a case where this particular file doesn't exist, so you want to make that check), then you read the file. You say with open(file_path, "rb") as f, because it's a binary file, and then pickle.load(f); you load that file and call it vectorstore. We are using Python's pickle module; you can see we imported it at the top.
Then we create the retrieval QA chain. The vector store is here, and this QA chain expects an LLM as an input. I don't think we have created the LLM yet, so let me just copy-paste it; I have the code ready so that I can save time on the recording. Now I have created the llm object and my chain looks good. Then you call the chain: you supply the question, set the return_only_outputs argument to True, and you get your result. The result will be a dictionary with two elements: it will have "answer", which contains the answer, and it will also have "sources", which contains the URL or URLs it came from. So result["answer"] contains the answer, and we need to display that. I will just use st.header here and say "Answer", and then st.subheader, and that subheader will show result["answer"]. Let's try this much so far.
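A rough sketch of the question-answering path; RetrievalQAWithSourcesChain and the return_only_outputs flag follow the standard LangChain pattern of that period, so treat the exact module paths and parameters as assumptions:

```python
# a minimal sketch; main_placeholder and file_path come from the code above
import os
import pickle
import streamlit as st
from langchain.llms import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain

llm = OpenAI(temperature=0.9, max_tokens=500)   # illustrative parameters

query = main_placeholder.text_input("Question: ")
if query:
    if os.path.exists(file_path):               # file_path = "faiss_store_openai.pkl"
        with open(file_path, "rb") as f:
            vectorstore = pickle.load(f)
        chain = RetrievalQAWithSourcesChain.from_llm(
            llm=llm, retriever=vectorstore.as_retriever()
        )
        result = chain({"question": query}, return_only_outputs=True)
        # result is a dict like {"answer": "...", "sources": "..."}
        st.header("Answer")
        st.subheader(result["answer"])
```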
I'm going to bring up that UI, click on Rerun, and now I will ask: what is the price of Tiago iCNG? This is a question based on those three articles; if you look at the article, I think it is here, see, the iCNG's price is 6.55 to 8.1, and it is giving the answer properly. I also want to see the source, that is, from which URL it retrieved the answer, so let's do that real quick.
Just to save time on the recording I'm not going to go into too much detail here, because this is very minor. You basically go to result and get the "sources" element from the dictionary. In that dictionary the sources key may or may not be present, which is why I'm calling .get(). If it is present, you create another subheader and print the list of sources. Why a list? Because sometimes the answer may come from multiple URLs, so you need to handle that scenario. Let's bring this code in.
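A sketch of that sources block, under the same assumptions as the chain sketch above (sources typically come back as one newline-separated string):

```python
# a minimal sketch: display the sources, if the chain returned any
sources = result.get("sources", "")     # .get() because "sources" may be absent
if sources:
    st.subheader("Sources:")
    for source in sources.split("\n"):  # an answer can cite multiple URLs
        st.write(source)
```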
Now rerun this and ask the question again. See, the Tiago iCNG price is shown, and here is the URL; if you look at that URL it does contain the answer. Correct. Now you can ask a summarization question as well.
For example, I have this article with a recommendation on a stock, and I want to summarize it, so I will ask: can you please summarize this article? By the way, the answer text is showing up bold, so let me just change this to st.write, because that way it is not bold. Here you see it has already summarized the article and it also gave the source; the font was a little big, so I changed it to a smaller one. But overall, folks, this tool is ready now. It's going to be very useful to my equity research analyst Peter Pandey, who is investing on his clients' behalf, because now you don't have to read so many articles; whatever question you have, you can ask this news research tool and it will not only give you the answer but also the source references, and that is very important, folks.
By the way, with the LLM boom, many clients are building this kind of tool. How do I know? Well, at my own company, AtliQ Technologies, we have some US-based clients for whom we are building these LLM projects, and this is a real-life use case that I'm showing you. So this is not a toy project; it is based on real things happening in the industry: document summarization, and building chatbots similar to ChatGPT on custom data. What we built here is essentially a chatbot that can answer your questions on custom data, which in my case is these three URLs. We just built the basic proof of concept and it is already working. The code and everything is given in the video description below, so please try it out.
Long term, as we discussed, when you are building this project in the industry you will build two components. One is a data ingestion system, where you write some kind of web scraper that goes through all these websites and retrieves the articles. For web scraping you can use native Python or Bright Data; even UnstructuredURLLoader will work, but it might stop working at some point because websites will detect the scraping activity and may block you, which is the reason people use tools like Bright Data, a proxy-network-based tool. Then you create embeddings, using OpenAI or Hugging Face, and store them in a vector database. If I'm doing this in the industry as a big project, I will not rely on FAISS alone; I can still use FAISS as a library, but otherwise I'll use a proper vector database. Once you have the data in the vector database, you can build the UI in React or whatever tool you like, call that vector database to retrieve the similar-looking chunks, and post the answer back to the chatbot.
In today's video we are going to build an end-to-end LLM project where we are going to use all these technologies. AtliQ Tees is a store that sells t-shirts, and their data is stored in a MySQL database. We will build a tool similar to ChatGPT where you can ask a question in natural human language; it will convert that question into a SQL query and execute it on our database. You will get the feeling that you are talking to a database in plain English. It's going to be a very interesting project. Let us discuss the project requirements. The AtliQ Tees t-shirt store sells four brands, mainly Van Heusen, Levi's, Nike and Adidas. The MySQL database has a first table called t_shirts where we maintain the inventory count; so, for example, the Levi's black small-size t-shirt has 15 units left in stock, and 15, 64 and so on are the stock quantities, while the price column is the price per unit, so one Levi's black small-size t-shirt will cost you $19. The second table I have is discounts; for example, t-shirt ID 1, the Levi's black small t-shirt, has a 10% discount.
In real life the database will have many different tables; to keep things simple for learning, I'm just going to use two tables. The t-shirt store manager is Tony Sharma. Whenever he has questions related to stock quantity, discounts and so on, he uses a software application built on top of this MySQL database. If you look at the retail domain overall, they will have this kind of software, where you use various UI options to get answers to your questions. Tony is fine using this software, but many times he has custom, slightly complex questions that the software can't handle, so he has to download the data into Excel and do things manually. When he's busy, he goes to Loki, a data analyst working for the company who knows SQL. So if Tony asks a question like "how many white color Nike t-shirts do we have in stock?", Loki will simply run the SQL query on the database and get the answer back to Tony Sharma.
But Loki is busy as well; he's busy building Power BI dashboards and doesn't have much time for these ad hoc queries. Also, Loki is the only data analyst working for this company, and sometimes he is out on leave and not available, so Tony has to do all this work manually, because Tony himself doesn't know SQL. So he goes to the data scientist working for the same company, and you might guess the name of that data scientist: Peter Pandey, who looks somewhat like me. Tony says, "Hey Peter buddy, we are living in the ChatGPT era; LLMs, LangChain, all these cool frameworks have come up. Why don't you build a tool similar to ChatGPT where I can ask a question in human language, and it somehow converts that into a SQL query, executes it on the database, and gets me the answer, say 3165?" Peter likes this thought and agrees to build this tool.
Let's look at the technical architecture of this tool. Whenever you have a question, you need to convert it into a SQL query using some LLM. We are going to use Google PaLM here, which will do this conversion, and we will use Google PaLM through the LangChain framework; within LangChain you can use Google PaLM and other types of LLMs. We will use the SQLDatabaseChain class from the LangChain framework. This will work okay for simple queries, but as the queries get a little complex, the out-of-the-box Google PaLM model will fail, sometimes with errors, and we need to do some special handling. We will use the concept of few-shot learning here. Few-shot learning means you prepare a training dataset where you have a sample question and a corresponding SQL query: you list down all the queries where the out-of-the-box Google model is failing, and you can prepare them with the help of your data analyst, Mr. Loki. It is called few-shot learning because you don't need to prepare a thousand samples; a few samples are enough. Then you convert this training dataset into embedding vectors. If you have no idea what word embeddings or sentence embeddings are, go to YouTube and search for "codebasics embedding" or "codebasics word embedding"; you will find a couple of videos where I have given a very simple, intuitive explanation of these embeddings. We will use the Hugging Face library for this. Once the embeddings are created, we will store them in a vector database. When you think about vector databases, there are a couple of options: Pinecone, Milvus, ChromaDB, FAISS, etc. We are going to use ChromaDB; it is open source and will work perfectly fine for our project. Once the vector database is ready, we will pair it up with the Google PaLM LLM, use a few-shot prompt template to create the SQL database chain, and the last piece will be building a UI in Streamlit; we will write just five or six lines of code and our UI will be ready.
To continue further on this project, you obviously need to have your LangChain basics clear, for which I have a video where I cover all the basics in one place; make sure you have either watched it or already know the LangChain fundamentals, just the basics. You also need to know what a vector database is; in a six-minute video I have given a very simple explanation of vector databases, so if you have not seen it, please watch it.
Now let's do a review of Google PaLM. There are three popular options when you talk about building LLM applications: OpenAI's GPT-4 model, which is the best in the market but is paid, and the other two, which are free, Meta's LLaMA and Google PaLM. I could have used Meta's LLaMA for this project, but you have to download that model locally or else into your Google Colab, and it is very heavy; the size is in gigabytes and it is a little hard to set up. Google PaLM, on the other hand, is very easy to set up. It works similar to the OpenAI API, where you just make a query to Google's server, and the beautiful thing is that it is all free, so we are going to use that.
As a next step, we are going to set up an API key for Google PaLM. I have opened the makersuite.google.com website, where you can log in using your Google account. You need to go to "Get API key"; you can create an API key in your existing Google Cloud project, and if you don't have one, just click on "Create API key in a new project". Here I will just pick any project and create an API key. Now, this API key is sort of like a password, so make sure you don't share it with others. I'm showing you this API key right now, but I'm going to delete it after I use it in my project, so I'll copy it and save it in a safe place to use later in my code. Talking about MakerSuite, it gives you a test pad where you can try different prompts, for example a text prompt. Here you can write different prompts and it will use the text-bison model; PaLM is the architecture, but the specific model it is using is text-bison. The creativity parameter means that the closer it is to one, the more creative the output, and the closer it is to zero, the less creative. You can try some sample prompts, for example summarizing a paragraph, and when you run it, it will summarize the paragraph; you can try poem writing or write your own custom prompt. Behind the scenes it is using the same API that we will use in our project, so if you want to quickly test your API, this test pad lets you do that very easily. You can play with different prompts, but as far as the API key is concerned, we are all set.
Now we will set up our MySQL database. I have launched MySQL Workbench by searching for it here. If you're not familiar with MySQL, don't worry; go to YouTube and type "codebasics SQL tutorial"; I have a one-and-a-half-hour tutorial that gives any beginner a complete idea, so you can follow that and learn MySQL easily. This tool, by the way, is free; you can download it by searching Google for MySQL Workbench. I will open this local instance, and if you check the video description below, I have given you all the code files, which include a database directory. You can go there and drag and drop this SQL file into Workbench; this file takes care of creating the database and the tables within it. Click on the execute icon and it will create the tables and the data, and when you click the refresh icon you will see the atliq_tshirts database. You can right-click on it and set it as the default schema if you haven't done so before, and you will see the font turn bold. Table-wise, we have the first table, t_shirts; if you click on this third icon you will see some sample records, for example t-shirt ID 1 is a Van Heusen red t-shirt in small size, the price of one t-shirt is $15, and we have a total of 70 t-shirts available in our store, which is the stock quantity. If you look at discounts, t-shirt ID 1, the same Van Heusen red t-shirt, has a 10% discount. What that means is that $15 is the original price, 10% of 15 is 1.5, so when I sell one of these t-shirts I give a $1.5 discount to the customer and they get it for $13.5. These records, by the way, will be different when you execute the SQL script, because we are using some random numbers, so don't worry if you don't see the same exact numbers in your case; they are likely going to be different.
All right, our database is set up; now let's start coding in our Jupyter notebook. I ran python -m notebook to launch Jupyter, and here I have created a new Python notebook. I'm going to import the GooglePalm model from langchain.llms; you can use OpenAI and all kinds of models, but Google PaLM is free. Let's create an object for this LLM, and here I'm going to pass google_api_key, which will be stored in a variable called api_key; I will initialize that variable and add my specific key. Folks, as I said before, please use your own key; I'm going to delete mine later, so the code will not work if you use my key. Once the llm object is created, you can try a sample prompt, for example "write a poem on my love for dosa" (dosa is a South Indian food that I love), print the poem, and you see it is working well.
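A rough sketch of that setup, assuming the LangChain version where GooglePalm is exposed under langchain.llms (the PaLM API has since been deprecated in favour of Gemini, so treat this as illustrating the pattern rather than a current API):

```python
# a minimal sketch, assuming langchain.llms.GooglePalm is available
from langchain.llms import GooglePalm

api_key = "your-google-palm-api-key"   # use your own key; mine will be deleted

llm = GooglePalm(google_api_key=api_key, temperature=0.1)  # temperature is illustrative

# quick sanity check that the model responds
print(llm("Write a poem about my love for dosa"))
```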
Now folks, before you run this code, make sure all your libraries are installed. You can run pip install -r requirements.txt, and if you look at the requirements.txt file it lists all the requirements: langchain, chromadb and so on. So I'm assuming you have installed all of that.
All right, now let's create an SQLDatabase object. For that you import this particular class, and when you create the SQLDatabase object you call from_uri. Here you pass a URI; a URI is like a URL, it specifies what your database is, the host, username, password and so on, and we'll store all that information in separate variables. My database is running locally, so the host is localhost, the username and password are root, and atliq_tshirts is the database name, as you can see here. The URI is formed using this syntax; I'll just copy-paste it to save time on the recording, you don't need to memorize it anyway. The second parameter is sample_rows_in_table_info, and I'll show you in a moment what the number three means. The result I get, I store in a variable called db, and it has a table_info property which we can print. When I do that, I get confirmation that I'm able to connect to my MySQL database from my Jupyter notebook; see, it is able to pull all this information, which means my Jupyter notebook is now connected to the database.
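A sketch of that connection step; the credentials and database name are the ones mentioned in the video and should be replaced with your own, and mysql+pymysql is just one common connector string:

```python
# a minimal sketch of connecting LangChain to the local MySQL database
from langchain.utilities import SQLDatabase

db_user = "root"
db_password = "root"          # replace with your own credentials
db_host = "localhost"
db_name = "atliq_tshirts"

db = SQLDatabase.from_uri(
    f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}",
    sample_rows_in_table_info=3,   # include 3 sample rows per table in the prompt
)
print(db.table_info)               # confirms the connection and shows the schema
```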
Now we are ready to create our SQLDatabaseChain. In LangChain there are all kinds of chains, like SQLDatabaseChain, and for different use cases you have these different chains. Notice that SQLDatabaseChain is imported from the langchain_experimental module; if you are watching this video in the future, it is possible you can import it directly from the main langchain module, but as of right now it is part of the experimental package. If they move it later, just adjust the import; use your common sense and you should be able to run it. Now let's create the chain object. The first parameter will be the llm we created, the second one is the db object, and we store the result in db_chain. Now you can run a simple query; before I do that, I will pass one more parameter, verbose=True, so that I can see the SQL query it is generating along with some internal details.
The first question I'm asking is this; let me just copy-paste it: "How many Nike white color extra small size t-shirts do I have?" Let's store the result in the qns1 variable and hit Ctrl+Enter. Okay, this error is happening because here I need to use from_llm. Now see, it pulled the right answer; it is saying 59, and the query it generated is actually the right query. Let me run that query in Workbench: 59. If you just look at Nike t-shirts overall, or let's do a select star for only Nike t-shirts, these are the Nike t-shirts; the Nike white t-shirts are these, and the extra small size quantity is 59, which is the answer it gave. If you print qns1, it shows 59; by the way, it returns a dictionary as output, and if you want 59 directly you can use run, so when you do that, qns1 will have the direct answer.
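A sketch of the chain creation and the first query; SQLDatabaseChain.from_llm with verbose=True matches the langchain_experimental API of that period, and the question text is the one used above:

```python
# a minimal sketch, assuming langchain_experimental's SQLDatabaseChain
from langchain_experimental.sql import SQLDatabaseChain

# constructing the chain directly raised an error in the video,
# so the factory method from_llm is used instead
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

qns1 = db_chain.run("How many Nike white color extra small size t-shirts do I have?")
print(qns1)   # .run() returns the final answer string (e.g. "59"), while calling
              # db_chain(...) returns a dictionary with intermediate keys
```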
Now, there are a couple of observations I have. The LLM is actually doing a pretty good job: I said "extra small size" and it is smart enough to figure out that extra small means XS and map that to the size column. When I say "white color", my w is lowercase, but it maps it to a capital W because it looked into our database and figured out that our color values start with a capital letter. You see, this is the power of an LLM. Now, that was a relatively simple query; let me try a different one: what is the price of the inventory for all small size t-shirts? While it executes, let me run the equivalent SQL here. We want all small-size t-shirts, so I add where size = 'S'; these are all the small-size t-shirts, and the total price will be price times stock quantity, so I need sum(price * stock_quantity). When I run that, this is the price I get. Now let's see what we got in the notebook: 215, a wrong answer, folks. So why did that happen? Let's just think about it.
The problem here is that it did sum(price); it did not do sum(price * stock_quantity), so it forgot to multiply by the quantity. If you think about it a little, you will find an obvious reason: the LLM is assuming that the price column is the price for all the t-shirts in that row. So for the Levi's white small-size t-shirts, I have a total of 51 t-shirts available, and the LLM thinks the total price of all 51 t-shirts is 13, because the column name is not price_per_unit, it is just price. Price could be a total price or a price per unit; it is assuming it is the total price, and if it were the total price, the answer the LLM gave would be correct. But in real life, database column names are not going to be perfect, so this represents a real-life scenario. The conclusion is that LLMs will make mistakes, and we need to tell it somehow that the price column is the price per unit, not the total price. We can do this using few-shot learning; we will do that after some time.
Let me run a few more queries; meanwhile, I will store the right answer for this question, and the way to do that is to run the explicit query. The explicit query we have is this, so I'm going to run it; in db_chain.run you can actually pass the explicit SQL query, and we got the right answer, which is stored in my qns2 variable.
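A sketch of that workaround: passing the hand-written SQL straight into db_chain.run so the known-good answer can be reused later as a few-shot example (the column names follow the schema described above):

```python
# a minimal sketch: run the corrected SQL explicitly through the chain
sql = """
SELECT SUM(price * stock_quantity)
FROM t_shirts
WHERE size = 'S'
"""
qns2 = db_chain.run(sql)   # correct total inventory value for small-size t-shirts
print(qns2)
```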
All right, so far it is looking good. Now I will run the third query, which will be a little more complex. I'm asking: if I sell all my Levi's t-shirts today with discounts, how much revenue will my store generate? When you want to apply discounts you need to do some kind of join. You have all the Levi's t-shirts; you need to multiply price by stock quantity and then sum all of that to get the total revenue. Then you need to go to the discounts table and figure out how much discount you have on Levi's t-shirts; for example, one of the Levi's t-shirt IDs is three, the Levi's white extra small, and for ID three we have a 20% discount, so on that price, 44 times 94, you need to apply a 20% discount. Let's see if the LLM can handle this kind of complex case. No, it failed. It failed because in the generated query you see these columns, discounts.start_date and discounts.end_date. Usually whenever you have discounts you will have start date and end date columns, because discounts can't run forever, but in our database, if you look at the discounts table, we don't have start and end dates. The LLM is using its general knowledge and assuming those columns exist in our table. We need to tell it: hey buddy, don't use your imagination; look at the table, and only use a column if you actually find it there. start_date doesn't exist, so how can you just use it? Again, we will fix that with few-shot learning after some time; for now, let me run that query explicitly.
Here is the query to get the answer, and we'll run it. By the way, this is not a MySQL tutorial, so I'm not going into detail, but let me quickly explain how it works. If you look at this particular query, this part is a subquery; if you execute it, it pulls all the Levi's t-shirts, multiplies price by stock quantity, and gives you that per t-shirt, so for t-shirt ID 17, if you sell all of them you will get this much revenue, for 64 you get this, and you can sum them up to get the total revenue. Then you need to join this with the discounts table; the subquery result is aliased as a table called a, you do a left join with the discounts table, and then you apply the discount. If you execute this, the answer is 24367, and 24367 is the answer we got, which we have stored in the qns3 variable.
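A sketch of that hand-written revenue query, run through the chain the same way; the column names (pct_discount, stock_quantity) and the brand value 'Levi' are assumptions based on the schema described in the video, and the numbers will differ because the data is randomly generated:

```python
# a minimal sketch: post-discount revenue for all Levi's stock, written by hand
# because the LLM invented non-existent start/end date columns
sql = """
SELECT SUM(a.total_amount * ((100 - COALESCE(discounts.pct_discount, 0)) / 100)) AS total_revenue
FROM (
    SELECT SUM(price * stock_quantity) AS total_amount, t_shirt_id
    FROM t_shirts
    WHERE brand = 'Levi'
    GROUP BY t_shirt_id
) a
LEFT JOIN discounts ON a.t_shirt_id = discounts.t_shirt_id
"""
qns3 = db_chain.run(sql)
print(qns3)
```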
variable similarly let me run a few more queries so this is if I sell all Levis
queries so this is if I sell all Levis t-shirts you know how much revenue will
t-shirts you know how much revenue will I generate I will generate this much and
I generate I will generate this much and another question I have is how many
another question I have is how many white color leis t-shirts I
white color leis t-shirts I have now now let's go and figure that
have now now let's go and figure that out so you want to know how many white
out so you want to know how many white color Le leis t-shirts So when you say
color Le leis t-shirts So when you say total Lis t-shirts it is this
total Lis t-shirts it is this much and white color right so you will
much and white color right so you will say and color is equal
say and color is equal to White okay this much so total white
to White okay this much so total white color lais t-shirts are 94 + 51 + 29 15
color lais t-shirts are 94 + 51 + 29 15 and 95 but in our code what's happening
and 95 but in our code what's happening is see 94 151 it it pulled all that
is see 94 151 it it pulled all that numbers but it did not sum it up so if
numbers but it did not sum it up so if you look at the answer the answer is 94
you look at the answer the answer is 94 why it did that because it is not able
why it did that because it is not able to figure out that it needs to do sum
to figure out that it needs to do sum here okay so the right query here is sum
here okay so the right query here is sum of stock
quanti so that will be 284 that's the right answer again it failed so what
right answer again it failed so what what what do we do now well we run the
what what do we do now well we run the query explicitly so that later on I can
query explicitly so that later on I can use it in my few short learning okay so
use it in my few short learning okay so qns 5 query I will just copy paste
qns 5 query I will just copy paste whatever I wrote in my C workbench and
whatever I wrote in my C workbench and qns 5 now is
qns 5 now is 284 so now we have all these answers and
284 so now we have all these answers and we have all these queries uh let's try
we have all these queries uh let's try few short learning uh so that our llm
few short learning uh so that our llm can improve uh on the errors that it is
can improve uh on the errors that it is making in few short learning the first
making in few short learning the first thing we need to do is provide the
thing we need to do is provide the question and query pairs where llm was
question and query pairs where llm was getting confused once we have those
getting confused once we have those training example the Second Step would
training example the Second Step would be to convert it into embeddings and
be to convert it into embeddings and we're going to use hugging pH for that
we're going to use hugging pH for that so let's go to a notebook and start
so let's go to a notebook and start putting together those few short
putting together those few short examples in a simple python list and
examples in a simple python list and each of these examples would be a
each of these examples would be a dictionary and dictionary will
dictionary and dictionary will have one element which will be uh your
have one element which will be uh your question okay so let's say my question
question okay so let's say my question is
is this and then the SQL query
this and then the SQL query corresponding to that question would be
corresponding to that question would be this so we previously ran all this
this so we previously ran all this queries so I'm just copy pasting just to
queries so I'm just copy pasting just to save time
save time other than these two we need to have SQL
other than these two we need to have SQL result and
result and answer as a parameter now why do we need
answer as a parameter now why do we need this well just hold on we will uh see
this well just hold on we will uh see this later this is the syntax that the
this later this is the syntax that the default Lang chain SQL prompt is using
default Lang chain SQL prompt is using therefore we are using the same syntax
therefore we are using the same syntax I'll show you a little later and this
I'll show you a little later and this first answer if you remember we stored
first answer if you remember we stored that into qns1 Okay so one is nothing
that into qns1 Okay so one is nothing but it is this 59 okay so that's what we
but it is this 59 okay so that's what we are having so we put we take all these
are having so we put we take all these samples and put them into this single
samples and put them into this single array and once we have this thing ready
array and once we have this thing ready the second thing is we use hugging phase
the second thing is we use hugging phase for generating embedding and for that
for generating embedding and for that we'll
we'll use uh we'll import the hugging face
use uh we'll import the hugging face embedding class and we are going to use
embedding class and we are going to use this particular embedding now folks
this particular embedding now folks there are so many different ways to
there are so many different ways to generate these embeddings I tried this
generate these embeddings I tried this particular embedding it was working fine
particular embedding it was working fine so that's what why I'm using this you
so that's what why I'm using this you can even try open embedding if you're
can even try open embedding if you're ready to pay the price and there are
ready to pay the price and there are instructor embedding uh in the other
instructor embedding uh in the other project that we did for at Tech domain
project that we did for at Tech domain we used instructor embedding so you can
we used instructor embedding so you can use whatever embed can solve your need
use whatever embed can solve your need and this will be stored in this
and this will be stored in this particular
particular variable and let me just you know we can
variable and let me just you know we can say embed query what embedding will do
say embed query what embedding will do is you can type any query and it will
is you can type any query and it will generate an embedding which is which is
generate an embedding which is which is just an array okay so let me save it
just an array okay so let me save it here and the example you can use is okay
here and the example you can use is okay let's say we generate embedding for this
let's say we generate embedding for this particular sentence so this e will be a
particular sentence so this e will be a list of size
list of size 384 and when you look at these numbers
384 and when you look at these numbers they don't actually make sense but they
they don't actually make sense but they capture the meaning of this particular
capture the meaning of this particular sentence in a right way so that if
sentence in a right way so that if someone types A different query I mean
someone types A different query I mean the query is like similar to this but
the query is like similar to this but the words are different even then the
the words are different even then the embedding of that em and embedding of
embedding of that em and embedding of this query will be similar in terms of
this query will be similar in terms of cosine similarity uh of course so I'm
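A sketch of that step, assuming a sentence-transformers model. The specific model name below is an assumption — the video only says it is a Hugging Face embedding that produces 384-dimensional vectors, which matches the MiniLM family.

```python
from langchain.embeddings import HuggingFaceEmbeddings

# Any sentence-transformers model works; all-MiniLM-L6-v2 happens to output 384-dimensional vectors.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

e = embeddings.embed_query("How many white color Levi's t-shirts do we have?")
print(len(e))   # 384
```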
I'm going to remove this test and now create a vector database. For that we need to turn each example into a single blob of text. Take this example: I drop the keys, because they are not really needed, and merge the remaining strings into one big string with spaces in between — the question, the query, the "Result of the SQL query" placeholder, and the answer (51, 59, whatever it was). That text blob is what I will vectorize and store in my database. To do this I can use a list comprehension: for each example in few_shots, take example.values() — I'm interested only in the values, not the keys. That gives me a list where each element is a dict_values object, and I want one string out of it; if you know Python, you simply join the values. I'll store the result in a variable called to_vectorize. See, it generated this list, and if you look at the first element, it is simply all four values concatenated into one big string.
now let's create a vector database for which
let's create a vector database for which we are going to
we are going to import chroma chroma is the vector
import chroma chroma is the vector database that we are using in this
database that we are using in this project and then from chroma we can say
project and then from chroma we can say from text where you
from text where you supply the text okay the the array of
supply the text okay the the array of text and the second parameter is
text and the second parameter is embedding okay so embeding is equal to
embedding okay so embeding is equal to embeddings and the last parameter is the
embeddings and the last parameter is the metadata which is few short so the
metadata which is few short so the entire few short array that we have we
entire few short array that we have we are giving it to as a metadata you can
are giving it to as a metadata you can go ahead and read the documentation but
go ahead and read the documentation but the essence of this statement is that
the essence of this statement is that this is how you generate a vector store
this is how you generate a vector store so this Vector store is this Vector
so this Vector store is this Vector store it's already created and the job
store it's already created and the job of vector store is to take an input
of vector store is to take an input question so let's say if I have an input
question so let's say if I have an input question like this it will convert that
question like this it will convert that into embedding and it will pull you the
into embedding and it will pull you the similar looking few short example so
similar looking few short example so let's try that and uh to see how that
let's try that and uh to see how that thing works so for that similarity
thing works so for that similarity matching you need to import another
matching you need to import another class called semantic similarity example
class called semantic similarity example selector and in that you will pass two
selector and in that you will pass two parameters so first first thing is
parameters so first first thing is obviously you need Vector store so you
obviously you need Vector store so you will say Vector store is equal to Vector
will say Vector store is equal to Vector store and then K is equal to 2 which
store and then K is equal to 2 which means
means pull me two similar example K can be 1 2
pull me two similar example K can be 1 2 3 I mean if you want three example just
3 I mean if you want three example just say three this I have stored in example
say three this I have stored in example selector variable and you can say select
selector variable and you can say select examples okay so you can give a a a
examples okay so you can give a a a sentence okay so you can give a sentence
sentence okay so you can give a sentence like this
like this here and you can say can you pull me
here and you can say can you pull me similar looking things from this and see
similar looking things from this and see the similar looking question is how many
the similar looking question is how many many t-shirt do we have left so just
many t-shirt do we have left so just read this two statement okay this and
read this two statement okay this and this they look similar and the second
this they look similar and the second based match is this this is not exactly
based match is this this is not exactly matching but this is like a second based
matching but this is like a second based match that you can get okay so this
match that you can get okay so this mechanism that you give input sentence
mechanism that you give input sentence to Vector database and you can pull
to Vector database and you can pull similar looking queries see if you can
similar looking queries see if you can pull similar looking queries then my llm
pull similar looking queries then my llm can look into those and from those
can look into those and from those queries it can learn and it can produce
queries it can learn and it can produce a good result all right now if you
a good result all right now if you remember we already discussed giving a
remember we already discussed giving a custom prompt to our llm because our LM
custom prompt to our llm because our LM is making mistakes such as discount
is making mistakes such as discount table doesn't have a start date it is
table doesn't have a start date it is still using start date in my my SQL
still using start date in my my SQL query so I want to have a custom MySQL
query so I want to have a custom MySQL prompt saying that only use
prompt saying that only use database table
database table columns right do not just make things up
columns right do not just make things up so I want to give some instructions so
so I want to give some instructions so that it doesn't make a mistake now I
that it doesn't make a mistake now I have to write that SQL prompt on my own
have to write that SQL prompt on my own but the good news is that Lang chain
but the good news is that Lang chain already provides this prompt to you you
already provides this prompt to you you can import that
can import that prompt by doing this and if you print
prompt by doing this and if you print that prompt let's see how it
looks see you are my SQL expert given the question create my SQL query first
the question create my SQL query first never query for all columns I don't want
never query for all columns I don't want to say select star I want to say select
to say select star I want to say select XY specific colums you must query Only
XY specific colums you must query Only The Columns that are needed to
The Columns that are needed to answer pay attention to use only the
answer pay attention to use only the columns that you see in the tables below
columns that you see in the tables below see this is important we are saying we
see this is important we are saying we are going to give you the table info and
are going to give you the table info and only use see table info is this
only use see table info is this folks this table info that you printed
folks this table info that you printed use the columns only from these tables
use the columns only from these tables okay that's what we are seeing also if
okay that's what we are seeing also if you're talking about any date uh used
you're talking about any date uh used current date for the today so we giving
current date for the today so we giving lot of useful
lot of useful instructions uh and then we are forming
instructions uh and then we are forming a query okay if you look at the prefix
a query okay if you look at the prefix so let me print that one also I think
so let me print that one also I think suffix so suffix is like this okay So
suffix so suffix is like this okay So eventually what we'll do is see we'll
eventually what we'll do is see we'll take our prefix so prefix is this we'll
take our prefix so prefix is this we'll take our suffix suffix is this and our
take our suffix suffix is this and our actual query will come in between so
actual query will come in between so here if you look at our prefix see
here if you look at our prefix see prefix has this this format question SQL
prefix has this this format question SQL quer equal result answer and that is the
quer equal result answer and that is the format we have used here see look at
format we have used here see look at these four elements that's exactly the
these four elements that's exactly the format that we are using so now let's
format that we are using so now let's think about the query the the query in
think about the query the the query in the middle okay will go in the middle
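For reference, a sketch of importing and inspecting that built-in prompt. The module path and names below match the LangChain version used in this era of the project; newer releases may organize them differently.

```python
# The built-in MySQL prompt text (used as the prefix) and the suffix that closes the few-shot prompt.
from langchain.chains.sql_database.prompt import PROMPT_SUFFIX, _mysql_prompt

print(_mysql_prompt)   # "You are a MySQL expert. Given an input question, first create a ..."
print(PROMPT_SUFFIX)
```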
For this we need to import a PromptTemplate. Again, to save time I'm copy-pasting the template: it has the input variables Question, SQLQuery, SQLResult, and Answer, and the template string looks like this. When an example is filled in, it follows that format — the question goes here, then the SQL query, and so on. I think it's intuitive; if you have seen my previous videos you will get the idea.
previous videos you will you will get some idea now comes the time to create
some idea now comes the time to create our few short prompt template okay and
our few short prompt template okay and in this few short prompt template we
in this few short prompt template we will pass bunch of parameters the first
will pass bunch of parameters the first one is obviously the example selector
one is obviously the example selector that we have created see this is the
that we have created see this is the example
example selector so if you do this you will
selector so if you do this you will establish the association between your
establish the association between your LM and Vector database you will say hey
LM and Vector database you will say hey llm if you're confused look into this
llm if you're confused look into this Factor database okay so that is what we
Factor database okay so that is what we are doing here the second one is the
are doing here the second one is the example prompt that we have created and
example prompt that we have created and then the next two are the prefix and
then the next two are the prefix and suffix so now
suffix so now using these three it will generate this
using these three it will generate this kind of single prompt that you can pass
kind of single prompt that you can pass to your Google Palm llm and the last
to your Google Palm llm and the last parameter is the input variable okay so
parameter is the input variable okay so in the input variable you'll see things
in the input variable you'll see things like table info so table info is this
like table info so table info is this this is the table info okay and if you
this is the table info okay and if you look at our query see that is a table
look at our query see that is a table info so here actually wherever you see
info so here actually wherever you see this this bracket here you will actually
this this bracket here you will actually put that table info okay so you will put
put that table info okay so you will put all of this so your actual prompt will
all of this so your actual prompt will be a little bigger you will say this
be a little bigger you will say this see now you're saying that um use the
see now you're saying that um use the table info so here you I think you say
table info so here you I think you say somewhere right info
somewhere right info info okay see pay attention to use only
info okay see pay attention to use only the column names you can see in the
the column names you can see in the tables below so which tables see only
tables below so which tables see only use following tables this so this type
use following tables this so this type of big prompt will be formed when you
of big prompt will be formed when you write this particular this particular
write this particular this particular few short prompt template we are going
few short prompt template we are going going to save it in a variable
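Putting those pieces together, a sketch of the few-shot prompt template (variable names are assumptions; the input variables follow the default SQL chain, which expects input, table_info, and top_k; example_selector, example_prompt, _mysql_prompt, and PROMPT_SUFFIX come from the earlier steps):

```python
from langchain.prompts import FewShotPromptTemplate

few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,   # pulls similar examples from the Chroma vector store
    example_prompt=example_prompt,       # how each pulled example is rendered
    prefix=_mysql_prompt,                # the built-in MySQL instructions shown above
    suffix=PROMPT_SUFFIX,                # the closing "Question: {input}" section
    input_variables=["input", "table_info", "top_k"],
)
```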
Then we create the same chain as before — remember we created this SQLDatabaseChain object earlier; it's exactly the same code here — but now we add one more parameter, which is prompt. That is the only additional thing we pass, so that whenever the model is confused it uses this new information.
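A sketch of that chain creation (the import path for SQLDatabaseChain depends on your LangChain version — older releases expose it from langchain.chains, newer ones from langchain_experimental.sql; llm, db, and few_shot_prompt are the objects built above):

```python
from langchain.chains import SQLDatabaseChain   # or: from langchain_experimental.sql import SQLDatabaseChain

new_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, prompt=few_shot_prompt)
```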
Now let's give it the queries that were failing. "How many white color Levi's t-shirts do we have?" — if you remember, it was not using SUM before; now it is, so it worked. The second query is "how much is the price of the inventory for all small size t-shirts?" — previously it was not multiplying by the stock quantity. Let me show you: the earlier run said 215 because it did not multiply by stock_quantity; now it does, and it produces the right answer. And it's not just memorizing — you can give a slightly different query. I'll ask "how much is the price of all the extra small size t-shirts"; that wording is a little different, and it still works — see, it says size = 'XS'. Let's try the most difficult one we had, the discount question, but instead of Levi's I'm asking about Nike: "after discount, how much revenue will it generate?" It worked; it says brand = 'Nike'. You can change the wording a bit, because the SemanticSimilarityExampleSelector does a semantic search — you are not limited to the exact queries you put in your few-shot examples, you can pass somewhat different ones as well. So folks, try different queries, and it is possible that some of them won't work. In that case, take that question and the correct SQL query and add them to your few-shot examples. Right now I have five, but if you want all kinds of queries to work you might end up with 40 or 50 few-shot examples. Wherever it fails, take the question, take the right SQL query, add it to the few-shot examples, and after that it will not make that mistake.
example and after that it will not make a mistake all right we are all set to
a mistake all right we are all set to put all these things together and write
put all these things together and write a streamlit UI which you will see is
a streamlit UI which you will see is only few line of code so we are almost
only few line of code so we are almost there folks please stay on you have come
there folks please stay on you have come so far I know it requires a lot of
so far I know it requires a lot of patience but learning llm is amazing for
patience but learning llm is amazing for your career now we will write the code
your career now we will write the code for our project I have created at
for our project I have created at project folder here and here you will
project folder here and here you will find two files requirement txt and
find two files requirement txt and t-shirt sales Jupiter notebook from here
t-shirt sales Jupiter notebook from here I'm going to launch pyam community
I'm going to launch pyam community Edition which is a free editor for
Edition which is a free editor for Python and there you can open that
Python and there you can open that particular folder so I will go here and
particular folder so I will go here and in the C code directory I will open at
in the C code directory I will open at Le T project
Le T project folder like that and we'll not create
folder like that and we'll not create any virtual environment so I'll just say
any virtual environment so I'll just say cancel so there is no virtual
cancel so there is no virtual environment and let me pull uh this
environment and let me pull uh this window right here the first file we are
window right here the first file we are going to create is
going to create is main.py okay and in this file we are
main.py okay and in this file we are going to create our llm object so now
going to create our llm object so now now I will copy paste the code from our
now I will copy paste the code from our jup jupyter notebook to here we use
jup jupyter notebook to here we use jupyter notebook for all our
jupyter notebook for all our experimentation and this is what data
experimentation and this is what data scientists do in uh when they're working
scientists do in uh when they're working for companies they will do some
for companies they will do some experimentation in the notebook and when
experimentation in the notebook and when they feel the code is ready they will
they feel the code is ready they will try to productionize it and they will
try to productionize it and they will move it to a proper python file
move it to a proper python file structure or a project structure so
structure or a project structure so let's
let's import all those uh libraries so I'm I'm
import all those uh libraries so I'm I'm just going to copy paste all the
just going to copy paste all the libraries that we imported in our
libraries that we imported in our jupyter notebook you can configure your
jupyter notebook you can configure your python here and the first thing if you
The first thing, if you remember, was creating a GooglePalm object, so here I'm creating it, and we need to give it the Google API key. In production code you don't hard-code the key; the standard practice is to create an environment file, .env, and keep the key there. So the key sits in .env — how do I get it from there into main.py? We use the python-dotenv module and import load_dotenv from it. When you call that function it looks for the .env file, reads its contents, and sets each entry as an environment variable — the key name becomes the variable and this becomes its value. After that line the environment variable is set. To read it you need the os module: you use os.environ and look up that particular variable. And temperature is 0.1 — I will not keep the creativity very high, otherwise it will start bluffing.
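A sketch of that setup (the environment variable name below is an assumption; use whatever key name you put in your .env file):

```python
import os
from dotenv import load_dotenv
from langchain.llms import GooglePalm

load_dotenv()  # reads .env and exports its entries as environment variables

llm = GooglePalm(
    google_api_key=os.environ["GOOGLE_API_KEY"],  # assumed variable name from the .env file
    temperature=0.1,                              # keep creativity low so it doesn't start bluffing
)
```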
Once the LLM object is created, the next one is obviously the db object, and the third is our embedding. What I'm thinking is that I'll create one function that encapsulates all of this code, so let me put everything inside a function. Here I copy-paste all those pieces: if you remember, we had the embeddings, the vector database, and few_shots. few_shots is an array, and I would like to keep that array in a separate file, which I'll call few_shots.py; that file will contain all of it. Note that the answers in it are hard-coded, folks, because we give them to the LLM only as examples — "this is the format of your answer" — while the actual answer comes from executing the query, so keep that in mind. You can then import it: from few_shots import few_shots, so it will not give an error. After the example selector I just copy-paste the rest of the code, so we are creating the exact same SQLDatabaseChain and returning it from the function. Then we create a Python main block — if __name__ == "__main__", which by the way is how you create a main entry point in Python — where you get the chain, run it with whatever your query is, and put the result in a variable.
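As a condensed sketch, this is roughly what the file can look like at this stage, consolidating the pieces sketched in the earlier sections. The function name get_few_shot_db_chain, the database URI credentials, and the sample question are all illustrative placeholders, not the author's exact code.

```python
import os

from dotenv import load_dotenv
from langchain.llms import GooglePalm
from langchain.utilities import SQLDatabase
from langchain.chains import SQLDatabaseChain
from langchain.chains.sql_database.prompt import PROMPT_SUFFIX, _mysql_prompt
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector

from few_shots import few_shots  # the hard-coded examples list kept in its own file

load_dotenv()


def get_few_shot_db_chain():
    # LLM and database connection (URI credentials/host/database name are placeholders).
    llm = GooglePalm(google_api_key=os.environ["GOOGLE_API_KEY"], temperature=0.1)
    db = SQLDatabase.from_uri("mysql+pymysql://user:password@localhost/your_database")

    # Few-shot retrieval: embed the examples and select the closest ones per question.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    to_vectorize = [" ".join(str(v) for v in example.values()) for example in few_shots]
    vectorstore = Chroma.from_texts(to_vectorize, embedding=embeddings, metadatas=few_shots)
    example_selector = SemanticSimilarityExampleSelector(vectorstore=vectorstore, k=2)

    example_prompt = PromptTemplate(
        input_variables=["Question", "SQLQuery", "SQLResult", "Answer"],
        template="\nQuestion: {Question}\nSQLQuery: {SQLQuery}\nSQLResult: {SQLResult}\nAnswer: {Answer}",
    )
    few_shot_prompt = FewShotPromptTemplate(
        example_selector=example_selector,
        example_prompt=example_prompt,
        prefix=_mysql_prompt,
        suffix=PROMPT_SUFFIX,
        input_variables=["input", "table_info", "top_k"],
    )
    return SQLDatabaseChain.from_llm(llm, db, verbose=True, prompt=few_shot_prompt)


if __name__ == "__main__":
    chain = get_few_shot_db_chain()
    print(chain.run("How many white color Levi's t-shirts do we have?"))
```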
So what is my query? Let's just give a sample query to test. When you do this kind of coding it makes sense to write a little code, test it, write a little more, test again. So I run it here, and see — it gave the answer, and if you look at the generated query, the query seems to be right. You can try different queries here as well.
Let's say this is working; now we are ready to write the Streamlit code. For Streamlit I would like to keep the UI code in main.py and move all the LangChain code into a separate file — I'll call it langchain_helper.py, maybe. Let me just Ctrl+A, Ctrl+X, Ctrl+V — you know, Ctrl+C/Ctrl+V is the most powerful weapon of all programmers — and then, back in main.py, I will import that function.
Now let's do the Streamlit coding. You say import streamlit as st, then st.title — what is my title? My title is this. Then you add an input box, st.text_input, where the user types a question, and you capture that question in a variable. Then if question: — meaning if someone types a question and hits enter — the code flow comes here. First let's test this bare skeleton: in the terminal you run streamlit run main.py and it launches the UI in a browser. See, the UI looks good. When I type a question and hit enter, it goes into the question variable and the flow reaches this block. What we need to do here is first get the chain, then call chain.run with the question to get the answer, then st.header to put another element on the UI, and st.write to display the answer.
answer the good thing about streamlet is that you don't have to reun it from here
that you don't have to reun it from here you can go here and just click on rerun
you can go here and just click on rerun now let's ask those questions so I will
now let's ask those questions so I will hit enter here see hooray
hit enter here see hooray 3083 let's ask a different
3083 let's ask a different question and by the way if you want to
question and by the way if you want to see a correspond query since we have set
see a correspond query since we have set waros parameter to be true let's say you
waros parameter to be true let's say you get 59 answer you're not sure if it's
get 59 answer you're not sure if it's right or wrong you can either go to my
right or wrong you can either go to my SQL run the query or you can look at the
SQL run the query or you can look at the query here
query here see how many t-shirts do we have left
see how many t-shirts do we have left for Nike and exess and white see Nike
for Nike and exess and white see Nike exess white sum of stock quantity this
exess white sum of stock quantity this is
is perfect and then um yeah so folks you
perfect and then um yeah so folks you can ask uh different questions and get
can ask uh different questions and get your answers now this tool will be very
your answers now this tool will be very useful to our store manager Tony Sharma
useful to our store manager Tony Sharma because he will be able to ask questions
because he will be able to ask questions directly and get answers on most of the
directly and get answers on most of the questions that's it folks we are done
questions that's it folks we are done with this gen AI mini course we also
with this gen AI mini course we also finished two end to end projects which
finished two end to end projects which is something you can add in your resume
is something you can add in your resume if you like this video please give it a
if you like this video please give it a thumbs up and share it with your friends
thumbs up and share it with your friends who wants to learn gen AI if you have
who wants to learn gen AI if you have any questions there is a comment box
any questions there is a comment box below