Fine-tuning is an alternative technique to RAG for enhancing Large Language Models (LLMs): it lets them absorb information too large to fit in the context window, adopt specific styles, gain domain-specific knowledge, or handle a task with a smaller model, often more precisely than prompting alone.
Whereas RAG gives you one way to give additional information to a large language model, there's another technique called fine-tuning, which is another way to give it more information. In particular, if you have context that is bigger than can fit into the input length, the input context window length, for the LLM, then fine-tuning gives you another way to get an LLM to absorb this information. Fine-tuning also turns out to be useful for getting the LLM to output text in a certain given style, but its actual implementation is a bit harder than RAG. Let's take a look.

Let's say you have an LLM trained the way that we had described previously, with sentences found on the internet like "My favorite food is a bagel with cream cheese." Then it may have learned from hundreds of billions of words, or maybe more than a trillion words, to predict the next word like this.
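
As a rough illustration of next-word prediction, the sketch below turns the example sentence into (context, next word) training pairs. It splits on whitespace for simplicity, which is an assumption: real LLMs work on tokens rather than whole words, and the function name `next_word_examples` is made up for this sketch.

```python
def next_word_examples(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into (context, next word) training pairs."""
    words = sentence.split()
    # Every prefix of the sentence is an input; the word that follows it is the target.
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]


pairs = next_word_examples("My favorite food is a bagel with cream cheese")
for context, target in pairs:
    print(f"{context!r} -> {target!r}")
```

Each pair is one training example: the model sees the context and is trained to predict the word that follows.
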
An LLM like this will have learned to generate text that sounds like what's on the internet, and this process of training a large language model on a lot of data is often called pre-training.

Now, let's say I want to modify the LLM to have a relentlessly positive and optimistic attitude about everything. There's a technique called fine-tuning that we can use to cause the LLM to do a little bit more learning, to change its outputs to be, in this example, much more positive and optimistic. To fine-tune the LLM, we would come up with a set of sentences, a set of texts, that takes on a positive, optimistic attitude, such as "What a wonderful chocolate cake" or "The novel was thrilling." Given text like this, you can then create an additional data set. Using "What a wonderful chocolate cake": given "What," the next word it will try to predict is "a"; given "What a," the next word is "wonderful"; given "What a wonderful," the next word is "chocolate"; and so on. And it turns out that if you take an LLM that has been pre-trained on hundreds of billions of words and fine-tune it on just an additional, say, 10,000 words or more (could be 100,000 words if you have more data, or even a million words if you have even more data), fine-tuning on this relatively modest-sized data set can shift the output of your LLM to take on this positive, optimistic attitude.

Now, maybe shifting
an LLM to have a relentlessly positive attitude isn't that helpful an application, but fine-tuning is used in many real applications. One class of applications where fine-tuning is useful is when the task isn't easy to define in a prompt. For example, if you want to use an LLM to summarize customer service calls, a generic LLM may look at a call like this and summarize it to say, "The customer tells the agent about a problem with a monitor." But if you run a customer call center, you might want it to generate specifics about what the conversation was about: it was about the MK 4127 KX, reported broken by customer 542, and so on. Suppose you create a data set with maybe just hundreds of examples of summaries written by human experts. A large language model has learned from hundreds of billions of words on the internet, so it has learned a lot of general knowledge, but if you additionally fine-tune it on maybe just hundreds of carefully handwritten summaries in this specific style, then that would shift the LLM's ability to write summaries in the style that you want. And the specific style of summary is actually not that easy to define in a text prompt. Maybe you could do it, but fine-tuning would just be a very precise way to tell the LLM what summaries you want.

Another example of
when a task isn't easy to define in a prompt is if you want to mimic a specific writing or speaking style. Tommy Nelson, who's been working with me on this course, actually tried, kind of just for fun, to get an LLM to sound like me. But it turns out that the way most individuals sound is not that easy to describe in a prompt. I mean, how would you give someone clear instructions to sound like me? So if you were to prompt a general-purpose LLM and ask it to sound like me, you get text like this, which I don't think sounds that much like me. But if you were to take a lot of transcripts of the way I actually talk and fine-tune an LLM to really sound like me by learning on my actual words, then asking it to write something that sounds like me results in text like this, which, I don't know, sounds more like how I would talk. Because mimicking a specific writing or speaking style is very difficult to do via prompting, since it's just difficult to describe a specific person's style by writing text instructions, fine-tuning turns out to be a more effective way to get an LLM to speak in a certain style.
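
Fine-tuning data for cases like the call summaries or a personal speaking style is, in the end, a collection of example texts. The sketch below shows one way such examples might be stored, as JSON Lines with one example per line. The field names `input` and `target`, the placeholder texts, and the helper `to_jsonl` are all assumptions for illustration; the format a given fine-tuning tool expects may differ.

```python
import json

# Hypothetical records: the field names and placeholder texts are
# assumptions for illustration, not a required format.
examples = [
    {"input": "<transcript of customer service call 1>",
     "target": "<summary of call 1 written by a human expert>"},
    {"input": "<transcript of customer service call 2>",
     "target": "<summary of call 2 written by a human expert>"},
]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

A real data set for this purpose would hold a few hundred such pairs rather than two.
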
And if you're building an artificial character, maybe a cartoon character, fine-tuning could also be a way to get an LLM to speak in a certain style.

Other than tasks that aren't easy to define in a prompt, a second broad class of applications of fine-tuning is to help the LLM gain a domain of knowledge. For example, if you want an LLM to be able to read and process medical notes, this is what a medical note written about a patient by a doctor might look like, and this is really not normal English. "Pt" is patient, "c/o" is complaining of, "SOB" is shortness of breath, "DOE" is dyspnea on exertion, "PE" is the results of the physical examination, and so on. Treatment is to follow up with the primary care physician, a stat chest X-ray, and continuing treatment as needed on oxygen. But this is really not normal English, and if you were to take an LLM trained on normal English, it wouldn't be very good at processing text like this. So if you were to fine-tune an LLM on a collection of medical records, then the LLM could get much better at absorbing this body of knowledge about what medical notes sound like, and you could then use that to build other applications on top of it to better understand medical records, or
legal documents.

Here's a piece of legalese, kind of written by lawyers for lawyers, that's really difficult for non-lawyers to read: "Licensor grants Licensee, per section 2(a)(3), a non-exclusive right," and so on and so on, "within 15 days hereof." I don't know about you, but I do not use the word "hereof" in my ordinary day-to-day speech. But this is what legal documents sound like, and if you want your LLM to gain a body of knowledge about how to read and understand legal documents, then taking an LLM and fine-tuning it on legal documents would help it to gain that body of knowledge. And similarly for financial documents, too: fine-tuning an LLM on a large set of financial documents would help it to better gain that body of knowledge about finance and make it better at applications involving processing documents that look like this.

Finally, another reason to fine-tune an LLM is to get a smaller model to perform a task that may previously have required a larger model. We'll discuss later this week some of the pros and cons of choosing a larger versus a smaller model.
But for some applications that need a lot of knowledge or need complex reasoning, you might use a relatively large model, say with over 100 billion parameters. A model like that may have relatively high latency, meaning that after you prompt it, you might need to wait a while to get back a response, and if you were deploying it on your own computers, it could be quite costly. Even though we said in the earlier video that these models aren't that expensive, maybe you want it to be even cheaper. That's because a 100-billion-parameter model may take specialized computers, such as a GPU server or other really fast computers, to run. You'd probably have a hard time running such a large model on a normal laptop or PC, and certainly not on a smartphone today. But if you can get your application to work on a much smaller model, say 1 billion parameters, then that's the range of model size that would run much more easily on a laptop or a PC or on a mobile phone.
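
A rough back-of-the-envelope calculation shows why model size matters for where a model can run. The sketch below estimates the memory needed just to store the weights, assuming 2 bytes per parameter (16-bit numbers). That figure and the function name `weights_memory_gb` are assumptions for illustration, and running a model needs memory for more than the weights alone.

```python
def weights_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Approximate memory needed to hold the model weights, in gigabytes.

    Assumes every parameter takes `bytes_per_parameter` bytes (2 = 16-bit).
    """
    return num_parameters * bytes_per_parameter / 1e9


large = weights_memory_gb(100e9)  # 100 billion parameters
small = weights_memory_gb(1e9)    # 1 billion parameters
print(f"100B parameters: about {large:.0f} GB")
print(f"1B parameters: about {small:.0f} GB")
```

Under that assumption, the 100-billion-parameter model needs about 200 GB for its weights alone, while the 1-billion-parameter model needs about 2 GB.
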
For example, if what you want is to classify restaurant reviews as positive or negative sentiment, this is a simple enough task that you probably don't need a 100 or 200 billion parameter model, and maybe a 1 billion parameter model would be just fine, maybe even smaller, frankly. But these smaller models aren't as good as the really large models, which is why, if you were to take a small model and then fine-tune it on a data set like the one shown here (not just three examples, but maybe a few hundred or maybe a thousand examples, if you have that much data), then you can get a small model, say a billion parameters, to do really well on a task like this.

So, to summarize,
fine-tuning gives you another technique, in addition to RAG, to help improve the capabilities of an LLM. You might use it for tasks that are hard to specify in a prompt, such as if you wanted it to output text in a certain style, or if you want the LLM to gain a body of knowledge, such as about medical notes, or if you want to get a smaller and cheaper-to-run LLM to do a task that might otherwise have required a larger LLM.

It turns out that RAG and fine-tuning are both relatively cheap to implement. RAG is just modifications of your prompt, and with fine-tuning you might be able to get started with tens of dollars or maybe low hundreds of dollars, depending on how much data you want to fine-tune on. There's another technique, pre-training your own model, that turns out to be very expensive, and today almost no one other than reasonably large companies, usually tech companies, is attempting it. But for completeness, let's take a look at it in the next video.