YouTube Transcript:
Visiting Professor Lecture: Prof. Tsunglin Liu - Challenges and Fun in Exploring Biological Big Data
Video Transcript
the foods that this is milkfish, mango,
sliced beef. Yeah. Very nice. Wonderful.
And I think you all know this.
Yeah. This is the great invention
of Taiwan.
>> All right. So I really feel quite
familiar with you, because in our
university, and even in our college, we
have lots of Indonesian students. For
example, among our 194 undergrad
students, 36% are international
students. Oh,
>> and most are from Malaysia, but we
do have lots of students from Indonesia,
right? We had 26 Indonesian students in
our college, and you can see them here;
they went for hot pot.
Okay. And they have a very nice
tradition, which is that the second-year
Indonesian students will treat the
first-year Indonesian students. Got it?
So if you ever go to Taiwan, especially
to Cheng Kung University, you'll be
treated by your senior students. And
these are some of our graduate students;
we have six Indonesian students
studying for a master's or a PhD.
And actually, this is one of my
college students. He's now doing
his special topics in my lab. He is
quite talented. We have an
internship program, so he did his
internship at Academia Sinica, which is
the highest research institute in
Taiwan, and he got second place. Yeah,
he's a very talented student. So I have
a very good impression of you. And
he has now got an exchange student
opportunity. He is going to, I think,
Europe for a semester, supported by
our college. Okay.
Yeah. So we do have some exchange
student opportunities between NCKU and
many other universities
all over the world. So if you ever
visit NCKU or study at NCKU,
you'll have the chance to move on to
other countries.
So, all right. You will
come to Taiwan, but you don't come to
Taiwan just for the food, right? You
want to study, you want to gain some
skills. And so this picture
summarizes what we are doing in our
college. You can see that we have
professors studying tomato. Yeah. She is
studying the sugar transport mechanism
in tomato.
uh can you guess what this is?
>> Algae. Okay. Yeah. So we also study
microalgae. And one of the
applications is quite interesting.
Have you ever heard of DHA?
That kind of fatty acid. DHA is a very
good supplement. Yeah. Very good for
your brain.
>> That's why you don't have
>> Yeah. So this particular
microalga is very amazing. It can
store up to 69% of its body weight as
DHA. So we are trying to
decode the genome and to see
what's going on, how it is able to
tolerate such a high concentration of
DHA in its body.
Right. So we have some other research
topics, for example, the shrimp. Okay.
So one of our professors is studying
shrimp. She's studying the white
spot virus
in the shrimp. Okay. So she's studying
the mechanism of this white spot virus
infection.
Okay. So after lots of work to figure
out the fundamental mechanism of this
virus infection, she
figured out a way to prevent the
infection. Okay. And so she's able
to raise the shrimp to a very large
size. Okay. This big.
And if you go to NCKU, you can buy
the shrimp from our souvenir shop.
Just put it in the oven. Yeah. Just
steam it a little bit. And then you
can enjoy the research findings. Okay.
So we are trying to translate
fundamental research to
industry. For example, one of our team
members is working on inflammation
studies. He found a
compound that can reduce inflammation.
So he just thought, oh, perhaps
I can mix the ingredient into
cosmetic products.
Okay. For example, if you are young,
and I think most of you are young, maybe
you have pimples on your face. It's a
kind of inflammation, right? So
you can try cosmetic lotions with an
anti-inflammation function. That would
be nice. Great. Right. So again,
if you come to NCKU, you can find
All right. So we have many different
research topics. Actually, most of the
topics are also studied by your
faculty members here. So you are in a
And a little bit of self-introduction.
I actually never studied biology
in high school. Okay. Because at that
time we were allowed to skip biology
if we were not going to medical school.
Okay. If you wanted to go to engineering
schools, like physics, like
chemical engineering or electronic
engineering, then you didn't need
biology.
So at that time I was lazy. Yeah. One
subject less, then it's easier. So I
didn't study biology, and I did my
bachelor's degree in physics and also my
PhD in physics. Okay. So this is the
thesis topic of my PhD:
statistical physics of RNA secondary
structure. Okay. And why I picked
this is because when I went to Ohio
State University, where I did my
PhD, this is my adviser. He
came and gave a lecture about
RNA secondary structure and
showed me that, for this RNA
secondary structure, if you want to
describe the properties of the structure,
you can use a formula to describe
it. I was like, oh, most
physicists love formulas, and it was even
surprising that I can use a formula to
describe a biological molecule. Wow, it
was fantastic to me. So I ended
up studying in his lab. And I love
this formula a lot.
>> Yeah. And after I graduated,
I was looking for a postdoc position.
Because I was studying RNA secondary
structure, I thought, oh yes, it's
related to biophysics. I still wanted to
find a job in the field of physics, but
related to biology. So this professor,
Boris Shraiman, is a physicist at UC
Santa Barbara. He's doing biophysics, and
I thought, oh, it's great. Yeah, this
is my target. I tried to call
him, and we chatted a little bit
on the phone, blah blah blah, and then
he said, okay, you can come. Okay. I went
to UC Santa Barbara. I imagined
myself staying in an office with
blackboards and people writing with
chalk. Yeah, blackboards.
>> I I just learned that you are still
writing on here.
I love this kind of stuff a lot in 360.
Yeah. But now I do this.
I use only PowerPoint nowadays. Okay.
But it turned out that when I went to UC
Santa Barbara, Boris Shraiman was
actually looking for a postdoc for his
friend Ken Kosik, a biologist in the
neuroscience institute, and so I ended up
staying in Ken Kosik's lab. Okay. And
doing bioinformatics.
Okay. Yeah. That's how I started my
bioinformatics career. This is what
my lab looks like. So
I'm just sharing with you that your
career path is usually full of surprises.
You think that you are going to be a
biologist; maybe 10 years later
you are an AI engineer. Okay.
So try to stay open-minded, try to
learn something. Yeah. It
can be beyond your... Okay. And then
when I came back to Taiwan, I went to
Academia Sinica, which is the highest
research institute, and my adviser told
me, hey, you should start working on the
so-called next-generation sequencing.
Okay. And since then I have been doing
data analysis on next-generation
sequencing data till now. Okay. Yeah. So
this is my adviser. Yeah. He guided me
through my life. And today
I'm going to tell you some stories about
this machine. Okay. This is the so-called
next-generation sequencer.
And before that, I want you to guess.
Around 2007, this was the first version
of the Illumina sequencing machine, the
Genome Analyzer. So how much do you
think it was at that time? Yeah.
>> Yes.
>> What kind of dollar is it? Like US
dollar or
>> US?
>> US dollar.
>> How much?
It's almost $1 million. You're right.
It's like 0.8 million.
>> Okay. Yeah. So this is the machine:
$0.8 million around 2010. Okay. Around 15
years ago.
And I want to say that
when I went to NCKU,
the professor next door to me, yeah, he
bought this. He was very rich.
At that time I was
teaching a class, and I told my
students, let's go to the professor's
lab. Let's check out this machine. Yeah,
it enjoys life better than me. It just
stays in an air-conditioned room at
24°C all year round. Okay. Sitting on a
table like in shaking.
So, yeah, it really enjoyed
its life. But about five years
later,
the professor next door
shut it down.
And imagine that you spent nearly
$1 million and after five years shut it
down, just like throwing it away. Yeah.
No way. Yeah.
And why is that? Why do you think he
shut down
the sequencer? Yes.
Don't worry. I still have candy for you.
>> Wait.
>> The technology in the machine is
outdated compared to other machines
that have been developed.
>> So it was outdated. Yeah. And you know
what outdated implies?
Why? When something is outdated,
like my iPhone, my iPhone is still an
iPhone X.
I'm still using it. Okay. But
why is it that if a machine
is outdated, you don't want to use
it anymore? That's because when
new machines come out, the cost of
sequencing decreases.
Okay. So suppose we need to spend $100
using the original machine, and using
the next version it is
50. The next one's 25. Next one's like
20 50. Yeah, $15.
So that's why the professor shut it down
and outsourced the samples. Okay. So
this one is actually what I prefer to
call now-generation sequencing,
and it can still be quite expensive. Not
as expensive as when it just came out,
but still expensive. And so this is
the number of sequencers in 2015,
and this is a world map of NGS.
People abbreviate next-generation
sequencing as NGS.
And you can see that there are lots
of them; these are like three-digit
numbers.
And so, for example, can you figure out
where this is?
>> Yeah.
The test of your knowledge.
>> Yeah.
>> Not really.
>> England.
>> New England.
>> Near Maine.
>> It's like near Maine.
>> Near Maine. Northeast.
>> But I'm not sure about this. How about
this? Yeah. One of these two.
>> Okay. New York. It's somewhere near
Harvard and MIT. Okay.
>> So there are hundreds of such
sequencers
in their university. Okay. And
there are like two big hot spots of
sequencers in China.
>> Yes. Everybody knows. Yeah. Okay.
Yeah. So imagine that one single
machine costs nearly $1 million.
How much does the company earn? Yeah.
If you were somehow lucky enough
to buy shares of the
company, you would be a rich guy right
now. I wasn't so lucky. Okay. Yeah, I
should have bought the stock.
All right. So there is a reason why
these sequencers are so popular: because
they generate sequences like this. Okay.
So you can see that it's lots of
sequence, C, A, C,
and it's
pretty long, like 300 bases.
And the amazing thing is that it can
generate billions of such sequences.
Before NGS, do you know how many
sequences? Before NGS, the gold standard
of sequencing was called Sanger
sequencing. Okay. And guess how many
sequences you can get by doing Sanger.
How many?
>> Thousands.
Fewer.
50.
Just one.
If you do Sanger sequencing, you get
only one sequence at a time. Okay. Just
one sequence. And NGS generates
millions of sequences at a
time. Okay. So now you can start
to appreciate its value, right?
It's like a million times faster
compared to the old standard, Sanger
sequencing. Okay. And so imagine that
you have such kind of data, NGS
sequences, millions of them. And
so what's next? Once you have this data
at hand, what's next?
Okay. So the next
thing to do is to extract meaning from
these sequences. Okay. That's why we say
"to decode the genome",
right? So let me give you
some examples of what I
mean by extracting meaning from
sequences.
All right. So imagine that you have such
a machine and you are able to
sequence the genomes of many
different species. Okay. So the big
question will be: what can we learn
from sequencing one organism or
many organisms in different
environments?
What can we learn from that? So let's
make it concrete and focus on
humans. Okay, imagine that you are
sequencing the human genome. What could
you learn from sequencing the human
genome?
Okay, so now discuss with your neighbors
and come up with a potential application
that you've learned. Yeah, from
>> Come up with a um spell for cancer.
I don't know
About cancer. Okay. So what about
cancer? Yeah. If you
sequence, what do you learn about cancer?
Yeah. Actually this is
one of the right applications. And
if you want to treat cancer,
one way is to do
sequencing. Let me
jump to the application.
Okay. So this is the field of
application called precision medicine.
Okay, precision medicine. And now I want
you to read the figure and tell
what it means. Okay. Yeah. Biology
students should be good at reading
figures. Okay. Yeah. Yes.
>> You you you you know that fine.
>> You haven't got the candy yet.
>> To illustrate the picture, right?
>> Yeah. That's right. How do
you explain this picture? What's
the idea of precision medicine?
>> So, what I can
>> How about you use the mic? Oh, yeah.
>> From what I can get from precision
medicine, if we can learn individual
genomes, hopefully we can give people
medicine that can more effectively treat
their conditions and health problems.
Whereas with traditional medicine, it's
one size fits all. Some medicines might
work less for some people, some
medicines might work more. With
personalized medicine, you
can get something that's tailored for
your needs, thus treating your issue.
>> How do you learn this? Where do
you learn that from?
>> Or you just know it. Yeah. From a
>> clue.
>> Usually I watch YouTube a lot.
>> You watch YouTube?
>> Oh, okay. Watch YouTube.
>> I also watch content, especially
in fields more towards health and
medicine. I do have quite an interest
in that.
>> Oh, what kind of content? Like the
great biological
>> Like, for example, the progress of
medicines and research on
ways of treating tuberculosis and
>> Yeah, so you read some news. Yeah.
>> Public Library of Science?
Is there a magazine or website or channel
>> Yeah. YouTube channels.
>> YouTube channels.
>> Yes.
>> Okay. YouTube is your best friend.
>> Yeah. YouTube.
>> All right. So that's one
very important application. So now,
in the clinic,
the most applicable scenario is
Yeah. And have you ever
heard of targeted therapy?
>> Yeah, targeted therapy is pretty
expensive. Yeah. I know the price
in New Taiwan dollars, so let me convert
it. Okay. So how much do you think, if
you want to take the targeted therapy
for treating your case, how much you
>> Half a million.
>> Depends on the country.
>> Huh?
>> It really depends on where you live.
>> Yes.
>> Some countries charge you more.
>> For example, America, with their
health system.
>> Yeah.
In Taiwan, you pay around $3,000 US.
>> 3,000. But that's already a lot.
>> Yeah.
>> Yeah. Most Taiwanese make only
like 1.5 thousand a
month, so it's like twice their
monthly salary.
>> So targeted therapy is pretty expensive.
So before you take the treatment, we
need to make sure that it will work.
>> So how do people make sure
that targeted therapy will work?
How?
Right. So you sequence your genome and
obtain lots of this,
and then we try to convert it to
something.
Yeah, like that. Okay. So you'll find
that within your genome, or actually
within the genome of your cancer cells,
not your normal cells but your cancer
cells, right, at some positions of
the genome there is a mutation,
and these determine which kind of
targeted therapy you need. So
sequencing is very important, right? And
also, imagine that you obtain
sequences. All right. So usually what's
the first thing that you do? Okay. So the
first thing that you do is that, yeah,
you want to figure out our human
genome.
And so for our human genome, it's good to
always know some numbers, you know, the
size, you know, how many bases there are.
So it's very huge. And of all
our human genome,
if you take all the genomic segments
that are within a gene and you count the
segments of protein-coding genes, what
do you think? What's the percentage
of, you know, protein-coding genes?
>> 15%.
>> 50%.
>> 2%.
It's a little bit more. Yeah,
>> it is like between 2 and 5%. Okay.
Yeah. So between 2 and 5%. So you can
see that the gray area here, they're
not genes, not protein coding.
Okay. So if you ever learn this fact,
yeah, do you feel anything?
Do you have any feeling about this fact?
>> Smaller than expected.
>> The coding region is much smaller
than expected.
>> Yeah, it's very small, right? So most
of the genome,
>> they are not protein coding; they don't
code for proteins. Okay. So what do they
do, the rest of our genome? Are they
junk?
>> They are junk.
>> This is very interesting.
>> Very interesting. It turned out that
there are some so-called
non-protein-coding genes within the
genome. So, for example, small RNA, long
non-coding RNA, many different kinds of
non-protein-coding genes exist in the
genome. People used to think that the
genome is full of junk. But now they have
started to learn that, yeah, some people
call them dark matter,
but people are still to realize the
value and meaning of this. So this is
something that we can learn
by sequencing the human genome. Okay.
And in
order to do that, you need to
first know the regions of protein-coding
genes, right? We want to build a function
for that.
And you know what? Yeah, you can do that
by writing computer code,
writing a program to predict
where in the genome there is a gene
coding for a protein product.
Isn't it amazing?
And the program is able to
tell whether this part is
protein coding or not, whether it's a
protein-coding gene.
Amazing, yeah, amazing to me: you write a
program, and this program is able to
decode the meaning.
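To give a rough feel for what such a program can look like, here is a minimal Python sketch. It is only a toy and not the method used in the lecture: it scans the forward strand for open reading frames (a start codon ATG followed in the same frame by a stop codon). Real gene predictors for the human genome also have to handle introns, the reverse strand, and statistical models. The function name `find_orfs` and the `min_codons` parameter are illustrative assumptions.

```python
# Toy gene finder: report open reading frames (ORFs) on the forward strand.
# Assumption: a "gene" here is simply ATG ... stop codon in one reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return a list of (start, end, sequence) tuples, one per ORF found.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts every codon in the ORF, including start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None                    # close it and keep scanning
    return orfs


if __name__ == "__main__":
    for start, end, seq in find_orfs("CCATGAAATTTTAGGG"):
        print(start, end, seq)
```

Running it on the short made-up sequence above reports a single ORF, `ATGAAATTTTAG`, starting at position 2.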
Okay. And then, just like I said, you
can figure out whether your genome
carries a mutation at a certain position
and whether the mutation will affect
your phenotypes, right? So when I say
phenotypes, this is one example.
>> Yeah. Have you ever watched this
movie, The Hangover? Okay.
And I think some of you, when you drink,
you'll get flushed. Yeah, you get
red, right?
>> And that's actually because there's a
mutation
in your gene, which is called alcohol
dehydrogenase. And so if you carry a
certain mutant in your alcohol
dehydrogenase gene, then you
get flushed easily when you
drink. Okay, you also get drunk more
easily. Right? So this
is an example of how the knowledge
of the human genome can tell you, what we
can learn.
Yeah. And so now, are you interested in
learning whether you carry the mutant or
not?
>> Right. Yeah. You know, nowadays, if you
are going to sequence your genome, how
much does it cost
to sequence, to know your genome, to
know the mutations, the variants?
>> 100 to 100,000.
>> $1,000.
>> $1,000,000.
>> It's okay. It's okay. Right.
So if I ask whether you want to
sequence your own genome by spending
$1,000.
>> Depends on whether you have money or
not.
>> Yes. So I asked my students in Taiwan.
They also say, no, I'd rather not know my
risk of developing a certain disease.
Yeah. But if you somehow
develop cancer, you have to do it. Okay.
So NGS is now applicable, is now in
practice in many hospitals in
Taiwan. Okay. Especially when cancer
patients need to decide whether
they want to take the targeted therapy
or not. Okay. So that's, yeah, one
important area.
Actually, by the way, it is
very important to know how much you can
drink.
>> And the chance of having a party is
much higher.
In a party it is unavoidable
that someone will encourage you to drink,
and you don't want to be wasted. Okay. So
you want to know very well how much you
can drink. Okay.
>> And so I usually encourage my kids,
yeah, my students: spend one day
staying at home with your parents. Just
drink as much as you can
until you pass out, and then you know how
much you can drink. And that
knowledge is very important to keep you
safe. Yeah. So when you have a
holiday and go back home, yeah, give that
a try. Okay, this is very important.
>> All right.
Another very important application of
NGS in humans.
Yeah. So can anybody tell
the story from the figure?
This is the topic; the title is
Do you recognize
what this is?
>> What?
>> It's a syndrome. It is called
Down syndrome.
>> Ever heard of Down syndrome?
>> Down syndrome. And it's because your
chromosome 21
has three copies instead of two. Okay.
Then usually the kids will grow like
that. Okay. And so, beforehand,
it is
important to know whether the kid
has the syndrome.
But what's the method to check?
Yeah. Before NGS,
what people did was use a needle and
poke it into the uterus of the mother.
>> Yeah. And then take something like that,
and there is some of the baby's DNA in it.
And then do the sequencing, okay,
or maybe not sequencing, but they
can tell whether there are three copies
of chromosome 21
in the amniotic fluid. Okay. So that's
But you know this is kind of dangerous,
right? A needle in the uterus, maybe
like that, right. Yeah.
>> Okay. So the risk of
accidents doing this kind of test is
also pretty high. It's like
one in every 500 pregnant women.
>> So now, with NGS, what we can do is
take the mother's
blood. The baby is connected to the
mother, right, through the cord, so the
baby's DNA will also be present in the
mother's blood.
And guess what's the percentage
of DNA in the blood that is
from the baby?
>> 10%.
>> Why are you so close?
It's about 10%. 10% of the DNA
in the mother's blood
comes from the baby. Okay. So we
can sequence the mother's blood DNA and
then figure out whether the baby has
three copies of chromosome 21. Okay. So
it's much safer; it's called non-invasive.
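The reasoning behind the blood test can be sketched with a little arithmetic. If a fraction f of the DNA in the mother's blood comes from the baby, and the baby has three copies of chromosome 21 instead of two, then the share of chromosome 21 DNA rises by a factor of ((1 - f) * 2 + f * 3) / 2 = 1 + f / 2. With f around 10%, that is only about a 5% increase, which is why many reads have to be counted. The Python below is a minimal sketch under that assumption. The function names, the read counts, and the idea of comparing against a few reference pregnancies with a z-score are illustrative choices, not the clinical procedure described in the lecture.

```python
from statistics import mean, stdev


def expected_chr21_ratio(fetal_fraction):
    """Relative chr21 dosage in maternal blood if the fetus has three copies.

    Maternal DNA (1 - f) carries 2 copies, fetal DNA (f) carries 3 copies,
    normalised by the usual 2 copies: ((1 - f) * 2 + f * 3) / 2 = 1 + f / 2.
    """
    return 1 + fetal_fraction / 2


def chr21_fraction(read_counts):
    """Fraction of all sequenced reads that map to chromosome 21."""
    return read_counts["chr21"] / sum(read_counts.values())


def chr21_z_score(sample_counts, reference_fractions):
    """How many standard deviations the sample lies above normal pregnancies."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)
    return (chr21_fraction(sample_counts) - mu) / sigma


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    reference = [0.0129, 0.0130, 0.0131]
    sample = {"chr21": 1365, "other": 98635}
    print(expected_chr21_ratio(0.10))
    print(chr21_z_score(sample, reference))
```

A large positive z-score in this toy setup would suggest more chromosome 21 DNA than expected, which is the signal the test looks for.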
They don't need to put a needle
into the uterus anymore. Okay. So these
are the important applications, and there
are many other applications. For example,
we can compare. Just now we were talking
about the human genome, right? And we can
also sequence the genomes of different
species, for example, the chimpanzee.
Right? But what can we learn from that?
>> Similarities.
>> The origin.
>> So why do you want to
>> Changes, the
evolutionary pattern?
>> Find the evolutionary pathway?
>> Pattern. Pattern.
>> Pattern. Evolutionary pattern. By the
way, do you know how different you are
from your neighbor?
>> So, like what I said, everyone
always carries mutations, right? So you
and your neighbors, your classmates,
are different in your genomes. And
what's the percentage of
your genome that is different from your
classmate's?
>> Less than 1%.
>> Less than 1%. Yeah. But how much?
>> 0.7.
>> It really depends on your
friend's ethnicity and all that, because
here we have to
>> Yeah. Yeah. Half and half.
>> I would say, maybe my guess would be
wrong, but it would be around 0.5, I
guess.
>> Smaller.
0.3.
Smaller.
>> 0.1.
>> One in a thousand bases in your genome
is different from your classmate's.
Yeah. So that's the difference. Now,
what's the percentage of difference
between a chimp
and a human?
What's the percentage difference?
It's about one percent,
only 10 times the degree of
difference between you and your
classmates. Okay. So although they look
very different, actually they're very
similar if you check their genomes. Okay.
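These percentages are easy to turn into numbers. The Python sketch below counts the fraction of positions that differ between two already aligned sequences, and converts a percentage into an approximate count of differing bases. The genome size of about 3 billion bases is an assumption added here (the figure is not given in this part of the transcript), and the function names are illustrative.

```python
# Assumption: a human genome of roughly 3 billion bases.
GENOME_SIZE = 3_000_000_000


def fraction_different(seq_a, seq_b):
    """Fraction of positions that differ between two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def expected_differences(fraction, genome_size=GENOME_SIZE):
    """Approximate number of differing bases for a given fraction."""
    return round(fraction * genome_size)


if __name__ == "__main__":
    print(fraction_different("ACGTACGTAC", "ACGTACGTAA"))  # one mismatch in ten
    print(expected_differences(0.001))  # you versus a classmate
    print(expected_differences(0.01))   # human versus chimp
```

Under that assumption, 0.1% corresponds to roughly 3 million differing bases between two people, and 1% to roughly 30 million between a human and a chimp.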
Right. So what can you learn? You have a
chimp genome at hand, and, you know, you
can learn about diseases from it.
So what does it have to do with the
origin?
Okay. So the HIV
virus actually came from chimps,
>> right?
>> So, yeah, by understanding that,
maybe you can get a
hint on how to deal with the disease,
right? All right. So any other
possibilities? Yeah. What we can learn
from
>> Oh wow. So, interesting: trees. Yeah.
Because she's interested in the history
of all species, right? It's indeed
beautiful. Yeah. All the histories of
different species we we all are but we
exchange that birds and fish are all
evolutionary history. Okay. So after all
these examples, I just want to say
that there are
so many things we can do with sequencing
the genome, with NGS, and
that's why so many people in the world
are using NGS.
Okay. Yeah. They generate so much data,
and the task of a bioinformatician is to
extract the meaning from the data.
Okay. So it's a wonderful
direction of research. You are able to
extract the meaning from this ATCG,
and you all can make some contributions.
Yeah. Many different studies start
with interesting questions. Yeah, like I
just told you. Okay. So bioinformatics is
quite important. And so
this is
also, I think many of you are working on
microbiology studies, that not only
the genomes of big species like chimp,
dog, fish; we can also sequence the
genomes of bacteria, because there are so
many applications.
Yeah, some bacteria can produce vitamin
A, can produce certain enzymes used
for many different reactions, right? So
in order to know how to manipulate
these bacteria, yeah, it is very useful
to know the genome. Okay. So that's one
potential application. And also, microbes
are everywhere: in the soil, in the air,
in water; they are even in the ice. Okay.
Even in fast break. Yeah. There are
bacteria. Okay. And there are lots of
microbes in our human
body sites, all kinds of body sites. So it
is very interesting to explore
the human microbiota. Okay. So that's one
of the major themes of my study. Okay.
Human microbiota.
And I think it's already about an
>> okay so how about we take a break okay
then we'll come back I'll show you what
I did All
right, let's have a break. And if you
it's not good. Uh then you can set up
that small window between
the time the time when you can switch
the DNA and when it actually started late
get his the
the first
ice. Okay.
So
I'll continue the topic of the human
microbiota, because it's one of my
research topics.
>> Yeah, the
>> zoom is uh
>> connect. Oh, wait. I think it's fine.
And then you share again
getting
Before I jump into my research topic,
I just want to give you one
last example of the importance of the
microbiota. Okay. So here is a Nature
paper experiment. Okay. A very nice
experiment. What can you tell?
>> Basically, this picture is showing
how your gut microbiome could
affect how your body
works, how your body
functions. For example, twins are
identical
genetically, right? But if you
transplant two
different microbiomes into the twins,
then it could affect how they absorb
their nutrition, and
ultimately it could affect
how
obese one mouse could be compared to its
twin.
>> Yeah. Exactly. Very good.
So if you transfer the
gut microbes to
the mouse, okay, and then you have this
mouse that has microbes from this
obese twin. Okay. And then this one will
end up being fat. Okay. If this
one has microbes from this twin,
then it will remain lean. Okay. So the
summary is that if you are fat, it's not
your fault. Okay.
>> Maybe it's the problem of your microbes.
>> Yeah. So microbes are very important,
and there is something called the
gut-brain axis,
right? So what's inside your stomach or
intestine, your digestive gut, your
gastric system, yeah, will affect your
brain. Okay. So if sometimes you feel
blue, yeah, maybe it's because of your
gut microbes. Yeah. And actually, recent
studies have revealed that gut
microbes have
impacts on several environmental
diseases including psy. Okay. So
learning the impact of human microbes
can be very important, and this is one of
my major research topics. Okay. All
right. So now imagine that you have these
NGS sequences. You collect
microbes from many different body sites:
your nose, your tongue,
gut. By the way, do you know how to
collect microbes from the gut?
No?
>> Stool sample?
>> Yeah, stool sample. You use a stool
sample to explore the gut
microbes.
Of course, you can use an endoscope
and take samples from the gut, but that's
more intrusive. So usually what people do
is they take your stool,
and then extract the DNA, and
then do the so-called 16S rRNA
sequencing.
So 16S rRNA is a very important gene
of bacteria.
All bacteria have this gene, but they
differ in certain areas, so
we can tell which bacterium it is.
Okay. So with such data, the goal
is to determine the composition of
different microbes in each sample.
Right. So that's the goal. All right.
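Once every read has been given a label, "composition" is just a matter of counting. The short Python sketch below assumes each read from one body site has already been assigned a taxon name, and turns the labels into relative abundances. The function name and the example taxon names are placeholders chosen for illustration.

```python
from collections import Counter


def composition(labels):
    """Relative abundance of each taxon among the classified reads."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels for four reads from one sample.
    gut_reads = ["Bacteroides", "Bacteroides", "Lactobacillus", "Escherichia"]
    for taxon, share in composition(gut_reads).items():
        print(f"{taxon}: {share:.0%}")
```

The hard part, which the lecture turns to next, is assigning those labels to millions of reads in the first place.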
And how do we do that? Yeah. Explore
your imagination. Imagine that you have
data like this. Each single sequence
comes from one particular bacterium, and
you need to determine the species
or genus of each one.
So how do you do that? Discuss
with your neighbors. How are you
going to do that?
>> Any potential
approach? Yes.
>> Maybe we can use BLAST from NCBI or
something.
>> Yeah, very good. Use BLAST. Everyone
knows BLAST,
>> right?
Yeah, very good.
Okay. So imagine that you have a sequence
like that. To determine its
species, yeah, you have a bacterial
sequence, one way is to do sequence
alignment. Okay. Which is what BLAST
is doing: sequence alignment. So it aligns
the reads to all the
available sequences
of all known bacteria in the database.
Okay. And it tries to find out which
reference sequence matches the best to
the read. Okay. The NGS sequence. Okay.
However, this process is pretty
long. Have you ever tried using BLAST?
Right. So it takes a few minutes.
>> Yes.
>> Right. Like 20 or more
seconds, depending on the length of the
sequence, right? And so it's very long.
And imagine that you have millions of
such things.
Still want to use BLAST?
No, you're not going to graduate.
Yeah. How can we speed up the procedure? That's what bioinformatics people do.
Okay. They try to invent some good methods to speed up the process of
alignment, or the process of determining species. Okay. So here I want to show
you a very nice approach. Instead of doing this sequence alignment, there is a
tool called RDP Classifier. What it does is count the presence of words. Words
are consecutive bases in a small window. For example, this is a word, CCGA,
and this is also a word, CCG. So you do counting of words. Okay. And if you
find that this NGS sequence contains many words that also appear in the
reference, then you'll know that they are very similar. You don't need to do
alignment, actually. Just skip it. And when you have lots of shared words, we
say, oh yeah, that's it. It's a very simple idea, but it turns out to be very
efficient. And it's not so difficult, right? Just count words, and you can
speed up the alignment process by a thousand times. So this is the challenge
and the beauty of bioinformatics algorithms.
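The word-counting idea can be sketched in a few lines of Python. This is only an illustration of counting shared words, not the RDP Classifier implementation: the published tool works with 8-base words and a naive Bayesian model, while this sketch uses 4-base words to match the CCGA example and simply picks the reference with the most shared words. The sequences and the names `species_A` and `species_B` are made up.

```python
def words(seq, k=4):
    """Return the set of all length-k words (k-mers) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(read, references, k=4):
    """Pick the reference that shares the most words with the read."""
    read_words = words(read, k)
    scores = {name: len(read_words & words(ref, k)) for name, ref in references.items()}
    return max(scores, key=scores.get), scores


# Toy, made-up sequences; real 16S references are about 1,500 bases long.
references = {
    "species_A": "ACGTCCGATTGCAAGT",
    "species_B": "TTGACCATGGCATCAA",
}
read = "GTCCGATTGC"
best, scores = best_reference(read, references)
print(best, scores)
```

No alignment is computed here: building the word sets and intersecting them is a handful of set operations per reference, which is where the speed-up comes from.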
So this RDP Classifier was invented, and we started using that to determine
the species of each sequence. But still, although it's faster by a thousand
times, it's still too slow. And how can you further speed up the process? How
can we further speed it up?
>> Okay. So, it turns out that one possible solution is to first notice that
you have so many sequences like this, but some of them are highly similar and
they very likely belong to the same species. Okay. So if you are able to
cluster highly similar sequences together, then you just need to annotate one,
and the rest are very likely to come from the same species. So clustering is a
very important procedure in bioinformatics. Okay, cluster sequences.
And then, for clustering sequences, you need to compare similarity between
sequences. Yeah. And the comparison can be very difficult as well. Imagine
that you have 10 sequences and you want to do pairwise comparison. Okay. How
many combinations are there for comparing?
>> Is that combinations?
>> Yeah, you still remember that from your high school class. Ten sequences,
and you want to pick two for comparison. How many combinations are there?
>> 10 sequences?
>> Yes.
>> There are four nucleotides.
>> Each sequence is like that. Yeah. This is one sequence. Oh no. You want to
compare any two such sequences, not a single base. That's why math is
important. Okay. Especially in bioinformatics. Yeah. So you should not be
afraid of that if you want to do bioinformatics. Okay.
>> Yeah.
>> So, let's make it easier. How many combinations are there if there are only
three? A, B, C. You want to compare any two of them.
>> Three.
>> Yes. Three. What if you have four sequences?
>> So you have four options for the first sequence and then three options for
the second, right?
>> Then you can swap the two, so four times three divided by two.
>> Six. Okay. So now what if you have 10 sequences? Yes. 45. Okay. That's
where combinatorics works. All right. So imagine that you have a million
sequences. Yeah. That's how many calculations? That's a lot. Okay. So in order
to do the clustering, we need to invent another algorithm to speed up the
process.
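The counting argument above (n options for the first sequence, n - 1 for the second, divided by two because the order does not matter) is the binomial coefficient "n choose 2". A small Python check, using the standard library's `math.comb`; the helper name `pairwise_comparisons` is just for illustration:

```python
from math import comb


def pairwise_comparisons(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return comb(n, 2)


for n in (3, 4, 10, 1_000_000):
    print(n, pairwise_comparisons(n))
```

For a million sequences this comes to roughly 5 × 10^11 comparisons, which is why comparing every pair directly does not scale.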
Okay. And so there's an algorithm in this tool called UPARSE. Okay. It's able
to do the clustering much faster. I'm not going to discuss the details, but
it's able to do clustering in a very fast manner. Okay. Now, note that once
you are able to cluster highly similar sequences together, you can reduce the
number of sequences to annotate. Originally it can be like a million
sequences, for example, but after clustering usually there are like thousands
of clusters. Okay. So it's like a thousand times faster. And with this
thousand times faster, now you apply RDP Classifier to do the taxonomic
assignment, and then you can graduate in time. Okay. Right. So that's the
beauty of computation, right? So with this, you are able to obtain the
composition of different bacteria in different samples.
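The cluster-first, annotate-once idea can be sketched as follows. This is a generic greedy clustering written for illustration, not the algorithm of the tool named in the lecture, whose details the speaker skips. The similarity measure (fraction of shared 4-base words), the 0.8 threshold, and the toy reads are all assumptions.

```python
def words(seq, k=4):
    """Set of all length-k words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Fraction of shared words (Jaccard index), a crude stand-in for identity."""
    wa, wb = words(a, k), words(b, k)
    return len(wa & wb) / len(wa | wb)


def greedy_cluster(reads, threshold=0.8):
    """Put each read in the first cluster whose representative it resembles,
    otherwise make it the representative of a new cluster."""
    clusters = {}  # representative sequence -> list of member reads
    for read in reads:
        for rep in clusters:
            if similarity(read, rep) >= threshold:
                clusters[rep].append(read)
                break
        else:
            clusters[read] = [read]
    return clusters


reads = [
    "ACGTCCGATTGCAAGT",
    "ACGTCCGATTGCAAGA",  # differs from the first read in its last base
    "TTGACCATGGCATCAA",
    "TTGACCATGGCATCAA",
    "ACGTCCGATTGCAAGT",
]
clusters = greedy_cluster(reads)
print(len(reads), "reads ->", len(clusters), "representatives to annotate")
```

Each read is compared only with the current representatives rather than with every other read, so only one sequence per cluster has to go through the species annotation step.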
Okay. Now the next question will be, for example, is the bacterial composition
in this group different from that group? Okay. For example, if this is the
microbial composition of healthy individuals and these are the diseased
individuals, then you want to figure out which bacteria are different in terms
of abundance. Okay. And then perhaps that bacteria is related to the disease.
Yeah. Okay. So we want to figure out which bacteria is more abundant. Yeah.
And this simple question turned out to be not very easy.
Okay. Why is it not easy? There's this challenge called the compositional
challenge. Here is an illustration of the case. Okay. So assume that you find
three bacteria with the same abundance in the sample. So you get 100 reads for
this one, 100 reads for that one, 100 reads for this one. Okay. And so each of
them occupies about one third of the community. Right? Now imagine that this
bacteria expands. Yeah. They duplicate faster. Okay. But these two remain the
same. They don't increase. All right. And this one expands two times. So if
you do sequencing, you will get like 200 reads for this bacteria this time.
Okay. Now if you calculate the fractions, this one becomes a half, and these
become a quarter and a quarter. Now if you compare the fraction of this
bacteria, one third to a quarter, you'll feel that, oh, this is reduced. Okay.
But in fact it doesn't change. Yeah. So there's a problem. This problem is
called the compositional problem, because the fractions of all these bacteria
should add up to one. Okay. So there's a constraint. Therefore, if you simply
compare the raw fractions, then there's a problem. Yeah. You cannot reflect
the absolute abundance of the bacteria. Okay. So that's a problem called the
compositional problem, and it's quite challenging in data analysis.
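The three-bacteria example can be reproduced with exact fractions. The labels A, B and C are placeholders for the three bacteria on the slide, and `relative_abundance` is a helper name chosen for this sketch.

```python
from fractions import Fraction


def relative_abundance(read_counts):
    """Convert read counts into fractions of the whole community."""
    total = sum(read_counts.values())
    return {name: Fraction(count, total) for name, count in read_counts.items()}


before = relative_abundance({"A": 100, "B": 100, "C": 100})
after = relative_abundance({"A": 200, "B": 100, "C": 100})  # only A doubled
for name in before:
    print(name, before[name], "->", after[name])
```

B and C have the same read counts in both cases, yet their fractions drop from 1/3 to 1/4, purely because the fractions are forced to sum to one.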
What can we do facing such a problem? Okay. And some people were geniuses.
Yeah, they invented an approach, and the idea is to study the ratio of
relative abundances. These are relative abundances. Okay. And instead of
simply comparing the relative abundances, you can compare the ratio of two
relative abundances. For example, if you compare this with that, the ratio is
one. Okay. And when you compare this with that, the ratio is two. Okay. So
this really increased. Yeah. But if you compare this with that, the ratio is
one. Although now the fraction is a quarter, the ratio of the two relative
abundances is still one. Okay. So the idea of comparing the ratio of two
different bacteria can solve the compositional problem, or the challenge of
composition.
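A minimal check of the ratio idea, using the same toy counts as the lecture example (100, 100, 100 reads, then the first bacterium doubling to 200). The labels A, B, C and the helper name `ratio` are placeholders for this sketch; methods built on this idea involve more statistics than shown here.

```python
from fractions import Fraction


def ratio(read_counts, taxon, reference):
    """Ratio of two relative abundances within one sample."""
    total = sum(read_counts.values())
    return Fraction(read_counts[taxon], total) / Fraction(read_counts[reference], total)


before = {"A": 100, "B": 100, "C": 100}
after = {"A": 200, "B": 100, "C": 100}  # only A doubled
print("A/B:", ratio(before, "A", "B"), "->", ratio(after, "A", "B"))
print("B/C:", ratio(before, "B", "C"), "->", ratio(after, "B", "C"))
```

The total cancels out of the ratio, so A/B correctly goes from 1 to 2 while B/C stays at 1, even though the fraction of B and C each fell to a quarter.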
So that's a fantastic idea. And so now I have shown you three major
challenges: this one, the compositional problem, and this one, the clustering
problem, and the alignment problem. So in order to get this composition of
microbes, you actually need to have a very nice idea about how to process the
data. Okay. And that's why bioinformatics is beautiful. That's why
bioinformatics is important, because it not only speeds up the process but
also gives you the correct interpretation of the results. Okay. So this
compositional problem exists, and you should have a good idea about that. And
so from the ideas that I just shared with you, it's not so difficult, right?
Yeah. And they are the kernel of these three
different computational tools: RDP Classifier, UPARSE, and ANCOM. Okay.
Although there are
some more detailed mathematics you need to know in order to know the secrets
of the tools, you get the main idea of the tools. And that's how these tools
help people to correctly infer the composition of different communities. Okay.
That's how we can arrive at a very solid conclusion. All right. So that's the
fun part of learning bioinformatics. Okay. And so with these tools at hand,
now I'm going to tell you some of the applications. Okay. So, in this project,
I was collaborating with clinical doctors from a children's hospital, right?
And we were interested in the nasal microbiota, the bacteria from your nose,
the nasal cavity. And we were specifically interested in asthmatic children,
you know. So children who have asthma will have difficulty in breathing or
make some sounds when they breathe. Okay. So we wanted to study their nasal
microbiota. Okay. So at that time we collected samples from many different
children, and then we extracted the DNA and sent it to a sequencing company.
Okay. And they gave us some data. Okay. And when we got the data back, we
started to analyze the data. This is the so-called principal component
analysis. So this figure shows how similar two samples are in terms of
microbial composition, and the closer the two points, the more similar their
microbial compositions. Okay.
Then you can see that the blue ones are the samples from the first batch of
experiments. So we collected the samples from children in two different
batches and sent the samples to the sequencing company. This is the result of
the first batch, and the second batch. Okay. So you can see clearly that the
two batches are very different. Yeah, they're very different. However, the
patients involved in the two batches are basically similar. They are all
asthmatic children. Yeah. It's not that the first batch are all males and the
second are all females. Yeah. I mean, we have these clinical features, and
they are distributed evenly within the two batches, and so forth. So that's
very weird. Yeah. So what do you think happened?
That's the real challenge you have when you get the data. Okay. So you expect
that the samples of the two batches will be rather homogeneous, and so they
shouldn't be separated here. So what might be the problem?
Anyway, sequencing one sample costs like 5,000 New Taiwan dollars, right? So
for this piece of data, you have to spend at least 300... yeah, 300,000. Okay.
Anyway, so it's not cheap. Okay. But do you trust this data? No. Right.
Because they are not homogeneous. Yeah. But what are you going to do if you
find this? First of all, never fully trust the result from a sequencing
company. Okay, that's one important lesson, because everyone makes mistakes.
Every single one makes mistakes. And so when you get data from a sequencing
company, you have to check it yourself, because they're not going to check it
for you. All right, check it. And you also need to know how to check it,
right? So for example, this is the so-called principal component analysis. So
if you study bioinformatics, you'll learn how to do this kind of analysis, and
it's a very useful analysis. You can apply it to many different subjects.
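As a rough sketch of what such a check can look like, the code below computes only the first principal component of a few made-up composition vectors, using plain power iteration so that it needs nothing beyond the standard library. The numbers, the three-taxon layout and the function name are assumptions for illustration; a real analysis would use a numerical library and the actual sample table.

```python
import math


def first_principal_component(samples, iterations=200):
    """Project samples onto their first principal component (power iteration)."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Covariance matrix of the centered data (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Start from the first axis. The all-ones vector would fail here: the
    # fractions in each sample sum to one, so it lies in the null space.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in centered]


# Made-up fractions of three taxa in six samples from two sequencing batches.
batch1 = [[0.80, 0.15, 0.05], [0.75, 0.20, 0.05], [0.82, 0.13, 0.05]]
batch2 = [[0.30, 0.20, 0.50], [0.25, 0.20, 0.55], [0.35, 0.15, 0.50]]
scores = first_principal_component(batch1 + batch2)
for label, score in zip(["batch1"] * 3 + ["batch2"] * 3, scores):
    print(label, round(score, 3))
```

If the scores split cleanly by batch, as they do for these toy numbers, the batch itself is driving the variation, which is the warning sign described in the lecture.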
All right. So let's go back to this problem. Yeah. What are you going to do?
You have already spent quite some money and you figure out that the data were
not consistent. What are you going to do? You spent a lot of money. They are
not consistent. Sue the company?
So what I did at that time is that I had to convince the company that they
made a mistake. Okay? And you have to show evidence, right? So here is the
percentage of certain bacteria in the first batch and the second batch. We can
clearly see that, oh my, in many samples of the second batch the fraction of
this bacteria is pretty high, but it's almost zero for the first batch. Okay.
The same for this one, the same for that one. So it is possible that there
is... what's the keyword? Contamination. Okay. Contamination. It is important
to check for contamination.
>> Yeah.
>> Because when you do sequencing, you actually do not sequence only one
sample at a time. Nowadays, you sequence 96 samples at a time. Okay. So it's
possible that some reads from this sample will move across to another sample,
and you end up with some alien sequences from others. Okay. So that happens.
So you need to know the pros and cons of NGS. For example, you need to know
that 96 samples are sequenced at the same time, but you also need to know why
that is important. Yeah. Because if you sequence multiple samples at a time,
there is a chance of contamination. Okay. Yeah. And so we don't just learn the
facts, oh, the samples are sequenced at the same time. Yeah. We need to know
the consequence. Okay. And because I showed them the data, they decided to
return the money. All right.
>> Okay. And so we used the money for another round of sequencing with another
company.
>> Yes.
>> Sure. Sure.
>> Some projects or some research need us to sequence multiple genomes at a
time. But since you said that sequencing multiple genomes at a time may cause
contamination, how do you do that? How do you sequence multiple genomes?
>> Contamination is not avoidable sometimes. Okay. So in the field of
metagenomics, usually, okay, first of all, what we do is we usually prepare a
standard sample with known bacteria in the sample. For example, 12 bacteria
only in the sample, each one with a known proportion. Okay. And then we do the
sequencing of this standard sample. Okay. And then when we get the result, we
analyze the number of species within this standard sample, and usually what we
find is that there are 15 species instead of 12. Yeah. And then you know
clearly that there is contamination, and that's not avoidable. Okay. And so
one way to solve the problem is to do the so-called filtration. Yeah. So
usually those additional species appear in a very low fraction. Yeah. So one
way to remove them is to remove the bacteria that appear with a very low
fraction in the sample. Okay. So the terminology is to remove the cross-talk.
Yeah. That's a step in the data analysis. Yeah. Especially if you want to
quantify the diversity of a sample. Yeah. So you put in only 12 but you ended
up with 50. Yeah. That's a lot, right?
>> Yeah. So you want to do some pre-processing, you want to do some filtering
during your process. Yeah.
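The low-fraction filter on a standard sample can be sketched like this. The read counts, the species names and the 0.5% cutoff are all made up for illustration; the lecture does not state which threshold was used, and in practice the cutoff is a judgment call because it also removes genuinely rare species.

```python
def filter_low_abundance(read_counts, min_fraction=0.005):
    """Keep only taxa whose share of the sample's reads reaches min_fraction."""
    total = sum(read_counts.values())
    return {name: count for name, count in read_counts.items()
            if count / total >= min_fraction}


# Made-up counts for a standard sample: 12 expected species plus 3 stray ones.
expected = {f"species_{i:02d}": 800 for i in range(1, 13)}
stray = {"stray_a": 12, "stray_b": 7, "stray_c": 3}
observed = {**expected, **stray}

kept = filter_low_abundance(observed)
print(len(observed), "species before filtering,", len(kept), "after")
```

Because the true content of the standard sample is known, it can be used to tune the cutoff: the filter is doing its job when the species that remain are the ones that were actually put in.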
And in your case, if you sequence multiple genomes, it is possible to align
the reads to the different genomes. Yeah. Okay. So what do we usually do? It
is not that you just enter all the data into a tool and just click on run.
Okay. No. Yeah. Any qualified bioinformatician should know all these potential
errors during the procedure, and check this, check that, to ensure a
high-quality result.
Okay. So we asked another company to do the sequencing, and we had very
consistent data. Yeah. So for the subsequent analysis of the nasal microbiota,
there is a very interesting example here: the composition of different
bacteria. Okay. And this is called a heat map. So if you see lots of red, that
implies that that bacteria is very abundant. So this red indicates the
percentage, from zero to 100. So you can see that all these are almost 100%.
All the rest are like 0%. Okay. Yeah. So this is what we know about the nasal
microbiota. Okay. So what can you learn from this? You need to make a good
conclusion based on your data.
>> What do you learn?
>> Yeah.
>> After an attack there is more diversity. Maybe since you're likely sneezing
a lot, maybe some of the microbes that are in your breathing pathways, in your
throat and the like, get pushed up your nose. That's why there's a lot of...
of >> Okay. Uh very good very good uh
>> Okay. Uh very good very good uh hypothesis. Okay. And uh so uh just
hypothesis. Okay. And uh so uh just mention that when error yeah or so
mention that when error yeah or so actually these data are uh these samples
actually these data are uh these samples were taken during the month
were taken during the month when children uh suffer from severe yeah
when children uh suffer from severe yeah syndrome. Okay. And these are the
syndrome. Okay. And these are the samples taken and they recovered
samples taken and they recovered right. So uh in your hypothesis you are
right. So uh in your hypothesis you are supposed to see something like this.
Yeah. In some of the samples, you don't have one single bacterium
dominating the community; you still see [unclear], but that's not very
common. Yeah. And so what we learn from this one is that in our nose,
usually there is only one kind of bacteria that dominates the community.
For example, it can be this coccus, it can be that, yeah, Moraxella, it
can be this. And usually it's like one party. Yeah. One-party government.
It's not like in the US, like two-party. But there's one exception: these
two bacteria somehow collaborate with each other. Okay. So they can
coexist in your nose. Yeah. But if you add the fractions of the two, they
will be almost one. So the two bacteria constitute the majority of the
microbial community. Okay. Yeah. So it's interesting to know that only one
kind of bacteria can survive well
>> in your nose.
>> No, not many others.
Okay.
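The dominance pattern described above (one bacterium near 100%, or two that together add up to almost one) can be checked with a few lines of arithmetic. The following is only a minimal sketch: the taxon names and read counts are hypothetical, not the lecture's data, and the helper names `relative_abundance` and `dominant_taxon` are made up for illustration.

```python
from typing import Dict, Tuple


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def dominant_taxon(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most abundant taxon and its fraction of the sample."""
    fractions = relative_abundance(counts)
    taxon = max(fractions, key=fractions.get)
    return taxon, fractions[taxon]


# Hypothetical read counts for two nasal samples (illustration only).
single = {"taxon_A": 970, "taxon_B": 20, "taxon_C": 10}  # one dominant taxon
pair = {"taxon_A": 5, "taxon_B": 520, "taxon_C": 475}    # two taxa coexist

print(dominant_taxon(single))

# For the coexisting pair, add the two largest fractions together.
fractions = relative_abundance(pair)
top_two = sorted(fractions.values(), reverse=True)[:2]
print(sum(top_two))
```

In the first sample a single taxon holds nearly all the reads; in the second no single taxon does, but the top two together still account for almost the whole community, which is the "one exception" mentioned in the lecture.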
Also, an interesting thing we found is that when we check the dominating
bacteria in the nose of these children during the asthma attack, okay, we
compare the dominating bacteria with their clinical features, because some
of the asthmatic children are allergic to many things, so we call that
allergic asthma. Okay. But some other children have non-allergic asthma.
Okay. So the asthma syndrome is not related to allergy. Okay. And we found
that for these allergic asthma children, during the asthma attack, yeah,
most of the children's dominating is to
So, and when the children recover, yeah, we don't see such kind of family.
We don't see such family. Okay. So what does it imply? It implies that the
immune system of the children was interacting with the bacteria.
>> Yeah. But the interaction only occurs during the asthma attack. Yeah.
When they recover from asthma, they behave like normal. Yeah. Okay. So
there's a very critical window of time where you can see the interaction
between your immune system and the bacteria in your nose. Okay. So, in my
opinion, it's quite a nice finding, because if you want to study the
interaction between the immune system and microbes, you want to study it
at the right time, right? But people used to not know which time is the
best window to study.
Yeah, it's a very interesting study, and, uh,
five minutes left. Okay. Yeah. And with that piece of data, we also
collaborated with a professor from the department of, okay, she's able to
estimate the quality of air, because those kids live in different places.
Somehow she's able to estimate the quality of the air near their houses.
Okay. And then we try to see if there's any correlation between the
environmental factors and the nasal microbiota.
And it's very interesting that we found that if you live in a place with
more greenness, for example on your campus, your nasal microbiota will
have a higher diversity. Okay. And usually a microbial community with a
high diversity is considered good, because it's usually more robust,
right? Yeah. You have more players in the community, then they will not
fluctuate. Okay. So it's considered good if you live close to the park.
Okay. And with so many trees on your campus, I believe that your nasal
microbiota are pretty healthy.
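The lecture does not say which diversity measure was used. As an assumption, the sketch below uses the Shannon index, H = -sum(p_i * ln p_i), one common choice for microbial communities, to show why a community with "more players" scores higher than one dominated by a single bacterium. The counts and the function name `shannon_diversity` are hypothetical.

```python
import math
from typing import Dict


def shannon_diversity(counts: Dict[str, int]) -> float:
    """Shannon index H = -sum(p * ln(p)) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Hypothetical samples (illustration only, not the study's data).
dominated = {"taxon_A": 97, "taxon_B": 2, "taxon_C": 1}   # one-party community
balanced = {"taxon_A": 34, "taxon_B": 33, "taxon_C": 33}  # many players

print(shannon_diversity(dominated))
print(shannon_diversity(balanced))
```

The balanced sample gives the larger value. A sample containing a single taxon gives exactly zero, and a perfectly even sample of k taxa gives ln(k), the maximum for that number of taxa.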
>> Okay. So that's the beauty of bioinformatics: by analyzing this
composition, we can find a very interesting relationship between
environments, between babies and the microbiota. Okay. All right. So it's
about time. Yeah. So I really enjoyed giving the lecture here. Hopefully
you learned something. Yeah. And remember that your life, your career
path, can be full of surprises. Yeah. And so it is good to be open-minded
and try to learn some informatics. It's pretty good. Yeah. And it's not so
difficult. See, I just told you some basic ideas, and you know the basic
idea.
All right, guys. So, I wish that you enjoy bioinformatics as I enjoy it.
Okay. Yeah. Thank you.
enlightening. So now you know why you get bioinformatics as your
compulsory course. So anyway, because unfortunately we have very limited
time, maybe we can have a quick photo session. Yeah. Would you please,
maybe Professor, here, and then we take a picture together.
thumbs up everyone. [Music]