0:05 [Music]
0:08 Hello and welcome to Explore Bio. RNA-seq, or transcriptome sequencing, is a powerful technique to characterize and quantify gene expression.
0:17 In my previous video, I explained that the transcriptome is the entire set of RNA expressed at a specified time in a particular biological sample.
0:25 I also explained its importance in identifying and studying gene expression, which is useful for studying development, disease, and responses to stress.
0:32 In addition, I covered the two major techniques of transcriptome analysis, namely microarrays and RNA-seq, and their applications.
0:39 If you are new to the transcriptome and its analysis, you should watch my introductory video; the link is provided in the description below.
0:48 This is the second video in the transcriptome series.
0:51 In the first part of the video, I will cover the basic steps involved in preparing a transcriptome library for sequencing.
0:58 In the second part, I will cover the basic workflow for transcriptome data analysis.
1:03 I hope this video will be useful for beginners who have little or no idea about the transcriptome and are willing to learn more about it.
1:10 It should also help researchers who are planning, or are currently working on, some kind of transcriptome project.
1:15 I request you to stay tuned and watch the complete series of videos on the transcriptome.
1:20 At the end, I will mention some important things to remember and consider before you plan a transcriptome experiment.
1:29 So let's begin with the basic steps involved in transcriptome library preparation.
1:34 Here I will focus on one popular protocol, Illumina mRNA-enrichment library preparation.
1:39 There are separate protocols for other library types too, such as small RNA and microbial RNA libraries.
1:45 The first and foremost thing you need to start with is high-quality RNA extracted from the biological samples to be studied, along with appropriate controls for comparison.
1:57 Next, you proceed to mRNA enrichment if your targets are protein-coding RNAs; otherwise, this step can be omitted.
2:02 Here, RNA carrying a poly(A) tail is captured using magnetic beads with oligo(dT) attached to them.
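As a toy, in-silico analogy of that capture (not a wet-lab protocol), the selection principle can be sketched in Python: only molecules ending in a run of adenines are "captured" by the oligo(dT). The minimum tail length of 15 and the example sequences are hypothetical.

```python
import re

def has_poly_a_tail(seq: str, min_len: int = 15) -> bool:
    """True if the sequence ends in a run of at least min_len adenines."""
    tail = re.search(r"A+$", seq.upper())
    return tail is not None and len(tail.group()) >= min_len

def poly_a_select(transcripts: dict[str, str], min_len: int = 15) -> dict[str, str]:
    """Keep only molecules an oligo(dT) capture would retain (toy model)."""
    return {name: seq for name, seq in transcripts.items()
            if has_poly_a_tail(seq, min_len)}

if __name__ == "__main__":
    sample = {
        "mRNA_1": "AUGGCUUCGAAGUAA" + "A" * 30,  # polyadenylated transcript
        "rRNA_1": "GGUCUCGAUCCGGAUCGC",          # no poly(A) tail
    }
    print(sorted(poly_a_select(sample)))  # ['mRNA_1']
```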
2:11 Next comes cDNA library preparation, which involves a series of steps.
2:15 The mRNA is fragmented, using chemical or heat treatment, into shorter pieces, usually about 100 to 300 nucleotides long, that can be sequenced.
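The fragmentation step can be pictured with a small simulation. This sketch assumes fragment lengths are drawn uniformly between 100 and 300 nucleotides, which simplifies real chemical or heat fragmentation; the 2 kb transcript is randomly generated and hypothetical.

```python
import random

def fragment(seq: str, min_len: int = 100, max_len: int = 300, seed: int = 0) -> list[str]:
    """Cut seq into consecutive pieces whose lengths are drawn uniformly from
    [min_len, max_len]; the last piece keeps whatever remains and may be shorter."""
    rng = random.Random(seed)
    pieces, start = [], 0
    while start < len(seq):
        size = rng.randint(min_len, max_len)  # randint includes both bounds
        pieces.append(seq[start:start + size])
        start += size
    return pieces

if __name__ == "__main__":
    rng = random.Random(42)
    mrna = "".join(rng.choice("ACGU") for _ in range(2000))  # hypothetical 2 kb mRNA
    pieces = fragment(mrna)
    print(len(pieces), [len(p) for p in pieces])
```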
2:25 Note that full-length mRNAs are not sequenced unless you are using long-read chemistry such as Oxford Nanopore sequencing.
2:31 The fragmented RNA is then reverse transcribed into first-strand cDNA using reverse transcriptase, and a second strand is synthesized to give double-stranded cDNA.
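Base pairing during cDNA synthesis can be illustrated at the sequence level: the first cDNA strand is the reverse complement of the RNA (with T in place of U), and the second strand reads the same as the original mRNA. This is a sequence-only sketch; the function names are illustrative.

```python
# Pairing from an RNA base to the complementary DNA base, and DNA to DNA.
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def first_strand_cdna(rna: str) -> str:
    """Reverse transcriptase product written 5'->3': reverse complement of the RNA."""
    return rna.upper().translate(RNA_TO_CDNA)[::-1]

def second_strand(cdna: str) -> str:
    """Second-strand synthesis: reverse complement of the first strand (mRNA sense, T for U)."""
    return cdna.translate(DNA_COMPLEMENT)[::-1]

if __name__ == "__main__":
    mrna = "AUGGCUUAA"  # hypothetical mRNA fragment
    first = first_strand_cdna(mrna)
    print(first, second_strand(first))  # TTAAGCCAT ATGGCTTAA
```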
2:37 After end repair and the addition of an adenine (A-tailing) at the 3′ ends, adapters, which are short double-stranded oligonucleotides, are ligated to both ends of the cDNA fragments.
2:47 These adapters serve as primer-binding sites that allow amplification by PCR in the next step.
2:53 The adapter-ligated cDNA is termed the cDNA library; it represents the RNAs expressed in the sample (the poly(A) RNAs, if enrichment was done) and is ready to be sequenced.
3:04 Multiple samples are ligated with different adapters, each carrying a distinct index (barcode) sequence, so that they can be pooled together and sequenced in a single run on one machine.
3:11 This is known as multiplexing. After sequencing is over, the data generated can be demultiplexed based on the different adapters used.
3:18 cDNA libraries can be sequenced from one or both ends, which is termed single-end or paired-end sequencing, on a suitable NGS platform.
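Demultiplexing can be sketched as assigning each read to the sample whose index sequence it matches. For simplicity this assumes the index sits at the start of each read and tolerates one mismatch; on Illumina instruments the index is normally read separately and demultiplexing is handled by the vendor software, so the layout, sample names, and barcodes here are hypothetical.

```python
from collections import defaultdict

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads: list[str], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Bin reads by the index at their 5' end (toy layout: index + insert).

    A read goes to the single closest sample within max_mismatches;
    unmatched or ambiguous reads go to "undetermined".
    """
    bc_len = len(next(iter(barcodes.values())))
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        index, insert = read[:bc_len], read[bc_len:]
        dist = {sample: hamming(index, bc) for sample, bc in barcodes.items()}
        best = min(dist.values())
        matches = [s for s, d in dist.items() if d == best]
        if best <= max_mismatches and len(matches) == 1:
            bins[matches[0]].append(insert)
        else:
            bins["undetermined"].append(read)
    return dict(bins)

if __name__ == "__main__":
    barcodes = {"control": "ACGTAC", "treated": "TGCATG"}  # hypothetical indexes
    reads = ["ACGTACGGGG", "TGCATCTTTT", "NNNNNNAAAA"]
    print(demultiplex(reads, barcodes))
    # {'control': ['GGGG'], 'treated': ['TTTT'], 'undetermined': ['NNNNNNAAAA']}
```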
3:28 The amount of sequence data generated, in the form of short reads, depends on the sequencing platform and the needs of the experiment; usually 10 to 30 million reads per sample are appropriate for analysis.
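A quick calculation helps decide how many samples to multiplex in one run at that depth. The run output of 400 million reads and the 90% usable fraction below are hypothetical placeholders; substitute the figures for your own platform.

```python
import math

def max_samples_per_run(reads_per_run: float, reads_per_sample: float,
                        usable_fraction: float = 0.9) -> int:
    """Number of libraries that can share a run while each still reaches
    reads_per_sample. usable_fraction is a hypothetical allowance for reads
    lost to QC filtering and undetermined indexes."""
    return math.floor(reads_per_run * usable_fraction / reads_per_sample)

if __name__ == "__main__":
    run_output = 400e6  # hypothetical reads per run
    for target in (10e6, 30e6):
        n = max_samples_per_run(run_output, target)
        print(f"{target / 1e6:.0f} M reads/sample -> {n} samples")
```

Under these assumptions, a 10-million-read target allows 36 samples per run and a 30-million-read target allows 12.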
3:42 Coming to the second part, which is the basic workflow for transcriptome analysis.
3:46 Once the sequencing run is complete, you will get sequence data in the form of raw reads.
3:50 The read files are usually in FASTQ format, which contains both the sequence and a quality score for each base.
3:56 QC, or quality check, of the sequenced reads is the first step of transcriptome analysis and is generally done using a tool like FastQC.
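To see what the quality information in a FASTQ file looks like, here is a minimal parser for the common four-line layout, assuming Phred+33 encoded qualities (the encoding used by current Illumina pipelines). It is a sketch for illustration, not a replacement for FastQC.

```python
import io
from typing import Iterator, NamedTuple, TextIO

class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str

def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ: @name / sequence / + / qualities."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], seq, qual)

def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode quality characters; Q = -10 * log10(probability of a base-call error)."""
    return [ord(ch) - offset for ch in qual]

if __name__ == "__main__":
    fastq = io.StringIO("@read1\nACGTACGT\n+\nIIIIII##\n")  # hypothetical read
    for rec in read_fastq(fastq):
        scores = phred_scores(rec.qual)
        print(rec.name, scores, sum(scores) / len(scores))
        # read1 [40, 40, 40, 40, 40, 40, 2, 2] 30.5
```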
4:04 Raw reads generated by transcriptome sequencing on a next-generation sequencing platform such as Illumina or Roche 454 are processed to remove low-quality reads and the adapter sequences introduced during library preparation.
4:16 Sometimes read-end trimming is also required, as the bases sequenced toward the end of a sequencing run may be of lower quality.
4:22 Some commonly used tools for raw-read filtering are NGS QC Toolkit and fastp.
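The two cleaning operations described above, adapter removal and trimming of low-quality bases at the 3′ end, can be sketched as follows. The adapter prefix AGATCGGAAGAGC is the one commonly seen in Illumina reads; the Q20 cutoff, the 30-base minimum length, and the exact-match adapter search are simplifying assumptions, and dedicated tools such as fastp handle more cases, such as sequencing errors inside the adapter.

```python
ADAPTER = "AGATCGGAAGAGC"  # adapter prefix commonly seen in Illumina reads

def clip_adapter(seq: str, qual: str, adapter: str = ADAPTER,
                 min_overlap: int = 5) -> tuple[str, str]:
    """Cut the read at the first full adapter match, or at a partial adapter
    (at least min_overlap bases) running off the 3' end."""
    cut = seq.find(adapter)
    if cut == -1:
        cut = len(seq)
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    return seq[:cut], qual[:cut]

def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Remove trailing bases whose Phred+33 quality is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def clean_read(seq: str, qual: str, min_len: int = 30):
    """Adapter-clip, quality-trim, then discard reads shorter than min_len (returns None)."""
    seq, qual = clip_adapter(seq, qual)
    seq, qual = trim_3prime(seq, qual)
    return (seq, qual) if len(seq) >= min_len else None

if __name__ == "__main__":
    read = "ACGT" * 10 + "AGATCG"   # insert followed by a partial adapter
    quals = "I" * 38 + "#" * 8      # last bases have low quality (Q2)
    print(clean_read(read, quals))
```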
4:30 Next comes read alignment, or mapping.
4:32 The short, high-quality reads are aligned, or mapped, back to the reference genome or transcriptome, if one is available.
4:39 This is the basis of reference-based assembly.
4:41 If a reference is not available, for example in the case of non-model organisms, de novo (fresh) assembly is done.
4:46 For genome-guided assembly, spliced aligners are used, because reads can span exon-exon junctions; for transcriptome-guided or de novo assembly, unspliced aligners are used.
4:57 Examples of routinely used aligners are Bowtie (unspliced) and TopHat (spliced).
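The idea of unspliced mapping can be illustrated with a tiny seed-and-verify aligner: index every k-mer of the reference, look up the read's first k-mer, and check each candidate position for mismatches. This is a toy sketch only; Bowtie uses a Burrows-Wheeler (FM) index, and spliced aligners such as TopHat additionally split reads across introns. The reference sequence and k = 5 are hypothetical.

```python
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 2) -> list[int]:
    """Ungapped, unspliced mapping: seed with the read's first k-mer, then
    verify each candidate by counting mismatches along the whole read."""
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if (len(window) == len(read)
                and sum(a != b for a, b in zip(read, window)) <= max_mismatches):
            hits.append(start)
    return hits

if __name__ == "__main__":
    ref = "TTGACCGTAGGCTAACGTTAGC"  # hypothetical reference
    idx = build_kmer_index(ref, k=5)
    print(map_read("CGTAGGCTAAGG", ref, idx, k=5))  # [5]: one mismatch vs ref[5:17]
```

Because only the first k-mer is used as a seed, a read with an error in its first k bases would be missed; real aligners use multiple seeds.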
5:01 Short reads tell us little on their own until they are assembled into longer, more complete sequences, termed transcripts or contigs, that represent the mRNAs from which they were derived.
5:09 The assembly is based on sequence overlaps between reads, which are joined to form a much longer sequence.
5:16 In reference-guided assembly, reads are first aligned to the reference genome or transcriptome, and the overlapping reads are then assembled together.
5:24 In de novo assembly, the reads are assembled into transcripts without a reference.
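The overlap idea can be made concrete with a greedy merge: repeatedly join the two reads whose suffix and prefix overlap the most. This is a teaching sketch on error-free reads; de novo transcriptome assemblers such as Trinity use de Bruijn graphs rather than greedy merging. The reads below are made up.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair with the largest overlap until no overlap >= min_len is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Reads contained inside another read add no new sequence.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

if __name__ == "__main__":
    reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]  # hypothetical error-free reads
    print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```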
5:30 The most popular tools for transcriptome assembly include Trinity, Oases, CLC Genomics Workbench, and Cufflinks.
5:37 Sometimes assembly is done with multiple tools before choosing the best result.
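When the same reads have been assembled with several tools, simple statistics help compare the results. One of them is N50: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below computes it for hypothetical contig lengths; N50 alone does not prove one assembly is better, so it is usually read alongside completeness checks.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # empty assembly

if __name__ == "__main__":
    assemblies = {  # hypothetical contig lengths from two assemblers
        "assembler_A": [300, 450, 800, 1200, 2500],
        "assembler_B": [200, 250, 400, 600, 700, 900],
    }
    for name, lengths in assemblies.items():
        print(name, "contigs:", len(lengths), "total:", sum(lengths), "N50:", n50(lengths))
```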
5:42 The transcripts or contigs are then clustered using tools such as CD-HIT-EST to reduce redundancy.
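Redundancy reduction can be sketched as greedy clustering in the spirit of CD-HIT-EST: sequences are processed from longest to shortest, and a contig joins an existing cluster if it is contained in that cluster's representative, on either strand. This toy version only recognizes exact containment (100% identity), whereas CD-HIT-EST clusters at a user-chosen identity threshold; the contigs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def cluster_contained(contigs: dict[str, str]) -> dict[str, list[str]]:
    """Greedy clustering, longest first: a contig joins the first representative
    that contains it on either strand; otherwise it becomes a new representative.
    Returns {representative: [member contigs]}."""
    clusters: dict[str, list[str]] = {}
    for name in sorted(contigs, key=lambda n: len(contigs[n]), reverse=True):
        seq = contigs[name]
        rev_comp = seq.translate(COMPLEMENT)[::-1]
        for rep in clusters:
            if seq in contigs[rep] or rev_comp in contigs[rep]:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = []  # no representative contains it
    return clusters

if __name__ == "__main__":
    contigs = {  # hypothetical contigs
        "c1": "ATGGCGTACGTTAGC",
        "c2": "GCGTACG",    # contained in c1
        "c3": "TTTTGGGG",   # unrelated
        "c4": "CGTACGCC",   # reverse complement contained in c1
    }
    print(cluster_contained(contigs))  # {'c1': ['c4', 'c2'], 'c3': []}
```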
5:49 Once the assembly is done, its completeness may be checked by aligning it against conserved orthologous genes from a related lineage.
5:58 One example of such a tool is BUSCO.
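BUSCO reports completeness as the percentage of conserved orthologs found complete (single-copy or duplicated), fragmented, or missing. The arithmetic behind that summary line is sketched below with hypothetical counts; the output string imitates BUSCO's short-summary style but is produced by this snippet, not by BUSCO.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style percentages: C (complete = S + D), F, M, and n (total)."""
    total = single + duplicated + fragmented + missing

    def pct(count: int) -> str:
        return f"{100 * count / total:.1f}%"

    return (f"C:{pct(single + duplicated)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{total}")

if __name__ == "__main__":
    print(busco_summary(single=90, duplicated=5, fragmented=3, missing=2))
    # C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:100
```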
6:03 To quantify the expression of individual transcripts, the mapping file generated during read alignment is generally used as input.
6:09 Gene-level or transcript-level abundance is determined using tools such as RSEM, Salmon, or Cufflinks.
6:14 The abundance, or expression level, of a transcript is represented as a normalized count of the reads mapped to that transcript.
6:24 The major ways to represent normalized read counts are TPM, FPKM, RPKM, and CPM.
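These units differ mainly in the order of operations. CPM divides counts by library size only; RPKM/FPKM divides by library size and then by transcript length in kilobases; TPM divides by length first and then rescales so the values in each sample sum to one million. The sketch uses hypothetical counts and lengths and plain transcript lengths (quantifiers such as RSEM and Salmon use effective lengths), and it treats each count as a read (for FPKM, count fragments, i.e. read pairs).

```python
def cpm(counts: dict[str, float]) -> dict[str, float]:
    """Counts per million: scale by library size only."""
    total = sum(counts.values())
    return {g: c / total * 1e6 for g, c in counts.items()}

def rpkm(counts: dict[str, float], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads
    (FPKM is the same formula with fragments counted instead of reads)."""
    total = sum(counts.values())
    return {g: c / (lengths_bp[g] / 1e3) / (total / 1e6) for g, c in counts.items()}

def tpm(counts: dict[str, float], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize first, then rescale to sum to 1e6."""
    rpk = {g: c / (lengths_bp[g] / 1e3) for g, c in counts.items()}
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}

if __name__ == "__main__":
    counts = {"A": 100, "B": 300, "C": 600}        # hypothetical mapped reads
    lengths = {"A": 1000, "B": 3000, "C": 2000}    # hypothetical lengths in bp
    for name, fn in (("CPM", lambda: cpm(counts)), ("RPKM", lambda: rpkm(counts, lengths)),
                     ("TPM", lambda: tpm(counts, lengths))):
        print(name, {g: round(v) for g, v in fn().items()})
```

For these counts, TPM gives 200,000, 200,000, and 600,000 for A, B, and C, while CPM gives 100,000, 300,000, and 600,000: gene B looks three times more abundant than A by CPM only because it is three times longer.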
6:31 To compare the change in expression between treatment and control samples, differential gene expression (DGE) analysis is done.
6:39 Programs such as edgeR, DESeq2, and Cuffdiff perform differential expression analysis between samples after normalizing the abundance data.
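The core quantity in a DGE result is the log2 fold change. The sketch below computes it from normalized values with a pseudocount of 1 (an assumption, to avoid division by zero); tools such as edgeR and DESeq2 instead fit negative binomial models to the raw counts and also estimate significance, so this shows only the intuition.

```python
import math
from statistics import mean

def log2_fold_change(treated: list[float], control: list[float],
                     pseudocount: float = 1.0) -> float:
    """log2((mean treated + pseudocount) / (mean control + pseudocount))."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))

if __name__ == "__main__":
    # hypothetical normalized values (e.g. CPM) for one gene, two replicates each
    print(log2_fold_change(treated=[15.0, 15.0], control=[3.0, 3.0]))  # 2.0, i.e. four-fold up
```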
6:49 The p-value and the FDR (false discovery rate) tell you how significant the differential expression results are, and whether a gene should or should not be considered for further analysis.
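Because thousands of genes are tested at once, raw p-values are usually adjusted to control the FDR. A common choice is the Benjamini-Hochberg procedure, sketched here on hypothetical p-values: each sorted p-value is multiplied by n/rank, and a running minimum taken from the largest p-value down keeps the adjusted values monotonic.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values (FDR), in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values for four genes
    print(benjamini_hochberg(pvals))
```

The adjusted values are approximately [0.02, 0.04, 0.04, 0.02]: at an FDR cutoff of 0.05 all four genes pass, while at 0.03 only the first and last do.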
6:57 Later, the expression results from the transcriptome are validated using real-time PCR.
7:01 To learn more about it, I highly recommend watching my video on real-time PCR and how it is done.
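For validation, real-time PCR results are often expressed with the 2^-ΔΔCt (Livak) method, sketched below. It assumes roughly 100% amplification efficiency for both the target and the reference gene; the Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Livak 2^-ddCt fold change of a target gene, normalized to a reference gene."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)

if __name__ == "__main__":
    # hypothetical Ct values: target and reference gene in treated and control samples
    print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```

An 8-fold change corresponds to a log2 fold change of 3, which can be compared directly with the RNA-seq estimate for the same gene.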
7:08 To predict the function of transcripts or contigs after assembly, they are assigned functions based on sequence homology to known proteins in databases such as NR, Swiss-Prot, and TrEMBL, using BLAST searches.
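BLAST results are often saved in tabular form (-outfmt 6) and reduced to the best hit per contig before annotation. The sketch assumes the default 12 tabular columns and a hypothetical E-value cutoff of 1e-5; the contig and subject IDs are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, max_evalue: float = 1e-5) -> dict[str, dict[str, str]]:
    """Keep the highest-bit-score hit per query that passes the E-value cutoff."""
    best: dict[str, dict[str, str]] = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best

if __name__ == "__main__":
    blast_tsv = io.StringIO(  # hypothetical hits
        "contig_1\tsubject_A\t98.5\t200\t3\t0\t1\t600\t1\t200\t1e-120\t400\n"
        "contig_1\tsubject_B\t60.0\t180\t72\t2\t10\t550\t5\t185\t1e-40\t150\n"
        "contig_2\tsubject_C\t30.0\t50\t35\t1\t1\t150\t1\t50\t0.5\t25\n"
    )
    for query, hit in best_hits(blast_tsv).items():
        print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])  # contig_1 subject_A 1e-120 400
```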
7:21 I have made a separate tutorial on how to perform a standalone NCBI BLAST search on your computer; you may watch it later.
7:27 Subsequently, GO (Gene Ontology) classification and pathway analysis may be done.
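GO or pathway enrichment often comes down to a one-sided hypergeometric test: given N background genes, K of them annotated to a term, and n differentially expressed genes, how surprising is it to see k or more of them carrying the term? A minimal version is below; the numbers in the example are hypothetical, and in practice the resulting p-values are also corrected for multiple testing.

```python
from math import comb

def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N background genes,
    K annotated to the term, n genes drawn, i.e. the DE list)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

if __name__ == "__main__":
    # hypothetical numbers: 20,000 background genes, 200 with the GO term,
    # 500 DE genes, 20 of which carry the term (about 5 expected by chance)
    print(hypergeom_enrichment_p(k=20, n=500, K=200, N=20000))
```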
7:33 So these are the major steps involved in transcriptome sequencing and analysis.
7:38 Coming to the last part of the video, about the things to consider when planning a transcriptome experiment, the following questions should be asked:
7:45 Is the aim to identify or to quantify transcript sequences?
7:49 What biological samples, controls, and number of replicates are you going to use?
7:54 How much sequencing data needs to be generated, and what sequencing platform will you use?
7:59 Will the analysis be reference-based or de novo? This will determine the sequencing depth you need.
8:05 Is mRNA enrichment required? Should sequencing be single-end or paired-end?
8:10 Do you have a budget for sequencing and analysis, and is access to high-end computing available?
8:17 So that's all for today's video. You can do a lot more once you have transcriptome data: for example, you can study gene and pathway enrichment, classify genes based on their ontologies, perform KEGG analysis, identify orthologous groups, run co-expression analysis and generate heat maps, and build protein-protein interaction networks to identify interacting partners, and much more, to make the most of the transcriptome data generated.
8:41 Some of these may be part of my subsequent videos.
8:45 In my next video, I will cover important terms such as reads, transcripts, annotation, BLAST, E-value, bit score, read count, DGE, N50, FASTA format, RPKM, FPKM, TPM, CPM, and others that are routinely used in transcriptome analysis.
9:06 If you like the information, do share it with your friends, and comment about what new topics you want to learn.
9:10 Subscribe and check my playlist to stay tuned for other informative videos.
9:14 And finally, thanks for watching.
9:22 [Music]