YouTube Transcript:
Fundamentals of RNA-Sequencing (RNA-seq) Analysis

Summary

Core Theme

This webinar provides a comprehensive overview of RNA sequencing (RNA-seq) analysis, detailing its applications, a standard workflow from data processing to interpretation, and available Illumina solutions to facilitate these analyses.

Welcome to the webinar "Fundamentals of RNA-Sequencing Analysis". In this webinar, you'll first be introduced to applications of RNA sequencing, followed by the RNA-seq analysis workflow, and lastly Illumina solutions for RNA-seq analysis.

The power of RNA-seq allows us to perform expression profiling of a gene or transcript in a single condition. If we have samples from different conditions, we can measure relative changes in gene or transcript abundance. This is useful when we want to identify disease biomarkers. We can also conduct transcript analysis by assembling the whole transcriptome and detecting known or novel RNA species. These species include splice variants and fusions, as well as […].

Let's dive into what a typical RNA-seq analysis workflow looks like. Say that your lab is interested in identifying biomarkers for a cancer and has sequenced the RNA of tumor samples and non-malignant samples.

There are five main steps in an end-to-end RNA-seq analysis workflow: transcriptome profiling of individual samples; differential expression analysis of samples between two conditions; functional annotation of interesting candidates from the analysis; visualization of those candidates; and lastly, integration with other data sets as additional evidence to further support these interesting candidates for experimental validation.

The first three steps belong to the core section of RNA-seq analysis, while the last two steps are more […].

Let's start with transcriptome profiling of the samples.

To profile the expression of known genes or transcripts in a sample, we generally perform reference-guided mapping. We first map the reads in the FASTQ file to a reference genome, guided by a gene annotation file. The reads falling on each gene or transcript in the resulting BAM file are then counted to generate raw counts, which are in turn normalized for downstream analysis.

I will dive into details on each of […]. The DRAGEN RNA pipeline, an Illumina solution, generates outputs such as the BAM file, […] information and splice junctions. It also provides QC metrics of the FASTQs.

For RNA-seq studies of human data, or of organisms with well-characterized reference genomes, we can start with the mapping step, where the reads in the FASTQ files are mapped to the reference genome or transcriptome. The inputs are the FASTQ files, the reference genome, and a gene annotation file.

This is a peek at the format of a gene annotation file. It contains transcript- and exon-level annotations that match the reference genome being mapped against.
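The talk does not name the annotation format. Assuming a GTF (Gene Transfer Format) file, a common choice for RNA-seq pipelines, each line has nine tab-separated columns, the last holding `key "value";` attributes such as the gene and transcript IDs. This minimal Python sketch parses one such line; the record type, function names, and example IDs are hypothetical, and it does not handle semicolons inside quoted values.

```python
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    """One annotation line, assuming the nine-column GTF layout."""
    seqname: str
    source: str
    feature: str               # e.g. "gene", "transcript", "exon"
    start: int                 # 1-based, inclusive
    end: int                   # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn 'key "value"; key2 "value2";' into {'key': 'value', ...}."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one tab-separated line into a typed record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    return GtfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], cols[7], parse_attributes(cols[8]))


# Illustrative line with made-up IDs.
example = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";'
record = parse_gtf_line(example)
print(record.feature, record.start, record.end, record.attributes["gene_id"])
```

The exon-level rows (with their gene and transcript IDs) are what a counting step later uses to decide which gene a mapped read belongs to.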

Genome mapping and transcriptome mapping are performed for different use cases. We perform genome mapping to identify novel genes or transcripts in addition to known ones; it uses splice-aware aligners such as TopHat2, STAR, and DRAGEN RNA. In contrast, transcriptome mapping is useful when you're interested in profiling known transcripts and the transcript annotation is comprehensive enough. It uses unspliced aligners such as Bowtie.

After obtaining the aligned BAM files, we want to measure the abundance of a particular genomic feature in each sample. Using the BAM file as input, we use a counting tool together with the gene annotation file to generate a table of abundance values.
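Real counting tools handle strandedness, multi-mapping reads, and spliced alignments, so the following is only a rough illustration of the idea. It assumes each aligned read has already been reduced to one (chromosome, start, end) interval and each gene to a list of exon intervals, and it counts a read for a gene only when the read overlaps exactly one gene (ambiguous reads are skipped, one common policy among several). All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_reads(reads: List[Interval],
                genes: Dict[str, List[Interval]]) -> Counter:
    """Raw count per gene: each read goes to the single gene whose exons it hits."""
    counts: Counter = Counter({gene: 0 for gene in genes})
    for read in reads:
        hits = {gene for gene, exons in genes.items()
                if any(overlaps(read, exon) for exon in exons)}
        if len(hits) == 1:       # unambiguous assignment
            counts[hits.pop()] += 1
        # reads hitting zero genes or several genes stay uncounted here
    return counts


genes = {"GENE_A": [("chr1", 100, 200), ("chr1", 300, 400)],
         "GENE_B": [("chr1", 350, 500)]}
reads = [("chr1", 150, 180), ("chr1", 310, 320), ("chr1", 360, 370),
         ("chr1", 450, 480), ("chr2", 150, 160)]
print(count_reads(reads, genes))
```

Running this per sample and placing the results side by side gives the genes-by-samples table of raw counts used in the next step.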

Depending on the application, there are different approaches to quantify different genomic features, as summarized in this table. Note that all three approaches require a gene annotation file. The DRAGEN RNA pipeline is able to measure abundance at the gene and transcript levels, which are useful for gene and transcript expression analysis, respectively.

For this webinar, I'll focus on gene expression. Once we obtain raw counts of each gene in each sample, we introduce normalization to ensure that we can compare the expression of each gene across samples. The input is a table of raw counts, where rows represent genes and columns represent samples; the output here is a table of normalized counts.

There are multiple ways of normalizing. We can normalize gene expression by library size, because different samples have different sequencing depths, which influences the number of reads observed for each gene. We can also normalize by gene length, as different genes have different lengths; as a result, longer genes would be counted as having more reads than shorter ones.

The most commonly reported measures that normalize by both library size and gene length are FPKM, which stands for fragments per kilobase of exon model per million mapped reads; RPKM, which stands for reads per kilobase of exon model per million mapped reads; and TPM, transcripts per million. Different tools can adopt different normalization methods; the DRAGEN RNA pipeline generates TPM values.

FPKM and RPKM are traditional means of performing normalization. RPKM is used for single-end RNA-seq, whereas FPKM is used for paired-end RNA-seq. For both, we normalize for sequencing depth first, followed by gene length. TPM is a more recent normalization metric, where the counts are normalized for gene length first and then for sequencing depth, using the per-million sum of the length-normalized values.
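The two orders of normalization can be written out for a single sample. In this numpy sketch, `counts` are raw read counts per gene and `lengths_bp` are gene lengths in base pairs; the sample's total count stands in for "total mapped reads", which is an assumption, since tools differ in exactly what they use as the library size. Passing fragment counts from paired-end data instead of read counts gives FPKM.

```python
import numpy as np


def rpkm(counts: np.ndarray, lengths_bp: np.ndarray) -> np.ndarray:
    """Depth first, then length: reads per kilobase per million mapped reads."""
    per_million = counts.sum() / 1e6        # sequencing-depth scaling factor
    rpm = counts / per_million              # reads per million
    return rpm / (lengths_bp / 1e3)         # ... per kilobase of gene


def tpm(counts: np.ndarray, lengths_bp: np.ndarray) -> np.ndarray:
    """Length first, then depth: transcripts per million."""
    rpk = counts / (lengths_bp / 1e3)       # reads per kilobase
    per_million = rpk.sum() / 1e6           # scaling factor from length-normalized values
    return rpk / per_million                # sums to 1e6 within the sample


counts = np.array([10.0, 20.0, 30.0])
lengths_bp = np.array([1000.0, 2000.0, 500.0])
print(rpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
```

A practical consequence of TPM's ordering is that the values in every sample sum to one million, so each gene's TPM is a comparable proportion across samples; RPKM and FPKM values do not share that property.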

Normalization by distribution is a third method. The assumption here is that most genes are not differentially expressed. RNA composition bias arises when only a small number of genes are very highly expressed in one sample but not in the other. To handle this bias, the geometric mean of each gene is computed across all samples, each gene's expression value in each sample is divided by that mean, and the median of these ratios in a sample serves as that sample's normalization factor. DESeq2 and the DRAGEN Differential Expression pipeline employ this normalization method before performing differential expression analysis.
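DESeq2's version of this distribution-based normalization is known as the median-of-ratios method: each count is divided by its gene's geometric mean across samples, and the per-sample median of those ratios becomes the size factor. The numpy sketch below follows that recipe, working in log space and dropping genes with a zero count in any sample (their geometric mean is zero). It is an illustration under those assumptions, not DESeq2's own code; the function names are hypothetical.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors; counts is genes x samples (raw counts)."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)                    # log(0) -> -inf
    # Genes with a zero in any sample have no usable geometric mean.
    usable = np.all(np.isfinite(log_counts), axis=1)
    log_geo_means = log_counts[usable].mean(axis=1)    # log geometric mean per gene
    log_ratios = log_counts[usable] - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))       # one factor per sample


def normalize(counts: np.ndarray) -> np.ndarray:
    """Divide every column (sample) by its size factor."""
    return counts / size_factors(counts)[None, :]


# Sample 2 was sequenced twice as deeply as sample 1; the last gene has a zero.
counts = np.array([[10, 20], [20, 40], [5, 10], [0, 7]], dtype=float)
print(size_factors(counts))
print(normalize(counts))
```

Because the factor is a median over many genes, a handful of very highly expressed genes in one sample barely moves it, which is exactly the composition bias this method is meant to resist.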

After obtaining the normalized counts, we pause for a QC check of the sequencing data at the count level, to assess biological reproducibility or batch effects. This is important before we proceed to downstream analysis.

The input we use here is a table of normalized counts of all genes across all samples. We first use a dimensionality reduction technique called principal component analysis (PCA), which captures the most variation in the data set; in this case, it could be the tumor versus non-malignant variation across the samples. We can check whether the replicates of multiple samples are clustering logically, as expected. In this case, tumor replicates are clustering together and not mixing with the non-malignant sample replicates. In addition, PCA can capture variation attributed to batch effects, which could arise from technical differences between your samples, such as the type of sequencing machine or the technician who ran the sample batch.
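A minimal numpy sketch of the PCA step, assuming a log2(x + 1) transform of the normalized counts (a common choice, not one the talk specifies) and computing PCA by singular value decomposition of the centered samples-by-genes matrix. The function name and the toy values are hypothetical.

```python
import numpy as np


def pca_scores(norm_counts: np.ndarray, n_components: int = 2):
    """PCA of samples; norm_counts is genes x samples (normalized counts)."""
    x = np.log2(norm_counts + 1.0).T                   # samples x genes, log scale
    x = x - x.mean(axis=0)                             # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]    # sample coordinates on each PC
    explained = s**2 / np.sum(s**2)                    # fraction of variance per PC
    return scores, explained[:n_components]


# Toy data: 3 "tumor" samples then 3 "non-malignant" samples, 3 genes.
norm = np.array([[100, 110, 105, 10, 12, 11],
                 [10, 12, 11, 100, 105, 110],
                 [50, 52, 51, 50, 49, 51]], dtype=float)
scores, explained = pca_scores(norm)
print(np.round(scores[:, 0], 2), np.round(explained, 3))
```

Plotting the first two columns of `scores`, colored once by condition and once by batch variables such as sequencer or technician, is what shows whether replicates group by biology or by batch. The sign of each component is arbitrary, so only the separation between groups is meaningful.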

Lastly, we also want to check for high correlation of normalized counts among replicates. These QC plots are usually produced with custom scripts by the user. These QC checks are critical to help us identify any outlier samples or replicates that we might consider removing before we continue.
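Since these plots are typically custom-scripted, here is one hedged way to do the correlation check in numpy: compute sample-to-sample Pearson correlations on log2(x + 1) normalized counts and flag samples whose median correlation to the other samples falls below a cutoff. The median (rather than the mean) keeps one bad sample from dragging down its neighbors. The 0.9 cutoff and the function names are illustrative assumptions; a sensible threshold depends on the tissue and protocol.

```python
import numpy as np


def replicate_correlation(norm_counts: np.ndarray) -> np.ndarray:
    """Sample-by-sample Pearson correlation of log2(x + 1) counts (genes x samples)."""
    return np.corrcoef(np.log2(norm_counts + 1.0), rowvar=False)


def flag_outliers(corr: np.ndarray, min_median_corr: float = 0.9) -> list:
    """Indices of samples whose median correlation to the others is below the cutoff."""
    flagged = []
    for i in range(corr.shape[0]):
        others = np.delete(corr[i], i)           # drop the self-correlation of 1.0
        if np.median(others) < min_median_corr:
            flagged.append(i)
    return flagged


# Toy data: three consistent replicates and one discordant sample (index 3).
norm = np.array([[100, 110, 105, 5],
                 [200, 190, 210, 300],
                 [50, 55, 52, 400],
                 [400, 420, 410, 10],
                 [10, 12, 11, 200]], dtype=float)
corr = replicate_correlation(norm)
print(np.round(corr, 2))
print(flag_outliers(corr))
```

The correlation matrix itself is usually shown as a heat map; the flagging function only makes the "is any sample an outlier?" question explicit.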

Now we move on to differential expression analysis. To identify candidates that are differentially expressed between two conditions, say the tumor versus non-malignant conditions, we perform differential expression analysis.

The commonly used tools here are the DRAGEN Differential Expression pipeline, DESeq2, and edgeR. Taking the concatenated expression values of every gene across all samples, either counts or normalized values, as the input, we perform differential expression analysis to generate a summary of calculations. Different tools may have different requirements for the count input; DESeq2 and edgeR, for example, expect raw, unnormalized counts. The typical metrics included in the calculation summary are the log2 fold change, nominal p-values, and adjusted p-values for all genes.

To narrow down to interesting and statistically significant candidates, we can apply empirical cutoffs to the log2 fold change between the two conditions, as well as to the false discovery rate (FDR)-adjusted p-values, which account for multiple hypothesis testing.
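Differential expression tools compute the adjusted p-values themselves (DESeq2, for example, uses the Benjamini-Hochberg procedure by default). As a sketch of what the filtering step does, this numpy code applies Benjamini-Hochberg and then both cutoffs. The thresholds |log2 fold change| >= 1 and FDR < 0.05 are common illustrative values, not ones given in the talk, and the function names are hypothetical.

```python
import numpy as np


def bh_adjust(pvalues) -> np.ndarray:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                              # ascending p-values
    ranked = p[order] * m / np.arange(1, m + 1)        # p * m / rank
    # Make adjusted values monotone: running minimum from the largest p down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)          # cap at 1, restore order
    return adjusted


def significant(log2fc, pvalues, min_abs_lfc: float = 1.0,
                max_fdr: float = 0.05) -> np.ndarray:
    """Boolean mask of genes passing both the fold-change and the FDR cutoff."""
    padj = bh_adjust(pvalues)
    return (np.abs(np.asarray(log2fc)) >= min_abs_lfc) & (padj < max_fdr)


pvalues = [0.01, 0.04, 0.03, 0.5]
log2fc = [2.0, -1.5, 0.2, 3.0]
print(bh_adjust(pvalues))
print(significant(log2fc, pvalues))
```

Note how the second gene has a nominal p-value of 0.04 and a large fold change yet fails once the multiple-testing adjustment is applied; that is the point of filtering on adjusted rather than nominal p-values.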

In addition, we can infer expression signatures or subpopulations by clustering the normalized gene expression values across the samples in an unsupervised manner. The heat map here helps us visualize the magnitude of gene expression in two different sets of samples, using red and green colors. The hierarchical clustering of the gene expression values by Euclidean distance, as depicted by the dendrogram across the samples, shows that a subset of genes, i.e., the gene expression signature, can distinguish the two sets of samples, such as the brain samples and the […] samples.
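To make the clustering step concrete, here is a deliberately naive numpy implementation of agglomerative clustering with Euclidean distance, cutting the tree into `k` groups. The choice of average linkage, the function name, and the input layout (samples as rows, log-scale expression of the signature genes as columns) are assumptions; for real data a library routine such as SciPy's hierarchical clustering module would be the usual choice.

```python
import numpy as np


def average_linkage_clusters(x: np.ndarray, k: int) -> list:
    """Agglomerative clustering of the rows of x into k clusters.

    Euclidean distance between rows, average linkage between clusters.
    Returns one integer label per row.
    """
    n = x.shape[0]
    # Pairwise Euclidean distances between all rows (samples).
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(n)]            # start with singletons
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]                           # b > a, so index a stays valid
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: samples 0-2 share one expression pattern, samples 3-4 another.
x = np.array([[5.0, 1.0, 1.0], [5.2, 1.1, 0.9], [4.9, 0.8, 1.2],
              [1.0, 5.0, 5.1], [1.1, 4.8, 5.0]])
print(average_linkage_clusters(x, 2))
```

The order in which merges happen is what a dendrogram draws; stopping at two clusters corresponds to cutting the tree where the two sample groups separate.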

Illumina provides the BaseSpace DRAGEN Differential Expression app to help customers perform differential expression analysis using the output files generated by the DRAGEN RNA app. The output is similar to that of DESeq2: it generates a CSV file of differentially expressed genes with FDR-adjusted p-values. In addition to the tables, the app also generates differential expression metrics as well as visualization plots of […].

After consolidating a list of differentially expressed candidates that are statistically significant, we want to gain greater biological insight into these candidates through functional annotation. This helps us narrow down the list of candidates to the most interesting ones.

To gain greater biological insight, we adopt statistically driven methods to characterize the gene ontologies, pathways, or networks that are associated with the candidates of interest. Taking a table of differentially expressed gene candidates with attributes such as log2 fold change and p-value, we perform various functional annotation analyses to derive a summary table of gene sets or co-expression modules that are significantly associated with our list of candidates.
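Over-representation analysis is commonly framed as a hypergeometric (one-sided Fisher's exact) test: out of N background genes, K belong to a gene set; if k of our n candidates fall in that set, how likely is a count that high by chance? This standard-library sketch computes that tail probability. It is a generic illustration with hypothetical names: DAVID uses a modified Fisher's exact test (the EASE score), and GSEA uses a rank-based enrichment score, so neither tool's statistic is reproduced here.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: genes in the background (e.g. all genes tested)
    K: background genes annotated to the gene set
    n: candidate (differentially expressed) genes
    k: candidates annotated to the gene set
    """
    total = comb(N, n)
    upper = min(K, n)                   # can't draw more set members than exist
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total


# All 4 candidates fall in a 5-gene set drawn from a 20-gene background.
print(hypergeom_enrichment_p(k=4, n=4, K=5, N=20))
```

With many gene sets tested, each set's p-value would then typically be FDR-adjusted, in the same way as the per-gene p-values in differential expression analysis.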

Common functional enrichment analyses are carried out by open-source tools like GSEA and DAVID, and by an Illumina-specific solution called BaseSpace Correlation Engine. These analyses help determine whether there is an enrichment of known biological functions, interactions, or pathways in these candidates relative to […].

Gene co-expression network analysis is helpful when you want to characterize co-expressed genes that are controlled by the same transcriptional regulatory program, or that are functionally related members of the same pathway or protein complex. This is particularly relevant for characterizing the functions of novel RNA species, which […].
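One simple way to build such a network, sketched here as an assumption rather than the method of any particular tool, is to correlate every pair of genes across samples and keep the pairs whose absolute Pearson correlation reaches a threshold; densely connected groups of genes then form candidate co-expression modules. Dedicated methods such as WGCNA refine this idea with soft thresholding and module detection. The 0.8 threshold and the names below are illustrative.

```python
import numpy as np


def coexpression_edges(expr: np.ndarray, gene_names: list,
                       min_abs_r: float = 0.8) -> list:
    """Edges between genes whose expression profiles correlate strongly.

    expr is genes x samples (e.g. log-scale normalized counts).
    Returns (gene_a, gene_b, r) tuples with |r| >= min_abs_r.
    """
    r = np.corrcoef(expr)                 # gene-by-gene Pearson correlation
    edges = []
    n = len(gene_names)
    for i in range(n):
        for j in range(i + 1, n):         # each unordered pair once
            if abs(r[i, j]) >= min_abs_r:
                edges.append((gene_names[i], gene_names[j], float(r[i, j])))
    return edges


expr = np.array([[1, 2, 3, 4, 5],
                 [2, 4, 6, 8, 10],        # perfectly co-expressed with G1
                 [3, 5, 1, 4, 2]], dtype=float)
print(coexpression_edges(expr, ["G1", "G2", "G3"]))
```

A novel transcript that lands in a module with well-annotated genes inherits a hypothesis about its function from them, which is why this analysis suits novel RNA species.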

Visualization is an integral component of the analysis that helps us confirm the findings from the previous, statistically driven analyses. Here we visualize the genomic events related to a candidate, such as expression changes, fusions, or splicing events, using BAM files or bigWig files as input and tools such as IGV, in which we can visualize these events.

In more advanced use cases, we want to add more layers of evidence for the validity of our candidates ahead of experimental validation. We achieve this by overlaying the RNA-seq tracks of our data with public annotations or the expression profiles of publicly available data sets, such as the ENCODE cell line database, the GTEx database, and the TCGA cancer tissue database. This gives us insight into whether our observations of these candidates in our data set are generalizable to […].

In a nutshell, we perform visualization to verify changes in expression between two conditions, predicted splice isoform models, and fusion transcripts, as well as to overlay public annotations that support the […].
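Coverage tracks like these are normally produced by existing tools rather than by hand, but the underlying idea is simple: per-base read depth is run-length encoded into intervals. This sketch writes bedGraph, a plain-text track format that genome browsers can load and that UCSC's `bedGraphToBigWig` utility can convert to bigWig. It assumes per-base depth has already been computed from the BAM file; the function name is hypothetical.

```python
from typing import List


def coverage_to_bedgraph(chrom: str, depth: List[int], offset: int = 0) -> List[str]:
    """Run-length encode per-base depth into bedGraph lines.

    depth[i] is the read depth at 0-based position offset + i.
    Intervals are half-open [start, end), as in bedGraph; zero-depth runs are omitted.
    """
    lines = []
    i = 0
    while i < len(depth):
        j = i
        while j < len(depth) and depth[j] == depth[i]:
            j += 1                                   # extend the run of equal depth
        if depth[i] > 0:
            lines.append(f"{chrom}\t{offset + i}\t{offset + j}\t{depth[i]}")
        i = j                                        # jump to the next run
    return lines


for line in coverage_to_bedgraph("chr1", [0, 0, 3, 3, 3, 1, 0], offset=100):
    print(line)
```

Writing one such track per condition and loading them side by side is a common way to eyeball an expression change at a candidate locus.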

We have reached the last step of our RNA-seq analysis. Suppose we want to narrow down to a small list of candidates for experimental testing. How can we do that?

We can integrate other data sets from the public domain as additional layers of evidence. The objective is to enrich the annotation of our candidates with additional omics or clinical data for candidate assessment.

Examples of additional data sets or databases include omics or multi-omics data types, such as SNP array data and whole-genome sequencing data; omics databases can include ICGC, TCGA, or the GTEx database. Clinical data types can include survival data or progression-free survival data of patients, if you are performing clinical projects. Other data types include chemical or genetic perturbation data sets, or compound treatment data sets if you […].

These analyses can be performed by custom scripting and are statistically driven by nature. Illumina offers BaseSpace Correlation Engine, which can perform similar analyses. Examples of integration analysis include correlational analysis, when you want to associate the candidates with a biological process such as DNA methylation. Another example is survival analysis, when you want to associate a candidate with a clinical outcome.
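For the survival example, patients would typically be split by a candidate's expression (for instance high versus low) and a survival curve estimated for each group. A minimal Kaplan-Meier estimator in plain Python shows what that curve is; formally comparing the groups would additionally need a test such as the log-rank test, which this sketch omits. Inputs and names are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = event observed, 0 = censored.
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group patients tied at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk     # conditional survival past t
            curve.append((t, survival))
        at_risk -= removed                         # events and censored both leave
    return curve


print(kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1]))
```

Calling it once per expression group (for example, patients above versus below the median expression of a candidate, an illustrative split) gives the two curves whose separation suggests an association with outcome.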

We have finished describing the typical workflow of RNA-seq analysis. Now we will introduce Illumina solutions that can help you carry out these analyses. There are three categories of solutions that we support: on-instrument, cloud, and on-site server analysis. Briefly, you can choose to use Local Run Manager on your instruments, run BaseSpace Sequence Hub and DRAGEN on the cloud, or run DRAGEN on an on-site server.

Broadly, there are four classes of RNA-seq applications: mRNA sequencing, targeted RNA sequencing, small RNA sequencing, and pathogen detection. This table lists the featured solutions associated with each application. In this webinar, I will briefly introduce the solutions for mRNA-seq, targeted RNA-seq, and small RNA-seq. Stay tuned for future webinars on Illumina's pathogen detection solutions.

As described in the previous section on a typical RNA-seq analysis workflow, the mRNA sequencing solution is currently fully available on the cloud-based BaseSpace Sequence Hub app and partially available on the on-premise DRAGEN server. FASTQs are generated from the BCL files using BCL Convert. The FASTQ files are then fed into the DRAGEN RNA pipeline to generate quantification files for individual samples. The DRAGEN RNA Differential Expression pipeline takes all these quantification files and performs differential expression analysis. Note that this RNA differential expression pipeline is currently available only on BaseSpace.

The user can then shortlist statistically significant candidates from the output of the differential expression analysis. To gain more biological insight and verify the correctness of the candidates, the user can then perform functional annotation analysis and visualization on these candidates.

For applications like targeted RNA sequencing, Illumina provides on-instrument and cloud-based solutions for the RNA Amplicon workflow. The RNA Amplicon workflow enables streamlined gene expression analysis. The user can select an Illumina amplicon panel or a custom panel (in which case a custom manifest is needed), and choose the sample grouping if they plan to perform pairwise differential expression analysis. Differential expression analysis is performed using DESeq2, generating output files such as mapping metrics, correlation plots, and differential expression […].

For small RNA sequencing, Illumina provides on-instrument and cloud-based solutions for the Small RNA workflow. The Small RNA app supports TruSeq Small RNA sample preparation kits. The app aligns reads against four reference databases and outputs hits to mature microRNAs, isomiRs, and piwi-interacting RNAs (piRNAs). Optionally, the app performs novel precursor discovery and pairwise differential expression analysis. Pairwise differential expression analysis identifies differentially expressed small RNA species for each pair of sample groups. In addition, the app generates sample summaries and adapter trimming summaries, and lastly, the output also includes plots of QC statistics as well as visualization plots of microRNA precursors.

This brings us to the end of this webinar. I hope you found it useful and that you are ready to kick-start your RNA-seq analysis journey. Thank you very much for listening.


https://www.youtube.com/watch?v=UF8uR6Z6KLc
