YouTube transcript:
High Dimensional analysis with Phenograph and UMAP - Aug 15, 2024 with Joshua Luthy

Summary

Core Theme

This webinar introduces and compares dimensionality reduction techniques (t-SNE, UMAP) and clustering algorithms (FlowSOM, PhenoGraph) within the FlowJo software, highlighting their application in analyzing high-dimensional flow cytometry data and demonstrating a new tool, Sleepwalk, for evaluating embedding quality.

Welcome to the webinar. Today we're doing dimensionality reduction and clustering in FlowJo. My name is Josh Luthy, and I'm a FlowJo … If you have any questions during the webinar, feel free to type them in the chat, and I'll answer them as we go along. So, here's our agenda for today.

We're going to do an overview of dimensionality reduction and clustering, with a focus on UMAP and PhenoGraph, and we'll also look at some comparisons to t-SNE and FlowSOM. Towards the end, we'll jump into FlowJo for a live demo and a quick overview of plugin setup: where to get the plugins and how to set up R. After that I have an analysis we can look at. Then there's a new tool I want to show you called Sleepwalk. It's an R package, and it's a really cool way to look at your clustering and embedding results and get a better idea of how well they actually represent your data. We'll take a look at that at the end.

Since most of you are familiar with flow, I'll focus today on the data analysis side, particularly dimensionality reduction and clustering, which are key to … data.

As you know, cytometry generates high-dimensional data, often with dozens of parameters. Dimensionality reduction techniques like t-SNE and UMAP are essential for visualizing and understanding this complex data, and knowing how to analyze it is crucial for researchers and professionals in many biological fields. Again, today we'll focus on the key data analysis aspects that underlie effective interpretation.

Across this slide are some examples of the methods available in FlowJo for dimensionality reduction. Keep in mind these are only some of the methods. Additional ones are available for download as plugins from FlowJo Exchange.

So, dimensionality reduction in general: the goal is to create a low-dimensional representation of our high-dimensional data set that preserves the overall structure of the data as much as possible. The method shown here, t-SNE, and others like it help us reduce the complexity of the data while preserving the most critical information. That lets us see patterns and relationships that might otherwise be missed. Dimensionality reduction gives us a clearer understanding of the data, can lead to more impactful biological insights, and is vital for visualizing data. Here with t-SNE, by reducing dimensions, we can better comprehend intricate correlations within our data.

Again, dimensionality reduction attempts to place events with similar multidimensional expression patterns close together in the reduced data space. This condenses complex data while retaining its essential information and makes patterns visible. In this example, we can clearly see the different populations of cell types in the plot.

Let's talk about the power of t-SNE: some of its basic principles, its advantages, and its limitations. t-SNE, or t-distributed stochastic neighbor embedding, is a popular dimensionality reduction technique that is widely used in flow for visualizing high-dimensional data. It was developed by Laurens van der Maaten and Geoffrey Hinton. It's designed to capture the local structure of the data by bringing similar data points closer together in the low-dimensional space, typically a 2D or sometimes a 3D representation. t-SNE works by converting high-dimensional Euclidean distances between data points into conditional probabilities that represent similarities. The algorithm then tries to minimize the divergence (specifically, the Kullback-Leibler divergence) between these probabilities in the high-dimensional space and the corresponding probabilities in the lower-dimensional map.
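The two-step idea above (turn distances into similarity probabilities, then make the map's probabilities match them) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the implementation in FlowJo or opt-SNE. It uses a single fixed Gaussian bandwidth (sigma) instead of a per-point bandwidth calibrated to the perplexity, and it uses the Student-t kernel from the original t-SNE paper for the map. It also only scores two hand-made 2D layouts rather than optimizing one by gradient descent. The function names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def high_dim_affinities(X, sigma=1.0):
    """Symmetric t-SNE affinities P for high-dimensional points X.

    Simplification (assumption): one fixed Gaussian bandwidth sigma for all
    points; real t-SNE tunes a per-point sigma to match the perplexity.
    """
    n = len(X)
    cond = []
    for i in range(n):
        # p_{j|i}: Gaussian kernel on squared distance, normalised over j != i
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([v / total for v in w])
    # symmetrise: p_ij = (p_{j|i} + p_{i|j}) / (2n), so all entries sum to 1
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def low_dim_affinities(Y):
    """Map affinities Q using a Student-t kernel with one degree of freedom."""
    n = len(Y)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


def kl_divergence(P, Q):
    """KL(P || Q), the cost t-SNE minimises; pairs with p = 0 contribute nothing."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)


# Two tight groups of two cells each, measured on three markers
X = [[0, 0, 0], [0.1, 0, 0], [5, 5, 5], [5.1, 5, 5]]
P = high_dim_affinities(X)

good = [[0, 0], [0.1, 0], [10, 10], [10.1, 10]]  # keeps each similar pair together
bad = [[0, 0], [10, 10], [0.1, 0], [10.1, 10]]   # splits each similar pair apart

print(kl_divergence(P, low_dim_affinities(good)))  # small (close to 0)
print(kl_divergence(P, low_dim_affinities(bad)))   # much larger
```

The layout that keeps similar cells together gets a much lower divergence. The real optimizer exploits exactly this difference when it moves points around the map.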

Basically, it aims to ensure that if two points are similar in high-dimensional space, they remain close in the lower-dimensional space. One of t-SNE's key strengths is its ability to create clear, visually interpretable clusters or islands: groupings of events in the low-dimensional space. That makes it especially useful for identifying populations. By reducing the dimensionality of the data, t-SNE reveals underlying biological structures, such as cell subtypes or activation states, in a more intuitive way. It's highly effective at capturing local structure, meaning it excels at revealing small clusters or subtle differences between cell populations. Its ability to produce visually clear, well-separated clusters makes it a powerful tool for exploratory data analysis in flow. t-SNE is also nonlinear, which allows it to handle the complex nonlinear relationships that often exist in biological data.

However, t-SNE is not without its limitations. One major drawback is its computational intensity, particularly with large data sets, which can lead to long processing times. Another challenge is its tendency to lose global structure. While it's great at revealing local relationships, it can distort the overall data landscape, making it difficult to interpret global patterns or the relationships between the islands or clusters we see.

t-SNE is also sensitive to parameter choices such as perplexity and learning rate, the tunable parameters you see when you start t-SNE in FlowJo. Fine-tuning those parameters can be tricky and might require multiple iterations. However, FlowJo offers opt-SNE, which aims to simplify those choices for you by optimizing some of those parameters during the calculation. For instance, it might not run the full number of iterations if it finds that the embedding is no longer changing significantly from one round of calculation to the next.

Finally, t-SNE does not inherently preserve the distances between clusters. The spacing between clusters or islands in the output may not reflect their true relationship in the high-dimensional space.

So next, let's talk about…

Oops, sorry, I wasn't paying attention to the chat. Someone's asking whether the slides will be available after the webinar. Yes, they're always available, and you can download them from our website.

Someone asked whether the splitting of clusters of the same color, for example the pink or blue ones, means they are unique subsets. If we go back to this slide, look at the B cells or the NK cells here. The plot is just overlaid with some general lineage gates, but at least based on t-SNE there's more definition there. It's probably capturing more of the local structure of the data and separating those spaces. If we did additional gating down into further subsets, we would be able to determine that.

Okay, so back to UMAP: uniform manifold approximation and projection. This is a more recent dimensionality reduction technique, developed by Leland McInnes and John Healy. It's grounded in concepts from topological data analysis and manifold learning. It works by constructing a high-dimensional graph of the data that represents the local relationships between data points. It then optimizes a low-dimensional representation that preserves these relationships as well as it can. The key idea behind UMAP is to maintain both the local and the global structure of the data, allowing for more accurate and meaningful visualizations.

Here we see some of the tunable options available when you start the UMAP plugin in FlowJo: nearest neighbors, minimum distance, and number of components.

With a low number of nearest neighbors, such as two, UMAP merely glues together some small chains. Because of that narrow view, it fails to see how those chains connect to each other any further. This reflects the fact that, from a fine-detail point of view, the data is very disconnected and scattered throughout the space. As the number of neighbors increases, UMAP sees more of the overall structure and glues together more of those components to better convey the broader structure of the data. By the time we reach around 20 neighbors, we have a fairly good view of the whole data set, showing how the various colors relate to each other. If you go further, more focus is placed on the overall structure. For example, with 200 nearest neighbors we might get a plot where the overall structure is well captured, but at the loss of finer, local structure.

Then there's minimum distance. This controls how tightly the resulting UMAP plot is allowed to pack points together. Quite literally, it sets the minimum distance apart that points are allowed to be in the low-dimensional space. Low values of minimum distance result in tighter groupings of events, which can be useful if you're interested in clustering or in finer topological structure. Large values of minimum distance prevent UMAP from packing points tightly on top of one another, so it focuses on preserving the broad topological structure instead.

Then there's number of components. Here you choose how many components, or parameters, you want to get back from UMAP. If you choose two, you get an x and a y, UMAP 1 and UMAP 2. If you choose three, you get x, y, and z and can plot them in a three-dimensional view. You could go even higher, for example to 10 components.
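Two of these options can be illustrated with small, self-contained sketches in plain Python. The helper names are hypothetical, and this is not the UMAP plugin's code. The first function builds a plain symmetrized k-nearest-neighbor graph, similar in spirit to the neighbor graph UMAP starts from, and counts its connected components. It shows why a very small neighbor count leaves the data in disconnected fragments. The second function evaluates the target low-dimensional similarity curve described in the UMAP paper, assuming the spread parameter is fixed at 1. Pairs closer than min_dist count as fully similar, and similarity decays exponentially beyond that distance. UMAP itself fits a smooth curve to this target and optimizes a fuzzy cross-entropy, neither of which is shown here.

```python
import math


def knn_components(points, k):
    """Count connected components of a symmetrised k-nearest-neighbour graph."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)  # i chose j as a neighbour
            adj[j].add(i)  # symmetrise: keep the edge if either side chose it
    seen, components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1            # new island found; flood-fill it
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return components


def target_similarity(d, min_dist):
    """Target embedding similarity at distance d (UMAP paper form, spread = 1)."""
    return 1.0 if d <= min_dist else math.exp(-(d - min_dist))


# Three groups of cells along a single marker axis
points = [(0,), (1,), (2,), (10,), (11,), (12,), (20,), (21,), (22,)]
print(knn_components(points, 2))  # 3: each group stays its own island
print(knn_components(points, 3))  # 1: the third neighbour bridges the groups

# A neighbouring pair 0.4 apart is already "close enough" when min_dist = 0.5,
# but is still pulled closer when min_dist = 0.1 (tighter packing).
for md in (0.1, 0.5):
    print(md, round(target_similarity(0.4, md), 3))  # 0.1 0.741, then 0.5 1.0
```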

So how does this differ from t-SNE? One of the major differences is, again, UMAP's ability to preserve more of the global structure of the data. t-SNE focuses heavily on local relationships, while UMAP strives to maintain both local and broader global relationships, giving you a more accurate sense of how clusters relate to each other. This means UMAP can provide a better overview of the data's overall structure and help you understand the relationships between different clusters or populations.

Someone asked in the chat whether there's a typical minimum distance used for UMAP analysis. The default is 0.5. I believe that's the default in the plugin. As we'll see later, UMAP is pretty forgiving and performs well with many of the default settings, so that default value of 0.5 tends to work well. All the plots we see here were created with that default minimum distance of 0.5.

UMAP is also highly scalable, so it handles large data sets more efficiently than t-SNE. That makes it a preferred choice if you're dealing with extensive data, such as really big panels or a high number of events. This computational efficiency is a significant advantage: UMAP typically runs faster and requires less memory than t-SNE, again especially when working with large data.

Another advantage is the simplicity of the parameters, the settings we talked about. t-SNE may require careful tuning of parameters like perplexity and learning rate. UMAP, by contrast, typically works really well with just the default settings, which reduces the need for a lot of… Sorry, I was checking the chat.

So in flow cytometry, UMAP is increasingly being adopted because it preserves both local and global structure. That can be crucial for accurately identifying and understanding complex cellular relationships. Researchers might use UMAP to explore cellular heterogeneity, track changes in cell states, or identify rare cell populations that might not be visible with traditional gating strategies. In FlowJo it's easy to integrate UMAP into your analysis, which makes it a convenient and powerful tool for reducing dimensionality and uncovering those insights.

So, just a quick comparison and overview. t-SNE is particularly effective at capturing and preserving local structure in the data. It excels at separating small, closely related clusters, which makes it ideal for identifying subtle differences between populations. t-SNE is also widely used and validated in the field, which makes it a trusted tool for many researchers, particularly when analyzing complex high-dimensional data.

Some of the cons of t-SNE: you lose some of the global structure, because t-SNE tends to distort it somewhat. While it excels at local clustering, it can lose the overall relationships between clusters, which can lead to misinterpretation of how they relate to one another. It's also computationally intensive. Especially with large data sets, processing times can be long, and the algorithm requires significant memory, which can become a bottleneck in your analysis. And it's sensitive to parameters: t-SNE results depend on how you tune parameters like learning rate and perplexity. Finding the optimal settings might take several iterations, or asking other experts who have used it before.

Now looking at UMAP. It takes a more balanced approach by preserving both the local and global structure of the data. It's highly scalable and efficient, so it runs faster and handles larger data sets better than t-SNE, which makes it a practical choice for analyzing a large data set. It's also less sensitive to its input parameters than t-SNE and generally performs well with the default settings. This reduces the need for excessive parameter tuning and makes it easier to use, especially for people who might be new to dimensionality reduction.

Some of the cons, though: while UMAP is good at preserving the overall structure of the data, it can sometimes produce clusters or groupings that are less distinct or more overlapping than t-SNE's. That can make it harder to visually separate and identify populations. UMAP is also somewhat less established because it's a newer technique, though it is gaining traction in the field and rapidly becoming popular. It may not yet be as widely used and validated as t-SNE.

So in some cases t-SNE might be preferred: if your primary goal is to explore and visualize local structure within the data, such as identifying small, very distinct cell populations. UMAP might be preferred if it's important to maintain the global structure of the data or to understand the broader relationships between clusters. That can make it ideal for exploratory analysis and for understanding overall patterns in your data. And if you're working with large data sets, UMAP can be beneficial because it's faster.

Okay, now we'll talk about clustering. Once we've reduced the dimensions of our data, clustering becomes the next crucial step. It lets us group similar cells together based on their expression profiles, helping to identify cell populations and subpopulations. Clustering methods like FlowSOM and PhenoGraph each have their strengths, and they are used to define these groups in a biologically meaningful way. So here we're looking at FlowSOM.

FlowSOM is a powerful clustering algorithm designed specifically for analyzing flow cytometry and mass cytometry data. It combines the strengths of self-organizing maps with hierarchical clustering to efficiently identify and organize cell populations in high-dimensional data sets.

The core idea behind FlowSOM is to map high-dimensional data onto a lower-dimensional grid on which similar data points are placed close together. This is done with the self-organizing map (SOM), that grid pattern, which is a type of artificial neural network that clusters events by similarity. The map is a reduced representation that preserves the topological relationships between data points. As the SOM learns, it adjusts its nodes so that similar cells are grouped together on the map, forming a pattern that reflects the underlying structure of the data.

After the SOM has been trained, FlowSOM performs a second step using hierarchical clustering. This step groups the nodes on the map into larger clusters, building a hierarchical tree that organizes the data into broader cell populations. Combining the SOM with hierarchical clustering lets FlowSOM identify both fine-grained and broad cell populations, which makes it a versatile tool in FlowJo.

That's what you notice when you start FlowSOM: there are two options. The first is the grid size, here 10 by 10, which gives the first 100 clusters that FlowSOM places on the grid. The second is the number of metaclusters, where you enter the number of clusters you expect to get back from the data. FlowSOM then uses hierarchical clustering to merge the grid into that smaller number of metaclusters.
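The two stages just described can be sketched as a toy, pure-Python example: train a small SOM grid, then merge its nodes into a requested number of metaclusters. This is a sketch under stated assumptions, not the FlowSOM package or the FlowJo plugin. The learning-rate and neighborhood-radius schedules are simple linear decays chosen for illustration. The metaclustering step uses plain average-linkage agglomerative clustering, whereas the FlowSOM R package defaults to consensus hierarchical clustering. A 3 by 3 grid stands in for the larger grid discussed above, and all function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows, cols, epochs=20, lr0=0.5, radius0=1.0, seed=0):
    """Train a rows x cols self-organising map; returns one codebook vector per node."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    codebook = [list(rng.choice(data)) for _ in grid]  # start from random cells
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.01)  # neighbourhood shrinks on the grid
        for x in rng.sample(data, len(data)):     # one pass in random order
            bmu = min(range(len(grid)), key=lambda k: sq_dist(codebook[k], x))
            br, bc = grid[bmu]
            for k, (r, c) in enumerate(grid):
                # nodes near the best-matching unit *on the grid* move the most
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                codebook[k] = [w + lr * h * (xi - w) for w, xi in zip(codebook[k], x)]
    return codebook


def metacluster(codebook, n_meta):
    """Average-linkage agglomerative clustering of SOM nodes into n_meta groups."""
    groups = [[i] for i in range(len(codebook))]

    def linkage(a, b):
        return sum(math.sqrt(sq_dist(codebook[i], codebook[j]))
                   for i in a for j in b) / (len(a) * len(b))

    while len(groups) > n_meta:
        i, j = min(((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
                   key=lambda pair: linkage(groups[pair[0]], groups[pair[1]]))
        groups[i].extend(groups.pop(j))  # j > i, so index i stays valid
    labels = [0] * len(codebook)
    for m, members in enumerate(groups):
        for node in members:
            labels[node] = m
    return labels


def flowsom_like(data, rows=3, cols=3, n_meta=2):
    codebook = train_som(data, rows, cols)
    node_meta = metacluster(codebook, n_meta)
    # each cell inherits the metacluster of its best-matching node
    return [node_meta[min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], x))]
            for x in data]


rng = random.Random(1)
cells = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)] +
         [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(20)])
labels = flowsom_like(cells)
print(labels[:20], labels[20:])
```

Note that n_meta has to be supplied by the user, just like the metacluster option in the plugin. That requirement is the downside discussed next.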

Someone's asking for tips on picking the number of metaclusters to start with in FlowSOM. That's a tricky question. There's not really a best method, and I'd say that's one of the downsides of FlowSOM, because you have to tell it how many clusters to return.

One approach is to first run a different clustering algorithm, like PhenoGraph or X-shift, because those return however many clusters they find in the data. Say you get back 20 clusters from PhenoGraph. You can then take that number into FlowSOM, ask for 20 metaclusters, and see how that performs with your data. FlowSOM also gives you additional outputs, the self-organizing map and the heat maps, that help you interpret the results more than PhenoGraph, where you only get clusters. There are other tools for making those kinds of heat maps from PhenoGraph results, but it's convenient that FlowSOM produces them directly. Another approach: if you've done some manual gating and already identified populations from your manual gates, that can tell you how many metaclusters to ask FlowSOM for. You can then compare the FlowSOM clustering against your manual gating approach.

FlowSOM has really become a go-to method in flow cytometry because it handles large data sets efficiently. It's well suited to data with a large number of markers or cells, where traditional gating strategies might struggle to capture the populations. One of its benefits is speed and scalability. The algorithm is computationally efficient, so it processes large data sets much faster than other methods. The hierarchical clustering step in FlowSOM also helps ensure that even small, distinct populations are captured, which can be critical in studies where you're detecting rare events.

In practical applications, FlowSOM can be used to explore the immune landscape, identify biomarkers, and characterize cellular heterogeneity in various disease states, including cancer and autoimmune diseases. Another advantage is its intuitive visualization. The SOM grid provides a clear, interpretable map of the data, which helps you understand a complex data set and judge how the clustering performed.

A pro tip: in the FlowSOM plugin UI there's an Advanced button. Clicking it shows some additional options, such as controlling the colors of your outputs. One of them creates a summary plot, a PDF with all of the visual outputs FlowSOM can produce: different kinds of heat maps and different kinds of self-organizing maps. It also runs a t-SNE in R and overlays all the clusters onto it. That takes a bit more time because of the t-SNE calculation in R, but the outputs in that PDF are pretty nice. If you're exploring your data for the first time, you might want to try generating that summary PDF.

Someone's asking: running PhenoGraph always fails for them. What are the possible reasons, and what are possible solutions?

As for possible reasons, first I would check that the required R packages are installed. The PDF that comes with the download lists the packages, so you can take that command, run it in R, and make sure they're installed.

If that's fine, check where you're saving the data. Make sure neither FlowJo nor R has any problem reading and writing data in that location, for example if it's on a server. Try working locally, or save the workspace as a FlowJo archive and run the plugin from the archive. If it works from the archive, that suggests a permission issue with where your workspace is saved, on a server for example.

Another possibility is the amount of data you're throwing at it. PhenoGraph is more computationally intensive than FlowSOM, so try running it on a smaller subset of events. If it works with a small number of events but fails with a large data set, you may be pushing the limits of the RAM or compute power on your computer.

Those are some things to check. All these plugins also write their results to the output folder, so look for the R script text file from PhenoGraph. If you have that text file, you can send it to flowjo@bd.com, and I can take a look and help you from there. Looking at that R output is one of the easiest ways to see what's happening in R.

So, on to PhenoGraph.

information so um onto phenograph

so phenograph is um an advanced

clustering algorithm and it's designed

um specifically for high dimensional

single cell data um it was developed by

Jacob leine and colleagues and

phenograph is well suited for

identifying distinct cellular

populations within complex heterogeneous

data sets and it really excels at

finding um rare cell types and subtle

differences between cell populations

that might be missed by traditional um

clustering methods

The core idea behind PhenoGraph is to construct a graph, or network, in which each node represents a cell and edges connect cells with similar high-dimensional feature profiles. It begins by building a k-nearest-neighbors graph, in which each cell is connected to its k nearest neighbors, forming a network that captures the local structure of the data. It then applies a community detection technique, specifically the Louvain method, to this k-nearest-neighbors graph. The Louvain method is an optimization algorithm, greedily maximizing a quantity called modularity, that detects communities, or clusters, in networks.
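The pipeline just described can be sketched compactly in plain Python. It builds a k-nearest-neighbor graph whose edges are weighted by the Jaccard overlap of the two cells' neighbor sets, which is the weighting described in the PhenoGraph paper. It then runs the local-moving phase of Louvain, which repeatedly moves each cell to the neighboring community with the largest modularity gain. Several simplifications are assumptions of this sketch. Only Louvain's first phase is implemented, without community aggregation or multi-level refinement. Neighbor search is brute force. The helper names are hypothetical, and this is not the code inside the FlowJo plugin.

```python
import math


def knn(points, k):
    """Brute-force k nearest neighbours (Euclidean), excluding the point itself."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:k]
            for i in range(n)]


def jaccard_graph(neighbours):
    """Undirected graph; each kNN edge weighted by the Jaccard overlap of neighbour sets."""
    sets = [set(nb) for nb in neighbours]
    adj = [{} for _ in neighbours]
    for i, nb in enumerate(neighbours):
        for j in nb:
            w = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
            if w > 0:
                adj[i][j] = adj[j][i] = w  # Jaccard is symmetric
    return adj


def louvain_local_moving(adj):
    """Phase 1 of Louvain: move nodes between communities while modularity improves."""
    n = len(adj)
    degree = [sum(nbrs.values()) for nbrs in adj]
    two_m = sum(degree)            # 2m = total edge weight counted from both ends
    comm = list(range(n))          # every cell starts in its own community
    if two_m == 0:
        return comm
    tot = degree[:]                # summed degree of each community
    moved = True
    while moved:
        moved = False
        for i in range(n):
            ci = comm[i]
            links = {}             # edge weight from i into each neighbouring community
            for j, w in adj[i].items():
                links[comm[j]] = links.get(comm[j], 0.0) + w
            tot[ci] -= degree[i]   # take i out of its current community
            # modularity gain (up to a constant factor) of placing i in community c
            best_c = ci
            best_gain = links.get(ci, 0.0) - tot[ci] * degree[i] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[i] / two_m
                if gain > best_gain + 1e-12:
                    best_c, best_gain = c, gain
            tot[best_c] += degree[i]
            if best_c != ci:
                comm[i] = best_c
                moved = True
    return comm


def phenograph_like(points, k):
    comm = louvain_local_moving(jaccard_graph(knn(points, k)))
    ids = {}                       # renumber communities 0, 1, 2, ... by first appearance
    return [ids.setdefault(c, len(ids)) for c in comm]


cells = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(phenograph_like(cells, k=4))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The number of communities is never passed in. It falls out of the modularity optimization, which is the property that distinguishes PhenoGraph from FlowSOM's fixed metacluster count.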

Through this process, PhenoGraph identifies groups of densely connected cells within the graph, and these groups correspond to distinct cell populations or clusters. The result is a robust, data-driven clustering that can uncover complex relationships between cells. One of PhenoGraph's strengths is that it determines the number of clusters automatically, without requiring the user to specify it in advance. Having to specify it is a limitation of FlowSOM, for example. PhenoGraph has become a popular tool in flow cytometry analysis because it handles high-dimensional data effectively and can identify rare populations. In FlowJo it's easy to integrate into your analysis by installing the plugin, and its ability to handle complex data makes it useful for large-scale studies where traditional gating may be insufficient.

Once we have our dimensionality reduction and clustering done, we often go into Cluster Explorer. This tool lets us load in those results. You can load in multiple dimensionality reductions, such as UMAP, t-SNE, and TriMap, along with different clustering results. In the latest release, FlowJo 10.10, you can also load in any of your manually gated populations, which is pretty cool. If you only have a UMAP and want to look at your gates in Cluster Explorer and do this type of analysis, you can do it from your manually gated populations instead of clusters.

So, in summary: dimensionality reduction and clustering are complementary techniques, and we use both in a high-dimensional analysis. Both t-SNE and UMAP produce biologically meaningful embeddings. t-SNE generally preserves more of the local data structure, and UMAP preserves more of the global structure. Dimensionality reduction techniques like t-SNE or UMAP help us visualize the data more intuitively, while clustering helps make sense of these visualizations by identifying which cells are similar and belong together. In flow, these techniques are crucial for translating complex multidimensional data into actionable biological insights.

There are still some challenges. The next step people often want is to annotate those clusters, and finding a good method for biological annotation is still a challenge. There are some options out there, but we haven't implemented anything in FlowJo yet. We're looking towards some type of annotation, interactive visualizations, and thorough comparisons with other methods. So, let's take a look at the demo.

Starting real quick with setup, plugin setup and things like that. Right now I'm on a Windows machine because my Mac died. I'm waiting for a replacement, which I'll be happy to get, because this machine is pretty underpowered. So in FlowJo I have an analysis that's already done, because this machine really won't handle running it live.

Installing R on Windows or Mac is pretty simple. On Windows, I'd recommend going with the default installation. It typically installs into Program Files, but sometimes it goes into AppData on your local machine, that is, the AppData folder under your local user account. That's fine too.

You'll want to install the Rtools version that matches your R version. The latest R release is 4.4, and again, running through the default installation is the best way to get through it. R needs Rtools to compile some packages and to install the libraries it depends on.

You can get any of these plugins, such as UMAP and FlowSOM, from FlowJo Exchange. Cluster Explorer is now native to FlowJo in the latest version, 10.10, so you won't really need that plugin, but it's still available for older versions. The plugins download as a zip file. Extract the zip and you'll find a PDF and a JAR file. Put the JAR file into your plugins folder. On this machine, my plugin folder is in my user account's OneDrive Documents folder, in a folder I just called FlowJo plugins.

Next is the R path. When you set your R path, especially on Windows, don't forget to go all the way down to the bin\x64 folder, because that's where the executable FlowJo is looking for lives. On this machine, for example, R went into my AppData\Local\Programs folder.

When you run some of the R-based plugins for the first time, they will try to install the packages they need automatically. The first run might take a minute while the packages install. The next time you run the plugin, it won't need to install anything, so it should calculate faster.

One other tip: if you have a very fresh install of R, go ahead and try installing a package yourself first. The very first time you install a package, you might get a prompt offering to create a folder for R packages where you have read/write permissions. On a work computer, for example, you might not have permission to install R packages in the default location on C:. So type something like install.packages("png") and hit enter. It's already installed here, so we won't see the message I'm describing. If a message pops up saying it will create a folder for you and asking whether you want that, just click yes. Going forward, when a plugin tries to auto-install packages, you shouldn't have any problems, because R will be able to install them wherever it needs to. So my advice is: install R, try to install a package, let it create a folder wherever it needs to, and once that first install succeeds, close R and try running a plugin.

So, what I have here I've already gone through and run. It's just a phenotyping data set that I downsampled to 50,000 events, again because I'm on this really small Windows machine. I forget exactly, but it has something like 38, close to 40, different parameters; it's a high-parameter immunophenotyping panel. I ran a couple of different embeddings. I ran t-SNE with the default settings, and for t-SNE I excluded some markers here because I didn't want to get into the really fine local structure of the data. I wanted a less detailed embedding, just for the purpose of this webinar, to show some of the things we'll be talking about next. Here are the iterations, perplexity, and learning rate. If you're on the opt-SNE option, it's basically going to try to tune those for you as it's running; otherwise, you can go manual and adjust some of those things on your own. It is nice to just go with opt-SNE.

Here you have a couple of different k-nearest-neighbor algorithms. If you have a large number of parameters and millions of events, you might want to go with the approximate random projection or the Annoy method, because those calculate a little bit faster. Same thing for the gradient descent: you might want to go with FIt-SNE if you have a really large number of events and a high number of parameters. These can speed up t-SNE.
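The opt-SNE option is described above as tuning the settings as it runs. As a hedged illustration of what such tuning involves, here is a minimal Python sketch of two widely cited t-SNE rules of thumb: the opt-SNE paper (Belkina et al., 2019) sets the learning rate to the number of events divided by the early-exaggeration factor (12 is a common default), and Barnes-Hut t-SNE only considers about three times the perplexity as nearest neighbors per event. Both function names are hypothetical, and FlowJo's implementation may apply these rules differently or add others, such as opt-SNE's automatic end of early exaggeration.

```python
def opt_sne_learning_rate(n_events: int, early_exaggeration: float = 12.0) -> float:
    """Rule of thumb from the opt-SNE paper: learning rate = N / early exaggeration."""
    if n_events <= 0 or early_exaggeration <= 0:
        raise ValueError("n_events and early_exaggeration must be positive")
    return n_events / early_exaggeration


def neighbors_for_perplexity(perplexity: float) -> int:
    """Barnes-Hut t-SNE looks at roughly 3 * perplexity nearest neighbors per event."""
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return int(3 * perplexity)


if __name__ == "__main__":
    n = 50_000  # size of the downsampled demo file mentioned in the webinar
    print(f"learning rate ~ {opt_sne_learning_rate(n):.0f}")
    print(f"neighbors per event at perplexity 30: {neighbors_for_perplexity(30)}")
```

The neighbor count is also why larger perplexities make the nearest-neighbor step, and therefore the choice between exact and approximate search, more expensive.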

Now for UMAP. Here you have some different distance metrics. Oh, I'm realizing this wasn't even on my slides, but yes: different distance metrics. When UMAP is building the nearest-neighbor graph, the metric is what it uses to decide whether something counts as a neighbor or not. Euclidean distance is the one most often used, but there are some different methods here.
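To make the role of the distance metric concrete, here is a minimal pure-Python sketch of an exact k-nearest-neighbor graph with a swappable metric. It is not FlowJo's or UMAP's code, the function names are hypothetical, and real tools use approximate neighbor search precisely because this brute-force version needs on the order of n² distance evaluations.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "cosine": cosine_distance,
}


def knn_graph(points: List[Vector], k: int, metric: str = "euclidean") -> List[List[int]]:
    """Exact (brute-force) k-nearest-neighbor graph under the chosen metric."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    dist = METRICS[metric]
    graph = []
    for i, p in enumerate(points):
        # Sort candidates by (distance, index) so ties resolve to the lower index.
        candidates = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in candidates[:k]])
    return graph


if __name__ == "__main__":
    pts = [(1.0, 0.0), (10.0, 0.0), (0.0, 1.0)]
    print(knn_graph(pts, 1, "euclidean"))  # [[2], [0], [0]]
    print(knn_graph(pts, 1, "cosine"))     # [[1], [0], [0]]
```

With these three points, Euclidean distance pairs (1, 0) with (0, 1), while cosine distance, which only compares direction, pairs it with (10, 0): the metric changes which events count as neighbors, and therefore the graph the embedding is built from.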

Another cool feature of this newer version of UMAP: if you have a previous UMAP embedding and you load in a new FCS file, you can apply UMAP and say, I'm going to apply my new data set to this previous embedding. It will map the new data onto the previous UMAP result that we see in the background, so you should get comparable, almost identical-looking maps. One caveat: the exact same parameters would need to exist in the two files if you wanted to apply it that way. This is also a good approach if you have a really huge number of events. Say you had a file with 10 million cells and 30 different markers. You might try downsampling and running UMAP the first time on the downsampled population. Once you have that first embedding, which runs quickly on the downsampled population, go back, select your 10 million events, and say, I'm going to apply UMAP from my first embedding. That embedding already has some information calculated about the nearest neighbors, the distances, and things like that, so it can allow you to do the calculation even on a huge number of events. That can be a way to get around the issue if you don't have enough RAM available on your computer.
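The downsample-then-apply strategy can be sketched in Python as below. This is a simplified stand-in, assuming a plain inverse-distance-weighted nearest-neighbor placement: UMAP's own transform (for example, the transform method in the umap-learn Python library) starts from a similar neighbor-weighted position but then runs further optimization, and FlowJo's apply feature may work differently. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def downsample(n_events: int, n_keep: int, seed: int = 0) -> List[int]:
    """Pick n_keep event indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_events), n_keep))


def place_on_reference(
    new_points: List[Point],
    ref_points: List[Point],
    ref_embedding: List[Tuple[float, float]],
    k: int = 5,
) -> List[Tuple[float, float]]:
    """Place each new event at the inverse-distance-weighted mean of the 2-D
    positions of its k nearest reference events (a simplification, not UMAP itself)."""
    if len(ref_points) != len(ref_embedding):
        raise ValueError("each reference event needs an embedding coordinate")
    n_params = len(ref_points[0])
    placed = []
    for p in new_points:
        if len(p) != n_params:
            # Mirrors the caveat above: both files need the same parameters.
            raise ValueError("new events must have the same parameters as the reference")
        nearest = sorted((math.dist(p, q), i) for i, q in enumerate(ref_points))[:k]
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # small offset avoids 1/0
        total = sum(weights)
        x = sum(w * ref_embedding[i][0] for w, (_, i) in zip(weights, nearest)) / total
        y = sum(w * ref_embedding[i][1] for w, (_, i) in zip(weights, nearest)) / total
        placed.append((x, y))
    return placed
```

In the 10-million-event scenario, you would embed only the events returned by downsample and then place the rest against that reference, so the expensive neighbor-graph construction runs on the small set only.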

Then, with those embeddings, I went and ran FlowSOM and PhenoGraph clustering. What we see here: I've overlaid the different T cell subsets from down here, and I've made those overlays on both t-SNE and UMAP. Then I ran PhenoGraph and got back 20 clusters, and then I went to FlowSOM and said, all right, now you give me 20 clusters. Now I've overlaid those two sets: here on the left are the FlowSOM clusters on my UMAP and t-SNE, and these are the PhenoGraph clusters on UMAP and t-SNE.
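Overlaying the two 20-cluster results by eye works, but their agreement can also be put into numbers. Below is a minimal pure-Python sketch, assuming you have exported one FlowSOM label and one PhenoGraph label per event in the same event order: a contingency count of label pairs, plus the adjusted Rand index, a standard score that is 1.0 for identical partitions (however the clusters are numbered) and near 0 for chance-level agreement. It is not a FlowJo feature, and the function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, Hashable, Sequence, Tuple


def contingency(a: Sequence[Hashable], b: Sequence[Hashable]) -> Dict[Tuple[Hashable, Hashable], int]:
    """Count events for each (FlowSOM cluster, PhenoGraph cluster) pair."""
    return dict(Counter(zip(a, b)))


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Pair-counting agreement between two clusterings, corrected for chance."""
    if len(a) != len(b):
        raise ValueError("both label lists must describe the same events")
    total_pairs = comb(len(a), 2)
    if total_pairs == 0:
        return 1.0
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = same_a * same_b / total_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (same_both - expected) / (max_index - expected)
```

The contingency counts show which FlowSOM cluster corresponds to which PhenoGraph cluster; a FlowSOM cluster whose events spread over two PhenoGraph clusters is one the two methods disagree on.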

There's a really cool tool I wanted to show you, called Sleepwalk. It's an R package, and it's not available in FlowJo yet. I say "yet" because I want to build it into a plugin for you, since I think it's pretty cool. It's a great way to get an idea of, and explore, your embedding. So you get an embedding, but how do you really know? Some people might want to go and draw a gate on an embedding, but as you might know or suspect, how do you really know whether an embedding faithfully represents the high-dimensional data in low-dimensional space? It's trying to do a lot, embedding all that information in just those two dimensions. With Sleepwalk, we can look at that information this way.

On this side here we have the UMAP overlaid with the clusters, and on the right is the Sleepwalk tool. Basically, when I hover my mouse pointer on a point, there's a color scale across the bottom, and the color of each event while I hover is its distance from the point under my pointer, calculated in high-dimensional space but shown on the two dimensions here. In this way we can interrogate and move around our embeddings and get an idea of how well it really embedded my data.
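The hover behavior described above can be sketched in a few lines of Python. This is a re-implementation of the idea for illustration, not the Sleepwalk R package's code or API: it assumes the high-dimensional values (for example, transformed marker intensities) and the 2-D embedding coordinates are available for the same events, that Euclidean distance is the measure, and that colors are clamped at a maximum distance so the scale stays fixed as the pointer moves. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def hovered_event(embedding: Sequence[Tuple[float, float]], pointer: Tuple[float, float]) -> int:
    """Index of the event drawn closest to the mouse pointer in the 2-D embedding."""
    return min(range(len(embedding)), key=lambda i: math.dist(embedding[i], pointer))


def sleepwalk_colors(
    high_dim: Sequence[Sequence[float]], hovered: int, max_dist: Optional[float] = None
) -> List[float]:
    """Color value in [0, 1] for every event: its high-dimensional distance to the
    hovered event divided by max_dist (0 = identical, 1 = max_dist or further)."""
    origin = high_dim[hovered]
    dists = [math.dist(origin, p) for p in high_dim]
    if max_dist is None:
        max_dist = max(dists) or 1.0  # avoid dividing by zero if all events coincide
    if max_dist <= 0:
        raise ValueError("max_dist must be positive")
    return [min(d / max_dist, 1.0) for d in dists]
```

A faithful embedding shows low values clustered around the pointer and high values far away; low values appearing in a distant island, as in the B cell example, flag events the embedding has placed apart even though they are close in high-dimensional space.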

On the left we have B cells, and if I hover, this group looks pretty good: for the area where I'm hovering, these events are also closely related to one another in high-dimensional space. You can see that out on the edges the color is pinker, redder, so those events are further away. Strangely, though, if you go to this tip here, even within these B cells, for this particular point we see the CD4 and CD8 T cells.

So what I did was take these embeddings that I ran on this data set and generate the same type of plot with Sleepwalk. Oops, let me just scoot this over. These are the same embeddings: I have UMAP on the left and t-SNE on the right. If we hover over and look at this area here, these are my naive CD4-positive cells, and we can see that in UMAP space, when I'm hovering, things are pretty tightly packed in this area in terms of the high-dimensional distances of the events. So this embedding, where it has separated these two groups of events, seems to agree well with the distances of the data in high-dimensional space. When we get to the center, things start to blur and spread across, but when we get up onto this other lobe, there's again a pretty tight grouping of events, at least in UMAP space. In t-SNE space, you see things aren't as tightly packed; they're spread around somewhat.

I found this other interesting area. Here in UMAP and t-SNE we have NK and mature NK cells, and FlowSOM split that into two different clusters. I wasn't able to gate those apart with what I had done previously, but the clustering separated that area, in t-SNE space as well. So if we go over and look at the t-SNE side here, we're in the NK cells; what cluster is this? It might be 16 and 20, but anyway. In t-SNE space, when I'm hovering over this area of NK cells, I can see that the t-SNE embedding has more definition: there's more local structure captured, and there are these tighter islands, at least in the embedding space. But when I hover over it and look at the distances in the coloring, I can see that it's really spread across this island. On UMAP, we lose some of that fine-grained local structure, but we do see that this lobe agrees with what we see in high-dimensional space, and once I come up here, we only see events highlighted in this upper region. So that tells me I agree with the clusterings we get from FlowSOM, at least on this UMAP embedding. t-SNE is more complicated: there's all this additional local structure, and if we did some additional subclustering on these, we might pull apart more of the detail that t-SNE is able to capture. Anyway, it's a really cool tool.
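One way to put a number on the UMAP versus t-SNE comparison above is a neighborhood-preservation score: for each event, the fraction of its k nearest neighbors in high-dimensional space that are also among its k nearest neighbors in the embedding. The sketch below is a generic metric, not something Sleepwalk or FlowJo computes; the names and the choice of k are assumptions, and the brute-force search is only practical on a downsampled set of events.

```python
import math
from typing import Sequence, Set


def _nearest(points: Sequence[Sequence[float]], i: int, k: int) -> Set[int]:
    """Indices of the k events closest to event i (brute force, Euclidean)."""
    ranked = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return {j for _, j in ranked[:k]}


def neighborhood_preservation(
    high_dim: Sequence[Sequence[float]], embedding: Sequence[Sequence[float]], k: int = 10
) -> float:
    """Mean fraction of each event's k high-dimensional neighbors kept in the embedding."""
    n = len(high_dim)
    if n != len(embedding):
        raise ValueError("high_dim and embedding must describe the same events")
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    kept = 0.0
    for i in range(n):
        kept += len(_nearest(high_dim, i, k) & _nearest(embedding, i, k)) / k
    return kept / n
```

Running it once with the UMAP coordinates and once with the t-SNE coordinates for the same events, at a small and a larger k, gives a rough summary of which embedding keeps fine local structure and which keeps broader groupings; scores near 1 mean the neighborhoods survived the embedding.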

I hope to build this into a separate tool, or maybe a feature added on to some of these embedding tools, to really get an idea of how well the embedding represents my data and to compare clusters. Again, it's called Sleepwalk. There are some other methods in Sleepwalk; one is called k-nearest-neighbors Sleepwalk. You get the same type of visual, but it shows you k-nearest-neighbor distances instead of Euclidean distances. However, the kNN Sleepwalk is pretty computationally intensive and tends to crash a lot; this one works better for me.
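The exact behavior of the k-nearest-neighbors variant isn't spelled out here, so the following Python sketch is only one plausible reading, labeled as an assumption: instead of coloring every event by distance, it flags the hovered event's k nearest neighbors in high-dimensional space. The second function precomputes a mask for every event, which means comparing all pairs of events and storing n × n flags; that hints at why a kNN-based view can get heavy on large files. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def knn_highlight(high_dim: Sequence[Sequence[float]], hovered: int, k: int) -> List[bool]:
    """True for the k events nearest to the hovered one in high-dimensional space."""
    if not 0 < k < len(high_dim):
        raise ValueError("k must be between 1 and len(high_dim) - 1")
    origin = high_dim[hovered]
    ranked = sorted((math.dist(origin, p), j) for j, p in enumerate(high_dim) if j != hovered)
    neighbors = {j for _, j in ranked[:k]}
    return [j in neighbors for j in range(len(high_dim))]


def precompute_all(high_dim: Sequence[Sequence[float]], k: int) -> List[List[bool]]:
    """One neighbor mask per event: n masks of length n, built from n * (n - 1) distances."""
    return [knn_highlight(high_dim, i, k) for i in range(len(high_dim))]
```

For the 50,000-event demo file, precompute_all would hold 2.5 billion flags, which is why a sketch like this is only usable on small or downsampled data.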

That's about it for my webinar. If there are any other questions, feel free to type them into the chat. Someone's asking: in Sleepwalk, when I'm comparing and see differences between t-SNE and UMAP, is there a sort of hierarchy of believability? Yeah, good question.

I think it's not to say that one is better than the other, or that I should believe one more than the other; it's just that, depending on your data, one tool might do a better job of visually representing what you want to show in your data. So it can help you interpret your manual gates and the embedding. And again, there are other methods, like TriMap and PaCMAP, and some of those might really pull apart and show some local structure that you want to dive deeper into with your particular data. So it can just help you understand whether one embedding does a better job of capturing the type of information you really want to show people. But yes, it's useful for comparing the clustering across different methods of dimensionality reduction.

And there are other cool things it can do, too. Instead of just showing different methods, maybe you run the embedding on multiple different samples. I think there's an example: here are three different samples, and you can see how they compare to one another. This one doesn't have the same embedding, so it's a little bit harder to interpret, but with UMAP, like I was saying, there's that option to apply UMAP, so you could get all the samples under the same embedding and then split them apart by sample. Maybe the samples are different stim conditions, and you can view the results that way. And if you do anything with single-cell RNA-seq or the Seurat package, you can use the Sleepwalk tool as well. Anyway, I thought that was pretty cool.

So, some resources here. There's docs.flowjo.com, which is a great place to look for any questions on FlowJo; the documents are searchable. FlowJo University has some short tutorial videos. Then, of course, there's where you can get the webinars: under Learn, then Webinars, then Pre-recorded, you'll find all the pre-recorded webinars, and this one should be up there maybe next week. And feel free to reach out to flowjo@bd.com. If you're looking for the slide deck sooner rather than later, I can try to send it over.

So I have another question. But thank you all for joining, and feel free to type in any questions. Someone's asking: what is best practice for concatenating files together to compare the pattern of immunophenotyping between controls and treatments? Yeah, good question. If you want to compare things like that, you'll probably start out with your individual files in your workspace, and you'll want to leverage some keywords. You'll create keywords that identify the different controls and treatments and then include those when you do the concatenation step. Then you might go through the traditional high-parameter workflow, where you do dimensionality reduction and clustering, and then take that into Cluster Explorer. With Cluster Explorer, as long as you've gated those groups apart in the concatenated file, you can load those keywords into Cluster Explorer and make some comparisons really easily that way.

way um I'm trying to think so there's probably some

examples in our docs page

um I'm not seeing it right off hand um the other one let's check

the other one let's check um Bia University there might be like a

um Bia University there might be like a workflow there that will go through

no so yeah I don't see any short videos so then where you're going to find it is

so then where you're going to find it is if you go to learn

if you go to learn webinars and then recorded

webinars and then recorded webinars and then

going look for some of these like high parameter or like

parameter or like Advanced lojo

webinars um and those will walk through like creating those keywords doing the

like creating those keywords doing the concatenation dimension Auto reduction

concatenation dimension Auto reduction clustering and then pulling all that

clustering and then pulling all that apart in cluster

apart in cluster Explorer

Otherwise, you could email flowjo@bd.com, and I can try to find a good webinar and send you the link. I don't see it offhand, but I can ask the group, and they can point me toward a good webinar for you. But these are the resources, basically: webinars and FlowJo University. Well, thanks again, everybody. Like I said, if you have any questions, feel free to reach out to flowjo@bd.com. Right now there are only a couple of us, so we'll get back to you as soon as we can. Thanks again for joining.

Source video: https://www.youtube.com/watch?v=UF8uR6Z6KLc