This webinar introduces and compares dimensionality reduction techniques (t-SNE, UMAP) and clustering algorithms (FlowSOM, PhenoGraph) within the FlowJo software, highlighting their application in analyzing high-dimensional flow cytometry data and demonstrating a new tool, Sleepwalk, for evaluating embedding quality.
Welcome to the webinar. We're doing dimensionality reduction and clustering in
FlowJo. My name is Josh luy and I'm with FlowJo. If you have any questions
throughout the webinar, feel free to type them in the chat and I'll answer them
as we go along. Here's our agenda for today. We're going to do an overview of
dimensionality reduction and clustering with a focus on UMAP and PhenoGraph, and
we'll look at some comparisons to t-SNE and FlowSOM. Then towards the end we'll
jump into FlowJo for a live demo and a quick overview of setting up some of the
plugins: where to get them and how to set up R. Then I have an analysis we can
look at, and there's a new tool I want to show you called Sleepwalk. It's an R
package, and it's a really cool way to take a look at the clustering and
embedding results you have and get a better idea of how well they really
represent your data. We'll take a look at that at the end. So, as most of you are familiar
with flow, I'll focus today on the data analysis aspect, particularly
dimensionality reduction and clustering, which are key to understanding the data.
As you know, flow cytometry generates high-dimensional data, often with dozens of
parameters. Dimensionality reduction techniques like t-SNE and UMAP are essential
for visualizing and understanding this complex data, and understanding how to
analyze it is crucial for researchers and professionals in various biological
fields. Again, today we'll focus on the key data analysis aspects that underlie
effective interpretation. What we see across here are some of the methods that
are available in FlowJo to perform dimensionality reduction. Keep in mind these
are just some of the methods; there are additional ones available for download as
plugins from FlowJo Exchange. So, dimensionality reduction in
general: the goal here is to create a low-dimensional representation of our
high-dimensional data set that preserves the overall structure of the data as
much as possible. This method here that we're showing, t-SNE, and others help us
reduce the complexity of the data while preserving the most critical information,
enabling us to see patterns and relationships that might otherwise be missed.
Dimensionality reduction helps give us a clear understanding of the data and can
lead to more impactful biological insights, and it's vital for visualizing data.
Here with t-SNE, by reducing dimensions we can better comprehend intricate
correlations within our data. Dimensionality reduction, again, attempts to group
events with similar multidimensional expression patterns together within the
dimensionally reduced data space. This helps to condense complex data while
retaining its essential information, making patterns visible, like in this
example here where we can clearly see the different populations of cell types in
the plot. Let's talk about the power of t-SNE: some of its basic principles,
advantages, and limitations. So t-SNE, or t-distributed
stochastic neighbor embedding, is a popular dimensionality reduction technique.
It's widely used in flow for visualizing our high-dimensional data. It was
developed by Laurens van der Maaten and Geoffrey Hinton. It's designed to capture
the local structure of the data by bringing similar data points closer together
in the low-dimensional space, typically 2D or sometimes 3D representations.
t-SNE works by converting high-dimensional Euclidean distances between data
points into conditional probabilities that represent similarities. The algorithm
then tries to minimize the divergence between these probabilities in the
high-dimensional space and the lower-dimensional map. Basically, it aims to
ensure that if two points are similar in high-dimensional space, they remain
close together in the lower dimensions. One of the key strengths of t-SNE is its
ability to create those clear, visually interpretable clusters or islands,
groupings of events in the low-dimensional space, and that makes it especially
useful for identifying populations. By reducing the dimensionality of the data,
t-SNE helps us reveal underlying biological structures such as cell subtypes or
activation states in a more intuitive manner. It's highly effective at capturing
local structure, meaning it excels at revealing small clusters or subtle
differences between cell populations, and its ability to produce visually clear
and separated clusters makes it a powerful tool for exploratory data analysis in
flow. t-SNE is also nonlinear, which allows it to handle the complex nonlinear
relationships that often exist in biological data. However, t-SNE is not without its
limitations. One major drawback is its computational intensity, in particular
with large data sets, which can lead to long processing times. Another challenge
with t-SNE is its tendency to lose the global structure. While it's great at
revealing those local relationships, it can distort the overall data landscape,
making it difficult to interpret global patterns or relationships between those
islands or clusters that we see. Additionally, t-SNE is sensitive to parameter
choices such as perplexity and learning rate, those tunable parameters that you
see if you initiate t-SNE in FlowJo. Fine-tuning those parameters can be tricky
and might require multiple iterations. However, in FlowJo we do offer opt-SNE,
which aims to simplify those choices for you: it attempts to optimize some of
those parameters during the calculation process. For instance, it might not run
the full number of iterations if it finds that the result is not changing
significantly between each round of calculation. And finally, t-SNE does not
inherently preserve the distances between clusters, so the spacing between
clusters or islands in the output may not really reflect their true relationship
in the high-dimensional space. So next let's talk about
Oops, sorry, I wasn't paying attention to the chat. Someone's asking whether
slides will be available after the webinar. Yes, those are always available from
our website; you can download the slides if you want. Someone asked: does the
splitting of a cluster of the same color, for example the pink or blue, mean
unique subsets? So if we go back, like the B cells or the NK cells here, this is
just overlaid with some general lineage gates, but we can see, at least based on
t-SNE, that there's some more definition there. It's probably capturing more of
the local structure of the data and separating those spaces, so if we did
additional gating, maybe down into further subsets of those, we would be able to
determine that.
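As a rough illustration of the t-SNE idea described earlier (distances are turned into neighbor probabilities tuned to a perplexity, and a divergence between the high-dimensional and low-dimensional similarities is minimized), here is a minimal Python sketch. It is not FlowJo's implementation: the function names and the toy points are made up for illustration, and it only scores two fixed layouts instead of running the gradient-descent optimization.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def conditional_row(points, i, sigma):
    """p_{j|i}: Gaussian similarity of each point j to point i, normalized to sum to 1."""
    d = [sq_dist(points[i], q) for q in points]
    d_min = min(d[j] for j in range(len(points)) if j != i)
    # Subtracting d_min avoids underflow and cancels out in the normalization.
    w = [0.0 if j == i else math.exp(-(d[j] - d_min) / (2.0 * sigma ** 2))
         for j in range(len(points))]
    total = sum(w)
    return [x / total for x in w]

def perplexity(row):
    """2 ** entropy (in bits): roughly the effective number of neighbors."""
    entropy_bits = -sum(p * math.log2(p) for p in row if p > 0)
    return 2.0 ** entropy_bits

def fit_sigma(points, i, target, lo=1e-3, hi=1e3, steps=60):
    """Bisect on a log scale for the kernel width that reaches the target perplexity."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_row(points, i, mid)) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def joint_p(points, target_perplexity):
    """Symmetrized high-dimensional similarities P_ij."""
    n = len(points)
    rows = [conditional_row(points, i, fit_sigma(points, i, target_perplexity))
            for i in range(n)]
    return [[(rows[i][j] + rows[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]

def joint_q(embedding):
    """Student-t (one degree of freedom) similarities Q_ij in the low-dimensional map."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]

def kl_divergence(p, q):
    """KL(P || Q): the cost that t-SNE tries to minimize."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)

# Toy data (hypothetical): two tight groups of three events each, far apart.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.3), (5.0, 5.0), (5.2, 5.0), (5.0, 5.3)]
p = joint_p(points, target_perplexity=2.0)
faithful = points                                    # layout that keeps the groups intact
scrambled = [points[k] for k in (0, 3, 1, 4, 2, 5)]  # layout that mixes the groups
print("KL, faithful layout :", kl_divergence(p, joint_q(faithful)))
print("KL, scrambled layout:", kl_divergence(p, joint_q(scrambled)))
```

The layout that keeps similar events together gets the lower cost, which is the sense in which t-SNE rewards preserving local neighborhoods. Raising the target perplexity widens each point's kernel, so more neighbors count as similar.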
OK, so back to UMAP: uniform manifold approximation and projection. This is a
more recent dimensionality reduction technique, developed by Leland McInnes and
John Healy, and it's grounded in concepts from topological data analysis and
manifold learning. It basically works by constructing a high-dimensional graph of
the data representing the local relationships between data points, and then it
optimizes a low-dimensional representation that tries to preserve these
relationships as best it can. The key idea behind UMAP is to maintain both the
local and global structures of the data, allowing for more accurate and
meaningful visualizations. Here we see some of the tunable options that are
available when you initiate the UMAP plugin in FlowJo: nearest neighbors, minimum
distance, and number of components.
So, with a nearest neighbors value of something low, like two, we see that UMAP
merely glues together some small chains, but due to the narrow view it fails to
see how those connect together any further. This represents the fact that, from a
fine-detail point of view, the data is very disconnected and scattered throughout
the space. As nearest neighbors starts to increase, UMAP manages to see more of
the overall structure of the data and glues together more of those components to
better convey the broader structure of the data. By the time we reach nearest
neighbors of around 20, we have a fairly good idea of the overall view of the
data, showing how the various colors relate to each other over the whole data
set. If you go further, more focus is placed on the overall structure of the
data. For example, if you go up to something like nearest neighbors of 200, we
might get a plot where the overall structure is well captured but at the loss of
fine-detail structure. And then minimum distance:
this controls how tightly the resulting UMAP plot is allowed to pack points
together. Quite literally, it provides the minimum distance apart that the points
are allowed to be in the low-dimensional space. This means that low values of
minimum distance will result in tighter groupings of events. This can be useful
if you're interested in clustering or in finer topological structure. Large
values of minimum distance will prevent points from packing tightly on top of one
another and will focus on the preservation of the broad topological structure
instead. And then number of components: here you choose how many components, or
parameters, you want to get back from UMAP. If you choose two, you get an X and a
Y, a UMAP 1 and a UMAP 2. Or you could choose three and get X, Y, and Z, which
you can plot in a three-dimensional view, or you could even go up to, say, 10
different components.
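The effect of the nearest neighbors setting described above can be seen on a toy example without running UMAP at all. The sketch below is only an illustration under assumed data (three synthetic blobs; all names are hypothetical): it builds a k-nearest-neighbor graph, the kind of graph UMAP starts from, and counts how many disconnected pieces it has for several values of k.

```python
import random

def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbors (Euclidean)."""
    edges = set()
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        for j in order[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def count_components(n, edges):
    """Number of connected pieces of the graph, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(x) for x in range(n)})

# Hypothetical toy data: three Gaussian blobs of 60 events each in two dimensions.
rng = random.Random(0)
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(60)]

components_by_k = {k: count_components(len(points), knn_edges(points, k))
                   for k in (2, 5, 20, 179)}
for k, pieces in components_by_k.items():
    print(f"k={k:3d}: {pieces} connected piece(s)")
```

With a small k the graph tends to fall apart into many small chains, and as k grows the pieces merge until everything is connected, which mirrors the trade-off between fine detail and overall structure. This is not the UMAP algorithm itself, only the neighbor graph it builds on.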
So how does this differ from t-SNE? One of the major differences between UMAP and
t-SNE is, again, UMAP's ability to preserve more of the global structure of the
data. While t-SNE focuses heavily on local relationships, UMAP strives to
maintain both the local and the broader global relationships, giving you a more
accurate sense of how clusters are related to each other. This means that UMAP
can provide a better overview of the data's overall structure, helping you
understand relationships between different clusters or populations.
Someone asked in the chat: is there a typical minimum distance used for UMAP
analysis? The default is 0.5; I believe that's the default in the plugin. And as
we'll see later, UMAP is pretty forgiving and performs pretty well with a lot of
those default settings, so that default value of 0.5 tends to work pretty well.
All these plots we see here were created with that default minimum distance of
0.5. UMAP is also highly scalable, so it can handle large data sets more
efficiently than t-SNE. That makes it a preferred choice if you're dealing with
extensive data: really big panels, a high number of events. That computational
efficiency is a significant advantage of UMAP, as it typically runs faster and
requires less memory compared to t-SNE, again, especially when working with large
data sets. Another advantage is the simplicity in terms of those parameters that
we talked about. Unlike t-SNE, which may require careful tuning of parameters
like perplexity and learning rate, UMAP typically works really well with just the
default settings, which reduces the need for a lot of tuning. Sorry, I was
checking the chat.
So in flow cytometry, UMAP is increasingly being adopted due to its ability to
preserve both local and global structures, which can be crucial for accurately
identifying and understanding complex cellular relationships. Researchers might
use UMAP to explore cellular heterogeneity, track changes in cell states, or
identify rare cell populations that might not be visible with traditional gating
strategies. In FlowJo it's easy to integrate UMAP into your analysis, so that's a
great way to get this powerful tool for reducing dimensionality and uncovering
those insights.
So, just a quick comparison and overview again. t-SNE is particularly effective
at capturing and preserving the local structures in the data, so it excels at
separating those small, closely related clusters, which makes it ideal for
identifying subtle differences between populations. t-SNE is widely used and
validated in the field, which makes it a trusted tool for many researchers,
particularly when analyzing complex high-dimensional data. Some of the cons of
t-SNE: you do lose some of the global structure. While it excels at local
clustering, it tends to distort the overall relationships between those clusters,
which can lead to misinterpretation of how they are related. It's also
computationally intensive, so especially with large data sets the processing time
can be long. The algorithm requires significant memory, and that can be a
bottleneck in your analysis. It's also sensitive to parameters: t-SNE results can
be sensitive to how you tune inputs like learning rate and perplexity, so finding
the optimal settings might require several iterations, or asking other experts
who have used it in the past. And now, looking at UMAP, it's a
more balanced approach, preserving both the local and global structures of the
data. It's highly scalable and efficient, so it runs faster and can handle larger
data sets better than t-SNE, which makes it a practical choice if you're
analyzing a large data set. It's also less sensitive to the parameters you input
to the algorithm compared to t-SNE, so it generally performs well with the
default settings. This reduces the need for excessive parameter tuning and makes
it easier to use, especially for those who might be new to dimensionality
reduction. Some of the cons, though: while UMAP is good at preserving the overall
structure of the data, it can sometimes produce clusters or groupings that are
less distinct or more overlapping compared to t-SNE, and this can make it harder
to visually separate and identify populations. UMAP is also somewhat less
established. It's a newer technique, but it is gaining traction in the field and
rapidly becoming popular, though it may not yet be as widely used as t-SNE. In
some cases t-SNE might be preferred, if your primary goal is to explore and
visualize local structures within the data, such as identifying small, really
distinct cell populations. UMAP might be preferred if it's important to maintain
the global structure of the data or understand the broader relationships between
clusters, which can make it ideal for exploratory analysis and understanding
overall patterns in your data. Or, if you're working with large data sets, UMAP
could be beneficial because it runs faster. OK, now we'll talk about some
clustering. Once we've reduced the dimensions in our data, clustering becomes the
next crucial step. It allows us to group similar cells together based on their
expression profiles, helping to identify cell populations and subpopulations.
Clustering methods like FlowSOM and PhenoGraph each have their strengths and are
used to define these groups in a way that is biologically meaningful. Here we're
looking at FlowSOM. FlowSOM is a powerful clustering algorithm specifically
designed for the analysis of flow cytometry and mass cytometry data. It combines
the strengths of self-organizing maps with hierarchical clustering to efficiently
identify and organize cell populations in our high-dimensional data sets.
The core idea behind FlowSOM is to map high-dimensional data onto a
lower-dimensional grid where similar data points are placed closer together. This
is achieved through the self-organizing map, that grid pattern, which is a type
of artificial neural network that clusters by similarity. Those maps are
essentially reduced to preserve the topological relationships between data
points, and as the SOM, the self-organizing map, learns, it adjusts its nodes so
that similar cells are grouped together in the map, forming a pattern that
reflects the underlying structure of the data. After the SOM has been trained,
FlowSOM performs the second step using hierarchical clustering. This step groups
the nodes on the map into larger clusters, creating a hierarchical tree that
organizes the data into broader cell populations. The combination of the SOM with
hierarchical clustering allows FlowSOM to identify both fine-grained and broad
cell populations, making it a versatile tool in FlowJo. That's what you notice
when you start FlowSOM: there are two options. There's a grid size, which is 10
by 10, so those are the first 100 clusters that it makes on the grid. Then
there's the option for metaclusters, where you input the number of clusters you
expect to get back from the data, and it then uses hierarchical clustering to
refine that grid into the smaller number of metaclusters. And
someone's asking if there are any tips on picking the number of metaclusters to
start with in FlowSOM. That's a tricky question. There's not really a best
method, and I would say that's one of the downsides of FlowSOM, because you have
to tell it how many clusters to return. Some people will run a different type of
clustering algorithm, like PhenoGraph or X-Shift, because those return however
many clusters they find within the data. Then, say you get back 20 clusters from
PhenoGraph, you might take that into FlowSOM, ask for 20 metaclusters, and see
how that performs with your data. The additional outputs that you get, the
self-organizing map and the heat maps, can help you interpret those results more
so than with PhenoGraph, where you're just getting clusters. There are other
tools if you want to make those kinds of heat maps from PhenoGraph; it's just
pretty convenient that it happens directly in FlowSOM. That's one method. Or, if
you've done some manual gating and already identified populations from your
manual gates, that could inform how many metaclusters to ask for from FlowSOM,
and then you get to compare the clustering from FlowSOM against that approach.
But FlowSOM really has become a
go-to method in flow cytometry due to its ability to efficiently handle large
data sets. Again, it's well suited for data where there's a large number of
markers or cells, or where traditional gating strategies might struggle to
capture the populations. One of its benefits is its speed and scalability: the
algorithm is computationally efficient, allowing it to process large data sets
much faster than other methods. The hierarchical clustering step in FlowSOM also
helps to ensure that even small, distinct populations are captured, which can be
critical in studies where you're detecting rare events.
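The two FlowSOM steps described above (train a self-organizing map on a grid, then merge the grid nodes into a requested number of metaclusters) can be sketched in a few lines of Python. This is a simplified stand-in, not the FlowSOM package or the FlowJo plugin: it uses a 4 by 4 grid instead of 10 by 10 to keep the toy fast, plain average-linkage merging rather than whatever the plugin does internally, and made-up function names and data.

```python
import math
import random

def nearest_node(codebook, x):
    """Index of the grid node whose codebook vector is closest to event x."""
    return min(range(len(codebook)),
               key=lambda n: sum((w - xi) ** 2 for w, xi in zip(codebook[n], x)))

def train_som(data, rows, cols, epochs=20, seed=0):
    """Fit a rows x cols self-organizing map; returns one codebook vector per node."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    codebook = [list(rng.choice(data)) for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac) + 0.01                     # learning rate decays
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # neighborhood shrinks
            bmu = nearest_node(codebook, x)                      # best matching unit
            for node, (r, c) in enumerate(grid):
                grid_d2 = (r - grid[bmu][0]) ** 2 + (c - grid[bmu][1]) ** 2
                pull = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
                codebook[node] = [w + pull * (xi - w) for w, xi in zip(codebook[node], x)]
            step += 1
    return codebook

def metacluster(codebook, k):
    """Merge SOM nodes bottom-up (average linkage) until k metaclusters remain."""
    groups = [[n] for n in range(len(codebook))]

    def linkage(a, b):
        return sum(math.dist(codebook[i], codebook[j]) for i in a for j in b) / (len(a) * len(b))

    while len(groups) > k:
        i, j = min(((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
                   key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        groups[i] += groups.pop(j)
    return {n: m for m, nodes in enumerate(groups) for n in nodes}

# Hypothetical toy data: three populations measured on two markers.
rng = random.Random(1)
events = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(100)]

codebook = train_som(events, rows=4, cols=4)   # step 1: 16 fine-grained clusters
node_to_meta = metacluster(codebook, k=3)      # step 2: merge into 3 metaclusters
labels = [node_to_meta[nearest_node(codebook, e)] for e in events]
print({m: labels.count(m) for m in sorted(set(labels))})
```

Each event is first assigned to its nearest grid node and then inherits that node's metacluster, which is why the number of metaclusters has to be supplied up front.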
In practical applications, FlowSOM can be used to explore the immune landscape,
identify biomarkers, and characterize cellular heterogeneity in various disease
states, including cancer and autoimmune diseases. Another advantage is its
intuitive visualization: the SOM grid provides a clear, interpretable map of the
data, so it can help you understand your complex data set and interpret how the
clustering performed. A pro tip: when you're in the FlowSOM plugin UI, there's an
Advanced button. If you click that, there are some additional options, like
controlling the colors of your outputs, but one is to create a summary plot. It
generates a PDF with all of the available visual outputs from FlowSOM: different
kinds of heat maps and different kinds of self-organizing maps. It actually runs
a t-SNE in R and then overlays all the clusters onto that, so it does take a
little more time because it is doing that t-SNE calculation in R, but the outputs
that you get in that PDF are pretty nice. So if it's your first time exploring
your data, you might want to give that a try and generate that summary PDF.
Someone's asking: running
PhenoGraph always fails for them; what are possible reasons, and what's a
possible solution? Possible reasons: first I would check that the R packages are
installed. The PDF that comes with the download lists the packages, so you can
take that command, put it in R, and make sure those are installed. If that's
fine, then I would check where you are saving the data. Make sure there aren't
any problems with FlowJo, or R for that matter, being able to read and write data
to certain locations. If it's on a server or something like that, try locally, or
try saving as a FlowJo archive and running it from the archive to see if that
works. If it does work from an archive, that might tell you there's some kind of
permission issue with where your workspace is saved, if it's on a server for
example. Another possibility is the amount of data you're throwing at it.
PhenoGraph is more computationally intensive than FlowSOM, so try running on a
smaller subset of events. If it works with a small number but fails on large
data, then maybe you're pushing the limits of the RAM or compute power you have
on your computer. Those are some things to check. All these plugins will output
results in the output folder, so look for the R script PhenoGraph text file. If
you have that text file, you can send it in to flowjo@bd.com and I can take a
look at it and help you from there. That's the easiest way to see what's
happening in R. So, on to PhenoGraph.
PhenoGraph is an advanced clustering algorithm designed specifically for
high-dimensional single-cell data. It was developed by Jacob Levine and
colleagues. PhenoGraph is well suited for identifying distinct cellular
populations within complex, heterogeneous data sets, and it excels at finding
rare cell types and subtle differences between cell populations that might be
missed by traditional clustering methods. The core idea behind PhenoGraph is that
it constructs a graph, or network, where each node represents a cell and the
edges connect nodes that are similar to each other based on their
high-dimensional feature profiles. It begins by building a k-nearest-neighbors
graph, in which each cell is connected to its k nearest neighbors, forming a
network that captures the local structure of the data. From there it applies a
community detection technique, specifically the Louvain clustering method, to
this k-nearest-neighbors graph. The Louvain method is an optimization algorithm
used to detect communities, or clusters, in these networks.
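To make the graph idea concrete, here is a minimal Python sketch of the ingredients, under stated assumptions. It builds a k-nearest-neighbors graph on toy data, weights each edge by how many neighbors the two cells share (a Jaccard weight, which the published PhenoGraph method uses but which is not mentioned in this talk), and computes modularity, the score that Louvain tries to maximize. It does not run Louvain itself, and the names and data are hypothetical.

```python
import random

def knn_sets(points, k):
    """For each cell, the set of indices of its k nearest neighbors (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        result.append(set(order[:k]))
    return result

def jaccard_graph(neighbors):
    """Weight each kNN edge by the overlap of the two cells' neighbor sets."""
    weights = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            shared = len(nbrs & neighbors[j])
            if shared:
                weights[(min(i, j), max(i, j))] = shared / len(nbrs | neighbors[j])
    return weights

def modularity(weights, labels):
    """Modularity of a partition: within-cluster edge weight minus its chance level."""
    total = sum(weights.values())
    within, degree = {}, {}
    for (i, j), w in weights.items():
        if labels[i] == labels[j]:
            within[labels[i]] = within.get(labels[i], 0.0) + w
        for node in (i, j):
            degree[labels[node]] = degree.get(labels[node], 0.0) + w
    return sum(within.get(c, 0.0) / total - (degree[c] / (2.0 * total)) ** 2
               for c in degree)

# Hypothetical toy data: two well-separated populations of 40 cells each.
rng = random.Random(0)
cells = [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for c in (0.0, 10.0) for _ in range(40)]
graph = jaccard_graph(knn_sets(cells, k=10))

true_labels = [0] * 40 + [1] * 40          # partition that follows the populations
alternating = [i % 2 for i in range(80)]   # partition that ignores them
print("modularity, true populations:", modularity(graph, true_labels))
print("modularity, arbitrary split :", modularity(graph, alternating))
```

A partition that follows the densely connected groups scores higher than an arbitrary one, and a community detection method searches for the partition with the best score. That search is also what lets the number of clusters come out of the data instead of being set in advance.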
Through this process, PhenoGraph can identify groups of densely connected cells
within the graph, which correspond to distinct cell populations or clusters. The
result is a robust, data-driven clustering that can uncover complex relationships
between cells. One of the strengths of PhenoGraph is its ability to automatically
determine the number of clusters without requiring the user to specify this
parameter in advance, which is a limitation in FlowSOM, for example. PhenoGraph
has become a pretty popular tool in flow cytometry analysis, again due to its
effectiveness in handling high-dimensional data and its ability to identify
populations. In FlowJo it's easy to integrate PhenoGraph into your analysis by
installing the plugin, and its ability to handle complex data makes it useful for
large-scale studies where other approaches are insufficient. Once we have our
dimensionality reduction and our clustering done, we often go into Cluster
Explorer. This is a tool that allows us to load in those results. You can load in
multiple dimensionality reductions, so you could load in UMAP, t-SNE, and TriMap,
along with different clustering results. And in the latest release, FlowJo 10.10,
you can even choose to load in any of your manually gated populations, which is
pretty cool. So if you just have UMAP, with no clustering, and you want to look
at your gates with Cluster Explorer and do this type of analysis, you can do that
just from your manually gated populations. So in summary, real quick,
dimensionality reduction and clustering are complementary techniques, so we're
going to use both when we're doing a high-dimensional analysis. Both t-SNE and
UMAP produce biologically meaningful embeddings, with t-SNE generally preserving
more of the local data structure and UMAP preserving more of the global
structure. While dimensionality reduction techniques like t-SNE or UMAP help us
visualize the data in a more intuitive way, clustering helps make sense of these
visualizations by identifying which cells are similar and belong together. So
again, in flow these techniques are crucial for translating complex
multidimensional data into actionable biological insights. There are still some
challenges. Oftentimes the next step people want is to annotate those clusters,
so having a nice method for biological annotation is still a challenge. There are
some options out there, but we haven't implemented anything yet in FlowJo. We're
looking towards some type of annotation, interactive visualizations, and thorough
comparisons with other methods. So let's take a demo.
So, starting real quick with our setup, plugin setup, things like that. Right now
I'm on a Windows machine. My Mac died, so I'm waiting for a replacement, which
I'll be happy to get, because this thing is pretty underpowered. So in FlowJo I
have an analysis that's already done, because this machine really won't work well
for me. When you're installing R, a Mac is pretty simple. For Windows, I just
recommend going with the default installation. It will typically go into Program
Files. Sometimes it might go into AppData on your local machine, so under your
user's local account there's an AppData folder; that's fine too. You'll want to
install Rtools for whatever version of R you have. The latest one is 4.4, and
again, just running through the default installation for that is the best way to
get through it. R is going to need Rtools to compile some packages and make sure
you can install the libraries that it needs. And then any of these plugins you
can get from FlowJo Exchange: UMAP, FlowSOM. Cluster Explorer is now native to
FlowJo in the latest version, 10.10, so you won't really need that plugin, but
it's available for older versions. When you download these plugins, they'll come
as a zip file. You can extract the zip, and it'll have a PDF and the JAR file.
You're going to take that JAR file and put it into a folder. This is where my
plugin folder is located on this machine: it's in my user account's OneDrive
Documents folder, where I just have a folder called FlowJo plugins. Then the R
path. Don't forget, when you're setting your R path, especially on Windows (see,
here this one went to my AppData, Local, Programs), you're going to go all the
way out to this bin x64 folder, because that's where the executable that FlowJo
is looking for is. When you run some of those R-based plugins for the first time,
it's going to attempt to automatically install some of those packages, so the
first time it runs it might take a minute while it's installing a package. The
next time you run it, it obviously won't need to install again, so it should
calculate faster. One other tip: if
you have a very fresh install of R, go ahead and try to install a package,
because you might get a prompt on your machine if it's the very first time
installing a package. It might say it's going to create a folder to install R
packages where you have read and write permissions, because if you're on a work
computer you might not have the permissions to install R packages on C: in the
default location. So if you run install.packages and install a package like png
and just hit enter (if I do it here it's already installed, so we won't see the
actual message that I'm describing), and you get a message popping up saying, I'm
going to create a folder for you, do you want me to do that, just click yes. That
will make sure that going forward, when you run a plugin and it's trying to
auto-install, you shouldn't have any problems, and it's going to be able to
install and put things wherever it needs. So I would say: install R, try to
install a package, let it create a folder wherever it needs to, and once that
first one is good, you can go ahead and close R and then try to run a plugin. OK.
So what I have here, I've already gone through and run. This is a phenotyping
data set that I downsampled to 50,000 events, again because I'm on this really
small Windows machine. I forget exactly, but it has maybe 38, close to 40
different parameters; it's just a high-parameter immunophenotyping panel. I ran a
couple of different embeddings. I ran t-SNE with the default settings. I excluded
some markers here because I didn't want to get into really fine local structure
of the data; I wanted a less detailed embedding, just for the purpose of this
webinar, to show some of the next things we'll be talking about. Here are the
iterations, perplexity, and learning rate. If you're on the opt-SNE option, it's
going to try to tune those for you as it's running. Otherwise you can go manual
and try to adjust some of those things on your own, but it is nice just to go
with opt-SNE. Here you have a couple of different k-nearest-neighbors algorithms.
If you have a large number of parameters and millions of events, you might want
to go with the approximate random projection, or Annoy, method, because that'll
calculate a little bit faster. Same thing for the gradient descent: you might
want to go with FIt-SNE if you have, again, a really large number of events and a
high number of parameters. These can speed up t-SNE. Then UMAP: you have some
different distance metrics.
um oh I guess I'm realizing I didn't
this wasn't even on my slides um but
yeah different distance metric so how
it's when it's building the nearest
neighbor graph like when it finds a
neighbor like the metric that it uses to
consider something a neighbor or not
there's different typically ukian
distance is most often used but there's
some different methods
Another cool feature with this newer version of UMAP: if you have a
previous UMAP embedding and you load in a new FCS file, you can go to
"apply on map" and apply your new data set to the previous embedding.
It'll map the new data onto the previous UMAP result that we see in the
background, so you should get comparable, almost identical-looking maps.
One caveat there: the exact same parameters need to exist in the two files
if you want to use apply on map.

This is a good approach if, for example, you had some really huge number
of events, say a file with 10 million cells and 30 different markers. You
might try down-sampling and running UMAP the first time on the
down-sampled population. Once you have that first embedding, which runs
quickly on your down-sampled population, go back, select your 10 million
events, and say "I'm going to apply on map from my first embedding." That
already has some information calculated about the nearest neighbors, the
distances, things like that, and it can allow you to do that calculation
even on a huge number of events. So that can be a way to get around an
issue if you don't have enough RAM available on your computer.
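The down-sample-then-apply idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not UMAP's actual procedure and not the FlowJo plugin: each new event is simply placed at the distance-weighted average of the map coordinates of its nearest events in the already-embedded reference set. All names and toy values are assumptions for illustration.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def downsample(events, n, seed=0):
    """Pick n events at random (without replacement) to build the first map."""
    return random.Random(seed).sample(events, n)

def place_on_existing_map(new_event, ref_events, ref_embedding, k=2):
    """Place one new event at the distance-weighted mean of the map
    coordinates of its k nearest reference events."""
    nearest = sorted((euclidean(new_event, r), i) for i, r in enumerate(ref_events))[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # closer events count more
    total = sum(weights)
    x = sum(w * ref_embedding[i][0] for w, (_, i) in zip(weights, nearest)) / total
    y = sum(w * ref_embedding[i][1] for w, (_, i) in zip(weights, nearest)) / total
    return (x, y)

# Hypothetical reference events (3 markers) and their existing 2-D map positions.
ref_events = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0), (10.0, 11.0, 10.0)]
ref_embedding = [(-5.0, 0.0), (-5.0, 1.0), (5.0, 0.0), (5.0, 1.0)]

# A new event that sits between the first two reference events.
new_event = (0.0, 0.5, 0.0)
print(place_on_existing_map(new_event, ref_events, ref_embedding))
```

The expensive step (building the map) only ever sees the down-sampled events; every additional event needs just a neighbor lookup against that reference. The lookup only works if both files carry the same parameters, which matches the caveat above.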
Okay, so I've got some embeddings, and then I went and ran FlowSOM and
PhenoGraph clustering. What we see here: I've overlaid the subsets,
somewhere down here, different T-cell subsets, so I've made those overlays
on both t-SNE and UMAP. Then I ran PhenoGraph and got back 20 clusters,
and then I went to FlowSOM and said, all right, now you give me 20
clusters. Now I've overlaid those two sets: here on the left are the
FlowSOM clusters on my UMAP and t-SNE, and these are PhenoGraph on UMAP
and t-SNE.

There's this really cool tool I wanted to show you guys. It's called
Sleepwalk. It's an R package. It's not available yet in FlowJo; I said
"yet" because I want to build this into a plugin for you guys, because I
think it's pretty cool. It's a great way to get an idea of and explore
your embedding.
So you get an embedding, but how do you really know? Some people might
want to go and draw a gate on an embedding, but as you might know or
suspect, how do you really know that an embedding faithfully represents
the high-dimensional data in low-dimensional space? It's trying to do a
lot, embedding all that information in just those two dimensions. With
Sleepwalk we can look at that information in this way.

On this side here we have the UMAP overlaid with the clusters, and on the
right is the Sleepwalk tool. Basically, when I hover my mouse pointer on a
point, there's a color scale across the bottom, and the color shows the
distances calculated from that point to all these events in
high-dimensional space, but shown here in two dimensions. In this way we
can interrogate and move around our embeddings and get an idea of how well
it really embedded my data.

On the left we have B cells, and if I hover, this group looks pretty good:
for this area where I'm hovering, these events are also closely related to
one another in high-dimensional space. You can see out on the edges where
there's some of the pinker, redder color, so those are further away.
Strangely, if you go to this tip here, even in these B cells, for this
particular point you see the CD4 and CD8 T cells.
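The hover behavior described above can be sketched conceptually in Python: pick one event, compute its distance to every other event in the full marker space, and turn that into a 0-to-1 color value that would be painted onto the 2-D map. This is not the Sleepwalk package's API; the function names, the distance cap, and the toy data are assumptions for illustration only.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hover_colors(high_dim, hovered, max_dist):
    """Color value per event: 0.0 = identical to the hovered event,
    1.0 = at or beyond max_dist in high-dimensional space."""
    ref = high_dim[hovered]
    return [min(euclidean(ref, e) / max_dist, 1.0) for e in high_dim]

def misleading_neighbors(high_dim, embedding, hovered, map_radius, max_dist):
    """Events drawn close to the hovered event on the map that are
    actually far away from it in high-dimensional space."""
    colors = hover_colors(high_dim, hovered, max_dist)
    here = embedding[hovered]
    return [
        i for i, xy in enumerate(embedding)
        if i != hovered and euclidean(here, xy) <= map_radius and colors[i] >= 1.0
    ]

# Hypothetical events: four marker values each, plus their 2-D map positions.
high_dim = [(0.0, 0.0, 0.0, 0.0), (0.5, 0.0, 0.0, 0.0), (6.0, 6.0, 6.0, 6.0)]
embedding = [(0.0, 0.0), (0.2, 0.1), (0.3, 0.0)]

print(hover_colors(high_dim, hovered=0, max_dist=4.0))
print(misleading_neighbors(high_dim, embedding, hovered=0, map_radius=0.5, max_dist=4.0))
```

In this toy case the third event sits right next to the hovered one on the map but gets the "far away" color, which is the kind of disagreement between map and data that hovering is meant to expose.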
So what I did was take these embeddings that I ran on this data set and
generate the same type of plot with Sleepwalk. Oops, let me just scoot
this over. These are the same embeddings: I have UMAP on the left and
t-SNE on the right. If we hover over and look at this area here, these are
my naive CD4-positive cells, and we can see in UMAP space, when I'm
hovering, that things are pretty tightly packed in this area when we're
talking about the high-dimensional distances of the events. So this
embedding, where it's separated these two groups of events, seems to agree
well with the distances of the data in high-dimensional space. When we get
to the center, things start to blur and spread across. But when we get up
on this other lobe, again there's a pretty tight grouping of events, at
least in UMAP space. You see things aren't as tightly packed, and they're
spread around somewhat.
I found this other interesting area. Here in UMAP and t-SNE we have NK
cells, mature NK, and the FlowSOM clusters split that into two different
clusters. I wasn't able to gate that apart with what I had done
previously, but the clustering separated that area, in t-SNE space as
well. So if we go over and look on the t-SNE side here, we're in the NK
cells. What cluster is this? It might be 16 and 20.
Anyway, in t-SNE space, when I'm hovering over this area of these NK
cells, I can see in the t-SNE embedding that there's some more definition
here, there's more local structure captured, and there are these tighter
islands, at least in the embedding space. But when I hover over it and
look at these distances in the coloring, I can see that it's really spread
across this island. In UMAP space we lose some of that fine-grained local
structure, but we do see that this lobe agrees with what we see in
high-dimensional space, and once I come up here we only see events
highlighted in this upper region. So that would tell me that I agree with
the clusterings we get from FlowSOM, at least on this UMAP embedding.
t-SNE is more complicated: there's all this additional local structure,
and probably, if we did some additional subclustering on these, we might
pull apart some more detail that t-SNE can find but UMAP doesn't capture.

Anyway, it's a really cool tool. I hope to build this into maybe a
separate tool, or maybe a feature added on to some of these embedding
tools, to really get an idea of how well the embedding represents my data,
and then for comparing clusters. Again, it's called Sleepwalk.
Sleepwalk one's called K nearest neighbors
neighbors
Sleepwalk um you get a same type of
visual but it shows you K near neighbors
distances instead of um idian distances
here however the K&N
sleepwalks um pretty computationally
intensive and um it tends to crash a
lot this one um works better for me
um that's about um it for my webinar
If there are any other questions, feel free to type them into the chat.

So someone's asking, I think here in Sleepwalk: when I'm comparing and see
differences in t-SNE versus UMAP, is there a sort of hierarchy of
believability? Yeah, good question.
I think it's not to say that one's better than the other, or that I should
believe one more than the other, just that, depending on your data, one
tool might do a better job at visually representing what you want to show
in your data. It can help you interpret your manual gates and the
embedding. And again, there are other methods, right? There's TriMap,
there's PaCMAP, so some of those might really pull apart and show some
local structure that maybe you want to dive deeper into with your
particular data. So it can help you understand if one embedding might do a
better job of capturing the type of information that you really want to
show to people.

But yeah, that's comparing the clustering with the different methods of
dimensionality reduction. There are other cool things it can do too.
Instead of just showing different methods, maybe you run the embedding on
multiple different samples. I think there's an example: here are three
different samples, and you can see how those compare to one another. This
one doesn't have the same embedding, so it makes it a little bit harder to
interpret, but with UMAP, like I was saying, there's that apply on map, so
you could get all the samples under the same embedding and then split them
apart based on the samples. Maybe the samples are different stim
conditions, and you can view the results that way. And if you do anything
with single-cell RNA-seq or the Seurat package, you can use the Sleepwalk
tool as well. Anyway, I thought that was pretty cool.
So, some resources here. There's our docs.flowjo.com; it's a great place
to go to look for answers to any questions on FlowJo, with searchable
documents. FlowJo University has some short tutorial videos. Then of
course there's where you can get the webinars: under Learn, Webinars,
Pre-recorded are all the pre-recorded webinars, so this one should be up
there maybe next week. And feel free to reach out to flowjo@bd.com if
you're looking for the slide deck sooner rather than later; I can try to
send it over.
So I have another question, but thank you guys for joining, and feel free
to type in any questions. Someone's asking: what is the best practice to
concatenate files together to compare the pattern of immunophenotyping
between controls and treatments? Yeah, good question. If you want to
compare things like that, you'll probably start out with your individual
files in your workspace, and you'll want to leverage some keywords. You'll
create keywords that identify those different controls and treatments, and
then include those when you do the concatenation step. Then you might go
through the traditional high-parameter workflow, where you do
dimensionality reduction and clustering, and then take that into Cluster
Explorer. With Cluster Explorer you can leverage those keywords: as long
as you've gated those apart in the concatenated file, you can load those
keywords into Cluster Explorer and then make some comparisons really
easily that way. I'm trying to think; there are probably some examples on
our docs page.
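The keyword idea in this answer can be sketched outside of FlowJo in a few lines of Python: tag every event with its sample's keyword value during concatenation, run clustering once on the combined events, then split by keyword to compare cluster frequencies. The "Condition" keyword name, the sample structure, and the cluster labels below are all hypothetical, and this does not reproduce FlowJo's concatenation tool or Cluster Explorer.

```python
from collections import Counter, defaultdict

def concatenate(samples, keyword):
    """Stack events from all samples, carrying the keyword value along with each event."""
    rows = []
    for sample in samples:
        label = sample["keywords"][keyword]
        for event in sample["events"]:
            rows.append({"keyword_value": label, **event})
    return rows

def cluster_frequencies(rows):
    """Fraction of events per cluster, computed separately for each keyword value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["keyword_value"]][row["cluster"]] += 1
    return {
        label: {cluster: n / sum(counter.values()) for cluster, n in counter.items()}
        for label, counter in counts.items()
    }

# Hypothetical samples; in practice the cluster label would come from
# clustering the concatenated events, not from the individual files.
samples = [
    {"keywords": {"Condition": "control"},
     "events": [{"cluster": 1}, {"cluster": 1}, {"cluster": 2}, {"cluster": 2}]},
    {"keywords": {"Condition": "treated"},
     "events": [{"cluster": 1}, {"cluster": 2}, {"cluster": 2}, {"cluster": 2}]},
]

rows = concatenate(samples, "Condition")
print(cluster_frequencies(rows))
```

Because the keyword travels with each event, the combined file can share one embedding and one clustering while still being split back into controls and treatments for comparison.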
I'm not seeing it right offhand. The other one, let's check FlowJo
University; there might be a workflow there that will go through it. No,
so I don't see any short videos. Where you're going to find it is if you
go to Learn, Webinars, and then Recorded Webinars, and look for some of
these high-parameter or advanced FlowJo webinars. Those will walk through
creating the keywords, doing the concatenation, dimensionality reduction,
and clustering, and then pulling all that apart in Cluster Explorer.

Otherwise, you could email flowjo@bd.com and I can try to find a good
webinar and send you the link. I don't see it offhand, but I can ask the
group and they can point me towards a good webinar for you. But these are
the resources, basically: the webinars and FlowJo University.

Well, thanks again, everybody. Like I said, if you have any questions,
feel free to reach out to flowjo@bd.com. Right now there's only a couple
of us, so we'll get back to you as soon as we can. Thanks again for
joining.