The AI Data Engine Behind 8M Devices & Petabytes of Web Data | Andrej Radonjic, Grass (#68) - AI Summary, Mind Map & Transcript | Fluence | YouTubeToText
This podcast episode introduces the Grass Protocol, a decentralized data infrastructure project that leverages a global network of residential devices to scrape the internet for data, primarily for AI development, and rewards participants with network ownership through tokens.
Hello everybody. Welcome to the DePIN
podcast. I'm your host, Tom Trowbridge.
We're excited to bring you interviews
with the key founders in the DePIN
landscape, investors and ecosystem
partners. This is an exciting time for
DePIN and this year we're going to be
meeting even more founders learning
about the projects that they're
building, about the traction they're
getting, and about the revenue that
they're generating. The podcast is
brought to you by Fluence. Fluence is
building a decentralized compute
platform. It's a project I'm a
co-founder of and really excited about.
So, buckle up for this year. Please
subscribe to the channel, listen to us
on Spotify or Apple. And of course, if
you are around any of the large
conferences, come to our DePIN Day
where you can meet these founders, the
Fluence team in person and hear from all
the ecosystem partners as well. So, look
forward to seeing you in person and
let's get going on this podcast. Welcome
to the DePIN podcast. I'm your host,
Tom Trowbridge. Thrilled to be here today
with Andrej, um, founder and CEO of Wynd
Labs, which is the entity behind the
Grass Protocol. Welcome, Andrej.
>> Thank you so much for that introduction,
Tom. Uh, great to be here and, uh, looking
forward to this.
>> Listen, for those of
you who don't know, Grass is, I think, the
largest most recent DePIN operating at
a terrific scale: um, 8 million plus
active users scraping, I don't know, you
give me the numbers, Andrej, but
massive amounts of data around the
internet. And so, you know, we're all
familiar with the OGs in the space,
the Heliums and the Render and the
Filecoin, but you're operating from
a pretty recent start at a huge scale,
I think beyond any other DePIN that
I'm familiar with. So I'd love to dive
into that a little bit, and why don't you
give a little bit more, uh, you know,
coherent summary of what you're doing
than what I did, just for
people that aren't as familiar.
>> Yeah, absolutely. Um, so we we built a
piece of key data infrastructure, uh,
that's currently helping some of the
biggest AI labs in the world access very
large quantities of data. We actually
set out to build um this solution years
ago before ChatGPT even came out. Um,
funnily enough, like a few months before
ChatGPT came out and like right after FTX
collapsed, uh, we were evaluating, you
know, building this and, uh, we were
thinking about how do you build a
network of millions of residential
devices that can go and access any
information on the public web uh,
without getting blocked or rate limited.
Um, and it's one of those things that
needs to be run on many residential
devices because, uh, just the way the
internet's designed, uh, it's actually
very easy to detect data center traffic.
So, if you want to go and access large
quantities of data at scale without
getting blocked, you need to run it on
many many individual residential devices
that are on individual networks. So we
set out to build this, and, um, the first
question you ask, right, when
you're going to go and solve any problem
is like what do we do to solve this
problem? How do we build this? Uh how do
we look at this from first principles?
And, um, we
had this really interesting realization
that uh if we wanted to go and
incentivize a network that can do this
and bootstrap it, make it ethical uh
make it make like economic sense, the
only way to do it uh was actually with
blockchain. And it was kind of this
really funny, uh, moment because we were
like well this isn't like at all a
crypto business that we're looking to
build um in in the sense of you know
we're not selling any services to uh
crypto communities if that makes any
sense uh to like traders or to uh I
don't know stakers and node providers
and all this other you know all the
other um stakeholders in the crypto
ecosystem. Uh however the underlying
technology uh needs crypto in order to
actually succeed. So um we we we set out
to build this network that uh anyone in
the world can join um get rewarded in
network ownership, right? And
network ownership was probably the
number one most important reason to do
this uh with crypto and and happy to
talk about that more. But um you know
people get rewarded in network ownership
uh and that incentivizes them further to
go and refer others. So we we sort of
hit this virality loop and and our team
did an incredible job with uh designing
a very good, uh, referral system and a
point system that, uh, ahead of
airdrop one got us into the millions of
monthly active users and like more
recently we're at around 8 million
monthly active users. Um and
and when I say users I'm not I don't
mean like users of the network but
rather people contributing their
bandwidth to the network. Um and and
also more recently like over the last
year roughly um we start you know we set
out to actually start monetizing the
network uh because yeah I mean the whole
point of this thing right pooling all
these resources globally uh is to go and
like build something useful right um and
yeah growth has been incredible as well
on that side of things so I do feel very
lucky. Um, a lot of, uh, yeah, a lot of
Grass's growth and, uh, recent
success has really come down to, uh,
having an incredible community like an
incredible team as well uh working on
some of the coolest problems in the
world. So
>> Well, listen, there is a lot to unpack
there. And so I guess, I think
let's start by backing up to make it
super specific for people. If you go to
Grass, you download a browser extension.
That browser extension
basically gives Grass access to excess
bandwidth, which you then use to scrape
the web for data, which you
then monetize, right?
>> Pretty much.
>> And then people earn points for doing
that. Points can convert into tokens at
different periods of time, in different
circumstances, whatever. But that's
that's more or less the basic element.
You've got 8 million people out there
basically who are, um, sharing
bandwidth and and helping participate in
this data collection, right?
>> Yep. That's right.
>> I mean, and that is, I
mean, just to back up for a
second, what I think you mentioned, which
I just want to make clear, is a lot of
DePINs are providing cheaper services
or maybe faster services or maybe more
redundant or a variety of better
services, but what you're doing is
providing something that doesn't
exist at the scale you're doing, right?
And so what's fascinating to me about
Grass is you're using DePIN to
basically
create data sets that just aren't
possible by centralized companies.
It's not a question of being better.
It's about creating something brand new,
right?
>> To a large extent. Yes. You know, one
one thing that I do think is worth
mentioning is there are there are
centralized companies out there um that
are operating at a very similar space.
Um but the the way that they're you know
so they they use millions of residential
devices on residential networks globally
distributed uh in order to collect very
similar data. Um, and more recently these
companies have started trying to compete
with us in the AI space. However, the
way that they distribute those networks
is actually by sneaking software into
people's smart TVs and into their uh
free VPNs and things like that. Uh, most
of the world doesn't really read terms
of service or privacy policies. So they
kind of just accept this um and it's
default on
>> didn't know that and you know these
>> Yeah. Yeah. like these come and and that
was a big motivation for us, right? In
the early days, we said we want to build
this. How do we make it? How do we build
it properly, right? Uh because at the
end of the day, this is an incredibly
important technology. Without it, a lot
of things in the world would seriously
slow down. Um and probably the one thing
that would slow down the most is AI
development. And and I feel like it's
not controversial at all to say that uh
like massive investment in AI and like
development of AI right now is a net
good for humanity. Um and you know if
your if your device or if your network
or your you know digital life is somehow
being used um to help make that you know
to help make AI a reality.
um how can you know like how on earth is
it fair for you to be like rewarded in
anything other than like ownership of
that future right um like it just it's
very difficult to quantify actually the
the the magnitude of someone's uh
contribution so it was a big motivation
for us to to do this on a blockchain
right because how else do you distribute
like actual network ownership uh how do
you distribute you know like you could
you can go and pay people in fiat,
right? And also pay like ridiculous fees
for like international money transfers
and like go and have crazy headaches
with all the different compliance and
like red tape that comes with that,
right? Uh or you can use crypto which is
an extremely elegant solution. Uh you
know, you you you kind of like kill all
the inefficiencies that come with fiat
from a technological perspective. And on
top of that um not even like
philosophically but just like literally
um you can just tell you can give people
something that represents ownership and
that is meant to grow with the network
itself and with its success.
>> Yeah,
listen, that is a huge benefit of
DePIN overall, is that early providers,
you know, the example
I've given before: if you're an early seller
on Amazon, all that happens as Amazon grows
is you have more competition. And maybe
there's more
customer demand, but you also have way
more competition. And even though you
were early helping Amazon get there,
that's not helping you,
because they can't pay you in Amazon
stock. It's just not
legal. But if you get tokens, which is sort of what
you're referencing with that Amazon example,
if you're early providing to Grass and
you're paid in tokens, then you can
participate as the network grows because
you're holding these tokens, which
theoretically appreciate if the token
economics are set up right, and you
appreciate with the growth of the network
and the ecosystem, right?
>> pretty much yeah
>> Yeah, listen, that's a model
I understand very well. It's a huge
differentiator. If you're an Uber driver,
you're not participating. Like,
pick your Airbnb, you're not, right? Pick
your one that kind of has similarities
with DePIN, and you're
helping grow the network and
getting paid a little bit for it, but
that's it.
>> Absolutely. Yeah. I think like any
peer-to-peer system, um, to some
extent benefits, um, from these
types of dynamics, right, when you move
to a more blockchain-oriented solution. I
I guess like the one thing I I do think
is probably worth pointing out is um the
there's this weird trend sometimes uh
for people to like force
decentralization on things. Um I I do
think like you know all the examples you
just gave those are all like natively
peer-to-peer networks and peer-to-peer
systems. Um I do think that unless
something is already distributed and
already peer-to-peer um you end up
facing some like very uh high friction
trade-offs uh between performance and uh
like decentralization if that makes
sense. Uh whereas like if something's
already like distributed then
decentralizing is just kind of it's
almost only a net benefit um to that
system. Does that make sense?
>> Yeah, I
get it, and I'm trying to think of
what that means. I guess I'll phrase it
slightly differently, which is that, you
know, having
the providers be decentralized if
they're natively decentralized already
makes sense. It's a distributed
nature that you don't have to go
force. That's just what you require.
It's how it works. But on the flip side
of it, where I think you guys
are, and I see a lot of DePIN now, is
having somewhat centralized
BD, development, etc., because that's how
you move fastest, and some of the early
projects that decentralized early on for
regulatory and other reasons have moved
more slowly, and we can think of a couple
examples like that, right?
>> Oh yeah, that's exactly right. Um, I
can't imagine how we would have achieved
any of the commercial success that we
have this year. Um unless like all those
efforts were extremely concentrated um
and run by a very lean uh group of like
high agency people.
>> Fair enough. Well, let's talk
about that: one of those high-agency people is
you. Back up and tell us
your background. You
sort of referenced early on in
some comments how you kind of
backed into the crypto or at least
token solution, but give us a little
background and kind of how you ended up
here.
>> Sure. Yeah. Um,
you know, crypto, ever since
like the very early days of Bitcoin,
I've kind of been aware of it. Um, I I
do have these memories of sitting in
high school and this guy was mining
Bitcoin on his laptop because that was
possible back then. He had his laptop
out during class and the teacher would
be like, "Put that away." And he'd say,
"I can't, I'm mining."
>> "That's too expensive to put it away."
>> Yeah. Yeah. and and uh and at the time
it was kind of one of those things where
like over I think I think he made like a
dollar or something over a few months or
it was just some very negligible amount
of money at the time. And um I caught up
with him several years later and,
like, that laptop was long gone, that
Bitcoin, and like there were like
whole bitcoins in there. Um, so I thought
that was insane. Um, anyways, I I ended
up having like my own early crypto
regret story a few years later. Um, I
participated in a very early Doge faucet
on some kind of sketchy like centralized
exchange website that I had found.
At that point I was in university.
Um, I was studying, uh, nuclear
engineering and, uh, photonics engineering
as well actually. Uh and one day in the
library uh I came across this website
through some online ad and set up a
wallet and
I at the time the Doge meme itself was
very popular. Um and a lot of my
colleagues were using it in these like
study group chats. Um and I thought it
was kind of funny to go and like
participate in this like Dogecoin wallet
and earn a bunch of like Doge crypto. Um
and it was worth like very very little.
But then years later during the runup uh
when when Dogecoin appreciated very
meaningfully, I remembered that I had
that uh but the account was set up under
my old university email address which
had since then been deactivated. Um, and I
think I've told the story before, uh, but
I ended up calling the
university and saying please uh can you
reactivate my email just for one day and
they were like we can only activate an
email for you if you uh work for the
school or like become a student. So, I
was like asking what's the cheapest
program that I could very quickly apply
to like pay tuition for a semester and
and I was evaluating whether or not to
do that. Um, ended up not doing it. It
just ended up being such a hassle and I
kind of said, "Okay, I'll you know, I
I'll help decentralize Dogecoin, I
suppose." But, yeah, not
>> gifting it, gifting it out.
>> Well, you know, I've got to tell you, that
story, I mean, I meet people all
the time who are like, oh, should have, could
have, would have, you know, I have Bitcoin,
this, that. I'm like, listen, three things
would have happened: you
would have lost your keys, you would have
been hacked, or the place that you were
storing it at might have disappeared and
gone under, whatever, right? So
don't feel bad. Or, oh, sorry, the other
thing I missed is that it would have
doubled and you would have sold it
because you were so thrilled it doubled
or tripled, and then you'd have been like, sweet,
done, I'm out. And so
that's the story for almost
everybody who was in early.
>> Oh,
absolutely. Yeah. Uh, but yeah, the thing
is though I I from a very cursory point
of view, right? It's like since then,
you know, I I went to school for like
advanced physics and whatever. Worked on
some stuff in grad school, did some
stuff in finance along the way. Did a
lot of like just kind of fun personal
side projects. uh many of them having to
do with web scraping which is kind of
where one of the few different things
that all converged when when grass
became a thing right um and yeah yeah
around around 2021 very late 2021 early
2022 I was uh I was doing a lot of work
in like web scraping infrastructure most
of it was really just for fun uh during
the pandemic a lot of people found
themselves with like a lot of spare time
right uh it was a lot time where you
would have otherwise gone out for dinner
with friends uh or gone out and done
some activity or like played a sport in
the park or whatever. Um and I I ended
up just filling a lot of that spare time
with these sort of personal side
projects. Um, one of them was like
working on web scraping infrastructure.
And around that time, I found out about
these really disgusting practices that
uh big companies are engaging in uh in
order to go and distribute these
networks to scrape websites with. And I
remember I remember thinking that, you
know, like there's nothing fundamentally
wrong um with, you know, using millions
of residential IPs to scrape public web
data. You're effectively you're actually
making the internet a more open place by
doing that. You're breaking down wall
gardens. But if you're going to do
something like that, like like at least
do it right, you know? Don't don't go
and hijack people like people's devices,
people's networks, like in order to do
that. Like that's just so wrong. And I
think it's something that was born out
of laziness, right? Like these companies
are lazy. like they could have gone and
and and probably built this the right
way and then like gone and set up the
right incentive mechanisms. They just
didn't. Uh
and uh yeah, like crypto kind of just
fell out of that as a solution, right?
Like having been exposed to it for a
really long time. Um just but not like
very actively participating uh and then
just thinking, hey, look, you know,
there's this thing that exists. At the
time, I was also kind of like I was a
user of Brave browser. uh that that was
a huge source of inspiration too because
that that was a company that did a very
good job uh at like reaching virality
among like non-crypto participants. Um
yeah, everything kind of just came
naturally. I met my two co-founders who
had backgrounds in like real-time AI
analytics, high frequency trading, like
building very low-level systems and
network engineering. Um, so they're like
kind of these perfect backgrounds to go
and build this with and more importantly
than that. Um, the type of people who
like, you know, you start talking about
this vision and they're like, I'm going
to drop everything else in my life and
work on this because this is such an
important problem to solve. Uh, so yeah,
like everything kind of just flowed
naturally from there. Um, we went and
raised a, uh, pre-seed round of funding, uh,
during a terrible time for crypto and
for AI, uh, and then
did a lot of guerrilla marketing in the
early days and grew from there and I
guess I guess the rest is history.
>> Well, interesting. What I
take away from that partially is that
you've been looking, I guess from right
around academia, at
kind of web scraping, and that is
kind of one of your core expertises, and
this is just a way to solve that. And I
see it across a whole bunch of DePINs,
whether it's, you know, GNSS location
services or, you know, mapping or whatever
it may be: people come to the space
with a particular expertise and find
there is really a DePIN, token-related
possibility to scale that in a way
that's not possible otherwise or is very
challenging otherwise. So that's
certainly interesting to see.
Um, and help us understand
the scale. I mean you were mentioning
the amount of data stored, data scraped.
Help people understand the scale at
which you're you're doing this.
>> Yeah. Um, at the moment Grass is being
used to scrape and provision literally
hundreds of petabytes of data. Um, I think
the most that's gone through the network
in a single day was like four
petabytes of new data scraped. Uh, data
that gets delivered to customers, uh, that
goes up to like eight petabytes a day per
customer, um, depending on, you know,
needs. Um it's something that has grown
incredibly quickly. Um a big reason for
it has been a very quickly growing need
for multimodal data. Um it's a need that
we identified kind of right before it
really started to grow. Um we we saw it
as uh what was probably going to be like
the driving force of demand for this
type of like infrastructure. uh and also
um probably like the new like marginal
market opportunity. Uh so from like very
early on from last year we were uh
optimizing everything on the network to
be very good for like really high
throughput um very like low error rate
uh multimodal data collection. Uh, and
when I say multimodal data, like for
anyone listening to this who doesn't
know what that means, it means um like
what in in uh in AI, right? You've got
like multiple what they call modalities
or like dimensions if you will. Um
dimension might not be the right term,
but anyway. Um so like you've got like
text for instance, right? For a lot of
chatbots the modality would be text, but
then if you want to generate an image,
image is a different modality. Uh video
is another one, audio is another one. Um
so you know anyways for a very long time
there are lots of text based data sets
that are just kind of free for training
uh AI models. You can go and download
like all of Common Crawl, which is a
backup of a lot of the internet and use
that to go and train an AI model and
that's kind of your building block. Uh
nothing like that really exists for
video or didn't really exist for video.
and we saw, you know, this future huge
demand for video models and we're
starting to see it grow now, right? And
you'd be surprised by how early it still
is. Um there are a lot of companies that
are just only getting started on video
at the moment. So yeah, we, you know,
like when we first started, I remember
the the demand for some of this
multimodal data. Uh I remember we we did
like 40 to 60 terabytes in a day um on
the network and we thought, "Oh my god,
that's incredible. That is so much
data." Because it it really is, right?
40 to 60 terabytes is is quite a bit of
data. If you think about like a MacBook
Pro, uh like the higherend ones, like
they've got one terabyte storage on
them. So it's like you can fill 60
MacBook Pros in a day. We thought that
was that was incredible.
And then one thing led to another and we
started seeing demand from like actual
foundation model companies. Like, actual
huge AI labs, um, had approached us
and said, hey, like, uh, you know, we're
looking at this uh scale of data and the
difference between that and what we were
at the time like producing was like
orders of magnitude.
So very quickly, uh, you know, we
scrambled and
put out a huge upgrade to the network,
which was called the Sion upgrade, which
enabled like much higher throughput, much
higher capacity per node, and all of a
sudden we were doing petabytes per day. So
like for anyone that's not familiar, a
petabyte is a thousand terabytes.
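As a rough check on the storage comparison in this part of the conversation, the short Python sketch below converts daily volumes into laptop equivalents. It assumes decimal units (1 PB = 1,000 TB) and the 1 TB laptop figure quoted by the speaker; the function names are illustrative only and do not come from Grass.

```python
# Unit arithmetic for the "MacBook Pro" comparison, using decimal (SI) units.
TB_PER_PB = 1000  # "a petabyte is a thousand terabytes"


def petabytes_to_terabytes(petabytes: float) -> float:
    """Convert petabytes to terabytes."""
    return petabytes * TB_PER_PB


def laptops_filled(terabytes: float, laptop_capacity_tb: float = 1.0) -> float:
    """Number of laptops of the given capacity that a volume of data would fill."""
    return terabytes / laptop_capacity_tb


if __name__ == "__main__":
    scenarios = [
        ("early peak day (60 TB)", 60.0),
        ("1 PB in a day", petabytes_to_terabytes(1)),
        ("4 PB in a day", petabytes_to_terabytes(4)),
    ]
    for label, terabytes in scenarios:
        print(f"{label}: about {laptops_filled(terabytes):,.0f} laptops")
```

With a 1 TB laptop, the laptop count equals the terabyte count, which is why 60 TB maps to 60 laptops and a 4 PB day maps to 4,000.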
So like each day it would be like the
equivalent of, you know, a thousand
MacBook Pros of, uh, content, or 2,000 or
3,000 or at peak 4,000 of them. Um, and
then,
uh, as you can imagine, so like
Grass, the business or like the
commercial proposition of Grass, it's not,
oh there's this big data set like buy
data sets right uh data sets are
something that fall out of grass it's
almost like a side product right uh
grass fundamentally it's data
infrastructure right so there are some
companies that'll just use grass and
collect data directly from the web uh
and then there's a lot of AI companies
that continue to say, "Hey, we hate
collecting data. Uh, you guys think you
can just do it for us using that
network?" And we say, "Sure." The price
ends up being effectively the same.
Anyways, um, but as you're collecting
these data sets, you store them, right?
And a lot of AI companies end up asking
for very similar overlapping data sets.
So, it ends up being kind of a silly
thing to rescrape them. So, you know,
storing them ends up being something
that you you sort of just have to do,
right? So you you store huge quantities
of data and at this point now we've uh
we've got hundreds of petabytes of data
stored um on premises. Now in the early
days we were using like we were very
heavily using cloud providers uh to
store a lot of this data
um and the bill got atrocious right um
like I'm talking, you know, at
the very peak, I'm talking seven figures
per month, uh, for the cost of
storage, and that was only ballooning. It
made literally no sense to do but
because we were on such a time crunch to
service like a lot of customer demand
very quickly
It was something we just had to do
because setting up your own storage
infrastructure, your own storage
network, uh making sure there's enough
redundancy, making sure everything's
like very highly operational, like it
takes time. You can't just do it
overnight. And you know, we're a
startup of 25 people. Uh, so our
CTO at one point like flew down to
Ashburn, Virginia.
Um went into data centers and evaluated
a bunch of them, chose one, uh brought
in a bunch of people and literally like
racked servers by hand, slept on the
floor. It was like a whole production.
Uh we we set up a a whole storage
network in in almost no time. and and
since then have been kind of moving a
lot of things uh from cloud onto these
uh storage servers that we were hosting
ourselves, or rather when I say we, I
mean the Grass Foundation, because the
Grass Foundation ultimately owns all
these things. Uh, otherwise it wouldn't
really be fair to the token holders I
think anyways uh it's been part of this
trend of just forced verticalization
where we
hit a huge milestone or like hit a
massive opportunity and then the only
way to continue growing or to continue
moving from there is to continue
verticalizing. Uh so it's been like a
really interesting um like a a really
interesting uh experience because uh
yeah like in terms of scale like
hundreds and hundreds of petabytes, but
in order to get there you you can't
really do it unless you're actually
owning and controlling like many parts
of the stack all the way down to the
network and hardware layer.
>> Really interesting. And you mentioned
network and hardware layer which tie
into um transmission which ties into
distribution of these big data sets,
right? And so I think you know you
referenced earlier the sheer physics
limitation of sending via the internet
massive data sets and it sounds like
that's a problem and it's also leading
you to to kind of move move up the stack
a little bit. I'd love to hear a bit
more about that, which is really
interesting.
>> Yeah. Yeah. Absolutely. So, you know,
building our own storage network,
building out a lot of these components
to actually go and like collect and
store this data without relying on third
parties like that was like verticalizing
down the stack. And then more recently,
we've we've been also forced to
verticalize like as you said up the
stack. Um the biggest driving factor for
that being AI companies wanting high
signal data now right um so like I I
think the the world of pre-training data
is extremely opaque and it's
intentionally opaque. There are so many
things that are still being figured out
by some of these, like, a lot of these large AI
foundation labs, right? Um, they're
facing these weird compliance
issues in many different
jurisdictions all over the world and at
the same time they have this incredibly
large like and powerful market pressure
to go and build better and better
models. Um and for them it's just like a
matter of convenience that a lot of the
uh like pre-training
I guess experience or like the things
that go into pre-training just are
mostly opaque and not talked about and
which makes perfect sense. Um, so I
guess it's like worth kind of
illustrating like what happens, right?
Uh, when an AI company needs to go and
like pre-train a like a new model from
scratch. Uh, the the data collection
process like you you kind of end up
going to a lot of different vendors,
right? Um, and when it comes to certain
types of data, you end up just
collecting a crazy amount of raw data
from a single vendor, right? And for us,
that's kind of the case, right? Um, any
AI company wants to go and train a new
video model and they need hundreds of
petabytes of data, and obviously
hundreds of petabytes of data are not
going to be run like immediately through
a bunch of H100s. That would just be
insanely inefficient uh and probably
wouldn't even generate that good of a
result. The reason for that being like
what AI companies really care about is
taking this huge pool of data and
figuring out where's the signal in that
data, right? Um, so usually what AI
companies will do is with many vendors,
they'll say, "Hey, look, we'll just buy
all this raw data and we've got the
talent, we've got the compute, we've got
the resources to just go and figure out
what's high signal in that." And we're
not going to tell you what that is
because then
>> that's our special.
>> Yeah, exactly. They'll just go and
extract whatever that high signal uh
data is and then they'll use that to
train the model. Now, what we're doing
right now, we've kind of entered um
territory that's never been seen before,
right? Where a company comes in and
says, "Okay, we need 250 or 300 or 350
petabytes of, uh, video data to start." And
a lot of the time, you know, in their
minds, like they're just going to go and
filter all of this. And it's
a crazy job that takes like two months
to do uh and lots of GPUs and all kinds
of other things. And um the the funny
thing is they want that data yesterday
cuz their competitor or whoever already
has like a similarly sized data set and
they say okay we need to get the data
right now. Um the thing is though like
like you said you end up getting limited
by the laws of physics right like at the
end of the day how quickly can you
actually transfer that data into um like
an AI company's environment um at a
terabit per second um now like I I don't
even know how to how to illustrate how
much a terabit per second is like
hundreds of people's like home internet
connections being completely saturated
um it is at the scale of
like hyperscalers, right? At a terabit
per second, uh, which is what we do, uh, you
get maxed out at like up to eight
petabytes or so of data per day that you
can actually just transfer. So like a
company will go and buy, I don't know, 300
petabytes of data uh
>> and not waiting a year
>> and it'll take over a month. It'll
take over a month at the very highest throughput,
assuming nothing goes wrong, assuming it
takes no time at all to read the data
from tape and discs and all kinds of
other stuff. Um, and just sustained
throughput, it would take over a month
for that to happen. And the crazy thing
is AI moves so quickly. A lot of these
like huge AI labs are like, I don't have
a month to wait for the data to arrive,
right? Like even if you ship it in a
truck, you you you back all of it up on
on drives and like ship it in a truck.
um you're often like thinking weeks, um,
or if you're in the same data center or
facility, like the act of copying
that quantity of data, in that
time you might as well just
transfer it over. So then
these companies end up saying okay you
know what we wanted to filter all this
stuff ourselves but we have this huge
time crunch um and it's also just like a
massive use of our resources like they'd
much rather be using um a lot of their H100s
uh for doing like ablation tests or uh
doing more training runs or like making
their models better rather than sorting
and filtering through hundreds of
petabytes of data to find a signal. So
even though it is like to a large extent
like you said their special sauce, it is
also a machine learning engineer's like
least favorite part of the job despite
how important it is. Um so we've found
ourselves in situations where AI
companies say, "Man, like I'd love I'd
love to get all of this and just like
filter it myself, but I want it yesterday.
Do you have XYZ? This is what we're
looking for." And they'll just say it.
They'll say it. And then we, you know,
kind of find ourselves in the situation
where it's like, oh crap, we need GPUs.
Um, we need, you know, crazy horsepower
now to go and
not just provision this massive quantity
of data, but make it usable. Um, and
like help our partners skip that really
long and cumbersome step of extracting
the signal from that data. Now, we're
doing a lot more. Um I I shouldn't say
like crazy amount more but it is like
considerably more of that type of work
um where we're doing things like you
know doing things beyond just metadata
filtering right like we're we're running
classifiers um and ASR and all kinds of
other stuff over all the different uh
like files um that are being collected
uh and then helping you know sort them
in much smarter ways using ML um which
which ends up just helping AI companies
tremendously and from an economic
standpoint, right? Like
you end up selling 5% of the
data, right, for approximately the same
price because of all the compute
resources and all the headache that goes
into collecting it
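The transfer-time figures mentioned earlier in this answer (a terabit per second, up to about eight petabytes per day, and over a month for 300 petabytes) can be checked with a short calculation. This is only a rough sketch: it assumes decimal units (1 petabyte = 10**15 bytes) and a perfectly sustained link, and the function names are made up for illustration.

```python
# Rough transfer-time check for the figures quoted in the conversation.
# Assumes decimal units (1 petabyte = 10**15 bytes) and a perfectly
# sustained link, which the speaker notes is the best case.

BITS_PER_BYTE = 8
SECONDS_PER_DAY = 86_400
BYTES_PER_PETABYTE = 10**15


def petabytes_per_day(link_bits_per_second: float) -> float:
    """Theoretical daily volume of a link running flat out."""
    bytes_per_second = link_bits_per_second / BITS_PER_BYTE
    return bytes_per_second * SECONDS_PER_DAY / BYTES_PER_PETABYTE


def days_to_transfer(dataset_petabytes: float, rate_petabytes_per_day: float) -> float:
    """Days needed to move a dataset at a given sustained daily rate."""
    return dataset_petabytes / rate_petabytes_per_day


line_rate = petabytes_per_day(1e12)               # 1 terabit per second
days_at_line_rate = days_to_transfer(300, line_rate)
days_at_quoted_rate = days_to_transfer(300, 8.0)  # "up to eight petabytes" per day

print(f"1 Tbit/s ceiling: {line_rate:.1f} PB/day")
print(f"300 PB at line rate: {days_at_line_rate:.1f} days")
print(f"300 PB at 8 PB/day: {days_at_quoted_rate:.1f} days")
```

Under these assumptions the theoretical ceiling of a terabit-per-second link is about 10.8 petabytes per day, so roughly eight petabytes per day after real-world overhead is plausible, and 300 petabytes at that rate takes about 37.5 days, in line with the "over a month" estimate.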
>> and you're selling But that's
interesting. Same price. So you don't
even That's interesting because the
economics then are you're selling a lot
less the data for the same price, but
the revenue amount for you doesn't
change that much.
Yeah, that that's right. Um
it I it ends up being just a lot easier
um for a customer to to access like a
much smaller amount of data that they
know is going to be useful rather than
a crazy amount that they have to pay to
store and handle. So yeah, they're
happy to pay
the same price because it's what they
would have paid anyway.
>> So yeah, fair enough. I get that. Um um
but does that then increase your cost
base because you've got to go get these
GPUs up and do this work?
>> Oh yeah. Yeah, definitely it does. Um so
I guess when I say the same price, I
mean the same price plus the cost.
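The economics described here (about 5% of the data sold for roughly the same total price, plus the cost of the extra processing) imply a much higher price per petabyte delivered. The sketch below only illustrates that ratio. The 5% figure is from the conversation; the 10% processing cost is a made-up placeholder, and the function name is hypothetical.

```python
def unit_price_multiplier(kept_fraction: float, added_cost_ratio: float = 0.0) -> float:
    """Per-petabyte price of filtered data relative to raw data.

    kept_fraction: share of the raw data that is actually delivered.
    added_cost_ratio: extra charge for processing, as a share of the raw
    price (a hypothetical parameter, not a figure from the conversation).
    """
    if not 0 < kept_fraction <= 1:
        raise ValueError("kept_fraction must be in (0, 1]")
    return (1.0 + added_cost_ratio) / kept_fraction


same_price = unit_price_multiplier(0.05)        # 5% of the data, same total price
with_cost = unit_price_multiplier(0.05, 0.10)   # plus an assumed 10% processing cost

print(f"Same total price: about {same_price:.0f}x per petabyte")
print(f"With assumed 10% cost: about {with_cost:.0f}x per petabyte")
```

In other words, delivering one twentieth of the volume for the same total price means each delivered petabyte is priced at roughly twenty times the raw rate, before any processing cost is added.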
>> Fair enough. Fair enough. Right. You're
not losing on that.
>> No, no, that would be uh that would be a
>> fair enough. Well, um, and then that
that that that's interesting because it
also is moving up the stack in DePIN,
right? And DePIN is thinking about core
infrastructure, but you're now using the
core infrastructure to provide a service
on top of that, right? you're getting
the data and then you're adding value to
that data. And that's, I think, what
we'll see. There are a couple of other,
sort of much smaller scale but, you know,
groups that, you know, use phones to
collect location data or visuals of
cities or whatever it may be, and then can
add value on top of just the images, if
you will, to provide a more... You're
doing that at a different
scale. Um, and then how does that
translate then to to revenue? What what
what is how how do you you mentioned,
you know, key partnerships and
customers? You've got PE. It sounds like
your customers are time sensitive, less
price sensitive. So, and you're you're
providing a pretty unique thing. So, how
do how do you think about revenue?
>> It's one of those things where like the
revenue, it ends up being kind of the
whole point of the business. Um,
and it's been growing very quickly. Um
the the funny thing is like I mentioned
earlier, you know, Grass isn't just a
company that like builds data sets. We
build data infrastructure for companies
to use and like build their own data
sets primarily. Um but then we end up
doing a lot of stuff with data sets
anyway. Uh along that line, you know,
when we set out to do a lot of stuff
with multimodal data, it was almost a
like a side hustle at first. Um like we
fundamentally believe that five years
from now most of the value accruing to
data will be real-time data. So like
models needing to access real-time data
that's outside of their training
weights. So like live context
retrieval is a really important thing that
we've been working on. Um and we'll be
accelerating next year and hopefully
releasing next year. Um but uh in terms
of multimodal data, we always saw that
as a way for us to hit revenue much
earlier than we would have otherwise.
Um so and it's also something that we
saw at the time nobody else was doing
and we knew it was like this huge
mismatch in terms of like what was
available in the market and what was
needed uh by these leading AI
institutions. Yeah. You know we we set
out to build that but then it turned out
that it was a much bigger problem to
solve than even we had anticipated and
and by that what I mean is like it was a
much bigger market opportunity than we
had even anticipated.
uh to the point where yeah like you know
revenue ended up being in the the
millions um like monthly uh basically
the moment we hit revenue which was like
something we we didn't yeah we would
have never expected previously. Um and
yeah like we we do intend on actually
having a token holder call with the
foundation um probably later this month
where more details about revenue and
things like that uh will be announced. I
don't want to spoil that too much of
that. Fair enough right now. But uh but
yeah, in terms of growth, it's like
quarter over quarter. It's been um yeah,
a lot better than I would have
anticipated a year ago.
>> Well, that's that's that's good. When
the founders say it's better than they
thought it was going to be, that's
generally a good thing. Um what and so
congratulations for that. Um two follow-up
questions. Um the first is how have you
designed a mechanism where that revenue
accrues to token holders? Is that coming?
How do you think about that link?
>> Yeah, that you know that's a really good
question. Um I think,
for all the amazing benefits of crypto,
um, that is one that I think has
yet to be like properly figured out,
especially for uh cryptoeconomic models
uh that are doing things outside of the
blockchain, right, that aren't just
like kind of recycling funds on chain. Um
so you know if you think about uh how do
you know how do you use revenue from a
successful business um to go and like uh
feed that back into an ecosystem that is
like helping generate that revenue in
the first place. Um for a uh for like a
non-crypto startup or for a non-crypto
company even like the very mature
companies that ends up being a capital
allocation problem. It's like how do you
uh how do you properly allocate capital
at any given moment? Then that ends up
being like any business's like core
objective. Um so so for us um or for the
Grass Foundation in particular um
because the Grass Foundation is the one
that faces all of these uh all these
customers in the first place and owns
these bank accounts and things like
that. Um the the question is more like
how how can this like what is the
highest ROI opportunity right now for
this revenue uh and and to date that's
been on doing things like building out
data center infrastructure uh wiping out
completely wiping out a seven figure uh
monthly cost. Um it's been on doing
things like attracting some of the top
talent in like the AI data world.
Obviously like that evolves over time,
right? Um at a certain point um the
highest ROI opportunity ends up being um
like further developing like the onchain
ecosystem, right? Uh so
I I I think um yeah I I think all those
things are coming for grass. I some of
those things are still in development. Um
but yeah, like to I guess to answer your
question, I do see it very much as a
capital allocation uh question um rather
than uh
>> yeah like like like a like a product design.
>> Fair enough. and and and I guess the
only the only um comment I'd make on
that is I grasp the capital allocation,
but if you're generating significant
revenue, the market believes you're
generating more in the future and you
have a highly formulaic
buy-burn of the token, right? Then the
market should
appreciate that, should anticipate the
growth, and that makes your token
valuable, which then allows you to use
that high-value token to do some of the
capital components you want
while having your community very
involved and participating. Because the
problem is, as you know, the manager-agent
or the principal-agent issue, where you
can keep allocating capital to the
company but your token holders
don't see that. So the link breaks
down, right? So that that becomes the
the difference with a um a token versus
just an equity business.
>> Yeah. Yeah. I you know I completely
agree with you actually. Um and we're
definitely seeing that in the market,
right? Um so like uh we we made the
decision very early on that for grass
and the grass ecosystem um the like
first class citizen is the token holder
is the token at the end of the day. Uh
that's why we you know we put certain
protections in place early on like uh
nobody that is locked you know on our
cap table that has like vesting tokens
is able to stake those unlocked tokens.
>> We do the same thing. I think you're we
were the first ones to influence things
like that.
>> There were
>> you got push back from your investors,
I'm sure about that as we did.
>> Some of some of our investors like we're
locked but we want to stake. We want
like no no this is for community first.
Community first. So if you had some pain
I I I may have I may may share that a
little bit.
>> There were um I'd be lying if I said
that there was none of that. But I we
were very lucky that our biggest
investors were completely on board and
squashed any of that immediately among
among others on the cap table. Yeah,
they working with incredible partners. Um
yeah, you know, like other other things
too, you know, happened kind of behind
the scenes that a lot of people don't
talk about in crypto. Um that
shocked us because, as a
team, we weren't like
fully crypto-native, like entrenched
in some of these communities
uh heading into it. Um, one one thing
that kind of surprised us was some of
the practices of centralized exchanges.
Uh, and a lot of people were surprised
when when Grass like uh when the Grass
token launched like, oh, you know, why
are why are some exchanges missing? Um,
and it's because, you know, our team
always fundamentally believed that um,
without, you know, getting into this too
much, like that the token holder, your
token is probably the most important
thing in what we're building. Um beyond
that as well um like the the fact that
like all the all the revenue coming in
it comes into the foundation and not
some labs entity that goes and then
allocates capital is extremely important
and like I think worth emphasizing. Um
like there are like very large
commercial agreements um with like big
AI companies that face the grass
foundation. Um, and Wynd Labs, which is
where I work, um, is literally just like
a contractor to the Grass Foundation, and
the Grass Foundation could just fire us
anytime. And then I'm, you know, it's
like there there's no
>> Yeah. It's like the incentive mechanisms
are like very very strong for the people
running Wynd Labs to do right by the
token and then also for the foundation
directors uh to do right by the token
because that's actually just foundation
mandate at the end of the day. Um, now
you know obviously like obviously like
the crypto market's been burned so many
times and like learning some of the
precedent like really helped me uh
empathize with some of the crypto market
participants, right? Cuz like for for
someone that's not super familiar with
some of crypto's history, like coming
into it and then getting hit with like
tons of crazy backlash for certain
things, the default is you are scamming
me. That's like the default position.
How are you scamming me?
Yeah. Yeah. We had a we had a few people
saying things like uh like saying like,
"Oh, you're a scammer." Or, "You guys
are scammers." Or something like that.
Um and I'm like, "How? Who got scammed?
How where? Tell me. Please tell me."
Like, you know, my initially I was like
uh concerned like, "Oh my god, like did our
team do something to you?" And I
realized like, "Oh, that's just how
>> it's kind of like almost [laughter]
>> Yeah.
um just which just kind of funny but uh
at the same time it's not funny what a
lot of um a lot of companies and
businesses have done in the space to to
a lot of their communities. So I do
empathize quite a bit with uh people
saying okay it's great that you've set
up all these protections but at the end
of the day um like you know you're
missing the you know some link to
something on chain you know uh and you
know we we made that decision right like
we're not we're never going to go back
on like the token being the first class
citizen for us but it does it does you
know that type of precedent and like
people like pointing at those things it
does explain why right now grass is uh
like relative to uh non-crypto businesses
uh currently, you know, trading at some
like multiple that wouldn't make sense
to a lot of uh to a lot of market
participants outside of crypto if that
makes any sense. Uh but at the same at
the same time, it's something that
doesn't really concern me because um
we're we're actively looking at that.
>> Well, I I guess what I'd say is you
people outside of crypto usually didn't
understand valuation on the other side.
Now, it may be this way, but I'll tell
you, it's if you're generating revenue
like you think you are or like you you
articulate. Um, that's an easy thing to
fix, right? As long as you...
there's a variety of different
things you could institute, as long as
people see them as non-one-offs. Um, it's
interesting. The one the one thing I
also say is that, you know, we've seen
people talk about scams in crypto, huge,
right? We've also seen incredible scams
in TradFi, from Enron to WorldCom to
wherever. And so if you actually move to
a place where revenue is attested to
onchain and you have onchain buybacks
now you actually are more trusted than
being audited right so I I'm sort of
looking forward to time where we
actually have enough revenue that's
doing onchain verifiable buybacks that
auditors are superfluous because you can
just see the numbers. Anyone anywhere
can see all the detail in the numbers.
>> You know, I I I agree that that's just
an incredible feature of crypto, right?
Um that you have this immutable ledger
that is completely public and
irrefutable. So, on that on that note, I
do agree with you. Um that that's just
such a good way to um just end those conversations,
>> but we've got to get there and there has
to be enough revenue and projects like
you've got to actually do it, right?
Right. And there's a couple others and
you know, GeoNet has done some, but it's
not quite transparent as it could be. So
there there's a number out there that I
think, if we fast forward a year or two,
I hope we see a lot of that in the DePIN
ecosystem. It also sets, by the way, DePIN
apart from the rest of the crypto
ecosystem in a terrific way, which
I'm super excited about. Um, related to
that, what if we fast forward, pick your
time horizon, three years, five years,
what would make you happy? what would be
a big success just to help people
understand the scale of the ambition and
and and the the demand for what you're doing.
>> So I I touched very very lightly on live
context retrieval earlier. Um so
I don't think there's a single question
that the use of AI is only going to grow
in coming years and it's only growing
right now. Look at the charts that are
showing like the demand for inference.
They're they're insane. Um there's no
sign of that slowing down anytime soon.
Um and as inference and AI models
continue to capture more product market
fit, um in order to remain useful, they
need context from the real world.
Whether they're um an image or video
generation model that wants to generate
something that you know they aren't
familiar with. Like, so let's say they
need to go and retrieve 30 pictures of
some viral moment that happened
yesterday that wasn't in their training
weights to go and help generate that
image or if they, you know, very like
the very simple and easy to think of
example of somebody wanting to go and do
some online shopping and having to like
scrape a bunch of sites in real time and
feed that. All these AI models like they
need real-time context. They're going to
need it more and more and it's still a
massively unexplored area of AI. Um to
answer your question, what would make
grass a success in three to five years?
It is uh I would say over 90% of all LLM
calls rely on Grass and eventually
all AI models uh funneling their prompts
through grass to make them more useful.
>> And what is that help me understand for
people what that means like in terms of
revenue scale you're talking about there
quantum of that?
Yeah, I mean I'm not going to go
into that. So like, for reference,
like we hit revenue like in a meaningful
way only this year. So like in terms of
revenue growth we're still so so early.
So I'm not going to go and like off of
those data points make like a three or
five year projection. Uh but what I
think makes sense is to talk about the
market size. Um so at the moment pricing
for like search APIs and not even
multimodal search but just like text
search APIs. Um, the most expensive ones
are $20 per thousand calls and like the
cheapest ones are $5 per thousand calls.
Um, every time you use an AI model, um,
and it needs to fetch some information
from the internet, it is making several
of those calls. If you think about it,
like $5 like the lower end, right? Like
$5 for a thousand calls. Um, think
about how how many prompts are going to
go through AI in the next three to five years.
Like, the question
isn't going to be like, oh, is it going to
be millions or billions? It'll be how
many trillions are we talking about,
right, or whatever is bigger than that.
and like the ability to go and like
access the entire internet and shove
that into these prompts is something
that's very difficult to put a price on
and especially at that scale it's
difficult to put a price on. Um, so it's
one of those things that, yeah, it's
going to be a huge market that's
undeniable and there's going to be lots
of room uh for multiple businesses to
succeed. Uh, and it's our aim to be the
one that succeeds the most in that space.
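The market-size reasoning in this answer can be put into a few lines of arithmetic. The $5 and $20 per thousand calls prices come from the conversation. Everything else here is a placeholder assumption chosen only to show the order of magnitude: three search calls per prompt (the speaker says only "several") and one trillion prompts (the speaker says only "trillions"). It is not a revenue projection, and the function name is hypothetical.

```python
def search_spend_usd(prompts: int, calls_per_prompt: int, usd_per_thousand_calls: float) -> float:
    """Total search-API spend for a given prompt volume and per-call pricing."""
    total_calls = prompts * calls_per_prompt
    return total_calls * usd_per_thousand_calls / 1000


ASSUMED_PROMPTS = 10**12        # placeholder: one trillion prompts
ASSUMED_CALLS_PER_PROMPT = 3    # placeholder for "several" calls per prompt

low_end = search_spend_usd(ASSUMED_PROMPTS, ASSUMED_CALLS_PER_PROMPT, 5)    # cheapest quoted price
high_end = search_spend_usd(ASSUMED_PROMPTS, ASSUMED_CALLS_PER_PROMPT, 20)  # most expensive quoted price

print(f"Low end:  ${low_end / 1e9:.0f} billion")
print(f"High end: ${high_end / 1e9:.0f} billion")
```

With these placeholder inputs the spend lands between $15 billion and $60 billion per trillion prompts, which is the sense in which per-call prices of a few dollars per thousand add up to a very large market at that volume.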
>> Fair enough. All right. Well, it looks
like I'm not going to get a specific
number from you, but uh, we'll
save that for another one down
the road. Uh, which is fine. Um, you
listen, you're you're super focused on
what you're doing. Um, are there any
other deepin projects you pay attention
to or think are doing something
interesting or are you just so focused
on what you're doing, you're focused
more on the web scraping AI
world?
>> I don't spend that much time
looking at crypto protocols. I'm going
to be honest with you, but I I do um
I I do come across a few um that are
just doing some extremely exciting
things. Um like one of them is the
inference.net team. Um we've been
in a couple of announcements with
them and we all feel
incredibly lucky to work with those guys.
They're the most cracked infra team
in SF. I think they like to be called that.
>> Okay.
>> It's true.
>> You know, I I know you've mentioned
Filecoin a few times. Uh and and for a
long time, I'll be honest, I didn't
fully understand the um
the value proposition uh completely. uh
just given I guess a lot of like our own
storage needs and the types of things
that we do. Uh but more recently uh a
lot of it has started to click a little
bit more for me. Uh
but yeah, I think in general like anyone
operating in the DePIN space is
uh doing something that's very... at the
end of the day it's a very noble
pursuit, like you're entering uncharted
territory to go and disrupt some
business using a distributed mechanism
that's never been done before. So I
think anyone building DePIN of any
sort should be like very heavily
commended for that.
>> Listen, Andrej, it's been great having you
on. How do people find you, follow
you? How do people participate in Grass,
become a provider, and help
participate in in in growing this uh
growing this super exciting world and
and basically helping um helping AI uh
become better?
>> Absolutely. Um well, you can first and
foremost go to grass.io. Uh it's very
simple to join the network. Uh we work
very hard to make it as low touch and
low friction as possible. and we're
always open to like suggestions and
things like that. Uh we have a very
active Discord server with over half a
million people in it. Uh we're always in
there uh through the good times and the
bad times and just like huge supporters
and a really fun group to to chat with.
Uh and you can also just follow us on X
for official updates. Um there will be a
token holder call where, you know, Tom,
you and I have spoken a little bit about
um incentive mechanisms and uh how
important it is to tie like incentives
between like the token the business and
things like that. So a lot of that will
be addressed in that call. So uh details
for that will be announced on official
channels. So at grass on X is the place
to look for that. And uh Tom thank you
so much for having me on here. This was
actually a lot of fun.
>> Excellent. Listen, thanks so much for
being on. I uh feel like I got to go
download Grass. I got to become part of
this network as well. Hopefully everyone
else here who's listening to this does.
Um and um let's stay in touch and look
forward to having you back on again. We
can talk real revenue numbers and after
you've talked to your token holders, we
can get into get into that in more
detail. Really appreciate it.
>> Sounds great. Thank you so much once again.
>> Thank you. The DePIN Podcast, the
place where we explore real-world use
cases unlocked by crypto. That's all for
today. Thanks for watching to the end.
If you enjoyed this episode, please help
us grow. Add a like, add a comment. Um,
and of course, if you have anyone you
think we should be talking to or have on
this podcast, either leave a comment
there or reach out. But thanks for