This discussion centers on the Bittensor ecosystem, specifically addressing criticisms regarding revenue and subsidies, the operational nuances of various subnets (Templar, Basilica, Grail), and the contentious issue of miner emissions and "burns." The conversation highlights the platform's evolution from a technically focused community to one balancing technical development with investor and market demands.
We're talking everything. We're bringing
all the orders. We're talking all the
orders. So, um, how do you
want to do this today?
>> No, I was thinking of going
through some of the FUD. I
think that the best critique of Bittensor
is, you know, the revenue versus
subsidization
argument that Pine made. And
it's interesting that that's the
argument being made because that's a
really good sign because it's not like
are you doing real things? It's not like
are you a scam? Is this a pump and dump?
It's, you know, are you able to
monetize and go to market with these
commodities that have value already,
and are you able to finally optimize
the revenue so that
it's more than the
subsidy? And that's what, you know, the
Bittensor community has been preemptively
focusing on this entire year. So
it's ironic by the way that we're going
to be talking about like Templar which
is currently more than anything a
research style subnet like big play but
for most subnets on Bittensor, Chutes as
an example,
Lium, Targon, these other subnets, and
Basilica, you know, subnets that have been
focusing so much on revenue revenue
surge from Siam like when revenue from
the entire TAO community, from the dens
and in the price chats and
stuff like that, everyone's been pushing
that narrative this entire time. Not me,
by the way. But other people have
been pushing that narrative. And thank
God we actually can come back and really
talk about um you know where we lie on
that curve. And, you know, their numbers
in the Pine article are just not
correct. They're not up to date. But the
truth is that, you know, when you
look at, say, where Chutes was in the
previous novelty surge, we are
moments away from making these protocols
fully self-sustainable in terms of
emission versus outflow and
distribution, which is pretty incredible.
And that's in such a
short period of time, and to do all of
that on the back of actually building
permissionless protocols is, you know,
such a feat. So, you can come back in a
couple months and it'll be another
bogus piece written by Pine. I'm not
going to say bogus. I
actually appreciate that they wrote a
very good critique and I thank them
for that. But obviously, we're going to
get past that and then there'll be
another critique of revenue and
inflow versus token distribution
and subsidy. There's also another
aspect: they didn't really understand
how the subsidy works, right? Like,
the TAO emission is not the same
thing as just literally what your alpha
token holders are getting in the mining
department. Like, 50% of that is gone
right away, which goes to the validators.
Another 15% goes to the owners. Most
of this is yield on the token itself.
The miners, you know, he was
saying that we were paying more than AWS
for our GPUs on Chutes, which is just
clearly wrong. We know that that's not
true. Chutes knows that. Anyways, we'll
come back to that later. That was the
only piece of FUD that I thought
was kind of interesting and
worthwhile, and I'm glad that they
brought that one up. The other one is,
like, pre-mining and selling
to VCs.
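The subsidy arithmetic from this part of the conversation can be sketched in a few lines. This is only an illustration, assuming the split the speaker quotes (50% to validators, 15% to owners, the remainder to miners); the class and function names are hypothetical, and the actual on-chain parameters may differ from the spoken figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmissionSplit:
    """Shares of a subnet's emission, using the figures quoted in the talk."""
    validator_share: float = 0.50  # speaker's figure, not read from the chain
    owner_share: float = 0.15      # speaker's figure, not read from the chain

    @property
    def miner_share(self) -> float:
        # Whatever is left after validators and owners is what miners receive.
        return 1.0 - self.validator_share - self.owner_share


def miner_subsidy(total_emission: float,
                  split: EmissionSplit = EmissionSplit()) -> float:
    """Portion of the total emission that actually pays for mining."""
    return total_emission * split.miner_share


def coverage_ratio(revenue: float, total_emission: float,
                   split: EmissionSplit = EmissionSplit()) -> float:
    """Revenue divided by the miner subsidy; above 1.0 means revenue covers it."""
    return revenue / miner_subsidy(total_emission, split)


if __name__ == "__main__":
    # Hypothetical inputs, both in the same unit (for example dollars per day).
    print(miner_subsidy(1000.0))
    print(coverage_ratio(280.0, 1000.0))
```

The point of the sketch is the one the speaker makes: comparing revenue against the full emission overstates the subsidy if only part of that emission is paying for the mined commodity.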
You know, it's interesting, like if you
build a permissionless
token, you can't really choose your
bedfellows. Like people are going to
come and they're going to buy your
token. And, like, any
project that you build that's truly
open, you're going to get very weird
people owning your token. Not that
Polychain or DCG are weird at all, by the
way; they're fantastic partners. But
it's an odd way to attack an open
source protocol by saying, hey, there's
some people that we don't like that
own your token, because we can't
really control that, and that's part of
the design of the system in the first
place. What else? Anything else?
But on that revenue thing,
again, you know, it's different. Like,
if your only tool is
a hammer, then everything is a nail.
There's different types of protocols.
Some need revenue, some are able to
instantly generate revenue. Compute
subnets are very tricky because they can
show revenue almost instantly, but
in most cases it's asymptotic. But when
people ask me about revenue for
Templar,
I just ask them: when
is OpenAI or Anthropic
going to cover their miner emissions? [laughter]
So, because to
me it's the same kind of place. So, you
know, I'm going to ask, like, yeah,
that's it. But again, you know, it's good
conversations all around, and
I think it's been an amazing week for
Bittensor, and, you know, I think we just
keep that momentum going. But I don't
know, the only thing is, and I'm probably
one of the key people behind it, is: guys,
let's stop being so feral. Like, dude,
you understand,
one of the
things that's important for any community,
especially communities like ours, is
you get maxis and you get people that are
absolute lunatics. But I think we need to
cross from that maxi stage to the kind of
acceptance, and, like, just explain or
block. Because, you know, we're doing
amazing things. Like, there's nowhere in
crypto that is pushing out so much [ __ ]
like Bittensor is. So, you know, some
people are genuinely curious. It's
hard to believe a lot of times, but
let's not attack people on the internet.
Like, that's just a crypto
play, because again, part of our crowd is
crypto and part of our crowd is AI.
>> Yeah. And AI people already hate crypto.
So even though we're right and we get
into mud fights with like
>> What's that guy's name? It's funny, like,
the FUD is funny too, because
there's nobody that FUDs Bittensor more than
the internal market people of Bittensor,
like other subnet owners coming to
ruin your day, you know. Like, that's
what you really worry about as a subnet.
You're not worried about, like, Eric
being like, "Oh yeah, is this really a good
subnet?" It's, you know,
does Carol or fish or ref or
something, or papaya or cats or
James come in?
>> Yeah. Yeah. Exactly. Right. Like
there's no healthier
system than that which hates itself
more than anything. [laughter]
>> Um, but we shouldn't be talking
about Bittensor as a whole. That's the
fire. I mean,
I think at the beginning of this week,
or was it a bit before that,
you guys had three subnets in the top
five by emission.
>> But some people have taken it
away from us. We'll get it back.
We'll get it back. We plan on it. But
it's been an amazing week. Amazing
two weeks. We were
contemplating bringing out the sudo
key to destroy Templar because it
had reached 39% emission. We
were concerned. I think that there's no
subnet that's reached 39% emission since,
really, Apex being the only subnet.
Um, so
so that's impressive. I
think that we should keep, like,
a hall of fame for
subnets based on the
max emission that a subnet has attained.
And so 39% is the, is it 39%?
>> I don't know, but I know that it needs to
be hardcoded. I'm tired of
watching emissions go up and down. Just
set Templar to 35%. Let me retire.
Or let me have a week off.
>> I think we [laughter] disagree.
>> Okay. Okay, but let's get into
it. So, we have
Templar, Basilica, Grail. But which
one should the nerds be excited about?
>> It's like asking a father which child he prefers.
Come on. You cannot tell me to
split my baby three times. We're going
to take everyone through everything.
Templar
has pole position right now because it's
been alive the longest,
and in some
regards it's so much harder. What
we're doing is so
much harder, and we'll explain to you why. But
the aim of Covenant Labs is to own the
end-to-end intelligence continuum,
to democratize the creation, the
co-creation, and the access of intelligence.
So, you know,
when I started Templar
and then I registered Basilica, people got
mad: oh, you're greedy, blah blah blah.
But we had to do it because we had
a vision. And after training
the 1.2 billion parameter model, we
just realized that we're going to need a
lot of compute. So we started Basilica.
Then what happened next is, you know,
we used to make fun of people doing
decentralized post-training, because to
our knowledge the thesis
behind decentralized training is that the
compute budget, the
cost of training a large model, is so
much that
it makes it inaccessible. The data
center is one of the massive problems.
So if we can take away the data center,
then we can make it
cheaper. That's the whole
decentralized training thesis.
You have people like Qua working on
the context layer, but for actual
decentralized training it's a
communication and networking problem:
communication, networking, and GPU
efficiency. So then we started Grail,
because, I think for Grok 4 or 3,
that was the first time on the frontier
that someone actually came out and said,
hey guys, the compute budget for RL, which
is post-training, is as much as
pre-training. Then I remember
looking at our scientific advisor, like,
it was almost at the same time, we're
like, we've got to do this
post-training. So that's it. So it's, you
know, it's all part of the same picture.
Templar has actually overshadowed a
massive achievement by Grail and
Fan, who leads
Grail. But anyways, we got slides.
They're not pretty. We found out we were
doing this like two hours ago. So
please enjoy them while we take you
through them. Cool. Um,
>> I like it. Keep it authentic.
>> Yeah. Can anyone see Can we see this?
>> Yeah. Yeah,
>> Cool. So, this is Templar.
Sorry, with Covenant Labs, we have three
subnets. I think the slogan is one
order, one covenant, many orders. So
we refer to each of our subnets as
orders. Well, it's not the crown
jewel. It's just the
oldest kid. Obviously, the oldest kid is
bigger and, you know, it's just
more well developed. So, we have Templar.
Do people know
that the logos for the subnets are the
alpha token symbols?
>> Yes. Yes. The logos for the
subnets are the alpha token symbols. And
I'll explain how Const screws
you, then he unscrews you. Because when
we had this subnet here, we literally
were waiting for which one has a nice
token. Then a couple of months ago, Const
decides to let anyone choose their
token. But we're not even
going to go into that. [laughter] Because I
remember we were trying to buy
something. It's like, yeah, I don't like
that slot. Okay, let's reg. Yeah. So, I
mean, this is GMA Kai. I think the
Basilica one is U Neor and the Grail one is
>> which looks like the roof of a
basilica, right? It looks like a church,
and so there's all these double
entendres; it has multiple meanings,
symmetrical meaning. It's very cool. And
then the Grail looks like
the Holy Grail, where it looks like a grail.
>> Okay.
>> I think it's called fairun.
That's the symbol. But yeah, cool. So what
are our subnets? So again, it's
Templar, decentralized training;
Grail, decentralized post-training; and
Basilica, which is compute. So, you
know, Covenant, we started a while ago.
Templar
was our first subnet. In October I
realized that Const literally went
into... so I used to lead the
blockchain [inaudible], and, you know, I just worked
with Const so much, and I knew that once
Prime had dropped OpenDiLoCo, he wasn't
going to talk to anyone, he wasn't going
to shower until he had solved
decentralized training for Bittensor. So
I worked with him a bit on it, and, you
know, he taught me all the ML I know, left
me with GPT3, and yeah, we launched the
subnets. So it was really
impressive because, again, you know,
I mean, apart from working hard, I
don't think there's anything special
about me or whatever, but it's the
environment of Bittensor. Because while
others can plan and they can, you know,
stage testnet or whatever, from the 7th
of November till sometime in
January, Templar has always had runs. So
this allowed us to really test things
in adversarial environments. Now what's
important for us is not even so much,
you know, the decentralized training, because
for us, I always had this
insecurity that, you know, I'm not an ML
guy, I'm never going to be able to do
this. But I think, me not being an ML guy,
I chose the problem that's more
important to me, and it turns out that it
aligns with our goals. It aligns with,
you know, our aspirations. It becomes, okay,
if intelligence is the most
powerful thing, then decentralized
training is humanity's last dance,
because it's the same thing that we
fought for for ages. It's what the
internet tried to do, it's what Bitcoin
tried to do, which, if we're honest, it
hasn't done. It is: how do we reclaim
agency? It's never going to be about how...
Um, you know, people
missed, we missed the
assignment with Bitcoin. It was like,
"Oh, no, we're going to overthrow JP
Morgan." No, we are not, because unless
you're ready to die for it,
that's not a fight you're
ready to have. But it was always:
can we renegotiate the social contract
with the Leviathan? So, we failed with
Bitcoin. Like, you know, we failed
with Bitcoin. But with decentralized
training, it becomes, okay.
Yes, we can create
this technology. We can turn the
internet into a data center. But what is
the point? The point is not to beat
OpenAI. I mean, we can try, but I
think you miss
the forest for the trees when you try
to do that, because the fight is: can we
create optionality?
Can people train models cheaply? Can
people train good models cheaply?
If we get there, what do
we uncover? The rewards are massive,
because what people don't realize, and,
you know, what I started to realize the
more I dug into training, is,
first of all, let's start
like this: the gap
between academia and the frontier grows
exponentially. Because, you know, we
work with some of them, they
don't have compute. All they do is
train very tiny models, which are not
even representative. They can't experiment.
The cost of training models, you say
it's 10 billion. Let's say it's 10
billion to produce a large model.
But what they don't tell you is the run
costs like 5% of that. There's so much
experimentation that goes through that.
So okay, if we do this, then we open up
the market so much more. We open up the
surface area
of innovation, because people do not
pre-train because they say it's too
hard. So that's one of the
things to do, and when we create this market,
we will be the best at it. So
that's what really excites
me about Templar.
Grail is one of the funniest things,
because for every other
subnet, I've started the subnet and,
you know, I've
understood it, and
I could always keep an eye on it. But Grail
is different. I didn't
know any RL post-training. I was like, you
know, but you can pick up a couple of
things. So I vibe-coded the first thing, then
on the A, um, Ora joined. I believe he
deleted my repo. He's like, "Fuck this
shit." [laughter]
So Grail is pretty much just, like,
you know, we're very blessed to have
him. And we've done amazing things at
Grail. Like, you know what's
crazy is there comes a point where we
realize, okay, when do we start pushing
things out? Because as you find out,
we wrote something two months ago,
not two, no, a month ago, and published
it. Our
technology is training what people use
now. So again, you know,
we're pretty happy. Now, Basilica,
that's interesting because, oh my god, no
one told me. Because I thought, okay,
you know, you've done Templar,
decentralized training, let's pick up
infrastructure. It's a different type of
hell. I've never seen more code in my
life. Like, it's crazy, but it is
coming together. We are moving
quite quickly. So our whole thesis
in Basilica is,
we took on the problem because
there's a different kind
of economic problem you have to face
with compute networks: you can do
everything on them, but what kind of services?
How do you bootstrap both economic
activity and revenue, and how do you become
profitable on decentralized compute
networks? You don't own any
of the resources, so you can't really
control demand and supply. But one thing
that stood out to us from the very start is
that just renting
out those GPUs
could never make economic sense.
We would have to, because most compute
networks, they
pay $1 in their tokens
and they receive 80 cents in cash
and they bank it. That's token
arbitrage. So the only way compute
networks can ever return
more than their inputs
is value-added services. So we've built,
you know, we're doing a couple of things.
We've got open core, it's quite
successful. We've got
our decentralized PaaS services.
We'll talk through it.
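The "token arbitrage" point above is simple arithmetic, and a short sketch makes it concrete. This is an illustration only, using the speaker's example figures ($1 paid out in tokens, 80 cents received in cash); the function names and the value-added services figure are hypothetical.

```python
def net_margin(token_cost_usd: float, cash_revenue_usd: float) -> float:
    """Cash received minus the dollar value of tokens paid out, per unit of compute."""
    return cash_revenue_usd - token_cost_usd


def return_on_inputs(token_cost_usd: float,
                     cash_revenue_usd: float,
                     services_revenue_usd: float = 0.0) -> float:
    """Total cash returned per dollar of tokens paid out.

    A value below 1.0 means the network pays out more than it brings in.
    services_revenue_usd stands in for value-added services sold on top
    of the raw GPU rental (a hypothetical figure, not one from the talk).
    """
    return (cash_revenue_usd + services_revenue_usd) / token_cost_usd


if __name__ == "__main__":
    # Plain resale with the speaker's example figures.
    print(net_margin(1.00, 0.80))
    print(return_on_inputs(1.00, 0.80))
    # Same rental with a hypothetical 50 cents of services layered on top.
    print(return_on_inputs(1.00, 0.80, services_revenue_usd=0.50))
```

With plain resale the ratio stays below 1.0 no matter how much volume goes through, which is the speaker's argument for why only services layered on top can push it above break-even.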
>> Can I just, for the
audience, like, let's talk raw
numbers, and Evan, you can jump in here
too because you might know even more
than Sam. Okay, so B200s, H200s,
H100s, 4090s, 3090s, we have all these GPUs,
CPU servers, etc. You know, if compute
providers come to Basilica and
they say they have a bunch of these, and you
say we'll pay you some sort of rate
on them by the hour, because it's a
continuous network, right, so
you're paying them in alpha
tokens as time goes by,
continuously, what is the markup
if you were to rent
those machines? And let's say that you
had the API, you guys do have the API, it's
easy to rent, SSH keys, you can go on
the machines. Like, you're not talking
about much markup, but just what does the
market look like for these machines?
Because like B200s, for instance, if you
can give me a B200, someone's going to
rent that. There's like a shortage of
B200s. The price for B200s is very
high. H200s, just about the same. So
they're flying
off the racks here. You know, why do
you say that there's no money in that?
Is it because, you
know, relative to making real money,
the margins are pretty low? Or
is it actually quite impossible? And
why would that be? Because
surely there's all these compute
providers that have machines that can
just bring them over to your protocol.
Anyways, Evan, speak to that. Like,
what are the numbers you see? Can you give me
some concrete numbers?
>> Let me take that, because it's, um,
Evan will come on, but it's something
that, you know, I've actually thought
about a lot. Compute networks,
I mean, things are changing now
in Bittensor, but traditionally compute
networks are lazy, you understand? So what
happens is you introduce the
miner as the middleman. He's not
connected to supply, he's going to get
it from Shadeform, and because your
thing is lazy, you are
not securing the best price. So you need
a mechanism to push it down. I mean,
there is a margin, but it's very
thin, and you need to push your incentive
mechanism to incentivize the actual
owners of the resources, or people lower
down the value chain, to be able
to onboard onto your platform.
That's a very big
problem, and we have solutions around
that. And even when they do, you need to
match it with supply. So it's possible,
but it's really,
really hard. And take, for example,
we do have solutions to get, like, B200s. Because
we don't have any... I mean, we do have
B200s come on, but they're very sporadic.
So in V2 of our incentive
mechanism, or V3, which Alex can talk about,
the need
is being able to dynamically respond
to market conditions, because we went a
bit full [ __ ] on the
mechanism now and we don't pay if it's
not rented out. And Basilica
is two bits. We have a
secure cloud and we
have the boss. Now the aim of the
>> we're not going to get rented out. You
understand? So that's good
from a cost perspective, but it's not
where we want to go, because the real
power comes when you can lean into the
incentive mechanism to respond to something like
the supply crunch. So if we could
dynamically say, okay, we can match
someone that wants to pay $5 above
market for a B200, that is where the value
is. But I think
we'll be rolling out the new
incentive mechanism next.
>> Bringing this
question to Evan, right?
Evan, can you answer this question?
>> Yeah.
>> Right. So
what happens with
compute networks is, the innovation is not
to, you know,
get the GPU from somewhere and then
just
forward the SSH access.
When it comes to GPU networks
and compute networks in general,
the key component is the contracts,
right? So when you
go and negotiate, you know,
with a data center to become a miner, or
the miner, as Sam said, who becomes the
middleman, to negotiate a contract,
they need to have some
longevity, right? They need to
go and have some
security, or feel secure enough, to
go and say, "Hey, I'm going to
buy this commodity and I know this
commodity is going to be utilized."
And this is where the innovation
comes in, and this is why we
see such a
poor, I guess, performance in
general from decentralized compute
networks: because it's hard to
provide this kind of security,
this kind of enterprise security, which
is a business
problem. And we think that this is
where the added services that we're
going to talk a lot about in the next
slides are actually, you know,
solving a lot of the problems, because
they guarantee
utilization, they guarantee
how efficiently, you know,
those compute
workloads are going to run, they
guarantee hype,
and when I say hype, I don't mean,
you know, in a token
way. But for example, you know, we
saw auto research, for example,
that, you know, has popped up left and
right after Karpathy's, you know,
broadcast, and
people wanted to play with this, right?
And if you go and see, for example,
you as a researcher, to go and rent an
H200, my god, this is
an expense, you know,
for you. But if you
go to a compute network that says,
you know what, just run this, and
we're going to guarantee utilization, and
then we can go back to the
miners and to the
data center providers and say, you know
what, we can guarantee a huge
percentage of this utilization.
This is where the novelty
comes in. So yeah, it's
more of an
old-school problem that requires, you know,
new-school novelty, I would say, and
it's mostly business. So
it's not the kind of novelty
that you will see from Templar and Grail,
but it's literally business
novelty in order to make it, you
know, a success.
>> So right now how
does the mechanism on Basilica work?
It's similar to the other
compute networks. But you guys are not
using TEE, or you are using TEE? And, like,
you know, my understanding, you know,
the problems that other compute
networks came up against was really,
like, there's so many ways of spoofing
machines. And so, did you guys solve
that in a novel way? It sounds like
Alex, you want to jump in?
>> Yeah. Yeah. I'd love to.
>> Yeah, go. [laughter]
>> So, my background is actually, I
was mining in Lium. I was mining in
Chutes, Targon, and then Sam found me and
now I talk to Evan all day. But, um,
yeah, I think the problems that we
saw with the other compute subnets was
that it's just really hard to basically
specify, like, a model of GPU that you
want, and, you know, if that's the only
criteria you're looking for, there's so
much variability that goes into
a GPU instance, right? So if I just say,
hey, you know, I want an H200, it needs to
be bare metal with TDX enabled, and
then run our, you know, TVM on top of that,
you know, that doesn't specify the
geography, that doesn't specify the type
of network that it could be on,
it doesn't specify the terms that I have it
on. So if, you know, we take that in and,
you know, say we pay $4 an hour for it
because it's super scarce, we want it,
and we're like, okay, we have our H200, and
then, you know, another subnet goes and
launches their product on that,
we have no way of ensuring that that's not
a spot instance from, you know, a provider
in Asia, right? And if that instance gets
taken back by that provider because
there's a shortage right now and we lose
all of the customer's data, there's like
no ability to get them a new node. It's
just, like, a huge problem, right? And
then, you know, sort of back to the
numbers that I think you were looking
for: you know, if you take a spot
instance, you can get it for 50% of the
price of an on-demand instance. So if
miners are going in and they're able to
buy spot from the market and then sell
it to subnets as if it's this on-demand
product, they're taking this huge
margin that really shouldn't be there,
because the appropriate price for
spot is, you know, very
close to what they're getting it for
from the provider. And then if all
they're doing is just simply reselling
us access to that node, like,
that's a 5 to 10% margin game that, you
know, is fairly well trodden in the
traditional market. Like, you
have big providers like Hydra Host,
FluidStack, Shadeform;
they're all out there and they've got
inroads to every one of the major data
centers, and they'll resell that
[ __ ] all day long for 5%
margin. So I don't think
there's a ton of value in just focusing,
like Evan said, on, okay, here's
infrastructure as a service, here's a 5%
markup, you can pay in TAO. I think
that's a very basic value prop that we
can offer, and it's useful to people,
but it's really that higher-layer
stuff, where, you know, we can take
a bunch of infrastructure, we can have a
very efficient scheduler in the middle
of all of it, and drive super, super
efficient utilization that disparate
teams wouldn't be able to achieve
independently.
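The spot versus on-demand numbers in this answer can be worked through with a small sketch. It assumes the figures quoted in the conversation (spot at roughly 50% of the on-demand price, an ordinary reseller margin of 5 to 10%, and $4 an hour as the example on-demand price); the function names are hypothetical.

```python
def misrepresented_margin(on_demand_price: float,
                          spot_discount: float = 0.50) -> float:
    """Margin (as a fraction of the sale price) for a miner who buys spot
    capacity and resells it at the full on-demand price."""
    spot_cost = on_demand_price * (1.0 - spot_discount)
    return (on_demand_price - spot_cost) / on_demand_price


def fair_resale_price(provider_cost: float, reseller_margin: float = 0.05) -> float:
    """Price a plain reseller would charge: provider cost plus a thin markup."""
    return provider_cost * (1.0 + reseller_margin)


if __name__ == "__main__":
    on_demand = 4.00  # dollars per hour, the example price used in the talk
    spot_cost = on_demand * 0.50

    # Selling spot capacity as if it were on-demand.
    print(misrepresented_margin(on_demand))

    # What the same spot capacity would cost at a 5% and a 10% reseller markup.
    print(fair_resale_price(spot_cost, 0.05))
    print(fair_resale_price(spot_cost, 0.10))
```

The gap between the two results is the margin the speaker says "really shouldn't be there": the subnet pays an on-demand price while carrying the interruption risk of a spot instance.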
>> So does that mean that you guys have
written your own form of verification or
not? It's not like that on Basilica.
>> You guys basically
>> do you have
>> the GPU verification?
>> Mhm. Yeah. [clears throat] Like those
standard questions, GPU verification,
machine, host, IP. There's some things
you do do, there's some things you can't,
>> right? Yeah. So I mean, spoiler
alerts, because, you know, we put so
much effort into the next slides, but
>> Can we follow the slides? Like,
I had people working on this.
>> I'm just an artist. I
like to follow my ADHD.
>> So the short answer is yes. And, you know,
if we haven't covered it in the
slides, then you can ask again.
>> Typical.
>> Okay.
>> Okay. Sorry. So Alex, um, okay, Const, you've
messed that slideshow up. How do you
want to do it? Do you want to still go
on on Basilica, or where does your autism
lead you? Just dig deep, man. Where are
we going next?
>> Me or you?
>> Yeah. Which one do we want to start
with? We're on Basilica. We'll do
Basilica now and then maybe we can jump
back to the beginning, and then, I
don't think it matters which order.
>> We'll let Const guide us.
>> Hi J. So, right, let me
start, you know, I guess,
go in a pitch mode, I guess. So,
you know, what we wanted to
communicate pretty much on
Basilica is
the next stage, right? And
the next stage is agentic. So,
you know, as we know, we all
use agents right now, and every agent, you
know, in order to do some real work,
you know, like train a model, run
inference, oh, I'm hearing my
voice twice for some reason, or,
you know, run evals, it needs
compute, it needs GPUs, right? And right
now, this is literally what
we were explaining just before,
you know, in order to get those GPUs,
what happens in pretty much
any kind of compute network is that,
you know, you
go in a dashboard, click around,
or maybe, you know, have a very basic CLI,
pick a GPU model, configure the
networking, and then you wait. And this is,
again, you know, an old-school kind of
design of an infrastructure
provider, which is designed around humans
sitting at a keyboard. So in Basilica
we are flipping this
completely, and we do
something that we just call, you know,
GPU as code, right? So what this
means is that, you know,
agents, of course, they don't
navigate dashboards, they don't navigate
human things, they navigate
code. So they don't have to,
you know, go into, I mean, we don't even
have a UI for this specific reason,
right? Like, you know, they don't have
to learn our quirks and
specifics. All they do is
they're using our SDK, and
they just say, you know
what, I want to run this and I need
8 gigs of RAM, 8 gigs of GPU RAM,
for example. And Basilica has
been built in such a way, we have, you know,
our novel kind of scheduler, whereby
we take all the compute that
the miners can give us and then we
utilize it to
place this workload in such a, you know,
measured
and utilization-specific way. So the
agent doesn't have to think about
infrastructure, doesn't have to think
about DevOps, it just keeps building.
Alex, you, you know, go.
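The "GPU as code" idea described above, where an agent states what it needs and a scheduler picks the machine, can be sketched roughly as follows. This is not Basilica's SDK or scheduler: every name here (`Node`, `WorkloadRequest`, `place`) is hypothetical, and the best-fit placement rule is an assumption chosen only to illustrate packing workloads tightly for utilization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A miner-provided machine with its remaining free capacity (hypothetical)."""
    name: str
    free_ram_gb: int
    free_vram_gb: int


@dataclass(frozen=True)
class WorkloadRequest:
    """What an agent asks for: the thing to run and the resources it needs."""
    image: str
    ram_gb: int
    vram_gb: int


def place(request: WorkloadRequest, nodes: List[Node]) -> Optional[Node]:
    """Best-fit placement: choose the node that fits with the least capacity
    left over, so that larger nodes stay free for larger workloads."""
    candidates = [
        n for n in nodes
        if n.free_ram_gb >= request.ram_gb and n.free_vram_gb >= request.vram_gb
    ]
    if not candidates:
        return None  # nothing fits; a real system might queue or add capacity
    best = min(
        candidates,
        key=lambda n: (n.free_vram_gb - request.vram_gb,
                       n.free_ram_gb - request.ram_gb),
    )
    # Reserve the resources on the chosen node.
    best.free_ram_gb -= request.ram_gb
    best.free_vram_gb -= request.vram_gb
    return best


if __name__ == "__main__":
    pool = [Node("large-node", 64, 80), Node("small-node", 16, 24)]
    # The request from the talk: 8 GB of RAM and 8 GB of GPU RAM.
    chosen = place(WorkloadRequest("my-eval-job", ram_gb=8, vram_gb=8), pool)
    print(chosen.name if chosen else "no capacity")
```

In this sketch the caller never names a GPU model or a machine, only the resources needed, which is the shift from dashboard-driven provisioning that the speaker is describing.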
>> Yeah. So, it's honestly, like, that
focus, I think, on those higher-level
services is what's going to drive, I
think, most of the value for Basilica,
right? Like, I don't think anyone in
Bittensor needs us to, you know, help them
make a RunPod account and get access to
a GPU. Obviously I think there's
value in that: we can automate the
accounting, let you pay in TAO, and just,
you know, your money can sort of stay in
the ecosystem. But I think the real
value is in driving consumption of the
global GPU supply. So we see supply
coming from a couple different sources,
and I think the biggest and, you know,
best opportunity we have is with the
miners. So having miners out in the
market that, you know, have connections to
data centers. They might have contracts
already set up. Like, say you're at a
company and you have like a 512 cluster
or something, you know, it's going to be
idle for a month. Giving people in
Bittensor access to that infrastructure,
getting paid sort of a fair amount for
it, I think that's a lot of
value that we can unlock from miners, and
then having all of that backstopped by
the actual on-demand market. So
something that we're able to do is go to
any GPU cloud provider and form a
partnership with them where they give us
special pricing that's below the market
rate. So if you were to go to, let's
say, Massed Compute and just make an
account, we've got pricing that's 10 to
20% lower than what you're typically
getting if you're just on that website
renting compute. So we can still sell
people the same infrastructure as a
service for the same price that they can
get it anyway, but we have the added
benefit that there's margin in it for
us. And then there's the ability for us
to use that as a baseline for the
miners, where if we need elasticity and
we need more A100s, and miners are only
able to provide them at, you know, $1.50
an hour, but there's this whole
on-demand supply at a dollar an hour,
the scheduler will just automatically
start using the infrastructure that's
most appropriate, right? And there are
obviously a lot more metrics than GPU
model and price. There's geography,
there's the spec of the machine. But
that concept of using the true on-demand
market to baseline the miners, and then
having an incentive mechanism that
allows us to essentially enforce an SLA
with the miners, actually allows us to
have the miners compete and bring us
what I would call useful infrastructure
rather than just bringing us GPU hours.
So that's the concept.
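A minimal sketch of the baselining idea described above, in Python. It is an illustration only, not Basilica's scheduler: the `Offer` type, the `pick_offer` function, the source labels, and the prices are all hypothetical, and a real scheduler would also weigh geography, machine spec, and SLA history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    source: str            # "miner" or "on_demand" (hypothetical labels)
    gpu_model: str         # e.g. "A100"
    price_per_hour: float  # USD per GPU-hour


def pick_offer(offers: List[Offer], gpu_model: str) -> Optional[Offer]:
    """Return the cheapest offer for the requested GPU model.

    On-demand offers act as the baseline: a miner only wins the
    placement when its price is at or below the on-demand price.
    """
    candidates = [o for o in offers if o.gpu_model == gpu_model]
    if not candidates:
        return None
    # Sort by price; on a tie, prefer the miner (False sorts before True).
    return min(candidates, key=lambda o: (o.price_per_hour, o.source != "miner"))


offers = [
    Offer("miner", "A100", 1.50),
    Offer("on_demand", "A100", 1.00),
    Offer("miner", "H100", 2.40),
]
chosen = pick_offer(offers, "A100")
print(chosen)
```

With these made-up prices the on-demand A100 is selected, which mirrors the point that miners have to beat the on-demand baseline to win the workload.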
>> Yeah. And just to add to this,
>> does that mean that you guys actually
play the miner side of Basilica? So you
guys are running the miner on Basilica?
>> No. So
>> No.
>> Yeah, we provide the integrations to
the on-demand APIs. So, for instance,
we're doing one with Massed Compute
right now. They give us special pricing.
We make those GPUs from Massed Compute
available on Basilica. So when miners
come in and provide their instances,
they're in the same pools, right? If
miners come in and say, well, we get all
of our A100s from Massed Compute, we
don't need somebody to go to Massed
Compute and rent an on-demand GPU for
us. We want them to go and forge a
relationship with a data center we don't
have access to, or that doesn't have an
on-demand API, and get us that commodity
that's special and more valuable, right?
>> does it mean that you do the
verification through those providers in
some sense
>> We baseline the market through those
providers, right? When miners bring us
GPU infrastructure, there is a technical
verification that needs to happen
because, as you know, people will bring
you all kinds of different stuff, right?
>> Yeah, and how does that work? I mean,
this was actually what I was asking
earlier, right? And I think it's really
the hard problem, and you obviously know
it very well. How do you do that? Like,
you guys built a verification script you
run on the machines. It's not TEE, but
it's your own custom sauce, right?
>> Yeah. So Oh, sorry.
>> So I actually wrote the
>> No, no, no. So again, there's a lot
of things. As in most compute subnets,
you have a verification package that
does things. It does asymmetric checks.
So shout out CMlex. He actually taught
me this. Basically the whole game is, I
mean, it's not foolproof and we have to
keep adapting, but the whole game is
that the checks have to be quicker than
the proof of work. So we have checks for
compute, we have checks for GPU, we have
checks for bandwidth, and we have checks
for memory. So it's proof of work, and
it has to be cheap enough for the
validator to verify in a lot shorter
time than it takes the miner to produce
it. Then we have a myriad of other
checks that Evan can talk about, and
spot checks, but that's the main
verification package. We will be moving
to TEEs. But the truth is, I started
writing the code like two months ago and
I stopped. One, TEE code is boring, and
two, there's still Heartbleed. So almost
every major TEE package now has a major
vulnerability. I think Chutes have
managed to solve it. I think Jon was
telling me about it, but we didn't have
a solution for it. So it was about,
okay, we spend this much time building
this thing for it to still not be
bulletproof. We haven't done that yet.
But Evan, you can talk about the rest of
the checks that are not in what we call
Veritas.
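The "cheap to verify, expensive to produce" property can be illustrated with a classic hash-based proof of work. This is a generic sketch in Python, not the Veritas package: the actual compute, GPU, bandwidth, and memory checks are not described in the conversation, and the `solve` and `verify` names and the challenge string are made up for illustration.

```python
import hashlib


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Miner side: search for a nonce (expensive, many hashes)."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Validator side: a single hash confirms the work."""
    target = 1 << (256 - difficulty_bits)
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < target


challenge = b"validator-issued-random-challenge"
nonce = solve(challenge, difficulty_bits=16)  # about 65,000 hashes on average
print(verify(challenge, nonce, difficulty_bits=16))
```

The miner does tens of thousands of hashes on average while the validator does one, which is the asymmetry the checks rely on.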
>> Yeah, I mean, right, just to
structure my thoughts here. So you
mentioned the GPU checks. We check
whether the GPU can produce the work
that it is advertising, CPU and memory
as well. We check, as you said, the
network, and we check the network at
many different levels, not just the
bandwidth. When the miner's compute
becomes part of the platform, not as a
rental, we have even more checks,
whereby we are able to onboard this
miner node onto our custom VPN. So we
have pretty much a global network of VPN
nodes that are coming in. We check their
availability, we check their storage,
pretty much every piece of hardware that
we are about to utilize. Of course, we
check if it's Docker compatible, if it
has the runtimes that we need, and if it
doesn't, we try to install them. So any
kind of hardware or software resource
that we are able to utilize, plus
security scans as well on the OS itself.
Everything that we're about to utilize,
we scan, and the scan takes about five
minutes, maybe a bit more for the much
larger computes, but that's about it.
So, yeah.
>> Wow, guys. So it is a liquid market.
I mean, that's impressive, guys. Hats
off. To get that working is no small
feat. A bunch of people have been trying
to do it, and there's a variety of
different ways. Lium has their own way,
TVM on Targon, but to also just get it
done is incredible. So hats off to them.
Um,
I'm conscious of time. Um, I really want
to speak with with an about about pulse
because this is this is what you were
talking about in terms of something that
flew under the radar. And it was used by
Cursor. It was mentioned by Cursor in
their paper. Tell us.
>> Of course, of course. So, hey guys.
I mean, the whole purpose of Pulse is
post-training for now, right? As Sam
mentioned before, what happened in Grok
3 was that they spent the same amount of
compute on RL post-training that they
spent on pre-training. So I really
believe that post-training is going to
be a major part of the future of model
training. It already is. All the agentic
stuff that you see is mostly RL
post-training with proper, I mean, data
that they use for pre-training. But when
you come here, the hardest part of
post-training is that you have so many
moving modules, right? It's not just the
pre-training. Some of the things are way
easier compared to pre-training. For
example, a major part of the compute in
RL post-training goes through the
inference part, right? If I give you a
quick overview of how RL works: you have
a trainer module and you have an
inference module. The trainer trains the
model, and inference just generates
rollouts for the model. Those rollouts
are like, you know, when you're talking
with ChatGPT and it writes its thinking
or something like that, those are the
rollouts that are happening, right? Or
when you're talking with Claude Code, it
executes a lot of code, writes tools,
does search, all that stuff. So
inference is taking care of that. It's
interacting with an environment, and it
generates all those rollouts and gives
them to the trainer. The trainer
improves the model using those rollouts
and sends a new, updated model to the
inference module. And this loop keeps
happening until the model gets really
good, right? But all of this has its own
complexity. One of the major
complexities is that they need to
communicate with each other, right? And
the weights need to be sent over the
network, right?
>> Right.
>> And weights are really big. I mean,
if you look at a one trillion parameter
model, that's almost two terabytes of
weights you need to send over the
network, right, just for the inference
modules to be able to do inference on
that, to do agentic stuff on that and
everything. So just compute how much
time two terabytes of data is going to
take to be sent from one part of the
globe to another. It's crazy. It's like
100 megabit per second kind of like
>> Thousands and thousands and thousands
of iterations like this, where you're
getting the miners to run rollouts. You
produce this large data.
>> Exactly.
>> Do a training step on a large batch
of rollouts, and then you pass the model
state back to the miners, and that could
be, you know, terabytes in size. So it's
very large. And so Pulse is basically
your guys' method for compressing that
communication between the trainer and
the inference miners, right?
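The back-of-the-envelope arithmetic behind that point can be worked through in Python. The assumptions here are mine, not the speakers': 2 bytes per parameter (which matches the "one trillion parameters is about two terabytes" figure), a sustained 100 megabit per second link, and no protocol overhead. The function name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(num_params: float, bytes_per_param: float,
                     link_mbit_per_s: float) -> float:
    """Time to push a full copy of the weights over one link."""
    bits = num_params * bytes_per_param * 8
    return bits / (link_mbit_per_s * 1e6)


# One trillion parameters at 2 bytes each over a 100 Mbit/s link.
full = transfer_seconds(1e12, 2, 100)
print(f"{full / 3600:.1f} hours per full weight sync")
```

Under those assumptions a single full sync takes about 44 hours, which is why sending complete weights every training step is not workable over ordinary internet links.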
>> Yes, yes, exactly. And the crazy
thing about it, like I'm saying, is that
this part is way easier than
pre-training, because what we realized
is that in RL post-training most of the
weight updates, at least in the short
term, are sparse. In the long term
they're sparse, but not as sparse. So
what happens is that only 1% of the
weights are getting updated. We don't
need to send all of them. But this
wasn't a known phenomenon in the
industry. If you look at the INTELLECT-2
paper, which came out from Prime
Intellect, they spent 14 minutes, which
is a lot of time, sending 30 billion
parameter model weights over the
network. That's a huge overhead. I mean,
when you're competing with OpenAI, who
have the whole data center, you're going
to be 10 to 20 times slower than them in
training, or even more. Right. Right.
So, that's the largest bottleneck right
now. It's like when you look at how you
guys are doing training in this regard,
>> the biggest bottleneck for training,
the thing that's slowing us down the
most, is this bandwidth. It's getting
the model updates across the wire, back
and forth. It's not even necessarily the
rollouts; the rollouts scale
horizontally. So if we can solve that
problem, that means we can take way more
compute to it.
>> And for rollouts, each of the
rollouts is a single entity, right? You
can stream them over the network as
well. They don't all need to arrive at
the same time. Sometimes rollouts can be
really big as well, because if you're
generating, you know, 100,000 tokens or
something like that, and you have
thousands of those sent over the
network, that could become pretty big.
But you can stream it. It's okay if it
is off-policy. You can do a lot of
things. But you can't do the same with
weights, because it can really
destabilize the training.
So one thing that I want to mention is
the reason that the weight sync (we call
it weight sync, weight synchronization)
is the major bottleneck in RL. It goes
back to another assumption about RL,
that most of the compute is done in the
inference, not in the trainer. If you
look at the papers, sometimes 90% of the
compute is for inference and only 10% is
for the trainer. It depends on how you
do it, but especially if you go to
agentic stuff, where environments become
a bottleneck as well, that can even go
to 95%, because you need to run your
environment. I could talk about that for
the whole meeting. So that's why the
weight sync becomes a bottleneck. In the
current implementation that we have, we
kind of got rid of that. We got a
bandwidth reduction of 100 times, and I
was telling you about
>> And so how do you do that? Is that by
looking at the update and trying to
figure out which parameters of the model
are most changed during the gradient
update step, and then basically only
sending that sparse compression? Is that
correct?
>> Yeah, exactly, exactly. And it's
lossless. It's just that only a certain
number of parameters change, not all of
them. Just, like, naturally.
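A toy version of a lossless sparse update can be sketched in Python as follows. This is not Pulse's actual format, which is not described here: the `encode_delta` and `apply_delta` names are hypothetical, plain lists stand in for weight tensors, and a real system would also compress the indices and compare values at the bit level.

```python
from typing import Dict, List


def encode_delta(old: List[float], new: List[float]) -> Dict[int, float]:
    """Keep only positions whose value changed, storing the new value."""
    if len(old) != len(new):
        raise ValueError("weight vectors must have the same length")
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if o != n}


def apply_delta(old: List[float], delta: Dict[int, float]) -> List[float]:
    """Rebuild the new weights from the old weights plus the sparse delta."""
    patched = list(old)
    for i, value in delta.items():
        patched[i] = value
    return patched


old = [0.0] * 1000
new = list(old)
for i in (3, 250, 999):      # pretend only three weights moved this step
    new[i] = 0.5

delta = encode_delta(old, new)
restored = apply_delta(old, delta)
print(len(delta), restored == new)
```

Because the delta stores the exact new values rather than an approximation, the receiver reconstructs the weights exactly, which is what "lossless" means in this context.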
>> And that's a quality of RL training.
When we're doing RL training, unlike
pre-training and fine-tuning, I suppose,
it's very much more like a LoRA in the
sense that there's a very low-rank
representation of that update.
Especially because of the KL
minimization, etc., in the RL update
step, you only need to move a couple of
parameters, and that's where the really
fine-tuned changing comes from. So this
is incredible, by the way, and it was
being used by Cursor, which is
fantastic. It's really good news that
this is going out to people, and they're
obviously getting closer to training
like this over the internet, because it
allows us to scale as far as we can.
Question about where we are with Grail:
what experiments have you run on the
network here? What have you seen in
terms of the amount of inference we can
get? Now that this is broken down and
we've put this into the wild, how big
can we go in terms of the machine
learning model that we can pre-train?
Can we get to 30 billion parameters? Can
you start testing these on Affine and
training on RL environments using your
network? Because that's a big moment,
right, for Grail. You can get incentive
directly from 120.
>> I mean, to be honest, I think based
on just this research that we've done,
we can even go to a one trillion
parameter model, right? But the hardest
part then becomes, like, but the problem
>> Wait, I'm not taking that sell
pressure. Let's take this off.
[laughter] I'm not taking that.
72 I mean theoretically we can, but
let's, you know, let us pump our bags
before we do one trillion.
>> But 30 billion parameters is going to
be pretty easy, super easy. It doesn't
need that much effort, actually. If I
just get a few B200s, I can start
training that. The hardest part right
now is context length, right? Context
length is going to get really big as
well, and I want to make sure that the
compute can be heterogeneous. So if
they're using A100s it works, if they're
using B200s it works, and if they're
using an M4, like a MacBook chip, it
works as well, right?
>> Is it the case now that Grail's
verification, which is a form of
inference verification similar to
TOPLOC, is still like that? The original
design for Grail was something along
those lines. Is it that today you guys
are doing, you know, a sort of hashing
of the hidden states, etc.? Are you
doing that or not? And if you are, why
can't you use heterogeneous compute?
>> That's a really good question. So the
hashing is good. We actually came up
with a new verification algorithm. I
described it in a blog as well. People
can go and look at the code; it's super
efficient in a lot of respects. But the
problem is not verification of the
rollouts per se. It's verification of
the log probabilities that they are
sending us. We need to get really
technical here. When you do RL training,
you send the rollouts, and you send the
probability of every token that has been
chosen over the network as well, right?
That shows what probability the previous
network, the one used for this inference
generation, gave to each of these
tokens. And this probability really
helps with training the model in the
trainer. It's used in something called
the importance sampling ratio, and it's
really important. You can use it in
other ways as well. You can use the
additional information from the logit
dimension per token, which is a lot
more, right? Because potentially it's
the top-k of each token. You can use
that to actually train the model even
better.
>> Yes, yes, that stabilizes training.
You can get rid of that, but it can get
a little bit messy. So just for sending
that, if they're using really different
compute, for example an M4, that
probability can change drastically,
because, if we go really deep, a lot of
stuff happens differently on different
hardware. That can result in different
log probabilities and sometimes even
different inference output, and that can
destabilize training. So I can do it,
but I haven't taken the risk.
Engineering-wise I can do it, but
algorithm-wise the hardest part is that
I need to find the error bound on at
least the log probabilities. Just
consider that we have a malicious agent,
right, one who wants to screw us up.
They send us good rollouts with bad log
probabilities, right? And if we give
them too much freedom to do that, that
can screw us up really badly. So if we
are running it on, for example, M4 or
M5, I need to do a really big benchmark
on our proof and everything, to see how
much room for error we give them on the
probabilities. I can come up with some
theoretical stuff, but I need to run
this larger-scale experiment to make
sure this holds up, because you can't
get the values theoretically for this
one. It needs to be properly
benchmarked, and it really depends on
the problem. It depends on a lot of
parameters. So I need to do a really
large benchmark to set that up. But it's
definitely one of the achievable things,
and I'm actually working on building
those kinds of benchmarks. So it's more
benchmarking than, I'd say, engineering
innovation here. But it's a really big
benchmark that I need to run.
>> Right now, what have you trained on
the network so far? What do those
results look like? Are they significant
in and of themselves?
>> I mean, so far we trained on math, we
trained on code, and right now we are
training on GPU kernels, which is code
that runs on GPUs, not CPUs. It becomes
way more complex.
When we ran on math and code, we
realized our system is working properly.
We got like 40% to 60% improvement when
we were actually doing it. There's a
blog about it. We ran a lot of other
experiments as well. So they proved that
the system works at that scale. That was
a few months ago. Now we are trying to
increase the scale of the system so it
can handle agentic training, agentic
training for GPU kernels right now at
least, but you can use it for anything
after the infrastructure is done. And
for that we got some really good results
already. I mean, it's working. Last time
I ran it, at least for kernel
correction, we got like 60% improvement,
from zero to 60% at least, with a Qwen3
model, like an 8 billion parameter
model. But I would still call these not
crazy big results, because what we are
building toward is not just beating the
benchmark in these kinds of instances.
We are building toward useful models,
useful open source models for these
specialized tasks, or even general
tasks, for post-training. What I'm
really focused on is building the infra
and making sure it's bulletproof, which
so far it has proven to be, because
every training run that we run never
destabilizes. It runs properly. And
there's actually so much more innovation
to do that I'm really excited about. So
it shows that the system works, but it's
all about the scale. We want to scale it
so it can handle the whole agentic
infrastructure, so it can handle 100,000
tokens as a context length or even more,
so it can handle a 30 billion parameter
model, and handle it really well, with a
stable trainer that doesn't get
destabilized and diverge and go bad.
>> So Sam, when will the Covenant
subnets be at 0% miner burn?
>> Um,
it depends. No, no, no, that's a good
question. I actually want to talk about
this. Okay, little rant, and I'm
conscious of time. As Bittensor, we've
had to grow up. This is my personal
opinion. We started off very
technocratic, very autistic, miner is
God. Then dTAO introduced a creature
none of us knew how to deal with. It was
called the investor. And instead of
miner is God, or subnet owner is God, or
validator is God, we just learned that
we're in this very delicate balance
between the technical people that
produce the resources and the people
that produce the capital. So where does
this take me? Again, somebody I respect
a lot in the industry is Jon Durbin, and
he put up a post that a lot of subnets
burn. But I've been on two sides of it.
First of all, you drink the Kool-Aid and
you're like, no, let it go, pay miners
all the incentives. But one, it's not
always the right thing. As a subnet
owner your job is providing token holder
value. Whether you like it or not, that
is your job. But miners are not a
cost-reduction exercise; they're a
resource-optimization exercise. So
ideally you should be finding the sweet
spot in between. Okay, if your subnet is
good, then if your mission is
successful, it'll be worth it, but you
need to lean into the miners to
understand how to make your mission
successful. I don't believe it's paying
them excessively, because again, you
know, things are different, and if you
have to always assume 100% sell pressure
on everything you pay out. So it's
really, really