YouTube Transcript:
I got a desktop supercomputer? | NVIDIA DGX Spark overview

Skip watching entire videos - get the full transcript, search for keywords, and copy with one click.

Video Transcript

Hey everybody, Timothy Carbat, creator

and founder of Anything LLM. And today I

was going to do a different style video.

Now, usually the videos I do are all

about anything LLM or just like AI tech

in general. And you know, I'll like run

some models, do some tests, highlight a

cool feature that we built, something

like that. Today, I actually want to do

a little bit of a different review and

just see how that goes, honestly, cuz

why not change it up? So, for my first

video of this kind of format, uh, we're

going to start off with a bang. I

recently got access to Nvidia DGX Spark.

Um, it's right here.

So, I've had access to this for about a

week and a half now, and I've actually

been using it as my daily driver. So,

you know, my day-to-day job, right, is

making anything LLM better for you.

Doing that, I usually do it on a

MacBook, but I have a whole bunch of

other computers, so I can test it on

everything. And uh now I have a DGX

Spark which is actually running Ubuntu

or DGXOS which is a version of Ubuntu

24. Uh so it's very familiar. It feels a

lot like Mac if you've used Ubuntu

before that someone's going to get mad

about that I'm sure. But I want to jump

into a review of this. No BS. We're just

going to get to it. So as I said for the

last week and a half or so I've been

actually using this as my daily driver.

Personally I'm impressed. It's a lot of

fun and it's just really cool to use

because even with someone who has like

an Nvidia GeForce RTX 5090 uh like this

is still cool. It's supplemental to

that. It's not a replacement. I'm going

to get into that more in the video, but

let's talk about kind of unboxing right

now. So, when you get this thing comes

in a pretty hefty box and you just slide

it on up. All the chargers and stuff are

on the bottom, but what you get

something that looks like this. So

immediately you're going to want to pick

this thing up. And you're going to

notice that while it feels very sturdy,

it is actually pretty light. Uh it's 1.2

kg or if you're in freedom units, that's

about a shave over 2 1/2 lb. And then by

dimensionally, it is 150x 150x 50 mm,

which is about 6x6 by two. Again, in

freedom units, the first thing that I

noticed about this was the color. Uh, it

might be hard to pick up on camera and

I'm not sure if that's even focusing,

but it is this nice gold color that

there's two immediate things that come

to mind from my childhood. One was the

gold Game Boy Color and the other was

for I think it was the Nintendo 64 Zelda

Ocarina of Time cartridge and it was

about the same kind of color. There's no

like sparkle in it or anything like

that, but it just it's just such a cool

color. So, the first time that I think

this got mentioned was actually at CES

of this year where they talked about

something called Project Digit. Uh, this

is that they renamed it and it's now

called the DGX Spark. And you're

obviously wondering because it is

Nvidia, what is in this thing? This is

not a hardware review channel, so I'm

just going to kind of give you the

highlevel stuff. In here is the Nvidia

GB10 Grace Blackwell Super Chip. This is

a unified memory kind of system here and

there is 128 GB of LP DDR5X memory in

here. This particular model has 4 TB of

storage which is plenty. And of course

you have a 20 core ARMbased CPU which is

really great because of power draw

concerns. This thing uh sips power. I

wish I had more metrics on that. I

didn't have an ampmeter, but uh I mean

it is ARM based and I do know it is

drawing less power from the stats that I

can collect. Of that 128 GB of unified

memory, I believe 96 of it can be

allocated specifically just to the

VRAMm, although I don't know if that can

be unlocked. I'm sure someone will find

a way. And when it comes to memory

bandwidth, it's 273 GB per second. This

actually allows you to run models up to

200B depending on the quantization

obviously. And then what you really get

is about one pentaflop of FP4 AI

performance. If you aren't an AI model

nerd, you don't care what that means.

But if you're an AI model nerd, you

probably care about what this is. And as

I mentioned before, it comes with Ubuntu

24 LTS with this specialized kind of

just a lot of preloaded software, the

stuff that you need to build and run AI

tools or run fine-tuning jobs. A lot of

the default stuff's already in here. So,

you've got the Nvidia Nvidia container

toolkit, you've got Nvidia SMI, like

basically any tool that you would need.

Uh, all of that is just pre-installed,

which is so nice because you don't have

to install it at all. So, what are we

going to showcase today with this

computer that was built from the ground

up to build and run AI tools like

anything LLM, but also for fine-tuning

and all these other things. Uh, today

specifically is first I'm going to show

you around the OS. It's very familiar if

you've used Ubuntu. Um, we're going to

actually use anything LLM and some other

tools that can run natively on this

hardware to be able to just show some

models running and benchmark them and

just get an experience uh and also get

an idea of obviously tokens per second.

And then also I would like to show a

pretty realistic fine-tuning example as

well where we're going to probably use

midsize model like GMA 3 4B to make a

fine tune for some specific use case.

There are two ways to run this. When you

get the manual for your DGX Spark, uh

it's going to actually give you two

configs in here. So, you can use it

basically as a desktop. Uh you plug in

an HDMI cable and whatever other your

peripherals are, and you just use it

like a computer. Uh it has a whole setup

process if you've used Ubuntu. It's

pretty much exactly like that. But then

there's an another mode which I think is

also interesting where you can actually

use it as aworked device. So you could

have this centralized in your office or

in your house and use it as a dedicated

compute machine for AI workloads, which

is the next thing that I would like to

get to is specifically AI workloads.

There have been a lot of criticisms or I

don't even want to say a lot and I also

don't even want to call it criticisms.

Uh it's just people I guess talking

about this on Reddit saying all of these

things like it's supposed to replace a

Mac Mini. This is not that. Uh this is

an additional compute re resource that

you can just use to free up whatever

you're using already. So like I have a

GPU on my computer. I can continue to

use that and then offload work to this

dedicated device for that. People have

home labs with Mac minis strung

together. Uh this is not a Mac mini

personal computer replacement, but it is

for the home lab use case where people

have been chaining them together. In

fact, actually, you can stack two of

these on top of each other, and there's

a big connection port in the back that

you can chain them together. And so, you

can actually get double the output,

which is really cool. You can run really

large models at actually a good quant.

And of course, because this is the DGX

kind of OS, if you do, for whatever

reason, have access to an a $350,000

H100 server, the code you write on this

for your apps or whatever jobs you're

running, you can actually just use on a

server. As you can tell from the

background of my video, I do not have

one of those servers. If you're building

a home lab dedicated for AI workloads,

which I see all the time on r/local

llama, this is a reasonable device. Now,

it depends on what your price range is,

but I've seen some really expensive Home

Lab setups, and I think that this is

actually in a reasonable price range.

And on that note, I do want to say that

the one I have, which is very clearly

labeled here, is early access. So, the

stuff that I'm getting, the results that

I'm getting, uh, could be better, could

be worse. They might just be different.

Um, but just something to highlight

there. And I just want to take a quick

little sidebar. Uh, for those of you who

don't know, my background is actually

mechanical engineering. Before I got

into the whole founder software thing, I

was a mechanical engineer. And this

thing has just a couple interesting

design highlights. And I'm going to

actually pull in the zoom here. Uh so we

can go over these kind of details. So

looking at the front of the device, uh

you can notice that, you know, there's a

little bit of these polished kind of

areas right here that also expose some

vents. But one thing you may have

noticed is this very interesting

material choice on the front of the

device. This looks like some kind of

like open cell metal foam, probably

aluminum, but this is actually an air

intake and and it's just a really

interesting metal choice and just design

decision in general. I personally really

like it. It's also not very rough to the

touch. Like it doesn't have any kind of

like burrs or snags. So, you can like

handle this pretty reasonably. Uh, and I

imagine stacking two of them would look

really cool. The bottom of the device is

really nothing that you wouldn't expect.

So, of course, you've got your kind of

grip here to keep it from sliding around

on surfaces as well as an additional air

intake. And on the back, we get that

same open cell metal foam finish again.

Um, but this is also exhaust as well.

You can feel kind of air coming out of

there. You have your power button, which

my first complaint about this device is

it has no on light. You have no idea if

this thing is running. So, what I've

been doing is putting my hand in the

front of the front vent to feel for any

kind of suction. Um, or just putting my

ear up to the device and listening for

the kind of worring sound that you can

hear. You have technically three USBC

ports. The first one is for your power

and then you have three additional. I

personally have no USBC peripherals, so

I had to buy some converters off Amazon

for about $6. And then you've got your

standard HDMI port. And then you have

your Ethernet port. Uh it this comes

with Wi-Fi and Bluetooth. These are the

specialized ports that can be used to

stack two DGXs together. Now for the

next part of this video, we're going to

get into the software side of things.

We're going to run GPT OSS120B.

Yes, the big one. Then of course, we're

going to jump into that simple

fine-tuning use case just to get an idea

on times for that. So when you first

boot up your DTX, you're going to be

greeted with a screen that probably

looks uh a lot like this. And you'll

notice it looks and feels because it is

Ubuntu. And so if you're familiar with

Ubuntu, you're already familiar with

this except it comes preloaded with some

additional software. Uh as well as

tools, and that's the real nice part. Um

so for example, you know, you've got

your regular stuff like your system

monitor, calendar, you've got Libra

Office, that kind of stuff. You've also

got this DGX dashboard. You also have

the NVIDIA AI workbench which is really

nice because if you do a lot of Jupyter

notebook stuff or like uh data science

or even like training models, uh the

NVIDIA AI workbench is a great tool for

that. It just comes with your very kind

of like basic software. VLC is already

included and of course it comes with you

know some cool backgrounds. This entire

UI should feel very familiar. Uh some of

the tools that are very useful that it

comes with is like uh NVCC is already

installed. Uh

Nvidia SMI already works and you can see

the driver version that we're on. We're

on that GB10 supported CUDA version of

13. Um and then of course if you are

interested in your GPU

stats and VTOP is also already present.

And so there's just a lot that you can

see and do uh in here just by default

without having to set up any additional

software. I think people know that

setting up all of the CUDA libraries and

the toolkits and the stuff that you need

is always just another step to take. But

for the next part of this video, we're

actually going to play around with some

actual models. So I already have Ola

installed. I'm on 12.5 and I actually

already have some models installed. I

have GPTOSS 12B installed right now. Of

course, you can always just, you know,

doma run GPTOSS 128V

and then just send a simple you need to

send a simple message and you get some

tokens back. But sometimes you'll want a

little bit more verbosity. So you can

say hello again and maybe we can get

some stats this time. You can see we're

sitting at around the 30 tokens per

second uh rate. And chatting with a

model through a CLI is, you know, I

mean, it's fun, it's useful, sure. Uh

but what most people do is they have a

tool like anything LLM where you can set

up workspaces. You can have access to

agent tools. You can build your own

tools in a flow builder. You can write

your own code, use MCPs, search the web,

generate charts. You can do a lot just

in anything LLM. And of course, we're

going to do all of this by just hooking

up to Olama that's already running and

using that GPT OSS120B.

Anything LLM comes with its own internal

O Lama if you don't have Olama installed

on your system, but you get the same

experience no matter what. And so I

think one of the easiest things to show

is there's this website that has a bunch

of CSV files that are just good, you

know, sample CSV files to kind of show

proof of concept. CSVs are probably the

Paste YouTube URL

Enter any YouTube video link to get the full transcript

Most transcripts ready in under 5 seconds

Get Our Chrome Extension

Get transcripts instantly without leaving YouTube. Install our Chrome extension for one-click access to any video's transcript directly on the watch page.

Add to Chrome — Free

Works with YouTube, Coursera, Udemy and more educational platforms

Get Instant Transcripts: Just Edit the Domain in Your Address Bar!

YouTube

←

→

↻

https://www.youtube.com/watch?v=UF8uR6Z6KLc

YoutubeToText

←

→

↻

https://youtubetotext.net/watch?v=UF8uR6Z6KLc

YouTube TranscriptPreparing your results…

YouTube Transcript:I got a desktop supercomputer? | NVIDIA DGX Spark overview

Video Transcript

Paste YouTube URL

Transcript Extraction Form

Get Our Chrome Extension

Get Instant Transcripts: Just Edit the Domain in Your Address Bar!

YouTube Transcript:
I got a desktop supercomputer? | NVIDIA DGX Spark overview