0:02 Hey everybody, Timothy Carambat, creator
0:04 and founder of AnythingLLM. And today I
0:07 was going to do a different style video.
0:09 Now, usually the videos I do are all
0:12 about AnythingLLM or just like AI tech
0:15 in general. And you know, I'll like run
0:17 some models, do some tests, highlight a
0:18 cool feature that we built, something
0:20 like that. Today, I actually want to do
0:22 a little bit of a different review and
0:24 just see how that goes, honestly, cuz
0:26 why not change it up? So, for my first
0:29 video of this kind of format, uh, we're
0:31 going to start off with a bang. I
0:35 recently got access to Nvidia DGX Spark.
0:38 Um, it's right here.
0:41 So, I've had access to this for about a
0:43 week and a half now, and I've actually
0:45 been using it as my daily driver. So,
0:47 you know, my day-to-day job, right, is
0:50 making AnythingLLM better for you.
0:51 Doing that, I usually do it on a
0:52 MacBook, but I have a whole bunch of
0:54 other computers, so I can test it on
0:57 everything. And uh now I have a DGX
1:00 Spark which is actually running Ubuntu
1:02 or DGXOS which is a version of Ubuntu
1:05 24. Uh so it's very familiar. It feels a
1:08 lot like Mac if you've used Ubuntu
1:10 before; someone's going to get mad
1:12 about that, I'm sure. But I want to jump
1:15 into a review of this. No BS. We're just
1:17 going to get to it. So as I said for the
1:18 last week and a half or so I've been
1:21 actually using this as my daily driver.
1:24 Personally I'm impressed. It's a lot of
1:26 fun and it's just really cool to use
1:27 because even for someone who has, like,
1:32 an Nvidia GeForce RTX 5090, uh, like, this
1:34 is still cool. It's supplemental to
1:35 that. It's not a replacement. I'm going
1:37 to get into that more in the video, but
1:39 let's talk about kind of unboxing right
1:41 now. So, when you get this, it comes
1:44 in a pretty hefty box, and you just slide
1:47 it on up. All the chargers and stuff are
1:49 on the bottom, but what you get is
1:51 something that looks like this. So
1:52 immediately you're going to want to pick
1:53 this thing up. And you're going to
1:56 notice that while it feels very sturdy,
1:59 it is actually pretty light. Uh it's 1.2
2:01 kg or if you're in freedom units, that's
2:04 about a shave over 2.5 lb. And
2:09 dimensionally, it is 150 x 150 x 50 mm,
2:12 which is about 6 x 6 x 2 inches, again in
2:13 freedom units. The first thing that I
2:16 noticed about this was the color. Uh, it
2:19 might be hard to pick up on camera and
2:20 I'm not sure if that's even focusing,
2:24 but it is this nice gold color, and
2:26 there's two immediate things that come
2:29 to mind from my childhood. One was the
2:32 gold Game Boy Color and the other was
2:35 for I think it was the Nintendo 64 Zelda
2:37 Ocarina of Time cartridge and it was
2:39 about the same kind of color. There's no
2:40 like sparkle in it or anything like
2:42 that, but it just it's just such a cool
2:44 color. So, the first time that I think
2:47 this got mentioned was actually at CES
2:49 of this year where they talked about
2:51 something called Project DIGITS. Uh, this
2:53 is that they renamed it and it's now
2:55 called the DGX Spark. And you're
2:56 obviously wondering because it is
2:59 Nvidia, what is in this thing? This is
3:01 not a hardware review channel, so I'm
3:02 just going to kind of give you the
3:05 high-level stuff. In here is the Nvidia
3:08 GB10 Grace Blackwell Superchip. This is
3:10 a unified memory kind of system here, and
3:15 there is 128 GB of LPDDR5x memory in
3:19 here. This particular model has 4 TB of
3:22 storage which is plenty. And of course
3:26 you have a 20-core ARM-based CPU, which is
3:28 really great because of power draw
3:30 concerns. This thing uh sips power. I
3:31 wish I had more metrics on that. I
3:33 didn't have an amp meter, but uh I mean
3:35 it is ARM-based and I do know it is
3:37 drawing less power from the stats that I
3:42 can collect. Of that 128 GB of unified
3:45 memory, I believe 96 GB of it can be
3:46 allocated specifically just to the
3:48 VRAM, although I don't know if that can
3:50 be unlocked. I'm sure someone will find
3:52 a way. And when it comes to memory
3:56 bandwidth, it's 273 GB per second. This
3:58 actually allows you to run models up to
4:00 200B depending on the quantization
4:02 obviously. And then what you really get
4:06 is about one petaflop of FP4 AI
4:08 performance. If you aren't an AI model
4:10 nerd, you don't care what that means.
4:12 But if you're an AI model nerd, you
4:14 probably care about what this is. And as
4:17 I mentioned before, it comes with Ubuntu
4:20 24 LTS with this specialized kind of build,
4:22 just a lot of preloaded software, the
4:25 stuff that you need to build and run AI
4:27 tools or run fine-tuning jobs. A lot of
4:29 the default stuff's already in here. So,
4:31 you've got the NVIDIA Container
4:34 Toolkit, you've got nvidia-smi, like
4:36 basically any tool that you would need.
4:38 Uh, all of that is just pre-installed,
4:40 which is so nice because you don't have
4:41 to install it at all. So, what are we
4:44 going to showcase today with this
4:46 computer that was built from the ground
4:48 up to build and run AI tools like
4:51 AnythingLLM, but also for fine-tuning
4:53 and all these other things? Uh, today
4:55 specifically is first I'm going to show
4:58 you around the OS. It's very familiar if
5:00 you've used Ubuntu. Um, we're going to
5:02 actually use AnythingLLM and some other
5:04 tools that can run natively on this
5:06 hardware to be able to just show some
5:08 models running and benchmark them and
5:10 just get an experience uh and also get
5:13 an idea of obviously tokens per second.
5:15 And then also I would like to show a
5:17 pretty realistic fine-tuning example as
5:20 well where we're going to probably use
5:23 a midsize model like Gemma 3 4B to make a
5:26 fine tune for some specific use case.
5:28 There are two ways to run this. When you
5:31 get the manual for your DGX Spark, uh
5:32 it's going to actually give you two
5:34 configs in here. So, you can use it
5:36 basically as a desktop. Uh you plug in
5:39 an HDMI cable and whatever your other
5:40 peripherals are, and you just use it
5:43 like a computer. Uh it has a whole setup
5:44 process if you've used Ubuntu. It's
5:46 pretty much exactly like that. But then
5:48 there's another mode which I think is
5:49 also interesting where you can actually
5:52 use it as a networked device. So you could
5:54 have this centralized in your office or
5:57 in your house and use it as a dedicated
6:00 compute machine for AI workloads, which
6:01 is the next thing that I would like to
6:04 get to is specifically AI workloads.
6:07 There have been a lot of criticisms or I
6:09 don't even want to say a lot and I also
6:10 don't even want to call it criticisms.
6:12 Uh it's just people I guess talking
6:15 about this on Reddit saying all of these
6:17 things like it's supposed to replace a
6:20 Mac Mini. This is not that. Uh this is
6:23 an additional compute resource that
6:25 you can just use to free up whatever
6:27 you're using already. So like I have a
6:29 GPU on my computer. I can continue to
6:31 use that and then offload work to this
6:34 dedicated device for that. People have
6:36 home labs with Mac minis strung
6:39 together. Uh this is not a Mac mini
6:41 personal computer replacement, but it is
6:44 for the home lab use case where people
6:45 have been chaining them together. In
6:48 fact, actually, you can stack two of
6:50 these on top of each other, and there's
6:52 a big connection port in the back that
6:54 you can chain them together. And so, you
6:56 can actually get double the output,
6:58 which is really cool. You can run really
7:01 large models at actually a good quant.
7:04 And of course, because this is the DGX
7:06 kind of OS, if you do, for whatever
7:11 reason, have access to a $350,000
7:14 H100 server, the code you write on this
7:15 for your apps or whatever jobs you're
7:17 running, you can actually just use on a
7:18 server. As you can tell from the
7:20 background of my video, I do not have
7:22 one of those servers. If you're building
7:25 a home lab dedicated for AI workloads,
7:27 which I see all the time on
7:30 r/LocalLLaMA, this is a reasonable device. Now,
7:32 it depends on what your price range is,
7:34 but I've seen some really expensive Home
7:36 Lab setups, and I think that this is
7:38 actually in a reasonable price range.
7:40 And on that note, I do want to say that
7:42 the one I have, which is very clearly
7:46 labeled here, is early access. So, the
7:48 stuff that I'm getting, the results that
7:49 I'm getting, uh, could be better, could
7:51 be worse. They might just be different.
7:53 Um, but just something to highlight
7:54 there. And I just want to take a quick
7:57 little sidebar. Uh, for those of you who
7:58 don't know, my background is actually
8:00 mechanical engineering. Before I got
8:02 into the whole founder software thing, I
8:04 was a mechanical engineer. And this
8:07 thing has just a couple interesting
8:09 design highlights. And I'm going to
8:11 actually pull in the zoom here. Uh so we
8:14 can go over these kind of details. So
8:16 looking at the front of the device, uh
8:18 you can notice that, you know, there's a
8:19 little bit of these polished kind of
8:22 areas right here that also expose some
8:24 vents. But one thing you may have
8:26 noticed is this very interesting
8:28 material choice on the front of the
8:30 device. This looks like some kind of
8:33 like open cell metal foam, probably
8:35 aluminum, but this is actually an air
8:37 intake, and it's just a really
8:40 interesting metal choice and just design
8:42 decision in general. I personally really
8:45 like it. It's also not very rough to the
8:46 touch. Like it doesn't have any kind of
8:48 like burrs or snags. So, you can like
8:50 handle this pretty reasonably. Uh, and I
8:52 imagine stacking two of them would look
8:55 really cool. The bottom of the device is
8:56 really nothing that you wouldn't expect.
8:58 So, of course, you've got your kind of
8:59 grip here to keep it from sliding around
9:02 on surfaces as well as an additional air
9:05 intake. And on the back, we get that
9:08 same open cell metal foam finish again.
9:10 Um, but this is also exhaust as well.
9:12 You can feel kind of air coming out of
9:14 there. You have your power button, which
9:16 my first complaint about this device is
9:19 it has no on light. You have no idea if
9:20 this thing is running. So, what I've
9:22 been doing is putting my hand in
9:24 front of the front vent to feel for any
9:27 kind of suction. Um, or just putting my
9:29 ear up to the device and listening for
9:31 the kind of whirring sound that you can
9:34 hear. You technically have four USB-C
9:36 ports. The first one is for your power,
9:38 and then you have three additional. I
9:41 personally have no USB-C peripherals, so
9:43 I had to buy some converters off Amazon
9:45 for about $6. And then you've got your
9:47 standard HDMI port. And then you have
9:50 your Ethernet port. Uh, this comes
9:52 with Wi-Fi and Bluetooth. These are the
9:54 specialized ports that can be used to
9:58 stack two DGXs together. Now for the
9:59 next part of this video, we're going to
10:00 get into the software side of things.
10:03 We're going to run GPT-OSS 120B.
10:05 Yes, the big one. Then of course, we're
10:06 going to jump into that simple
10:08 fine-tuning use case just to get an idea
10:11 on times for that. So when you first
10:13 boot up your DGX, you're going to be
10:15 greeted with a screen that probably
10:18 looks uh a lot like this. And you'll
10:21 notice it looks and feels like Ubuntu, because it is
10:23 Ubuntu. And so if you're familiar with
10:24 Ubuntu, you're already familiar with
10:27 this except it comes preloaded with some
10:29 additional software. Uh as well as
10:32 tools, and that's the real nice part. Um
10:33 so for example, you know, you've got
10:35 your regular stuff like your system
10:36 monitor, calendar, you've got
10:38 LibreOffice, that kind of stuff. You've also
10:40 got this DGX dashboard. You also have
10:43 the NVIDIA AI workbench which is really
10:45 nice because if you do a lot of Jupyter
10:47 notebook stuff or like uh data science
10:49 or even like training models, uh the
10:52 NVIDIA AI workbench is a great tool for
10:53 that. It just comes with your very kind
10:55 of like basic software. VLC is already
10:59 included and of course it comes with you
11:01 know some cool backgrounds. This entire
11:04 UI should feel very familiar. Uh some of
11:06 the tools that are very useful that it
11:10 comes with is, like, uh, nvcc is already
11:13 installed. Uh,
11:17 nvidia-smi already works, and you can see
11:19 the driver version that we're on. We're
11:22 on that GB10-supported CUDA version,
11:25 13. Um, and then of course, if you are
11:29 interested in your GPU
11:32 stats, nvtop is also already present.
11:34 And so there's just a lot that you can
11:36 see and do uh in here just by default
11:38 without having to set up any additional
11:40 software. I think people know that
11:42 setting up all of the CUDA libraries and
11:44 the toolkits and the stuff that you need
11:46 is always just another step to take. But
11:48 for the next part of this video, we're
11:50 actually going to play around with some
11:53 actual models. So I already have Ollama
11:55 installed. I'm on 0.12.5, and I actually
11:59 already have some models installed. I
12:03 have GPT-OSS 120B installed right now. Of
12:04 course, you can always just, you know,
12:08 do ollama run gpt-oss:120b,
12:16 and then just send a simple
12:18 message and you get some
12:22 tokens back. But sometimes you'll want a
12:24 little bit more verbosity. So you can
12:27 say hello again and maybe we can get
12:28 some stats this time. You can see we're
12:31 sitting at around the 30 tokens per
12:34 second uh rate. And chatting with a
12:36 model through a CLI is, you know, I
12:38 mean, it's fun, it's useful, sure. Uh
12:41 but what most people do is they have a
12:43 tool like AnythingLLM where you can set
12:46 up workspaces. You can have access to
12:48 agent tools. You can build your own
12:51 tools in a flow builder. You can write
12:54 your own code, use MCPs, search the web,
12:57 generate charts. You can do a lot just
13:00 in AnythingLLM. And of course, we're
13:02 going to do all of this by just hooking
13:04 up to Ollama that's already running and
13:07 using that GPT-OSS 120B.
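(Side note: if you'd rather script against that same running Ollama instance instead of typing in the CLI, a minimal sketch looks something like this. It assumes Ollama's default local API on port 11434 and that the gpt-oss:120b tag is already pulled; it is not the exact code from this video.)

```python
# Minimal sketch: chat with the local Ollama instance and print a rough tokens/sec figure.
# Assumes Ollama is listening on its default port (11434) and gpt-oss:120b is pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gpt-oss:120b",
        "prompt": "Hello! Summarize what you can do in two sentences.",
        "stream": False,
    },
    timeout=600,
).json()

print(resp["response"])
# eval_duration is reported in nanoseconds; eval_count is the number of output tokens.
print("tokens/sec:", resp["eval_count"] / (resp["eval_duration"] / 1e9))
```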
13:10 AnythingLLM comes with its own internal
13:12 Ollama if you don't have Ollama installed
13:13 on your system, but you get the same
13:16 experience no matter what. And so I
13:18 think one of the easiest things to show
13:21 is there's this website that has a bunch
13:24 of CSV files that are just good, you
13:26 know, sample CSV files to kind of show
13:29 proof of concept. CSVs are probably the
13:32 most popular format for people to use
13:35 AnythingLLM with, right alongside PDFs, for people
13:36 who want these models to help them be
13:39 productive. And so there's a fun one in
13:41 here called deniro.csv, which has all the
13:43 Rotten Tomatoes ratings of Robert De Niro.
13:46 There's 87 records in here. So, it
13:47 probably isn't all of those movies, but
13:49 it's definitely enough of them. So, what
13:52 we can do is, in AnythingLLM, we can
13:53 just simply open up the downloads
13:56 folder, drag and drop that in, and just
14:03 ask GPT-OSS 120B to analyze this
14:06 data set. And if you've used GPT-OSS
14:09 at all, uh, you'll know that it loves
14:11 tables. And so what we're going to see
14:14 here is GPT-OSS just work through all the
14:17 data here, formatting it in a way
14:19 that makes sense, asking follow-up
14:20 questions. These kinds of things are all
14:23 part of the model's capabilities.
14:33 And so that is the analysis that it gave
14:35 us. It gave us a bunch of quick
14:36 takeaways. We could ask follow-up
14:38 questions, but I really don't know what
14:40 else I would follow up with. I mean,
14:43 there's even analysis as to why some
14:45 scores may have dropped and why movies
14:47 were poorly received. I mean,
14:49 obviously, I think everybody here knows
14:51 that Taxi Driver and Goodfellas are
14:53 amazing. And we definitely don't talk
14:55 about these two movies. And also,
14:57 more importantly, we are still sitting
15:00 at that 30 tokens a second rate. So,
15:02 we're getting consistent performance
15:05 across large outputs. And of course, you
15:07 know, inside of AnythingLLM, we're able
15:09 to do a lot more here where we can
15:11 modify the system prompt. We even have
15:14 prompt variables where you can add kind
15:16 of dynamic data. There's a whole bunch
15:18 you can do in this tool. And having a
15:21 really capable model that can also run
15:24 incredibly fast to do actual productive
15:26 work is really, really nice. And you can
15:28 imagine just putting this somewhere in
15:30 your office or somewhere in your home
15:31 and having it be your centralized
15:34 inference service is really a reality
15:36 with a device like this. If you are
15:38 interested in AnythingLLM Desktop and
15:39 what I showed you, you can always
15:41 download it for free on your device
15:44 today. We also have an
15:47 open-source repo here that is available
15:51 and MIT licensed. Um, but this video is
15:54 not about AnythingLLM. It's about the
15:56 DGX Spark. So, in the next part of the
15:58 video, I'm going to talk about
16:01 fine-tuning your own custom model on
16:04 this hardware. I do want to preface this
16:06 part of the video by saying that not only is this
16:09 the nerdiest part of the video, because
16:12 it is going to involve code (we're going
16:14 to be in a Jupyter notebook), but also
16:18 this is really where the DGX Spark
16:20 shows its unique capabilities, because
16:24 this is the stuff that it was built for.
16:26 What we're going to do is I have a
16:29 Jupyter notebook already uh up, but
16:31 first I want to talk about Unsloth.
16:33 Unsloth are the people who made this
16:36 notebook. Unsloth is an open-source project.
16:38 You can find them on GitHub. They'll be
16:41 linked in the description. And they have
16:45 built a custom training framework and
16:48 even kernels for a bunch of different
16:51 types of GPUs, all about being able to
16:54 tune models in a more memory-efficient
16:57 way. Even though we are on a DGX Spark,
17:00 which has a lot of resources available,
17:02 Unsloth has even made it possible to
17:07 tune models on even lower-end hardware.
17:09 But since we have really good hardware,
17:12 we're going to actually use a midsize
17:14 model. So the model we're going to be
17:16 working with today is actually going to be
17:24 the Gemma 3 4B instruct model. This is
17:27 going to be a use case fine-tune where
17:30 we are going to do something with this
17:32 model to make it better for what we
17:35 specifically need. Just going over the
17:38 high level of what this whole document
17:41 is supposed to be doing for us: we are
17:43 going to go through this step by step,
17:46 and I will share a link to this exact
17:49 file so you can run this fine-tuning as
17:52 well. But the detail that really kind
17:56 of matters here is the data set. And so
17:59 there is a data set here of basically
18:03 support tickets, IT support tickets, where
18:04 a user is complaining about
18:07 something or a bug has occurred or there
18:10 was an issue and then there is a
18:14 recommendation to resolve this. Now, this
18:17 particular data set is
18:19 actually, I'd say, pretty generic, right?
18:21 Maybe there's specific verbiage and
18:23 processes that are outlined in this data
18:27 set. But in real life, you probably have
18:29 some kind of use case or repetitive
18:32 input output framework or standard
18:35 operating procedure that just generally
18:38 models don't capture. And the only
18:40 alternative to that is to pollute them
18:43 with a lot of context that then gives
18:46 you less ability to put more tokens to
18:49 the output. And then when you allow
18:50 fine-tuning for these models, you can
18:53 have a model inherently and just be
18:56 smarter about a specific domain. This
18:58 data set is just, you know, a very basic
19:00 data set. I believe there are 500
19:02 something examples in this. You don't
19:05 actually even need that much data. Uh
19:07 here's one with a lot more, with
19:10 100,000, but this is more of a general
19:13 QA, kind of like a ChatGPT answer, like
19:16 explain a ternary operator, uh, develop a
19:18 lesson plan. Like, these are
19:21 questions that every model's base tuning
19:23 is pretty decent with. Uh I'd say very
19:25 decent with nowadays. So training on
19:27 this data set is not really going to
19:30 move the needle for us or you know give
19:32 us a different output than what we would
19:34 expect. To get started, I already have
19:36 this Jupyter notebook running and I'm
19:38 running it obviously on this DGX. So the
19:41 first thing to do is to actually load
19:44 the model in. And at the end of this
19:46 actually, we're even going to have the
19:49 opportunity to output this file as a
19:52 GGUF. So you can then export this file,
19:55 load it into anywhere you can run a
19:57 GGUF, and you can now run this fine-tuned
19:59 model. We're not going to do that in
20:00 this video, but I am going to show you
20:03 the process and how it is legitimately
20:06 one line of code.
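(Just to show what I mean by one line, the export step at the end of a notebook like this looks roughly like the sketch below. It uses Unsloth's GGUF export helper; the output folder name and quantization choice are placeholders, and the exact options can differ between Unsloth versions.)

```python
# Rough sketch of the GGUF export step (Unsloth helper); names and options are illustrative.
model.save_pretrained_gguf("gemma-3-4b-support-ft", tokenizer, quantization_method="q4_k_m")
```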
20:08 And now that the model is loaded, we can
20:12 just keep going through these steps.
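(For reference, that loading step is only a few lines with Unsloth. This is a sketch rather than the exact cell from my notebook; the model tag, sequence length, and LoRA settings shown here are just typical defaults and may differ in your setup or Unsloth version.)

```python
# Sketch of the model-loading step with Unsloth; details may vary by version.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",  # Gemma 3 4B instruct variant
    max_seq_length=2048,
    load_in_4bit=True,                   # 4-bit loading keeps memory usage low
)

# Attach LoRA adapters so only a small set of weights gets trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```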
20:14 And then we're going to load this data
20:16 set. And then the most important part of
20:18 a data set is formatting that data
20:21 properly. So I've already gone ahead and
20:23 wrote a little bit of code that it took
20:25 to do that. Here's an example of an
20:27 input snippet, which is where the user
20:29 and the agent have a discussion. And
20:31 then what we want from our output is an
20:34 analysis. What we want is to see this
20:36 kind of output where we begin with an
20:38 analysis and then have a recommendation.
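(The formatting code itself is nothing fancy. Here's a sketch of the idea; the field names are illustrative only, since I'm not assuming the exact column names of this dataset.)

```python
# Sketch: turn each raw ticket record into a chat-style training example.
# The field names ("conversation", "analysis", "recommendation") are illustrative only.
def format_example(record):
    prompt = record["conversation"]  # the user/agent discussion
    response = (
        "### Analysis\n" + record["analysis"] + "\n\n"
        "### Recommendation\n" + record["recommendation"]
    )
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    # apply_chat_template wraps the turns in the model's expected special tokens.
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(format_example)
```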
20:40 And maybe in our internal system where
20:42 this model runs, that means something,
20:46 those uh headers mean something,
20:48 and this is just the formatted output
20:49 from the text conversation. So now the
20:51 first thing is let's get the trainer
20:54 ready. So the trainer is just basically
20:56 taking our data set, tokenizing it, and
20:59 getting us ready.
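(Roughly, that setup looks like the sketch below, using trl's SFTTrainer the way Unsloth notebooks typically do. It also includes the responses-only masking I'm about to mention; the hyperparameter values are illustrative, not the exact ones from my run.)

```python
# Sketch of the trainer setup (trl's SFTTrainer, as used in Unsloth's notebooks).
# Hyperparameters are illustrative, not the exact values from my run.
from trl import SFTTrainer, SFTConfig
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)

# Only compute loss on the assistant's responses, not on the user prompts.
# These marker strings follow Gemma's chat template; adjust them for other models.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<start_of_turn>user\n",
    response_part="<start_of_turn>model\n",
)

# trainer.train() is what will actually kick off the fine-tuning run.
```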
21:02 And then we're going to tell uh our
21:04 training process that we're only
21:05 training on the responses because that's
21:07 the part of the data we actually care
21:09 about. And then let's just make sure
21:11 that things look all right, which I've
21:12 already gone through this. These
21:15 things look okay. But the main thing is,
21:19 first, what does Gemma 3 even respond with
21:22 to a basic scenario?
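(That baseline check is just a quick generate call, something like this sketch; the scenario text and generation settings here are placeholders.)

```python
# Sketch: quick baseline generation with the untouched model before any training.
# The scenario text is a placeholder for the support ticket used in the notebook.
messages = [{"role": "user", "content": "My laptop shows a black screen after the latest update."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```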
21:24 And you'll see that it kind of gives
21:26 this generic answer of saying, "Oh,
21:28 that's good information. Let's narrow it
21:30 down. Uninstall the current version."
21:32 And it's talking about Zoom, which, by the
21:36 way, this scenario sample has nothing
21:40 to do with. Uh, so this is
21:41 just the model hallucinating
21:43 essentially. So you can see that you
21:46 know, it doesn't really
21:49 give an answer that is specific to our
21:52 output especially with no analysis
21:55 block. So let's train the model. Now,
21:58 this can take some time, but normally
22:02 you can actually use a service by Google
22:05 called Colab. And Colab allows you to
22:07 run these kind of scripts. Actually, you
22:11 could probably run this script on Nvidia
22:14 T4 GPUs. I can tell you from experience
22:16 because I've done this already on the
22:19 cloud that this particular training job,
22:21 the exact code that I'm running here,
22:24 takes about 17 minutes to run on
22:26 average. I've run it about four or five
22:29 times just to get a good idea. Um, so
22:30 I'm going to let this run and we'll see
22:32 what the stats are to train on the
22:36 Nvidia DGX Spark. Okay. And you can now
22:38 see that all 60 of our training steps
22:42 for one epoch have run. That was on 504
22:46 pretty chunky uh samples. And of course
22:48 I, you know, cut through the video a
22:49 little bit just to save time. But the
22:51 one thing you can't cut is the actual
22:54 time that it took. So as I said on an
22:57 Nvidia T4 in Colab, uh this would take
23:00 about 17 minutes. Uh but here it only
23:04 took 4.3 minutes, which is awesome. Uh
23:07 that's an incredible time save. Uh, and
23:09 also it lets us iterate much faster,
23:11 which is great because when it comes to
23:13 fine-tuning, you really don't know what
23:15 you're going to get until you do it. Um,
23:17 and so iteration speed is extremely
23:20 important for anyone who fine-tunes.
23:23 But now we should be able to see, do we
23:26 get the output we expect? So I took the
23:29 same scenario that we
23:32 had uh in the previous block where we
23:34 were experiencing a black screen and we
23:38 now get the model giving us the actual
23:40 kind of output that we want. We have an
23:43 analysis block and a recommendation. Now
23:45 this specific format is not only from
23:48 our data set but so is the content. And
23:50 I think that is the important part
23:52 because a lot of people would say, "Oh,
23:53 why don't you just have a system prompt
23:55 that just says break it out like this?"
23:57 It's not only the format;
24:00 sometimes it is the output. And most
24:01 times it's actually both. For anyone
24:04 who's fine-tuned models, that's a pretty
24:06 obvious thing. Uh, but you know,
24:08 obviously there's more than one
24:10 scenario. And so we can do this again on
24:12 a different scenario that we haven't
24:14 tested on before. Um, and you can see
24:16 that we still get that same format. We
24:18 still get that output. Now of course if
24:21 we don't have direct output that we were
24:23 trained on then of course the model is
24:26 going to try to answer the question
24:29 anyway but still within its new domain
24:32 expertise and while keeping that format.
24:34 So now we have a model that works the
24:37 way we want. We took Gemma 3 4B, which is
24:40 a great model for its base and gives an
24:44 okay answer when
24:46 we use it totally untrained, just
24:49 straight from Google. But when we apply
24:52 just this 500-sample data set, we're able
24:55 to get the answers the way we want, and
24:58 now we have our own version of Gemma 3.
25:00 Now of course you're going to want to
25:01 take that and put it in different places.
25:04 Obviously, like this specific model is
25:06 not very useful if it's stuck in this
25:08 notebook. And so to do this, you can
25:11 just do model.save_pretrained_gguf. You
25:16 can do it at F16, Q4, Q8, Q5, or a whole
25:18 bunch at the same time. If
25:20 you know that this is a good model,
25:21 but you're going to want to offer it at
25:24 different quality levels, great. You can
25:26 do that. You can save all of those to a
25:29 GGUF completely compiled and then take
25:31 that and put it in your software of
25:32 choice. And that software of choice
25:35 could very well be AnythingLLM. We have
25:38 a way to import GGUFs if you want. That
25:41 is the kind of software overview and
25:44 demo for the DGX and, specifically, this
25:48 fine-tuning use case. It is definitely
25:50 worth saying though that this is the
25:52 version that Nvidia will be selling off
25:55 of their website. And from what I
25:57 understand, there are other OEMs
25:58 out there who are going to be utilizing
26:02 this GB10 chipset but in their own OEM
26:04 form factor. But just know that this is
26:06 not the only way that this device and
26:08 its hardware may be presented to you if
26:10 you're interested in them. And I'm sure
26:12 like we see with graphics cards, there's
26:13 different form factors and packages and
26:15 all of that stuff. So hopefully this
26:18 demo of showcasing the kind of early
26:21 access Nvidia DGX Spark was useful, gave
26:23 you some insight into maybe some
26:24 practical uses. I mean, we did run
26:26 AnythingLLM just to get some benchmarks
26:28 on some models. Um, we did do some
26:30 fine-tuning just to showcase that there
26:32 are some performance improvements there.
26:35 It's just a nice dedicated device. And I
26:36 think what I'm going to do personally
26:38 with this device is I'm actually going
26:41 to set it up probably at my house first,
26:44 but then maybe move it into an office.
26:46 um just to have a dedicated centralized
26:49 but still local, kind of LAN-based,
26:50 inference service that I can just use
26:53 for whatever it is that I want to use.
26:56 Most tools allow you to just slap in an
26:57 endpoint that you can use like if you're
27:00 using something like Continue.dev or
27:02 Void editor, or even like
27:04 one of the Claude Code-style tools out there that
27:06 allow you to put in your own inference URL
27:08 to run your own coding models. I'd probably
27:10 just load this up with a coding model
27:12 honestly and just use this and save some
27:14 money on Cursor or whatever it is you
27:16 might be using. But that's it for now.
27:18 If you're interested in AnythingLLM,
27:19 our open-source project, you can star us
27:21 on GitHub. Uh if you want to use a
27:22 desktop app, it's free to use and free
27:25 to download. If all you have is a phone,
27:27 uh we actually now have a phone app,
27:30 AnythingLLM Mobile, for Android, and you
27:31 can download it and run small language
27:33 models and still get utility out of
27:36 them on device. And actually, if you do
27:37 have something like this, you can
27:39 actually hook your phone up uh to use
27:41 this endpoint instead and get a really
27:44 powerful experience just on mobile. So,
27:46 whatever you want to do, I think this is
27:47 a good fit. But thanks for watching. Bye.