0:02 Two years ago, I said AI is the future of
0:03 rendering, but I'm honestly surprised
0:05 how far we've come since then. AI now
0:08 lets you reimagine your 3D layouts with
0:10 simple prompts and reference images. But
0:12 it doesn't just add textures, lighting,
0:14 and depth of field. It will also
0:16 generate smoke simulations, water
0:18 splashes, and explosive debris based on
0:20 the movement in your scene. That way,
0:22 you can go from rough layout to fine
0:24 rendering in minutes. You can easily
0:25 change the style by swapping out the
0:27 reference image, or you can even use
0:29 multiple reference images for different
0:31 parts in your rendering. We also built a
0:33 custom node pack that allows you to render
0:34 scenes of any length without crashing
0:36 your PC. So, today I'm going to show you
0:38 how you can set this up using free
0:40 open-source tools that run
0:44 entirely on your own computer.
0:45 This video took a lot of time to
0:46 research, and developing these workflows
0:48 was weeks of trial and error. The fact
0:50 that we can share them for free is made
0:52 possible entirely by our amazing Patreon
0:54 supporters. If you want to support our
0:56 work, get access to advanced workflows
0:57 and our amazing Discord community, check
0:59 out the link in the description. So,
1:00 traditional rendering is the process of
1:02 turning a three-dimensional scene into a
1:04 2D representation, a 2D image, for
1:06 example, by calculating how light
1:08 bounces off the surfaces, like ray tracing.
1:10 Generative AI rendering works
1:11 differently. And that's probably why I
1:13 got a community note when I last used
1:14 that term on X. Instead of
1:16 mathematically simulating light
1:17 transport, a neural network predicts
1:19 what the image should look like based on
1:21 patterns it learned from millions and
1:23 millions of training images. We feed in
1:24 additional information like depth maps,
1:27 outlines, or pose data from our 3D
1:28 scenes, and these are called ControlNets.
1:31 When we then add a reference image
1:32 and a prompt, the model is able to
1:34 interpolate the style of the reference
1:37 image over the duration of the shot. The
1:38 challenge was to find a model that
1:41 adheres to the scene geometry precisely,
1:43 but still has the freedom to generate
1:45 new detail, and understands the reference
1:47 image well enough to
1:49 generate new scene information in
1:51 areas that were previously obstructed.
1:53 And we were very close to giving up
1:54 because every model we tested only had
1:56 some of these capabilities. But
1:58 then we found this model merge by Inner
2:00 Reflections. Inner Reflections combined
2:03 two different video models. SkyReels'
2:05 reference-to-video is designed to really
2:07 understand reference images. You can
2:09 load in references for characters,
2:11 backgrounds, and styles, and then the
2:12 model will merge them all together,
2:15 creating your final scene. The problem
2:17 is it can't understand ControlNets, so
2:20 we can't feed it our 3D geometry. Wan
2:21 VACE solves this problem. You can feed
2:23 it ControlNets and it's able to follow
2:24 them pretty precisely. The problem is
2:26 that VACE's reference feature is just
2:28 not as good. So when the camera
2:30 moves to a new area of the scene, VACE
2:32 is not able to generate new detail
2:34 that matches the style and vibrance of
2:37 the original reference. The merge model
2:39 completely fixes that because SkyReels'
2:41 references just work so much better, but
2:43 they still work in tandem with the
2:45 ControlNets. It's like the model
2:46 understands two languages. Now, before
2:48 we can render our scene, we need to
2:50 create our ControlNet passes in our 3D
2:52 software. And I recommend two: the
2:54 outline pass is good if you want to
2:56 preserve the exact composition, but give
2:58 the model freedom to generate new detail
3:00 between these lines. The depth pass is
3:01 better if you need the model to follow
3:03 your geometry more precisely. I
3:05 recommend exporting both. That way, you
3:07 can test out which one works better for
3:08 your scene. You can also merge them
3:10 together to have the best of both
3:12 worlds. You can use any toon shader with
3:14 outlines in any 3D program to create the
3:16 outline pass. In Blender, I used the
3:18 Freestyle tool in the past, but that can
3:20 be super slow for some reason. So
3:22 instead, I recommend changing the render
3:25 engine to Workbench, selecting Flat
3:29 lighting, setting Color to Single and
3:32 making it black, and then activating Outline
3:34 and making it white. But you can see that it
3:36 only draws outlines around whole objects.
3:38 So if you need more detail, I
3:40 recommend activating Freestyle. You can
3:42 find the Freestyle settings in the View
3:45 Layer tab. Scroll down here, activate
3:47 As Render Pass. And if we now render an
3:49 image, we can come over to the
3:52 compositing tab, use nodes, create a
3:54 viewer node, connect the freestyle, and
3:56 you can see the outlines. They are
3:57 looking good, but they are black. So for
4:00 this, go to View Layer, scroll down to
4:03 the settings. Let's change the Freestyle
4:05 color to white. Render the image again.
4:07 Maybe we can make them a little bit
4:10 thinner even. Let's try 2. And we can
4:14 just deactivate Use Alpha. Now create a
4:16 file output node and connect it. And
4:18 that's it for the outline pass. Next,
4:20 let's set up the depth pass. For this,
4:22 we go to View Layer, activate Z. Now we
4:24 need to render the image again. Connect
4:27 the viewer to the new depth output.
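The raw Z pass stores camera distance in scene units, and depth ControlNets expect a normalized image where near objects are bright. So the node steps that follow, normalize, invert, and add contrast, are just per-pixel math. As a rough sketch in plain Python, operating on a flat list of depth values (the contrast function is only an illustrative stand-in for the Curves node, not what Blender does internally):

```python
def normalize(depth):
    """Rescale raw Z values into the 0..1 range (like the Normalize node)."""
    lo, hi = min(depth), max(depth)
    return [(d - lo) / (hi - lo) for d in depth]

def invert(depth01):
    """Flip the values so near objects become bright, as depth ControlNets expect."""
    return [1.0 - d for d in depth01]

def add_contrast(depth01, strength=2.0):
    """Push values away from mid-gray, a crude stand-in for the Curves node."""
    return [min(1.0, max(0.0, 0.5 + (d - 0.5) * strength)) for d in depth01]

# Three sample depths in scene units (meters):
z = [2.0, 5.0, 10.0]
processed = add_contrast(invert(normalize(z)))  # near = bright, with extra separation
```

The point of the contrast step is exactly what you see on the character: values that were close together in the raw pass get spread apart, giving the model more depth separation to work with.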
4:29 Let's add a Normalize node. To use it as
4:31 a ControlNet, we need to invert it.
4:34 After that, I just added this Curves node
4:36 here to pretty much increase the
4:37 contrast. So you can see on the
4:39 character, we now have more separation
4:41 in the character's depth. Create another
4:43 output node, connect it, and save out
4:45 your image sequence. Another thing you
4:47 can do is just render out your layout,
4:48 whatever you have in your viewport, and
4:51 then use our free AI preparation
4:53 workflow to create the outlines and the
4:54 depth map. But keep in mind that these
4:56 are just approximations and will not be
4:59 as good as the rendered ones. We have
5:01 two options for guiding the style of our
5:03 rendering. We can create one reference
5:05 start frame or we can throw in multiple
5:07 references. Let's start with the first
5:09 option as it renders a bit faster. The
5:12 easiest is probably just to use ChatGPT
5:13 or Nano Banana or whatever you have access
5:16 to. Load in your start depth or outline
5:18 image and describe what you want.
5:19 Something like this. And this worked
5:21 perfectly well. But there are also some
5:23 amazing local options. So let's switch
5:25 over to ComfyUI. I think one of the best
5:27 models for that currently is Z-Image
5:29 Turbo together with the new ControlNets
5:31 for it. And I created this free workflow
5:32 for you that lets you transform your
5:35 start image. Just drag and drop it into
5:37 ComfyUI. Go to Manager, install missing
5:39 custom nodes if you have any red nodes,
5:41 and then you need to download the
5:44 models that you can find right here.
5:45 Download them, put them in the right
5:47 folders and make sure that they are
5:50 loaded right here. Then you can come
5:52 over here and just drag and drop in your
5:54 control image for the first frame of
5:56 your sequence. And I'm choosing the
5:58 outline pass. If you already combined
6:00 all these images into a video sequence,
6:02 you can also come down here and upload
6:05 that right here. This will then load
6:07 only the first frame if you activate
6:09 the option right here. Next, we need to
6:11 create a prompt right here. And I like
6:12 to start simple and see what it gives
6:15 me. Before we run this, let's come down
6:17 here. This is where you set the
6:20 ControlNet strength. And I usually recommend
6:22 going as low as possible as this will
6:24 give you the maximum quality. Like let's
6:28 try something like this. Click run. And
6:29 this looks pretty cool. You can see the
6:31 tail is now in a different position than
6:33 it is in the control net image, but that
6:35 is usually not a problem. This is just a
6:38 reference. It doesn't need to be 100%
6:40 perfect. So, this is good enough. Let's
6:42 create another image in another style.
6:45 For example, let's try this anime style.
6:48 This is looking really cute. If you like
6:49 the vibe, but want to try out more
6:51 options, you can also just change the
6:53 seed right here. Yep, really cute. And
6:55 it got the tail right this time. So,
6:57 let's render our scene. Now, I like to
6:58 first convert all these images into
7:02 H.264 video sequences just so it's
7:03 a bit easier to handle. For this, I'm
7:05 using a setup like this. You can of
7:07 course use any editing program, but you can
7:10 also do it in ComfyUI. Create a Load Images
7:12 node and then a Video Combine node from
7:14 the Video Helper Suite. Select the frame
7:17 rate. In my case, it's 24 frames. Give
7:20 it a name and change the format to this
7:22 one right here. Click run. And here's
7:24 our video. Let's do the same thing for
7:26 the outlines. Just copy this in here.
7:28 Let's now install the main workflow, the
7:30 free AI renderer. Installing this is
7:31 very easy. Just drag and drop the
7:34 workflow into the ComfyUI interface. Open
7:36 the ComfyUI Manager. Click install
7:39 missing custom nodes and install all of
7:41 them. Once it's done, restart ComfyUI.
7:42 Next, you need to download the models.
7:44 And you can find all the download links
7:46 in the notes on the left side of the
7:48 workflow right next to the corresponding
7:50 model loaders. To help you get set up,
7:52 we've also created a free guide that you
7:54 can check out on Patreon and our
7:56 website, along with a detailed
7:58 step-by-step video installation
7:59 tutorial. If you don't have a powerful
8:01 GPU, it might make sense to run these
8:03 workflows on RunPod. We've prepared
8:05 ready-to-use templates that handle the
8:07 setup for you. Plus, if you sign up with
8:08 our link, you'll get a random credit
8:11 bonus between $5 and $500 when you spend
8:14 the first $10 on the platform. Next, we
8:16 come over to the video input up here.
8:18 You can set the resolution for our
8:20 lizard. We're using a square format. So,
8:23 I'm going to select this one right here.
8:26 And we don't want to skip any frames.
8:28 Just drag and drop the outline pass
8:31 right here. And I'm also going to drag
8:34 in the depth pass right here. This node
8:36 will actually blend them together if you
8:38 want that. But for now, I want to test
8:41 it with only the outline pass as it
8:43 gives the video generation process more
8:45 freedom. So I will disable this option
8:48 right here. When we now click run, you
8:50 can see it actually just loads in the
8:52 first one. If we activate it, it would
8:54 look something like this. Up here, you
8:55 can also change the blend value. Next,
8:58 come here and drag and drop in your
9:00 reference image. Let's start with the
9:02 realistic one. And next, you can just
9:03 create a simple prompt like this
9:05 describing the shot that you want to
9:07 create. Make sure to add in little
9:08 details, like for example, if you want
9:10 to see wrinkles on the suit, just add
9:12 that in. You can also copy this prompt
9:14 over to any large language model and
9:16 refine it there. And that's it for the
9:18 workflow. Now, you can just click run
9:20 and wait for it to finish. And this is
9:23 looking really good. Let's quickly try
9:25 out the anime image that we created
9:27 earlier just to see how it compares. And
9:29 I just copied over this prompt to Claude
9:32 and just said change the style to anime.
9:34 Well, and this worked really nicely. I
9:36 love how it added details. Like for
9:38 example, it's blinking now. But as I
9:39 said, you can use this workflow not just
9:41 with a start image, but you can also
9:43 combine multiple reference images. But
9:44 to demonstrate that, let me show you
9:46 this other scene here. I created this
9:48 one just by downloading Mixamo
9:50 animations and putting that inside of
9:54 Blender. I created my outline pass like
9:56 this. And then we can create our
9:57 characters. I like to do that by
9:59 creating character sheets. You can use
10:02 any AI image generator to do that, but
10:04 because I showed you this already, I'm
10:06 going to load in the Z-Image Turbo
10:08 ControlNet workflow. And I'm just going
10:11 to deactivate this ControlNet right
10:13 here. Create this empty latent node and
10:15 plug it into the latent image. So, this
10:18 is basically just the normal Z image
10:20 workflow. Now, we can create a prompt
10:23 like this. Let's run that. That's a
10:25 pretty cool fox. Let's create another
10:28 character. Actually, let's just reuse
10:30 one of the lizards that we already have.
10:32 Maybe this lizard right here. And now we
10:34 activate the second reference by
10:36 selecting these two nodes and clicking
10:39 Ctrl+B. Once two of these image inputs
10:41 are selected, the workflow automatically
10:43 switches from start image mode to
10:45 reference image mode. So, all you need
10:46 to do is just also drag and drop in the
10:49 image of the fox. And then I'm
10:51 activating the third reference because
10:53 we need an environment. Okay, I found
10:55 this futuristic city with a river. Next,
10:58 I'm adjusting the frame load cap to 81.
11:00 And I'm also going to switch the format
11:02 to 720p
11:04 16x9. Prompting for this reference
11:06 approach is pretty easy. You just need
11:08 to tell the model what to do in natural
11:10 language. So, a prompt could look like
11:12 this. Two characters are dancing in a
11:14 futuristic city. They are standing in a
11:17 shallow river. On the left side of the
11:19 image, a humanoid lizard wearing a suit
11:21 and holding a brown leather bag is
11:23 jumping up and down. On the right side,
11:26 a humanoid fox wearing sunglasses and
11:28 green thief's clothing is dancing. Yep,
11:30 and that just worked super well. It
11:32 combined all these elements into one
11:34 video. But you can see the outfit of the
11:35 fox changed a little bit, and you can
11:37 usually improve that by being more
11:39 specific in the prompt. So, this
11:41 workflow can generate up to 120 frames
11:44 at 24 frames per second. So, around 5
11:46 seconds of video. You can actually go a
11:47 bit higher than that, but eventually
11:50 you'll run out of VRAM or the quality
11:52 will degrade too much. But if you
11:53 support us on Patreon, you can not only
11:55 get all the example files and test
11:57 renderings we created during the
11:58 creation of this video, you can also get
12:00 your hands on the advanced version of
12:02 this workflow. This one includes a
12:04 custom node pack that automatically
12:07 splits your video into batches and runs
12:10 them one iteration after another, taking
12:11 the last frames of the previous
12:14 iteration as start frames, ensuring that
12:16 it's a super smooth and consistent
12:18 video. And you can see it looks actually
12:20 really similar because all the changes
12:22 are under the hood if you go into the
12:24 sampler subgraph here. So all you need
12:27 to do is use it the exact same way. Let
12:29 me load in this test shot right
12:32 here. This is a very, very
12:33 challenging shot. We have a lot of
12:35 detail. We have a creature with a lot of
12:37 tentacles. The camera is moving. So,
12:39 let's see what this will create. Let's
12:41 go back to the top here. I'm going to
12:45 set the resolution to 720p 16x9.
12:48 And let's load in around maybe 300
12:50 frames. Something like this. We're not
12:52 blending it with the depth map at this
12:55 time. This is our start frame. So, we
12:58 have a stormy ocean. And then this is
13:00 our prompt. Now we have a few more
13:03 options below here. For example, you can
13:06 set how many frames should be created
13:08 per iteration. And depending on
13:13 your GPU, I recommend keeping this between
13:16 41 and 121. 81 is a sweet spot. The number of
13:18 start frames is important for the
13:20 blending process. You see, once you hit
13:22 generate, it will generate the first 81
13:24 frames. And this is set so it will
13:28 actually use the last 11 frames as start
13:30 frames for the next generation ensuring
13:32 that the transition is super smooth. Now
13:34 if you already generated a sequence with
13:36 this shot and, for example, just the
13:38 ending is off, you can actually
13:41 start this workflow again, resuming from
13:43 a later iteration. Down here, you can set
13:45 the total iterations. And for our longer
13:48 shot, we actually need five. Okay,
13:50 I'll shorten it a little bit. I'll
13:53 do four. And finally, below that,
13:55 you have the seed. And yep, that's
13:57 all you need to set. And
13:59 here's the final shot. Look at these
14:00 amazing effects. Look at the
14:02 consistency. And look at this water
14:04 splashing. That's so cool. So here are
14:06 some more variations with other start
14:08 frames. For example, this mossy forest
14:10 or this anime style. I think the speed
14:12 here is the real benefit. You can
14:14 explore 10 visual directions in the time
14:16 it takes to set up one traditional
14:17 render. But of course, you're trading
14:19 control for speed. And the physics are
14:21 unsimulated, meaning sometimes they look
14:23 really convincing and sometimes they are
14:25 a bit off, and all you can do is just run
14:27 the workflow again, changing the seed or
14:29 adjusting the prompt. Still, I think you
14:31 can reach really impressive quality with
14:33 this open-source solution and for rapid
14:36 prototyping or previs work, this is
14:38 really amazing. You can also do many
14:39 other things with this workflow which I
14:41 will show in future videos. So,
14:42 make sure to subscribe.
14:44 But that's it for this one. If you
14:45 create anything with these workflows,
14:47 feel free to tag me in your work or show
14:49 it to me on Discord. I always love to
14:51 see what you come up with. And huge
14:53 thanks to our amazing Patreon supporters
14:55 who make these deep dives possible.
14:57 Thanks for watching and see you next time.