0:02 Two years ago, I said AI is the future of
0:03 rendering, but I'm honestly surprised
0:05 how far we've come since then. AI now
0:08 lets you reimagine your 3D layouts with
0:10 simple prompts and reference images. But
0:12 it doesn't just add textures, lighting,
0:14 and depth of field. It will also
0:16 generate smoke simulations, water
0:18 splashes, and explosive debris based on
0:20 the movement in your scene. That way,
0:22 you can go from rough layout to fine
0:24 rendering in minutes. You can easily
0:25 change the style by swapping out the
0:27 reference image, or you can even use
0:29 multiple reference images for different
0:31 parts in your rendering. We also built a
0:33 custom node pack that allows you to render
0:34 scenes of any length without crashing
0:36 your PC. So, today I'm going to show you
0:38 how you can set this up using free
0:40 open-source tools that run
0:44 entirely on your own computer.
0:45 This video took a lot of time to
0:46 research, and developing these workflows
0:48 was weeks of trial and error. The fact
0:50 that we can share them for free is made
0:52 possible entirely by our amazing Patreon
0:54 supporters. If you want to support our
0:56 work, get access to advanced workflows
0:57 and our amazing Discord community, check
0:59 out the link in the description. So,
1:00 traditional rendering is the process of
1:02 turning a three-dimensional scene into a
1:04 2D representation, a 2D image, for
1:06 example, by calculating how light
1:08 bounces off the surfaces, like ray tracing.
1:10 Generative AI rendering works
1:11 differently. And that's probably why I
1:13 got a community note when I last used
1:14 that term on X. Instead of
1:16 mathematically simulating light
1:17 transport, a neural network predicts
1:19 what the image should look like based on
1:21 patterns it learned from millions and
1:23 millions of training images. We feed in
1:24 additional information like depth maps,
1:27 outlines, or pose data from our 3D
1:28 scenes, and these are called ControlNets.
1:31 When we then add a reference image
1:32 and a prompt, the model is able to
1:34 interpolate the style of the reference
1:37 image over the duration of the shot. The
1:38 challenge was to find a model that
1:41 adheres to the scene geometry precisely,
1:43 but still has the freedom to generate
1:45 new detail, and understands the reference
1:47 image well enough to
1:49 generate new scene information in
1:51 areas that were previously obstructed.
1:53 And we were very close to giving up
1:54 because every model we tested only had
1:56 some of these capabilities. But
1:58 then we found this model merge by Inner
2:00 Reflections. Inner Reflections combined
2:03 two different video models. SkyReels'
2:05 reference-to-video is designed to really
2:07 understand reference images. You can
2:09 load in references for characters,
2:11 backgrounds, and styles, and then the
2:12 model will merge them all together,
2:15 creating your final scene. The problem
2:17 is it can't understand ControlNets, so
2:20 we can't feed it our 3D geometry. Wan
2:21 VACE solves this problem. You can feed
2:23 it ControlNets and it's able to follow
2:24 them pretty precisely. The problem is
2:26 that VACE's reference feature is just
2:28 not as good. So when the camera
2:30 moves to a new area of the scene, VACE
2:32 is not able to generate new detail
2:34 that matches the style and vibrance of
2:37 the original reference. The merge model
2:39 completely fixes that because SkyReels'
2:41 references just work so much better, but
2:43 they still work in tandem with the
2:45 ControlNets. It's like the model
2:46 understands two languages. Now, before
2:48 we can render our scene, we need to
2:50 create our ControlNet passes in our 3D
2:52 software. And I recommend two: the
2:54 outline pass is good if you want to
2:56 preserve the exact composition, but give
2:58 the model freedom to generate new detail
3:00 between these lines. The depth pass is
3:01 better if you need the model to follow
3:03 your geometry more precisely. I
3:05 recommend exporting both. That way, you
3:07 can test out which one works better for
3:08 your scene. You can also merge them
3:10 together to have the best of both
3:12 worlds. You can use any toon shader with
3:14 outlines in any 3D program to create the
3:16 outline pass. In Blender, I used the
3:18 Freestyle tool in the past, but that can
3:20 be super slow for some reason. So
3:22 instead, I recommend changing the render
3:25 engine to Workbench, selecting Flat
3:29 lighting, setting Color to Single and
3:32 making it black, and then activating Outline
3:34 and making it white. But you can see that it
3:36 only draws outlines around whole objects.
3:38 So if you need more detail, I
3:40 recommend activating Freestyle. You can
3:42 find the Freestyle settings in the View
3:45 Layer tab. Scroll down here, activate
3:47 As Render Pass. And if we now render an
3:49 image, we can come over to the
3:52 compositing tab, use nodes, create a
3:54 viewer node, connect the freestyle, and
3:56 you can see the outlines. They are
3:57 looking good, but they are black. So for
4:00 this, go to View Layer, scroll down to
4:03 the settings. Let's change the Freestyle
4:05 color to white. Render the image again.
4:07 Maybe we can make them a little bit
4:10 thinner even. Let's try 2. And we can
4:14 just deactivate Use Alpha. Now create a
4:16 file output node and connect it. And
4:18 that's it for the outline pass. Next,
4:20 let's set up the depth pass. For this,
4:22 we go to View Layer, activate Z. Now we
4:24 need to render the image again. Connect
4:27 the viewer to the new depth output.
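The raw Z pass stores camera distance in scene units, and depth ControlNets expect a normalized image where near objects are bright. So the node steps that follow, normalize, invert, and add contrast, are just per-pixel math. As a rough sketch in plain Python, operating on a flat list of depth values (the contrast function is only an illustrative stand-in for the Curves node, not what Blender does internally):

```python
def normalize(depth):
    """Rescale raw Z values into the 0..1 range (like the Normalize node)."""
    lo, hi = min(depth), max(depth)
    return [(d - lo) / (hi - lo) for d in depth]

def invert(depth01):
    """Flip the values so near objects become bright, as depth ControlNets expect."""
    return [1.0 - d for d in depth01]

def add_contrast(depth01, strength=2.0):
    """Push values away from mid-gray, a crude stand-in for the Curves node."""
    return [min(1.0, max(0.0, 0.5 + (d - 0.5) * strength)) for d in depth01]

# Three sample depths in scene units (meters):
z = [2.0, 5.0, 10.0]
processed = add_contrast(invert(normalize(z)))  # near = bright, with extra separation
```

The point of the contrast step is exactly what you see on the character: values that were close together in the raw pass get spread apart, giving the model more depth separation to work with.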
4:29 Let's add a Normalize node. To use it as
4:31 a ControlNet, we need to invert it.
4:34 After that, I just added this Curves node
4:36 here to pretty much increase the
4:37 contrast. So you can see on the
4:39 character, we now have more separation
4:41 in the character's depth. Create another
4:43 output node, connect it, and save out
4:45 your image sequence. Another thing you
4:47 can do is just render out your layout,
4:48 whatever you have in your viewport, and
4:51 then use our free AI preparation
4:53 workflow to create the outlines and the
4:54 depth map. But keep in mind that these
4:56 are just approximations and will not be
4:59 as good as the rendered ones. We have
5:01 two options for guiding the style of our
5:03 rendering. We can create one reference
5:05 start frame or we can throw in multiple
5:07 references. Let's start with the first
5:09 option as it renders a bit faster. The
5:12 easiest is probably just to use ChatGPT
5:13 or Nano Banana or whatever you have access
5:16 to. Load in your start depth or outline
5:18 image and describe what you want.
5:19 Something like this. And this worked
5:21 perfectly well. But there are also some
5:23 amazing local options. So let's switch
5:25 over to ComfyUI. I think one of the best
5:27 models for that currently is Z-Image
5:29 Turbo together with the new ControlNets
5:31 for it. And I created this free workflow
5:32 for you that lets you transform your
5:35 start image. Just drag and drop it into
5:37 ComfyUI. Go to Manager, install missing
5:39 custom nodes if you have any red nodes,
5:41 and then you need to download the
5:44 models that you can find right here.
5:45 Download them, put them in the right
5:47 folders and make sure that they are
5:50 loaded right here. Then you can come
5:52 over here and just drag and drop in your
5:54 control image for the first frame of
5:56 your sequence. And I'm choosing the
5:58 outline pass. If you already combined
6:00 all these images into a video sequence,
6:02 you can also come down here and upload
6:05 that right here. This will then load
6:07 only the first frame if you activate
6:09 the option right here. Next, we need to
6:11 create a prompt right here. And I like
6:12 to start simple and see what it gives
6:15 me. Before we run this, let's come down
6:17 here. This is where you set the
6:20 ControlNet strength. And I usually recommend
6:22 going as low as possible as this will
6:24 give you the maximum quality. Like let's
6:28 try something like this. Click run. And
6:29 this looks pretty cool. You can see the
6:31 tail is now in a different position than
6:33 it is in the control net image, but that
6:35 is usually not a problem. This is just a
6:38 reference. It doesn't need to be 100%
6:40 perfect. So, this is good enough. Let's
6:42 create another image in another style.
6:45 For example, let's try this anime style.
6:48 This is looking really cute. If you like
6:49 the vibe, but want to try out more
6:51 options, you can also just change the
6:53 seed right here. Yep, really cute. And
6:55 it got the tail right this time. So,
6:57 let's render our scene. Now, I like to
6:58 first convert all these images into
7:02 H.264 video sequences just so it's
7:03 a bit easier to handle. For this, I'm
7:05 using a setup like this. You can of
7:07 course use any editing program, but you can
7:10 also do it in ComfyUI. Create a Load Images
7:12 node and then a Video Combine node from
7:14 the Video Helper Suite. Select the frame
7:17 rate. In my case, it's 24 frames. Give
7:20 it a name and change the format to this
7:22 one right here. Click run. And here's
7:24 our video. Let's do the same thing for
7:26 the outlines. Just copy this in here.
7:28 Let's now install the main workflow, the
7:30 free AI renderer. Installing this is
7:31 very easy. Just drag and drop the
7:34 workflow into the ComfyUI interface. Open
7:36 the ComfyUI Manager. Click install
7:39 missing custom nodes and install all of
7:41 them. Once it's done, restart ComfyUI.
7:42 Next, you need to download the models.
7:44 And you can find all the download links
7:46 in the notes on the left side of the
7:48 workflow right next to the corresponding
7:50 model loaders. To help you get set up,
7:52 we've also created a free guide that you
7:54 can check out on Patreon and our
7:56 website, along with a detailed
7:58 step-by-step video installation
7:59 tutorial. If you don't have a powerful
8:01 GPU, it might make sense to run these
8:03 workflows on RunPod. We've prepared
8:05 ready-to-use templates that handle the
8:07 setup for you. Plus, if you sign up with
8:08 our link, you'll get a random credit
8:11 bonus between $5 and $500 when you spend
8:14 the first $10 on the platform. Next, we
8:16 come over to the video input up here.
8:18 You can set the resolution for our
8:20 lizard. We're using a square format. So,
8:23 I'm going to select this one right here.
8:26 And we don't want to skip any frames.
8:28 Just drag and drop the outline pass
8:31 right here. And I'm also going to drag
8:34 in the depth pass right here. This node
8:36 will actually blend them together if you
8:38 want that. But for now, I want to test
8:41 it with only the outline pass as it
8:43 gives the video generation process more
8:45 freedom. So I will disable this option
8:48 right here. When we now click run, you
8:50 can see it actually just loads in the
8:52 first one. If we activate it, it would
8:54 look something like this. Up here, you
8:55 can also change the blend value. Next,
8:58 come here and drag and drop in your
9:00 reference image. Let's start with the
9:02 realistic one. And next, you can just
9:03 create a simple prompt like this
9:05 describing the shot that you want to
9:07 create. Make sure to add in little
9:08 details, like for example, if you want
9:10 to see wrinkles on the suit, just add
9:12 that in. You can also copy this prompt
9:14 over to any large language model and
9:16 refine it there. And that's it for the
9:18 workflow. Now, you can just click run
9:20 and wait for it to finish. And this is
9:23 looking really good. Let's quickly try
9:25 out the anime image that we created
9:27 earlier just to see how it compares. And
9:29 I just copied over this prompt to Claude
9:32 and just said change the style to anime.
9:34 Well, and this worked really nicely. I
9:36 love how it added details. Like for
9:38 example, it's blinking now. But as I
9:39 said, you can use this workflow not just
9:41 with a start image, but you can also
9:43 combine multiple reference images. But
9:44 to demonstrate that, let me show you
9:46 this other scene here. I created this
9:48 one just by downloading Mixamo
9:50 animations and putting that inside of
9:54 Blender. I created my outline pass like
9:56 this. And then we can create our
9:57 characters. I like to do that by
9:59 creating character sheets. You can use
10:02 any AI image generator to do that, but
10:04 because I showed you this already, I'm
10:06 going to load in the Z-Image Turbo
10:08 ControlNet workflow. And I'm just going
10:11 to deactivate this ControlNet right
10:13 here. Create this empty latent node and
10:15 plug it into the latent image. So, this
10:18 is basically just the normal Z image
10:20 workflow. Now, we can create a prompt
10:23 like this. Let's run that. That's a
10:25 pretty cool fox. Let's create another
10:28 character. Actually, let's just reuse
10:30 one of the lizards that we already have.
10:32 Maybe this lizard right here. And now we
10:34 activate the second reference by
10:36 selecting these two nodes and clicking
10:39 Ctrl+B. Once two of these image inputs
10:41 are selected, the workflow automatically
10:43 switches from start image mode to
10:45 reference image mode. So, all you need
10:46 to do is just also drag and drop in the
10:49 image of the fox. And then I'm
10:51 activating the third reference because
10:53 we need an environment. Okay, I found
10:55 this futuristic city with a river. Next,
10:58 I'm adjusting the frame load cap to 81.
11:00 And I'm also going to switch the format
11:02 to 720p
11:04 16x9. Prompting for this reference
11:06 approach is pretty easy. You just need
11:08 to tell the model what to do in natural
11:10 language. So, a prompt could look like
11:12 this. Two characters are dancing in a
11:14 futuristic city. They are standing in a
11:17 shallow river. On the left side of the
11:19 image, a humanoid lizard wearing a suit
11:21 and holding a brown leather bag is
11:23 jumping up and down. On the right side,
11:26 a humanoid fox wearing sunglasses and
11:28 green thief's clothing is dancing. Yep,
11:30 and that just worked super well. It
11:32 combined all these elements into one
11:34 video. But you can see the outfit of the
11:35 fox changed a little bit, and you can
11:37 usually improve that by being more
11:39 specific in the prompt. So, this
11:41 workflow can generate up to 120 frames
11:44 at 24 frames per second. So, around 5
11:46 seconds of video. You can actually go a
11:47 bit higher than that, but eventually
11:50 you'll run out of VRAM or the quality
11:52 will degrade too much. But if you
11:53 support us on Patreon, you can not only
11:55 get all the example files and test
11:57 renderings we created during the
11:58 creation of this video, you can also get
12:00 your hands on the advanced version of
12:02 this workflow. This one includes a
12:04 custom node pack that automatically
12:07 splits your video into batches and runs
12:10 them one iteration after another, taking
12:11 the last frames of the previous
12:14 iteration as start frames, ensuring that
12:16 it's a super smooth and consistent
12:18 video. And you can see it looks actually
12:20 really similar because all the changes
12:22 are under the hood if you go into the
12:24 sampler subgraph here. So all you need
12:27 to do is use it the exact same way. Let
12:29 me load in this test shot right
12:32 here. This is a very, very
12:33 challenging shot. We have a lot of
12:35 detail. We have a creature with a lot of
12:37 tentacles. The camera is moving. So,
12:39 let's see what this will create. Let's
12:41 go back to the top here. I'm going to
12:45 set the resolution to 720p 16x9.
12:48 And let's load in around maybe 300
12:50 frames. Something like this. We're not
12:52 blending it with the depth map at this
12:55 time. This is our start frame. So, we
12:58 have a stormy ocean. And then this is
13:00 our prompt. Now we have a few more
13:03 options below here. For example, you can
13:06 set how many frames should be created
13:08 per iteration. And depending on
13:13 your GPU, I recommend keeping this between
13:16 41 and 121. 81 is a sweet spot. The number of
13:18 start frames is important for the
13:20 blending process. You see, once you hit
13:22 generate, it will generate the first 81
13:24 frames. And this is set so it will
13:28 actually use the last 11 frames as start
13:30 frames for the next generation ensuring
13:32 that the transition is super smooth. Now
13:34 if you already generated a sequence with
13:36 this shot and, for example, just the
13:38 ending is off, you can actually
13:41 start this workflow again, resuming from
13:43 a later iteration. Down here, you can set
13:45 the total iterations. And for our longer
13:48 shot, we actually need five. Okay,
13:50 I'll shorten it a little bit. I'll
13:53 do four. And finally, below that,
13:55 you have the seed. And yep, that's
13:57 all you need to set. And
13:59 here's the final shot. Look at these
14:00 amazing effects. Look at the
14:02 consistency. And look at this water
14:04 splashing. That's so cool. So here are
14:06 some more variations with other start
14:08 frames. For example, this mossy forest
14:10 or this anime style. I think the speed
14:12 here is the real benefit. You can
14:14 explore 10 visual directions in the time
14:16 it takes to set up one traditional
14:17 render. But of course, you're trading
14:19 control for speed. And the physics are
14:21 unsimulated, meaning sometimes they look
14:23 really convincing and sometimes they are
14:25 a bit off, and all you can do is just run
14:27 the workflow again, changing the seed or
14:29 adjusting the prompt. Still, I think you
14:31 can reach really impressive quality with
14:33 this open-source solution and for rapid
14:36 prototyping or previs work, this is
14:38 really amazing. You can also do many
14:39 other things with this workflow which I
14:41 will show in future videos. So,
14:42 make sure to subscribe.
14:44 But that's it for this one. If you
14:45 create anything with these workflows,
14:47 feel free to tag me in your work or show
14:49 it to me on Discord. I always love to
14:51 see what you come up with. And huge
14:53 thanks to our amazing Patreon supporters
14:55 who make these deep dives possible.
14:57 Thanks for watching and see you next time.