track. And here is Suno's AI-generated extension.
[Music]
I would describe this as music from an
airport spa that somebody downloaded off
of Napster in 1999. There's an even more recent and competitive generative AI music outfit from China that I initially only knew about because they blew up my email so much trying to get me to promote them on this channel. Ask and you shall receive, I guess. It's called MiniMax Audio, and a lot of creators seem to be raving about it, but I'm assuming that they're raving about it because they're being paid to rave about it. Uh, it doesn't really blow my mind or anything. Anyway, if you tap
directly into the API, it has a feature
that allows you to extend a song the
same way that we did in the Suno test.
And so technically, how this works is they scan the song that you uploaded and then fine-tune the model to generate more of what it had heard.
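To make that concrete, here's a purely hypothetical sketch of what calling an extend endpoint like this might look like. The URL, parameters, and response handling are all invented for illustration; this is not MiniMax's actual API.

```python
# Hypothetical sketch of an "extend" API call. The endpoint, auth scheme,
# and field names below are invented for illustration only.
import requests

API_URL = "https://api.example.com/v1/music/extend"  # hypothetical endpoint

with open("my_song.wav", "rb") as f:
    resp = requests.post(
        API_URL,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"audio": f},
        data={"extend_seconds": 30},  # hypothetical parameter
        timeout=300,
    )

resp.raise_for_status()
# Assume the service returns the extended audio directly as bytes.
with open("extended_song.wav", "wb") as out:
    out.write(resp.content)
```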
[Music]
And now let's feed it the song that's been encoded with HarmonyCloak and Poisonify. Ultimately, we have this.
[Music]
Another popular model that has this extend feature is Meta's MusicGen, and I had tried that out early on, but I might as well show you what that does as well.
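MusicGen is publicly available through Meta's audiocraft library, so you can try this kind of extend test yourself. Here's a rough sketch using the library's stock continuation mode rather than the fine-tuning described above; the model size, prompt length, and file names are just example choices.

```python
# Extending a song with Meta's MusicGen via the audiocraft library.
# This uses out-of-the-box continuation (no fine-tuning on the track).
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)  # total output length in seconds

# Load the track to extend and keep its last 10 seconds as the prompt.
wav, sr = torchaudio.load("original_song.wav")
prompt = wav[..., -int(sr * 10):]

# Generate a continuation conditioned on the prompt audio.
out = model.generate_continuation(prompt, prompt_sample_rate=sr, progress=True)
audio_write("extended_song", out[0].cpu(), model.sample_rate, strategy="loudness")
```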
Here is audio from the model fine-tuned on the original, unencoded song.
[Music]
And then, just like MiniMax, when we upload the version that's been encoded by HarmonyCloak and Poisonify, it just hangs. This is pretty, pretty good. So, as I'm
editing this video, YouTube just
announced that they will be launching a
new feature for YouTube creators where
you can generate your own AI music. So,
there are some very conflicting interests on the same website. God damn. There are
some challenges, though. First of all,
instrument classification isn't used exclusively by generative AI; it's also used by Spotify to sort out its recommendation algorithm. Now, my most
abrasive music may be recommended to
people who exclusively listen to
barbershop quartets, which I personally
think is awesome, but I could see how
some musicians might not want that. The
other challenge is efficiency. HarmonyCloak currently requires a bunch of high-end, specialized GPUs. However, the
team is working on a much more efficient
model that they are testing as I record
this. And then, here locally, working on Poisonify, I've been using two RTX 5080 video cards, which take about 2 hours per 15,000 iterations on an 18-second file for the adversarial noise to be unnoticeable to the human ear. That means that just my upcoming album is taking around 2 weeks of non-stop GPU grinding, coming in at around 242 kWh, which in my case can be mostly offset by the hot Georgia sun. But it would cost most people in the US between $40 and $150 worth of electricity, depending on their location and whether they stagger training sessions into off-peak hours.
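Those numbers roughly check out on the back of an envelope. Assuming each RTX 5080 averages somewhere around 360 W under sustained load (my assumption, not a measured figure), two cards running non-stop for two weeks land almost exactly on the 242 kWh quoted above:

```python
# Sanity-checking the energy and cost figures. The ~360 W per-GPU draw and
# the $/kWh rates are illustrative assumptions, not measured values.
gpus = 2
watts_per_gpu = 360          # assumed average draw under sustained load
hours = 14 * 24              # "around 2 weeks of non-stop GPU grinding"

energy_kwh = gpus * watts_per_gpu * hours / 1000
print(f"{energy_kwh:.0f} kWh")            # -> 242 kWh

for rate in (0.17, 0.60):    # rough spread of US residential $/kWh rates
    print(f"${energy_kwh * rate:.0f} at ${rate:.2f}/kWh")
# -> roughly $41 to $145, in line with the $40-$150 estimate
```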
So, while this all works really
well, the goal is to make this much more
efficient and scale it to where it can
be offered as an API that's hosted and processed by Topset Labs, which, by the way, is what Voice Swap will be called in the future due to the diversification of our offerings. That way, a willing distributor can offer the option to AI-proof, or Poisonify, your music when you upload it
to streaming
services. I've made a few videos
covering this, but music distribution is
not exactly great these days. In early
2024, TuneCore falsely accused me of fake streaming on Spotify, removing my
entire catalog from every store and
streaming service without even notifying
me. Fortunately, this resulted in a lot
of bad press that ended up being noticed
by their parent company. And after a lot
of dick swinging, they restored most of
it, but royally screwed up the metadata.
So, in the last year, while my monthly
listeners have continued to grow on
other platforms, TuneCore's mistake
resulted in me losing close to 100,000
listeners per month from my primary
library. Since that happened, I've been
negotiating my catalog with a lot of
different companies, and I came really
close to just selling the entire thing,
but I'm glad that I didn't because I
managed to find a diamond in the rough
in one of the potential buyers. I
started meeting with and pitching my AI
music detection to Jorge Brea, the CEO
of Symphonic Distribution. He seemed
open-minded enough to hear my plan with
Poisonify and HarmonyCloak and has the
resources to potentially incorporate an
API as an optional service for other
musicians in the future should we be
able to combine them into an efficient
process. When you're uploading your
album and you're uploading your release
cover, then you tell us like how much AI
was used to create this album cover. Was
all of it done using AI? Was some of it
done using AI? Or was none of it done using AI? And the same thing at the track
level mostly for us to be able to just
have awareness of it. We're doing that
because we're trying to show responsible thought processes around AI, and a lot of the DSPs haven't yet come out with legitimate guidance on what
they will do in terms of this content.
So this is kind of our way of starting
to inventory and being able to just get
a sense of how much of this is actually
happening within the ecosystem.
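As a sketch of what that kind of disclosure could look like on the back end, here's a hypothetical metadata record with release- and track-level AI-usage flags. The field names are mine, not Symphonic's actual schema.

```python
# Hypothetical AI-usage disclosure record a distributor might collect at
# upload time. Field names are illustrative, not an actual schema.
release_metadata = {
    "album": "Example Album",
    "cover_art_ai_usage": "some",  # "all" | "some" | "none"
    "tracks": [
        {"title": "Track One", "ai_usage": "none"},
        {"title": "Track Two", "ai_usage": "some"},
    ],
}
```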
So now I'm doing something that I thought I may never do again: finishing and mastering a new album. My entire discography will also be randomly encoded
with one of these poison pill methods or
a combination of them or a variation of
them. I've also encoded some of them
with inaudible random adversarial noise
that does absolutely nothing at all. And
the reason I'm not going to tell anybody
which tracks are encoded with what is to
obfuscate how this works technically so
it can't be avoided down the line as AI
music companies train new models.
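To illustrate the idea (not the actual tooling), randomly assigning each track one of the real encodings, a combination, or the do-nothing placebo, and then keeping that mapping private, could look something like this:

```python
# Toy sketch of the release strategy described above: each track gets a
# randomly chosen protection (or a placebo), and the mapping stays secret.
# The method names are placeholders, not real tool invocations.
import random

METHODS = [
    "harmonycloak",
    "poisonify",
    "harmonycloak+poisonify",
    "placebo_noise",  # inaudible random noise that does nothing at all
]

tracks = ["track_01.wav", "track_02.wav", "track_03.wav"]
assignment = {track: random.choice(METHODS) for track in tracks}

# Never publish this mapping; the uncertainty is the point.
print(assignment)
```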
We've covered a whole lot here, but another
thing that I'll cover more extensively
in a future video as I test more devices
is broad protection from AI listening
devices through targeted pressure waves.
If you haven't noticed, everywhere we
go, there's a combination of both smart
and dumb microphone-equipped devices
listening to us. Let's say that you came to see me perform live at a small, intimate concert, and I want to play a song for you that I haven't really finished yet, but I just kind of want to bounce it off the audience. Actually, I don't want your Instagram followers to see it. So, uh, you'll just...
I can also make it device-specific by
playing very specific audio files from
my phone. Alexa, what is 2 + 2? 2 + 2 is four.
Alexa. Alexa. Alexa. Alexa...
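I won't get into the actual targeted pressure-wave method here, but the underlying idea, sound above the human hearing range that microphones can still pick up, is easy to illustrate. This toy sketch just writes a 24 kHz tone to a WAV file; actually playing it back would require a speaker and DAC that can reproduce ultrasound, and it is not the method demonstrated above.

```python
# Toy illustration only: generate a tone above the ~20 kHz limit of human
# hearing. Many MEMS microphones still respond in this range, which is the
# basic principle behind ultrasonic microphone interference.
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 96_000   # high sample rate needed to represent ultrasound
DURATION_S = 5.0
FREQ_HZ = 24_000       # inaudible to humans

t = np.linspace(0, DURATION_S, int(SAMPLE_RATE * DURATION_S), endpoint=False)
tone = 0.8 * np.sin(2 * np.pi * FREQ_HZ * t)

wavfile.write("ultrasonic_tone.wav", SAMPLE_RATE, tone.astype(np.float32))
```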
I'll definitely be exploring this stuff
a lot more in the future, but for now,
I'm just really happy that artists may
soon have a way to push back using
technology without having to depend on
copyright or IP laws, because those
things have utterly failed us in recent
years in regards to AI. And even
expanding on that, I'm glad to be
involved in technology that will someday
give you the option to physically
protect your music or even personal
conversations from being recorded or logged.
The entire generative AI industry has a
much larger existential problem than
artists or creators using the same class
of technology to defend themselves. They
have to worry about the Pareto principle, otherwise known as the 80/20 rule. It's
not meant to be precise. It's more of an
estimate. But for example, many people
spend 80% of their time only wearing 20%
of the clothing that they own. Many
businesses get about 80% of their sales
from only 20% of their products. A lot
of software developers notice that 80%
of their bugs are caused by 20% of the
code. In a lot of healthcare systems,
about 80% of resources are used by only
20% of patients. I could keep going with
this. Considering how ubiquitous and
omnipresent the Pareto principle is in
computer science and data science, think
about this. Have you noticed that AI
image generators got really good really
fast, but are still nowhere near
perfect? Over the last 2 years, it would be hard for the average person to point out any difference at all in AI image generators. Even with the insane amount
of hype and investment that's going into
it, they still can't seem to figure out
things like text and hands without using
special tricks or extensions. And music
isn't that much different. When you see Suno or Udio releasing these new versions with new features, most of the new features are not in the generative quality itself, but in features like
inpainting. And when there is an
improvement in the sound quality, many
times the customer finds themselves
trading customization for quality. For
example, the music may sound more
realistic or clear, but now it's
ignoring most of the text in your
prompt. I suspect that the reason for this is that these AI models quickly improved 80% with only 20% of the time, investment, and work that goes into training them.
And now just getting a 5% improvement on
those models is an expensive,
complicated, and unprofitable grind.
That's why it's possible for a company
like Topset Labs or Voice Swap to be
successful without any sort of runway
investment. Concentrating on the input
data and working with vocalists and
cutting them in financially is way less
expensive and much easier than the trial
and error of retraining base models over
and over again for voices. And having
artists involved creatively in that
process is also way more efficient than
fine-tuning a vocal model for another
100,000 iterations. The biggest downside
to this business model is that paying
people doesn't seem as sexy to
investors, which is something that
should be sat on and digested for a
while. But you have to cut them in on
the profit and pay them royalties long-term in order for this to work. Major record labels
figured this out about 100 years ago.
And we all know that there's no shortage
of greed in that industry. Perhaps a
solid way of thinking about any
generative AI industry in relation to
art or any other creative industry is
like a race car. You could design the
ultimate car and raise incredible
amounts of money to engineer that car
and then make it and it could be the
fastest and sexiest and most efficient
car around, but it will be a tremendous
failure and waste of money if you only
put a tiny little bit of fuel in it and
prevent it from getting to its
destination or finish line. I'm not an
AI hater, not by a long shot. You could
go back into the early days of this
channel or even earlier with my music
and see that I have been fascinated with
generative AI for a very long time
before it became integrated with neural
networks. But the business side of me
firmly believes that developing a useful
tool will pay out much higher than
developing an investment
scheme. This video is sponsored by some
of you, my viewers. In fact, a ton of
research in this video has been paid for
by my viewers through Patreon. So, thank
you for being part of that. And if you
want to join a large, healthy, inspiring
Discord community and have access to my
music and field recordings and audio
production assets, you can join for as
little as $1. Thanks for watching. Keep