This presentation introduces SAS Data Maker, a tool designed to generate synthetic data, highlighting its potential to overcome data access, testing, and AI model development challenges by providing a flexible, privacy-preserving, and efficient alternative to real-world data.
Hello, everyone.
Welcome to the Spotlight Stage.
Thank you guys for taking time for this next presentation.
I'm Mark Demers, the Spotlight Stage host.
I'm glad you're here with us.
Show of hands, who's using synthetic data
or wants to use synthetic data?
Well, then this presentation is for you.
So I'm not going to waste time.
I'm turning it over to Brett Wujek
and to Sundaresh from SAS.
They're going to talk to you about SAS Data Maker.
I'm going to come around with this handout
and I'm going to scan your badges
and try to not be annoying.
All right, thank you, Mark.
Yeah, and welcome, everyone, to our overview
of how synthetic data, especially generated
in a really easy and convenient manner with SAS Data Maker,
can make a difference for you and your organization
in your AI development efforts.
I was really excited to see those hands go up.
At least, you know, that was a good handful of people there.
Some of the SAS people mixed in, so maybe that didn't count.
But, you know, it's a whole new world with AI these days.
We're all living it.
It's evolving really fast.
And let me pause for a second and just
say when I use the term AI, I am yielding
to kind of the mainstream use of the AI term.
I'm including all sorts of analytics
under the full umbrella there.
And when we talk about AI, we know it all starts with data.
You know, having good and sufficient data.
And so what's the problem?
We live in a data-rich world, right?
We have an abundance of data.
We're flooded with data.
The fact is there are still a lot of challenges
with accessing and using data sufficiently.
And I'll get to those in a second.
When we talk with our customers about the concept
of synthetic data and the potential
that it has to bring value to their efforts,
they're really intrigued.
And hopefully, when you were introduced to this this morning,
possibly for the first time in our presentations
on main stage there, it started you thinking about, all right,
should I be taking advantage of this?
How could I use this?
Because there really is a lot of value to it.
And when we talk about synthetic data with our customers,
we hear kind of three main positions on it.
The first is really about the potential
it has for just opening up access to data
and sharing data across the enterprise.
Obviously, there's a lot of privacy issues and protections
on data and regulations to comply
with that kind of keep a lot of that data locked away
from people that really could make use of it in their efforts.
And Harry talked a lot about privacy this morning
and all the issues around that.
And that's very important.
So that's one aspect of it.
Just having some representation of real data
in a synthetic form that is allowed
to be used in all of your AI efforts is very valuable.
The second position we hear a lot
is about the potential to use synthetic data to test
applications and solutions that these organizations have
developed, ensure that they are robust,
be able to create new scenarios and potentially rare events
that they just don't have real data for,
to ensure that their products, their processes,
the decisions made from all of their AI efforts
are robust and behave as expected,
and do so in a way without harming
real people in the process, and do it
in a cost-effective manner.
So some of that downstream use of synthetic data
is really intriguing to our customers.
And then the third area we hear a lot
is really in just the AI development phase itself,
that middle phase of the AI lifecycle,
to kind of unlock some opportunities
to explore different approaches to their products
and their processes and the models
that they develop to innovate, to be
more productive in that phase.
And so a lot of compelling reasons
to kind of turn to synthetic data,
which is why we're really excited to be providing
this new offering, SAS Data Maker, for this.
Now, Sundaresh, when you talk to customers about synthetic data,
you're in a customer advisory role.
You're with customers all the time.
When you talk to them, what's kind of their first reaction
to all of this?
While customers relate very easily
to concepts of privacy protection and testing
scenarios, they are unsure about how synthetic data
helps model development.
For example, the other day, we talked to a customer.
After listening to us, she still made a comment along the lines
of, but this is still made-up data.
So that notion of fakeness leads to customers
being a little unsure about how to use synthetic data
in model development directly.
But really, enlightened organizations
view things a little differently.
Let's consider a scenario, an example, to see just how.
Imagine that I'm a data scientist at a bank,
helping it make better decisions.
Some of these decisions involve helping
determine whom to approve or decline for a loan.
A loan decisioning model poses challenges.
Applicants come from a variety of risk profiles.
And further, macroeconomic changes,
and what we term as portfolio shifts, they affect outcomes.
So what this means is that if I continue
to use historical, original data alone,
very soon my data ceases to be relevant for today's scenario.
It ceases to be relevant for future scenarios and dynamics.
And it's not as accurate as before.
In short, I'll have to wait until I get enough usable data
to obtain a relevant model.
Now, note also that model data might contain bias.