This event, "Acceleration AI," focused on the advancements and challenges of Generative AI at the edge, bringing together industry and academic experts to discuss innovations in machine learning, edge computing, hardware-software co-design, and the future of AI.
Hello everyone, and welcome to the sixth edition of Acceleration AI. I'm Yin from CMC Microsystems, and I am very pleased to be your host today for this annual virtual event, which I have had the pleasure of organizing since 2020. This year's workshop is supported by Fabric, our latest initiative, which I will talk about a little bit later.
It has been inspiring to see the growth of this workshop over the years as it continues to bring together a dynamic and expanding community of researchers, innovators and experts. Today we are proud to present an outstanding group of speakers. These leaders in their fields will share the latest developments in machine learning, edge computing, generative AI and hardware-software co-design. I would like to extend a sincere thank you to our distinguished speakers for sharing their time and insights, and to all of you for being here today. Whether you are a professor, researcher, startup founder or industry professional, your participation is what makes this event so valuable. Before we begin, please note that this session is being recorded and will be made available soon.
As we dive into today's workshop, let's take a moment to reflect on the overarching goals of this event. Our mission is to bring together experts from both industry and academia to share the latest trends and innovations in AI, explore the key challenges and opportunities in cloud and edge computing, and identify opportunities for collaboration that can drive us forward. Additionally, from CMC's perspective, we aim to identify the common infrastructure requirements that will support the growth and scalability of these transformative technologies, to better support the Canadian ecosystem. With this in mind, we are set for an exciting and insightful event as we navigate the intersection of AI and edge computing.
So this year's workshop shines a spotlight on generative AI at the edge, with a focus on the latest advancements and real-world challenges in building efficient, cost-effective AI solutions for resource-constrained environments. Our speakers will delve into topics like model optimization and security, including techniques for fine-tuning models and seamlessly integrating them with edge hardware. We will also explore developments in AI hardware, featuring architectures like RISC-V processors, analog neural network chips and FPGAs, and how these technologies can help address environmental challenges such as radiation effects in harsh deployment settings. To conclude the day, we will host a panel session on the future of edge AI, where we'll reflect on emerging opportunities and what lies ahead for this rapidly evolving field.
Before we dive into the CMC Microsystems opening remarks, which include CMC products and services in IoT and edge AI, let me quickly walk you through today's agenda along with some housekeeping rules. We have a packed and exciting lineup of presentations from our distinguished speakers. Each speaker brings a unique perspective on the intersection of AI and edge computing, and each presentation is scheduled for 20 minutes: hopefully 15 minutes for the presentation, followed by a 5-minute Q&A session. This message is for my dear speakers: please try to keep your presentation time to 15 minutes. When you see me appear on your screen, it means you need to wrap up so we can move to the Q&A session.
So let's talk a little bit about the agenda today. Our first presentation is from Pierre Paulin from Synopsys, who will shed light on cost-effective solutions for generative AI at the edge, followed by Davis Sawyer from NXP Semiconductors, who will present secure, fine-tuned LLMs for generative AI at the edge, and Professor Warren Gross from McGill University, who will present parameter-efficient fine-tuning of transformer-based language models using dataset pruning. I would like to thank these three speakers again; they have been long-term contributors to the workshop. We also have a new speaker, Burak from Edge Signal, a startup here in Ottawa, who will be presenting on the implementation of generative AI in edge environments: challenges and solutions. We will then have a 5-minute break, after which we will resume the workshop with a presentation from Katarina from NVIDIA, who will present the NVIDIA edge AI stack, software and hardware, followed by another long-term contributor to the workshop, Professor François Leduc-Primeau from Polytechnique, who will cover Polara, the collaborative design of an open-source RISC-V multicore processor. Then we'll have a new presenter from academia, Professor Li Chen from the University of Saskatchewan, who will cover radiation effects in convolutional networks implemented on FPGAs and mitigation techniques. Last but not least, Niraj Mathew from Blumind will switch gears to cover an all-analog neural network processor that delivers highly efficient, high-performance AI inferencing. New this year, we have invited a distinguished panel director, Walter Knitl, CEO from AIoT Canada, who will host our panel session today, covering pioneering the future of generative AI at the edge: challenges, opportunities and innovation. So this is a high-level overview of our agenda today, and now let me give you some news.
So, as most of you know, Fabric, our latest initiative, is funded by the ISED Strategic Innovation Fund and managed by CMC Microsystems. It is focused on building a strong and sustainable semiconductor ecosystem in Canada, which supports companies developing homegrown semiconductor technologies, encourages collaboration across industry and helps grow Canada's role in the global supply chain.
Fabric challenge projects help Canadian industry and academia develop next-generation semiconductor processes and products, with a focus on photonics, MEMS and quantum. IoT projects drive innovation in sensors for cleantech, healthcare and telecom. The initiative provides design tools, methods and prototype fabrication, with up to 50% reimbursement for industry and full coverage for academia. These challenges help strengthen the Canadian manufacturing supply chain. The Fabric innovation platform offers tools, technical resources and training to support a strong talent pipeline, accelerate product development and drive world-class research. Currently, the Fabric ecosystem welcomes Canadian professionals, academic, government and industry experts who are passionate about semiconductors. Academics and students keep access to their CMC subscriptions, like CAD tools, fabrication and Basecamp, while gaining access to extra training and resources through Fabric.
Now, a brief introduction to our latest developments at CMC Microsystems in support of the IoT and edge AI ecosystem in Canada. This slide showcases our end-to-end IoT development process, from concept to prototype. It begins with project launch, including consultation, needs analysis and partnership. Then we move to design, selecting components and optimizing the IoT architecture. In manufacturing, we handle supplier coordination, production planning and quality control. Finally, our prototype phase includes embedded software, cloud-edge integration, testing and the path to volume production. This is of course high level, so if you need more details we can schedule a quick meeting and walk you through everything that is available. We've developed an open-source, customizable IoT sensor platform, with the KiCad PCB design, including schematic, layout and bill of materials, all available on the Fabric GitHub. It is a Bluetooth Low Energy node that supports sensor networks, machine monitoring and electromechanical sensing, connecting to apps for data display and processing. These demos are available for evaluation, and we are developing applications across various verticals with these IoT sensor demonstrators. One example is the IoT platform for smart agriculture, which enables easy integration, field testing and real-time monitoring of environmental parameters like temperature, humidity and soil moisture. It supports applications in greenhouse automation, environmental monitoring, livestock farming, automatic irrigation and
more. On the edge side, we offer a one-stop shop for development, from concept to prototype. We begin with conceptualization: defining the problem and goals and addressing hardware and software constraints. We have a large collection of datasets that we use for training. Next, we move to model training. Depending on the problem, we help our clients select the right model for their application. We have an infrastructure that allows us to train these models efficiently, and we optimize them for deployment at the edge, and we have some examples to show here. For the flow we use for edge development, we use our infrastructure for training: we start with pre-trained models and fine-tune them on custom datasets, mostly on our cluster, which is powered by Tesla V100 GPUs, for training and inference testing. We then test the trained model again, and for edge deployment we use a variety of tools to compress and optimize these models so that they are suitable for edge deployment. We use various edge platforms for deployment; I will show some of them here.
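As an aside, here is a minimal sketch of the kind of flow just described — start from a pre-trained model, fine-tune it on a custom dataset, then export and optimize it for the edge. It is illustrative only: the model, dataset path, class count and quantization choice are assumptions, not CMC's actual scripts.

```python
# Sketch of the fine-tune -> optimize-for-edge flow described above (assumptions noted inline).
import torch
import torchvision
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Start from a pre-trained model and adapt the head to the custom classes (5 is a placeholder).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 5)
model = model.to(device)

# 2) Fine-tune on a custom dataset (ImageFolder layout and hyperparameters are placeholders).
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
])
train_set = torchvision.datasets.ImageFolder("custom_dataset/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

# 3) Optimize for the edge: export to ONNX for a deployment toolchain, and optionally
#    apply dynamic int8 quantization for CPU-only targets.
model.eval().cpu()
torch.onnx.export(model, torch.randn(1, 3, 224, 224), "model_edge.onnx", opset_version=17)
quantized = torch.ao.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "model_int8.pt")
```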
So this is the infrastructure we continue improving. On the cloud side here we have the FPGA/GPU cluster, where we use mostly GPUs for training, and we use a complete software stack for optimization. We also partner with Enser and Storins, who are building custom inference chips for low-power inference applications, and we support the Jetson Orin from NVIDIA for most of our IoT and edge AI demonstrators. Here is one example of the edge AI demonstrators we have built as part of Fabric. We took a state-of-the-art computer vision model, YOLOv9, and trained it on a custom dataset. The main objective here is to enhance worker safety through real-time anomaly detection. This is a big model that we trained and optimized, and we were able to run it at almost 40 frames per second on a Jetson Orin in real time. This allows us, for example, to detect workers who are not wearing their safety equipment.
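For readers who want to experiment with something similar, a rough sketch of training and deploying a YOLOv9 detector for this kind of safety use case might look like the following. It assumes the Ultralytics package, a pre-trained `yolov9c.pt` checkpoint, a hypothetical `ppe_dataset.yaml` dataset file, and TensorRT being available on the Jetson; it is not the actual CMC demonstrator code.

```python
# Illustrative sketch: fine-tune YOLOv9 on a custom PPE dataset, then run real-time inference.
from ultralytics import YOLO

model = YOLO("yolov9c.pt")                                    # pre-trained COCO weights
model.train(data="ppe_dataset.yaml", epochs=50, imgsz=640)    # custom classes: helmet, vest, ...

# Export to a TensorRT engine for faster inference on a Jetson (FP16 here; INT8 also possible).
model.export(format="engine", half=True)

# Stream inference: each result carries boxes, class ids and confidences for the frame.
for result in model.predict(source="rtsp://camera/stream", stream=True, conf=0.4):
    detected = [model.names[int(c)] for c in result.boxes.cls]
    print("detected:", detected)
```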
The second example here is generative-AI-based prompt vision for advanced video analytics. The system we have developed uses OWL-ViT, a powerful vision-language model, to detect objects in real time based on natural-language prompts. You just type what you are looking for, like helmets or people with bags, and the model instantly highlights them in the video stream. The front-end application we have developed shows live statistics for each detected object, with the time it was detected and its number of occurrences, all while running efficiently on the Jetson Orin. We did a lot of optimization of this model so it runs fast in real time, and this is part of our support, through Fabric, to the Canadian ecosystem who want to integrate edge AI into their applications.
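A minimal sketch of this prompt-driven, open-vocabulary detection idea, using the OWL-ViT checkpoints published on Hugging Face, is shown below. The specific checkpoint, prompts and threshold are assumptions for illustration, not the optimized model or front end from the demo.

```python
# Illustrative prompt-based detection with OWL-ViT via Hugging Face transformers.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("frame.jpg")
prompts = [["a helmet", "a person carrying a bag"]]   # free-form text queries

inputs = processor(text=prompts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to boxes/scores/labels in image coordinates.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, threshold=0.3,
                                                  target_sizes=target_sizes)[0]
for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(prompts[0][label], f"{score:.2f}", box.tolist())
```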
Now, before we dive into the workshop topic, I would like to go through some high-level market trends. We expect 29 billion connected devices by 2023, an annual increase of 12%, and the drivers are IoT, 5G and AI. These are not buzzwords; they are transforming many sectors: healthcare, Industry 4.0, precision agriculture, smart cities. On the data side, we are generating massive sensor data, and that data is doubling every two to three years; 75% of it is now processed at the edge, a rise from 10% in 2018. This puts a lot of pressure on edge computing, where you need very high-performance, low-power capabilities at the edge in order to deal with all this data. On the edge transformation: as most of you know, 60 to 70% of tasks are expected to be automated by GenAI by next year, and 60% of it is multimodal, a rise from 1% in 2023. This is extremely fast; it's similar to switching from a flip phone to a smartphone overnight, so it's really something industry is trying to capitalize on. On the edge AI focus: as we know, AI is continuously moving from the cloud to the edge because of these advantages — low latency, bandwidth savings, data privacy, security and autonomy. What about energy? 90% of the power is consumed by data movement. This is a fact, and it has led to high demand for energy-centric hardware, including new innovative approaches to classical computing and even advancements in photonics and wide-bandgap semiconductor materials. If you need to know more about photonics and wide-bandgap semiconductors, we have teams dedicated to these technologies. There is also the exploration of quantum, spiking and analog architectures; we have a presentation from Blumind about analog architectures today, so I'm looking forward to hearing their latest advancements. On security and optimization: trustworthiness, which combines safety, security, reliability and privacy, and a need for standards to ensure interoperability for adoption. So these are the high-level trends, and I think the speakers will cover some of these as well, so we will see if we are aligned here.
So, back to the workshop: I would like to start with our first speaker. Kicking off our lineup is Pierre Paulin from Synopsys. With over 30 years of experience in AI, neural processing and embedded systems, Pierre has helped shape cutting-edge SoC technologies across multiple industries, and today he'll share his vision on cost-effective solutions for generative AI at the edge. Welcome, Pierre. Please share your screen.
Thank you for the introduction. Let me share. So, hello everyone. I'm Pierre Paulin, I'm with Synopsys, based half the year in Ottawa, Canada, and the other half in France, where I'm presenting from today. Normally at 7:00 p.m. I'd have had a glass of wine; I have not, out of respect for this interesting workshop, but I'll have one to celebrate at 10 or 11 p.m. The outline for my talk is a quick introduction to the latest trends in transformers, which are the basis of generative AI. My audio has gone quiet — can anyone hear me? We hear you perfectly.
Yeah, okay, great. I can hear you. Perfect. Then I'll give a very short introduction to a product we've developed called the NPX6, which is a neural processing unit, and then look at the key features of these units needed to support these transformers, and therefore GenAI. Then we'll look specifically at the challenges of mapping GenAI onto a neural processing unit like the NPX6, and, if I have time, a quick outlook — though that will probably be for the panel.
So, an amazing change. I entered this space of vision back in 2010 or so. We were working on set-top boxes in my previous job at STMicroelectronics in Europe, in France, and at that time we were doing algorithmic applications — what we call classic computer vision — using DSPs. And at that time, one of the best object detection algorithms was called SIFT, and it had about 50% accuracy on the ImageNet top-1 benchmark. The revolution happened at the University of Toronto with AlexNet: it took a quantum leap from 50 to 63% in a year or two. And
that's really when we move from what I
call the prehistoric vision times to uh
the the age of CNN's which is already
the medieval times uh in in terms of
what we're doing today. And you know if
you've been in this space you know
AlexNet became VGG, then ResNet, and then these CNNs — the yellow curves — kind of saturated at around 90% accuracy, which is still a
pretty good number but a ton of
innovation over at least 10 years to
achieve that with hundreds of papers uh
hundreds of groups and thousands of
papers. And then the the transformer age
started in in in 2020. And literally
within uh first transformers were
developed in the context of natural
language processing and then there was a
an an application of this approach to
vision and within 6 months they had
already caught up with the you know the
best efficient net uh you know CNN and
we're exceeding that with VIT and
applied to vision.
And so we're kind of hitting the asymptotic limit of the information contained in this ImageNet dataset. But the key message here is that transformers literally revolutionized CNNs and beat CNN results in less than six months.
So transformers, as I've just mentioned, were developed initially for natural language processing, and that's at the basis of things like ChatGPT. But the really exciting learning that happened in 2021 — exciting because we were working in vision — is that transformers as-is can be applied to other domains like vision with very little modification, and we've discovered that models combining attention with classical CNN convolutions outperform the old CNNs even for small models. Initially they were quite big, but we've seen things like ViT become MobileViT and get smaller and more
compact so we truly believe that these
transformers and the attention uh uh at
the at the core of a transformer
is are here to stay. Here to stay,
sorry. And let's give an example of
that. Why do we think they're here to
stay? Let's take a look at what the
state-of-the-art for CNN based
applications, something we call panoptic
segmentation. That's where you have an
image on the left hand side. And the
panoptic segmentation state-of-the-art,
you know, convolution neural network
will then be able to identify uh
instances of different classes of
objects. So cars, for example, you have
a uh the taxi in blue, the uh the
minivan in green. Uh people um and
that's about it in this case. Um beyond
just recognizing the objects, it
recognized different instances, so
they're in different colors. Um and then
it also does semantic segmentation. So
it's not only recognizing the car, but
its exact uh contour. Same thing with
the with the person.
So this was the state-of-the-art uh only
three maybe four years ago uh in
CNNs. If you're building an autonomous driving system, this is very
shallow information about what this
scene is now this is an odd scene but
let's take a look at we say take the
same scene and we simply ask the
question what is unusual about this
image, and we apply this using LLaVA. The LLaVA response is: the
unusual aspect of the image is a man is
ironing clothes on the back of a yellow
minivan while it is on the road. This is
an unconventional and unsafe place to
perform such an activity as ironing
clothes etc etc requires stable surface
blah blah blah. Ironing clothes and
moving vehicles could lead to potential
hazards for both the person doing the
ironing and the other road users. Now,
if I'm building an autonomous drive
software stack, this is the
interpretation I need in order to react.
Otherwise, the other scene just says
there's a pedestrian, there's two cars,
and doesn't say anything about what's
going on. Now, reading the text, it's
obviously some lawyers were involved in
the training here, but uh it's it's
still quite remarkable um this uh
richness of
interpretation and and we really believe
that this richness is needed for the
next generation of AI and as as we've
discovered in the last year or two.
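For context, asking that kind of question of a vision-language model is only a few lines of code with the openly published LLaVA checkpoints. The sketch below is an illustration using the Hugging Face `llava-hf/llava-1.5-7b-hf` weights; the talk does not specify which LLaVA variant produced the response quoted above.

```python
# Minimal sketch: ask a vision-language model "what is unusual about this image?"
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"   # assumed checkpoint, for illustration only
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("street_scene.jpg")
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```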
So let's switch gears a little bit and talk about our neural processing unit, the NPX6. We started this project when we were in Canada; some of my key architects actually studied with Geoff Hinton at the University of Toronto in the 80s, and my hardware architect studied under Yoshua Bengio at Mila, so I got lucky. I actually resisted this at the time. [Crosstalk from an unmuted participant.] Yeah, you can keep going. Okay, we're good — we have muted some participants here; some not-so-pleasant comments, that's all I heard. Okay.
Yes. So, I got lucky. My software architect and hardware architect came to me in 2012 and said, "Pierre, AI is really cool. Look at these new CNNs." I was a bit skeptical, to be honest; I had worked in AI in the '90s with knowledge-based expert systems, and that was a failure. But I took a look, I read the papers, and we said, "Okay, let's put together a small task force of five or six people." We built a small first-generation processor, which we delivered in 2014. By 2014 it was clear this was going somewhere, and since then we developed four more generations — the blue colour here — which are based on and optimized for CNNs. Back in 2020 or 2021 we could see the importance of transformers and this new generation, so we made a big jump to our sixth generation, the NPX6, which is the basis of our current product. That was a big discontinuity: we learned a lot from CNNs, of course, but they were not flexible enough to accommodate these new applications — natural language processing, or Swin Transformer and ViT applications. So, the NPX family: in fact we have three families of cores. We have low-end microcontrollers that operate below 100 GOPS. We have vector DSPs, a general-purpose DSP family that used to be used for computer vision and is now more general; they can do low-end AI applications below one TOPS. Then, once we get to one TOPS, we have a scalable family that starts at the NPX6-1K — the 1K means 1,024 MACs — then a 4K, which is 4,096, all the way up to a 96K, which is roughly 98,000. So roughly 100,000 MACs; that's about 200 TOPS, and that's our biggest single NPU. We can then instantiate up to eight NPUs, which gets us beyond 2,000 TOPS. We introduced this two years ago, and in those two years we've licensed it to over 25 leading-edge customers, half of those in automotive, and some of them are at 2,000 TOPS today. Our leading-edge automotive customers are in the 1,000 to 2,000 TOPS range, and we have other extremes: in-vehicle infotainment at one TOPS, or low-power digital still cameras — leading-edge consumer applications — at one TOPS. So there are three orders of magnitude between our low end and our high-end
architecture. So this is a uh quick
overview of the architecture. Um it's a
scalable architecture. It starts with a
a set of cores shown in yellow here uh
from one to a maximum of 24 cores. Each
core internally has two key components: a convolution accelerator that handles the CNNs and matrix multiplications — it has 4,096 MACs and can run integer-only or with a floating-point unit option — and, attached to that, a generic tensor accelerator that does anything that is not a matrix multiplication or convolution: activation functions, but also a whole bunch of other functions that are not
CNN's. And then finally um we have a
complex multi-level memory hierarchy.
Each core has its own level one memory
inside the core. There are 24 of those.
And then we have a level two shared
memory with a high performance and high
low latency interconnect custom network
on chip um that moves data between the
24 up to 24 cores and the level two
shared memory and of course the external
memory external DRAM memory.
Uh each of these cores has its own local
DMA and we have a top level DMA called
the streaming transfer unit and each of
these cores has its own internal
controller, a small RISC controller, and at the top level there are also a couple of controllers.
Um even though the block diagram you
know talks a lot about hardware uh this
is a large you know doubledigit team um
and exactly half that team is developing
the tools which is probably the single
biggest challenge even more difficult
than the architecture design. So those
are compilers runtime SDKs simulators
platforms. So key architecture features
one of the the the objectives we gave
ourselves when we moved from our fifth
generation of CNN based machines to this
more general class was really to go
beyond CNN and support things like uh
RNN which was already getting old but
mostly transformers, GenAI and recommender networks — the classes of applications that emerged in
2021 and beyond but not only uh moving
beyond AI applications but also the
types of sensors we're using. So,
initially we're mostly focused on
vision. We generalized to multiple
sensor classes like radar and lighter
which are heavily used in automotive
which is about half of our customer
base. The other lesson the hard lesson
was flexibility is is essential
everywhere. Uh we kept on thinking CNN's
were going to stabilize at some point.
Um we were proven wrong every
generation. So we added more flexibility
and in this architecture we went even
much further. So we have a fully
programmable um what we call our generic
tensor accelerator which complements the
CNN. Both are extremely flexible and the
the uh the generic tensor accelerators
is fully
programmable. We also went wider in data types, with integer 8, integer 16 and integer 4, as well as an option for floating point 16 and BFloat16. So that's the flexibility side, which is a key objective. The other objective is continued improvement in efficiency. We've seen a MAC utilization improvement of about 1.5x to 2x based on all our lessons learned around state-of-the-art CNNs like MobileNet and EfficientNet, and then focusing on GenAI like Stable Diffusion, Llama 2 and so on. We also brought in sparsity: we have a form of structured sparsity very similar to what's used on general-purpose GPUs like NVIDIA's — those of you who are familiar, it's called structured sparsity. You get somewhere between 1.4x and close to 2x performance increase by using this structured sparsity.
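To illustrate the idea referred to here: in 2:4 structured sparsity, within every group of four weights the two smallest-magnitude values are zeroed, so hardware can skip half the multiplies while keeping a regular access pattern. The snippet below is a conceptual sketch of that pruning pattern (fine-tuning normally follows to recover accuracy); it is not NPX6 tooling.

```python
# Conceptual 2:4 structured sparsity: keep the 2 largest-magnitude weights in every group of 4.
import torch

def prune_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    w = weight.reshape(-1, 4)                      # groups of 4 along the flattened weight
    idx = w.abs().topk(2, dim=1).indices           # indices of the 2 largest per group
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 16)
w_sparse = prune_2_of_4(w)
print("kept fraction:", (w_sparse != 0).float().mean().item())   # ~0.5
```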
All the R&D is around bandwidth reduction. The challenge there is moving data: that's the challenge in power, and the challenge in complexity and in software tool features is all around data movement. It's not about putting down hundreds of thousands of MACs — that's the easy part; putting memories all over the place is easy. It's about intelligently moving the data through the architecture. And then we also
improved latency because it's not only
about getting high throughput using high
batch sizes as was the tricks used in
the you know the 2000s and 20s early
2020s people use high batch size in
automotive it's not about it's not about
throughput it's about latency. It's the
time you detect a pedestrian or a guy
ironing on the back of a van. Um the
time to detect that was mostly important
and not as much the
throughput. And finally, we continue to make power-efficiency improvements based on different techniques, such as gating. I'm not going to explain this,
just to say if you add up all the
different things that happen, you have a
level one risk core doing control, you
have a DMA, you have a convolution
accelerator, your generic tensor
accelerator doing activations and soft
maxes, and then you have your output
DMA. Add all of these activities
together, you have 13 ways, 13 parallel
activities that need to be sequenced and
scheduled. Uh, which gives a hint of
some of the challenges of dealing with
the complexity of these
architectures. So, we've invested
heavily like I said exactly or even a
little bit more than half of our team is
building these compilers and runtime.
So, it takes standard representations like PyTorch and TensorFlow; we convert that to the industry-standard representation, ONNX; we compile that, and it generates an execution plan interpreted by a runtime. Then 99% of this runs on the NPU, but you can have special secret sauce for certain customers that may not run directly on the accelerator and can run on our vector DSP family, and that is done also by the tools.
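The front end of a flow like this — getting from a framework model to the ONNX exchange format that an NPU compiler consumes — is standard PyTorch tooling; the compilation to an execution plan is vendor software and is not shown. A minimal sketch:

```python
# Framework-to-ONNX step that typically feeds an NPU compiler flow like the one described above.
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "mobilenet_v2.onnx",
    input_names=["input"], output_names=["logits"],
    opset_version=17,
)

# Quick sanity check that the exported graph runs (requires onnxruntime).
import onnxruntime as ort
session = ort.InferenceSession("mobilenet_v2.onnx")
print(session.run(None, {"input": dummy.numpy()})[0].shape)   # (1, 1000)
```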
So, different use cases. For exploration, we compile onto virtual platforms like Platform Architect and Virtualizer, which are tools developed in other groups at Synopsys; we have functional and performance models; we have emulators and boards. So what can you do? Here's a simple example with a YOLOv5 model, where you're exploring the impact of bandwidth on throughput. You might start at 250 GB per second as an upper bound on bandwidth, assuming you have expensive HBM interfaces, and look at the impact of bandwidth on frames per second. The other dimension here, shown by the colors, is the size of on-chip memory — the CSM is our level-two cluster shared memory. The purple curve has no on-chip level-two memory, while the yellow curve at the top has 16 megabytes for this machine. You can see there are different trade-offs: with more memory you're less sensitive to bandwidth, because you can store more data on chip and are therefore not as sensitive to DRAM. Say you had a target of 500 frames per second: quite a few data points meet that target, so you can trade off — do I want to spend more money on bandwidth, which has cost and power impacts, or do I want to spend more money on memory in order to reduce bandwidth? With the green curve, for example, which is a nice trade-off, you can achieve 600 frames per second with 32 GB per second and 8 megabytes, or even go down to the red curve, which is 4 megabytes, and still achieve just above 500 frames per second. The tools let you do this automatically, and that's a key thing here, because doing this manually is a non-starter: the complexity of these machines does not allow you to do even one of these 25 data points in less than days or weeks, whereas this can be done in a couple of minutes with our exploration tool.
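The shape of that trade-off can be reasoned about with a toy roofline-style model: frames per second is the minimum of a compute bound and a bandwidth bound, and adding on-chip memory reduces the DRAM traffic per frame. The numbers below are made-up illustrations, not NPX6 data or the Synopsys tool.

```python
# Toy roofline-style estimate of FPS vs. DRAM bandwidth and on-chip memory (all numbers illustrative).
def frames_per_second(dram_gbps, onchip_mb,
                      macs_per_frame=8e9, peak_macs_per_s=6e12,
                      traffic_mb_per_frame=60.0):
    # Crude model: on-chip memory absorbs part of the per-frame DRAM traffic.
    dram_mb_per_frame = max(traffic_mb_per_frame - 0.8 * onchip_mb, 5.0)
    fps_compute = peak_macs_per_s / macs_per_frame          # compute-bound ceiling
    fps_bandwidth = (dram_gbps * 1e3) / dram_mb_per_frame   # bandwidth-bound ceiling
    return min(fps_compute, fps_bandwidth)

for onchip_mb in (0, 4, 8, 16):
    row = [round(frames_per_second(bw, onchip_mb)) for bw in (8, 16, 32, 64, 250)]
    print(f"{onchip_mb:>2} MB on-chip:", row)
```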
So, I talked about transformers, and I'm going to go fast — I think we're running out of time. Just to say there are key features: there are features in the convolution accelerator that are unique and different from CNNs, things like matrix-matrix multiplication instead of just the matrix-vector multiplications used for CNNs. You need matrix-matrix; you need feature maps appearing on both operands and not just on one side — those are just examples. You need this very flexible generic tensor accelerator to do things like softmax, layer normalization and new activation functions like GELU. And finally, you need a dedicated DMA that does complex things like embedding lookups. These are all features needed to support the constructs of a transformer, which are the basis for GenAI.
So, just to give you a sense of the efficiency: we've run the vision transformers ViT and Swin for different input sizes. This is on our single core, the 4,096-MAC version, and you can see our MAC utilization varies between 60 and 70%, and our bandwidth is in the range of an LPDDR5, somewhere between 20 and 30 GB/s. Of course, under NDA with our customers you can get the exact numbers; I'm just giving you a range here. A key message is that if you run these on a GPU, MAC utilization is typically below 5%, sometimes 10, rarely above 20. Machines in the embedded space need to be much more efficient; these are more dedicated machines than a GPU, and you can get much higher utilization at very low area and power.
So, GenAI, with Stable Diffusion as the example. I'm going to skip this because we're out of time; it'll be in the material we leave behind. I just want to show how this compares. Our NPX6-32K running in dense mode, where there's no sparsity, will match an RTX 3060. The 32K with structured sparsity will match the Titan RTX. This is about 30 frames per second for Stable Diffusion version 1.5. So that's a $200 machine, and it consumes about 200 watts. Just as a ballpark, these NPX machines are less than 10 mm² in 5 nm; a Titan RTX is many hundreds of mm². Maybe even more importantly, the NPX consumes less than 2 watts, compared to 200 watts on a general-purpose GPU like a Titan. And that's our mid-range machine; we can go higher, of course, with the 64K, and then we're approaching the state of the art of a year and a half ago, when this chart was developed. But I think the key message here is that by specializing — by developing a neural processing unit, not a GPU variant — you can get these two orders of magnitude of power reduction and an order of magnitude or two of area reduction.
This applies to Stable Diffusion, but we can also apply it to GenAI like Llama 2. All I want to say about Llama 2 is that its real challenge is bandwidth limitations. You need to do tricks to reduce your coefficient size — go from integer 8 to integer 4. Internally you can use higher precision, like integer 16 or FP8/FP16, to preserve accuracy, but the bottleneck to DRAM is the coefficient bandwidth and the large model sizes. What we've discovered is that, basically, if you use all the available bandwidth, we match any public result using the same amount of bandwidth, because it is completely bandwidth limited and not resource limited — which is not the case for Stable Diffusion, which is compute
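A quick back-of-the-envelope shows why decoder LLMs end up bandwidth limited: each generated token has to stream roughly the full set of weights from DRAM, so tokens per second is bounded by bandwidth divided by model size — which is also why dropping coefficients from int8 to int4 roughly doubles throughput. The figures below are illustrative assumptions, not Synopsys benchmark data.

```python
# Upper bound on decode speed when weight streaming dominates (illustrative numbers).
def max_tokens_per_second(params_billion, bits_per_weight, dram_gb_per_s):
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return dram_gb_per_s * 1e9 / weight_bytes

for bits in (8, 4):   # int8 vs int4 weight compression
    tps = max_tokens_per_second(params_billion=7, bits_per_weight=bits, dram_gb_per_s=25)
    print(f"7B-parameter model, int{bits} weights, 25 GB/s DRAM: <= {tps:.1f} tokens/s")
```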
limited. So, to summarize — and I ran a few minutes over time — transformers are the baseline for these deep learning models. They were developed initially in the field of natural language processing, and we discovered in the early 2020s that within six months they were able to achieve state-of-the-art vision results in other domains. At that time, in 2020-2021, we started the design of the NPX6 generation and took a bet that transformers would be here to stay. That bet has so far proven right: we've seen not only that transformers remain, but that they're the building block for the latest generation of GenAI, like Stable Diffusion, Llama 2, ChatGPT and the very latest mixture-of-experts-based approaches like DeepSeek, which go further and make more compact models by having a dynamic model size — and we can support that as well; we've already done preliminary benchmarking for DeepSeek. So this is moving quickly. These models were initially in the cloud on high-performance, high-cost, high-power GPUs, and they're moving quickly into the embedded space, and we believe we're prepared for this space. Thank you. I'm happy to take one or two questions.
Thank you, Pierre — or we can take them in the panel. Thank you, Pierre. Just a reminder, you can post your questions in the chat and they will be addressed by the speakers. Since there are no questions in the chat yet, I'll ask one myself, Pierre: how do you keep up with rapidly evolving AI models to ensure the hardware architecture remains compatible and efficient?
Yeah, so far so good. It's been five years since we built the spec. We took fundamentals and basic primitives as the building blocks, and we made everything programmable. That being said, it's programmable and efficient around a certain class of applications, which today are transformer dominated, with a lot of flexibility and complexity — but they're still CNNs and transformers. Hopefully that's still true for the next couple of years, and therefore we have a market that's valid for us. If there's a completely new invention, we'll discover it with the rest of the world. But for the moment we don't see that; the choices we made in flexibility around this class of transformer-based and GenAI-based applications have held up so far. I don't have a crystal ball, though — will there be a new invention in two, three, four years? This happens in AI every five years.
I have a question from the chat: are you planning on targeting the bigger parameter models, or is there interest in smaller, more specialized models as well?
That's more a question about our customers, and the answer from our customers is the latter. We're in the embedded space, not in the cloud. We have one or two customers kicking the tires, but most of our committed customers — half of them automotive, the other half high-end and low-end consumer — really want these smaller, more specialized models, because it's completely bandwidth limited. So it's not realistic to use the large models that are used in the cloud.
Okay, thank you, Pierre. Next up is Davis Sawyer from NXP Semiconductors, a Canadian tech entrepreneur and AI products marketing manager. Davis also chairs the Edge AI Foundation's industry working group and brings a unique blend of business and technical insights. Today he'll talk about secure, fine-tuned LLMs for GenAI at the edge. Welcome, Davis.
Awesome, thank you — and thanks, Pierre. A great way to kick things off: a lot of great background and context on how we've come to this place,
and looking forward to diving in. Um you
know yeah it's definitely true that I
think one key insight was you know CNN's
were computebound and transformers are
now more memory bound and it's true that
we definitely at the edge especially
edge semiconductors play uh we we kind
of inherit what happens at the cloud and
some of the innovations there and then
you know look for markets look for
opportunities and build silicon that can
support that. I think interestingly for
this talk, despite NXP obviously being a semiconductor company, I'm
actually going to spend more time
talking about software and some of the
tools that we've built on top of our
SOC's and products that help make it
easier to deploy whole applications. Um
we definitely need you know these
benchmarks and these uh you know
testimonies to performance as a way of
uh both attracting customers and also
backing up and justifying is this you
know viable for for product practical
use cases. In this talk, I'm going to
show some of the some of the software
pipelines we've built um that are now
available, which is exciting. So, I'll
definitely point to some links and some
GitHub repos that the audience can can
access as of today to start seeing for
themselves some of the the value we
think we've created here. But, I'll dive
in assuming you can see my slides,
assuming you can hear me. Uh, everyone's
kind of gone quiet, so I'm going to jump
in here. I know we have limited time, so
I'll try to be as effective as
Yes. Okay. All good. All good. Thank
you. Excellent. So, here's the high
level overview today. What we call the
intelligent edge, which I'll define a
little more specifically. The edge can
be enableless term. So, I'm going to try
to be precise in in terms of what we
target. I'll also give a high level
overview of our AI software stack and
neutron MPU. You know, like like
Synopsis, like others, we have a
portfolio of in-house, but also licensed
MPUs um that I think give a a good range
of flexibility to our products that meet
different workloads. It's really about
rising to meet the needs of of what the
application demands and and having the
support for that both from throughput,
memory, CPU usage uh price performance,
power performance, all that kind of
stuff. Uh I'll then do a deeper dive on
our GenAI Flow and RAG database generator. These are two distinct software tools that I think help create those fine-tuned, secure LLMs we referenced in
the title. Um then I'll give a bit of a
what I think is a sneak peek to the the
future of where we see the edge market
which is enabling multimodal geni and
some recent strategic moves that NXP has
made to help support that again from our
product portfolio. I'll wrap up the
summary hopefully some questions and
looking forward to the panel as well. So
when we talk about the edge uh and we
talk about the opportunity we focus on
of course there were some some companies
you know named name named earlier and in
GPUs space specifically that dominate we
think is the training opportunity and
there is some training shift to the edge
definitely see that in factories uh
maybe locally for smart home hubs maybe
automotive as well where you have some
kind of you know connectivity shaping
how how these models are updated which
is the training part um but we focus on
the the prediction part and I mean one
of my favorite sayings in the space is
training is once and inference is
forever for any specific model so I
think that there's a big opportunity
when you focus on even just the
inference piece um and what that means
for a IML uh software and supporting uh
you know
devices how NXP looks at the intelligent
edge has a few lenses and there's
definitely not enough time to cover all
of our enablement today and all of our
options and portfolio so I'll focus on a
few key messages. One key message is that we scale up from our MCX N MCUs, which are kind of the lowest footprint — physical size, power consumption ratio — enabled by the Neutron NPU. As of today we have a lot of, say, time-series and sensor data driven by that, and some interesting use cases that have already been built over the last few years. We've also more recently introduced this crossover brand of MCUs under the i.MX RT flag, which again support the wearable category and power-sensitive use cases, but also bring some ML capabilities into that market. Then our i.MX applications processors — that's really our bread and butter of
computer vision voice time series data
as well scaling up to the kind of stuff
that Pier was alluding to with
transformer-based workloads which of
course dramatically improve on the
accuracy you can do with computer vision
use cases or similar perception use
cases in general but there definitely is
this new class of applications that are
enabled by the reasoning or cognitive
abilities of LLMs and vision LLMs um
which of course I'll actually give some
demos of in a second which will be
pretty cool and so this is built on top
of our our EIQ software stack for our
customers to have an easier path and and
faster path to market by really
demystifying and simplifying a lot of
the stuff that has to happen I mean you
don't need maybe optimize every model as
as thoroughly. You don't need to do
retraining in many cases. But for those
cases where you have say the whole
spectrum of offtheshelf versus heavily
optimized, EIQ really rises to meet
those demands. We have a few specific
components that are a bit newer that I
won't cover again in depth today, but
want to mention time series studio
that's initially focused on MCUs but now
running on ID Automix as well specific
specific SOC's. Um that helps again for
AutoML for time series models. I will
focus today on the geni flow because one
of the themes is geni and I think
actually we cover almost every mega mega
theme mentioned for today by us at the
start um except for the AI and harsh
environments which will be cool to get
to but we cover model optimization we'll
cover software tools we'll cover
hardware design we'll cover um geni at
the edge of course and so again this is
meant to be flexible with both you know
productivity enhancement but also energy
efficiency and of course the performance
needed for the
task wouldn't be a good AI talk if we
didn't mention the eIQ Neutron NPU, which again fits that scalability story, where we're scaling up from the MCX N you see on the lowest end here, with that iteration scaling up to products that exist today with, again, external NPUs. I think that's how NXP has approached the market and may continue to, with how we try to produce a flexible product portfolio. So our i.MX 93 uses the Arm Ethos-U65 microNPU, a pretty capable engine with a software pipeline and eIQ components that really help get the most performance possible there. Our i.MX 8M Plus has been in the market for a few years, and then our kind of lead-puncher flagship for GenAI applications is our i.MX 95, also available today, which actually uses our in-house NPU, which on paper we list as a two-TOPS engine. I actually think when you see the performance it punches above its weight. That's certainly true of
CNN's so your classical your not
classical but classic uh classification
and uh object detection segmentation
CNN's some of the stuff that was covered
earlier again to the newer generation of
models that are transformer-based um and
power workloads like vision transformers
of course And actually, I'll focus a bit
more on the LLM side. Um, we've built a
really capable voice UI, voice AI
pipeline that when you drop in an LLM at
the edge, uh, for both privacy and
real-time response reasons, when you
have the silicon to power it, um, can
create really interesting HMI, you know,
human machine interface and other
application spaces as well that weren't,
let's say, possible a few years ago even
until we had this transformer
breakthrough and then of course the edge
silicon to power
it. an underrated part because I think
it's a con, you know, one classic uh
perspective of AM practitioners is that
it's all about AI but I actually think
you know especially how NXP approaches
it with security in mind and really
best-in-class security, including, as I've recently learned, post-quantum cryptography, which is super important for financial, automotive and regulated domains.
we deliver that today. So when you
combine security plus intelligence at
the edge uh I think it's a very very
compelling offering, and that's what NXP embodies.
Now I'll go a little deeper into GenAI Flow, which is something I'm happy to say that I own and drive here at NXP.
Um and I have a lot more innovation
coming in the back half of this year.
This actually exists today and I'll
point to a couple resources but to give
you a sense of why we built this and
what the problems it solves. I'll
actually look forward first at what are
our GenAI ambitions? What are we trying to
do here? So we already play in a wealth
of markets. Automotive being one
consumer uh to some degree mostly smart
home industrial uh smart building power
management uh wide breadth. Because of
this wide breadth, we have a big
opportunity to bring gener into these
domains that some already have some AI,
some AI is a bit newer technology, but
all the same, we have these new
capabilities that we want to be able to
leverage. And this is just I'd say not a
not an exhaustive list, but certainly a
good glimpse of places where we're
making a difference
already. The challenge though is you
have this geni landscape that we all
know is is changing rapidly. And so the
way that I tried to tried to parse this
and tried to present this for you know
audiences like today
is you have this core stack in the
middle. These are libraries we're
familiar with like the AI frameworks.
Then you have a lot of let's say
necessary components that you need to
either support or interface with somehow
also communication protocols as well as
be another big one for us. But when it comes to the edge AI stack, this is the kind of classic stack "sandwich" diagram we see in a lot of places. I tried to bifurcate that from the moving
parts of AI that actually are really
dynamic. you get lots of innovation but
you know do you need to focus on as your
core expertise and then if you don't how
do you leverage it and so the takeaway
that I tried to distill for for us in XP
and for the edge is to solve problems by
creating optionality to leverage the
best of what changes so as we saw in the
previous talk you know leveraging
transformers huge deal and that actually
probably will change frequently what
comes next we don't know but having the
optionality to leverage it will be
important and so we also want to
optimize what doesn't change as often we
have this paradox of models changing
weekly or monthly but silicon and
products that need to live at the edge
in in devices in products for you know
10 to 20 years in some
cases excuse me so because of this
paradox we need to find a way to balance
both uh performance so getting the best
of what's possible while while having
longevity and I think when it comes to
IMX in particular longevity is a big
part of of what we do and stand for now
the challenges we have with GenAI in industry. We should all be familiar with this from even just playing with ChatGPT and other off-the-shelf or API-based LLMs: they're limited in context understanding; obviously, they're computationally expensive — that's a given; and they have hallucinations, so they just make stuff up, and combating that is actually a pretty deep part of how they work. That is not something readily
solved, let's say, unless you have some
some some powerful software
contributions, which again, rag is one
that's talked about a lot and I think is
here to stay as well. Um and and we
focus on NXP. And then errors and
reasoning. All this to say is that if
you want to use this for a a a use case
where there's material or people
involved, um you won't you can't have
any of these issues. You have to address
them somehow. And again, this happens in
software in most cases. And the sweet
spot we found by kind of assessing
what's out there and the approaches we
have. Uh on the bottom, you know, this
is how most of us use it for day-to-day
tasks that are not mission critical. On
the top, that's limit to a very small
audience in the world with the skills
and and and compute to do so. But here
in the middle, kind of goldilock
scenario, we have this ideal performance
overhead trade-off. And what we like
specifically about rag is it protects
users IP from the model creation even
from ourselves actually. We want to have
this environment where you have this
again secure and fine-tuned LMS without
compromising you know things you can't
compromise and this also helps you know
lower the the time to market where
you're not retraining um because you're
creating a database you're creating
something that can be stored on on the
edge device by parsing domain knowledge
specific knowhow in different forms to
be interfaced with an LM and actually
voice UI, as we've done in GenAI Flow. The other problem, of course, is model size. I won't belabor this point, but one interesting rule of thumb is that for every billion parameters we need about a gigabyte of memory — not bandwidth, but actual memory to store these weights and models. So in the integer-8 precision, that data type, it's around a gigabyte per billion parameters, and what this means, of course, as Pierre alluded to, is that these models tend to become memory bound.
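That rule of thumb is easy to check: weight memory is just parameter count times bits per weight, so halving the precision halves the footprint (KV-cache, activations and runtime overhead are ignored here — this is only an illustration).

```python
# Weight-memory rule of thumb: ~1 GB per billion parameters at 8 bits per weight.
def weight_memory_gb(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (0.5, 1, 3, 7):
    sizes = {f"{b}-bit": round(weight_memory_gb(params, b), 2) for b in (16, 8, 4)}
    print(f"{params}B params -> {sizes} GB of weights")
```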
Because of these two precursors, we want to have fine-tuning and we want to have optimization, and we've baked this into a software tool we call GenAI Flow. This program is available for free to deploy off-the-shelf models. We also have a commercial version that we provide for customers on, again, the i.MX 95 — the flagship product today — and possibly other families in the
future but that's where we really bring
both of these the capabilities the
finetuning to adapt these LMS to your
domain knowledge without compromising IP
but also eliminating those errors and
reasoning those hallucinations that you
have to for industry use on their side
optimization to get the best performance
the performance you need really I would
say the best performance is always
needed but acceptable performance
actually gets the job
done. A little deeper dive on this: it's made of modules. These modules are the building blocks of GenAI use cases that we found to be quite common. When it comes to voice, you have speech-in and speech-out components, and wake events — think "Hey NXP", "Hey XYZ" — that we want to wake up on; it could also be a visual event that triggers this in the future. The current version is focused on conversation, as I mentioned, and I am
trying to go fast here. I apologize it's
uh just you know obviously time limited
but happy to go deeper on any of these
uh topics in the future. A quick demo of
what we're doing with uh with um this
rag engine and why it's so valuable
especially for contexts like medical is
it can actually have answers tailored to
data that is grounded in factuality,
grounded in truth and relevant to just
domain. So this is using an older bigger
model. We've actually I think had the
response time a lot faster with this.
You can see the text being generated.
We've introduced a streaming mode so you
don't have to wait for all the tokens to
be to be produced. you can actually
start producing earlier tokens faster.
It's conversational, and there you get the TTS — that's kind of the full use case of this: the GUI plus the voice UI, powered by LLMs, which we didn't have before. It brings a lot of capability.
A closer look at the numbers here: where we were before, using only the CPU, it doesn't make sense. People read at around 5-10 tokens per second, so the raw token rate can be acceptable, but that 1.5-second delay just isn't natural — it isn't suitable for conversational AI. Of course, we can improve that heavily when we start using the Neutron NPU: that greatly accelerates time to first token while achieving a meaningful token speed. Just for reference, this is six Cortex-A CPUs versus a single NPU — we're getting the performance of about six high-performance Cortex-A cores out of our Neutron NPU on the i.MX 95. Pretty cool.
As I mentioned at the start, this is
available today. We'd love for you guys
to start playing with the voice UI that can be built with GenAI Flow today, with future engines coming out later this year. The RAG database generator is also a unique tool where you go from PDF in to database out. I think that's super effective for getting a sense of the quality — how effective this process is at fine-tuning answers — but also the efficiency, because relative to the size of an LLM, these databases are designed for the edge and quite small.
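To make the "PDF in, database out" idea concrete, here is a generic sketch of what such a RAG pipeline does: chunk the document, embed the chunks, store the vectors, and retrieve the most relevant chunks for a query before handing them to the LLM. This is not NXP's GenAI Flow tool; the libraries, file names and chunk sizes are assumptions for illustration.

```python
# Generic sketch of a RAG database build + retrieval step (illustrative, not NXP tooling).
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # small embedding model, edge-friendly

# 1) Parse the PDF and split it into overlapping chunks of domain knowledge.
text = " ".join(page.extract_text() or "" for page in PdfReader("device_manual.pdf").pages)
chunks = [text[i:i + 500] for i in range(0, len(text), 400)]

# 2) Build the "database": one embedding vector per chunk, saved for use on the device.
db = embedder.encode(chunks, normalize_embeddings=True)
np.save("rag_db.npy", db)

# 3) At run time, retrieve the top-k chunks for a user question and hand them to the LLM.
def retrieve(question, k=3):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(db @ q)[::-1][:k]
    return [chunks[i] for i in top]

context = "\n".join(retrieve("What is the maximum operating temperature?"))
prompt = (f"Answer using only this context:\n{context}\n\n"
          "Question: What is the maximum operating temperature?")
```

The retrieved chunks are what keep the model's answers grounded in the user's own documents rather than in whatever it memorized during pre-training.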
I'll wrap up with a quick glimpse of
like I said the future before hopefully
time for a couple questions here. Um you
know, we've made a big move in intending to acquire Kinara, whose NPUs are dedicated to the latest and greatest workloads with quite efficient power, which fits the edge while expanding nicely on the host SoCs, which of course are NXP's bread and butter. So we think this
is a really great uh union of of
technologies and teams and commercial
focus as well. And to give you a sense
of that as well, one space I focus on
might be an unsexy industry, but it's
industrial. And I think industrial is
ripe for geni innovation for reasons
like we see here with using um you know
uh visual, you know, both vision
transformer plus multimodal LLM to
understand what's happening in in a
series of
images. Um I won't go through all of
this here, but you can get I think this
is a great example of the kind of visual
intelligence that could be layered with
an agent for example to then notify
emergency services, notify supervisor
and actually take an action. So going
from perception to perception plus
action is a big theme for us with GenAI at the edge. So, a quick summary: we can bring domain-specific intelligence to the edge. These LLMs can be optimized and deployed, and they can also be fine-tuned; we then leverage the efficient acceleration we have in the i.MX family and in integrated or discrete NPUs, plus this GenAI Flow, which really serves as a one-stop shop. That's how we bring GenAI to life at NXP today. Thanks again for your time, everyone, and hopefully there's time for a few questions.
Thank you Davis for this great presentation. There's a lot of material to digest today. I have a question from my colleague at CMC, James Miller: is there anything in the stack that helps real-time system developers ensure timely, bounded results? For example, guaranteed critical responses within 50 milliseconds for safety in applications like robotics and automotive.
Yeah, so for automotive I would point to eIQ, our AI stack that is qualified for that space. I think there are some components of it that help with these deterministic requirements. One problem, of course, with LLMs and AI in general is their stochastic or probabilistic nature, so when it comes to LLM throughput, putting a hard cutoff on their responses might be a little trickier. But for things like the infotainment system in automotive, we've already deployed LLMs at reasonable conversation speeds, so for that kind of application we can already see a lot of innovation happening. For the kinds of applications that have these harsh requirements, you need a provider with an automotive-grade AI and hardware stack to meet those needs. Yeah, thank you Davis. See you in the panel. Thanks.
Our next speaker is Professor Warren Gross, a James McGill Professor and chair of the Department of Electrical and Computer Engineering at McGill University. Warren's research bridges algorithms and hardware, with a focus on efficient deep learning models and hardware for machine learning. Today he'll present on parameter-efficient fine-tuning of transformer-based language models using dataset pruning. Please join me in welcoming Warren. Warren, the stage is yours. Thank you very much. It's a pleasure to be able to speak again at this Accelerating AI workshop. This is not the first time I've been here, and I always enjoy the interactions and the talks at this workshop. Can you all see my slides? Okay. Yes, and we hear you fine. Thank you. Okay, great. So what I'm going to do today is talk about fine-tuning language models, following on from what Davis was talking about, and I'm going to discuss some things you can do in the training process to make the fine-tuning more efficient.
So we're talking about LLMs, language models. There were great introductions to this area and to transformers in the last two talks. I just wanted to show you something I found online as of late last year, some of the state-of-the-art LLMs. What we're seeing is that, in terms of the number of parameters in these LLMs, we're now talking about a trillion-plus parameters. These are absolutely enormous. And looking around, trying to find information about the training cost of these models, you see that you really need upwards of $100 million to train one of these large language models from scratch. Now, things changed very dramatically at the end of the year when we saw DeepSeek, which has a dramatically lower training cost of about $6 million. It still has a lot of parameters, though it is smaller, at 671 billion. So there are things we can do in model design, and also clever training, to reduce this training cost, but there still needs to be more attention paid to the efficiency of training to bring it down even further. Looking at the trends in transformer size in terms of the number of parameters, what we're seeing is an exponential increase in model size at a rate of about 10 times per year. That is a very significant increase. And what we find is that as models get bigger and bigger, you also need more and more data to accurately train them. So there are really two pieces to this: how you train efficiently, considering the hardware complexity and the hardware you're training on, but also a dataset piece, how you handle the datasets. We're going to talk about both parts in today's talk.
The way the training challenge is addressed in large language models is really broken down into a two-stage process. The first stage is pre-training, and that's the expensive stage. This is when you train from scratch on a huge pre-training dataset. This is what takes the millions and millions of dollars, and it's really only something that can be done by large companies that have access both to the large dataset and to the computational resources needed to do it. But then, once you have this general, large pre-trained model, you can fine-tune it in a second stage of training on a specific dataset and for a specific task. This fine-tuning dataset is usually smaller, and the fine-tuned model is then adapted to solving a particular task. In this talk, we're going to focus on the fine-tuning stage of this process, given an existing pre-trained model.
And so I wanted to talk about two key hardware metrics in the training process. One is the training time, the amount of time it takes to train from the start to the end of the process, and the other is the peak memory usage. Why is training time important? Because it directly influences metrics of interest to everyone, for example energy usage. The longer it takes to train, the more energy the process will take, and the energy usage directly impacts the electricity bill or the battery life of the device, as well as the carbon footprint. The peak memory usage, on the other hand, determines the minimum amount of memory you need to allocate on your device, such as a GPU or an NPU. More memory means more expensive devices, and since you need many devices, this can be a very significant cost, as well as a training difficulty: if your model doesn't fit in the memory of your GPU, for example, you may need to partition the training process and move data in and out, which complicates training. The other key aspect of memory is the number of memory accesses, and this actually has a very strong influence on energy usage. So these two metrics are not completely independent, but they're both things we want to decrease: we want faster training and less memory.
I'm going to introduce this picture on the right, which we'll use throughout the talk to show the effect of the different innovations that can be applied to model training. On the vertical axis we plot peak memory usage, and training time is on the horizontal axis. When you compare pre-training and fine-tuning, you see that fine-tuning has a much smaller training time than pre-training, but you'll notice they both use the same amount of peak memory, because we haven't changed the model. What we want is to find techniques that move us towards the bottom left of this graph, with low peak memory usage and lower training time, and we want to do all of this without negatively affecting the model accuracy.
The first thing we can do is use a technique called parameter-efficient fine-tuning. The basic idea is to avoid updating every single model parameter when you're fine-tuning. You can freeze large parts of the model weights, not update them at all, and focus only on a subset of the weights during fine-tuning. This has a couple of benefits. If you look on the left, this is the normal operation of training: first you do a forward pass, then you compute gradients, and then you update the weights using those gradients. The gradients and the weights have to be moved in and out of memory, which sets both the amount of memory you need and the time it takes to perform the training; the memory holds the weights, the gradients, and the other optimizer states. In a frozen model, you still have to perform the forward pass, but for a large portion of the parameters, the frozen ones, you don't have to compute gradients or update weights. That reduces the amount of data going back and forth between the compute and the memory, and you don't have to store all the gradients and optimizer states, so peak memory is reduced quite substantially. Will it reduce the training time? A little, but since you still have to do the forward pass, the reduction in time is not really significant. It's really the memory savings that you're achieving here.
here. And the most popular way to do this is a technique called Laura, low
this is a technique called Laura, low rank adaptation. And there you freeze
rank adaptation. And there you freeze most of the model. And and it's easiest
most of the model. And and it's easiest to think of freezing models in terms of
to think of freezing models in terms of freezing layers, but some of the layers,
freezing layers, but some of the layers, for example, the attention layers in a
for example, the attention layers in a transformer will be the ones that are
transformer will be the ones that are are are fine-tuned. Um and when you look
are are fine-tuned. Um and when you look at a layer that is going to be um not
at a layer that is going to be um not frozen, what you what you do is you take
frozen, what you what you do is you take the the linear layer which is the m byn
the the linear layer which is the m byn matrix and you actually freeze that and
matrix and you actually freeze that and add additional parameters in parallel to
add additional parameters in parallel to that linear layer. Um so it does involve
that linear layer. Um so it does involve adding a few extra parameters but this
adding a few extra parameters but this not too many because uh this parallel
not too many because uh this parallel layer is an n by r concatenated with an
layer is an n by r concatenated with an r byn matrix where r is a very very
r byn matrix where r is a very very small number. So you have a small
small number. So you have a small additional number of parameters, but in
additional number of parameters, but in terms of training, it makes it much much
terms of training, it makes it much much much easier. And in inference, you're
much easier. And in inference, you're going to use both of these layers. So
going to use both of these layers. So this will result in fewer memory
this will result in fewer memory accesses and smaller peak
accesses and smaller peak memory. And what is the effect of doing
And what is the effect of doing parameter-efficient fine-tuning like LoRA? Well, the peak memory is considerably reduced. You can see on this graph that, compared to standard fine-tuning, fine-tuning with LoRA reduces the peak memory; the training time is also slightly reduced, but not by a lot. So now we're asking: how do we further decrease the training time? To do that, we need to look at the other part of the equation, which is the data you're training on. When you look at a dataset, which is a collection of data samples, you realize that not all samples are equally helpful. Some are not helpful at all: for example, some data points may be mislabeled, and they can actually mislead the model; training on them may hurt it. Some are very easy, what we call easy data points, where they don't add any information that doesn't already exist in the pre-trained model. On the other hand, some are very difficult, and if you train on them, it can lead to bad outcomes and damage your model. So we would really like not to train on those kinds of data points; they're unwanted. If we can identify them and prune them away, we can end up with a fine-tuned model that is more accurate. But also, if we don't train on a certain number of data points, training is faster. So it has the dual benefit of potentially improving accuracy while also reducing training time, and the training time is really the aspect we want to look at here.
So in dataset pruning, you want to find which data points you don't want to train on by evaluating some score function. We've looked at the design of score functions that examine each data point in the dataset and decide whether it can be pruned away. Our score function, which we call the H score, works like this: you do some training, some fine-tuning, for a few epochs, and you check whether the classification of each data point was correct or not; that is, you compare against the ground truth and see whether the model classified it correctly. If the model classifies a data point correctly across all epochs, consistently, we give that data point a score of one. Then we repeat this multiple times with different seeds, six seeds, and we add up the score for every seed. So a data point that is consistently giving the correct answer ends up with a score of six, and a data point that is consistently giving the wrong classification ends up with a score of zero; scores can fall anywhere between zero and six. Data points with a score of six are really, really easy: they always get the right answer, the model probably already knew it, so you don't need them and you can prune them away. Data points with a score of zero are very difficult; you don't want those either, so you prune them away too. In the middle are the ambiguous scores, and these are the data points we keep and train on.
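A simplified sketch of how a score like this can be computed is below, assuming per-example correctness has been recorded during short fine-tuning runs; the exact definition in the speaker's work may differ in its details.

    import numpy as np

    def h_scores(correct: np.ndarray) -> np.ndarray:
        """correct has shape (seeds, epochs, examples), with entry 1 if the example was
        classified correctly at that epoch.  Each seed contributes 1 to an example's
        score only if the example was correct at every epoch of that run."""
        always_correct = correct.all(axis=1)           # shape (seeds, examples)
        return always_correct.sum(axis=0)               # score in 0..seeds per example

    def prune(scores: np.ndarray, n_seeds: int = 6) -> np.ndarray:
        """Keep only the ambiguous examples: drop the trivially easy (score == n_seeds)
        and the consistently wrong (score == 0)."""
        return np.where((scores > 0) & (scores < n_seeds))[0]

    rng = np.random.default_rng(0)
    correct = rng.integers(0, 2, size=(6, 3, 1000))      # 6 seeds, 3 epochs, 1000 examples (dummy)
    keep_idx = prune(h_scores(correct))
    print(f"keeping {len(keep_idx)} of 1000 examples")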
So the training time is now reduced in proportion to the size of the pruned subset. In our experiments, we're pruning away 70 to 80% of the dataset, so we're left with maybe 20 or 30% of the original data, which can give significant decreases in training time. You can see now that we've talked about two methods: LoRA, which can reduce the peak memory usage, and dataset pruning using the H score, which can reduce the training time. What we'd like to do now is see whether we can combine both techniques to drive us closer to the bottom left, with low memory usage and low training time. The proposed method does both: you take the pre-trained model and apply low-rank adaptation to obtain a parameter-efficient model; that model is then used with the fine-tuning dataset to compute an H score; with the H score, I can do dataset pruning; the pruned dataset is applied to the LoRA model, which I fine-tune to produce my fine-tuned model, which I can then evaluate.
So these are the results of evaluating these two techniques combined. We have combinations of one or the other technique, and then both of them together. Compared to the baseline, which we normalize to a speedup of one and a peak memory of about 10 GB, LoRA by itself has a limited speedup of 1.2 times but a significant reduction in peak memory usage; dataset pruning using the H score by itself has a significant speedup of over four times, but of course doesn't reduce the peak memory. As we hypothesized, the experiments showed that when you combine both techniques, on average you get over five times speedup and also enjoy the significant compression of memory. Now looking at accuracy: we've evaluated these two techniques both individually and combined, and we've also added a comparison with random pruning, not using the H score. The reason we include random pruning is that in the regime of significant dataset pruning, 80% or more, random is actually state of the art; it's better than the other scoring functions that have been proposed. When you're doing less aggressive compression, other techniques can be used, but in this highly aggressive regime, random is very good. So the question is whether the H score does better, and it does; in fact, it's necessary to get excellent performance. What we see is that accuracy overall is actually improved slightly by using H-score pruning, and LoRA helps as well. The combination of the two is very effective, and it either doesn't hurt or slightly improves accuracy, because of regularization effects, on a model like RoBERTa-large, which has about 355 million parameters.
Finally, I wanted to introduce one additional set of experiments, on continual learning. What is continual learning? It's the scenario where I fine-tune on two different tasks consecutively. I have a pre-trained model and I fine-tune it on task one, ending up with a model. Then I take that fine-tuned model and fine-tune it on a different task, task two. The problem is that I now have a model that's been fine-tuned twice, and what tends to happen is that when you fine-tune the second time, the model forgets how to do the first task; it can be damaged. So we want to evaluate whether these techniques, dataset pruning and LoRA, help mitigate the forgetting of the first task when fine-tuning more than once, and the answer is yes, they do. What we do is evaluate the final fine-tuned model, the one trained on tasks one and two, on both tasks, and see the effect of applying these techniques.
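Structurally, the continual-learning experiment being described looks like the sketch below; `fine_tune` and `evaluate` are placeholders for the actual training and evaluation routines, and the task names are only examples.

    # Sequential fine-tuning, then evaluate the final model on BOTH tasks to measure
    # how much of task 1 was forgotten.  All functions here are placeholders.

    def fine_tune(model, dataset):
        return model                 # placeholder: would return the updated model

    def evaluate(model, dataset):
        return 0.0                   # placeholder: would return task accuracy

    def continual_learning(pretrained, task1, task2):
        m1 = fine_tune(pretrained, task1)            # e.g. MNLI
        m2 = fine_tune(m1, task2)                    # e.g. QNLI
        return {"task1_acc": evaluate(m2, task1),    # a drop here indicates forgetting
                "task2_acc": evaluate(m2, task2)}

    print(continual_learning(pretrained=None, task1="mnli", task2="qnli"))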
You can see in the first line, with no modification, there are two scenarios: one where task one is MNLI and we then train on QNLI, and another scenario, SST-2 to IMDB. You can see that the first task, MNLI in this case, is dramatically damaged by fine-tuning on QNLI. By applying LoRA, you can recover some of the forgetting, so you mitigate some of the damage, but not a lot. Applying dataset pruning is then key to actually rebuilding the performance on that first task, and compared to random pruning, we find the H score does a little better. So this really shows that the combination of these two techniques not only gives you more efficient fine-tuning in terms of memory reduction and training time, but can also help in continual learning scenarios by mitigating the damage to the first task.
In conclusion, we've looked at ways to reduce the peak memory usage and training time of fine-tuning large language models. The first technique, LoRA, is not ours, but we've evaluated it, and it is very effective mainly at reducing memory usage; our proposed dataset pruning using the H score greatly reduces training time. We've combined the two and shown that the combination is very effective: we achieve over five times speedup and a 40% peak-memory reduction, and I've also shown that these two techniques in combination are very effective at mitigating the forgetting of the first task in a continual learning setup. That's my presentation, and I want to thank you very much for your attention.
Thank you Warren for this great presentation and great research. I think this is a much-needed capability going forward, to reduce the training time and the memory usage; this will also reduce power in data centers when we train models. So, and this is not a technical question: what are the key tools and hardware required to enable research in parameter-efficient fine-tuning and dataset pruning, and what are the main pain points you face in advancing this field?
Right, that's a good question, thank you very much. The main bottleneck in terms of tools for this kind of research is available computational resources. Getting enough GPUs or NPUs, and we're using GPUs to do the training, is a very significant bottleneck, especially because in dataset pruning, in order to compute the H score, we need to do multiple training runs. So that is quite a pain point. The second challenge, partly related to the resources we have and partly just to coming up with good techniques, is how to scale this to very large language models, which we have not done yet, although in current ongoing work we're looking at how to apply this to a much larger class of LLMs. Yeah, that was my next question. Thank you. Okay, thank you Warren. Due to time limitations, we're going to go to our next speaker, but please do not hesitate to ask the speakers questions directly in the chat; they will be pleased to answer them. Up next is Borak Kmak, CTO of Edge Signal. With over 18 years of experience leading global product development in edge computing and cybersecurity, Borak brings both technical and strategic insights. He'll be speaking on implementing generative AI in edge environments: challenges and solutions. Let's hear from Borak. Borak, the stage is yours. Thank you so much, Yasin. So, let me share my screen. Is that visible now? It's visible and we hear you fine. Thank you. Okay. So, being the last speaker of this first section, it's going to be somewhat repetitive, because some of the things I want to discuss were already well presented by the prior speakers, but I will move through those parts a little faster and get to the real challenges we are currently facing in customer environments. So here, the reason we wanted to take advantage of edge LLMs was pretty obvious. Of course,
processors. That's our primary focus. You can see the tagline here. We recognize, as everybody else in this gathering, that the world is moving to deploying generative AI, but the challenge is going to be: GenAI at what cost? And one of the big factors here is energy consumption. So we've made it our objective, because we've seen this coming for a while, to address the fact that AI's proliferation is ultimately going to be gated by the amount of energy that can be consumed in many applications. We set out to really remove that from the equation, not by improving AI inferencing by 2x, 4x or 10x, but more like 100x, 1,000x, 5,000x, without compromising on latency, performance, and cost. Naturally, our initial markets are around the edge, where energy is heavily constrained and devices typically have to carry their energy source with them in the form of batteries. This has drawn us very naturally to smart sensors, AI IoT devices, wearables and smart-mobility kinds of markets. We're doing all of this with a very proprietary technology, which I'll get into, but we are a very product-focused team, so we are developing our first product, the BM10, the chip you see here in the picture, which we are currently working to get samples of to our tier-one customers, and we've got a multi-stage roadmap to ultimately target GenAI applications.
I think last year will go down in history as the year when GenAI finally made an appearance at the edge, on devices. Apple announced the iPhone 16 Pro; Meta and Ray-Ban had the smart glasses. Users love their very responsive interaction with these devices, along with, of course, the features they provide. The one challenge, however, the one pain point, was energy consumption. On the iPhone, if you use these GenAI capabilities all the time, you're going to run through your battery within a couple of hours. On the glasses, if you turn on always-on audio to interact with them, you've now reduced your battery life by about 30%. This is the unacceptable trade-off that users are having to make on these devices, and we're setting out to change that.
In addition to the power consumption problem, as a lot of you have observed, inferencing is going to dominate AI workloads. A lot of the focus so far has been on training; that is now moving very rapidly to inferencing as these models get deployed and put out into the world. Next-generation use cases like robotics and human augmentation are being limited by the fundamentals of digital processors, and I'll talk a little more about this, but the semiconductor industry is, you could say, a mature industry, and it has been optimized heavily to deliver phenomenal performance at a very low cost. With machine learning, that paradigm has been upended, and we've really reached the limits of what our existing architectures can deliver. There are also significant privacy and latency concerns: with cloud-based AI processing, you don't want to be sending all your real-time data into a cloud, into jurisdictions where you may not have any control over your data. So all of these problems exist, and they're added challenges at the edge.
Our industry has been doing a lot to address these challenges in the semiconductor world. There's a chart here trying to depict the kinds of activities that have been happening, consuming many billions of dollars and lots of smart engineers working on this challenge: taking an 85-year-old von Neumann computer architecture, which was really developed for human-written sequential code, and modifying and optimizing it to meet the needs of the very compute-intensive and massively parallel nature of neural network processing. We've gone from traditional CPUs to GPUs; we've gone to near-memory compute, moving the memory on chip and eliminating the need to go off-chip to access activations and weights; and finally even in-memory compute, which is being heavily worked on in the industry now, where you integrate compute elements right in the memory array itself, all with the goal of reducing the energy lost in data movement. That is a huge gain when you're doing the very intense matmul operations that are inherent to all the neural networks we have today. We talk a lot about TOPS, but an important metric is TOPS per watt: what is the energy efficiency? The bleeding edge in the industry right now is in the tens of TOPS per watt, maybe around 50 TOPS per watt, and CPUs are sub-5 TOPS per watt. So that's the spread in the industry today, and it is not getting much better: going to really advanced nodes like 5 nanometer and 2 nanometer is not changing this equation anymore. So we've really hit a performance wall, where the von Neumann architecture and Moore's law are really not helping anymore.
not helping anymore. So a new way of thinking is needed and this is exactly
thinking is needed and this is exactly what we're doing with our analog compute
what we're doing with our analog compute architecture at a very high level.
architecture at a very high level. Right? what we're doing is taking uh
Right? what we're doing is taking uh your traditional MAC arrays, right? And
your traditional MAC arrays, right? And and the folks u in earlier talks talked
and the folks u in earlier talks talked about uh how uh they're implementing a
about uh how uh they're implementing a huge number of these increasing number
huge number of these increasing number of these to offer increased capacity in
of these to offer increased capacity in their chips. Um but we're we're we're
their chips. Um but we're we're we're kind of going going down to first
kind of going going down to first principles and reimagining okay what
principles and reimagining okay what exactly is done when you're doing a
exactly is done when you're doing a multiply accumulate operation and how
multiply accumulate operation and how could we do that more efficiently. The
could we do that more efficiently. The digital world uses a paradigm on the on
digital world uses a paradigm on the on the left here uh or some variant of this
the left here uh or some variant of this where you got memories storing your
where you got memories storing your weights storing your activations. You
weights storing your activations. You suck them out of the memory. Um you load
suck them out of the memory. Um you load them into a Mac. You do the you do the
them into a Mac. You do the you do the arithmetic. you write the results back
arithmetic. you write the results back into your memories. All of this is
into your memories. All of this is controlled by a high-speed clock and an
controlled by a high-speed clock and an instruction set that gets compiled from
instruction set that gets compiled from some highle AI framework. Right? So
some highle AI framework. Right? So that's typically how these instruction
that's typically how these instruction set based processors work. Uh what we've
set based processors work. Uh what we've done is we've gone back to first
done is we've gone back to first principles and said okay how can we do
principles and said okay how can we do this more efficiently? How can we
this more efficiently? How can we leverage the inherent device physics of
leverage the inherent device physics of the transistors themselves in order to
the transistors themselves in order to do some mathematical operations?
do some mathematical operations? And this led us to our architectural
And this led us to our architectural breakthrough um we've we've named Ample.
breakthrough um we've we've named Ample. Um it it draws its inspiration from the
Um it it draws its inspiration from the human brain which is an incredibly
human brain which is an incredibly efficient uh biochemical signal
efficient uh biochemical signal processor. Not really a computer of any
processor. Not really a computer of any sort. Um and we've we've taken that
sort. Um and we've we've taken that inspiration and created three main
inspiration and created three main elements of our architecture. The
elements of our architecture. The synapsis neuron and our ephemeral memory
synapsis neuron and our ephemeral memory um to implement standard neural
um to implement standard neural networks. Our
Our synapses are essentially multipliers that exploit the device physics of the transistor itself to perform the multiplication: an activation hits the gate of the transistor and, by modifying its threshold voltage, the drain current represents the multiply result. Then, to sum multiple synapses together, we can just string them together, simply connect the wires. Summation, when you're working with charge, is very simple: you connect the wires and let physics do the work for you. We can then store that resultant value in neurons, which are capacitor-based elements that can fire the activation downstream to the following layer for further processing. We also have our ephemeral memory, a highly silicon-area- and power-optimized storage element. The way we encode activations and data in our architecture is different from the digital world, and the ephemeral memory can take those encoded values and store them temporarily, not for days or hours but for milliseconds and microseconds, while they need to be held for the next layer to come in and process the information. So these three elements basically form the foundation of our architecture.
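(For illustration only, here is a rough behavioral sketch in Python of the charge-domain idea just described: weights act as transistor-like multipliers, currents from many synapses add on a shared wire, and a capacitor-like neuron integrates the result and fires it onward. The gain constant, function names, and ReLU-style firing are illustrative assumptions, not a model of Ample's actual circuits.)

    import numpy as np

    def synapse_current(weight, activation, gain=1.0):
        # Idealized synapse: drain current proportional to weight x activation.
        # Real devices are nonlinear; calibration and compensation are abstracted away.
        return gain * weight * activation

    def analog_layer(weights, activations):
        # weights: (n_out, n_in), activations: (n_in,)
        # Summation comes "for free": currents on a shared wire simply add.
        currents = np.array([
            sum(synapse_current(w, a) for w, a in zip(row, activations))
            for row in weights
        ])
        # Capacitor-like "neuron": the integrated charge is fired downstream
        # through a simple nonlinearity.
        return np.maximum(currents, 0.0)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))
    x = rng.normal(size=8)
    print(analog_layer(W, x))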
Some additional benefits: we do everything end to end in the neural network without any kind of data conversion, no A-to-Ds or D-to-As. Those are power hogs, and we've eliminated them completely from our architecture. We don't use any fast clocks; in fact, we don't use any clocks at all. This is a 100% event-driven architecture, so if you have sparsity in the neural network weights or activations, that is automatically taken care of. If there are zeros, we just don't consume any charge and don't do anything. In the digital world, you may be reading and writing zeros into your memory and doing multiplies by zero and so on, so sparsity handling in the digital paradigm is much trickier. Here it's just taken care of by the architecture.
much more tricky to do. Uh, here it's just taken care of by the architecture.
just taken care of by the architecture. Um, we also are impervious to PBT
Um, we also are impervious to PBT variations. This is an analog
variations. This is an analog architecture. So it's it's not trivial
architecture. So it's it's not trivial to deal with these effects, these
to deal with these effects, these operating condition effects. And we have
operating condition effects. And we have developed some proprietary operation and
developed some proprietary operation and controls routines and circuits that run
controls routines and circuits that run and keep everything in check uh
and keep everything in check uh delivering no loss in accuracy versus a
delivering no loss in accuracy versus a standard digital architecture. Uh
standard digital architecture. Uh additionally uh the cherry on top is the
additionally uh the cherry on top is the is that we can implement these uh the
is that we can implement these uh the these designs in very mature standard
these designs in very mature standard CBOS processes. So we don't need any
CBOS processes. So we don't need any special masks to develop this. They're
special masks to develop this. They're they're standard digital flow and uh you
they're standard digital flow and uh you know processes that that were bleeding
know processes that that were bleeding edge uh 12 years ago, right? So this
edge uh 12 years ago, right? So this keeps cost very low making them really
keeps cost very low making them really ideal for these high volume edge
ideal for these high volume edge deployments. Um the other kind of side
deployments. Um the other kind of side effect of of doing things in an analog
effect of of doing things in an analog way is that you can interact with our
way is that you can interact with our analog world in a different way today.
analog world in a different way today. Uh if you have an environment sensor uh
Uh if you have an environment sensor uh measuring a parameter like light in an
measuring a parameter like light in an image sensor or a pressure sensor or a
image sensor or a pressure sensor or a microphone, these are all analog
microphone, these are all analog parameters that that an analog
parameters that that an analog environment sensor typically measures
environment sensor typically measures and then we digitize the information.
and then we digitize the information. Right? That's typically what's done
Right? That's typically what's done today. We digitize it. we send a digital
today. We digitize it. we send a digital signal into your digital processor that
signal into your digital processor that then runs essentially an emulation of a
then runs essentially an emulation of a neural network in its uh instruction
neural network in its uh instruction setbased architecture. Um what we do
setbased architecture. Um what we do because we are an analog processor, we
because we are an analog processor, we can eliminate that digitization step
can eliminate that digitization step that consumes power that adds to the
that consumes power that adds to the cost of the solution as well. And we can
cost of the solution as well. And we can ingest analog environment in information
ingest analog environment in information directly into our neural network.
directly into our neural network. Thereby making this thing really ideal
Thereby making this thing really ideal for integration with sensors and and
for integration with sensors and and making dumb sensors smart.
The other attribute is that we can scale these devices up to pretty large networks. We do have on our roadmap a pathway to get to generative AI, transformer-based networks in, call it, the Llama-8-billion kind of range; that's where we hope to get in the next couple of years. We've also paid a lot of attention to the software stack. We want to make our silicon as easy as possible to integrate with existing AI frameworks, without the need for any rework on the data science side. So we have an ESSB product line, like our first chip, the VM 110, where we provide the model training: we can use a customer's data set or our own, do the training with a standard framework in the cloud, and once training is complete we have a mapper that loads the result onto our chip, and we run inferencing from that point onwards.
then we run uh inferencing from that point onwards. We also do custom
point onwards. We also do custom solutions. Uh we can build custom
solutions. Uh we can build custom products for high volume strategic
products for high volume strategic applications. So our technology lends
applications. So our technology lends itself well there as well. Um here are
itself well there as well. Um here are three examples of the kind of u impact
three examples of the kind of u impact we can have uh for natural language
we can have uh for natural language processing pretty tiny networks in the
processing pretty tiny networks in the 100 call it 150 kilobyte range. Uh one
100 call it 150 kilobyte range. Uh one microwatt is what we can deliver.
microwatt is what we can deliver. Industryleading solutions are north of
Industryleading solutions are north of 100 microwatts. If you're doing uh you
100 microwatts. If you're doing uh you know object detection a basic network uh
know object detection a basic network uh like a BG11 uh 500 plus microwatts we
like a BG11 uh 500 plus microwatts we come in at sub 10 microwatts at 5 fps
come in at sub 10 microwatts at 5 fps gesture this is a beefier network about
gesture this is a beefier network about north of four 4 million 4 megabyte
north of four 4 million 4 megabyte parameters you know 100 microwatts is
parameters you know 100 microwatts is what we can do when standard digital
what we can do when standard digital solutions are tens to hundreds of
solutions are tens to hundreds of millows right so as the network gets
millows right so as the network gets bigger so does our our advantage
bigger so does our our advantage here. Um, we have a we have a a strong
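(A quick back-of-the-envelope illustration of what those figures mean for battery life, using the power numbers quoted above and assuming a 1,000 mAh, 3 V battery dedicated entirely to inference; the battery capacity is an assumption, and self-discharge and the rest of the system are ignored.)

    battery_wh = 1.0 * 3.0      # assumed 1,000 mAh at 3 V = 3 Wh

    workloads_uw = {
        "NLP, analog (quoted)": 1,          "NLP, typical digital": 100,
        "Detection, analog (quoted)": 10,   "Detection, typical digital": 500,
        "Gesture, analog (quoted)": 100,    "Gesture, typical digital": 10_000,
    }

    for name, microwatts in workloads_uw.items():
        hours = battery_wh / (microwatts * 1e-6)
        print(f"{name:30s} ~{hours / 24 / 365:.1f} years of continuous operation")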
We have a strong track record of delivering silicon: we've done three silicon tape-outs so far, and we've got multiple customers we're engaged with to get this solution out into the market. I'm going to skip a couple of charts in the interest of time. As I mentioned earlier, we're executing a three-stage roadmap here, getting from audio all the way to gen AI use cases in the future. Here are three examples of the kinds of use cases we're driving right now. Object detection in battery-operated smart cameras is very interesting for surveillance and retail kinds of environments. Smart tires can detect road conditions and classify them for an autonomous or human driver. And smart wearables are another space, where you want to put an audio interface into devices that last many weeks on a single charge. Those are examples of use cases we're targeting right now. So with that, I'll pause. I think I've used up all my time; I don't know if I have time for a question or two.
Just on time, like analog chips. I don't have any questions in the chat, but I'm really fascinated by the specifications you provided. Maybe just one high-level question. I'm new to analog computing, especially for AI. Given how quickly AI models evolve, how does your analog architecture adapt to support future model requirements, and how long do you expect a processor designed today to remain relevant in performance?
Yeah, it's a great question. That's one of the trade-offs you have to make very cleverly. We're not building general-purpose processors here; we're building very application-specific processors that will support, call it, a zoo of networks, but they're not going to be as flexible as your RISC-V or Arm cores. The idea is that, at least initially with our gen-one products, we want to target applications that have well-defined network requirements and are driven by energy efficiency; that's the single biggest pain point there. In our future generations, we are envisioning a much more flexible and programmable architecture, kind of like an FPGA for AI if you will, with analog compute elements that can be reconfigured much more than they can in the gen-one version. It's a trade-off. Yeah, go ahead, next.
Okay, thanks. Yeah, thanks for the presentation. It's nice to see people trying to do analog computing for AI. I think, as you mentioned, the main challenge with this approach is going to be robustness, right? Because you're directly exposing your computation to the noise and fabrication variation of your circuit. So I was wondering if you could say a bit more about that, and in particular, do you think the better approach is to modify the DNN model to make it more robust, or do you think that the robustness should come from the circuit itself?
Actually, Francis, robustness is not an issue; that's the problem we have solved. We've spent the last three years in R&D guaranteeing that our computations are accurate over time and over temperature. That is exactly what our IP is. So robustness is not really a challenge, or rather it is a challenge, but it's one that we have addressed. We're striving for accuracy equivalent to a digital MCU for the same workload.
But how? So do you modify the DNN model or not?
No, we do not need to modify the model. We use standard CNNs, RNNs, and soon transformers. We don't need to modify the model; we execute the same model, and we've taken care of this accuracy challenge in our circuits.
Okay. Thank you.
One last question. Please unmute yourself and ask your question if you wish.
Oh yes, thank you, Yasin, and thank you, Niraj, for a really interesting presentation. I have a question about the model translator. I think I may have missed it at some point, but do you develop it as a software development kit that maps the model onto the Ample architecture? Can you say more about that? Thank you.
Yeah, sure. There are two steps to it. We do have software that helps configure the hardware to the target network, so there's a configuration tool there. The other step is taking the trained neural network weights coming out of your training process and converting them to a format that can be loaded onto the chip. Those are the two steps, and we provide software, drivers, and utilities that accommodate both of those things.
Okay, so two steps. Yes, I got it. Thank you.
Perfect. Thank you very much, Niraj. Now, to close out our incredible lineup of presentations, it's time to dive into a forward-looking conversation on the future of generative AI at the edge. To lead this panel, I am very pleased to welcome Walter, a seasoned expert in IoT innovation ecosystems and technology strategy. Walter is the executive director of AIoT Canada and brings deep industry insight and experience to today's discussion. Please join me in welcoming Walter to the stage as he guides us through an engaging panel on the challenges, opportunities, and innovations shaping edge generative AI. I would like to ask the speakers to voluntarily unmute themselves and turn on their cameras so they can be part of this discussion; this is optional, so no pressure. Walter, do you have any slides or no?
I don't have slides, which is great. Let's dive in. Okay, great. If we could remove the screen share and just see everybody's faces, that would be great. So first of all, congratulations everyone on your insightful presentations; there's a lot of innovation presented here. It almost makes me want to do silicon design again. And Yasin, great job keeping the whole thing on track. My name is Walter, as Yasin said. I'm the president of AIoT Canada, an industry association that brings IoT and AI actors together. We provide a platform for our members to connect, interact, and learn from each other, and we also provide a voice of the industry for our members toward government strategies and policies. Our members include AIoT providers, adopters, investors, academia, and so on. So I'm very eager to moderate this discussion.
Since no one needs an introduction here, we'll just jump right into it. So far most of the talks have been pretty far down in the weeds of the technology, right? This panel discussion is different, in that we need to take our eyes up and cast them forward, and our two tools are going to be a crystal ball and a radar. We'll cover opportunities, challenges, and some of the innovations underfoot to address those challenges and opportunities. So it's not a Q&A on your presentations; for this one, you get to be the generative AI. You have all this vast knowledge and experience about gen AI and AI in general, and now you're going to use that knowledge, the model in your head, to forecast and generate predictions about the future.
Now, I may ask specific people questions, but in general each question is open to anybody who wants to address it. Just start talking, and if there's a race condition, I'll deal with it. And don't hesitate to ask questions of the other panelists yourself or to comment on other comments. Okay, so let's start with opportunities. I will allocate roughly the same amount of time to opportunities, then challenges, then solutions and innovations. The first question: what are some of the key opportunities or use cases that you see dominating edge gen AI in the next, say, two to five years? I noticed that some of you gave a shout-out to automotive, so clearly right now that seems to be it. Is it going to continue? Are there other sectors that will start to dominate in that space? Anybody want to tackle that?
I can certainly say that, as I mentioned, we've engaged with about 25-plus customers, and we didn't go looking in any particular market. We just have a general product and we let the customers sign up; they voted with their feet. To our surprise, in our previous product I'd say 30% of our customers were automotive and 70% were non-automotive: consumer, video surveillance, manufacturing, AR/VR, printers, et cetera. But automotive was 30%, and suddenly that nearly doubled; we're over 50, closer to 60%, for this latest generation. So the first surprise is that automotive really is it, and that's worldwide: Asia, Japan, Europe, North America, less North America surprisingly, but the rest of the world for sure. And the other surprise was, well, we expected object detection and pedestrian recognition and stopping the car, but then we got a ton of requests for things like Llama 2 and in-cabin infotainment and all kinds of other things. So for me it was a surprise both that automotive was so dominant in our customer base and, second, that the class of applications was broader than I expected. We got the radar, the lidar, the sensor fusion with vision, but we also got this whole new class, even stable diffusion for infotainment. So at least from the data, going from 30 to 60% from our exposure, automotive clearly seems a good opportunity, and much wider and more multi-dimensional than I was expecting.
So do you think that is going to persist over the foreseeable future, whatever horizon you want to put on it, whether it's two to ten years or one to three or whatever the case may be?
Well, because in the end the car is a place where we spend, I don't know, 20% of our free time, and if we live in Toronto, that's 60%. In any case, it's a lot of time, it's a lot of what we do, and we're kind of prisoners there, so there are lots of things you can do with electronics in a car. The drive itself is the obvious one, but also doing your work, and entertainment. So yeah, I think we've only just touched on 5% of it. Take the latest Tesla: it's a pretty good example of fairly high-end autonomous driving, and it's still only 30% of the way there; it's still a little risky, but it's already pretty impressive. We're not there yet in terms of all the other potential. I've not seen anything yet; it's really five years out. The things our customers are licensing now will be in cars in five or six years.
Okay, good points. So, Davis, I also noted that you talked about automotive applications. Do you have any comments on that? Is that still going to be a strong trend, or is health going to take over, or any other sector? I think he just left. Oh, did he? Okay. Would anybody else want to add to this question about which sector might become prominent and complement automotive in terms of the use of edge gen AI here? James Miller, are you noticing any trends with, say, the automotive industry not producing as many cars, or not increasing their production of cars so much, and people moving towards transportation systems like light rail and other things like that? And perhaps are you seeing any early indicators of some of the technology, infotainment, and solutions moving from the car to more mass-transit, scalable solutions for cities?
moving from the car to more mass transit uh scalable solutions for cities.
uh scalable solutions for cities. Yeah. So our customers uh don't see that
Yeah. So our customers uh don't see that that kind of it's a competition, right?
that kind of it's a competition, right? that's they're working to keep cars
that's they're working to keep cars alive and interesting and a good market.
alive and interesting and a good market. Um, so they're for sure not telling us
Um, so they're for sure not telling us about, you know, what the competition is
about, you know, what the competition is and and we're not engaging with, you
and and we're not engaging with, you know, alternate forms of of transit
know, alternate forms of of transit other than, you know, robo taxi. I guess
other than, you know, robo taxi. I guess you could consider that as going in that
you could consider that as going in that direction. Um, what I can say from the
direction. Um, what I can say from the automotive customers, it's a industry in
automotive customers, it's a industry in crisis.
crisis. um there's the established players and
um there's the established players and then there's suddenly a whole bunch of
then there's suddenly a whole bunch of new players in particular in China with
new players in particular in China with EV uh expertise and AI uh you know
EV uh expertise and AI uh you know strong knowhow and so there's a real
strong knowhow and so there's a real competition in terms of there's a a new
competition in terms of there's a a new car of the future and the established
car of the future and the established players are are struggling with that
players are are struggling with that they're trying to reinvent themselves
they're trying to reinvent themselves and the uh Asian market and yeah I'd say
and the uh Asian market and yeah I'd say mostly in China and Asia you
mostly in China and Asia you aggressively moving in, changing the
aggressively moving in, changing the rules of the game. Tesla was the first.
rules of the game. Tesla was the first. Um, but now we're seeing, you know, a
Um, but now we're seeing, you know, a whole bunch of other companies that are
whole bunch of other companies that are going beyond Tesla. So, that's the first
going beyond Tesla. So, that's the first thing I see is the unknown. Um, because
thing I see is the unknown. Um, because it's like two markets fighting
it's like two markets fighting themselves. The established one and this
themselves. The established one and this brand new one that could just sweep
brand new one that could just sweep everything away and people are reacting
everything away and people are reacting different ways. Some trying to reinvent
different ways. Some trying to reinvent themselves, some trying to just do
themselves, some trying to just do better what they're good at.
better what they're good at. um hard to have a crystal ball there
um hard to have a crystal ball there other than massive changes of foot.
other than massive changes of foot. There's a crisis of foot um and the kind
There's a crisis of foot um and the kind of mass transit and alternate forms of
of mass transit and alternate forms of transportation through our customers. We
transportation through our customers. We don't unfortunately I don't see that uh
don't unfortunately I don't see that uh that trend. Okay. I
Okay. Yes, just curious: how many speakers are left and participating in this panel?
So we have Pierre. Go ahead. Yes, how many? We have Warren, Warren is still here, Pierre, Chen, and as I said, this is optional. We can also open the floor for the attendees to engage with the speakers if they want to ask any questions.
Okay. All right. So, automotive: what particular edge gen AI capabilities are required to address these opportunities, as opposed to, for example, cloud AI? It's probably the usual ones that are already happening, but is there anything in particular that the future opportunities will call for? Anyone? All right, I'll put my own two cents in. I guess it's the usual ones, you know: local, real-time response, latency, and things like that. Okay. Privacy, yep, that's a big one. Security. Yeah, I would think robotics would be an important application as well. I don't know if you're seeing movements in that direction from the industry. Okay.
Yeah, we definitely are. We have a range of customers; most of them are application-specific, focused on one area, but some of them do general platforms in a certain performance range, and a lot of those doing general platforms are pursuing robotics as one of the more opportunistic areas. So I'm seeing a lot of interest, not at the level of cars, which as I just mentioned is 60%, but probably 10% of our customers are doing interesting, disruptive robotics applications, more in Asia than other places, but not only there.
um more in Asia than other places but not only. So how much of that uh okay
not only. So how much of that uh okay robotics I guess is it's a big spectrum
robotics I guess is it's a big spectrum of capabilities but including smart
of capabilities but including smart manufacturing it's not just robotics to
manufacturing it's not just robotics to play with their kids but you know to
play with their kids but you know to manufacture mass manufacturing smart uh
manufacture mass manufacturing smart uh smart robots. So how much is uh Agentic
smart robots. So how much is uh Agentic AI going to drive all of this
AI going to drive all of this uh in terms of uh silicon and and um in
uh in terms of uh silicon and and um in terms of delivering edge AI? Are we
terms of delivering edge AI? Are we going to see a JTek uh AI at the edge or
going to see a JTek uh AI at the edge or are we just talking about mainly in the
are we just talking about mainly in the cloud?
Personally, I see it at the edge for specialized domains. So not the full ChatGPT or Llama, but a domain-specific, smart, agentic AI that knows about, say, manufacturing: you're on the floor, there's an issue, and it's very good at what it does, as good as most humans, but only in that specific space. In that case you reduce the training and knowledge space by a couple of orders of magnitude. I can see that happening in the embedded space, right?
Okay, sounds good.
see that happening in the uh in the embeded space, right? Okay, sounds good.
embeded space, right? Okay, sounds good. All right. So, let's uh switch over to
All right. So, let's uh switch over to challenges. Um given you know the suite
challenges. Um given you know the suite of
of uh um opportunities that we have out
uh um opportunities that we have out there uh automotive I guess robots and
there uh automotive I guess robots and uh probably other ones that we haven't
uh probably other ones that we haven't touched on that just people don't uh
touched on that just people don't uh want to mention like what are some of
want to mention like what are some of the challenges in in terms of delivering
the challenges in in terms of delivering solutions to those opportunities? you
solutions to those opportunities? you know the challenges could be technical
know the challenges could be technical or they could be
or they could be uh maybe business and investment related
uh maybe business and investment related or or talent or regulatory. What what
or or talent or regulatory. What what what do you think going forward we'll
what do you think going forward we'll see uh as the main challenges in
see uh as the main challenges in delivering some of those opportunities?
I can I can start on so just pull back a second and think about the general
second and think about the general purpose computing that we've been living
purpose computing that we've been living for the last 50 years and some of you
for the last 50 years and some of you less some of you more but we've been you
less some of you more but we've been you know we invented APL and then forran and
know we invented APL and then forran and pro you know then eventually we got to C
pro you know then eventually we got to C and then we went to objectoriented and
and then we went to objectoriented and list and then eventually we all went
list and then eventually we all went back to C right in the end we never
back to C right in the end we never found a general purpose programming that
found a general purpose programming that was not C which is was a shock and a
was not C which is was a shock and a scandal. We invented 380 different
scandal. We invented 380 different programming languages. In the end,
programming languages. In the end, there's one or two that are used today.
there's one or two that are used today. Um, amazingly in AI because you have a
Um, amazingly in AI because you have a more well-defined compute programming
more well-defined compute programming model uh which is you know neural
model uh which is you know neural networks including you know the latest
networks including you know the latest transformers they still conform to the
transformers they still conform to the basics of that pytorch or you know
basics of that pytorch or you know tensorflow light tensorflow programming
tensorflow light tensorflow programming model. We have a essentially one
model. We have a essentially one programming model with couple of
programming model with couple of variants that are very close and not
variants that are very close and not only are they general purpose but they
only are they general purpose but they actually be used to program
actually be used to program supercomputers not to program one risk
supercomputers not to program one risk core. You know, we've spent
core. You know, we've spent 800 million years programming a couple
800 million years programming a couple of risks. They're increasingly complex
of risks. They're increasingly complex with, you know, look ahead and
with, you know, look ahead and translation look at buffers and it's a
translation look at buffers and it's a mess, right? Suddenly there's this very
mess, right? Suddenly there's this very elegant highle data flow like matrix
elegant highle data flow like matrix multiplication with some activations.
multiplication with some activations. It's actually reasonably simple. So
It's actually reasonably simple. So that's an amazing opportunity is that we
that's an amazing opportunity is that we actually have a high level programming
actually have a high level programming model that can program that is rich
model that can program that is rich enough and general enough and enough to
enough and general enough and enough to program supercomputers that are AI
program supercomputers that are AI based. Now the challenge of behind that
Now, the challenge behind that is that these compilers are really hard. So it's a great opportunity, and if you invest in it heavily you can get there; we've put hundreds of man-years into getting a good AI compiler for one architecture, ours. So I think the opportunity and the challenge is how we create, from this great starting point, a single language, a single programming model: how we create a compiler that can actually address GPUs on one end, ASICs on the complete other end, FPGAs somewhere in the middle, and then NPUs, with the pretty wide and diverse range of NPUs that exist on the market today, all innovating in different ways. And then of course it extends to analog and optical computing and in-memory compute, all these clever architectures, but without this compiler they're unusable, because the complexity is high. So that's what I want to bring up: for me, the challenge of the AI space, with all these innovative architectures, is how we leverage the opportunity of a single programming model, with something general enough to be used across a wide range of applications.
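(To make the single-programming-model point concrete: today the closest thing to that shared artifact is a framework graph exported to a portable format such as ONNX, which different backend compilers, whether for GPUs, NPUs, or FPGAs, can then lower onto their own hardware. A minimal sketch, assuming PyTorch and the onnxruntime package are installed; it is not any particular vendor's flow.)

    import torch
    import torch.nn as nn

    # One model definition in the common "programming model" (a framework graph).
    model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
    example = torch.randn(1, 64)

    # Export to a portable graph; each vendor's compiler takes it from here.
    torch.onnx.export(model, example, "model.onnx", opset_version=17)

    # Running the same graph through a generic backend (CPU as a stand-in):
    import onnxruntime as ort
    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    print(session.run(None, {session.get_inputs()[0].name: example.numpy()}))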
So are you talking about compilers going from whatever this language is down to, say, a MAC-level architecture, all the way down to silicon?
Not down to silicon compilers, necessarily. I mean, for an FPGA you might argue you can get there, but let's stay with a well-defined architecture you need to compile to. So it's generating object code, or configuration code, for a flexible architecture. Yeah. Okay.
So, Warren, you turned your camera on; I suspect you have a question or a comment.
Sure. I'll give you my perspective on one of the challenges going forward, and it has to do with what I was talking about a little bit: data sets. As AI evolves to more complex models and more complex scenarios, multi-step reasoning, agentic AI where the AI will be interacting with the environment, collecting enough quality data to train the models to act in these complex ways will become very difficult. So we're already seeing a movement to synthetic data and simulation environments to get enough training examples for these AIs, and I think there are a lot of challenges there: ensuring that we're able to train on quality data, understanding how much real data or human intervention will be required. This becomes especially difficult in the unstructured environments where you would have robots operating. So I think this is going to be a big challenge going forward.
going to be a big challenge uh going forward. Okay. All right. Is there
forward. Okay. All right. Is there anyone in the audience that has uh
anyone in the audience that has uh thoughts about the challenges uh going
thoughts about the challenges uh going forward in in in delivering on some of
forward in in in delivering on some of the opportunities?
I can just make a a high level statement especially in the Canadian context.
especially in the Canadian context. Right. So I think AI inference is a is a
Right. So I think AI inference is a is a very important uh uh element of uh AI
very important uh uh element of uh AI adoption in Canada especially at the
adoption in Canada especially at the edge. You have companies like entered
edge. You have companies like entered and storage designing uh AI chips that
and storage designing uh AI chips that are fine-tuned for uh AI workload at the
are fine-tuned for uh AI workload at the edge uh maybe not in cars but at the the
edge uh maybe not in cars but at the the near edge. I think connecting with the
near edge. I think connecting with the end uh uh user is very important
end uh uh user is very important supporting this architecture with
supporting this architecture with applications that are running out of the
applications that are running out of the box instead of instead of saying look I
box instead of instead of saying look I have a cool uh AI inference card use it
have a cool uh AI inference card use it I don't think this is a good selling
I don't think this is a good selling point uh this is why Nvidia is uh is
point uh this is why Nvidia is uh is having tremendous success because they
having tremendous success because they are supporting various uh sectors at the
are supporting various uh sectors at the end user level so they have libraries
end user level so they have libraries they have applications, they have
they have applications, they have reference design, people can just take
reference design, people can just take them and use them in their application.
them and use them in their application. So I think uh building uh a resilience
So I think uh building uh a resilience ecosystem and and connecting both of
ecosystem and and connecting both of both side of the supply chain you have
both side of the supply chain you have these uh producers of technologies uh
these uh producers of technologies uh like uh Nvidia and storage and entered
like uh Nvidia and storage and entered and you have the end users uh across all
and you have the end users uh across all verticals whether is smart health,
verticals whether is smart health, manufacturing, agriculture and you need
manufacturing, agriculture and you need a ready to go solution so people can
a ready to go solution so people can adopt it and use it. This is my high
adopt it and use it. This is my high level two cents uh for this conversation
level two cents uh for this conversation and
and uh it's it's a problem um and this is uh
uh it's it's a problem um and this is uh something that's defined success I guess
something that's defined success I guess for these companies especially in
for these companies especially in Canada. So how much of a um thanks for
Canada. So how much of a um thanks for that how much of a uh issue is uh just
that how much of a uh issue is uh just getting the talent. So far we we touched
getting the talent. So far we we touched on technical problems, right? Uh or
on technical problems, right? Uh or technical desires. Uh how much of a an
technical desires. Uh how much of a an issue or a challenge is getting the
issue or a challenge is getting the right people, the talent that you need
right people, the talent that you need to actually create solutions?
In Canada we have these three very important institutes, Mila, the Vector Institute, and Amii, which focus on algorithmic innovation. We have great universities, like Polytechnique, the University of Toronto, and Simon Fraser University, with great research teams building innovative algorithms. I think there is a gap between the algorithm development and the hardware development, and I think we should see some initiatives in Canada that bridge that gap, to enable a vibrant ecosystem where you have experts in various areas collaborating with each other. That's my main feedback here.
area collaborating with each other. This is my uh my main feedback here. A
is my uh my main feedback here. A compiler should solve all that, right?
I completely agree with you on the bridge between the application space and the hardware, and I would qualify it: not necessarily to create new hardware, although that's an important one. Creating new hardware and then the tools to program that hardware is a nightmare; it's expensive, and not that many companies can do it. I'm fortunate to be part of a company that can afford it, but only a handful can. There is another angle, though. I'm willing to bet that a company that would take an innovative application and map it onto our architecture efficiently, in a way that gets that 100x power advantage and 10x cost advantage and really leverages it, would make much more money than we've made. They would put six people on this; we put ten times more on it for ten years, and they could do it with half a dozen people in one year and probably produce more value for the paying customer, who would be willing to pay a lot of money for it. Because in the end the customer doesn't really care about the compiler, the architecture, or how many MACs you have. They care about a solution to market quickly that is efficient.
So we have some of our leading high-end consumer companies, like Moore, that have partners who do exactly that. They just say: listen, we're using the Synopsys solution, we like it, but we're using standard graphs and we compile them, and sometimes it's great, sometimes it's medium, sometimes it's not great. Could you just write your models in a way that leverages this product rather than an Nvidia GPU? Nothing wrong with doing it for an Nvidia GPU, but that's their starting point, and then there are four or five NPUs out there; they happen to be using ours, and they want it to run well on ours. I'm sure those partners could make a lot more money than we're making. And I shouldn't say this, because my boss might be listening.
Very interesting. I'm just curious: what role does generative AI actually play in the design process itself, Gen AI generating Gen AI solutions? Where are we with all of that? I'm sure that's something of a holy grail going forward.
Mhm. That's 2001: A Space Odyssey. "Dave, I have a problem. I cannot design your chip, because that would put me out of a job."
We use AI a ton at Synopsys; there's a huge use of AI for design in clever ways. I'm not part of that organization, so I can only say from the outside that it's a major part of our investment. Synopsys was a leader early on, both in embedded AI and inference and in the use of AI for EDA. I can definitely see it being used for power optimization, physical design, placement, routing. There are a ton of opportunities where this works well.
So we're kind of getting into the solutions-side question, which is okay. I think in the end I should just be able to say verbally, I want a system that does this and this and this in terms of AI capability, push a button, and there it is. We'll see when that comes.
Yeah, I don't see generative AI doing RTL design anytime soon. There are some aspects where you need to have some structure in the solution, and that is maybe more difficult to get with AI.
Well, okay, let me challenge that. You might be absolutely right, and I'm sure you are. But Zuckerberg is over there saying that in the next however many years, 50% of software will be written by AI. And if you look at software, it is structured: there's a certain syntax, certain things you have to follow. RTL is like that too. So if you take Zuckerberg's word for it, why couldn't you apply that to RTL?
Oh yeah, fair enough, I suppose. I mean, I was shocked: on what they call AGI benchmarks, artificial general intelligence, they compared results on programming (software programming, that is), and in one year the models went from roughly the level of grade-six students to the point where the CTOs of leading software companies were being challenged by the results, just from training these models for programming. So this 50% of programming will be done by AI.
Absolutely. But then for me, with RTL, the people are just as smart and the problem is just as hard, probably no harder; it's just different. There's a smaller market, though, so I'm sure no one has worked as hard at automating RTL. For software there's a much bigger market, so people have worked on that, and I see no reason why companies would not say, okay, we've got tens of thousands of RTL designers, let's automate that better. It should not be seen as a threat to anyone; it's the opposite. The smart people will use the easy tools and do the really hard stuff, the creative, out-of-the-box work that we will not do with AI.
So I'm very interested in that topic of RTL. Maybe you have various agents, each working at a different level: an agent trained on high-level RTL to come up with the best architecture possible, various agents optimizing the individual blocks depending on what type of acceleration you need, and a high-level agent acting as the architect that brings all these things together. I don't know whether this will take place, but I think there is a trend of using various agents trained for different purposes and linking them together in a flow, and this will have an impact, especially in the CAD industry.
Okay, to be fair, and to the earlier point: there are similar structures, but in the end RTL translates to hardware. In my experience back in the Nortel days, they had some great RTL tools and other compilers, and that was great, but in the final push to get the chip out you always had to go in, tweak things and make some custom logic changes. So there is a difference, I recognize, and the impact is different: in software it's just more memory or more cycles, whereas in hardware you're locked in, locked in on chip size and locked in on cost, so it does require extra diligence in terms of the compiler output.
Okay. So are there any other innovations coming down the road, other than compilers and getting more talent on board, that will help us address some of the future opportunities for AI in automotive, health and so on? What's down the road?
So, in terms of silicon or other technologies,
another big point that is being mentioned a lot is of course trustworthiness, which is mostly being addressed at the software and model level. But I think in hardware we have a role to play there too: making sure that the hardware can work in harsh environments, like Professor Chen was talking about, and can be extremely reliable for mission-critical applications, and so on. It's a more minor point, but still quite important in developing the hardware properly, I believe.
Yeah. So there's the protection against environmental faults, if you will, but then there's deliberate hacking: of the algorithm, of the structures, of the memory, of the content. I think that's going to be a must, especially once you start implementing agentic AIs that make their own decisions. You're not just asking a question and then acting on it yourself; the AI is doing the actual decision making and the actual actuating, if you will.
Yeah, we can recall the original example, which is quite dated now, of adversarial attacks on, say, image recognition, where you change a few pixels on a stop sign and suddenly it doesn't get recognized at all, or things like that. So definitely that needs to be addressed, right?
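The attack being recalled there is easiest to see in code. Below is a minimal sketch of the fast gradient sign method (FGSM), one classic way such adversarial examples are generated. It assumes a trained PyTorch classifier called model plus a preprocessed image batch and its label; those names are illustrative, not anything shown at the workshop.

import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    # image: [1, 3, H, W] tensor scaled to [0, 1]; label: [1] tensor of class indices
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel by +/- epsilon in the direction that increases the loss.
    adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1)
    return adversarial.detach()

A perturbation of epsilon = 0.03 is nearly invisible to a person, yet it is often enough to flip the predicted class, which is exactly the stop-sign scenario described above.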
Yeah. And that's a real concern. Talking to the leading high-quality automotive companies, they don't want this to happen. So they always have redundancy, and not just two AIs comparing each other: they have an AI, and then they have an algorithmic approach that is very dumb but will detect that it's a stop sign, and not let a corrupted, adversarially fooled network say it's not a stop sign. So they have this innovation around that.
Like a backseat driver. Your mother-in-law in the back seat saying you're missing the stop sign.
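That dual-path idea, a neural detector cross-checked by a deliberately simple and independent heuristic, can be sketched in a few lines. The example below is a toy illustration only; the thresholds, the nn_says_stop flag and the red-pixel heuristic are assumptions made for the sketch, not details the panelists gave.

import cv2

def classical_stop_sign_check(bgr_roi, red_fraction_threshold=0.25):
    # Crude, hand-written check: is a large fraction of the region red?
    hsv = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so combine two hue ranges.
    mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
    return cv2.countNonZero(mask) / mask.size > red_fraction_threshold

def fused_decision(nn_says_stop, bgr_roi):
    # If the neural network and the dumb heuristic disagree, fail safe.
    if nn_says_stop != classical_stop_sign_check(bgr_roi):
        return "brake_and_flag_for_review"
    return "stop" if nn_says_stop else "proceed"

The point is not that the heuristic is good; it is that it fails differently from the network, so a perturbation crafted to fool one path is unlikely to fool both.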
All right, so I think that's it for me. If Naj were here, I'd ask him a little more about the analog part.
I did want to say something about analog, just because this is one of both the opportunities and innovations and the challenges, and it's a frustrating one. We have a lot of potential customers, startups, that come to us and say: we have this amazing analog compute, we can do MACs and CNNs at a hundredth of the power. It looks great, and it is. We have companies that do optoelectronics: it's all about data movement, so we can do this through integrated fiber optics and get it connected through whatever. And then we have companies that say, well, it's all about memory. This is the elephant; everyone sees it from their own angle. It's all about memory, and since it's all about data movement to and from memory, let's do compute in memory rather than memory around compute. All of these ideas are great, and in their niche they do something amazing, a hundred times better than anything else, or at least ten times better. Then we work together and try it, and they say: yes, but the problem is that what we get from PyTorch is so much stuff that actually only 10% of it maps naturally and easily. So what about the 90%? Can we couple ours with yours, do deep heart surgery and plug in our analog MACs, or take out your NoC and put in an optical one, take out your memories and put in optical ones? We try this, and we explore it open-mindedly, but there are always these bottlenecks that appear.
You remove one problem and you create another one; sometimes it's worse, sometimes it's not as bad, but then you have to build a new compiler, you have to build all kinds of tools, and then how flexible will it be in the future? For me, the root cause is this: there might be some amazing innovation in combining these things, but because people build their neural networks on a GPU, Nvidia or AMD, it doesn't matter which, they have all that generality and they don't care about efficiency. They find a really cool new XGU++ activation function that gives them 2% better accuracy, then they go off and do some kind of softmax variant, and then a non-maximum suppression with whatever. Because they have access to the most general-purpose processor in the world, hard to program but general purpose, you don't have any homogeneity in the models and no synergy between the model and the architecture. For me there has to be a sweet spot where we have a clean model, restricted to what these optical, in-memory-compute and analog approaches can leverage. But for now it's a chicken-and-egg problem: until you break through, people won't use it, and if they don't use it they're on a GPU, and so they're programming all kinds of weird stuff, sometimes because it's needed and sometimes just because it's there.
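To make the "weird stuff" concrete, here is a purely illustrative sketch (the operator is invented for this note, not taken from the talk) of the kind of exotic op a researcher can write in a few lines for a GPU, next to the plain building block that virtually every NPU or analog MAC array accelerates natively.

import torch
import torch.nn as nn

class ExoticActivation(nn.Module):
    # Hypothetical "2% better accuracy" op: a learned-temperature gate plus a
    # sinusoidal term. Trivial on a GPU, but outside the small operator set a
    # fixed-function or analog accelerator typically supports.
    def __init__(self):
        super().__init__()
        self.t = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * torch.sigmoid(self.t * x) + 0.1 * torch.sin(x)

# The hardware-friendly counterpart: conv + ReLU maps onto essentially every
# accelerator, digital or analog.
friendly_block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

One exotic layer like this in an otherwise clean graph is often what forces the fallback to the big generic tensor engine mentioned a moment later.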
Yeah, I guess design velocity is a real need, right? The more you standardize, the greater the velocity you can have. However, my own opinion is that analog is sufficiently distinct from mere tweaks of different digital approaches that, if it saves you 10x in power and you want to put it on a watch or some IoT device, it definitely makes sense. But anyway, that's just me.
It's very specialized. I think you're absolutely right; our experience with these startups is that it's never specialized enough. It always requires plugging in our general, generic tensor accelerator, which is ten times the area and power of their unit. So yes, you've saved 50%, you've brought it down to 5%, but you still have the other 50% that has not changed, and now you've introduced new bottlenecks and you have no tools. In the end it doesn't pay off; it's Amdahl's law in a different form, because not everything gets the 10x. And that's the problem with all the complexity of the weird stuff: you can't do the weird stuff in analog, so you're stuck, and unfortunately MACs are only about 30% of the overall power and area, maybe just a couple of percent of the complexity, but still about 30% of power and area. If you have 70% that can't be analog, then you're still stuck.
Yeah, exactly. You may want to build a different house, some grandiose house, but if you can't use a hammer to do it, then you're stuck, right? Okay.
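The arithmetic behind that "still stuck" remark is worth spelling out; it is Amdahl's law applied to power. If a fraction f of the total power sits in the MAC array and the analog block improves only that fraction by a factor s, the overall gain is 1 / ((1 - f) + f / s). Using the rough 30% figure quoted above, taken here purely for illustration:

# Amdahl-style bound on whole-chip gain from an analog MAC array
f, s = 0.30, 10.0                 # fraction of power in MACs, analog improvement factor
gain = 1.0 / ((1.0 - f) + f / s)
print(round(gain, 2))             # ~1.37x; even as s -> infinity, the ceiling is 1/0.7 ~ 1.43x

So even an ideal 10x or 100x analog MAC engine, on its own, moves the whole-chip number by well under 1.5x while the remaining 70% stays digital.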
Sorry, I'm turning it over to you, Yin.
And I would like to thank you, Walter, personally for accepting to moderate this panel session. It's time to conclude the workshop. We are eight minutes over time, but that's fine; it was a great discussion. So thank you to all our speakers, panelists and attendees for making today's workshop such a success. The insights shared highlight just how exciting and critical the future of edge AI truly is. So stay inspired, stay connected, and we'll see you at the next one. Thank you very much.
Thanks for organizing this. Very well done.
Thank you. Thank you. Bye-bye. Thank you. Bye-bye.