This event, "Acceleration AI," focused on the advancements and challenges of Generative AI at the edge, bringing together industry and academic experts to discuss innovations in machine learning, edge computing, hardware-software co-design, and the future of AI.
Hello everyone, and welcome to the sixth edition of Acceleration AI. I'm Yin from CMC Microsystems, and I am very pleased to be your host today for this annual virtual event, which I have had the pleasure of organizing since 2020. This year's workshop is supported by Fabric, our latest initiative, which I will talk about a little bit later.
It has been inspiring to see the growth of this workshop over the years as it continues to bring together a dynamic and expanding community of researchers, innovators and experts. Today we are proud to present an outstanding group of speakers. These leaders in their fields will share the latest developments in machine learning, edge computing, generative AI and hardware-software co-design. I would like to extend a sincere thank you to our distinguished speakers for sharing their time and insights, and to all of you for being here today. Whether you are a professor, researcher, startup founder or industry professional, your participation is what makes this event so valuable. Before we begin, please note that this session is being recorded and will be made available soon.
As we dive into today's workshop, let's take a moment to reflect on the overarching goals of this event. Our mission is to bring together experts from both industry and academia to share the latest trends and innovations in AI, explore the key challenges and opportunities in cloud and edge computing, and identify opportunities for collaboration that can drive us forward. Additionally, from CMC's perspective, we aim to identify the common infrastructure requirements that will support the growth and scalability of these transformative technologies, to better support the Canadian ecosystem. With this in mind, we are set for an exciting and insightful event as we navigate the intersection of AI and edge computing.
So this year's workshop shines a spotlight on generative AI at the edge, with a focus on the latest advancements and real-world challenges in building efficient, cost-effective AI solutions for resource-constrained environments. Our speakers will delve into topics like model optimization and security, including techniques for fine-tuning models and seamlessly integrating them with edge hardware. We will also explore developments in AI hardware, featuring architectures like RISC-V processors, analog neural network chips and FPGAs, and how these technologies can help address environmental challenges such as radiation effects in harsh deployment settings. To conclude the day, we will host a panel session on the future of edge AI, where we'll reflect on emerging opportunities and what lies ahead for this rapidly evolving field.
Before we dive into the CMC Microsystems opening remarks, which include CMC products and services in IoT and edge AI, let me quickly walk you through today's agenda along with some housekeeping rules. We have a packed and exciting lineup of presentations from our distinguished speakers. Each speaker brings a unique perspective on the intersection of AI and edge computing, and each presentation is scheduled for 20 minutes: hopefully 15 minutes for the presentation, followed by a 5-minute Q&A session. This message is for my dear speakers: please try to keep your presentation time to 15 minutes. When you see me appear on your screen, it means you need to wrap up so we can move to the Q&A session.
So let's talk a little bit about the agenda today. Our first presentation is from Pierre Paulin from Synopsys, who will shed light on cost-effective solutions for generative AI at the edge, followed by Davis Sawyer from NXP Semiconductors, who will present secure, fine-tuned LLMs for generative AI at the edge, and Professor Warren Gross from McGill University, who will present parameter-efficient fine-tuning of transformer-based language models using dataset pruning. I would like to thank these three speakers again; they have been long-term contributors to the workshop. We also have a new speaker, Burak from Edge Signal, a startup here in Ottawa, who will be presenting on the implementation of generative AI in edge environments: challenges and solutions. We will then have a 5-minute break, after which we will resume the workshop with a presentation from Katarina from NVIDIA, who will present the NVIDIA edge AI stack, software and hardware, followed by another long-term contributor to the workshop, Professor François Leduc-Primeau from Polytechnique, who will cover Polara, the collaborative design of an open-source RISC-V multicore processor. Then we'll have a new presenter from academia, Professor Li Chen from the University of Saskatchewan, who will cover radiation effects in convolutional networks implemented on FPGAs and mitigation techniques. Last but not least, Niraj Mathew from Blumind will switch gears to cover an all-analog neural network processor that delivers highly efficient, high-performance AI inferencing. New this year, we have invited a distinguished panel director, Walter Knitl, CEO from AIoT Canada, who will host our panel session today, covering pioneering the future of generative AI at the edge: challenges, opportunities and innovation. So this is a high-level overview of our agenda today, and now let me give you some news.
So, as most of you know, Fabric, our latest initiative, is funded by the ISED Strategic Innovation Fund and managed by CMC Microsystems. It is focused on building a strong and sustainable semiconductor ecosystem in Canada, which supports companies developing homegrown semiconductor technologies, encourages collaboration across industry and helps grow Canada's role in the global supply chain.
Fabric challenge projects help Canadian industry and academia develop next-generation semiconductor processes and products, with a focus on photonics, MEMS and quantum. IoT projects drive innovation in sensors for cleantech, healthcare and telecom. The initiative provides design tools, methods and prototype fabrication, with up to 50% reimbursement for industry and full coverage for academia. These challenges help strengthen the Canadian manufacturing supply chain. The Fabric innovation platform offers tools, technical resources and training to support a strong talent pipeline, accelerate product development and drive world-class research. Currently, the Fabric ecosystem welcomes Canadian professionals, academic, government and industry experts who are passionate about semiconductors. Academics and students keep access to their CMC subscriptions, like CAD tools, fabrication and Basecamp, while gaining access to extra training and resources through Fabric.
Now, a brief introduction to our latest developments at CMC Microsystems in support of the IoT and edge AI ecosystem in Canada. This slide showcases our end-to-end IoT development process, from concept to prototype. It begins with project launch, including consultation, needs analysis and partnership. Then we move to design, selecting components and optimizing the IoT architecture. In manufacturing, we handle supplier coordination, production planning and quality control. Finally, our prototype phase includes embedded software, cloud-edge integration, testing and the path to volume production. This is of course high level, so if you need more details we can schedule a quick meeting and walk you through everything that is available. We've developed an open-source, customizable IoT sensor platform, with the KiCad PCB design, including schematic, layout and bill of materials, all available on the Fabric GitHub. It is a Bluetooth Low Energy node that supports sensor networks, machine monitoring and electromechanical sensing, connecting to apps for data display and processing. These demos are available for evaluation, and we are developing applications across various verticals with these IoT sensor demonstrators. One example is the IoT platform for smart agriculture, which enables easy integration, field testing and real-time monitoring of environmental parameters like temperature, humidity and soil moisture. It supports applications in greenhouse automation, environmental monitoring, livestock farming, automatic irrigation and
more. On the edge side, we offer a one-stop shop for development, from concept to prototype. We begin with conceptualization: defining the problem and goals and addressing hardware and software constraints. We have a large collection of datasets that we use for training. Next, we move to model training. Depending on the problem, we help our clients select the right model for their application. We have an infrastructure that allows us to train these models efficiently, and we optimize them for deployment at the edge, and we have some examples to show here. For the flow we use for edge development, we use our infrastructure for training: we start with pre-trained models and fine-tune them on custom datasets, mostly on our cluster, which is powered by Tesla V100 GPUs, for training and inference testing. We then test the trained model again, and for edge deployment we use a variety of tools to compress and optimize these models so that they are suitable for edge deployment. We use various edge platforms for deployment; I will show some of them here.
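As an aside, here is a minimal sketch of the kind of flow just described — start from a pre-trained model, fine-tune it on a custom dataset, then export and optimize it for the edge. It is illustrative only: the model, dataset path, class count and quantization choice are assumptions, not CMC's actual scripts.

```python
# Sketch of the fine-tune -> optimize-for-edge flow described above (assumptions noted inline).
import torch
import torchvision
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Start from a pre-trained model and adapt the head to the custom classes (5 is a placeholder).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 5)
model = model.to(device)

# 2) Fine-tune on a custom dataset (ImageFolder layout and hyperparameters are placeholders).
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
])
train_set = torchvision.datasets.ImageFolder("custom_dataset/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

# 3) Optimize for the edge: export to ONNX for a deployment toolchain, and optionally
#    apply dynamic int8 quantization for CPU-only targets.
model.eval().cpu()
torch.onnx.export(model, torch.randn(1, 3, 224, 224), "model_edge.onnx", opset_version=17)
quantized = torch.ao.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "model_int8.pt")
```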
So this is the infrastructure we continue improving. On the cloud side here we have the FPGA/GPU cluster, where we use mostly GPUs for training, and we use a complete software stack for optimization. We also partner with Enser and Storins, who are building custom inference chips for low-power inference applications, and we support the Jetson Orin from NVIDIA for most of our IoT and edge AI demonstrators. Here is one example of the edge AI demonstrators we have built as part of Fabric. We took a state-of-the-art computer vision model, YOLOv9, and trained it on a custom dataset. The main objective here is to enhance worker safety through real-time anomaly detection. This is a big model that we trained and optimized, and we were able to run it at almost 40 frames per second on a Jetson Orin in real time. This allows us, for example, to detect workers who are not wearing their safety equipment.
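For readers who want to experiment with something similar, a rough sketch of training and deploying a YOLOv9 detector for this kind of safety use case might look like the following. It assumes the Ultralytics package, a pre-trained `yolov9c.pt` checkpoint, a hypothetical `ppe_dataset.yaml` dataset file, and TensorRT being available on the Jetson; it is not the actual CMC demonstrator code.

```python
# Illustrative sketch: fine-tune YOLOv9 on a custom PPE dataset, then run real-time inference.
from ultralytics import YOLO

model = YOLO("yolov9c.pt")                                    # pre-trained COCO weights
model.train(data="ppe_dataset.yaml", epochs=50, imgsz=640)    # custom classes: helmet, vest, ...

# Export to a TensorRT engine for faster inference on a Jetson (FP16 here; INT8 also possible).
model.export(format="engine", half=True)

# Stream inference: each result carries boxes, class ids and confidences for the frame.
for result in model.predict(source="rtsp://camera/stream", stream=True, conf=0.4):
    detected = [model.names[int(c)] for c in result.boxes.cls]
    print("detected:", detected)
```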
The second example here is generative-AI-based prompt vision for advanced video analytics. The system we have developed uses OWL-ViT, a powerful vision-language model, to detect objects in real time based on natural-language prompts. You just type what you are looking for, like helmets or people with bags, and the model instantly highlights them in the video stream. The front-end application we have developed shows live statistics for each detected object, with the time it was detected and its number of occurrences, all while running efficiently on the Jetson Orin. We did a lot of optimization of this model so it runs fast in real time, and this is part of our support, through Fabric, to the Canadian ecosystem who want to integrate edge AI into their applications.
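A minimal sketch of this prompt-driven, open-vocabulary detection idea, using the OWL-ViT checkpoints published on Hugging Face, is shown below. The specific checkpoint, prompts and threshold are assumptions for illustration, not the optimized model or front end from the demo.

```python
# Illustrative prompt-based detection with OWL-ViT via Hugging Face transformers.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("frame.jpg")
prompts = [["a helmet", "a person carrying a bag"]]   # free-form text queries

inputs = processor(text=prompts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to boxes/scores/labels in image coordinates.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, threshold=0.3,
                                                  target_sizes=target_sizes)[0]
for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(prompts[0][label], f"{score:.2f}", box.tolist())
```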
Now, before we dive into the workshop topic, I would like to go through some high-level market trends. We expect 29 billion connected devices by 2023, an annual increase of 12%, and the drivers are IoT, 5G and AI. These are not buzzwords; they are transforming many sectors: healthcare, Industry 4.0, precision agriculture, smart cities. On the data side, we are generating massive sensor data, and that data is doubling every two to three years; 75% of it is now processed at the edge, a rise from 10% in 2018. This puts a lot of pressure on edge computing, where you need very high-performance, low-power capabilities at the edge in order to deal with all this data. On the edge transformation: as most of you know, 60 to 70% of tasks are expected to be automated by GenAI by next year, and 60% of it is multimodal, a rise from 1% in 2023. This is extremely fast; it's similar to switching from a flip phone to a smartphone overnight, so it's really something industry is trying to capitalize on. On the edge AI focus: as we know, AI is continuously moving from the cloud to the edge because of these advantages — low latency, bandwidth savings, data privacy, security and autonomy. What about energy? 90% of the power is consumed by data movement. This is a fact, and it has led to high demand for energy-centric hardware, including new innovative approaches to classical computing and even advancements in photonics and wide-bandgap semiconductor materials. If you need to know more about photonics and wide-bandgap semiconductors, we have teams dedicated to these technologies. There is also the exploration of quantum, spiking and analog architectures; we have a presentation from Blumind about analog architectures today, so I'm looking forward to hearing their latest advancements. On security and optimization: trustworthiness, which combines safety, security, reliability and privacy, and a need for standards to ensure interoperability for adoption. So these are the high-level trends, and I think the speakers will cover some of these as well, so we will see if we are aligned here.
So, back to the workshop: I would like to start with our first speaker. Kicking off our lineup is Pierre Paulin from Synopsys. With over 30 years of experience in AI, neural processing and embedded systems, Pierre has helped shape cutting-edge SoC technologies across multiple industries, and today he'll share his vision on cost-effective solutions for generative AI at the edge. Welcome, Pierre. Please share your screen.
Thank you for the introduction. Let me share. So, hello everyone. I'm Pierre Paulin, I'm with Synopsys, based half the year in Ottawa, Canada, and the other half in France, where I'm presenting from today. Normally at 7:00 p.m. I'd have had a glass of wine; I have not, out of respect for this interesting workshop, but I'll have one to celebrate at 10 or 11 p.m. The outline for my talk is a quick introduction to the latest trends in transformers, which are the basis of generative AI. My audio has gone quiet — can anyone hear me? We hear you perfectly.
Yeah, okay, great. I can hear you. Perfect. Then I'll give a very short introduction to a product we've developed called the NPX6, which is a neural processing unit, and then look at the key features of these units needed to support these transformers, and therefore GenAI. Then we'll look specifically at the challenges of mapping GenAI onto a neural processing unit like the NPX6, and, if I have time, a quick outlook — though that will probably be for the panel.
So, an amazing change. I entered this space of vision back in 2010 or so. We were working on set-top boxes in my previous job at STMicroelectronics in Europe, in France, and at that time we were doing algorithmic applications — what we call classic computer vision — using DSPs. And at that time, one of the best object detection algorithms was called SIFT, and it had about 50% accuracy on the ImageNet top-1 benchmark. The revolution happened at the University of Toronto with AlexNet: it took a quantum leap from 50 to 63% in a year or two. And
that's really when we move from what I
call the prehistoric vision times to uh
the the age of CNN's which is already
the medieval times uh in in terms of
what we're doing today. And you know if
you've been in this space you know
AlexNet became VGG, then ResNet, and then these CNNs — the yellow curves — kind of saturated at around 90% accuracy, which is still a
pretty good number but a ton of
innovation over at least 10 years to
achieve that with hundreds of papers uh
hundreds of groups and thousands of
papers. And then the the transformer age
started in in in 2020. And literally
within uh first transformers were
developed in the context of natural
language processing and then there was a
an an application of this approach to
vision and within 6 months they had
already caught up with the you know the
best efficient net uh you know CNN and
we're exceeding that with VIT and
applied to vision.
And so we're kind of hitting the asymptotic limit of the information contained in this ImageNet dataset. But the key message here is that transformers literally revolutionized CNNs and beat CNN results in less than six months.
So transformers, as I've just mentioned, were developed initially for natural language processing, and that's at the basis of things like ChatGPT. But the really exciting learning that happened in 2021 — exciting because we were working in vision — is that transformers as-is can be applied to other domains like vision with very little modification, and we've discovered that models combining attention with classical CNN convolutions outperform the old CNNs even for small models. Initially they were quite big, but we've seen things like ViT become MobileViT and get smaller and more
compact so we truly believe that these
transformers and the attention uh uh at
the at the core of a transformer
is are here to stay. Here to stay,
sorry. And let's give an example of
that. Why do we think they're here to
stay? Let's take a look at what the
state-of-the-art for CNN based
applications, something we call panoptic
segmentation. That's where you have an
image on the left hand side. And the
panoptic segmentation state-of-the-art,
you know, convolution neural network
will then be able to identify uh
instances of different classes of
objects. So cars, for example, you have
a uh the taxi in blue, the uh the
minivan in green. Uh people um and
that's about it in this case. Um beyond
just recognizing the objects, it
recognized different instances, so
they're in different colors. Um and then
it also does semantic segmentation. So
it's not only recognizing the car, but
its exact uh contour. Same thing with
the with the person.
So this was the state-of-the-art uh only
three maybe four years ago uh in
CNNs. If you're building an autonomous driving system, this is very
shallow information about what this
scene is now this is an odd scene but
let's take a look at we say take the
same scene and we simply ask the
question what is unusual about this
image, and we apply this using LLaVA. The LLaVA response is: the
unusual aspect of the image is a man is
ironing clothes on the back of a yellow
minivan while it is on the road. This is
an unconventional and unsafe place to
perform such an activity as ironing
clothes etc etc requires stable surface
blah blah blah. Ironing clothes and
moving vehicles could lead to potential
hazards for both the person doing the
ironing and the other road users. Now,
if I'm building an autonomous drive
software stack, this is the
interpretation I need in order to react.
Otherwise, the other scene just says
there's a pedestrian, there's two cars,
and doesn't say anything about what's
going on. Now, reading the text, it's
obviously some lawyers were involved in
the training here, but uh it's it's
still quite remarkable um this uh
richness of
interpretation and and we really believe
that this richness is needed for the
next generation of AI and as as we've
discovered in the last year or two.
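For context, asking that kind of question of a vision-language model is only a few lines of code with the openly published LLaVA checkpoints. The sketch below is an illustration using the Hugging Face `llava-hf/llava-1.5-7b-hf` weights; the talk does not specify which LLaVA variant produced the response quoted above.

```python
# Minimal sketch: ask a vision-language model "what is unusual about this image?"
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"   # assumed checkpoint, for illustration only
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("street_scene.jpg")
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```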
So let's switch gears a little bit and talk about our neural processing unit, the NPX6. We started this project when we were in Canada; some of my key architects actually studied with Geoff Hinton at the University of Toronto in the 80s, and my hardware architect studied under Yoshua Bengio at Mila, so I got lucky. I actually resisted this at the time. [Crosstalk from an unmuted participant.] Yeah, you can keep going. Okay, we're good — we have muted some participants here; some not-so-pleasant comments, that's all I heard. Okay.
Yes. So, I got lucky. My software architect and hardware architect came to me in 2012 and said, "Pierre, AI is really cool. Look at these new CNNs." I was a bit skeptical, to be honest; I had worked in AI in the '90s with knowledge-based expert systems, and that was a failure. But I took a look, I read the papers, and we said, "Okay, let's put together a small task force of five or six people." We built a small first-generation processor, which we delivered in 2014. By 2014 it was clear this was going somewhere, and since then we developed four more generations — the blue colour here — which are based on and optimized for CNNs. Back in 2020 or 2021 we could see the importance of transformers and this new generation, so we made a big jump to our sixth generation, the NPX6, which is the basis of our current product. That was a big discontinuity: we learned a lot from CNNs, of course, but they were not flexible enough to accommodate these new applications — natural language processing, or Swin Transformer and ViT applications. So, the NPX family: in fact we have three families of cores. We have low-end microcontrollers that operate below 100 GOPS. We have vector DSPs, a general-purpose DSP family that used to be used for computer vision and is now more general; they can do low-end AI applications below one TOPS. Then, once we get to one TOPS, we have a scalable family that starts at the NPX6-1K — the 1K means 1,024 MACs — then a 4K, which is 4,096, all the way up to a 96K, which is roughly 98,000. So roughly 100,000 MACs; that's about 200 TOPS, and that's our biggest single NPU. We can then instantiate up to eight NPUs, which gets us beyond 2,000 TOPS. We introduced this two years ago, and in those two years we've licensed it to over 25 leading-edge customers, half of those in automotive, and some of them are at 2,000 TOPS today. Our leading-edge automotive customers are in the 1,000 to 2,000 TOPS range, and we have other extremes: in-vehicle infotainment at one TOPS, or low-power digital still cameras — leading-edge consumer applications — at one TOPS. So there are three orders of magnitude between our low end and our high-end
architecture. So this is a uh quick
overview of the architecture. Um it's a
scalable architecture. It starts with a
a set of cores shown in yellow here uh
from one to a maximum of 24 cores. Each
core internally has two key components: a convolution accelerator that handles the CNNs and matrix multiplications — it has 4,096 MACs and can run integer-only or with a floating-point unit option — and, attached to that, a generic tensor accelerator that does anything that is not a matrix multiplication or convolution: activation functions, but also a whole bunch of other functions that are not
CNN's. And then finally um we have a
complex multi-level memory hierarchy.
Each core has its own level one memory
inside the core. There are 24 of those.
And then we have a level two shared
memory with a high performance and high
low latency interconnect custom network
on chip um that moves data between the
24 up to 24 cores and the level two
shared memory and of course the external
memory external DRAM memory.
Uh each of these cores has its own local
DMA and we have a top level DMA called
the streaming transfer unit and each of
these cores has its own internal
controller, a small RISC controller, and at the top level there are also a couple of controllers.
Um even though the block diagram you
know talks a lot about hardware uh this
is a large you know doubledigit team um
and exactly half that team is developing
the tools which is probably the single
biggest challenge even more difficult
than the architecture design. So those
are compilers runtime SDKs simulators
platforms. So key architecture features
one of the the the objectives we gave
ourselves when we moved from our fifth
generation of CNN based machines to this
more general class was really to go
beyond CNN and support things like uh
RNN which was already getting old but
mostly transformers, GenAI and recommender networks — the classes of applications that emerged in
2021 and beyond but not only uh moving
beyond AI applications but also the
types of sensors we're using. So,
initially we're mostly focused on
vision. We generalized to multiple
sensor classes like radar and lighter
which are heavily used in automotive
which is about half of our customer
base. The other lesson the hard lesson
was flexibility is is essential
everywhere. Uh we kept on thinking CNN's
were going to stabilize at some point.
Um we were proven wrong every
generation. So we added more flexibility
and in this architecture we went even
much further. So we have a fully
programmable um what we call our generic
tensor accelerator which complements the
CNN. Both are extremely flexible and the
the uh the generic tensor accelerators
is fully
programmable. We also went wider in data types, with integer 8, integer 16 and integer 4, as well as an option for floating point 16 and BFloat16. So that's the flexibility side, which is a key objective. The other objective is continued improvement in efficiency. We've seen a MAC utilization improvement of about 1.5x to 2x based on all our lessons learned around state-of-the-art CNNs like MobileNet and EfficientNet, and then focusing on GenAI like Stable Diffusion, Llama 2 and so on. We also brought in sparsity: we have a form of structured sparsity very similar to what's used on general-purpose GPUs like NVIDIA's — those of you who are familiar, it's called structured sparsity. You get somewhere between 1.4x and close to 2x performance increase by using this structured sparsity.
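To illustrate the idea referred to here: in 2:4 structured sparsity, within every group of four weights the two smallest-magnitude values are zeroed, so hardware can skip half the multiplies while keeping a regular access pattern. The snippet below is a conceptual sketch of that pruning pattern (fine-tuning normally follows to recover accuracy); it is not NPX6 tooling.

```python
# Conceptual 2:4 structured sparsity: keep the 2 largest-magnitude weights in every group of 4.
import torch

def prune_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    w = weight.reshape(-1, 4)                      # groups of 4 along the flattened weight
    idx = w.abs().topk(2, dim=1).indices           # indices of the 2 largest per group
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 16)
w_sparse = prune_2_of_4(w)
print("kept fraction:", (w_sparse != 0).float().mean().item())   # ~0.5
```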
All the R&D is around bandwidth reduction. The challenge there is moving data: that's the challenge in power, and the challenge in complexity and in software tool features is all around data movement. It's not about putting down hundreds of thousands of MACs — that's the easy part; putting memories all over the place is easy. It's about intelligently moving the data through the architecture. And then we also
improved latency because it's not only
about getting high throughput using high
batch sizes as was the tricks used in
the you know the 2000s and 20s early
2020s people use high batch size in
automotive it's not about it's not about
throughput it's about latency. It's the
time you detect a pedestrian or a guy
ironing on the back of a van. Um the
time to detect that was mostly important
and not as much the
throughput. And finally, we continue to make power-efficiency improvements based on different techniques, such as gating. I'm not going to explain this,
just to say if you add up all the
different things that happen, you have a
level one risk core doing control, you
have a DMA, you have a convolution
accelerator, your generic tensor
accelerator doing activations and soft
maxes, and then you have your output
DMA. Add all of these activities
together, you have 13 ways, 13 parallel
activities that need to be sequenced and
scheduled. Uh, which gives a hint of
some of the challenges of dealing with
the complexity of these
architectures. So, we've invested
heavily like I said exactly or even a
little bit more than half of our team is
building these compilers and runtime.
So, it takes standard representations like PyTorch and TensorFlow; we convert that to the industry-standard representation, ONNX; we compile that, and it generates an execution plan interpreted by a runtime. Then 99% of this runs on the NPU, but you can have special secret sauce for certain customers that may not run directly on the accelerator and can run on our vector DSP family, and that is done also by the tools.
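The front end of a flow like this — getting from a framework model to the ONNX exchange format that an NPU compiler consumes — is standard PyTorch tooling; the compilation to an execution plan is vendor software and is not shown. A minimal sketch:

```python
# Framework-to-ONNX step that typically feeds an NPU compiler flow like the one described above.
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "mobilenet_v2.onnx",
    input_names=["input"], output_names=["logits"],
    opset_version=17,
)

# Quick sanity check that the exported graph runs (requires onnxruntime).
import onnxruntime as ort
session = ort.InferenceSession("mobilenet_v2.onnx")
print(session.run(None, {"input": dummy.numpy()})[0].shape)   # (1, 1000)
```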
So, different use cases. For exploration, we compile onto virtual platforms like Platform Architect and Virtualizer, which are tools developed in other groups at Synopsys; we have functional and performance models; we have emulators and boards. So what can you do? Here's a simple example with a YOLOv5 model, where you're exploring the impact of bandwidth on throughput. You might start at 250 GB per second as an upper bound on bandwidth, assuming you have expensive HBM interfaces, and look at the impact of bandwidth on frames per second. The other dimension here, shown by the colors, is the size of on-chip memory — the CSM is our level-two cluster shared memory. The purple curve has no on-chip level-two memory, while the yellow curve at the top has 16 megabytes for this machine. You can see there are different trade-offs: with more memory you're less sensitive to bandwidth, because you can store more data on chip and are therefore not as sensitive to DRAM. Say you had a target of 500 frames per second: quite a few data points meet that target, so you can trade off — do I want to spend more money on bandwidth, which has cost and power impacts, or do I want to spend more money on memory in order to reduce bandwidth? With the green curve, for example, which is a nice trade-off, you can achieve 600 frames per second with 32 GB per second and 8 megabytes, or even go down to the red curve, which is 4 megabytes, and still achieve just above 500 frames per second. The tools let you do this automatically, and that's a key thing here, because doing this manually is a non-starter: the complexity of these machines does not allow you to do even one of these 25 data points in less than days or weeks, whereas this can be done in a couple of minutes with our exploration tool.
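The shape of that trade-off can be reasoned about with a toy roofline-style model: frames per second is the minimum of a compute bound and a bandwidth bound, and adding on-chip memory reduces the DRAM traffic per frame. The numbers below are made-up illustrations, not NPX6 data or the Synopsys tool.

```python
# Toy roofline-style estimate of FPS vs. DRAM bandwidth and on-chip memory (all numbers illustrative).
def frames_per_second(dram_gbps, onchip_mb,
                      macs_per_frame=8e9, peak_macs_per_s=6e12,
                      traffic_mb_per_frame=60.0):
    # Crude model: on-chip memory absorbs part of the per-frame DRAM traffic.
    dram_mb_per_frame = max(traffic_mb_per_frame - 0.8 * onchip_mb, 5.0)
    fps_compute = peak_macs_per_s / macs_per_frame          # compute-bound ceiling
    fps_bandwidth = (dram_gbps * 1e3) / dram_mb_per_frame   # bandwidth-bound ceiling
    return min(fps_compute, fps_bandwidth)

for onchip_mb in (0, 4, 8, 16):
    row = [round(frames_per_second(bw, onchip_mb)) for bw in (8, 16, 32, 64, 250)]
    print(f"{onchip_mb:>2} MB on-chip:", row)
```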
So, I talked about transformers, and I'm going to go fast — I think we're running out of time. Just to say there are key features: there are features in the convolution accelerator that are unique and different from CNNs, things like matrix-matrix multiplication instead of just the matrix-vector multiplications used for CNNs. You need matrix-matrix; you need feature maps appearing on both operands and not just on one side — those are just examples. You need this very flexible generic tensor accelerator to do things like softmax, layer normalization and new activation functions like GELU. And finally, you need a dedicated DMA that does complex things like embedding lookups. These are all features needed to support the constructs of a transformer, which are the basis for GenAI.
So, just to give you a sense of the efficiency: we've run the vision transformers ViT and Swin for different input sizes. This is on our single core, the 4,096-MAC version, and you can see our MAC utilization varies between 60 and 70%, and our bandwidth is in the range of an LPDDR5, somewhere between 20 and 30 GB/s. Of course, under NDA with our customers you can get the exact numbers; I'm just giving you a range here. A key message is that if you run these on a GPU, MAC utilization is typically below 5%, sometimes 10, rarely above 20. Machines in the embedded space need to be much more efficient; these are more dedicated machines than a GPU, and you can get much higher utilization at very low area and power.
So, GenAI, with Stable Diffusion as the example. I'm going to skip this because we're out of time; it'll be in the material we leave behind. I just want to show how this compares. Our NPX6-32K running in dense mode, where there's no sparsity, will match an RTX 3060. The 32K with structured sparsity will match the Titan RTX. This is about 30 frames per second for Stable Diffusion version 1.5. So that's a $200 machine, and it consumes about 200 watts. Just as a ballpark, these NPX machines are less than 10 mm² in 5 nm; a Titan RTX is many hundreds of mm². Maybe even more importantly, the NPX consumes less than 2 watts, compared to 200 watts on a general-purpose GPU like a Titan. And that's our mid-range machine; we can go higher, of course, with the 64K, and then we're approaching the state of the art of a year and a half ago, when this chart was developed. But I think the key message here is that by specializing — by developing a neural processing unit, not a GPU variant — you can get these two orders of magnitude of power reduction and an order of magnitude or two of area reduction.
This applies to Stable Diffusion, but we can also apply it to GenAI like Llama 2. All I want to say about Llama 2 is that its real challenge is bandwidth limitations. You need to do tricks to reduce your coefficient size — go from integer 8 to integer 4. Internally you can use higher precision, like integer 16 or FP8/FP16, to preserve accuracy, but the bottleneck to DRAM is the coefficient bandwidth and the large model sizes. What we've discovered is that, basically, if you use all the available bandwidth, we match any public result using the same amount of bandwidth, because it is completely bandwidth limited and not resource limited — which is not the case for Stable Diffusion, which is compute
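A quick back-of-the-envelope shows why decoder LLMs end up bandwidth limited: each generated token has to stream roughly the full set of weights from DRAM, so tokens per second is bounded by bandwidth divided by model size — which is also why dropping coefficients from int8 to int4 roughly doubles throughput. The figures below are illustrative assumptions, not Synopsys benchmark data.

```python
# Upper bound on decode speed when weight streaming dominates (illustrative numbers).
def max_tokens_per_second(params_billion, bits_per_weight, dram_gb_per_s):
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return dram_gb_per_s * 1e9 / weight_bytes

for bits in (8, 4):   # int8 vs int4 weight compression
    tps = max_tokens_per_second(params_billion=7, bits_per_weight=bits, dram_gb_per_s=25)
    print(f"7B-parameter model, int{bits} weights, 25 GB/s DRAM: <= {tps:.1f} tokens/s")
```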
limited. So, to summarize — and I ran a few minutes over time — transformers are the baseline for these deep learning models. They were developed initially in the field of natural language processing, and we discovered in the early 2020s that within six months they were able to achieve state-of-the-art vision results in other domains. At that time, in 2020-2021, we started the design of the NPX6 generation and took a bet that transformers would be here to stay. That bet has so far proven right: we've seen not only that transformers remain, but that they're the building block for the latest generation of GenAI, like Stable Diffusion, Llama 2, ChatGPT and the very latest mixture-of-experts-based approaches like DeepSeek, which go further and make more compact models by having a dynamic model size — and we can support that as well; we've already done preliminary benchmarking for DeepSeek. So this is moving quickly. These models were initially in the cloud on high-performance, high-cost, high-power GPUs, and they're moving quickly into the embedded space, and we believe we're prepared for this space. Thank you. I'm happy to take one or two questions.
Thank you, Pierre — or we can take them in the panel. Thank you, Pierre. Just a reminder, you can post your questions in the chat and they will be addressed by the speakers. Since there are no questions in the chat yet, I'll ask one myself, Pierre: how do you keep up with rapidly evolving AI models to ensure the hardware architecture remains compatible and efficient?
Yeah, so far so good. It's been five years since we built the spec. We took fundamentals and basic primitives as the building blocks, and we made everything programmable. That being said, it's programmable and efficient around a certain class of applications, which today are transformer dominated, with a lot of flexibility and complexity — but they're still CNNs and transformers. Hopefully that's still true for the next couple of years, and therefore we have a market that's valid for us. If there's a completely new invention, we'll discover it with the rest of the world. But for the moment we don't see that; the choices we made in flexibility around this class of transformer-based and GenAI-based applications have held up so far. I don't have a crystal ball, though — will there be a new invention in two, three, four years? This happens in AI every five years.
I have a question from the chat: are you planning on targeting the bigger parameter models, or is there interest in smaller, more specialized models as well?
That's more a question about our customers, and the answer from our customers is the latter. We're in the embedded space, not in the cloud. We have one or two customers kicking the tires, but most of our committed customers — half of them automotive, the other half high-end and low-end consumer — really want these smaller, more specialized models, because it's completely bandwidth limited. So it's not realistic to use the large models that are used in the cloud.
Okay, thank you, Pierre. Next up is Davis Sawyer from NXP Semiconductors, a Canadian tech entrepreneur and AI products marketing manager. Davis also chairs the Edge AI Foundation's industry working group and brings a unique blend of business and technical insights. Today he'll talk about secure, fine-tuned LLMs for GenAI at the edge. Welcome, Davis.
Awesome, thank you — and thanks, Pierre. A great way to kick things off: a lot of great background and context on how we've come to this place,
and looking forward to diving in. Um you
know yeah it's definitely true that I
think one key insight was you know CNN's
were computebound and transformers are
now more memory bound and it's true that
we definitely at the edge especially
edge semiconductors play uh we we kind
of inherit what happens at the cloud and
some of the innovations there and then
you know look for markets look for
opportunities and build silicon that can
support that. I think interestingly for
this talk, despite NXP obviously being a semiconductor company, I'm
actually going to spend more time
talking about software and some of the
tools that we've built on top of our
SOC's and products that help make it
easier to deploy whole applications. Um
we definitely need you know these
benchmarks and these uh you know
testimonies to performance as a way of
uh both attracting customers and also
backing up and justifying is this you
know viable for for product practical
use cases. In this talk, I'm going to
show some of the some of the software
pipelines we've built um that are now
available, which is exciting. So, I'll
definitely point to some links and some
GitHub repos that the audience can can
access as of today to start seeing for
themselves some of the the value we
think we've created here. But, I'll dive
in assuming you can see my slides,
assuming you can hear me. Uh, everyone's
kind of gone quiet, so I'm going to jump
in here. I know we have limited time, so
I'll try to be as effective as
Yes. Okay. All good. All good. Thank
you. Excellent. So, here's the high
level overview today. What we call the
intelligent edge, which I'll define a
little more specifically. The edge can
be enableless term. So, I'm going to try
to be precise in in terms of what we
target. I'll also give a high level
overview of our AI software stack and
neutron MPU. You know, like like
Synopsis, like others, we have a
portfolio of in-house, but also licensed
MPUs um that I think give a a good range
of flexibility to our products that meet
different workloads. It's really about
rising to meet the needs of of what the
application demands and and having the
support for that both from throughput,
memory, CPU usage uh price performance,
power performance, all that kind of
stuff. Uh I'll then do a deeper dive on
our GenAI Flow and RAG database generator. These are two distinct software tools that I think help create those fine-tuned, secure LLMs we referenced in
the title. Um then I'll give a bit of a
what I think is a sneak peek to the the
future of where we see the edge market
which is enabling multimodal geni and
some recent strategic moves that NXP has
made to help support that again from our
product portfolio. I'll wrap up the
summary hopefully some questions and
looking forward to the panel as well. So
when we talk about the edge uh and we
talk about the opportunity we focus on
of course there were some some companies
you know named name named earlier and in
GPUs space specifically that dominate we
think is the training opportunity and
there is some training shift to the edge
definitely see that in factories uh
maybe locally for smart home hubs maybe
automotive as well where you have some
kind of you know connectivity shaping
how how these models are updated which
is the training part um but we focus on
the the prediction part and I mean one
of my favorite sayings in the space is
training is once and inference is
forever for any specific model so I
think that there's a big opportunity
when you focus on even just the
inference piece um and what that means
for a IML uh software and supporting uh
you know
devices how NXP looks at the intelligent
edge has a few lenses and there's
definitely not enough time to cover all
of our enablement today and all of our
options and portfolio so I'll focus on a
few key messages. One key message is that we scale up from our MCX N MCUs, which are kind of the lowest footprint — physical size, power consumption ratio — enabled by the Neutron NPU. As of today we have a lot of, say, time-series and sensor data driven by that, and some interesting use cases that have already been built over the last few years. We've also more recently introduced this crossover brand of MCUs under the i.MX RT flag, which again support the wearable category and power-sensitive use cases, but also bring some ML capabilities into that market. Then our i.MX applications processors — that's really our bread and butter of
computer vision voice time series data
as well scaling up to the kind of stuff
that Pier was alluding to with
transformer-based workloads which of
course dramatically improve on the
accuracy you can do with computer vision
use cases or similar perception use
cases in general but there definitely is
this new class of applications that are
enabled by the reasoning or cognitive
abilities of LLMs and vision LLMs um
which of course I'll actually give some
demos of in a second which will be
pretty cool and so this is built on top
of our our EIQ software stack for our
customers to have an easier path and and
faster path to market by really
demystifying and simplifying a lot of
the stuff that has to happen I mean you
don't need maybe optimize every model as
as thoroughly. You don't need to do
retraining in many cases. But for those
cases where you have say the whole
spectrum of offtheshelf versus heavily
optimized, EIQ really rises to meet
those demands. We have a few specific
components that are a bit newer that I
won't cover again in depth today, but
want to mention time series studio
that's initially focused on MCUs but now
running on ID Automix as well specific
specific SOC's. Um that helps again for
AutoML for time series models. I will
focus today on the geni flow because one
of the themes is geni and I think
actually we cover almost every mega mega
theme mentioned for today by us at the
start um except for the AI and harsh
environments which will be cool to get
to but we cover model optimization we'll
cover software tools we'll cover
hardware design we'll cover um geni at
the edge of course and so again this is
meant to be flexible with both you know
productivity enhancement but also energy
efficiency and of course the performance
needed for the
task wouldn't be a good AI talk if we
didn't mention the eIQ Neutron NPU, which again fits that scalability story, where we're scaling up from the MCX N you see on the lowest end here, with that iteration scaling up to products that exist today with, again, external NPUs. I think that's how NXP has approached the market and may continue to, with how we try to produce a flexible product portfolio. So our i.MX 93 uses the Arm Ethos-U65 microNPU, a pretty capable engine with a software pipeline and eIQ components that really help get the most performance possible there. Our i.MX 8M Plus has been in the market for a few years, and then our kind of lead-puncher flagship for GenAI applications is our i.MX 95, also available today, which actually uses our in-house NPU, which on paper we list as a two-TOPS engine. I actually think when you see the performance it punches above its weight. That's certainly true of
CNN's so your classical your not
classical but classic uh classification
and uh object detection segmentation
CNN's some of the stuff that was covered
earlier again to the newer generation of
models that are transformer-based um and
power workloads like vision transformers
of course And actually, I'll focus a bit
more on the LLM side. Um, we've built a
really capable voice UI, voice AI
pipeline that when you drop in an LLM at
the edge, uh, for both privacy and
real-time response reasons, when you
have the silicon to power it, um, can
create really interesting HMI, you know,
human machine interface and other
application spaces as well that weren't,
let's say, possible a few years ago even
until we had this transformer
breakthrough and then of course the edge
silicon to power
it. an underrated part because I think
it's a con, you know, one classic uh
perspective of AM practitioners is that
it's all about AI but I actually think
you know especially how NXP approaches
it with security in mind and really
best-in-class security, including, as I've recently learned, post-quantum cryptography, which is super important for financial, automotive and regulated domains.
we deliver that today. So when you
combine security plus intelligence at
the edge uh I think it's a very very
compelling offering, and that's what NXP embodies.
Now I'll go a little deeper into GenAI Flow, which is something I'm happy to say that I own and drive here at NXP.
Um and I have a lot more innovation
coming in the back half of this year.
This actually exists today and I'll
point to a couple resources but to give
you a sense of why we built this and
what the problems it solves. I'll
actually look forward first at what are
our GenAI ambitions? What are we trying to
do here? So we already play in a wealth
of markets. Automotive being one
consumer uh to some degree mostly smart
home industrial uh smart building power
management uh wide breadth. Because of
this wide breadth, we have a big
opportunity to bring gener into these
domains that some already have some AI,
some AI is a bit newer technology, but
all the same, we have these new
capabilities that we want to be able to
leverage. And this is just I'd say not a
not an exhaustive list, but certainly a
good glimpse of places where we're
making a difference
already. The challenge though is you
have this geni landscape that we all
know is is changing rapidly. And so the
way that I tried to tried to parse this
and tried to present this for you know
audiences like today
is you have this core stack in the
middle. These are libraries we're
familiar with like the AI frameworks.
Then you have a lot of let's say
necessary components that you need to
either support or interface with somehow
also communication protocols as well as
be another big one for us. But when it comes to the edge AI stack, this is the kind of classic stack "sandwich" diagram we see in a lot of places. I tried to bifurcate that from the moving
parts of AI that actually are really
dynamic. you get lots of innovation but
you know do you need to focus on as your
core expertise and then if you don't how
do you leverage it and so the takeaway
that I tried to distill for for us in XP
and for the edge is to solve problems by
creating optionality to leverage the
best of what changes so as we saw in the
previous talk you know leveraging
transformers huge deal and that actually
probably will change frequently what
comes next we don't know but having the
optionality to leverage it will be
important and so we also want to
optimize what doesn't change as often we
have this paradox of models changing
weekly or monthly but silicon and
products that need to live at the edge
in in devices in products for you know
10 to 20 years in some
cases excuse me so because of this
paradox we need to find a way to balance
both uh performance so getting the best
of what's possible while while having
longevity and I think when it comes to
IMX in particular longevity is a big
part of of what we do and stand for now
the challenges we have with GenAI in industry. We should all be familiar with this from even just playing with ChatGPT and other off-the-shelf or API-based LLMs: they're limited in context understanding; obviously, they're computationally expensive — that's a given; and they have hallucinations, so they just make stuff up, and combating that is actually a pretty deep part of how they work. That is not something readily
solved, let's say, unless you have some
some some powerful software
contributions, which again, rag is one
that's talked about a lot and I think is
here to stay as well. Um and and we
focus on NXP. And then errors and
reasoning. All this to say is that if
you want to use this for a a a use case
where there's material or people
involved, um you won't you can't have
any of these issues. You have to address
them somehow. And again, this happens in
software in most cases. And the sweet
spot we found by kind of assessing
what's out there and the approaches we
have. Uh on the bottom, you know, this
is how most of us use it for day-to-day
tasks that are not mission critical. On
the top, that's limit to a very small
audience in the world with the skills
and and and compute to do so. But here
in the middle, kind of goldilock
scenario, we have this ideal performance
overhead trade-off. And what we like
specifically about rag is it protects
users IP from the model creation even
from ourselves actually. We want to have
this environment where you have this
again secure and fine-tuned LMS without
compromising you know things you can't
compromise and this also helps you know
lower the the time to market where
you're not retraining um because you're
creating a database you're creating
something that can be stored on on the
edge device by parsing domain knowledge
specific knowhow in different forms to
be interfaced with an LM and actually
voice UI, as we've done in GenAI Flow. The other problem, of course, is model size. I won't belabor this point, but one interesting rule of thumb is that for every billion parameters we need about a gigabyte of memory — not bandwidth, but actual memory to store these weights and models. So in the integer-8 precision, that data type, it's around a gigabyte per billion parameters, and what this means, of course, as Pierre alluded to, is that these models tend to become memory bound.
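That rule of thumb is easy to check: weight memory is just parameter count times bits per weight, so halving the precision halves the footprint (KV-cache, activations and runtime overhead are ignored here — this is only an illustration).

```python
# Weight-memory rule of thumb: ~1 GB per billion parameters at 8 bits per weight.
def weight_memory_gb(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (0.5, 1, 3, 7):
    sizes = {f"{b}-bit": round(weight_memory_gb(params, b), 2) for b in (16, 8, 4)}
    print(f"{params}B params -> {sizes} GB of weights")
```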
Because of these two precursors, we want to have fine-tuning and we want to have optimization, and we've baked this into a software tool we call GenAI Flow. This program is available for free to deploy off-the-shelf models. We also have a commercial version that we provide for customers on, again, the i.MX 95 — the flagship product today — and possibly other families in the
future but that's where we really bring
both of these the capabilities the
finetuning to adapt these LMS to your
domain knowledge without compromising IP
but also eliminating those errors and
reasoning those hallucinations that you
have to for industry use on their side
optimization to get the best performance
the performance you need really I would
say the best performance is always
needed but acceptable performance
actually gets the job
done. A little deeper dive on this: it's made of modules. These modules are the building blocks of GenAI use cases that we found to be quite common. When it comes to voice, you have speech-in and speech-out components, and wake events — think "Hey NXP", "Hey XYZ" — that we want to wake up on; it could also be a visual event that triggers this in the future. The current version is focused on conversation, as I mentioned, and I am
trying to go fast here. I apologize it's
uh just you know obviously time limited
but happy to go deeper on any of these
uh topics in the future. A quick demo of
what we're doing with uh with um this
rag engine and why it's so valuable
especially for contexts like medical is
it can actually have answers tailored to
data that is grounded in factuality,
grounded in truth and relevant to just
domain. So this is using an older bigger
model. We've actually I think had the
response time a lot faster with this.
You can see the text being generated.
We've introduced a streaming mode so you
don't have to wait for all the tokens to
be to be produced. you can actually
start producing earlier tokens faster.
It's conversational, and there you get the TTS — that's kind of the full use case of this: the GUI plus the voice UI, powered by LLMs, which we didn't have before. It brings a lot of capability.
A closer look at the numbers here: where we were before, using only the CPU, it doesn't make sense. People read at around 5-10 tokens per second, so the raw token rate can be acceptable, but that 1.5-second delay just isn't natural — it isn't suitable for conversational AI. Of course, we can improve that heavily when we start using the Neutron NPU: that greatly accelerates time to first token while achieving a meaningful token speed. Just for reference, this is six Cortex-A CPUs versus a single NPU — we're getting the performance of about six high-performance Cortex-A cores out of our Neutron NPU on the i.MX 95. Pretty cool.
As I mentioned at the start, this is
available today. We'd love for you guys
to start playing with the voice UI that can be built with GenAI Flow today, with future engines coming out later this year. The RAG database generator is also a unique tool where you go from PDF in to database out. I think that's super effective for getting a sense of the quality — how effective this process is at fine-tuning answers — but also the efficiency, because relative to the size of an LLM, these databases are designed for the edge and quite small.
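To make the "PDF in, database out" idea concrete, here is a generic sketch of what such a RAG pipeline does: chunk the document, embed the chunks, store the vectors, and retrieve the most relevant chunks for a query before handing them to the LLM. This is not NXP's GenAI Flow tool; the libraries, file names and chunk sizes are assumptions for illustration.

```python
# Generic sketch of a RAG database build + retrieval step (illustrative, not NXP tooling).
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # small embedding model, edge-friendly

# 1) Parse the PDF and split it into overlapping chunks of domain knowledge.
text = " ".join(page.extract_text() or "" for page in PdfReader("device_manual.pdf").pages)
chunks = [text[i:i + 500] for i in range(0, len(text), 400)]

# 2) Build the "database": one embedding vector per chunk, saved for use on the device.
db = embedder.encode(chunks, normalize_embeddings=True)
np.save("rag_db.npy", db)

# 3) At run time, retrieve the top-k chunks for a user question and hand them to the LLM.
def retrieve(question, k=3):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(db @ q)[::-1][:k]
    return [chunks[i] for i in top]

context = "\n".join(retrieve("What is the maximum operating temperature?"))
prompt = (f"Answer using only this context:\n{context}\n\n"
          "Question: What is the maximum operating temperature?")
```

The retrieved chunks are what keep the model's answers grounded in the user's own documents rather than in whatever it memorized during pre-training.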
I'll wrap up with a quick glimpse of
like I said the future before hopefully
time for a couple questions here. Um you
know, we've made a big move in intending to acquire Kinara, whose NPUs are dedicated to the latest and greatest workloads with quite efficient power, which fits the edge while expanding nicely on the host SoCs, which of course are NXP's bread and butter. So we think this
is a really great uh union of of
technologies and teams and commercial
focus as well. And to give you a sense
of that as well, one space I focus on
might be an unsexy industry, but it's
industrial. And I think industrial is
ripe for geni innovation for reasons
like we see here with using um you know
uh visual, you know, both vision
transformer plus multimodal LLM to
understand what's happening in in a
series of
images. Um I won't go through all of
this here, but you can get I think this
is a great example of the kind of visual
intelligence that could be layered with
an agent for example to then notify
emergency services, notify supervisor
and actually take an action. So going
from perception to perception plus
action is a big theme for us with GenAI at the edge. So, a quick summary: we can bring domain-specific intelligence to the edge. These LLMs can be optimized and deployed, and they can also be fine-tuned; we then leverage the efficient acceleration we have in the i.MX family and in integrated or discrete NPUs, plus this GenAI Flow, which really serves as a one-stop shop. That's how we bring GenAI to life at NXP today. Thanks again for your time, everyone, and hopefully there's time for a few questions.
Thank you Davis for this great presentation. There's a lot of material to digest today. I have a question from my colleague at CMC, James Miller: is there anything in the stack that helps real-time system developers ensure timely, bounded results? For example, guaranteed critical responses within 50 milliseconds for safety in applications like robotics and automotive.
Yeah, so for automotive I would point to eIQ, our AI stack that is qualified for that space. I think there are some components of it that help with these deterministic requirements. One problem, of course, with LLMs and AI in general is their stochastic or probabilistic nature, so when it comes to LLM throughput, putting a hard cutoff on their responses might be a little trickier. But for things like the infotainment system in automotive, we've already deployed LLMs at reasonable conversation speeds, so for that kind of application we can already see a lot of innovation happening. For the kinds of applications that have these harsh requirements, you need a provider with an automotive-grade AI and hardware stack to meet those needs. Yeah, thank you Davis. See you in the panel. Thanks.
Our next speaker is Professor Warren Gross, a James McGill Professor and chair of the Department of Electrical and Computer Engineering at McGill University. Warren's research bridges algorithms and hardware, with a focus on efficient deep learning models and hardware for machine learning. Today he'll present on parameter-efficient fine-tuning of transformer-based language models using dataset pruning. Please join me in welcoming Warren. Warren, the stage is yours. Thank you very much. It's a pleasure to be able to speak again at this Accelerating AI workshop. This is not the first time I've been here, and I always enjoy the interactions and the talks at this workshop. Can you all see my slides? Okay. Yes, and we hear you fine. Thank you. Okay, great. So what I'm going to do today is talk about fine-tuning language models, following on from what Davis was talking about, and I'm going to discuss some things you can do in the training process to make the fine-tuning more efficient.
So we're talking about LLMs, language models. There were great introductions to this area and to transformers in the last two talks. I just wanted to show you something I found online as of late last year, some of the state-of-the-art LLMs. What we're seeing is that, in terms of the number of parameters in these LLMs, we're now talking about a trillion-plus parameters. These are absolutely enormous. And looking around, trying to find information about the training cost of these models, you see that you really need upwards of $100 million to train one of these large language models from scratch. Now, things changed very dramatically at the end of the year when we saw DeepSeek, which has a dramatically lower training cost of about $6 million. It still has a lot of parameters, though it is smaller, at 671 billion. So there are things we can do in model design, and also clever training, to reduce this training cost, but there still needs to be more attention paid to the efficiency of training to bring it down even further. Looking at the trends in transformer size in terms of the number of parameters, what we're seeing is an exponential increase in model size at a rate of about 10 times per year. That is a very significant increase. And what we find is that as models get bigger and bigger, you also need more and more data to accurately train them. So there are really two pieces to this: how you train efficiently, considering the hardware complexity and the hardware you're training on, but also a dataset piece, how you handle the datasets. We're going to talk about both parts in today's talk.
The way the training challenge is addressed in large language models is really broken down into a two-stage process. The first stage is pre-training, and that's the expensive stage. This is when you train from scratch on a huge pre-training dataset. This is what takes the millions and millions of dollars, and it's really only something that can be done by large companies that have access both to the large dataset and to the computational resources needed to do it. But then, once you have this general, large pre-trained model, you can fine-tune it in a second stage of training on a specific dataset and for a specific task. This fine-tuning dataset is usually smaller, and the fine-tuned model is then adapted to solving a particular task. In this talk, we're going to focus on the fine-tuning stage of this process, given an existing pre-trained model.
And so I wanted to talk about two key hardware metrics in the training process. One is the training time, the amount of time it takes to train from the start to the end of the process, and the other is the peak memory usage. Why is training time important? Because it directly influences metrics of interest to everyone, for example energy usage. The longer it takes to train, the more energy the process will take, and the energy usage directly impacts the electricity bill or the battery life of the device, as well as the carbon footprint. The peak memory usage, on the other hand, determines the minimum amount of memory you need to allocate on your device, such as a GPU or an NPU. More memory means more expensive devices, and since you need many devices, this can be a very significant cost, as well as a training difficulty: if your model doesn't fit in the memory of your GPU, for example, you may need to partition the training process and move data in and out, which complicates training. The other key aspect of memory is the number of memory accesses, and this actually has a very strong influence on energy usage. So these two metrics are not completely independent, but they're both things we want to decrease: we want faster training and less memory.
I'm going to introduce this picture on the right, which we'll use throughout the talk to show the effect of the different innovations that can be applied to model training. On the vertical axis we plot peak memory usage, and training time is on the horizontal axis. When you compare pre-training and fine-tuning, you see that fine-tuning has a much smaller training time than pre-training, but you'll notice they both use the same amount of peak memory, because we haven't changed the model. What we want is to find techniques that move us towards the bottom left of this graph, with low peak memory usage and lower training time, and we want to do all of this without negatively affecting the model accuracy.
The first thing we can do is use a technique called parameter-efficient fine-tuning. The basic idea is to avoid updating every single model parameter when you're fine-tuning. You can freeze large parts of the model weights, not update them at all, and focus only on a subset of the weights during fine-tuning. This has a couple of benefits. If you look on the left, this is the normal operation of training: first you do a forward pass, then you compute gradients, and then you update the weights using those gradients. The gradients and the weights have to be moved in and out of memory, which sets both the amount of memory you need and the time it takes to perform the training; the memory holds the weights, the gradients, and the other optimizer states. In a frozen model, you still have to perform the forward pass, but for a large portion of the parameters, the frozen ones, you don't have to compute gradients or update weights. That reduces the amount of data going back and forth between the compute and the memory, and you don't have to store all the gradients and optimizer states, so peak memory is reduced quite substantially. Will it reduce the training time? A little, but since you still have to do the forward pass, the reduction in time is not really significant. It's really the memory savings that you're achieving here.
here. And the most popular way to do this is a technique called Laura, low
this is a technique called Laura, low rank adaptation. And there you freeze
rank adaptation. And there you freeze most of the model. And and it's easiest
most of the model. And and it's easiest to think of freezing models in terms of
to think of freezing models in terms of freezing layers, but some of the layers,
freezing layers, but some of the layers, for example, the attention layers in a
for example, the attention layers in a transformer will be the ones that are
transformer will be the ones that are are are fine-tuned. Um and when you look
are are fine-tuned. Um and when you look at a layer that is going to be um not
at a layer that is going to be um not frozen, what you what you do is you take
frozen, what you what you do is you take the the linear layer which is the m byn
the the linear layer which is the m byn matrix and you actually freeze that and
matrix and you actually freeze that and add additional parameters in parallel to
add additional parameters in parallel to that linear layer. Um so it does involve
that linear layer. Um so it does involve adding a few extra parameters but this
adding a few extra parameters but this not too many because uh this parallel
not too many because uh this parallel layer is an n by r concatenated with an
layer is an n by r concatenated with an r byn matrix where r is a very very
r byn matrix where r is a very very small number. So you have a small
small number. So you have a small additional number of parameters, but in
additional number of parameters, but in terms of training, it makes it much much
terms of training, it makes it much much much easier. And in inference, you're
much easier. And in inference, you're going to use both of these layers. So
going to use both of these layers. So this will result in fewer memory
this will result in fewer memory accesses and smaller peak
accesses and smaller peak memory. And what is the effect of doing
And what is the effect of doing parameter-efficient fine-tuning like LoRA? Well, the peak memory is considerably reduced. You can see on this graph that, compared to standard fine-tuning, fine-tuning with LoRA reduces the peak memory; the training time is also slightly reduced, but not by a lot. So now we're asking: how do we further decrease the training time? To do that, we need to look at the other part of the equation, which is the data you're training on. When you look at a dataset, which is a collection of data samples, you realize that not all samples are equally helpful. Some are not helpful at all: for example, some data points may be mislabeled, and they can actually mislead the model; training on them may hurt it. Some are very easy, what we call easy data points, where they don't add any information that doesn't already exist in the pre-trained model. On the other hand, some are very difficult, and if you train on them, it can lead to bad outcomes and damage your model. So we would really like not to train on those kinds of data points; they're unwanted. If we can identify them and prune them away, we can end up with a fine-tuned model that is more accurate. But also, if we don't train on a certain number of data points, training is faster. So it has the dual benefit of potentially improving accuracy while also reducing training time, and the training time is really the aspect we want to look at here.
So in dataset pruning, you want to find which data points you don't want to train on by evaluating some score function. We've looked at the design of score functions that examine each data point in the dataset and decide whether it can be pruned away. Our score function, which we call the H score, works like this: you do some training, some fine-tuning, for a few epochs, and you check whether the classification of each data point was correct or not; that is, you compare against the ground truth and see whether the model classified it correctly. If the model classifies a data point correctly across all epochs, consistently, we give that data point a score of one. Then we repeat this multiple times with different seeds, six seeds, and we add up the score for every seed. So a data point that is consistently giving the correct answer ends up with a score of six, and a data point that is consistently giving the wrong classification ends up with a score of zero; scores can fall anywhere between zero and six. Data points with a score of six are really, really easy: they always get the right answer, the model probably already knew it, so you don't need them and you can prune them away. Data points with a score of zero are very difficult; you don't want those either, so you prune them away too. In the middle are the ambiguous scores, and these are the data points we keep and train on.
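A simplified sketch of how a score like this can be computed is below, assuming per-example correctness has been recorded during short fine-tuning runs; the exact definition in the speaker's work may differ in its details.

    import numpy as np

    def h_scores(correct: np.ndarray) -> np.ndarray:
        """correct has shape (seeds, epochs, examples), with entry 1 if the example was
        classified correctly at that epoch.  Each seed contributes 1 to an example's
        score only if the example was correct at every epoch of that run."""
        always_correct = correct.all(axis=1)           # shape (seeds, examples)
        return always_correct.sum(axis=0)               # score in 0..seeds per example

    def prune(scores: np.ndarray, n_seeds: int = 6) -> np.ndarray:
        """Keep only the ambiguous examples: drop the trivially easy (score == n_seeds)
        and the consistently wrong (score == 0)."""
        return np.where((scores > 0) & (scores < n_seeds))[0]

    rng = np.random.default_rng(0)
    correct = rng.integers(0, 2, size=(6, 3, 1000))      # 6 seeds, 3 epochs, 1000 examples (dummy)
    keep_idx = prune(h_scores(correct))
    print(f"keeping {len(keep_idx)} of 1000 examples")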
So the training time is now reduced in proportion to the size of the pruned subset. In our experiments, we're pruning away 70 to 80% of the dataset, so we're left with maybe 20 or 30% of the original data, which can give significant decreases in training time. You can see now that we've talked about two methods: LoRA, which can reduce the peak memory usage, and dataset pruning using the H score, which can reduce the training time. What we'd like to do now is see whether we can combine both techniques to drive us closer to the bottom left, with low memory usage and low training time. The proposed method does both: you take the pre-trained model and apply low-rank adaptation to obtain a parameter-efficient model; that model is then used with the fine-tuning dataset to compute an H score; with the H score, I can do dataset pruning; the pruned dataset is applied to the LoRA model, which I fine-tune to produce my fine-tuned model, which I can then evaluate.
So these are the results of evaluating these two techniques combined. We have combinations of one or the other technique, and then both of them together. Compared to the baseline, which we normalize to a speedup of one and a peak memory of about 10 GB, LoRA by itself has a limited speedup of 1.2 times but a significant reduction in peak memory usage; dataset pruning using the H score by itself has a significant speedup of over four times, but of course doesn't reduce the peak memory. As we hypothesized, the experiments showed that when you combine both techniques, on average you get over five times speedup and also enjoy the significant compression of memory. Now looking at accuracy: we've evaluated these two techniques both individually and combined, and we've also added a comparison with random pruning, not using the H score. The reason we include random pruning is that in the regime of significant dataset pruning, 80% or more, random is actually state of the art; it's better than the other scoring functions that have been proposed. When you're doing less aggressive compression, other techniques can be used, but in this highly aggressive regime, random is very good. So the question is whether the H score does better, and it does; in fact, it's necessary to get excellent performance. What we see is that accuracy overall is actually improved slightly by using H-score pruning, and LoRA helps as well. The combination of the two is very effective, and it either doesn't hurt or slightly improves accuracy, because of regularization effects, on a model like RoBERTa-large, which has about 355 million parameters.
Finally, I wanted to introduce one additional set of experiments, on continual learning. What is continual learning? It's the scenario where I fine-tune on two different tasks consecutively. I have a pre-trained model and I fine-tune it on task one, ending up with a model. Then I take that fine-tuned model and fine-tune it on a different task, task two. The problem is that I now have a model that's been fine-tuned twice, and what tends to happen is that when you fine-tune the second time, the model forgets how to do the first task; it can be damaged. So we want to evaluate whether these techniques, dataset pruning and LoRA, help mitigate the forgetting of the first task when fine-tuning more than once, and the answer is yes, they do. What we do is evaluate the final fine-tuned model, the one trained on tasks one and two, on both tasks, and see the effect of applying these techniques.
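Structurally, the continual-learning experiment being described looks like the sketch below; `fine_tune` and `evaluate` are placeholders for the actual training and evaluation routines, and the task names are only examples.

    # Sequential fine-tuning, then evaluate the final model on BOTH tasks to measure
    # how much of task 1 was forgotten.  All functions here are placeholders.

    def fine_tune(model, dataset):
        return model                 # placeholder: would return the updated model

    def evaluate(model, dataset):
        return 0.0                   # placeholder: would return task accuracy

    def continual_learning(pretrained, task1, task2):
        m1 = fine_tune(pretrained, task1)            # e.g. MNLI
        m2 = fine_tune(m1, task2)                    # e.g. QNLI
        return {"task1_acc": evaluate(m2, task1),    # a drop here indicates forgetting
                "task2_acc": evaluate(m2, task2)}

    print(continual_learning(pretrained=None, task1="mnli", task2="qnli"))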
You can see in the first line, with no modification, there are two scenarios: one where task one is MNLI and we then train on QNLI, and another scenario, SST-2 to IMDB. You can see that the first task, MNLI in this case, is dramatically damaged by fine-tuning on QNLI. By applying LoRA, you can recover some of the forgetting, so you mitigate some of the damage, but not a lot. Applying dataset pruning is then key to actually rebuilding the performance on that first task, and compared to random pruning, we find the H score does a little better. So this really shows that the combination of these two techniques not only gives you more efficient fine-tuning in terms of memory reduction and training time, but can also help in continual learning scenarios by mitigating the damage to the first task.
In conclusion, we've looked at ways to reduce the peak memory usage and training time of fine-tuning large language models. The first technique, LoRA, is not ours, but we've evaluated it, and it is very effective mainly at reducing memory usage; our proposed dataset pruning using the H score greatly reduces training time. We've combined the two and shown that the combination is very effective: we achieve over five times speedup and a 40% peak-memory reduction, and I've also shown that these two techniques in combination are very effective at mitigating the forgetting of the first task in a continual learning setup. That's my presentation, and I want to thank you very much for your attention.
Thank you Warren for this great presentation and great research. I think this is a much-needed capability going forward, to reduce the training time and the memory usage; this will also reduce power in data centers when we train models. So, and this is not a technical question: what are the key tools and hardware required to enable research in parameter-efficient fine-tuning and dataset pruning, and what are the main pain points you face in advancing this field?
Right, that's a good question, thank you very much. The main bottleneck in terms of tools for this kind of research is available computational resources. Getting enough GPUs or NPUs, and we're using GPUs to do the training, is a very significant bottleneck, especially because in dataset pruning, in order to compute the H score, we need to do multiple training runs. So that is quite a pain point. The second challenge, partly related to the resources we have and partly just to coming up with good techniques, is how to scale this to very large language models, which we have not done yet, although in current ongoing work we're looking at how to apply this to a much larger class of LLMs. Yeah, that was my next question. Thank you. Okay, thank you Warren. Due to time limitations, we're going to go to our next speaker, but please do not hesitate to ask the speakers questions directly in the chat; they will be pleased to answer them. Up next is Borak Kmak, CTO of Edge Signal. With over 18 years of experience leading global product development in edge computing and cybersecurity, Borak brings both technical and strategic insights. He'll be speaking on implementing generative AI in edge environments: challenges and solutions. Let's hear from Borak. Borak, the stage is yours. Thank you so much, Yasin. So, let me share my screen. Is that visible now? It's visible and we hear you fine. Thank you. Okay. So, being the last speaker of this first section, it's going to be somewhat repetitive, because some of the things I want to discuss were already well presented by the prior speakers, but I will move through those parts a little faster and get to the real challenges we are currently facing in customer environments. So here, the reason we wanted to take advantage of edge LLMs was pretty obvious. Of course,
processors. That's our primary focus. You can see the tagline here. We recognize, as everybody else in this gathering, that the world is moving to deploying generative AI, but the challenge is going to be: GenAI at what cost? And one of the big factors here is energy consumption. So we've made it our objective, because we've seen this coming for a while, to address the fact that AI's proliferation is ultimately going to be gated by the amount of energy that can be consumed in many applications. We set out to really remove that from the equation, not by improving AI inferencing by 2x, 4x or 10x, but more like 100x, 1,000x, 5,000x, without compromising on latency, performance, and cost. Naturally, our initial markets are around the edge, where energy is heavily constrained and devices typically have to carry their energy source with them in the form of batteries. This has drawn us very naturally to smart sensors, AI IoT devices, wearables and smart-mobility kinds of markets. We're doing all of this with a very proprietary technology, which I'll get into, but we are a very product-focused team, so we are developing our first product, the BM10, the chip you see here in the picture, which we are currently working to get samples of to our tier-one customers, and we've got a multi-stage roadmap to ultimately target GenAI applications.
I think last year will go down in history as the year when GenAI finally made an appearance at the edge, on devices. Apple announced the iPhone 16 Pro; Meta and Ray-Ban had the smart glasses. Users love their very responsive interaction with these devices, along with, of course, the features they provide. The one challenge, however, the one pain point, was energy consumption. On the iPhone, if you use these GenAI capabilities all the time, you're going to run through your battery within a couple of hours. On the glasses, if you turn on always-on audio to interact with them, you've now reduced your battery life by about 30%. This is the unacceptable trade-off that users are having to make on these devices, and we're setting out to change that.
In addition to the power consumption problem, as a lot of you have observed, inferencing is going to dominate AI workloads. A lot of the focus so far has been on training; that is now moving very rapidly to inferencing as these models get deployed and put out into the world. Next-generation use cases like robotics and human augmentation are being limited by the fundamentals of digital processors, and I'll talk a little more about this, but the semiconductor industry is, you could say, a mature industry, and it has been optimized heavily to deliver phenomenal performance at a very low cost. With machine learning, that paradigm has been upended, and we've really reached the limits of what our existing architectures can deliver. There are also significant privacy and latency concerns: with cloud-based AI processing, you don't want to be sending all your real-time data into a cloud, into jurisdictions where you may not have any control over your data. So all of these problems exist, and they're added challenges at the edge.
Our industry has been doing a lot to address these challenges in the semiconductor world. There's a chart here trying to depict the kinds of activities that have been happening, consuming many billions of dollars and lots of smart engineers working on this challenge: taking an 85-year-old von Neumann computer architecture, which was really developed for human-written sequential code, and modifying and optimizing it to meet the needs of the very compute-intensive and massively parallel nature of neural network processing. We've gone from traditional CPUs to GPUs; we've gone to near-memory compute, moving the memory on chip and eliminating the need to go off-chip to access activations and weights; and finally even in-memory compute, which is being heavily worked on in the industry now, where you integrate compute elements right in the memory array itself, all with the goal of reducing the energy lost in data movement. That is a huge gain when you're doing the very intense matmul operations that are inherent to all the neural networks we have today. We talk a lot about TOPS, but an important metric is TOPS per watt: what is the energy efficiency? The bleeding edge in the industry right now is in the tens of TOPS per watt, maybe around 50 TOPS per watt, and CPUs are sub-5 TOPS per watt. So that's the spread in the industry today, and it is not getting much better: going to really advanced nodes like 5 nanometer and 2 nanometer is not changing this equation anymore. So we've really hit a performance wall, where the von Neumann architecture and Moore's law are really not helping anymore.
not helping anymore. So a new way of thinking is needed and this is exactly
thinking is needed and this is exactly what we're doing with our analog compute
what we're doing with our analog compute architecture at a very high level.
architecture at a very high level. Right? what we're doing is taking uh
Right? what we're doing is taking uh your traditional MAC arrays, right? And
your traditional MAC arrays, right? And and the folks u in earlier talks talked
and the folks u in earlier talks talked about uh how uh they're implementing a
about uh how uh they're implementing a huge number of these increasing number
huge number of these increasing number of these to offer increased capacity in
of these to offer increased capacity in their chips. Um but we're we're we're
their chips. Um but we're we're we're kind of going going down to first
kind of going going down to first principles and reimagining okay what
principles and reimagining okay what exactly is done when you're doing a
exactly is done when you're doing a multiply accumulate operation and how
multiply accumulate operation and how could we do that more efficiently. The
could we do that more efficiently. The digital world uses a paradigm on the on
digital world uses a paradigm on the on the left here uh or some variant of this
the left here uh or some variant of this where you got memories storing your
where you got memories storing your weights storing your activations. You
weights storing your activations. You suck them out of the memory. Um you load
suck them out of the memory. Um you load them into a Mac. You do the you do the
them into a Mac. You do the you do the arithmetic. you write the results back
arithmetic. you write the results back into your memories. All of this is
into your memories. All of this is controlled by a high-speed clock and an
controlled by a high-speed clock and an instruction set that gets compiled from
instruction set that gets compiled from some highle AI framework. Right? So
some highle AI framework. Right? So that's typically how these instruction
that's typically how these instruction set based processors work. Uh what we've
set based processors work. Uh what we've done is we've gone back to first
done is we've gone back to first principles and said okay how can we do
principles and said okay how can we do this more efficiently? How can we
this more efficiently? How can we leverage the inherent device physics of
leverage the inherent device physics of the transistors themselves in order to
the transistors themselves in order to do some mathematical operations?
do some mathematical operations? And this led us to our architectural
And this led us to our architectural breakthrough um we've we've named Ample.
breakthrough um we've we've named Ample. Um it it draws its inspiration from the
Um it it draws its inspiration from the human brain which is an incredibly
human brain which is an incredibly efficient uh biochemical signal
efficient uh biochemical signal processor. Not really a computer of any
processor. Not really a computer of any sort. Um and we've we've taken that
sort. Um and we've we've taken that inspiration and created three main
inspiration and created three main elements of our architecture. The
elements of our architecture. The synapsis neuron and our ephemeral memory
synapsis neuron and our ephemeral memory um to implement standard neural
um to implement standard neural networks. Our
Our synapses are essentially multipliers that exploit the device physics of the transistor itself to perform the multiplication: an activation hits the gate of the transistor and, by modifying its threshold voltage, the drain current represents the multiply result. Then, to sum multiple synapses together, we can just string them together, simply connect the wires. Summation, when you're working with charge, is very simple: you connect the wires and let physics do the work for you. We can then store that resultant value in neurons, which are capacitor-based elements that can fire the activation downstream to the following layer for further processing. We also have our ephemeral memory, a highly silicon-area- and power-optimized storage element. The way we encode activations and data in our architecture is different from the digital world, and the ephemeral memory can take those encoded values and store them temporarily, not for days or hours but for milliseconds and microseconds, while they need to be held for the next layer to come in and process the information. So these three elements basically form the foundation of our architecture.
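(For illustration only, here is a rough behavioral sketch in Python of the charge-domain idea just described: weights act as transistor-like multipliers, currents from many synapses add on a shared wire, and a capacitor-like neuron integrates the result and fires it onward. The gain constant, function names, and ReLU-style firing are illustrative assumptions, not a model of Ample's actual circuits.)

    import numpy as np

    def synapse_current(weight, activation, gain=1.0):
        # Idealized synapse: drain current proportional to weight x activation.
        # Real devices are nonlinear; calibration and compensation are abstracted away.
        return gain * weight * activation

    def analog_layer(weights, activations):
        # weights: (n_out, n_in), activations: (n_in,)
        # Summation comes "for free": currents on a shared wire simply add.
        currents = np.array([
            sum(synapse_current(w, a) for w, a in zip(row, activations))
            for row in weights
        ])
        # Capacitor-like "neuron": the integrated charge is fired downstream
        # through a simple nonlinearity.
        return np.maximum(currents, 0.0)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))
    x = rng.normal(size=8)
    print(analog_layer(W, x))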
Some additional benefits: we do everything end to end in the neural network without any kind of data conversion, no A-to-Ds or D-to-As. Those are power hogs, and we've eliminated them completely from our architecture. We don't use any fast clocks; in fact, we don't use any clocks at all. This is a 100% event-driven architecture, so if you have sparsity in the neural network weights or activations, that is automatically taken care of. If there are zeros, we just don't consume any charge and don't do anything. In the digital world, you may be reading and writing zeros into your memory and doing multiplies by zero and so on, so sparsity handling in the digital paradigm is much trickier. Here it's just taken care of by the architecture.
much more tricky to do. Uh, here it's just taken care of by the architecture.
just taken care of by the architecture. Um, we also are impervious to PBT
Um, we also are impervious to PBT variations. This is an analog
variations. This is an analog architecture. So it's it's not trivial
architecture. So it's it's not trivial to deal with these effects, these
to deal with these effects, these operating condition effects. And we have
operating condition effects. And we have developed some proprietary operation and
developed some proprietary operation and controls routines and circuits that run
controls routines and circuits that run and keep everything in check uh
and keep everything in check uh delivering no loss in accuracy versus a
delivering no loss in accuracy versus a standard digital architecture. Uh
standard digital architecture. Uh additionally uh the cherry on top is the
additionally uh the cherry on top is the is that we can implement these uh the
is that we can implement these uh the these designs in very mature standard
these designs in very mature standard CBOS processes. So we don't need any
CBOS processes. So we don't need any special masks to develop this. They're
special masks to develop this. They're they're standard digital flow and uh you
they're standard digital flow and uh you know processes that that were bleeding
know processes that that were bleeding edge uh 12 years ago, right? So this
edge uh 12 years ago, right? So this keeps cost very low making them really
keeps cost very low making them really ideal for these high volume edge
ideal for these high volume edge deployments. Um the other kind of side
deployments. Um the other kind of side effect of of doing things in an analog
effect of of doing things in an analog way is that you can interact with our
way is that you can interact with our analog world in a different way today.
analog world in a different way today. Uh if you have an environment sensor uh
Uh if you have an environment sensor uh measuring a parameter like light in an
measuring a parameter like light in an image sensor or a pressure sensor or a
image sensor or a pressure sensor or a microphone, these are all analog
microphone, these are all analog parameters that that an analog
parameters that that an analog environment sensor typically measures
environment sensor typically measures and then we digitize the information.
and then we digitize the information. Right? That's typically what's done
Right? That's typically what's done today. We digitize it. we send a digital
today. We digitize it. we send a digital signal into your digital processor that
signal into your digital processor that then runs essentially an emulation of a
then runs essentially an emulation of a neural network in its uh instruction
neural network in its uh instruction setbased architecture. Um what we do
setbased architecture. Um what we do because we are an analog processor, we
because we are an analog processor, we can eliminate that digitization step
can eliminate that digitization step that consumes power that adds to the
that consumes power that adds to the cost of the solution as well. And we can
cost of the solution as well. And we can ingest analog environment in information
ingest analog environment in information directly into our neural network.
directly into our neural network. Thereby making this thing really ideal
Thereby making this thing really ideal for integration with sensors and and
for integration with sensors and and making dumb sensors smart.
The other attribute is that we can scale these devices up to pretty large networks. We do have on our roadmap a pathway to get to generative AI, transformer-based networks in, call it, the Llama-8-billion kind of range; that's where we hope to get in the next couple of years. We've also paid a lot of attention to the software stack. We want to make our silicon as easy as possible to integrate with existing AI frameworks, without the need for any rework on the data science side. So we have an ESSB product line, like our first chip, the VM 110, where we provide the model training: we can use a customer's data set or our own, do the training with a standard framework in the cloud, and once training is complete we have a mapper that loads the result onto our chip, and we run inferencing from that point onwards.
then we run uh inferencing from that point onwards. We also do custom
point onwards. We also do custom solutions. Uh we can build custom
solutions. Uh we can build custom products for high volume strategic
products for high volume strategic applications. So our technology lends
applications. So our technology lends itself well there as well. Um here are
itself well there as well. Um here are three examples of the kind of u impact
three examples of the kind of u impact we can have uh for natural language
we can have uh for natural language processing pretty tiny networks in the
processing pretty tiny networks in the 100 call it 150 kilobyte range. Uh one
100 call it 150 kilobyte range. Uh one microwatt is what we can deliver.
microwatt is what we can deliver. Industryleading solutions are north of
Industryleading solutions are north of 100 microwatts. If you're doing uh you
100 microwatts. If you're doing uh you know object detection a basic network uh
know object detection a basic network uh like a BG11 uh 500 plus microwatts we
like a BG11 uh 500 plus microwatts we come in at sub 10 microwatts at 5 fps
come in at sub 10 microwatts at 5 fps gesture this is a beefier network about
gesture this is a beefier network about north of four 4 million 4 megabyte
north of four 4 million 4 megabyte parameters you know 100 microwatts is
parameters you know 100 microwatts is what we can do when standard digital
what we can do when standard digital solutions are tens to hundreds of
solutions are tens to hundreds of millows right so as the network gets
millows right so as the network gets bigger so does our our advantage
bigger so does our our advantage here. Um, we have a we have a a strong
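(A quick back-of-the-envelope illustration of what those figures mean for battery life, using the power numbers quoted above and assuming a 1,000 mAh, 3 V battery dedicated entirely to inference; the battery capacity is an assumption, and self-discharge and the rest of the system are ignored.)

    battery_wh = 1.0 * 3.0      # assumed 1,000 mAh at 3 V = 3 Wh

    workloads_uw = {
        "NLP, analog (quoted)": 1,          "NLP, typical digital": 100,
        "Detection, analog (quoted)": 10,   "Detection, typical digital": 500,
        "Gesture, analog (quoted)": 100,    "Gesture, typical digital": 10_000,
    }

    for name, microwatts in workloads_uw.items():
        hours = battery_wh / (microwatts * 1e-6)
        print(f"{name:30s} ~{hours / 24 / 365:.1f} years of continuous operation")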
We have a strong track record of delivering silicon: we've done three silicon tape-outs so far, and we've got multiple customers we're engaged with to get this solution out into the market. I'm going to skip a couple of charts in the interest of time. As I mentioned earlier, we're executing a three-stage roadmap here, getting from audio all the way to gen AI use cases in the future. Here are three examples of the kinds of use cases we're driving right now. Object detection in battery-operated smart cameras is very interesting for surveillance and retail kinds of environments. Smart tires can detect road conditions and classify them for an autonomous or human driver. And smart wearables are another space, where you want to put an audio interface into devices that last many weeks on a single charge. Those are examples of use cases we're targeting right now. So with that, I'll pause. I think I've used up all my time; I don't know if I have time for a question or two.
Just on time, like analog chips. I don't have any questions in the chat, but I'm really fascinated by the specifications you provided. Maybe just one high-level question. I'm new to analog computing, especially for AI. Given how quickly AI models evolve, how does your analog architecture adapt to support future model requirements, and how long do you expect a processor designed today to remain relevant in performance?
Yeah, it's a great question. That's one of the trade-offs you have to make very cleverly. We're not building general-purpose processors here; we're building very application-specific processors that will support, call it, a zoo of networks, but they're not going to be as flexible as your RISC-V or Arm cores. The idea is that, at least initially with our gen-one products, we want to target applications that have well-defined network requirements and are driven by energy efficiency; that's the single biggest pain point there. In our future generations, we are envisioning a much more flexible and programmable architecture, kind of like an FPGA for AI if you will, with analog compute elements that can be reconfigured much more than they can in the gen-one version. It's a trade-off. Yeah, go ahead, next.
Okay, thanks. Yeah, thanks for the presentation. It's nice to see people trying to do analog computing for AI. I think, as you mentioned, the main challenge with this approach is going to be robustness, right? Because you're directly exposing your computation to the noise and fabrication variation of your circuit. So I was wondering if you could say a bit more about that, and in particular, do you think the better approach is to modify the DNN model to make it more robust, or do you think that the robustness should come from the circuit itself?
Actually, Francis, robustness is not an issue; that's the problem we have solved. We've spent the last three years in R&D guaranteeing that our computations are accurate over time and over temperature. That is exactly what our IP is. So robustness is not really a challenge, or rather it is a challenge, but it's one that we have addressed. We're striving for accuracy equivalent to a digital MCU for the same workload.
But how? So do you modify the DNN model or not?
No, we do not need to modify the model. We use standard CNNs, RNNs, and soon transformers. We don't need to modify the model; we execute the same model, and we've taken care of this accuracy challenge in our circuits.
Okay. Thank you.
One last question. Please unmute yourself and ask your question if you wish.
Oh yes, thank you, Yasin, and thank you, Niraj, for a really interesting presentation. I have a question about the model translator. I think I may have missed it at some point, but do you develop it as a software development kit that maps the model onto the Ample architecture? Can you say more about that? Thank you.
Yeah, sure. There are two steps to it. We do have software that helps configure the hardware to the target network, so there's a configuration tool there. The other step is taking the trained neural network weights coming out of your training process and converting them to a format that can be loaded onto the chip. Those are the two steps, and we provide software, drivers, and utilities that accommodate both of those things.
Okay, so two steps. Yes, I got it. Thank you.
Perfect. Thank you very much, Niraj. Now, to close out our incredible lineup of presentations, it's time to dive into a forward-looking conversation on the future of generative AI at the edge. To lead this panel, I am very pleased to welcome Walter, a seasoned expert in IoT innovation ecosystems and technology strategy. Walter is the executive director of AIoT Canada and brings deep industry insight and experience to today's discussion. Please join me in welcoming Walter to the stage as he guides us through an engaging panel on the challenges, opportunities, and innovations shaping edge generative AI. I would like to ask the speakers to voluntarily unmute themselves and turn on their cameras so they can be part of this discussion; this is optional, so no pressure. Walter, do you have any slides or no?
I don't have slides, which is great. Let's dive in. Okay, great. If we could remove the screen share and just see everybody's faces, that would be great. So first of all, congratulations everyone on your insightful presentations; there's a lot of innovation presented here. It almost makes me want to do silicon design again. And Yasin, great job keeping the whole thing on track. My name is Walter, as Yasin said. I'm the president of AIoT Canada, an industry association that brings IoT and AI actors together. We provide a platform for our members to connect, interact, and learn from each other, and we also provide a voice of the industry for our members toward government strategies and policies. Our members include AIoT providers, adopters, investors, academia, and so on. So I'm very eager to moderate this discussion.
Since no one needs an introduction here, we'll just jump right into it. So far most of the talks have been pretty far down in the weeds of the technology, right? This panel discussion is different, in that we need to take our eyes up and cast them forward, and our two tools are going to be a crystal ball and a radar. We'll cover opportunities, challenges, and some of the innovations underfoot to address those challenges and opportunities. So it's not a Q&A on your presentations; for this one, you get to be the generative AI. You have all this vast knowledge and experience about gen AI and AI in general, and now you're going to use that knowledge, the model in your head, to forecast and generate predictions about the future.
Now, I may ask specific people questions, but in general each question is open to anybody who wants to address it. Just start talking, and if there's a race condition, I'll deal with it. And don't hesitate to ask questions of the other panelists yourself or to comment on other comments. Okay, so let's start with opportunities. I will allocate roughly the same amount of time to opportunities, then challenges, then solutions and innovations. The first question: what are some of the key opportunities or use cases that you see dominating edge gen AI in the next, say, two to five years? I noticed that some of you gave a shout-out to automotive, so clearly right now that seems to be it. Is it going to continue? Are there other sectors that will start to dominate in that space? Anybody want to tackle that?
I can certainly say that, as I mentioned, we've engaged with about 25-plus customers, and we didn't go looking in any particular market. We just have a general product and we let the customers sign up; they voted with their feet. To our surprise, in our previous product I'd say 30% of our customers were automotive and 70% were non-automotive: consumer, video surveillance, manufacturing, AR/VR, printers, et cetera. But automotive was 30%, and suddenly that nearly doubled; we're over 50, closer to 60%, for this latest generation. So the first surprise is that automotive really is it, and that's worldwide: Asia, Japan, Europe, North America, less North America surprisingly, but the rest of the world for sure. And the other surprise was, well, we expected object detection and pedestrian recognition and stopping the car, but then we got a ton of requests for things like Llama 2 and in-cabin infotainment and all kinds of other things. So for me it was a surprise both that automotive was so dominant in our customer base and, second, that the class of applications was broader than I expected. We got the radar, the lidar, the sensor fusion with vision, but we also got this whole new class, even stable diffusion for infotainment. So at least from the data, going from 30 to 60% from our exposure, automotive clearly seems a good opportunity, and much wider and more multi-dimensional than I was expecting.
So do you think that is going to persist over the foreseeable future, whatever horizon you want to put on it, whether it's two to ten years or one to three or whatever the case may be?
Well, because in the end the car is a place where we spend, I don't know, 20% of our free time, and if we live in Toronto, that's 60%. In any case, it's a lot of time, it's a lot of what we do, and we're kind of prisoners there, so there are lots of things you can do with electronics in a car. The drive itself is the obvious one, but also doing your work, and entertainment. So yeah, I think we've only just touched on 5% of it. Take the latest Tesla: it's a pretty good example of fairly high-end autonomous driving, and it's still only 30% of the way there; it's still a little risky, but it's already pretty impressive. We're not there yet in terms of all the other potential. I've not seen anything yet; it's really five years out. The things our customers are licensing now will be in cars in five or six years.
Okay, good points. So, Davis, I also noted that you talked about automotive applications. Do you have any comments on that? Is that still going to be a strong trend, or is health going to take over, or any other sector? I think he just left. Oh, did he? Okay. Would anybody else want to add to this question about which sector might become prominent and complement automotive in terms of the use of edge gen AI here? James Miller, are you noticing any trends with, say, the automotive industry not producing as many cars, or not increasing their production of cars so much, and people moving towards transportation systems like light rail and other things like that? And perhaps are you seeing any early indicators of some of the technology, infotainment, and solutions moving from the car to more mass-transit, scalable solutions for cities?
moving from the car to more mass transit uh scalable solutions for cities.
uh scalable solutions for cities. Yeah. So our customers uh don't see that
Yeah. So our customers uh don't see that that kind of it's a competition, right?
that kind of it's a competition, right? that's they're working to keep cars
that's they're working to keep cars alive and interesting and a good market.
alive and interesting and a good market. Um, so they're for sure not telling us
Um, so they're for sure not telling us about, you know, what the competition is
about, you know, what the competition is and and we're not engaging with, you
and and we're not engaging with, you know, alternate forms of of transit
know, alternate forms of of transit other than, you know, robo taxi. I guess
other than, you know, robo taxi. I guess you could consider that as going in that
you could consider that as going in that direction. Um, what I can say from the
direction. Um, what I can say from the automotive customers, it's a industry in
automotive customers, it's a industry in crisis.
crisis. um there's the established players and
um there's the established players and then there's suddenly a whole bunch of
then there's suddenly a whole bunch of new players in particular in China with
new players in particular in China with EV uh expertise and AI uh you know
EV uh expertise and AI uh you know strong knowhow and so there's a real
strong knowhow and so there's a real competition in terms of there's a a new
competition in terms of there's a a new car of the future and the established
car of the future and the established players are are struggling with that
players are are struggling with that they're trying to reinvent themselves
they're trying to reinvent themselves and the uh Asian market and yeah I'd say
and the uh Asian market and yeah I'd say mostly in China and Asia you
mostly in China and Asia you aggressively moving in, changing the
aggressively moving in, changing the rules of the game. Tesla was the first.
rules of the game. Tesla was the first. Um, but now we're seeing, you know, a
Um, but now we're seeing, you know, a whole bunch of other companies that are
whole bunch of other companies that are going beyond Tesla. So, that's the first
going beyond Tesla. So, that's the first thing I see is the unknown. Um, because
thing I see is the unknown. Um, because it's like two markets fighting
it's like two markets fighting themselves. The established one and this
themselves. The established one and this brand new one that could just sweep
brand new one that could just sweep everything away and people are reacting
everything away and people are reacting different ways. Some trying to reinvent
different ways. Some trying to reinvent themselves, some trying to just do
themselves, some trying to just do better what they're good at.
better what they're good at. um hard to have a crystal ball there
um hard to have a crystal ball there other than massive changes of foot.
other than massive changes of foot. There's a crisis of foot um and the kind
There's a crisis of foot um and the kind of mass transit and alternate forms of
of mass transit and alternate forms of transportation through our customers. We
transportation through our customers. We don't unfortunately I don't see that uh
don't unfortunately I don't see that uh that trend. Okay. I
Okay. Yes, just curious: how many speakers are left and participating in this panel?
So we have Pierre. Go ahead. Yes, how many? We have Warren, Warren is still here, Pierre, Chen, and as I said, this is optional. We can also open the floor for the attendees to engage with the speakers if they want to ask any questions.
Okay. All right. So, automotive: what particular edge gen AI capabilities are required to address these opportunities, as opposed to, for example, cloud AI? It's probably the usual ones that are already happening, but is there anything in particular that the future opportunities will call for? Anyone? All right, I'll put my own two cents in. I guess it's the usual ones, you know: local, real-time response, latency, and things like that. Okay. Privacy, yep, that's a big one. Security. Yeah, I would think robotics would be an important application as well. I don't know if you're seeing movements in that direction from the industry. Okay.
Yeah, we definitely are. We have a range of customers; most of them are application-specific, focused on one area, but some of them do general platforms in a certain performance range, and a lot of those doing general platforms are pursuing robotics as one of the more opportunistic areas. So I'm seeing a lot of interest, not at the level of cars, which as I just mentioned is 60%, but probably 10% of our customers are doing interesting, disruptive robotics applications, more in Asia than other places, but not only there.
um more in Asia than other places but not only. So how much of that uh okay
not only. So how much of that uh okay robotics I guess is it's a big spectrum
robotics I guess is it's a big spectrum of capabilities but including smart
of capabilities but including smart manufacturing it's not just robotics to
manufacturing it's not just robotics to play with their kids but you know to
play with their kids but you know to manufacture mass manufacturing smart uh
manufacture mass manufacturing smart uh smart robots. So how much is uh Agentic
smart robots. So how much is uh Agentic AI going to drive all of this
AI going to drive all of this uh in terms of uh silicon and and um in
uh in terms of uh silicon and and um in terms of delivering edge AI? Are we
terms of delivering edge AI? Are we going to see a JTek uh AI at the edge or
going to see a JTek uh AI at the edge or are we just talking about mainly in the
are we just talking about mainly in the cloud?
Personally, I see it at the edge for specialized domains. So not the full ChatGPT or Llama, but a domain-specific, smart, agentic AI that knows about, say, manufacturing: you're on the floor, there's an issue, and it's very good at what it does, as good as most humans, but only in that specific space. In that case you reduce the training and knowledge space by a couple of orders of magnitude. I can see that happening in the embedded space, right?
Okay, sounds good.
see that happening in the uh in the embeded space, right? Okay, sounds good.
embeded space, right? Okay, sounds good. All right. So, let's uh switch over to
All right. So, let's uh switch over to challenges. Um given you know the suite
challenges. Um given you know the suite of
of uh um opportunities that we have out
uh um opportunities that we have out there uh automotive I guess robots and
there uh automotive I guess robots and uh probably other ones that we haven't
uh probably other ones that we haven't touched on that just people don't uh
touched on that just people don't uh want to mention like what are some of
want to mention like what are some of the challenges in in terms of delivering
the challenges in in terms of delivering solutions to those opportunities? you
solutions to those opportunities? you know the challenges could be technical
know the challenges could be technical or they could be
or they could be uh maybe business and investment related
uh maybe business and investment related or or talent or regulatory. What what
or or talent or regulatory. What what what do you think going forward we'll
what do you think going forward we'll see uh as the main challenges in
see uh as the main challenges in delivering some of those opportunities?
I can I can start on so just pull back a second and think about the general
second and think about the general purpose computing that we've been living
purpose computing that we've been living for the last 50 years and some of you
for the last 50 years and some of you less some of you more but we've been you
less some of you more but we've been you know we invented APL and then forran and
know we invented APL and then forran and pro you know then eventually we got to C
pro you know then eventually we got to C and then we went to objectoriented and
and then we went to objectoriented and list and then eventually we all went
list and then eventually we all went back to C right in the end we never
back to C right in the end we never found a general purpose programming that
found a general purpose programming that was not C which is was a shock and a
was not C which is was a shock and a scandal. We invented 380 different
scandal. We invented 380 different programming languages. In the end,
programming languages. In the end, there's one or two that are used today.
there's one or two that are used today. Um, amazingly in AI because you have a
Um, amazingly in AI because you have a more well-defined compute programming
more well-defined compute programming model uh which is you know neural
model uh which is you know neural networks including you know the latest
networks including you know the latest transformers they still conform to the
transformers they still conform to the basics of that pytorch or you know
basics of that pytorch or you know tensorflow light tensorflow programming
tensorflow light tensorflow programming model. We have a essentially one
model. We have a essentially one programming model with couple of
programming model with couple of variants that are very close and not
variants that are very close and not only are they general purpose but they
only are they general purpose but they actually be used to program
actually be used to program supercomputers not to program one risk
supercomputers not to program one risk core. You know, we've spent
core. You know, we've spent 800 million years programming a couple
800 million years programming a couple of risks. They're increasingly complex
of risks. They're increasingly complex with, you know, look ahead and
with, you know, look ahead and translation look at buffers and it's a
translation look at buffers and it's a mess, right? Suddenly there's this very
mess, right? Suddenly there's this very elegant highle data flow like matrix
elegant highle data flow like matrix multiplication with some activations.
multiplication with some activations. It's actually reasonably simple. So
It's actually reasonably simple. So that's an amazing opportunity is that we
that's an amazing opportunity is that we actually have a high level programming
actually have a high level programming model that can program that is rich
model that can program that is rich enough and general enough and enough to
enough and general enough and enough to program supercomputers that are AI
program supercomputers that are AI based. Now the challenge of behind that
Now, the challenge behind that is that these compilers are really hard. So it's a great opportunity, and if you invest in it heavily you can get there; we've put hundreds of man-years into getting a good AI compiler for one architecture, ours. So I think the opportunity and the challenge is how we create, from this great starting point, a single language, a single programming model: how we create a compiler that can actually address GPUs on one end, ASICs on the complete other end, FPGAs somewhere in the middle, and then NPUs, with the pretty wide and diverse range of NPUs that exist on the market today, all innovating in different ways. And then of course it extends to analog and optical computing and in-memory compute, all these clever architectures, but without this compiler they're unusable, because the complexity is high. So that's what I want to bring up: for me, the challenge of the AI space, with all these innovative architectures, is how we leverage the opportunity of a single programming model, with something general enough to be used across a wide range of applications.
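(To make the single-programming-model point concrete: today the closest thing to that shared artifact is a framework graph exported to a portable format such as ONNX, which different backend compilers, whether for GPUs, NPUs, or FPGAs, can then lower onto their own hardware. A minimal sketch, assuming PyTorch and the onnxruntime package are installed; it is not any particular vendor's flow.)

    import torch
    import torch.nn as nn

    # One model definition in the common "programming model" (a framework graph).
    model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
    example = torch.randn(1, 64)

    # Export to a portable graph; each vendor's compiler takes it from here.
    torch.onnx.export(model, example, "model.onnx", opset_version=17)

    # Running the same graph through a generic backend (CPU as a stand-in):
    import onnxruntime as ort
    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    print(session.run(None, {session.get_inputs()[0].name: example.numpy()}))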
So are you talking about compilers going from whatever this language is down to, say, a MAC-level architecture, all the way down to silicon?
Not down to silicon compilers, necessarily. I mean, for an FPGA you might argue you can get there, but let's stay with a well-defined architecture you need to compile to. So it's generating object code, or configuration code, for a flexible architecture. Yeah. Okay.
So, Warren, you turned your camera on; I suspect you have a question or a comment.
Sure. I'll give you my perspective on one of the challenges going forward, and it has to do with what I was talking about a little bit: data sets. As AI evolves to more complex models and more complex scenarios, multi-step reasoning, agentic AI where the AI will be interacting with the environment, collecting enough quality data to train the models to act in these complex ways will become very difficult. So we're already seeing a movement to synthetic data and simulation environments to get enough training examples for these AIs, and I think there are a lot of challenges there: ensuring that we're able to train on quality data, understanding how much real data or human intervention will be required. This becomes especially difficult in the unstructured environments where you would have robots operating. So I think this is going to be a big challenge going forward.
going to be a big challenge uh going forward. Okay. All right. Is there
forward. Okay. All right. Is there anyone in the audience that has uh
anyone in the audience that has uh thoughts about the challenges uh going
thoughts about the challenges uh going forward in in in delivering on some of
forward in in in delivering on some of the opportunities?
I can just make a a high level statement especially in the Canadian context.
especially in the Canadian context. Right. So I think AI inference is a is a
Right. So I think AI inference is a is a very important uh uh element of uh AI
very important uh uh element of uh AI adoption in Canada especially at the
adoption in Canada especially at the edge. You have companies like entered
edge. You have companies like entered and storage designing uh AI chips that
and storage designing uh AI chips that are fine-tuned for uh AI workload at the
are fine-tuned for uh AI workload at the edge uh maybe not in cars but at the the
edge uh maybe not in cars but at the the near edge. I think connecting with the
near edge. I think connecting with the end uh uh user is very important
end uh uh user is very important supporting this architecture with
supporting this architecture with applications that are running out of the
applications that are running out of the box instead of instead of saying look I
box instead of instead of saying look I have a cool uh AI inference card use it
have a cool uh AI inference card use it I don't think this is a good selling
I don't think this is a good selling point uh this is why Nvidia is uh is
point uh this is why Nvidia is uh is having tremendous success because they
having tremendous success because they are supporting various uh sectors at the
are supporting various uh sectors at the end user level so they have libraries
end user level so they have libraries they have applications, they have
they have applications, they have reference design, people can just take
reference design, people can just take them and use them in their application.
them and use them in their application. So I think uh building uh a resilience
So I think uh building uh a resilience ecosystem and and connecting both of
ecosystem and and connecting both of both side of the supply chain you have
both side of the supply chain you have these uh producers of technologies uh
these uh producers of technologies uh like uh Nvidia and storage and entered
like uh Nvidia and storage and entered and you have the end users uh across all
and you have the end users uh across all verticals whether is smart health,
verticals whether is smart health, manufacturing, agriculture and you need
manufacturing, agriculture and you need a ready to go solution so people can
a ready to go solution so people can adopt it and use it. This is my high
adopt it and use it. This is my high level two cents uh for this conversation
level two cents uh for this conversation and
and uh it's it's a problem um and this is uh
uh it's it's a problem um and this is uh something that's defined success I guess
something that's defined success I guess for these companies especially in
for these companies especially in Canada. So how much of a um thanks for
Canada. So how much of a um thanks for that how much of a uh issue is uh just
that how much of a uh issue is uh just getting the talent. So far we we touched
getting the talent. So far we we touched on technical problems, right? Uh or
on technical problems, right? Uh or technical desires. Uh how much of a an
technical desires. Uh how much of a an issue or a challenge is getting the
issue or a challenge is getting the right people, the talent that you need
right people, the talent that you need to actually create solutions?
In Canada we have these three very important institutes, Mila, the Vector Institute, and Amii, which focus on algorithmic innovation. We have great universities, like Polytechnique, the University of Toronto, and Simon Fraser University, with great research teams building innovative algorithms. I think there is a gap between the algorithm development and the hardware development, and I think we should see some initiatives in Canada that bridge that gap, to enable a vibrant ecosystem where you have experts in various areas collaborating with each other. That's my main feedback here.
area collaborating with each other. This is my uh my main feedback here. A
is my uh my main feedback here. A compiler should solve all that, right?
I completely agree with you on the bridge between the application space and the hardware, and I would qualify it: not necessarily to create new hardware, although that's an important one. Creating new hardware and then the tools to program that hardware is a nightmare; it's expensive, and not that many companies can do it. I'm fortunate to be part of a company that can afford it, but only a handful can. There is another angle, though. I'm willing to bet that a company that would take an innovative application and map it onto our architecture efficiently, in a way that gets that 100x power advantage and 10x cost advantage and really leverages it, would make much more money than we've made. They would put six people on this; we put ten times more on it for ten years, and they could do it with half a dozen people in one year and probably produce more value for the paying customer, who would be willing to pay a lot of money for it. Because in the end the customer doesn't really care about the compiler, the architecture, or how many MACs you have. They care about a solution to market quickly that is efficient.
So we have some of our leading high-end consumer companies, like Moore, that have partners who do exactly that. They just say: listen, we're using the Synopsys solution, we like it, but we're using standard graphs and we compile them, and sometimes it's great, sometimes it's medium, sometimes it's not great. Could you just write your models in a way that leverages this product rather than an Nvidia GPU? Nothing wrong with doing it for an Nvidia GPU, but that's their starting point, and then there are four or five NPUs out there; they happen to be using ours, and they want it to run well on ours. I'm sure those partners could make a lot more money than we're making. And I shouldn't say this, because my boss might be listening.
Very interesting. I'm just curious: what role does generative AI actually play in the design process itself, Gen AI generating Gen AI solutions? Where are we with all of that? I'm sure that's something of a holy grail going forward.
Mhm. That's 2001: A Space Odyssey. "Dave, I have a problem. I cannot design your chip, because that would put me out of a job."
We use AI a ton at Synopsys; there's a huge use of AI for design in clever ways. I'm not part of that organization, so I can only say from the outside that it's a major part of our investment. Synopsys was a leader early on, both in embedded AI and inference and in the use of AI for EDA. I can definitely see it being used for power optimization, physical design, placement, routing. There are a ton of opportunities where this works well.
So we're kind of getting into the solutions-side question, which is okay. I think in the end I should just be able to say verbally, I want a system that does this and this and this in terms of AI capability, push a button, and there it is. We'll see when that comes.
Yeah, I don't see generative AI doing RTL design anytime soon. There are some aspects where you need to have some structure in the solution, and that is maybe more difficult to get with AI.
Well, okay, let me challenge that. You might be absolutely right, and I'm sure you are. But Zuckerberg is over there saying that in the next however many years, 50% of software will be written by AI. And if you look at software, it is structured: there's a certain syntax, certain things you have to follow. RTL is like that too. So if you take Zuckerberg's word for it, why couldn't you apply that to RTL?
Oh yeah, fair enough, I suppose. I mean, I was shocked: on what they call AGI benchmarks, artificial general intelligence, they compared results on programming (software programming, that is), and in one year the models went from roughly the level of grade-six students to the point where the CTOs of leading software companies were being challenged by the results, just from training these models for programming. So this 50% of programming will be done by AI.
Absolutely. But then for me, with RTL, the people are just as smart and the problem is just as hard, probably no harder; it's just different. There's a smaller market, though, so I'm sure no one has worked as hard at automating RTL. For software there's a much bigger market, so people have worked on that, and I see no reason why companies would not say, okay, we've got tens of thousands of RTL designers, let's automate that better. It should not be seen as a threat to anyone; it's the opposite. The smart people will use the easy tools and do the really hard stuff, the creative, out-of-the-box work that we will not do with AI.
So I'm very interested in that topic of RTL. Maybe you have various agents, each working at a different level: an agent trained on high-level RTL to come up with the best architecture possible, various agents optimizing the individual blocks depending on what type of acceleration you need, and a high-level agent acting as the architect that brings all these things together. I don't know whether this will take place, but I think there is a trend of using various agents trained for different purposes and linking them together in a flow, and this will have an impact, especially in the CAD industry.
Okay, to be fair, and to the earlier point: there are similar structures, but in the end RTL translates to hardware. In my experience back in the Nortel days, they had some great RTL tools and other compilers, and that was great, but in the final push to get the chip out you always had to go in, tweak things and make some custom logic changes. So there is a difference, I recognize, and the impact is different: in software it's just more memory or more cycles, whereas in hardware you're locked in, locked in on chip size and locked in on cost, so it does require extra diligence in terms of the compiler output.
Okay. So are there any other innovations coming down the road, other than compilers and getting more talent on board, that will help us address some of the future opportunities for AI in automotive, health and so on? What's down the road?
So, in terms of silicon or other technologies,
another big point that is being mentioned a lot is of course trustworthiness, which is mostly being addressed at the software and model level. But I think in hardware we have a role to play there too: making sure that the hardware can work in harsh environments, like Professor Chen was talking about, and can be extremely reliable for mission-critical applications, and so on. It's a more minor point, but still quite important in developing the hardware properly, I believe.
Yeah. So there's the protection against environmental faults, if you will, but then there's deliberate hacking: of the algorithm, of the structures, of the memory, of the content. I think that's going to be a must, especially once you start implementing agentic AIs that make their own decisions. You're not just asking a question and then acting on it yourself; the AI is doing the actual decision making and the actual actuating, if you will.
Yeah, we can recall the original example, which is quite dated now, of adversarial attacks on, say, image recognition, where you change a few pixels on a stop sign and suddenly it doesn't get recognized at all, or things like that. So definitely that needs to be addressed, right?
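The attack being recalled there is easiest to see in code. Below is a minimal sketch of the fast gradient sign method (FGSM), one classic way such adversarial examples are generated. It assumes a trained PyTorch classifier called model plus a preprocessed image batch and its label; those names are illustrative, not anything shown at the workshop.

import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    # image: [1, 3, H, W] tensor scaled to [0, 1]; label: [1] tensor of class indices
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel by +/- epsilon in the direction that increases the loss.
    adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1)
    return adversarial.detach()

A perturbation of epsilon = 0.03 is nearly invisible to a person, yet it is often enough to flip the predicted class, which is exactly the stop-sign scenario described above.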
Yeah. And that's a real concern. Talking to the leading high-quality automotive companies, they don't want this to happen. So they always have redundancy, and not just two AIs comparing each other: they have an AI, and then they have an algorithmic approach that is very dumb but will detect that it's a stop sign, and not let a corrupted, adversarially fooled network say it's not a stop sign. So they have this innovation around that.
Like a backseat driver. Your mother-in-law in the back seat saying you're missing the stop sign.
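That dual-path idea, a neural detector cross-checked by a deliberately simple and independent heuristic, can be sketched in a few lines. The example below is a toy illustration only; the thresholds, the nn_says_stop flag and the red-pixel heuristic are assumptions made for the sketch, not details the panelists gave.

import cv2

def classical_stop_sign_check(bgr_roi, red_fraction_threshold=0.25):
    # Crude, hand-written check: is a large fraction of the region red?
    hsv = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so combine two hue ranges.
    mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
    return cv2.countNonZero(mask) / mask.size > red_fraction_threshold

def fused_decision(nn_says_stop, bgr_roi):
    # If the neural network and the dumb heuristic disagree, fail safe.
    if nn_says_stop != classical_stop_sign_check(bgr_roi):
        return "brake_and_flag_for_review"
    return "stop" if nn_says_stop else "proceed"

The point is not that the heuristic is good; it is that it fails differently from the network, so a perturbation crafted to fool one path is unlikely to fool both.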
All right, so I think that's it for me. If Naj were here, I'd ask him a little more about the analog part.
I did want to say something about analog, just because this is one of both the opportunities and innovations and the challenges, and it's a frustrating one. We have a lot of potential customers, startups, that come to us and say: we have this amazing analog compute, we can do MACs and CNNs at a hundredth of the power. It looks great, and it is. We have companies that do optoelectronics: it's all about data movement, so we can do this through integrated fiber optics and get it connected through whatever. And then we have companies that say, well, it's all about memory. This is the elephant; everyone sees it from their own angle. It's all about memory, and since it's all about data movement to and from memory, let's do compute in memory rather than memory around compute. All of these ideas are great, and in their niche they do something amazing, a hundred times better than anything else, or at least ten times better. Then we work together and try it, and they say: yes, but the problem is that what we get from PyTorch is so much stuff that actually only 10% of it maps naturally and easily. So what about the 90%? Can we couple ours with yours, do deep heart surgery and plug in our analog MACs, or take out your NoC and put in an optical one, take out your memories and put in optical ones? We try this, and we explore it open-mindedly, but there are always these bottlenecks that appear.
You remove one problem and you create another one; sometimes it's worse, sometimes it's not as bad, but then you have to build a new compiler, you have to build all kinds of tools, and then how flexible will it be in the future? For me, the root cause is this: there might be some amazing innovation in combining these things, but because people build their neural networks on a GPU, Nvidia or AMD, it doesn't matter which, they have all that generality and they don't care about efficiency. They find a really cool new XGU++ activation function that gives them 2% better accuracy, then they go off and do some kind of softmax variant, and then a non-maximum suppression with whatever. Because they have access to the most general-purpose processor in the world, hard to program but general purpose, you don't have any homogeneity in the models and no synergy between the model and the architecture. For me there has to be a sweet spot where we have a clean model, restricted to what these optical, in-memory-compute and analog approaches can leverage. But for now it's a chicken-and-egg problem: until you break through, people won't use it, and if they don't use it they're on a GPU, and so they're programming all kinds of weird stuff, sometimes because it's needed and sometimes just because it's there.
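To make the "weird stuff" concrete, here is a purely illustrative sketch (the operator is invented for this note, not taken from the talk) of the kind of exotic op a researcher can write in a few lines for a GPU, next to the plain building block that virtually every NPU or analog MAC array accelerates natively.

import torch
import torch.nn as nn

class ExoticActivation(nn.Module):
    # Hypothetical "2% better accuracy" op: a learned-temperature gate plus a
    # sinusoidal term. Trivial on a GPU, but outside the small operator set a
    # fixed-function or analog accelerator typically supports.
    def __init__(self):
        super().__init__()
        self.t = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * torch.sigmoid(self.t * x) + 0.1 * torch.sin(x)

# The hardware-friendly counterpart: conv + ReLU maps onto essentially every
# accelerator, digital or analog.
friendly_block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

One exotic layer like this in an otherwise clean graph is often what forces the fallback to the big generic tensor engine mentioned a moment later.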
Yeah, I guess design velocity is a real need, right? The more you standardize, the greater the velocity you can have. However, my own opinion is that analog is sufficiently distinct from mere tweaks of different digital approaches that, if it saves you 10x in power and you want to put it on a watch or some IoT device, it definitely makes sense. But anyway, that's just me.
It's very specialized. I think you're absolutely right; our experience with these startups is that it's never specialized enough. It always requires plugging in our general, generic tensor accelerator, which is ten times the area and power of their unit. So yes, you've saved 50%, you've brought it down to 5%, but you still have the other 50% that has not changed, and now you've introduced new bottlenecks and you have no tools. In the end it doesn't pay off; it's Amdahl's law in a different form, because not everything gets the 10x. And that's the problem with all the complexity of the weird stuff: you can't do the weird stuff in analog, so you're stuck, and unfortunately MACs are only about 30% of the overall power and area, maybe just a couple of percent of the complexity, but still about 30% of power and area. If you have 70% that can't be analog, then you're still stuck.
Yeah, exactly. You may want to build a different house, some grandiose house, but if you can't use a hammer to do it, then you're stuck, right? Okay.
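The arithmetic behind that "still stuck" remark is worth spelling out; it is Amdahl's law applied to power. If a fraction f of the total power sits in the MAC array and the analog block improves only that fraction by a factor s, the overall gain is 1 / ((1 - f) + f / s). Using the rough 30% figure quoted above, taken here purely for illustration:

# Amdahl-style bound on whole-chip gain from an analog MAC array
f, s = 0.30, 10.0                 # fraction of power in MACs, analog improvement factor
gain = 1.0 / ((1.0 - f) + f / s)
print(round(gain, 2))             # ~1.37x; even as s -> infinity, the ceiling is 1/0.7 ~ 1.43x

So even an ideal 10x or 100x analog MAC engine, on its own, moves the whole-chip number by well under 1.5x while the remaining 70% stays digital.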
Sorry, I'm turning it over to you, Yin.
And I would like to thank you, Walter, personally for accepting to moderate this panel session. It's time to conclude the workshop. We are eight minutes over time, but that's fine; it was a great discussion. So thank you to all our speakers, panelists and attendees for making today's workshop such a success. The insights shared highlight just how exciting and critical the future of edge AI truly is. So stay inspired, stay connected, and we'll see you at the next one. Thank you very much.
Thanks for organizing this. Very well done.
Thank you. Thank you. Bye-bye. Thank you. Bye-bye.