0:01 All right, let me set the context for
0:02 the video. In this video, we have
0:05 Rishab. Rishab is an AI engineer at a
0:06 US-based startup, working remotely out
0:09 of India. He's around 20 years old. I
0:10 think he started doing research around
0:12 1.5 years ago, and he has sprinted
0:14 through this journey of AI research,
0:15 probably because he's been really good
0:17 at maths since his JEE days. He's one
0:18 of the few people I know in the
0:20 country who actually do core AI and
0:22 machine learning research. So I
0:23 thought it made sense to bring him on
0:25 the channel to show you how you can
0:27 follow a similar path and learn
0:29 about core machine learning and AI
0:30 research. He's divided the video into
0:32 two parts: applied AI and AI research.
0:34 You can pick whichever makes more sense
0:36 for you. Generally though, unless you're
0:38 decently good at math, I would not go
0:40 down the AI research path. If you're
0:41 good at full-stack development and want
0:44 to get into machine learning or AI as an
0:45 applied engineer, then the applied path
0:47 is for you. With that, let's get into
0:48 the video.
0:50 >> Hey everyone, and welcome to the AI
0:52 roadmap video. We have seen a lot of
0:55 comments under our videos with people
0:57 asking for an AI roadmap, so here is
0:59 one. A very big disclaimer before I
1:01 start: this is the path
1:03 which worked for me; it might not work
1:05 for you. But the things I cover here
1:07 are very general things which everyone
1:09 working in AI should know.
1:10 I'll be dividing this into two parts:
1:13 core AI and applied AI. Core AI is where
1:15 you build the models, use different
1:16 architectures to make them better, and
1:18 apply different optimization techniques
1:20 and inference optimizations.
1:23 Applied AI is more about building on top
1:26 of LLM APIs like ChatGPT, Claude, or
1:28 Gemini. The new stuff from the
1:30 past couple of months, AI agents and
1:33 MCPs, also comes under
1:35 the category of applied AI. Before
1:37 diving deep into the roadmap, I think
1:38 we should first discuss the problems
1:40 you might be facing right now. It
1:42 is very common to fall into a pitfall
1:45 and never actually start to
1:47 learn AI. There are new tools
1:49 dropping every couple of weeks, so it's
1:51 very common for people to get confused
1:52 about what to learn, when to learn, and
1:54 where to learn. Everyone says something
1:56 different. I mean, people have their
1:58 opinions, and I'll be sharing mine.
2:00 Since I've worked on both sides of the
2:02 industry, research and applied AI,
2:04 I'll be able to provide you a very wide
2:06 perspective on what you should
2:08 follow to land a good job. I think
2:10 the first and foremost thing you
2:11 should definitely learn is Python
2:13 fundamentals, which consist of
2:16 conditionals, loops, classes and
2:19 objects, and common libraries: NumPy,
2:22 pandas, PyTorch. I think the code
2:24 snippet on screen will give you a very
2:26 good idea. If you can understand what
2:28 it is doing, modify it, and add new
2:30 things to it, I think you're good to go
2:32 with Python. This is very fundamental,
2:33 because everything will be built on top
2:36 of Python, since it is the most widely
2:38 used language in the AI industry.
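The on-screen snippet isn't reproduced in the transcript, so here is a hypothetical one at roughly the level being described: conditionals, loops, classes, and a bit of list processing, with no external libraries (all names here are my own, for illustration):

```python
# A small, self-contained checkpoint for Python fundamentals:
# if you can read, modify, and extend this, you're ready to move on.

class Dataset:
    """Holds (value, label) pairs and computes simple statistics."""

    def __init__(self, rows):
        self.rows = rows  # list of (value, label) tuples

    def mean(self):
        # loop + accumulator instead of numpy, to keep it dependency-free
        total = 0.0
        for value, _ in self.rows:
            total += value
        return total / len(self.rows)

    def filter_by_label(self, label):
        # list comprehension with a conditional
        return Dataset([row for row in self.rows if row[1] == label])


data = Dataset([(1.0, "cat"), (3.0, "dog"), (5.0, "cat")])
cats = data.filter_by_label("cat")
print(data.mean())  # 3.0
print(cats.mean())  # 3.0
```

In real work you'd reach for NumPy arrays and pandas DataFrames for this kind of thing, but being able to write it by hand is the baseline the video is pointing at.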
2:40 The next thing is classical
2:42 ML. I'll not ask you to dive very deep
2:44 into this, since that might take a long
2:46 time. If you are very short on time,
2:49 you should basically just get the gist
2:51 of what it is and how it is done. No
2:54 need to dive deep into implementation
2:55 for now, but it will be better if you do it in the future.
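To make the evaluation side concrete, here are the metrics mentioned in this section, accuracy, precision, recall, and F1, computed by hand on a toy set of predictions (pure-Python sketch, made-up labels):

```python
# Binary-classification metrics computed from scratch on toy labels.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of everything we flagged, how much was right?
recall = tp / (tp + fn)     # of everything positive, how much did we catch?
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

In practice scikit-learn computes these for you, but writing them once makes the definitions stick.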
2:57 It consists of regression,
2:58 classification with different
3:00 techniques, and different clustering
3:02 techniques, while also giving you
3:04 ideas about evaluation: accuracy,
3:07 precision, recall, F1 score,
3:09 etc. It will give you the common ideas
3:10 of machine learning, like cross
3:13 validation, train/test splits,
3:15 and how we optimize things. This
3:17 will build a very good base if
3:19 you're going to go deeper
3:21 here. My suggestion is that if you
3:23 have enough time, you should
3:25 definitely start with this first
3:27 building block and go deep into
3:29 classical ML. I think the very next step after
3:31 just getting past classical ML is
3:33 getting into deep learning, which is
3:35 a subset of machine learning. Here,
3:37 you should be able to
3:39 understand what a neuron, basically a
3:42 perceptron, is, and what a neural
3:44 network is: a network built from a
3:46 number of these neurons or perceptrons.
3:48 You should understand the main
3:50 algorithm, backpropagation, which makes
3:52 the whole neural network learn something
3:53 new by changing its weights. You should
3:55 know about MLPs,
3:57 activation functions, loss functions,
4:00 optimization, and regularization. Again,
4:01 I've already mentioned the different
4:04 libraries to learn. PyTorch is the holy
4:06 bible of AI if you are writing
4:07 implementations of different
4:10 architectures or writing GPU
4:12 kernels, and it is the library you
4:14 should focus on the most, since you'll
4:16 be writing a lot of PyTorch to
4:18 implement all these architectures. Now
4:21 that you have built a decent base, you
4:23 understand what machine learning is,
4:25 what its different concepts are, and how
4:26 you can use them. You understand what a
4:29 neural network is, how it works, and
4:30 what activation functions and
4:33 loss functions are. The very next step for
4:34 you should be to explore different
4:36 paths. I'm not saying you should explore
4:38 all of these at the same time. But yeah,
4:41 pick one, move to the next, pick
4:43 another, move on. I've mentioned
4:44 four paths here, but you can definitely
4:47 find more: computer vision,
4:49 natural language processing,
4:51 reinforcement learning, and speech
4:52 and audio processing.
4:55 Computer vision
4:57 works with image and video data. So
5:00 it's like working with CNNs, or
5:01 convolutional neural networks, which
5:04 help a model understand an image. It
5:06 helps you build models which can detect
5:09 objects or segment different objects.
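The core operation inside a CNN is convolution: sliding a small kernel over the image. A dependency-free sketch with a toy 4x4 "image" and a vertical-edge kernel (all values made up for illustration):

```python
# Valid (no-padding) 2D convolution with nested loops -- the primitive a CNN
# layer applies many times, with kernels it has learned.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # dot product of the kernel with the patch under it
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out

# A vertical-edge kernel responds where pixel values change left-to-right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(conv2d(image, edge_kernel))  # [[3, 3], [3, 3]]
```

The kernel "fires" (value 3) everywhere because every patch here straddles the dark-to-bright edge; in PyTorch the same operation is `torch.nn.Conv2d`, with the kernel values learned from data.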
5:12 NLP is basically working with textual
5:14 data. You pre-process the text, you
5:16 convert it into embeddings, you
5:18 learn about sequence models like
5:22 RNNs, LSTMs, and GRUs, you learn about
5:24 named entity recognition, etc. These
5:26 all come under natural
5:28 language processing. Reinforcement
5:30 learning is a very old field which
5:33 suddenly got popular with the rise of
5:35 large language models. But the
5:38 reinforcement learning we use in LLMs is
5:39 very different from classical
5:41 reinforcement learning. It is good
5:44 for you to dive deeper into both. I
5:46 think if you are very good at RL and
5:47 understand the concepts deeply, the
5:49 best option for you is to work
5:52 at a robotics company, or maybe build
5:54 something of your own. RL goes through
5:57 concepts like MDPs (Markov decision
5:59 processes), deep Q-networks, policy
6:02 gradient methods, etc. The last
6:05 path you can follow is speech and audio
6:11 processing, which consists of text-to-
6:13 speech, automatic speech recognition,
6:15 etc. You should explore all these paths
6:17 individually. Give them their time. I
6:19 think after you have given them enough
6:21 time, you'll be able to understand which
6:23 is best for you. Are you good
6:25 with CV, are you good with RL, or are
6:27 you good with NLP? If I talk about my
6:29 experience, I also dived into very
6:30 different fields. I tried computer
6:33 vision, I tried RL, I tried NLP. What
6:36 attracted me most were NLP and RL.
6:38 Moving on to the holy grail which
6:41 changed everything: transformers. You
6:42 should understand how they work. What's
6:44 the whole architecture? How does data
6:46 flow inside it? What is the attention
6:48 mechanism? What is self-attention? What
6:49 is multi-head attention? Follow that
6:52 with the latest advancements like
6:54 mixture of experts, multi-head latent
6:57 attention, etc. I built a primer on this
6:59 earlier on this channel, where you can
7:00 learn how data flows inside transformers
7:02 and how they work internally.
7:11 By watching that video, I think you'll
7:13 be able to learn how transformers work
7:15 intuitively, and also how they are able
7:18 to process and actually learn all this
7:19 data. Moving on to the hot topic right
7:23 now, which is
7:25 large language models such as GPT,
7:27 Claude, or Gemini. All of these fall
7:29 under the category of large language
7:31 models since they have a very large
7:32 number of parameters. I've divided this
7:34 into three stages, which are
7:37 the training stages of a model:
7:40 pre-training, mid-training, and
7:43 post-training. In each stage, the model
7:45 learns something new. Pre-training is
7:47 when the model learns from scratch. We
7:49 give it a massive amount of data, an
7:51 architecture, and optimization
7:54 techniques, and the result is a base
7:56 model. This is followed by mid-training,
7:58 when we give this model domain-specific
8:01 knowledge like math, science, etc. This
8:03 can also be called continual
8:06 pre-training, since the model will now
8:08 learn more of this domain-specific
8:09 knowledge and converge on it. The next
8:11 part of the pipeline is post-training,
8:14 where the model is tuned to work as a
8:16 chatbot. Before this step, the model is
8:18 just a next-word predictor. What
8:20 instruct tuning does is make it work
8:22 like a chatbot, so it understands that
8:23 the user asks a question and it has to
8:26 answer in a certain way. RLHF, RLVR, and
8:28 test-time compute all come under
8:30 post-training, where the model learns
8:32 not to give harmful advice or harmful
8:34 content, and where we also induce the
8:37 capability of thinking, which is
8:39 test-time compute, at this stage. All
8:41 three stages need different experts to
8:43 work on them. When you understand how a
8:45 transformer works and how a large
8:46 language model works, you'll be able to
8:48 understand what pre-training,
8:49 mid-training, and post-training mean.
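Underneath all three stages sits the same transformer, so the attention mechanism asked about earlier is worth seeing in code. A minimal single-head scaled dot-product self-attention in plain Python, on a toy 2-token, 2-dimensional example (real implementations use PyTorch tensors and batched matrix multiplies):

```python
import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:  # one query (token) at a time
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # how strongly this token attends to each token
        out.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return out

# Two tokens with 2-dim queries/keys/values (hypothetical numbers).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # each row is a weighted mix of the value vectors
```

Multi-head attention runs several of these in parallel on learned projections of Q, K, and V, then concatenates the results.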
8:51 Becoming an expert in any one of these
8:54 stages can earn you very good pay
8:56 while also being very respected. If you
8:57 learn about any of
9:00 these deeply enough,
9:02 you'll be able to crack a job easily.
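Finetuning, covered next, often uses low-rank adapters (LoRA): instead of updating a full weight matrix W, you learn two thin matrices B and A and add their product, W + BA. A dependency-free sketch of the idea with toy sizes (all numbers made up for illustration):

```python
# LoRA idea: represent the weight *update* as a product of two thin matrices.
d, r = 6, 2  # hidden size 6, adapter rank 2 (toy numbers)

full_update_params = d * d   # training W directly: 36 parameters
lora_params = d * r + r * d  # training B (d x r) and A (r x d): 24 parameters

def matmul(Bm, Am):
    """(d x r) @ (r x d) -> a d x d low-rank update."""
    return [
        [sum(Bm[i][k] * Am[k][j] for k in range(len(Am))) for j in range(len(Am[0]))]
        for i in range(len(Bm))
    ]

B = [[1.0 if i == 0 else 0.0, 0.0] for i in range(d)]  # d x r
A = [[0.5] * d, [0.0] * d]                              # r x d
delta = matmul(B, A)  # BA: added to the frozen W at finetune time

print(full_update_params, lora_params)  # 36 24
print(delta[0][:3])                     # [0.5, 0.5, 0.5]
```

At toy sizes the saving is modest, but at real sizes (d in the thousands, r around 8-64) it is enormous, which is why libraries like Hugging Face PEFT build on exactly this trick; QLoRA adds quantization of the frozen W on top.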
9:04 One other thing you can learn about
9:06 is finetuning. Finetuning is basically
9:08 training your model on some specific
9:11 data so that it gives the best results
9:13 on that data. It can be done via
9:15 common libraries like Unsloth or
9:18 Hugging Face TRL. If you want to
9:20 understand more of how it works,
9:22 there are different techniques like
9:24 LoRA (low-rank adaptation), QLoRA
9:27 (quantized LoRA), how adapters work,
9:29 etc. All these together help you
9:31 understand how finetuning works, which
9:32 is basically updating some set of
9:34 parameters so the model can learn from
9:37 new data. I think this kind of sums up
9:40 what core AI is. I've not gone very deep
9:42 into this, and each step of this
9:44 pipeline is divided into many subsets.
9:47 Once you understand these, you'll be
9:49 able to dive into deeper concepts
9:51 yourself. Moving on
9:52 to some comparatively easy
9:54 stuff, which is applied AI. The
9:55 very first thing you need to understand
9:59 is working with LLM APIs, like the
10:02 Claude or ChatGPT APIs. You should be
10:05 able to call one and make it respond to
10:07 your input. What you can learn here is
10:10 prompt engineering, function calling,
10:12 tool use, etc. Prompt engineering is
10:14 basically engineering a prompt which
10:16 gives a certain output when given a
10:18 certain input, increasing the chance
10:21 of the right output. Function calling
10:23 is when you give a model a specific
10:25 function which the model can use,
10:27 returning its output in JSON so that
10:29 you can further use it in your
10:31 pipeline. Moving
10:31 on to embeddings and vector databases.
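The core primitive in this whole section is measuring how close two embedding vectors are, and cosine similarity is the usual choice. A stdlib-only sketch with toy 3-dimensional "embeddings" (the vectors are made up; real embeddings come from a trained model):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) between vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: "cat" and "kitten" point the same way; "car" doesn't.
cat    = [1.0, 2.0, 0.0]
kitten = [2.0, 4.0, 0.0]  # same direction, different length
car    = [0.0, 0.0, 3.0]

print(cosine_similarity(cat, kitten))  # 1.0
print(cosine_similarity(cat, car))     # 0.0
```

A vector database is, at heart, a structure for running this comparison against millions of stored vectors quickly, using indexes instead of brute force.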
10:33 So the concept behind this is
10:36 converting textual data into vectors
10:39 stored in a vector database. Think of an
10:41 n-dimensional space where you place
10:43 different data, and similar data gets
10:45 clustered in the same region. Here you
10:48 need to learn how similarity
10:50 metrics work, what indexing is, and how
10:53 to do retrieval well. If
10:55 you learn these very well, you'll be
10:57 able to understand RAG easily, which
11:00 stands for retrieval-augmented
11:01 generation. RAG consists of loading a
11:04 document, pre-processing it, using
11:06 different chunking strategies, using
11:08 re-ranking models, and using hybrid
11:10 search (dense search plus BM25) to pull
11:12 data from a local database, pass it into
11:15 your LLM, and get a better result. Why
11:18 is this done? Not all data is used to
11:21 train LLMs. Some data might be private,
11:23 or some data might be newer than the
11:25 model's training stage. This
11:27 is where RAG comes in. Intuitively, what
11:29 RAG does is provide an external
11:31 database holding these new
11:33 or private data sources,
11:36 which the system can query to get
11:38 relevant text and use it in
11:40 the LLM's prompt to give a final output.
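A toy end-to-end sketch of that retrieval step: bag-of-words "embeddings", overlap scoring, top-k selection, and prompt assembly. All the documents and names here are made up; a real system would use learned dense embeddings and a vector database:

```python
# Minimal RAG-style retrieval: score documents against a query, take the
# top-k, and stuff them into the prompt as context.
docs = [
    "the model is finetuned with lora adapters",
    "our refund policy lasts thirty days",
    "rag retrieves private documents for the llm",
]

def embed(text):
    # Toy "embedding": a set of words. Real systems use dense vectors.
    return set(text.split())

def score(query, doc):
    # Word-set overlap (Jaccard), a crude stand-in for cosine similarity.
    q, d = embed(query), embed(doc)
    return len(q & d) / len(q | d)

def retrieve(query, k=1):
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

query = "how does rag use private documents"
context = retrieve(query, k=1)
prompt = f"Context: {context[0]}\n\nQuestion: {query}"
print(prompt)
```

Everything else in the RAG pipeline (chunking, re-ranking, hybrid search) exists to make this retrieval step return better context than a naive overlap score can.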
11:42 You'll have to do a lot of evaluation
11:44 to make sure your RAG pipeline is
11:46 working well without hallucinations,
11:48 because hallucinations can break the
11:50 pipeline downstream. Moving on to the
11:52 hot stuff right now: AI agents. What are
11:53 AI agents should be your first question.
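An agent, in the sense used here, is anything that can perceive, reason, and act. A toy perceive-reason-act loop with one tool; everything in it, the tool, the hard-coded "reasoning" rule, is a made-up stand-in for what an LLM would decide in a real agent:

```python
# Toy agent: perceive a task, "reason" about which tool applies, act with it.
# A real agent would let an LLM pick the tool and its arguments.

def calculator(expression):
    # The one tool this agent can call.
    return eval(expression, {"__builtins__": {}})  # toy arithmetic eval

TOOLS = {"calculator": calculator}

def agent(task):
    memory = []  # keeps a trace of past steps
    # Perceive: read the task. Reason: a hard-coded rule instead of an LLM.
    if any(op in task for op in "+-*/"):
        tool_name, args = "calculator", task
    else:
        return "I don't have a tool for that.", memory
    # Act: call the chosen tool and remember what happened.
    result = TOOLS[tool_name](args)
    memory.append((tool_name, args, result))
    return f"The answer is {result}", memory

reply, trace = agent("3*4+1")
print(reply)   # The answer is 13
print(trace)   # [('calculator', '3*4+1', 13)]
```

Frameworks like LangGraph give you this loop with an LLM doing the reasoning, plus persistent memory and multi-agent orchestration on top.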
11:55 You should understand what AI
11:58 agents are: basically
12:00 anything which can perceive, reason,
12:02 and act. You need to understand what
12:04 tool use is, what function calling is,
12:06 how you can give an agent a tool and
12:09 have it use it, what planning is, what
12:11 memory is, how you can keep more
12:13 memory, how you can make a consistent
12:15 agent which keeps previous memories,
12:17 and how you can build a multi-agent
12:19 system. Good frameworks for this are
12:22 LangGraph, LangChain, CAMEL, etc. A few
12:25 months ago, Anthropic came out with the
12:27 Model Context Protocol (MCP), which is
12:30 like a universal USB port you can plug
12:31 into an AI model: it standardizes
12:35 everything. You can connect anything
12:37 with it, whether it's your DB, some API,
12:39 or code you want to execute. You
12:41 can basically just make an MCP server
12:43 for it and connect it to your LLM. Your
12:45 LLM will then be able to use it to
12:47 fetch or act on that
12:50 information. Before MCP, you might have
12:53 needed a different integration for
12:54 everything, but MCP is basically a
12:56 middleware between these different apps
12:58 and your LLMs which standardizes
13:00 everything. MLOps is the easiest field if
13:00 you're currently working as a web
13:03 developer or basic backend engineer,
13:05 since here you'll be working with Docker
13:08 and Kubernetes to deploy the models you
13:10 built, doing experiment tracking and
13:13 inference optimization via vLLM or
13:14 TensorRT, and monitoring your models
13:16 for any breakage or changes. Now moving
13:18 on to the most
13:18 important part of this video: some
13:21 general advice and takeaways. General
13:23 advice from me would be to stay up to
13:25 date with this field. The sources you
13:27 can refer to are Twitter, arXiv,
13:29 Hugging Face Spaces, and Hugging Face
13:31 news. The key to keep on learning is
13:33 to always be curious and learn the next
13:34 thing, build the next project, and do
13:36 that thing. In this field, you have to
13:38 be a rapid developer, adapting to
13:40 things fast and executing fast. That is
13:42 what will make you unique and might
13:45 help you get a job soon. I think the
13:46 one thing you should take away from
13:48 this video is: don't just keep on
13:50 learning theory, watching videos, or
13:52 completing a lecture series.
13:54 Instead, start building a project.
13:56 You'll learn a lot more by building
13:59 something than by just
14:00 watching some video or reading some
14:04 blogs. This was a very brief AI roadmap
14:05 which will give you an overall idea of
14:07 how things are currently working in the
14:10 AI space. After this, you'll be able to
14:12 dive into deeper concepts yourself, look
14:14 at new tech, and understand how it
14:16 works. You'll become a rapid
14:18 developer in the field of AI by using
14:21 AI. You'll know where to find things,
14:24 what to find, and when. If I talk from
14:26 my experience, if you're working in core
14:28 AI, what the industry currently needs is
14:30 someone who understands concepts
14:33 very deeply, while in applied AI, what I
14:34 got to know is that you have to be a
14:36 rapid developer. You have to understand
14:38 new tech fast, integrate it into your
14:41 system, and build docs and projects on
14:44 top of it, while using AI, since AI has
14:46 given a boost to all
14:47 developers. So you have to adapt to
14:49 it. With this, I'd like to end the video.