0:01 [Music]
0:08 in this video we'll be discussing
0:11 convolutional neural networks a
0:13 convolutional neural network also known
0:16 as a CNN or ConvNet is an artificial
0:18 neural network that has so far been most
0:20 popularly used for analyzing images
0:22 although image analysis has been the
0:25 most widespread use of CNNs they can
0:27 also be used for other data analysis or
0:29 classification problems as well most
0:32 generally we can think of a CNN as an
0:34 artificial neural network that has some
0:35 type of specialization for being able to
0:37 pick out or detect patterns and make
0:39 sense of them
0:40 this pattern detection is what makes CNNs
0:43 so useful for image analysis so if a CNN
0:46 is just some form of an artificial
0:48 neural network what differentiates it
0:50 from just a standard multi-layer
0:51 perceptron or MLP well a CNN has hidden
0:55 layers called convolutional layers and
0:57 these layers are precisely what makes a
0:59 CNN well a CNN now CNNs can and usually
1:04 do have other non-convolutional layers
1:06 as well but the basis of a CNN is the
1:09 convolutional layers all right so what
1:12 do these convolutional layers do just
1:14 like any other layer a convolutional
1:16 layer receives input then transforms the
1:18 input in some way and then outputs the
1:21 transformed input to the next layer with a
1:23 convolutional layer this transformation
1:25 is a convolution operation we'll come
1:28 back to this operation in a bit for now
1:30 let's look at a high-level idea of what
1:32 convolutional layers are doing as
1:34 mentioned earlier convolutional neural
1:36 networks are able to detect patterns in
1:38 images more precisely the convolutional
1:41 layers are able to detect patterns well
1:44 actually let's be a little more precise
1:46 than that with each convolutional layer
1:48 we need to specify the number of filters
1:50 the layer should have and we'll speak
1:52 technically about what a filter is in
1:54 just a few moments but for now
1:56 understand that these filters are
1:57 actually what detect the patterns now
2:00 when I say that the filters are able to
2:02 detect patterns what precisely do I mean
2:04 by patterns well think about how much
2:07 may be going on in any single image
2:09 multiple edges shapes textures objects
2:13 etc
2:13 so one type of quote pattern that a
2:16 filter could detect could be edges in
2:19 images so this filter would be called an
2:21 edge detector for example some filters
2:24 may detect corners some may detect
2:26 circles others squares now these simple
2:29 and kind of geometric filters are what
2:31 we'd see at the start of our network
2:33 the deeper our network goes the more
2:35 sophisticated these filters become so in
2:38 later layers rather than edges and simple
2:40 shapes our filters may be able to detect
2:42 specific objects like eyes ears hair or
2:46 fur feathers scales and beaks even and
2:49 in even deeper layers the filters are
2:51 able to detect even more sophisticated
2:52 objects like full dogs cats lizards and
2:56 birds to understand what's actually
2:58 happening here with these convolutional
3:00 layers and their respective filters
3:01 let's look at an example so say we have
3:04 a convolutional neural network that's
3:06 accepting images of handwritten digits
3:08 like from the MNIST dataset and our
3:10 network is classifying them into their
3:12 respective categories of whether the
3:13 image is of a 1 2 3 etc let's now assume
3:17 that the first hidden layer in our model
3:19 is a convolutional layer as mentioned
3:22 earlier when adding a convolutional
3:23 layer to a model we also have to specify
3:25 how many filters we want the layer to
3:27 have a filter can technically just be
3:30 thought of as a relatively small matrix
3:32 for which we decide the number of rows
3:34 and number of columns that this matrix
3:35 has and the values within the matrix are
3:38 initialized with random numbers so for
3:41 this first convolutional layer in this
3:43 example of ours we're going to specify
3:44 that we want the layer to contain one
3:46 filter of size 3 by 3 now when this
3:50 convolutional layer receives input the
3:52 filter will slide over each 3x3 set of
3:55 pixels from the input itself until it's
3:57 slid over every 3x3 block of pixels from
4:00 the entire image this sliding is
4:02 actually referred to as convolving so
4:05 really we should say that the filter is
4:07 going to convolve across each 3x3 block
4:10 of pixels from the input to actually
4:12 illustrate this I'm going to use an
4:14 example that Jeremy Howard used in one
4:16 of his lectures for fast.ai his example
4:18 really gave me a lot of insight behind
4:20 what was going on within a convolutional
4:22 layer so I'd like to share that with you
4:23 all too I've also linked to his lecture
4:25 in the description of this video
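To make the sliding concrete, here is a minimal sketch that lists every 3x3 block position a filter visits. It assumes a 28x28 input (the size of MNIST images), a step of one pixel at a time, and no padding around the image; the video does not state the step or padding, and the name `block_positions` is made up for this illustration.

```python
def block_positions(height, width, size=3):
    """Return the top-left (row, col) of every size x size block in the image.

    Assumes the filter moves one pixel at a time and never hangs over the
    edge of the image (no padding).
    """
    return [
        (row, col)
        for row in range(height - size + 1)
        for col in range(width - size + 1)
    ]


# A 28x28 image gives 26 positions per row and 26 per column.
positions = block_positions(28, 28)
print(len(positions))                 # 676, i.e. a 26 x 26 grid of blocks
print(positions[0], positions[-1])    # (0, 0) (25, 25)
```

Under these assumptions the output of the layer is slightly smaller than the input: one stored value per block position, so 26x26 values for a 28x28 image.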
4:27 so here we have our matrix
4:29 representation of an image of a 7 from
4:31 the MNIST dataset the values in this
4:34 matrix are the individual pixels from
4:36 the image alright so this is our input
4:39 and this input will be passed to a
4:41 convolutional layer as just discussed
4:43 we've specified this layer to only have
4:45 one filter and this filter is going to
4:48 convolve across each 3x3 block of pixels
4:50 from the input so here's our 3x3 filter
4:53 of random numbers here when the filter
4:55 first lands on the first 3x3 block of
4:58 pixels the dot product of the filter
5:00 itself with the 3x3 block of pixels from
5:03 the input will be computed and stored
5:06 this will occur for each 3x3 set of
5:08 pixels that the filter convolves so look
5:11 we would just take the dot product of
5:13 the filter here with this first 3x3
5:15 block and then we'd store it over here
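The step just described can be sketched in plain Python. This is only an illustration, not the spreadsheet from the video: the tiny 4x4 image and the filter values below are made up (the video's filter holds random numbers), and the names `dot_product` and `convolve2d` are hypothetical. The filter moves one pixel at a time with no padding, and it is not flipped before multiplying, which matches what deep learning libraries usually call a convolution.

```python
def dot_product(block, kernel):
    """Multiply matching entries of two equally sized 2D lists and sum them."""
    return sum(
        b * k
        for block_row, kernel_row in zip(block, kernel)
        for b, k in zip(block_row, kernel_row)
    )


def convolve2d(image, kernel):
    """Slide the kernel over every block of the image and store each dot product."""
    size = len(kernel)
    out_height = len(image) - size + 1
    out_width = len(image[0]) - size + 1
    output = []
    for row in range(out_height):
        output_row = []
        for col in range(out_width):
            # Cut out the block of pixels the filter currently covers.
            block = [image_row[col:col + size] for image_row in image[row:row + size]]
            output_row.append(dot_product(block, kernel))
        output.append(output_row)
    return output


# Made-up 4x4 "image" and 3x3 filter, chosen so the arithmetic is easy to follow.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

result = convolve2d(image, kernel)
print(result)  # [[-2, 2], [-2, 2]]
```

Each number in `result` is one stored dot product, and the grid of all of them is the layer's output for this filter.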
5:19 now we slide to the next 3x3 block take
5:23 the dot product and then store the value
5:25 here if we look at the formula for each
5:27 of these cells we can see that it is
5:29 just indeed the dot product of the
5:31 filter with each 3x3 section of pixels
5:34 from the input so here we have this
5:36 first value is the dot product of this
5:39 input with this filter and then if I
5:42 click on another random value over here
5:43 we can see that this value is the dot
5:46 product of the filter with this input so
5:49 after this filter has convolved the
5:51 entire input we'll be left with a new
5:53 representation of our input which is
5:55 going to be made up of the entire matrix
5:55 of those stored dot products we got from
5:59 the filter this matrix of dot products
6:02 is going to be the output of this layer
6:03 and is represented here this is what
6:06 will then be passed to the next layer as
6:07 input and this same process that we just
6:10 went through with the filter will happen
6:11 to this new output with the next layer's
6:13 filters now this was just a very simple
6:16 illustration but as mentioned we can
6:18 think of these filters as pattern
6:20 detectors so we can't really observe any
6:23 specific pattern that was picked out
6:24 from our filter in the example we just
6:26 looked at in Excel but let's show our
6:28 original image of the 7 here and now
6:31 let's say we have 4 3 by 3 filters for
6:34 our convolutional layer and these
6:36 filters are filled with the values you
6:38 see here and these values can be
6:40 represented visually
6:41 as these filters where the minus ones
6:43 correspond to black ones correspond to
6:46 white and zeros correspond to gray
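As a rough sketch of how a filter built from -1, 0 and 1 responds to an edge, the example below runs one such filter over a tiny image containing a bright horizontal bar. The filter values are an assumption, since the actual values are only shown on screen in the video, and the 6x5 image is made up. The helper `convolve2d` is a hypothetical name and uses a step of one pixel with no padding.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (step 1, no padding), storing dot products."""
    size = len(kernel)
    output = []
    for row in range(len(image) - size + 1):
        output_row = []
        for col in range(len(image[0]) - size + 1):
            total = 0
            for i in range(size):
                for j in range(size):
                    total += image[row + i][col + j] * kernel[i][j]
            output_row.append(total)
        output.append(output_row)
    return output


# Assumed filter: dark (-1) row on top, bright (1) row in the middle, gray (0) below.
top_edge_filter = [
    [-1, -1, -1],
    [ 1,  1,  1],
    [ 0,  0,  0],
]

# Made-up image: a bright bar (1s) on a dark background (0s).
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

response = convolve2d(image, top_edge_filter)
for response_row in response:
    print(response_row)
# [0, 0, 0]
# [2, 3, 2]
# [0, 0, 0]
# [-2, -3, -2]
```

The largest values sit where a dark row lies directly above a bright row, which is the top edge of the bar. The negative values mark the opposite transition, from bright to dark, at the bar's bottom edge.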
6:48 so if we convolve our original image of
6:51 a 7 with each of these four filters
6:53 individually this is what the output
6:55 would look like for each filter we can
6:58 see that all four of these filters are
6:59 detecting edges in the output the
7:02 brightest pixels can be interpreted as
7:04 what the filter has detected so this
7:07 first one we can see detects top
7:08 horizontal edges of the seven and that's
7:11 indicated by the brightest pixels here
7:13 the second detects the left vertical
7:15 edges again being displayed with the
7:17 brightest pixels the third detects
7:19 bottom horizontal edges and the fourth
7:21 detects right vertical edges now these
7:25 filters are really basic and just detect
7:27 edges these are filters we may see
7:29 towards the start of our network more
7:31 complex filters would be located deeper
7:33 in the network and would gradually be
7:34 able to detect more sophisticated
7:36 patterns like the ones shown here we can
7:39 see the shapes that the filters on the
7:41 left detected from the images on the
7:42 right this one here detects circles and
7:45 this one at the bottom is detecting
7:47 corners and as we go even further into
7:50 our layers the filters are able to
7:52 detect much more complex patterns like
7:55 these dog faces being interpreted in
7:57 this filter or even the bird legs
7:59 detected in this one all right so now if
8:02 you're interested in seeing how to work
8:04 with CNNs in code then check out the
8:06 CNN and fine-tuning videos in my
8:08 Keras deep learning playlist so I hope
8:11 that you now have a basic understanding
8:12 of convolutional neural networks and how
8:14 these networks are made up of
8:16 convolutional layers which themselves
8:17 are made up of filters and I hope you
8:20 found this video helpful if you did
8:22 please like the video subscribe suggest
8:24 and comment and thanks for watching
8:27 [Music]