0:00 OK, pop quiz.
0:02 What am I drawing?
0:06 I'm going to make three
0:08 predictions here.
0:10 Firstly,
0:11 you think it's a house, and you'd be
0:13 right.
0:14 Secondly, that
0:16 just came pretty easily to you; it
0:18 was effortless.
0:18 And thirdly, you're thinking
0:20 that I'm not much of an artist
0:23 and you'd be right on all counts
0:24 there.
0:25 But how can we look at this set
0:27 of geometric shapes and think,
0:29 "Oh, house"?
0:31 If you live in a house, I bet it
0:33 looks nothing like this.
0:34 Well, that ability to perform
0:36 object identification that comes so
0:38 easily to us does not
0:40 come so easily to a computer,
0:42 but that is where we can apply
0:44 something called convolutional
0:47 neural networks
0:49 to the problem.
0:51 Now, a convolutional neural
0:54 network, or a
0:56 CNN,
0:58 is an area of deep learning
1:01 that specializes in pattern
1:02 recognition.
1:04 My name is Martin Keen, and
1:07 I work in the IBM Garage
1:09 at IBM.
1:10 Now let's take a look
1:12 at how a CNN works
1:14 at a high level.
1:16 Well, let's break it down.
1:17 CNN: convolutional neural network.
1:20 Well, let's start with the
1:21 artificial neural network part.
1:24 This is a standard network
1:26 that consists of multiple layers
1:28 that are interconnected,
1:30 and each layer receives
1:32 some input,
1:34 transforms that input to something
1:36 else, and passes an output
1:38 to the next layer. That's
1:40 how neural networks work, and
1:42 a CNN is a particular
1:44 part of the neural network, or a
1:46 section of layers. Let's say it's
1:47 these three layers here,
1:50 and within these layers, we have
1:52 something called filters.
1:55 And it's the filters that perform
1:58 the pattern recognition
2:00 that a CNN is so good
2:02 at.
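As a rough illustration of layers receiving an input, transforming it, and passing an output along, here is a minimal Python sketch. The layer functions (`scale`, `shift`, `relu`) and `run_network` are hypothetical names chosen for this example, not part of any library, and real layers use learned weights rather than fixed arithmetic.

```python
# Hypothetical layers: each one takes an input, transforms it,
# and returns an output for the next layer.
def scale(values):
    return [2 * v for v in values]

def shift(values):
    return [v + 1 for v in values]

def relu(values):
    # Keep positive values and replace negative ones with zero.
    return [max(0, v) for v in values]

def run_network(layers, values):
    # Feed the output of each layer into the next one, in order.
    for layer in layers:
        values = layer(values)
    return values

network_output = run_network([scale, shift, relu], [-3, 0, 2])
print(network_output)  # [0, 1, 5]
```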
2:04 So let's apply
2:06 this to our house example now.
2:08 If this house were an actual image,
2:10 it would be a series
2:12 of pixels, just like any image.
2:17 And if we zoom in on a particular
2:19 part of this house,
2:21 let's say we zoom in around here,
2:23 then we would get, well,
2:26 the window.
2:28 And what we're saying here is that a
2:30 window consists of some
2:32 perfectly straight lines.
2:35 Almost perfectly straight lines.
2:37 But, you know, a window doesn't need
2:38 to look like that. A window could
2:40 equally look like this, and we would
2:41 still say it was a window.
2:44 The cool thing about a CNN is
2:46 that, using filters,
2:47 a CNN could also say that these
2:49 two objects represent the same
2:51 thing.
2:52 The way they do that, then, is
2:54 through the application of these
2:55 filters. So let's take a look at how
2:57 that works.
2:58 Now, a filter is basically
3:01 just a three by three block.
3:04 And within that block, we can
3:05 specify a pattern to look for.
3:08 So we could say, let's look
3:10 for a
3:12 pattern like this, a right
3:14 angle, in our
3:16 image.
3:17 So what we do is we take this filter,
3:19 and since it's a three by three block
3:21 here, we will analyze the equivalent
3:23 three by three block up here as
3:24 well.
3:25 So
3:27 we'll look at, first of all, this
3:28 first
3:30 group of three by three pixels,
3:32 and we will see how close
3:34 they are to the filter
3:36 shape.
3:37 And we'll get that numeric score.
3:40 Then we will move across one column
3:42 to the right and look at the next
3:44 three by three block of pixels and
3:45 score how close they are to the
3:47 filter shape.
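To make the idea of a numeric score concrete, here is a minimal Python sketch. It assumes the score is the sum of element-wise products between the filter and a block of pixels, which is the usual convolution-style operation; the transcript itself only says the score reflects how close the pixels are to the filter shape. The names `RIGHT_ANGLE` and `score_patch`, and the pixel values, are made up for illustration.

```python
# A hypothetical three by three filter that looks for a right angle.
RIGHT_ANGLE = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

def score_patch(patch, filt):
    # Assumed scoring rule: multiply matching cells and add them up.
    return sum(patch[r][c] * filt[r][c] for r in range(3) for c in range(3))

# A block of pixels that contains the same right angle.
matching_patch = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

# A block of pixels that contains a diagonal line instead.
diagonal_patch = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]

score_match = score_patch(matching_patch, RIGHT_ANGLE)
score_diagonal = score_patch(diagonal_patch, RIGHT_ANGLE)
print(score_match, score_diagonal)  # 5 1
```

Under this assumed rule, the block that resembles the filter scores higher than the one that does not.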
3:48 And we will continue to slide over,
3:50 or convolve over, all
3:52 of these pixel layers until
3:54 we have mapped every
3:57 three by three block.
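The sliding step can be sketched in plain Python as follows. This is an illustrative sketch, not a production implementation: it assumes the filter moves one pixel at a time with no padding and that each position is scored as a sum of element-wise products. The function name `slide_filter` and the tiny four by four image are hypothetical.

```python
def slide_filter(image, filt):
    # Slide the filter over every position where it fully fits,
    # moving one pixel at a time, and record a score per position.
    size = len(filt)
    out_rows = len(image) - size + 1
    out_cols = len(image[0]) - size + 1
    scores = []
    for top in range(out_rows):
        row_scores = []
        for left in range(out_cols):
            total = 0
            for r in range(size):
                for c in range(size):
                    total += image[top + r][left + c] * filt[r][c]
            row_scores.append(total)
        scores.append(row_scores)
    return scores

# A made-up 4x4 image with a right angle in its top-left corner.
image = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

right_angle = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

feature_map = slide_filter(image, right_angle)
print(feature_map)  # [[5, 2], [2, 1]]
```

The highest score appears at the top-left position, where the block of pixels lines up exactly with the filter.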
4:00 Now, that's just for one filter.
4:01 But what that will give us is an
4:03 array of numbers that say how
4:04 closely the image
4:06 matches the filter.
4:09 But we can add more filters,
4:11 so we could add another three by
4:12 three filter here.
4:13 And perhaps this one looks for a
4:15 shape like this.
4:18 And we could add a third filter
4:20 here, and perhaps this looks
4:22 for a different kind of right angle
4:24 shape.
4:27 If we take the numeric arrays
4:29 from all of these filters and
4:31 combine them together in a process
4:32 called pooling, then we have
4:34 a much better understanding
4:36 of what is contained within
4:38 this series of pixels.
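In common CNN implementations, pooling usually means shrinking each filter's array of scores by summarizing small regions, for example by keeping only the largest score in each region (max pooling). Here is a minimal sketch under that assumption, using two by two regions that do not overlap. The function name `max_pool_2x2` and the score values are made up for illustration.

```python
def max_pool_2x2(scores):
    # Walk over the score array in non-overlapping 2x2 regions
    # and keep only the largest score from each region.
    pooled = []
    for top in range(0, len(scores) - 1, 2):
        pooled_row = []
        for left in range(0, len(scores[0]) - 1, 2):
            pooled_row.append(max(
                scores[top][left],
                scores[top][left + 1],
                scores[top + 1][left],
                scores[top + 1][left + 1],
            ))
        pooled.append(pooled_row)
    return pooled

# A made-up 4x4 array of scores from one filter.
scores = [
    [5, 2, 0, 1],
    [2, 1, 3, 0],
    [0, 0, 1, 4],
    [1, 2, 0, 2],
]

pooled = max_pool_2x2(scores)
print(pooled)  # [[5, 3], [2, 4]]
```

The pooled array is a quarter of the size but still records where each filter matched most strongly, which is what the next layer works from.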
4:40 Now that's just the first layer
4:42 of the CNN.
4:43 And as we go deeper into the
4:45 neural network, the filters
4:47 become more abstract, or they can do
4:49 more.
4:51 So the second
4:53 layer of filters perhaps can perform
4:55 tasks like basic object
4:56 recognition.
4:58 So we can have filters here that
5:00 might be able to recognize
5:02 the presence of a window
5:04 or the presence of a door
5:06 or the presence
5:08 of a roof.
5:10 And as we go deeper into the CNN
5:12 and into the next layer, well, maybe
5:13 these filters can perform even more
5:16 abstract tasks, like
5:18 being able to determine whether
5:20 we're looking at a house
5:22 or we're looking at an apartment
5:25 or whether we're looking at a
5:26 skyscraper.
5:29 So you can see the application
5:31 of these filters increases
5:34 as we go through the network and can
5:35 perform more and more tasks.
5:37 And that's a very high level
5:39 basic overview of what a CNN
5:42 is. It has a ton of business
5:44 applications.
5:45 Think of OCR, for example,
5:47 for understanding handwritten
5:48 documents.
5:49 Think of visual recognition
5:51 and facial detection and visual
5:53 search.
5:54 Think of medical imagery and
5:56 looking at that and determining what
5:58 is being shown in an imaging scan.
6:01 And even think of the fact that
6:02 we can apply a CNN to perform
6:05 object identification for
6:08 badly drawn houses. If
6:10 you have any questions, please drop
6:12 us a line below, and if you want to
6:13 see more videos like this in the
6:15 future, please like and subscribe.
6:18 Thanks for watching.