Deep learning project end to end | Potato Disease Classification - 2: Data collection, preprocessing | codebasics
Summary
Core Theme
This content outlines the initial steps of a data science project focused on potato disease classification, detailing the process of acquiring, preparing, and structuring image data for model training using TensorFlow.
Any data science project starts with the data collection process. AtliQ Agriculture has three options for collecting data. First, we can use ready-made data: we can either buy it from a third-party vendor or get it from Kaggle, etc. The second option is to have a team of data annotators whose job is to collect these images from farmers and annotate each image as either a healthy potato leaf or one with early or late blight disease. This team of annotators can work with farmers: they can go to the farmers' fields and ask the farmers to take pictures, or take the pictures themselves, and then classify them with the help of the farmer or by domain knowledge, so the diseased potato plants are separated from the healthy ones. So they can collect the data manually. This option is expensive; it requires budget, so you have to work with your stakeholders and get the budget approved, and it might be time consuming as well.
The third option is that data scientists can write web scraping scripts that go through different websites with potato images and collect them, and then use tools like Doccano; there are many tools available that can help you annotate the data. So you either annotate the data yourself or get annotated images by using those web scraping tools.
In this project we are going to use ready-made data from Kaggle. We will be using this Kaggle dataset for our model training; you can click on the download button. It's around 326 megabytes of data, and it has not only the images for potato disease classification but tomato and pepper disease classification as well. We are going to ignore all of that and focus only on these three potato directories. I had already downloaded this zip file previously; when I right-click and do Extract All, I get this folder. The folder also had the tomato and pepper directories, but I deleted those manually, and I ask you to do the same thing: go here, delete all the directories except these three, and then copy-paste this directory into your project directory. For my project directory, I have a code folder on my C drive, and in it I am going to create a new folder called potato-disease.
I want all of you to practice this code along with me. If you just watch my video, it's a waste of your time; practice as you watch, and only then is it useful. This is the best advice that someone can give you, okay? I have this folder ready for my project, and in it I'm going to create a new folder called training. Then I'm going to launch Git Bash, which allows me to run all the Unix commands; you can use the Windows command prompt as well.
I will run python -m notebook, which is going to launch my Jupyter notebook, and in it I will locate my potato-disease folder, go into training, and create a new Python 3 file; this will be my model notebook. You can name it "training" or whatever, just give this particular notebook some name. Then we are going to import some essential libraries. The purpose of this video is to load the dataset into a tf.data input pipeline, do some data cleaning, and make our dataset ready for model training. That's the purpose of this video.
So here, let me import some essential modules.
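The exact import list isn't shown in the transcript, so take this as a rough sketch of what the rest of the walkthrough relies on:

```python
# Assumed imports for this notebook (not shown verbatim in the video)
import tensorflow as tf
from tensorflow.keras import models, layers
import matplotlib.pyplot as plt
```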
The first thing I'm going to do: in my Downloads folder somewhere I had this PlantVillage directory, right? So I'm going to Ctrl+C that PlantVillage directory and Ctrl+V it here; I copy all those images into the same folder where I'm running this notebook, my .ipynb notebook. So you see, now I have this directory.

If you look inside, this is early blight: there are a thousand images here, and if you look at these images you see these black spots, which show that the potato plant has some kind of disease. If you look at the healthy plants, the healthy leaves look healthy: there are no black spots and they look pretty good. The other class, late blight, is a little more deteriorated; look at these leaves, they look pretty horrible. So we have all this data here in our directory.
Now I'm going to use TensorFlow's dataset API to load these images into a tf.data.Dataset. If you don't know about tf.data.Dataset, you need to pause this video right now, go to YouTube, search for "tensorflow data input pipeline", and you will see my video; you need to watch it, it will clarify your concepts. Basically, what's the purpose of tf.data.Dataset? Let's say you have all these images on your hard disk; you can read these images in batches, because there could be so many images, right? If you read these images in batches into this tf.data.Dataset structure, then you can do things like .filter and .map; you can do amazing things. So please watch that video. I will now assume that your concepts around tf.data datasets are clear and we can load the data.
We load it using this particular API: tf.keras.preprocessing.image_dataset_from_directory. What does this do? You can search for "tensorflow image_dataset_from_directory" and it will show you the API documentation. You specify a directory first: let's say you have a main directory, inside it you have your classes, and under those are all the images; this one call will load all the images into a tensor, basically into your dataset. So the first argument is the directory. What is our directory? Let me write it here: our directory name is PlantVillage, correct? See, PlantVillage, that's our data directory.
Then I will say shuffle=True so that it randomly shuffles the images as it loads them, and then I will specify the image size. What is my image size? Let me go here and open these directories; if you look at any image, you see 256 by 256 - all of these images are 256 by 256, you can verify that. So I will say 256 by 256, but I will create a couple of constants, because I need to refer to them later; so 256 is my image size. For my batch size, 32 is kind of a standard batch size; I will again store that into a constant and initialize it here. And that's pretty much it; I will just store this into a dataset variable. Okay, I did not run this, so let me run it.
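Put together, the loading step looks roughly like this (a sketch; the constant names are assumed from how they're referred to later):

```python
# Constants referenced later in the walkthrough
IMAGE_SIZE = 256
BATCH_SIZE = 32

# Load all images under the PlantVillage folder into a tf.data.Dataset,
# shuffled and batched, with one class per subdirectory.
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "PlantVillage",
    shuffle=True,
    image_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
)
```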
It loaded 2152 files belonging to three classes. Which three classes? You can just do dataset.class_names; I will store that into a variable so that I can refer to it later, and these are the class names. Basically your folder names are your class names - see, these are the three folder names. If you look at the folders, the first has a thousand images, the second has 152, and the third has a thousand, so that's two thousand one hundred fifty-two.
Now look: if I take the length of the dataset, do you have any clue why it is showing 68? Just pause the video and think about it. It's because every element in the dataset is actually a batch of 32 images, so if you do 68 times 32 - see, the last batch is not full, so 68 times 32 comes out a little more than 2152 images - but you get the idea of why this is 68.
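A quick sketch of those two checks (the variable name for the class names is my assumption):

```python
class_names = dataset.class_names
print(class_names)    # the three folder names: early blight, late blight, healthy
print(len(dataset))   # 68 batches: 2152 images in batches of 32, last batch partially filled
```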
Let's just explore this dataset. I will say: for image_batch, label_batch in dataset.take(1). When you do this, it gives you one batch. One batch is how many images? 32 images, okay? So I will print just the shape of the image batch, and for the labels I will do .numpy(), because every element you get is a tensor, so you need to convert it to numpy; again, if you don't know this concept, refer to the video I mentioned earlier. You find that there are 32 images, each image is 256 by 256 - and do you know what the last dimension is? You guys are smart: it's RGB, it's the channels, basically you have three RGB channels, and I'm going to initialize that as a constant as well so I can refer to it a little later. And the label batch, as you already realize, has zeros, ones, and twos - so there are three classes, three labels.
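Roughly, that exploration loop looks like this (variable names follow the narration; the CHANNELS constant name is assumed):

```python
CHANNELS = 3  # RGB

# Take one batch (32 images) and inspect its shapes and labels
for image_batch, label_batch in dataset.take(1):
    print(image_batch.shape)    # (32, 256, 256, 3): batch, height, width, channels
    print(label_batch.numpy())  # 32 integer labels in {0, 1, 2}, one per class
```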
If you want to print, let's say, an individual image - forget the whole batch of 32, I will just print the first image - you see it's a tensor. If you want to convert a tensor to numpy you call .numpy(), and you find all these numbers, a 3D array where every number is between 0 and 255; color is represented with values from 0 to 255, so that's what this is. And again, if you take the shape of this, you'll find 256 by 256 by 3 for the first image. Got it? All right.
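A minimal sketch of that single-image inspection, reusing the same assumed names:

```python
for image_batch, label_batch in dataset.take(1):
    first_image = image_batch[0].numpy()  # convert the tensor to a numpy array
    print(first_image)                    # 3D array of pixel values between 0 and 255
    print(first_image.shape)              # (256, 256, 3)
```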
Now let's try to visualize these images. Say I want to visualize this image; I can use plt.imshow - this is Matplotlib. When you do imshow it expects a 3D array, so what is my 3D array? It's this first image that I'm printing. With the raw numpy there is some problem because it is float, so I convert it to int, and now you should see it working. I don't care about all the numbers printed below, so I will just hide them. By the way, every time you run this it shuffles, so that's why you see a different image each time; it has shuffle randomness to it.
The axis is off; now I want to display the label, like what image this is. How do I display that label? Well, you can do plt.title. And what is my title? My title is the label from label_batch, okay? But this will give you the number zero, one, or two - how can you get the actual class name? Well, we have class_names, so you supply the label as an index into it; I hope you are getting the point. See: potato early blight.
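Sketched out, visualizing one image with its class name as the title looks something like this (assuming the names introduced above):

```python
for image_batch, label_batch in dataset.take(1):
    # imshow expects integer (or 0-1 float) pixel values, so cast the float tensor
    plt.imshow(image_batch[0].numpy().astype("uint8"))
    plt.axis("off")  # hide the axis ticks and numbers
    # label_batch[0] is an integer class index; look up the readable class name
    plt.title(class_names[label_batch[0].numpy()])
```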
I want to display a couple of these images, so I will just run a for loop. Out of the first batch of 32, let's say I want to display 12 images, and instead of index 0 I will use i. I hope that is clear. If you run this, it shows just one image - why? Because you need to make a subplot. A subplot of three by four is almost like a matrix, and if you do this, it shows all the images, but the dimensions are kind of messed up, so I will just increase the figure size to 10 by 10, and look, wonderful - it shows me all the images beautifully. This is a healthy leaf, this is early blight, late blight, and so on.
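A rough version of that 12-image grid (the figure size and loop structure follow the narration; the rest is assumed):

```python
plt.figure(figsize=(10, 10))  # make the grid large enough to read
for image_batch, label_batch in dataset.take(1):
    for i in range(12):
        plt.subplot(3, 4, i + 1)  # 3 rows x 4 columns grid
        plt.imshow(image_batch[i].numpy().astype("uint8"))
        plt.title(class_names[label_batch[i].numpy()])
        plt.axis("off")
```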
Now we are going to split our dataset into train and test splits, okay? The dataset length is 68 - the actual length, by the way, is 68 times 32, because each element is a batch of 32. What we will do is keep eighty percent of the data as training data; then from the remaining twenty percent we will do two splits: one ten percent split for validation, and the remaining ten percent for test. The validation set is used during the training process: after each epoch you do validation on this ten percent. So let me define the epochs - I am going to run 50 epochs; this is trial and error, it could be 20 or 30. We'll run, say, 50 epochs, and at the end of every epoch we use this validation dataset to do the validation. Once we are done with the 50 epochs and have the final model, we use the remaining ten percent, called the test dataset, to measure the accuracy of our model: before we deploy the model into the wild, we use this test dataset to test its performance.
How do you get this split? In sklearn we have the train_test_split method; if you use statistical machine learning with sklearn, we have that, but we don't have it in TensorFlow. We are going to use dataset.take: when you do dataset.take(10), it takes the first 10 elements. What is our train size? The train split is 0.8, because it is 80 percent, and the length of our dataset is 68. So what is 80 percent of 68? Well, 54 (after truncating to an integer). So I can now take the first 54 samples - the first 54 batches, actually; each batch is 32 images, so it's much simpler to work in batches - and call it the train dataset.
Okay, so that's my train dataset, and if you take its length - I hope you're practicing along with me - you find 54. And if you do dataset.skip(54), it means you are skipping the first 54 batches and getting the remaining 14. This is like using the slicing operator on a Python list: skip(54) is like [54:] onwards, and take(54) is like the first 54, [:54]. So if you know Python a little bit, this should be clear.

Temporarily I will save this as the test dataset, but this is not actually the test dataset; it is the remaining 20 percent, and in that you need to again split into validation and test, correct? So I have 14 batches, and my validation size is ten percent of my actual dataset, so I need six batches basically; when I take those from this temporary test dataset, I get my validation dataset, which has six batches. Then you do skip on the same thing, and that gives you your actual test dataset. So we just split our dataset into train, validation, and test datasets.
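Strung together, the manual split looks roughly like this for a 68-batch dataset (a sketch; EPOCHS is the constant defined a moment ago):

```python
EPOCHS = 50  # chosen by trial and error, per the narration

train_size = int(len(dataset) * 0.8)   # 0.8 * 68 -> 54 batches
train_ds = dataset.take(train_size)    # first 54 batches

remaining = dataset.skip(train_size)   # remaining 14 batches (roughly 20%)
val_size = int(len(dataset) * 0.1)     # 0.1 * 68 -> 6 batches
val_ds = remaining.take(val_size)      # 6 batches for validation
test_ds = remaining.skip(val_size)     # the rest (8 batches here) for testing
```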
Now, the code I wrote was using hard-coded numbers; it's just a prototype. So let's wrap all of this into a nice-looking Python function. The goal of this function is to take the TensorFlow dataset, and it should also take the split ratios: I'm saying that if you don't supply anything, by default it will use 80 percent train, 10 percent validation, and 10 percent test. I'm also going to add a shuffle argument - I'll explain why - and a shuffle size of 10,000. If you don't know about shuffle size, again, watch the other video I referred to; it's very important that you watch it. In the end, the function will return the train, validation, and test splits. Whatever code we are writing, we are just wrapping it into a nice-looking Python function, that's it.
So, what is my dataset size first of all? The dataset size is the length of the dataset. Then my train size is the train split - 80 percent of that - and I convert it into an integer because I don't want these float numbers. That's my train size, and my validation size is computed the same way. Now my train dataset is basically what we did previously, which is ds.take(train_size), and when you do ds.skip(train_size) you get the remaining 20 percent of the samples; from that you again take the validation size, and that's where you get your validation dataset, and if you do the same thing and just do skip there, you get your test dataset. I hope that is clear. Now, we have the shuffle argument: if shuffle is set, I shuffle the dataset before we split it into train and test, and the seed is just for reproducibility - if you use the same seed, it will give you the same result every time; it is just a seed number, it can be anything, 5, 7, anything. Okay, my function is ready.
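A sketch of that helper, assembled from the narration (the function name is my placeholder, and the seed value is arbitrary, as the video says):

```python
def get_dataset_partitions_tf(ds, train_split=0.8, val_split=0.1, test_split=0.1,
                              shuffle=True, shuffle_size=10000):
    """Split a batched tf.data.Dataset into train, validation and test subsets."""
    ds_size = len(ds)

    if shuffle:
        # Shuffle before splitting; a fixed seed keeps the split reproducible
        ds = ds.shuffle(shuffle_size, seed=12)

    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)

    train_ds = ds.take(train_size)                # first ~80% of the batches
    val_ds = ds.skip(train_size).take(val_size)   # next ~10%
    test_ds = ds.skip(train_size).skip(val_size)  # remaining ~10%

    return train_ds, val_ds, test_ds
```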
Now I can call my function on my dataset. What is the name of my dataset? Here it is - you see, dataset - so we read all the images into this dataset and now we are doing the train-test split. Okay, see, this ran really fast, and I will just confirm the sizes of my train, validation, and test sets, and they come out to be what we expect them to be.
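Calling it and sanity-checking the sizes might look like this:

```python
train_ds, val_ds, test_ds = get_dataset_partitions_tf(dataset)
print(len(train_ds), len(val_ds), len(test_ds))  # roughly 54, 6, 8 batches
```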
Now, once again, if you have seen my video on the TensorFlow data input pipeline, you would have understood the concepts behind caching, prefetching, etc. That's what we are going to do here. For the training dataset that we have, we will first do caching: it reads the image from disk, and then for the next iteration, when you need the same image, it keeps that image in memory, so this improves the performance of your pipeline; again, watch that video to get a good understanding. Shuffle - how shuffle(1000) works - again, you need to watch that video; shuffle(1000) will reshuffle the images, and yes, the buffer can be less than a thousand as well. Then prefetch: if you're using a GPU and a CPU, while the GPU is busy training, prefetch will load the next batch from your disk, and that improves the performance.
Actually, if you look at my deep learning playlist, I have a prefetch-and-cache video there, and I can quickly show you. Usually, when you are loading batches - say 32 images at a time - and a GPU (a Titan RTX) is training, you are not using the CPU while the GPU is training: the CPU is sitting idle, and when the GPU is done, the CPU reads the next batch while the GPU sits idle. Let's say, in this example, that takes around 12 seconds. But if you use prefetch and caching, then while the GPU is training batch one, the CPU is already loading the next batch - that's your prefetch, basically. And cache is this: if you have already read an image - see, the blue dot means an image was read - then during the second epoch you would be reading the same images again, but if you use cache, you don't see that blue block, so you save the time spent reading those images. I will link these videos, by the way, but if you search for "codebasics deep learning tutorials", these are the two videos I am referring to. So, back to the tutorial.
That's what I'm doing here, and I'm letting TensorFlow determine how many batches to prefetch while the GPU is training, and then you assign this back. My validation and test datasets will use the same pattern, and now these datasets are optimized for training performance, so my training will run fast.
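The optimization step described here would look roughly like this (AUTOTUNE lets TensorFlow pick the prefetch buffer size; older versions expose it as tf.data.experimental.AUTOTUNE):

```python
# Cache decoded images in memory, re-shuffle each epoch, and overlap
# data loading with training by prefetching upcoming batches.
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = val_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
test_ds = test_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
```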
Now we need to do some preprocessing. If you have worked on any image processing, you know the first thing we do is scale: the numpy array we saw previously had values between 0 and 255 - it's an RGB scale - and you want to divide by 255 so that you get numbers between 0 and 1. The way you do that is by creating a tf.keras.Sequential and supplying my preprocessing pipeline to it, and the way you do rescaling is with the Rescaling API. Don't worry about the "experimental" in the name, by the way; it is stable - I actually had a conversation with the TensorFlow folks about this and they say it is stable. So 1.0 divided by 255 will just scale the image values, and we will supply this layer when we actually build our model.
We need to do one more thing, which is resizing: we will resize every image to 256 by 256. Now you will immediately ask me: our images are already 256 by 256, why do we need to resize them? But this layer that we are creating - let me create it - this resize-and-rescale layer will eventually go into our ultimate model, and when we have a trained model and it starts predicting, if during prediction you supply an image which is not 256 by 256, some different dimension, this layer will take care of resizing it. That's essentially the idea here.
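As a sketch, the combined preprocessing layer might look like this (using the experimental preprocessing path the narration mentions; newer TensorFlow versions expose the same layers directly as tf.keras.layers.Resizing and tf.keras.layers.Rescaling):

```python
resize_and_rescale = tf.keras.Sequential([
    # Resize any incoming image to 256x256, even at prediction time
    layers.experimental.preprocessing.Resizing(IMAGE_SIZE, IMAGE_SIZE),
    # Scale pixel values from [0, 255] down to [0, 1]
    layers.experimental.preprocessing.Rescaling(1.0 / 255),
])
```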
Once we have created this layer, one more thing we are going to do in terms of preprocessing is use data augmentation to make our model robust. Let's say you train a model on some images, and then when you try predicting, you supply an image which is rotated, or which has different contrast; your model will not perform well. For that we use the concept of data augmentation. On YouTube, search for "tensorflow data augmentation" and you will find my video; you must watch it. What we do there is this: say you have one original image in your training dataset; you create four new training samples out of it by applying different transformations - a horizontal flip, contrast (you see, the contrast is increased in this image), and so on. You take the same image, apply some filter, some contrast, some transformation, and generate new training samples. Here, see, I rotated the images, and I will now use all five images for my training. So from one image I create four extra images and use all five for training, so that my model is robust: tomorrow, when I start predicting in the wild, if someone gives me a rotated image, my model knows how to predict it.
So that's the idea behind data augmentation, and as you have seen in that video, TensorFlow provides beautiful APIs for it. Again, you are doing the same thing: you create a couple of layers - I'm going to apply a random flip and some rotation; if you watch that video or the other one, you will get a clear understanding. So that's my data augmentation layer, which I'm going to store here. By the way, the resize-and-rescale and data augmentation layers will ultimately be used in my actual model.
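A minimal sketch of that augmentation layer, with the flip mode and rotation factor as assumptions (the narration only says "a random flip and some rotation"):

```python
data_augmentation = tf.keras.Sequential([
    # Randomly flip images horizontally and vertically
    layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    # Randomly rotate images by up to ~20% of a full turn
    layers.experimental.preprocessing.RandomRotation(0.2),
])
```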
That's all I had for this video; in the next video we are going to build a model and train it. Just to summarize: in this video we loaded our data into a TensorFlow dataset, did some visualization, then did the train-test split, and did some preprocessing. We have not completed the preprocessing - we just created the layers for it - and we will use these layers in our actual model.
I hope you're liking it, and I hope you are excited to see the next video, where we'll actually be training the model - it's going to be a lot of fun. If you're liking this series, please share it with your friends and give it a thumbs up; it helps me with YouTube ranking, and this project can reach more people who are trying to learn. The thing about YouTube is that the learning is free, so if you are doing free learning, at least you can give it a thumbs up. I mean, give it a thumbs down if you don't like it - I don't mind - but if you do, please leave a comment so that I can improve. Thank you for watching.