Problem Explanation
In this problem, we are given 50,000 training images and 10,000 test images with labels. Each image is 32*32*3: the height and width are 32 * 32 pixels, and 3 is the number of RGB color channels.
The labels are 'plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'.
So the task is to look at an image and choose which class it belongs to.
In other words, this is image classification across the 10 given labels/classes.
We can perform this task using 2 methods:
1) Artificial neural network (ANN) (fully connected layers only)
2) Convolutional neural network (CNN) (convolution layers with pooling, followed by fully connected dense layers)
I am going to solve this problem using a CNN.
A CNN can be built in 2 ways: the first is using PyTorch and the second is using TensorFlow and Keras.
I have given a solution using PyTorch.
Solution Discussion
Note: Reference - https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
Let's look into the solution.
The first step is to import torch and the related libraries and load the data.
Here the input is normalized using the transforms.Normalize() method.
Normalization is an important part of image classification.
Here, batch size is the number of images passed through the network in one go.
I have updated the batch size to 16.
Batch size is one of the hyperparameters for this problem, so I tried increasing it.
Moreover, according to the reference, when a large learning rate is used, a higher batch size gives better CNN performance. (Ref: https://www.sciencedirect.com/science/article/pii/S2405959519303455)
num_workers: I updated it to 1 and then 2.
num_workers is the number of subprocesses used for data loading.
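As a rough illustration of normalization and batch size, the sketch below normalizes a pixel value the way transforms.Normalize does per channel (subtract the mean, divide by the standard deviation) and counts how many batches one pass over the training set takes. The mean and standard deviation of 0.5 are an assumption borrowed from the referenced tutorial, and the helper names are hypothetical.

```python
import math

def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel in [0, 1] the way transforms.Normalize does per channel."""
    return (pixel - mean) / std

def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

print(normalize(0.0), normalize(1.0))   # -1.0 1.0
print(batches_per_epoch(50000, 16))     # 3125
```

With these assumed values, pixel intensities end up in the range [-1, 1], and a batch size of 16 means 3125 weight updates per epoch on the 50,000 training images.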
The second step:
It is just about showing training images.
Here I created a method, imshow, to show an image using matplotlib.
imshow unnormalizes the normalized image, converts it to a NumPy array, transposes it from (channels, height, width) to (height, width, channels), and then shows it to the user.
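The unnormalize-and-transpose part of that helper can be sketched with NumPy alone. This is a minimal sketch, assuming the image was normalized with a mean and standard deviation of 0.5; the function name to_displayable and the random stand-in image are hypothetical.

```python
import numpy as np

def to_displayable(img_chw, mean=0.5, std=0.5):
    """Undo normalization and reorder (C, H, W) -> (H, W, C) for plotting."""
    img = img_chw * std + mean            # unnormalize back to [0, 1]
    return np.transpose(img, (1, 2, 0))   # channels last, as matplotlib expects

# A stand-in normalized image with values in [-1, 1): 3 channels, 32 x 32 pixels.
fake = np.random.uniform(-1.0, 1.0, size=(3, 32, 32))
shown = to_displayable(fake)
print(shown.shape)  # (32, 32, 3)
```

The resulting array can be handed to matplotlib's plt.imshow, which expects the color channel as the last axis.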
The third step:
This is the most important step of this image classification.
In this step we build the whole neural network structure.
This is where the CNN is defined, meaning all the computation is done in these layers.
The first is the convolution layer.
It takes 3 channels as input, produces 16 channels as output, and uses 3*3 kernels for computation.
A kernel is nothing but a filter used to extract a feature map from the input image.
It slides over the input data and computes the dot product with each sub-part of the input. Each dot product becomes one value of the output.
Padding is the number of pixels added around the border of the input before the kernel is applied.
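To make the sliding dot product and the effect of padding concrete, here is a small pure-Python sketch for a single channel with stride 1. The function conv2d and the example values are hypothetical; a real layer such as nn.Conv2d does the same thing across many channels with learned kernel values.

```python
def conv2d(image, kernel, padding=0):
    """Slide `kernel` over `image` (lists of lists) and take dot products."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Surround the image with `padding` rows/columns of zeros.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = h + 2 * padding - k + 1
    out_w = w + 2 * padding - k + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product of the kernel with the patch under it.
            row.append(sum(kernel[i][j] * padded[r + i][c + j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]  # sums the main diagonal of each 3x3 patch

print(conv2d(image, kernel))           # [[18, 21], [30, 33]]
print(len(conv2d(image, kernel, 1)))   # 4
```

Without padding, the 4*4 input shrinks to 2*2. With a padding of 1, a 3*3 kernel keeps the output the same size as the input.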
The output of the convolution layer is passed to the ReLU activation function.
Here I am using ReLU, but there are other options such as softmax and sigmoid.
I tried both, but performance did not improve, so I went back to ReLU.
The output of the activation function is passed to the next convolution layer.
The second convolution layer: input channels 16, output channels 32, kernel 3*3, padding 1
The third convolution layer: input channels 32, output channels 64, kernel 5*5, padding 1
The fourth convolution layer: input channels 64, output channels 128, kernel 5*5, padding 1
Now the result is passed to the fully connected layers (dense layers / artificial neural network).
The output of the last convolution layer has 128 channels. It is passed to the activation function and then to max-pooling, after which the output is a 128*6*6 matrix.
In the above network I am using max-pooling after convolution layer 1 and convolution layer 4.
Pooling is used to keep the important features of a given input while reducing its size.
Here I am using max-pooling for this.
It slides over the input and takes the maximum of each sub-matrix.
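A minimal pure-Python sketch of max-pooling follows, assuming a 2*2 window that moves 2 positions at a time (the setting used in the referenced tutorial; the window size used in this solution is not stated). The function name and the example values are hypothetical.

```python
def max_pool2d(matrix, size=2):
    """Take the maximum of each non-overlapping size x size window."""
    out = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            row.append(max(matrix[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out

feature_map = [[1, 3, 2, 0],
               [4, 6, 5, 1],
               [7, 2, 9, 8],
               [0, 1, 3, 4]]
print(max_pool2d(feature_map))  # [[6, 5], [7, 9]]
```

Each 2*2 block is replaced by its largest value, so the height and width are halved while the strongest responses are kept.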
A fully connected layer cannot work on a matrix, so we have to flatten the 128*6*6 output into a vector. The input to the first fully connected layer is therefore 128*6*6 = 4608 values.
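The 128*6*6 figure can be checked with the usual output-size formula for convolution, (size + 2*padding - kernel) / stride + 1. The sketch below assumes a stride of 1, a 2*2 pooling window, and a padding of 1 for the first convolution layer (its padding is not stated above); the helper names are hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output height/width of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2):
    """Output height/width of non-overlapping max-pooling."""
    return size // window

size = 32                                   # CIFAR-10 images are 32 x 32
size = conv_out(size, kernel=3, padding=1)  # conv1 (padding=1 is assumed) -> 32
size = pool_out(size)                       # pool after conv1 -> 16
size = conv_out(size, kernel=3, padding=1)  # conv2 -> 16
size = conv_out(size, kernel=5, padding=1)  # conv3 -> 14
size = conv_out(size, kernel=5, padding=1)  # conv4 -> 12
size = pool_out(size)                       # pool after conv4 -> 6

flat_features = 128 * size * size
print(size, flat_features)  # 6 4608
```

Under these assumptions the arithmetic agrees with the 128*6*6 input of the first fully connected layer.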
Fully connected layer 1: input 128*36, output 128. Then the output is passed to the activation function.
Fully connected layer 2: input 128, output 84. Then the output is passed to the activation function.
Fully connected layer 3: input 84, output 64. Then the output is passed to the activation function.
Fully connected layer 4: input 64, output 34. Then the output is passed to the activation function.
Fully connected layer 5: input 34, output 10. Then the output is passed to the activation function.
The final output size should be 10 because each input must be classified into one of the given 10 labels or classes.
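Putting the layer sizes listed above together, the network might look roughly like the following PyTorch sketch. It is not the author's exact code: the padding of conv1, the 2*2 pooling window, and the class name Net are assumptions. The sketch also returns the last layer's raw scores without an activation, because nn.CrossEntropyLoss expects raw scores; that choice is an assumption as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Sketch of the described CNN: 4 conv layers, 2 max-pools, 5 dense layers."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # padding assumed
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=5, padding=1)
        self.conv4 = nn.Conv2d(64, 128, kernel_size=5, padding=1)
        self.pool = nn.MaxPool2d(2, 2)                           # window assumed
        self.fc1 = nn.Linear(128 * 6 * 6, 128)
        self.fc2 = nn.Linear(128, 84)
        self.fc3 = nn.Linear(84, 64)
        self.fc4 = nn.Linear(64, 34)
        self.fc5 = nn.Linear(34, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 3x32x32 -> 16x16x16
        x = F.relu(self.conv2(x))              # -> 32x16x16
        x = F.relu(self.conv3(x))              # -> 64x14x14
        x = self.pool(F.relu(self.conv4(x)))   # -> 128x6x6
        x = torch.flatten(x, 1)                # -> 4608 values per image
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        return self.fc5(x)                     # raw class scores (assumption)

net = Net()
scores = net(torch.randn(2, 3, 32, 32))        # a batch of 2 random "images"
print(scores.shape)                            # torch.Size([2, 10])
```

Passing a random batch through the network is a quick way to confirm that the flattened size matches the first fully connected layer before any training starts.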
Now we have built our convolutional network.
It is time to train our model using the training data set.
Working of this CNN: it takes images as input. Images are nothing but arrays or matrices.
Here the input is 32*32*3 images.
The working is very simple.
For example, suppose that instead of a machine we have 8 people (A to H) who must decide whether an image shows a dog.
Of those 8, two (A, B) look at the tail and leg parts of the image, and another two (C, D) look at the face and hands.
Each one gives a number between 0 and 1. A and B give their results to E, and C and D give their results to F. From the results of A and B, E decides whether it is a dog's lower part: if the result is greater than 0.5, it is. In the same way, from the results of C and D, F decides whether it is a dog's upper part. E and F then pass their combined result to G, and G passes its result to H. H says whether it is a dog image. If the prediction is correct, H tells G so; otherwise H tells G to improve its performance. G propagates this information back to E and F, and E and F pass it on to A, B, C, and D. In this way they all improve through backpropagation. It is important to have many people in a recognition task so that the work is divided into small pieces and the result is more accurate.
The same approach is applied to the convolution layers.
Fourth step:
Here I am using CrossEntropyLoss as the loss function to calculate the loss over time.
It measures how far the model's predictions are from the true labels. The gradient of this loss is then propagated backward so that the optimizer can improve the model.
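For a single image, cross-entropy is the negative log of the softmax probability assigned to the correct class. The pure-Python sketch below shows that calculation; the function name and the example scores are hypothetical and only illustrate the formula.

```python
import math

def cross_entropy(scores, target):
    """Cross-entropy of one sample: -log(softmax(scores)[target])."""
    shifted = [s - max(scores) for s in scores]   # for numerical stability
    exps = [math.exp(s) for s in shifted]
    prob_target = exps[target] / sum(exps)
    return -math.log(prob_target)

uniform = cross_entropy([0.0] * 10, target=3)          # no preference: log(10)
confident = cross_entropy([0.0, 8.0, 0.0], target=1)   # high score, right class
wrong = cross_entropy([0.0, 8.0, 0.0], target=0)       # high score, wrong class
print(round(uniform, 4))    # 2.3026
print(confident < wrong)    # True
```

A model that guesses evenly among 10 classes has a loss of about 2.30, so a training loss well below that value indicates the network has learned something.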
I have tried both SGD and Adam.
SGD (stochastic gradient descent) and Adam are two of the many optimizers available.
Fifth step:
Now we actually train the CNN we built: a forward pass, then backpropagation, then a weight update using the optimizer above.
Here we check the loss at each epoch. This is needed to verify that backpropagation and gradient descent are actually updating the weights and that those updates are improving performance. If they are not, the model does not get proper feedback and keeps making the same errors.
In such a case performance will never improve.
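The training loop described above can be sketched as follows. To keep the example small and self-contained it uses stand-ins that are assumptions, not the author's setup: a tiny linear model in place of the CNN, random tensors in place of the CIFAR-10 DataLoader, and 3 epochs instead of 10.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-ins (assumptions): a tiny linear model instead of the CNN, and random
# tensors instead of a CIFAR-10 DataLoader with batch_size=16.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
batches = [(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
           for _ in range(4)]

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for inputs, labels in batches:
        optimizer.zero_grad()              # clear gradients from the last step
        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)  # compare scores with true labels
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(batches))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.3f}")
```

Printing the mean loss per epoch is the check described above: if the numbers do not go down over time, the feedback loop is not working.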
Sixth step:
Now we save the trained model so that we can predict from it whenever needed.
Saving the model is extremely useful when you want to test an already trained model.
We just load the saved model and predict from it, so we do not waste time training the model again.
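One common way to do this in PyTorch is to save the model's state_dict and load it into a freshly built copy of the same architecture. The sketch below uses a small stand-in layer and an in-memory buffer so that it runs without touching the disk; both are assumptions for illustration, and in practice a file path would take the buffer's place.

```python
import io
import torch
import torch.nn as nn

# Stand-in model (assumption): any nn.Module is saved the same way.
net = nn.Linear(4, 2)

# Save only the learned parameters. A file path (for example a hypothetical
# "./model.pth") can be used in place of the in-memory buffer.
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)

# Later: rebuild the same architecture and load the saved parameters into it.
buffer.seek(0)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))
restored.eval()  # switch to inference mode before predicting

x = torch.randn(1, 4)
print(torch.allclose(net(x), restored(x)))  # True
```

The restored model gives the same predictions as the original, with no retraining.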
Seventh step:
It is the final step, checking the performance of the model you built.
In this step, we pass the test loader, which supplies the test data set, to the model.
Accuracy is calculated as the number of correct predictions out of the total.
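The accuracy calculation itself is simple enough to show in plain Python. The function name and the class indices below are made up purely to illustrate the arithmetic; they are not results from the model.

```python
def accuracy(predictions, labels):
    """Percentage of predictions that match the true labels."""
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return 100.0 * correct / len(labels)

# Made-up class indices, only to show the calculation.
predicted = [3, 8, 8, 0, 6, 6, 1, 6]
actual = [3, 8, 8, 0, 6, 6, 1, 2]
print(accuracy(predicted, actual))  # 87.5
```

Here 7 of the 8 predictions match, which gives 87.5%.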
Eighth step:
This step is optional, but it is needed if you want to know the accuracy of each class/label.
For example, it tells how many cat images were predicted correctly out of all cat images.
This calculation is done for all 10 classes.
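Per-class accuracy can be computed by keeping one counter of correct predictions and one counter of totals for each label. This is a minimal sketch with made-up predictions; the function name is hypothetical and the numbers are not results from the model.

```python
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

def per_class_accuracy(predictions, labels):
    """Return {class name: percent correct} for classes present in `labels`."""
    correct = {name: 0 for name in classes}
    total = {name: 0 for name in classes}
    for p, t in zip(predictions, labels):
        total[classes[t]] += 1
        if p == t:
            correct[classes[t]] += 1
    return {name: 100.0 * correct[name] / total[name]
            for name in classes if total[name] > 0}

# Made-up indices: four cat images (class 3) and two dog images (class 5).
actual = [3, 3, 3, 3, 5, 5]
predicted = [3, 3, 5, 3, 5, 3]
print(per_class_accuracy(predicted, actual))  # {'cat': 75.0, 'dog': 50.0}
```

Three of the four cat images and one of the two dog images are predicted correctly, so the per-class figures are 75% and 50%.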
Different models for prediction
Model 1:
Change 1: batch_size = 16
Change 2: num_workers = 0
Change 3: the whole convolutional network; added more layers to increase accuracy
Change 4: optimizer = optim.Adam(net.parameters(), lr=0.001)
Change 5: epochs = 10
Model 2:
It is the same as the reference document.
Model 3:
Change 1: batch_size = 8
Change 2: optimizer = optim.Adam(net2.parameters(), lr=0.001)
Change 3: epochs = 10
Note: here the convolution layers are the same as in Model 2 (reference).
Accuracy
Model 1:
Model accuracy: 71-73%
Loss during each epoch :
Model 2:
Model accuracy: 52%
Loss during each epoch:
Model 3:
Model accuracy: 52%
Loss during each epoch:60%
The models were trained for different numbers of epochs, so it is not feasible to plot their losses on one graph.
Different comparison graphs
Graph 1
The graph below shows a comparison of the accuracy of each model on each class/label.
For example, the red bars show the accuracy of Model 1, and the first red bar in the chart shows the accuracy of Model 1 for the class plane.
I have stored the accuracy of each class for each model in an array and used that array to plot the accuracy of each class for each model in a bar chart.
Graph 2
It is a bar chart comparing the overall accuracy of each model.
I have stored the final accuracy of each model in a separate variable and used those variables to plot the accuracy of each model in the chart.
Challenges Faced
One of the most important things was to get good accuracy.
The question was what I should do to get good accuracy.
For that, I tried different hyperparameters: optimizer method, batch size, num_workers, learning rate, number of epochs, and number of convolution layers.
Some changes increased accuracy and some decreased it.
Overall it was trial and error.
The highest accuracy was achieved using:
Change 1: batch_size = 16
Change 2: num_workers = 0
Change 3: the whole convolutional network; added more layers to increase accuracy
Change 4: optimizer = optim.Adam(net.parameters(), lr=0.001)
Change 5: epochs = 10
References
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53