Cassava Leaf Disease Identification, Midway Report

Connor Uzzo
Mar 16, 2021

By Akshay Shah, Harry Chalfin, and Connor Uzzo

Since our initial post in February (see https://connor-uzzo.medium.com/cassava-leaf-disease-classification-c590375fb457), we have been building a convolutional neural network (CNN) that can identify different cassava leaf diseases from a digital image of the leaves. This has involved rethinking some of our coding strategy: some changes were made to overcome unforeseen issues like limited data storage and long run times, while others, like introducing data augmentation, were made to improve the overall potential of the model.

Improvements

We chose to switch from implementing our model in Google Colab to working in Kaggle notebooks directly, since this allowed us to load our dataset more easily. When we submitted our last post, we were having trouble doing this without exhausting our machine's RAM because the dataset is so large. The easiest solution we found was to introduce the training, validation, and test sets as image data generators (using Tensorflow's ImageDataGenerator class) and to call the .flow_from_dataframe() method on these generator objects to define the training and validation sets that we fit the model on.
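
Here is a minimal sketch of what that setup looks like. The CSV path, column names, and image size are assumptions based on the standard layout of the Kaggle competition data, not our exact code.

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed file layout from the Kaggle competition: train.csv lists image
# filenames and integer labels, and train_images/ holds the photos.
train_df = pd.read_csv("../input/cassava-leaf-disease-classification/train.csv")
train_df["label"] = train_df["label"].astype(str)  # categorical mode expects string labels

# One generator, with a slice of the data reserved for validation
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_gen = datagen.flow_from_dataframe(
    dataframe=train_df,
    directory="../input/cassava-leaf-disease-classification/train_images",
    x_col="image_id",
    y_col="label",
    target_size=(224, 224),   # illustrative image size
    batch_size=32,
    class_mode="categorical",
    subset="training",
)

val_gen = datagen.flow_from_dataframe(
    dataframe=train_df,
    directory="../input/cassava-leaf-disease-classification/train_images",
    x_col="image_id",
    y_col="label",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
    subset="validation",
    shuffle=False,            # keep order fixed so later evaluation lines up
)
```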

In addition, we wrote code that produces plots of the accuracy and categorical cross-entropy loss on our training and validation sets. Currently, we are seeing some strange behavior: on some epochs the validation set scores better than the training set. We are still working out why this happens and are wondering if our model is underfitting the data. Following these plots, we wrote code to display a confusion matrix showing how well our model classifies images across the various classes.
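
A sketch of how such plots can be produced is below. It assumes `history` is the object returned by model.fit(), `model` is the fitted network, and the validation generator was built with shuffle=False so predictions line up with the true labels.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Accuracy and loss curves for the training and validation sets
for metric in ["accuracy", "loss"]:
    plt.figure()
    plt.plot(history.history[metric], label=f"train {metric}")
    plt.plot(history.history["val_" + metric], label=f"val {metric}")
    plt.xlabel("epoch")
    plt.legend()
    plt.show()

# Confusion matrix: argmax of the predicted probabilities vs. the true classes
# (requires the validation generator to have shuffle=False)
probs = model.predict(val_gen)
y_pred = np.argmax(probs, axis=1)
y_true = val_gen.classes
ConfusionMatrixDisplay(confusion_matrix(y_true, y_pred)).plot()
plt.show()
```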

First Three Model Attempts

When we began creating the model itself, we first tried building a CNN ourselves based on our intuition about what such an architecture usually looks like. However, this proved very difficult, and our first model was extremely overfit almost immediately. The run time of this network was also disappointingly slow, partly because the large dataset is loaded from a generator, but also because our initial batch size was set too high at 512, which we eventually learned to lower to 32.
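
For readers unfamiliar with what a from-scratch CNN looks like in Keras, here is a small illustrative example; it is not our exact architecture, just the flavor of network we started with, trained with a batch size of 32 through the generators defined above.

```python
from tensorflow.keras import layers, models

# Illustrative only -- a small stack of convolution/pooling blocks followed
# by dense layers, not our actual first model.
cnn = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),   # five disease classes
])
cnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = cnn.fit(train_gen, validation_data=val_gen, epochs=10)
```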

However, even after its long run time, this model had unacceptably low validation accuracy, only reaching about 15-20%. We attempted one more model built from scratch, but quickly decided to scrap it and import ResNet50 from Tensorflow instead. This pre-trained CNN is 50 layers deep and is often very effective for classifying image data. Following the ResNet output, we added a max-pooling layer and three dense layers, each with a 0.2 dropout probability. We ended the model with a 5-node dense layer with softmax activation to produce our class probabilities. Our first few runs with this model had between 60 and 65% accuracy.
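
Roughly, this second model looks like the sketch below; the dense-layer widths are assumptions, since we do not list the exact sizes here.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Pre-trained ResNet50 as a feature extractor (top classification layer removed)
base = ResNet50(include_top=False, weights="imagenet", input_shape=(224, 224, 3))

model = models.Sequential([
    base,
    layers.GlobalMaxPooling2D(),               # max-pooling over the ResNet features
    layers.Dense(512, activation="relu"), layers.Dropout(0.2),
    layers.Dense(256, activation="relu"), layers.Dropout(0.2),
    layers.Dense(128, activation="relu"), layers.Dropout(0.2),
    layers.Dense(5, activation="softmax"),     # class probabilities for the 5 labels
])
```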

Our best-functioning model built upon the second one with a few key changes to the hyperparameters and architecture. These included shrinking the size of some of our dense layers, increasing the steps per epoch and validation steps, adding light l2 regularization, and using he-normal weight initialization. After training just the layers we added (that is, with the ResNet layer parameters frozen), this model achieved 71.04% validation accuracy on its best epoch.
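
The sketch below shows what these changes look like in code, reusing `base`, `train_gen`, and `val_gen` from the earlier sketches; the layer widths, l2 strength, and step counts are illustrative rather than our exact values.

```python
from tensorflow.keras import layers, models, regularizers

base.trainable = False   # freeze the ResNet50 weights; only the new head is trained

def head_dense(units):
    # Smaller dense layers with light l2 regularization and he-normal initialization
    return layers.Dense(units, activation="relu",
                        kernel_initializer="he_normal",
                        kernel_regularizer=regularizers.l2(1e-4))

model = models.Sequential([
    base,
    layers.GlobalMaxPooling2D(),
    head_dense(256), layers.Dropout(0.2),
    head_dense(128), layers.Dropout(0.2),
    head_dense(64), layers.Dropout(0.2),
    layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=50,
    steps_per_epoch=300,     # increased steps per epoch (illustrative)
    validation_steps=75,     # increased validation steps (illustrative)
)
```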

Accuracy over 50 epochs of training. These values were much noisier than the corresponding loss curves.
Loss over 50 epochs of training. Here the loss function being used is categorical cross-entropy.
Confusion matrix for our model. Our model clearly identifies class 3 the best, which makes sense since it is the largest class in our dataset.

Moving Forward

We want to try a few other small changes to the model before fine-tuning, mostly tinkering with different ways to mitigate overfitting. So far we have mostly relied on dropout to prevent overfitting, though we also tried l2 regularization. We also plan to try using a learning rate scheduler function to lower our learning rate at higher epochs, which should hopefully increase the effectiveness of our optimizer. After we settle on which method or methods to use, we will unfreeze the base model, lower the learning rate even more, and fine-tune the overall model, probably for only a few more epochs. At this point we will be ready to have our model evaluated.
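
As a sketch of what those next steps could look like, the snippet below attaches a Keras LearningRateScheduler callback to decay the learning rate at higher epochs, then unfreezes the base model and recompiles with a much smaller learning rate for a short fine-tuning run. It reuses `model`, `base`, and the generators from the earlier sketches, and the decay rule, learning rates, and epoch counts are illustrative guesses rather than settled choices.

```python
import tensorflow as tf

# Hypothetical learning-rate schedule: keep the initial rate for the first
# 10 epochs, then shrink it by 10% per epoch (values are illustrative).
def schedule(epoch, lr):
    return lr if epoch < 10 else lr * 0.9

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)
model.fit(train_gen, validation_data=val_gen, epochs=50, callbacks=[lr_callback])

# Fine-tuning: unfreeze the ResNet50 base, recompile with a much lower
# learning rate, and run only a few more epochs.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=3)
```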

Links

Original EDA, in Google Colab: https://colab.research.google.com/drive/1hak-mOKTmVMTIQ_aJ72lgLsJdhUGTyB6

Kaggle notebook: https://www.kaggle.com/connortuzzo/data2040-midterm-project
