Cassava Leaf Disease Classification Final Report

Connor Uzzo
Mar 21, 2021

By Akshay Shah, Harry Chalfin and Connor Uzzo

Executive Summary

Since our second blog post, we adjusted and tuned our model and comfortably passed 80% validation accuracy with our final implementation (83.5%). This was achieved by simplifying our model’s head, using a stronger base, and implementing early stopping to halt training before overfitting. Our new model head (Figure 1) simply takes the output of the base, passes it through a global average pooling layer, and flows directly into a 5-node dense layer that represents the class prediction probabilities.

Figure 1: Code for creating model head in Keras
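The head described above can be sketched roughly as follows. This is a minimal reconstruction rather than our exact notebook code; we use weights=None here to keep the sketch download-free, whereas the actual model started from pre-trained ImageNet weights.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained base (our final run used Inception ResNet V2; weights=None
# here is only to avoid a weight download in this sketch).
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights=None, input_shape=(299, 299, 3))

# Simple head: global average pooling straight into a 5-way softmax.
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(5, activation="softmax")(x)
model = models.Model(inputs=base.input, outputs=outputs)
```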

Initially, we tried using Google Colab for this effort but had some trouble loading in all the images and handling them. (Google Colab carries the advantage of allowing more than one individual to simultaneously make changes to the notebook.) We switched to using Kaggle directly to streamline the project.

Aside from the accuracy and loss curves (Figures 2 and 3), we also created a confusion matrix and a classification report to get a more detailed understanding of our model.

Since the test set we were given contains only one image, we reused our validation set as a test set for the purpose of producing the confusion matrix and classification report. Initially, the model accuracy appeared to be very poor (in the 40% range), even though we knew from our graphs that the validation accuracy was closer to 80%. We discovered the reason: the validation set had been shuffled for its original purpose of tuning the model’s parameters, so the ‘y_pred_labels’ and ‘y_true’ values were no longer in the same order. The comparison was therefore effectively random, which produced the bizarrely low accuracy value. Once we identified the problem, we simply turned the shuffle parameter off when generating the validation set.

In an ideal situation, we would also have a substantial test set at our disposal for creating the confusion matrix and classification report; we would then set the shuffle parameter to ‘True’ for the training and validation sets but to ‘False’ for the test set. However, the competition purposefully provides only a one-image test set so as to guard against cheating.
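The ordering bug is easy to reproduce with a toy example (hypothetical labels, not our actual data): even a perfect classifier scores near chance if its predictions are compared against labels in a different order.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=1000)  # labels in generator order
y_pred = y_true.copy()                  # pretend the model is perfect

# If the generator shuffles between the prediction pass and the label
# lookup, the two arrays no longer line up element-by-element:
shuffled = rng.permutation(len(y_true))
acc_shuffled = np.mean(y_pred[shuffled] == y_true)  # ~0.2 for 5 classes

# With shuffle=False the orders match and accuracy is what it should be:
acc_aligned = np.mean(y_pred == y_true)             # 1.0
```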

Final Model Submission

While we experimented with many different network head architectures, we eventually found that the best model paired a deep, complex pre-trained base with a simple head. In our case, we switched from the standard ResNet50 to Inception ResNet V2, a much deeper and more powerful base. We changed the head of our model to be simply a single global average pooling layer followed by a dense layer of 5 neurons with softmax activation, representing the class probabilities. These changes increased our average validation accuracy scores from about 70–75% to 80–83%.

We also tried using an exponential decay learning rate scheduler, but it did not improve our model’s performance, so we removed it. We added an early stopping callback as well, since model training took so long and often began to overfit in the last few epochs. Because the head was so simple, we trained with the base model unfrozen from the start, using a small learning rate of 10^-4 so that each update made only slight changes to the network parameters. We fit the model over a small number of epochs (20) with high steps per epoch and validation steps (250 and 25, respectively).
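The training setup above can be sketched as follows. The stand-in model, the patience value, and the `train_gen`/`val_gen` names are illustrative assumptions; the learning rate, epoch count, and step counts match the ones reported in the text.

```python
import tensorflow as tf

# Stand-in model; in the project this is the InceptionResNetV2-based network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(5, activation="softmax", input_shape=(16,)),
])

# Small learning rate so fine-tuning the unfrozen base only nudges weights.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Halt training once validation loss stops improving, keeping the best
# weights seen so far (patience value is an assumption).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True)

# model.fit(train_gen, epochs=20, steps_per_epoch=250,
#           validation_data=val_gen, validation_steps=25,
#           callbacks=[early_stop])
```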

On its best epoch, our model achieved a validation accuracy of 83.50% and a validation loss of 0.5526. Our baseline “majority classifier” model, discussed in our first blog post, had a validation accuracy of only 61.0%, so our final model represents an improvement of 22.5 percentage points.

Final Training History

Figure 2: Accuracy over training epochs. In this run, the early stopping callback stopped the model training 12 epochs in. The best validation accuracy was 83.5%, and it occurred at epoch 10.
Figure 3: The loss curves corresponding to the above training history. The lowest validation loss occurred at the eighth epoch, despite the highest accuracy occurring at the tenth.
Figure 4: Classification report for the validation set. Note: classification report and confusion matrices are from a different run than the training history plots.
Figure 5: Non-normalized confusion matrix for the validation set
Figure 6: Confusion matrix normalized by true value. Each value corresponds to the empirical probability of choosing the column number given the true value is the row number.
Figure 7: Confusion matrix normalized by predicted values. These values correspond to the empirical probability of the true class being the row number given the column number was predicted.
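The two normalizations shown in Figures 6 and 7 are just row-wise and column-wise divisions of the raw count matrix. A small sketch with hypothetical 2-class counts:

```python
import numpy as np

cm = np.array([[50.0, 5.0],    # hypothetical confusion-matrix counts:
               [10.0, 35.0]])  # rows = true class, columns = predicted class

# Normalize by true class (Figure 6 style): each row sums to 1, giving
# P(predicted = column | true = row).
by_true = cm / cm.sum(axis=1, keepdims=True)

# Normalize by predicted class (Figure 7 style): each column sums to 1,
# giving P(true = row | predicted = column).
by_pred = cm / cm.sum(axis=0, keepdims=True)
```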

Potential Improvements

Since we achieved much greater success with a stronger base model, we would probably research and try to fit others if we had more time. We would also take more time to experiment with different optimizers, since we only worked with Adam and SGD in this project. Lastly, while the exponential decay learning rate scheduler proved ineffective for our model, other types of learning rate schedulers could possibly improve training by slowing model progress near the minimum validation loss.

Blog Posts I and II:

Kaggle Notebook:

https://www.kaggle.com/connortuzzo/data2040-midterm-project/edit/run/56835355

EDA Notebook (Google Colab):

Citations:

TensorFlow API for Large-Scale Machine Learning: http://tensorflow.org

Keras API: keras.io/api/

“Cassava Leaf Disease Classification.” Kaggle, https://www.kaggle.com/c/cassava-leaf-disease-classification, accessed 3-20-2021.

Potter, Dan. “Blind-Monkey-Submission-Example-data2040-sp21.” Kaggle, Kaggle, 20 Feb. 2021, www.kaggle.com/danpotter/blind-monkey-submission-example-data2040-sp21.
