Post by Dean Wetherby

It’s the end of the school year. After the professor applies a curve, you find out that you got an A in a pretty tough engineering class. It doesn’t matter whether you scored 100% or barely eked out a 90%. For a classification task, though, an A is pretty good but often not nearly good enough.

Lately I’ve been writing a few binary classifiers for aerial imaging applications. One example is a classifier that decides whether a video frame was taken above clouds that obscure the ground view. So I built a dataset of cloudy and non-cloudy images, appropriately split into training, validation, and test sets. I train my classifier, and it yields 95% accuracy on the test data. That’s pretty good, right? As you can probably guess from the preceding paragraph, it really isn’t.


Let’s say we’re processing an aerial video that’s 10 minutes long with a frame rate of 30 frames per second. That’s 10 min * 60 sec/min * 30 frames/sec, or 18,000 frames total. When we run our classifier over the video to detect cloudy and non-cloudy frames, it will categorize incorrectly approximately five percent of the time: some non-cloud images will be predicted to be cloudy, and some cloudy images will be predicted to be not cloudy. 5% of 18,000 frames is 900 frames! That means we are misclassifying a whole lot of frames with an ‘A’ grade classifier.
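The arithmetic above is easy to sketch in a few lines of Python (the video length and frame rate are the numbers from this example, not fixed constants):

```python
# Expected misclassifications for a video of a given length,
# assuming errors occur at a uniform rate of (1 - accuracy).
minutes = 10
fps = 30
total_frames = minutes * 60 * fps  # 10 * 60 * 30 = 18,000 frames

def expected_errors(total_frames, accuracy):
    """Expected number of misclassified frames at a given accuracy."""
    return round(total_frames * (1 - accuracy))

print(total_frames)                          # 18000
print(expected_errors(total_frames, 0.95))   # 900
```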

Although there are numerous approaches to increasing a model’s accuracy, precision, and recall, I bring up this issue to help manage machine learning expectations. Even if the classifier were somehow improved to 99% accuracy, we would still be misclassifying 180 frames in our 10-minute video. Getting this many frames wrong could leave users frustrated with the classifier. So why not train the model to be 100% accurate? At some point, labeling the training images as cloudy or non-cloudy becomes subjective. In the case of the cloud predictor, what do we do with partially cloudy images? How much of the frame should be covered in clouds before it is labeled cloudy? It’s this decision boundary that the model has difficulty with. If your classifier has a similarly challenging task, sometimes an ‘A’ is the best you can do.