Wanna become a data scientist? Checkout Beyond Machine!
Metrics in machine learning
As some of you might know, metrics in machine learning is one of my favourite topics. I have already spoken about RMSE and MAE, Cohen’s Kappa, the concordance correlation coefficient and other metrics.
On this post, I’d like to talk about another very important metric: The F-1 score.
So, what is the F1-score and why would you want to use it?
The F-1 score is simply the harmonic mean between precision and recall. What is precision and recall? Let’s do a quick refresher. In a two-class classification problem, the confusion matrix is defined as below:
So, based on that, the precision is defined as
and the recall is defined as
The image below from Wikipedia provides a very nice overview.
In simple terms:
Precision: If I predict that something is X, then what is the probability that it is really X?
Recall: Out of all instances of X in the dataset, how many we did we manage to correctly collect?
So, I mentioned that the F1-score is the harmonic mean of the precision and recall.
What is the harmonic mean? For two numbers, the harmonic mean is simply defined as:
Why are we using the harmonic mean? The harmonic mean is the correct mean to use when we are averaging out ratios. Precision and recall are ratios, so the right way to average them is through the harmonic mean. So, the final formula simply becomes:
So, when to use the F-1 score?
The F-1 score is very useful when you are dealing with imbalanced classes problems. These are problems when one class can dominate the dataset.
Take the example of predicting a disease. Let’s say that only only 10% of the instances in your dataset have the actual disease. This means that you could get 90% accuracy by simply predicting the negative class all the time. But, how useful is this? Not very useful, as you wouldn’t have predicted a single instance of the actual disease! This is where the F1-score can be very helpful. In this example, the recall for the positive class would be 0, and hence the F1-score would also be 0.
So, next time you are dealing with imbalanced classes you know that the F1-score is a far superior metric compared to accuracy. Another great metric is Cohen’s Kappa, which you should make sure to check out.