Wanna become a data scientist? Check out Beyond Machine!

Metrics in machine learning

As some of you might know, metrics in machine learning are one of my favourite topics. I have already covered RMSE and MAE, Cohen’s Kappa, the concordance correlation coefficient and other metrics.

In this post, I’d like to talk about another very important metric: the F1-score.

So, what is the F1-score and why would you want to use it?

The F1-score

The F1-score is simply the harmonic mean of precision and recall. What are precision and recall? Let’s do a quick refresher. In a two-class classification problem, the confusion matrix is defined as below:

confusion matrix

So, based on that, the precision is defined as

\text{Precision} = \frac{tp}{tp + fp}

and the recall is defined as

\text{Recall} = \frac{tp}{tp + fn}

The image below from Wikipedia provides a very nice overview.

Precision recall

In simple terms:

Precision: If I predict that something is X, then what is the probability that it is really X?

Recall: Out of all the instances of X in the dataset, how many did we manage to correctly identify?
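These two definitions are easy to verify by hand. The sketch below uses made-up labels purely for illustration: it counts true positives, false positives and false negatives directly and then applies the two formulas above.

```python
# Toy labels (invented for illustration): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Count the confusion-matrix cells for the positive class.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of everything predicted positive, how much really is
recall = tp / (tp + fn)     # of all real positives, how many we caught

print(precision, recall)  # 0.75 0.75
```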

So, I mentioned that the F1-score is the harmonic mean of the precision and recall.

What is the harmonic mean? For two numbers, the harmonic mean is simply defined as:

H = \frac{2 x_1 x_2}{x_1 + x_2}

For those of you who don’t know, this is one of the three Pythagorean means, the others being the classic arithmetic mean and the geometric mean.

Why are we using the harmonic mean? The harmonic mean is the correct mean to use when we are averaging out ratios. Precision and recall are ratios, so the right way to average them is through the harmonic mean. So, the final formula simply becomes:

F_1 = \frac{2}{\text{recall}^{-1} + \text{precision}^{-1}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{tp}{tp + \frac{1}{2}(fp + fn)}
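The two right-hand forms of this formula are algebraically equivalent, which is easy to check numerically. The counts below are invented so that precision is 0.75 and recall is 0.60:

```python
# Illustrative counts: precision = 6/8 = 0.75, recall = 6/10 = 0.60.
tp, fp, fn = 6, 2, 4

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic-mean form of the F1-score.
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form computed directly from the counts.
f1_counts = tp / (tp + 0.5 * (fp + fn))

print(round(f1, 4), round(f1_counts, 4))  # both 0.6667
```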

So, when should you use the F1-score?

The F1-score is very useful when you are dealing with imbalanced class problems, that is, problems where one class dominates the dataset.

Take the example of predicting a disease. Let’s say that only 10% of the instances in your dataset have the actual disease. This means that you could get 90% accuracy by simply predicting the negative class all the time. But how useful is this? Not very, as you wouldn’t have predicted a single instance of the actual disease! This is where the F1-score can be very helpful. In this example, the recall for the positive class would be 0, and hence the F1-score would also be 0.
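The disease example can be sketched in a few lines. The data below is synthetic (10 positives out of 100), and the "classifier" always predicts the negative class:

```python
# Synthetic dataset: 10 diseased (1) and 90 healthy (0) instances.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100  # a degenerate model that always predicts "no disease"

# Accuracy looks great: 90 of 100 predictions are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 0
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 0
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 10

# F1 from counts; defined as 0 when there are no positive predictions or positives caught.
f1 = tp / (tp + 0.5 * (fp + fn)) if (tp + fp + fn) else 0.0

print(accuracy, f1)  # 0.9 0.0
```

Accuracy of 90% alongside an F1-score of 0 is exactly the failure mode described above.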

So, next time you are dealing with imbalanced classes, you know that the F1-score is a far better metric than accuracy. Another great metric is Cohen’s Kappa, which you should make sure to check out.


Dr. Stylianos Kampakis is the owner and author of The Data Scientist.