F1 Score
May 20, 2023
In the field of artificial intelligence and machine learning, the ability to accurately classify or predict outcomes is of paramount importance. To evaluate the performance of classification models, several metrics have been proposed, one of which is the F1 score. The F1 score combines precision and recall, two important metrics for classification tasks, into a single measure of a model's performance. In this article, we will look at what the F1 score is, how it is calculated, and how it is used to evaluate the performance of classification models.
Precision and Recall
Before we delve into the F1 score, we need to understand two important metrics for classification tasks: precision and recall. Precision measures how reliable the model's positive predictions are: of all the instances the model labels as positive, how many actually are positive? It is calculated as the number of true positives (i.e., correct positive predictions) divided by the total number of positive predictions (i.e., the sum of true positives and false positives). Mathematically, it can be represented as:
Precision = True Positives / (True Positives + False Positives)
Recall, on the other hand, measures how completely the model finds the positive class: of all the instances that actually are positive, how many does the model label as positive? It is calculated as the number of true positives divided by the total number of actual positive instances (i.e., the sum of true positives and false negatives). Mathematically, it can be represented as:
Recall = True Positives / (True Positives + False Negatives)
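To make these two formulas concrete, here is a minimal Python sketch (the function names and counts are our own, not taken from any particular library) that computes precision and recall directly from the counts of true positives, false positives, and false negatives:

def precision(tp, fp):
    # Precision = TP / (TP + FP): the fraction of positive predictions that are correct.
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    # Recall = TP / (TP + FN): the fraction of actual positives that the model finds.
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# A quick check with made-up counts: 8 true positives, 2 false positives, 4 false negatives.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=4))     # 0.666...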
In a classification task, there is often a trade-off between precision and recall. For example, a model that predicts every instance as positive (i.e., makes no negative predictions) will have perfect recall but poor precision, since every actual negative becomes a false positive. Conversely, a model that predicts no instance as positive never makes an incorrect positive prediction, so it is often described as having perfect precision (strictly speaking, precision is undefined when there are no positive predictions), but it has zero recall. The challenge for classification models is to strike a balance between precision and recall.
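These two degenerate models are easy to check with the helper functions sketched above; the counts used here are made up purely for illustration:

# A dataset with 10 actual positives and 90 actual negatives.

# Model A predicts everything as positive: every positive is found (TP = 10, FN = 0),
# but all 90 negatives are mislabelled (FP = 90).
print(recall(tp=10, fn=0))      # 1.0  -> perfect recall
print(precision(tp=10, fp=90))  # 0.1  -> poor precision

# Model B predicts everything as negative: no false positives, but no true positives either.
print(precision(tp=0, fp=0))    # 0.0 by our convention (mathematically 0/0, i.e., undefined)
print(recall(tp=0, fn=10))      # 0.0  -> zero recall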
The F1 Score
The F1 score is the harmonic mean of precision and recall. It is the most common member of the family of F-beta scores, in which a parameter called beta controls how much more weight is given to recall than to precision. The F1 score is calculated as:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
More generally, the F-beta score weights recall beta times as heavily as precision and is calculated as:
F-beta Score = (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)
In the case of beta = 1 (i.e., equal weight given to precision and recall), this reduces to the formula above, which is why the F1 score is also known as the balanced F-score. The F1 score ranges from 0 to 1, where 1 is the best possible score.
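A minimal Python sketch of this calculation, again with our own function names, might look as follows; f_beta implements the general formula and f1 is simply the special case beta = 1:

def f_beta(precision_value, recall_value, beta=1.0):
    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta > 1 favours recall, beta < 1 favours precision.
    numerator = (1 + beta ** 2) * precision_value * recall_value
    denominator = beta ** 2 * precision_value + recall_value
    return numerator / denominator if denominator > 0 else 0.0

def f1(precision_value, recall_value):
    # With beta = 1 this reduces to the harmonic mean 2 * P * R / (P + R).
    return f_beta(precision_value, recall_value, beta=1.0)

print(f1(0.5, 1.0))    # 0.666... -> the harmonic mean punishes the low precision
print(f1(0.75, 0.75))  # 0.75     -> equal precision and recall give that same value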
The F1 score is a useful metric for evaluating the performance of classification models, especially when the data is imbalanced (i.e., the number of instances of one class is much larger than that of the other). In such cases, accuracy may not be a good metric, as a model that predicts only the majority class will have high accuracy but perform poorly on the minority class. The F1 score takes into account both precision and recall, and thus provides a more balanced measure of performance.
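If you use scikit-learn, its built-in metrics make this easy to see. The snippet below is a small illustration with made-up labels: a model that always predicts the majority class scores high on accuracy but gets an F1 score of 0 on the minority class.

from sklearn.metrics import accuracy_score, f1_score

# A made-up imbalanced dataset: 95 negatives (label 0) and 5 positives (label 1).
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 -> looks excellent
print(f1_score(y_true, y_pred))        # 0.0  -> the minority class is never found
# (scikit-learn also emits an UndefinedMetricWarning because no positive predictions were made)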
Example
Let us consider an example to understand the F1 score better. Suppose we have a binary classification task, where we want to identify whether a person has a disease or not. We have a dataset of 100 instances, out of which 90 instances are negative (i.e., the person does not have the disease) and 10 instances are positive (i.e., the person has the disease). We build a classification model that predicts whether a person has the disease or not based on some features. We evaluate the performance of the model using the F1 score.
Suppose the model predicts 15 instances as positive, out of which 8 are true positives (i.e., the person actually has the disease) and 7 are false positives (i.e., the person does not have the disease but is predicted as positive). The remaining 85 instances are predicted as negative, out of which 83 are true negatives (i.e., the person does not have the disease and is predicted as negative) and 2 are false negatives (i.e., the person actually has the disease but is predicted as negative). We can represent this information in a confusion matrix, as shown below:
                   Actual Positive   Actual Negative
Predicted Positive        8                 7
Predicted Negative        2                83
Using the confusion matrix, we can calculate precision and recall as follows:
Precision = True Positives / (True Positives + False Positives) = 8 / (8 + 7) = 0.53
Recall = True Positives / (True Positives + False Negatives) = 8 / (8 + 2) = 0.80
Using the precision and recall values, we can calculate the F1 score as follows:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.53 * 0.80) / (0.53 + 0.80) = 0.64
Thus, the F1 score of our model is 0.64. By comparison, its plain accuracy is (8 + 83) / 100 = 0.91, which looks impressive only because negative instances dominate the dataset; the F1 score gives a more honest picture of how well the model identifies the positive class.
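For readers who prefer to check the arithmetic in code, the worked example can be reproduced with scikit-learn by rebuilding label lists that match the confusion matrix above (the lists themselves are, of course, synthetic):

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Rebuild labels that match the confusion matrix: 8 TP, 7 FP, 2 FN, 83 TN.
y_true = [1] * 8 + [0] * 7 + [1] * 2 + [0] * 83
y_pred = [1] * 8 + [1] * 7 + [0] * 2 + [0] * 83

print(precision_score(y_true, y_pred))  # 0.533...
print(recall_score(y_true, y_pred))     # 0.8
print(f1_score(y_true, y_pred))         # 0.64
print(accuracy_score(y_true, y_pred))   # 0.91 -> deceptively high on this imbalanced dataset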