Confusion Matrix

May 20, 2023

A confusion matrix, also known as an error matrix, is a table used to evaluate the performance of a classification model. It summarizes how well the model predicts outcomes by counting the correct and incorrect predictions it makes for each class.

Structure of a Confusion Matrix

A confusion matrix is divided into four quadrants, as shown below:

|                   | Actual True         | Actual False        |
|-------------------|---------------------|---------------------|
| Predicted True    | True Positive (TP)  | False Positive (FP) |
| Predicted False   | False Negative (FN) | True Negative (TN)  |

  • Actual True: the actual value of the target variable is True.
  • Actual False: the actual value of the target variable is False.
  • Predicted True: the value of the target variable predicted by the model is True.
  • Predicted False: the value of the target variable predicted by the model is False.
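
In practice, these four counts are usually computed from paired lists of actual and predicted labels. The sketch below is a minimal illustration using scikit-learn's `confusion_matrix` (assuming scikit-learn is available; the label lists are made up for this example). Note that scikit-learn's convention puts actual classes on the rows, so the result is transposed to match the layout of the table above.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels, for illustration only
y_actual    = [True, True, False, True, False, False, True, False]
y_predicted = [True, False, False, True, True, False, True, False]

# labels=[True, False] puts the True class first; transposing (.T) makes
# rows = predicted and columns = actual, as in the table above.
cm = confusion_matrix(y_actual, y_predicted, labels=[True, False]).T

tp, fp = cm[0]   # Predicted True  row: TP, FP
fn, tn = cm[1]   # Predicted False row: FN, TN
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
```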

Interpretation of Confusion Matrix

The four quadrants of the confusion matrix have the following meanings:

  • True Positive (TP): The model correctly predicted that the target variable is True.
  • False Positive (FP): The model incorrectly predicted that the target variable is True.
  • False Negative (FN): The model incorrectly predicted that the target variable is False.
  • True Negative (TN): The model correctly predicted that the target variable is False.
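
Each prediction falls into exactly one of these four quadrants. A plain-Python sketch of this counting logic (with hypothetical label lists) looks like:

```python
# Hypothetical actual and predicted labels, for illustration only
y_actual    = [True, True, False, True, False, False, True, False]
y_predicted = [True, False, False, True, True, False, True, False]

tp = fp = fn = tn = 0
for actual, predicted in zip(y_actual, y_predicted):
    if predicted and actual:          # predicted True, actually True
        tp += 1
    elif predicted and not actual:    # predicted True, actually False
        fp += 1
    elif not predicted and actual:    # predicted False, actually True
        fn += 1
    else:                             # predicted False, actually False
        tn += 1

print(tp, fp, fn, tn)
```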

The following metrics can be calculated from the confusion matrix:

Accuracy

Accuracy is the proportion of correct predictions out of all predictions made by the model.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

Sensitivity (Recall)

Sensitivity is the proportion of true positives to the total number of actual positives. It is also known as Recall.

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity

Specificity is the proportion of true negatives to the total number of actual negatives.

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Precision

Precision is the proportion of true positives to the total number of predicted positives.

$$\text{Precision} = \frac{TP}{TP + FP}$$

F1-Score

The F1-score is the harmonic mean of precision and recall.

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
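
Putting these formulas together, a minimal sketch in plain Python (function names chosen here for illustration) computes all five metrics from the four counts:

```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def sensitivity(tp, fn):
    # Also known as recall
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def precision(tp, fp):
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    return 2 * p * r / (p + r)
```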

Example

To understand the concept of a confusion matrix, let’s consider the following example:

Suppose a company wants to develop a model to predict whether a customer will buy a product or not. The company collects data on customer age, gender, income, and past purchase history. The target variable is whether the customer bought the product or not.

The company trains a logistic regression model on the data and obtains the following confusion matrix:

|                   | Actual True  | Actual False |
|-------------------|--------------|--------------|
| Predicted True    | 500          | 50           |
| Predicted False   | 100          | 350          |

From the confusion matrix, we can calculate the following metrics:

$$\text{Accuracy} = \frac{500 + 350}{500 + 50 + 100 + 350} = 0.85$$

$$\text{Sensitivity} = \frac{500}{500 + 100} = 0.83$$

$$\text{Specificity} = \frac{350}{350 + 50} = 0.88$$

$$\text{Precision} = \frac{500}{500 + 50} = 0.91$$

$$\text{F1-score} = \frac{2 \times 0.91 \times 0.83}{0.91 + 0.83} = 0.87$$

From the confusion matrix, we can see that the model correctly predicted that 500 customers would buy the product (TP) and that 350 customers would not (TN). It incorrectly predicted that 50 customers would buy the product when they did not (FP) and that 100 customers would not buy the product when they did (FN).
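
As a quick check of the arithmetic, the same figures can be reproduced directly from the example counts (a minimal, self-contained sketch in plain Python):

```python
# Counts taken from the example confusion matrix above
tp, fp, fn, tn = 500, 50, 100, 350

accuracy    = (tp + tn) / (tp + fp + fn + tn)   # 0.85
sensitivity = tp / (tp + fn)                    # ~0.83
specificity = tn / (tn + fp)                    # ~0.88
precision   = tp / (tp + fp)                    # ~0.91
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # ~0.87

print(accuracy, sensitivity, specificity, precision, f1)
```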