May 20, 2023

In machine learning, evaluating the performance of a classification model is crucial to determining whether it is able to accurately predict the outcomes of interest.

One commonly used performance metric is the area under the receiver operating characteristic curve (ROC-AUC), which measures the ability of a model to distinguish between positive and negative classes. However, in some scenarios, particularly when the positive class is rare, what matters most is how reliable the positive predictions are and how many of the actual positives the model finds, rather than how well it separates the two classes overall. In these cases, the area under the precision-recall curve (PR-AUC) may be a more appropriate metric.

What is PR-AUC?

PR-AUC stands for Precision-Recall Area Under Curve. It is a performance metric used to evaluate the quality of binary classification models, particularly when the focus is on positive class prediction accuracy. This metric is calculated by plotting the precision (ratio of true positive predictions to all positive predictions) against the recall (ratio of true positive predictions to all actual positive instances) at various classification thresholds, and calculating the area under the resulting curve.
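To make the two definitions concrete, here is a minimal sketch that computes precision and recall at a single threshold. The labels, scores, and the 0.65 cut-off are hypothetical values chosen for illustration; a PR curve simply repeats this calculation for many thresholds.

```python
# Hypothetical labels and scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
threshold = 0.65  # hypothetical cut-off

# Predict positive when the score is at or above the threshold.
y_pred = [1 if p >= threshold else 0 for p in y_prob]

# Count true positives, false positives, and false negatives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

At this threshold the model flags three examples, two of which are positive (precision 2/3), and it finds two of the four actual positives (recall 0.5).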

[Figure: example precision-recall curve, PR-AUC: 0.85; both axes run from 0.00 to 1.00]

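The precision-recall curve shows how precision and recall trade off as the classification threshold changes. A model with perfect precision and recall would have a PR-AUC of 1.0. Unlike ROC-AUC, whose random baseline is always 0.5, a model with random predictions has a PR-AUC roughly equal to the fraction of positive examples in the data: about 0.1 when 10% of the samples are positive, for instance.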
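The random baseline can be checked with a small simulation. This sketch assumes a hypothetical dataset with a 10% positive rate and scores drawn independently of the labels. It summarizes the PR curve with a step-wise sum (average precision: the mean precision at the rank of each positive) rather than the trapezoidal rule; `average_precision` is a hypothetical helper, not a library function.

```python
import random


def average_precision(y_true, y_score):
    """Step-wise PR-AUC: mean precision at the rank of each positive.
    Assumes no positive shares its score with another example."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / sum(y_true)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
prevalence = 0.1  # hypothetical: 10% of samples are positive
y_true = [1 if rng.random() < prevalence else 0 for _ in range(n)]
y_score = [rng.random() for _ in range(n)]  # scores carry no information

ap = average_precision(y_true, y_score)
observed_prevalence = sum(y_true) / n
print(f"PR-AUC of random scores: {ap:.3f} (positive rate: {observed_prevalence:.3f})")
```

With uninformative scores, precision hovers around the positive rate at every threshold, so the area ends up close to that rate rather than 0.5.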

Why Use PR-AUC?

While ROC-AUC is a widely used performance metric for binary classification models, it may not be the best choice when performance on the positive class is what matters. ROC-AUC is built from the true positive rate and the false positive rate. The false positive rate divides by the number of actual negatives, so when negatives vastly outnumber positives, a model can produce many more false positives than true positives while its false positive rate stays small. As a result, ROC-AUC can look optimistic on imbalanced data.
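A quick arithmetic sketch shows the effect. The counts below are hypothetical: a test set with 100 positives and 10,000 negatives, and a threshold at which the model finds 90 positives while wrongly flagging 200 negatives.

```python
# Hypothetical imbalanced test set: 100 positives, 10,000 negatives.
n_pos, n_neg = 100, 10_000

# Hypothetical outcome at one threshold.
tp, fp = 90, 200

tpr = tp / n_pos            # true positive rate (recall)
fpr = fp / n_neg            # false positive rate: small because n_neg is large
precision = tp / (tp + fp)  # most flagged examples are actually negative

print(f"TPR={tpr:.2f}, FPR={fpr:.2f}, precision={precision:.2f}")
```

A 2% false positive rate looks excellent on an ROC curve, yet fewer than a third of the positive predictions (90 of 290) are correct, which is exactly what precision exposes.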

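PR-AUC, on the other hand, is built from precision and recall, neither of which uses true negatives. Precision penalizes false positives relative to the true positives found, and recall penalizes missed positives, so the metric tracks performance on the positive class directly. This is particularly useful when positives are rare and are the focus, as in screening patients for a medical condition. The flip side is that PR-AUC depends on class prevalence, so its values are best compared across models evaluated on the same dataset rather than across datasets with different positive rates.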
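Because PR-AUC ignores true negatives, piling on easy negatives does not change it, while ROC-AUC keeps rising. The sketch below demonstrates this on hypothetical scores: it computes ROC-AUC as the fraction of correctly ordered (positive, negative) pairs and PR-AUC as step-wise average precision, then appends 1,000 negatives scored below every positive. Both helper functions are hypothetical names written for this illustration.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise PR-AUC: mean precision at the rank of each positive.
    Assumes no positive shares its score with another example."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank
    return total / sum(y_true)


# Hypothetical scores for a small, balanced test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Append 1,000 "easy" negatives that the model scores below every positive.
y_true_big = y_true + [0] * 1000
y_score_big = y_score + [0.0] * 1000

roc_small, roc_big = roc_auc(y_true, y_score), roc_auc(y_true_big, y_score_big)
ap_small, ap_big = average_precision(y_true, y_score), average_precision(y_true_big, y_score_big)
print(f"original:  ROC-AUC={roc_small:.3f}, PR-AUC={ap_small:.3f}")
print(f"+1000 neg: ROC-AUC={roc_big:.3f}, PR-AUC={ap_big:.3f}")
```

The extra negatives never outrank a positive, so every precision value along the curve is unchanged, while most of the new (positive, negative) pairs count as correct orderings and push ROC-AUC toward 1.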

How to Calculate PR-AUC

Given a set of predicted binary class probabilities and their corresponding true labels, PR-AUC can be calculated using the following steps:

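  1. Rank the predicted probabilities from highest to lowest.
  2. Treat each distinct probability as a threshold: predict positive for every example scored at or above it, then calculate precision and recall against the true labels.
  3. Plot the precision-recall pairs as a curve, with recall on the x-axis and precision on the y-axis.
  4. Calculate the area under the curve numerically, for example with the trapezoidal rule or a step-wise sum (average precision).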
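The four steps above can be sketched from scratch in plain Python. This is a minimal illustration, not scikit-learn's implementation: `pr_curve` and `trapezoid_area` are hypothetical helper names, step 3 (plotting) is left out, and step 4 uses the trapezoidal rule, the same rule that sklearn's `auc` applies. The data is the same set of labels and probabilities used in the Example Usage section.

```python
def pr_curve(y_true, y_score):
    """Steps 1-2: rank scores from highest to lowest, then compute (recall, precision)
    using each distinct score as a threshold (predict positive if score >= threshold).
    Assumes y_true contains at least one positive (label 1)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    points = [(0.0, 1.0)]  # conventional starting point: recall 0, precision 1
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every example sharing this score has been counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def trapezoid_area(points):
    """Step 4: trapezoidal rule over (recall, precision) points in order of increasing recall."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


y_true = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
y_prob = [0.2, 0.6, 0.8, 0.3, 0.7, 0.4, 0.9, 0.1, 0.5, 0.75]

points = pr_curve(y_true, y_prob)  # step 3 would plot these (recall, precision) pairs
pr_auc = trapezoid_area(points)
print(f"PR-AUC: {pr_auc:.3f}")
```

Because every positive in this data is scored above every negative, precision stays at 1.0 until recall reaches 1.0, and the sketch reports a PR-AUC of 1.000. Points that share the same recall add zero width, so the lower-precision points after full recall do not affect the area.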

Example Usage

To illustrate the usage of PR-AUC, consider a binary classification task where the goal is to predict whether a patient has a certain medical condition based on various clinical features. Suppose we have developed a logistic regression model that outputs the probability of the positive class (having the medical condition) given a set of features. We can evaluate the performance of this model using PR-AUC as follows:

from sklearn.metrics import precision_recall_curve, auc

# Example true labels (1 = has the condition) and predicted probabilities
y_true = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
y_prob = [0.2, 0.6, 0.8, 0.3, 0.7, 0.4, 0.9, 0.1, 0.5, 0.75]

# Compute precision-recall pairs at each threshold, then the area under the curve
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
pr_auc = auc(recall, precision)
print(f"PR-AUC: {pr_auc:.3f}")

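In this example, the precision_recall_curve function from sklearn.metrics computes precision and recall at each distinct probability threshold, and the auc function integrates the resulting curve with the trapezoidal rule. A PR-AUC close to 1 means the model keeps precision high as recall increases; the baseline to beat is the positive rate, here 6 of 10 labels, or 0.6. In this toy data every positive example is scored above every negative one, so the PR-AUC is 1.0. scikit-learn also provides average_precision_score(y_true, y_prob), which summarizes the same curve as a step-wise weighted mean of precisions instead of interpolating linearly between points.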