PR-AUC
May 20, 2023
In machine learning, evaluating the performance of a classification model is crucial to determining whether it can accurately predict the outcomes of interest.
One commonly used performance metric is the area under the receiver operating characteristic curve (ROC-AUC), which measures the ability of a model to distinguish between positive and negative classes. However, in some scenarios, the focus may be on the precision of the positive class predictions rather than the ability to distinguish between classes. In these cases, the area under the precision-recall curve (PR-AUC) may be a more appropriate metric.
What is PR-AUC?
PR-AUC stands for Precision-Recall Area Under the Curve. It is a performance metric used to evaluate the quality of binary classification models, particularly when the focus is on positive class prediction accuracy. This metric is calculated by plotting the precision (ratio of true positive predictions to all positive predictions) against the recall (ratio of true positive predictions to all actual positive instances) at various classification thresholds, and calculating the area under the resulting curve.
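As a concrete illustration of the two ratios at a single threshold (with made-up labels and predictions), precision and recall can be computed directly from the true positive, false positive, and false negative counts:

```python
# Toy illustration (made-up numbers): precision and recall at one fixed threshold
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]  # predicted labels after thresholding probabilities

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # 3 / 4 = 0.75: of 4 positive predictions, 3 were correct
recall = tp / (tp + fn)     # 3 / 4 = 0.75: of 4 actual positives, 3 were found
```

Sweeping the threshold from high to low trades precision for recall, which is exactly what the precision-recall curve traces out.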
[Figure: a precision-recall curve, with precision on the y-axis and recall on the x-axis, both ranging from 0.00 to 1.00; the area under the curve (PR-AUC) is 0.85.]
The precision-recall curve shows the trade-off between precision and recall at various classification thresholds. A model with perfect precision and recall would have a PR-AUC of 1.0, while a model that predicts at random would have a PR-AUC approximately equal to the fraction of positive instances in the dataset (not 0.5, as is the case for ROC-AUC).
Why Use PR-AUC?
While ROC-AUC is a widely used performance metric for binary classification models, it may not always be appropriate in scenarios where the focus is on positive class prediction accuracy. This is because ROC-AUC is based on the true positive rate and false positive rate, which can be influenced by imbalanced class distributions or varying cost ratios of false positive and false negative predictions.
PR-AUC, on the other hand, focuses on the precision and recall of the positive class predictions, which can be particularly useful in scenarios where the cost of false positives is high (e.g. in medical diagnoses). In addition, because ROC-AUC can look deceptively high when negatives vastly outnumber positives, PR-AUC is often a more informative measure of classification performance on imbalanced datasets.
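The gap between the two metrics on imbalanced data can be seen with a small hypothetical example (all numbers below are made up for illustration): 99 negatives and a single positive, where one negative happens to be scored above the positive. ROC-AUC barely notices the mistake, while PR-AUC (here summarized with scikit-learn's average_precision_score) drops sharply:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical imbalanced dataset: 99 negatives, 1 positive
y_true = np.array([0] * 99 + [1])

# Negatives are spread over [0, 0.8]; the positive scores 0.9,
# but one negative is (wrongly) scored 0.95, above the positive
y_score = np.concatenate([np.linspace(0.0, 0.8, 99), [0.9]])
y_score[98] = 0.95  # a single false positive outranking the true positive

roc_auc = roc_auc_score(y_true, y_score)           # ~0.99: looks near-perfect
pr_auc = average_precision_score(y_true, y_score)  # 0.5: half the top predictions are wrong
```

A single mis-ranked negative costs almost nothing in ROC-AUC (98 of 99 negatives are still ranked below the positive) but halves PR-AUC, because precision at full recall is only 0.5.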
How to Calculate PR-AUC
Given a set of predicted binary class probabilities and their corresponding true labels, PR-AUC can be calculated using the following steps:
1. Rank the predicted probabilities from highest to lowest.
2. For each probability threshold, calculate the precision and recall by comparing the predicted labels to the true labels.
3. Plot the precision-recall pairs as a curve.
4. Calculate the area under the curve using numerical integration methods.
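The four steps above can be sketched from scratch in a few lines of NumPy. This minimal version uses the step-wise integration that scikit-learn's average_precision_score also uses, and assumes untied probabilities and at least one positive label:

```python
import numpy as np

def pr_auc(y_true, y_prob):
    """PR-AUC from scratch: rank, compute precision/recall per threshold, integrate."""
    order = np.argsort(y_prob)[::-1]           # step 1: rank probabilities high to low
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                          # true positives accumulated at each cut-off
    precision = tp / np.arange(1, len(y) + 1)  # step 2: precision at each threshold
    recall = tp / tp[-1]                       # step 2: recall at each threshold
    # steps 3-4: area under the stepwise curve, sum of (change in recall) * precision
    area, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        area += (r - prev_recall) * p
        prev_recall = r
    return area
```

With tied probabilities the per-item loop can diverge slightly from library implementations, which collapse ties into a single threshold; for real use, scikit-learn's functions handle those edge cases.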
Example Usage
To illustrate the usage of PR-AUC, consider a binary classification task where the goal is to predict whether a patient has a certain medical condition based on various clinical features. Suppose we have developed a logistic regression model that outputs the probability of the positive class (having the medical condition) given a set of features. We can evaluate the performance of this model using PR-AUC as follows:
from sklearn.metrics import precision_recall_curve, auc
# Generate example predicted probabilities and true labels
y_true = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
y_prob = [0.2, 0.6, 0.8, 0.3, 0.7, 0.4, 0.9, 0.1, 0.5, 0.75]
# Calculate precision-recall pairs and PR-AUC
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
pr_auc = auc(recall, precision)
In this example, the precision_recall_curve function from sklearn.metrics is used to calculate the precision and recall pairs at various probability thresholds, and the auc function is used to calculate the area under the resulting curve. The resulting PR-AUC value indicates the performance of the model in terms of the precision-recall trade-off.
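scikit-learn also provides average_precision_score, which summarizes the same curve in a single call (shown here on the same example data). It uses a step-wise sum rather than the trapezoidal interpolation of auc(recall, precision), so the two values can differ slightly on real data:

```python
from sklearn.metrics import average_precision_score

# Same example predicted probabilities and true labels as above
y_true = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
y_prob = [0.2, 0.6, 0.8, 0.3, 0.7, 0.4, 0.9, 0.1, 0.5, 0.75]

ap = average_precision_score(y_true, y_prob)
```

On this particular toy data the classes happen to be perfectly separated by the scores, so both summaries equal 1.0.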