RMSLE

May 20, 2023

Root Mean Squared Logarithmic Error (RMSLE) is a metric that is commonly used in machine learning and statistical analysis to measure the difference between predicted and actual values, especially when the target variable is skewed.

Definition

RMSLE is the square root of the mean squared difference between the logarithms of the predicted and actual values. This is useful when the target variable spans a large range: the logarithm compresses large values, so errors are measured in relative rather than absolute terms and a few very large targets do not dominate the score.

The formula for RMSLE is:

RMSLE = sqrt(1/n * sum((log(1+y_pred) - log(1+y_true))^2))

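where n is the number of samples, y_pred is the predicted value and y_true is the actual value for each sample, the sum runs over all samples, and log is the natural logarithm. Adding 1 keeps the logarithm defined when a value is 0.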
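
The formula can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name rmsle and the rejection of negative inputs are assumptions made for this sketch, not part of the definition.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    squared_log_errors = []
    for actual, predicted in zip(y_true, y_pred):
        # Assumption for this sketch: values are non-negative, so log(1 + x) is defined.
        if actual < 0 or predicted < 0:
            raise ValueError("values must be non-negative")
        # math.log1p(x) computes log(1 + x) using the natural logarithm.
        log_diff = math.log1p(predicted) - math.log1p(actual)
        squared_log_errors.append(log_diff ** 2)

    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

A call such as rmsle([3, 5, 2.5], [2.5, 5, 4]) returns a single non-negative number, and a perfect prediction gives 0.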

Example

Consider a regression problem where we are predicting the price of a house based on its features such as location, square footage, and number of bedrooms. Suppose that the actual price of a house is $1,000,000, but our model predicts $900,000. The logarithmic difference between the predicted and actual price is:

log(1+900000) - log(1+1000000) = -0.1054 (using the natural logarithm)

If all n predictions had this same logarithmic difference, the RMSLE would be:

RMSLE = sqrt(1/n * sum((-0.1054)^2)) = 0.1054
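
The house-price arithmetic can be checked with a few lines of Python. This sketch assumes the natural logarithm and a hypothetical n of 5 identical predictions; the variable names are chosen here for illustration only.

```python
import math

actual_price = 1_000_000
predicted_price = 900_000

# Logarithmic difference for a single prediction (natural logarithm).
log_diff = math.log(1 + predicted_price) - math.log(1 + actual_price)

# Hypothetical case: n predictions that all have the same logarithmic difference.
n = 5
rmsle_value = math.sqrt(sum(log_diff ** 2 for _ in range(n)) / n)

print(round(log_diff, 4))     # -0.1054
print(round(rmsle_value, 4))  # 0.1054
```

Because every term in the sum is identical here, the RMSLE equals the absolute value of the single logarithmic difference, whatever n is.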

Interpretation

RMSLE is a measure of accuracy of a model’s predictions. It is commonly used in Kaggle competitions for evaluating the performance of machine learning models. The lower the RMSLE score, the better the model’s performance.

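One of the advantages of RMSLE is that it penalizes underestimation more severely than overestimation of the same absolute size. This is because the logarithm is concave: for a given actual value, a prediction that falls short by some amount changes the logarithm more than a prediction that exceeds it by the same amount. This asymmetry is helpful when under-predicting is the more costly mistake.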
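
The asymmetry can be seen in a small numeric sketch. The actual value of 1000 and the miss of 400 in each direction are made-up numbers for illustration, and log_error is a hypothetical helper name.

```python
import math


def log_error(actual, predicted):
    """Logarithmic difference used inside RMSLE (natural logarithm)."""
    return math.log1p(predicted) - math.log1p(actual)


actual = 1000
under = log_error(actual, 600)    # prediction is 400 too low
over = log_error(actual, 1400)    # prediction is 400 too high

print(round(under, 2))  # -0.51
print(round(over, 2))   # 0.34
```

Both predictions miss by the same absolute amount, but the under-prediction has the larger logarithmic difference in magnitude, so it contributes more to the RMSLE.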

Variations

There are several variations of RMSLE that are used in different contexts. For example, in some cases, the constant added to each value before taking the logarithm is the mean of the actual values instead of 1.

A closely related metric is Root Mean Squared Error (RMSE), which is calculated as the square root of the mean squared difference between predicted and actual values, without taking logarithms. RMSE is commonly used when the target variable is not skewed.
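
To illustrate how the two metrics differ, the sketch below applies both to the same absolute error of 10 on a small target and on a large one. The function names and the sample values are assumptions made for this illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error on the raw values."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error (natural logarithm, non-negative inputs assumed)."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Same absolute error of 10, on a small target and on a large target.
small_true, small_pred = [10.0], [20.0]
large_true, large_pred = [1_000_000.0], [1_000_010.0]

print(rmse(small_true, small_pred), rmse(large_true, large_pred))
print(rmsle(small_true, small_pred), rmsle(large_true, large_pred))
```

RMSE is 10 in both cases, while RMSLE is far larger for the small target, where an error of 10 means the prediction is double the actual value.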