RMSLE
May 20, 2023
Root Mean Squared Logarithmic Error (RMSLE) is a metric that is commonly used in machine learning and statistical analysis to measure the difference between predicted and actual values, especially when the target variable is skewed.
Definition
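RMSLE is the root mean squared error computed on log-transformed values: each predicted and actual value is mapped to log(1 + value), and the differences between the two are squared, averaged, and square-rooted. Because a difference of logarithms is the logarithm of a ratio, RMSLE measures relative rather than absolute error. This is useful when the target variable spans a large range of values, since a few very large values do not dominate the score.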
The formula for RMSLE is:
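RMSLE = sqrt(1/n * sum((log(1 + y_pred) - log(1 + y_true))^2))
where n is the number of samples, y_pred is the predicted value, y_true is the actual value, and log is the natural logarithm. Each difference is squared before summing. Adding 1 before taking the logarithm keeps the expression defined when a value is 0, so predicted and actual values are normally required to be non-negative.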
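As a minimal sketch, the formula can be written in Python using only the standard library. The function name rmsle and its argument names are chosen here for illustration and do not come from any particular library; math.log1p(x) computes log(1 + x) with the natural logarithm, and rejecting negative inputs is an assumption of this sketch.

```python
import math

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error (illustrative helper, natural log)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_diffs = []
    for actual, predicted in zip(y_true, y_pred):
        # Assumption of this sketch: only non-negative values are accepted.
        if actual < 0 or predicted < 0:
            raise ValueError("rmsle expects non-negative values")
        diff = math.log1p(predicted) - math.log1p(actual)
        squared_log_diffs.append(diff ** 2)  # square each difference first
    return math.sqrt(sum(squared_log_diffs) / len(squared_log_diffs))

# Made-up values, for illustration only.
print(rmsle([100, 2000, 30000], [110, 1800, 33000]))
```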
Example
Consider a regression problem where we are predicting the price of a house based on its features such as location, square footage, and number of bedrooms. Suppose that the actual price of a house is $1,000,000, but our model predicts $900,000. The logarithmic difference between the predicted and actual price is:
log(1 + 900000) - log(1 + 1000000) ≈ -0.1054
using the natural logarithm. If there are n predictions that all have this same logarithmic difference, RMSLE works out to:
RMSLE = sqrt(1/n * sum((-0.1054)^2)) = sqrt(1/n * n * 0.0111) ≈ 0.1054
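The arithmetic for this house-price example can be checked with a short Python sketch. It assumes the natural logarithm, and the value n = 5 is a hypothetical choice made only so the sum has something to run over; when every prediction has the same logarithmic difference, the result does not depend on n.

```python
import math

actual = 1_000_000
predicted = 900_000

# Logarithmic difference for a single prediction (natural log).
log_diff = math.log1p(predicted) - math.log1p(actual)
print(round(log_diff, 4))

# Hypothetical: n predictions that all share this same log difference.
n = 5
rmsle = math.sqrt(sum(log_diff ** 2 for _ in range(n)) / n)
print(round(rmsle, 4))
```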
Interpretation
RMSLE is a measure of accuracy of a model’s predictions. It is commonly used in Kaggle competitions for evaluating the performance of machine learning models. The lower the RMSLE score, the better the model’s performance.
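One frequently cited property of RMSLE is that it penalizes underestimation more severely than overestimation of the same absolute size. This is because the logarithm is concave: moving a fixed amount below the actual value changes log(1 + y) more than moving the same amount above it. For example, with an actual value of 1,000, a prediction of 900 gives a logarithmic difference of about -0.105, while a prediction of 1,100 gives about 0.095.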
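The asymmetry between under- and overestimation can be checked numerically. The sketch below uses a hypothetical helper, log_error, and made-up values: an actual value of 1,000 with one prediction 100 too low and one 100 too high.

```python
import math

def log_error(actual, predicted):
    """Signed logarithmic difference used inside RMSLE (illustrative helper)."""
    return math.log1p(predicted) - math.log1p(actual)

actual = 1000
under = log_error(actual, 900)   # prediction is 100 too low
over = log_error(actual, 1100)   # prediction is 100 too high

# Same absolute error, different magnitude of logarithmic error.
print(abs(under), abs(over))
print(abs(under) > abs(over))
```

The underestimate contributes the larger squared term to the RMSLE sum, even though both predictions miss by the same absolute amount.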
Variations
There are several variations of RMSLE that are used in different contexts. For example, in some cases the constant added before taking the logarithm is the mean of the actual values instead of 1.
A closely related metric is Root Mean Squared Error (RMSE), which is calculated as the square root of the mean squared difference between predicted and actual values. RMSLE is equivalent to RMSE applied to log(1 + y_pred) and log(1 + y_true). RMSE is commonly used when the target variable is not skewed.
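To compare the two metrics side by side, here is a minimal sketch in which RMSLE is computed by applying RMSE to log(1 + y). The function names and the sample values are illustrative assumptions; each made-up prediction is 20% above its actual value.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error (illustrative helper)."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def rmsle(y_true, y_pred):
    """RMSLE expressed as RMSE on log(1 + y), natural log."""
    return rmse([math.log1p(t) for t in y_true],
                [math.log1p(p) for p in y_pred])

# Made-up values spanning several orders of magnitude.
y_true = [10, 1_000, 100_000]
y_pred = [12, 1_200, 120_000]  # every prediction is 20% too high

print(rmse(y_true, y_pred))   # dominated by the largest value
print(rmsle(y_true, y_pred))  # reflects the similar relative errors
```

On data like this, RMSE is driven almost entirely by the largest target, while RMSLE weights the three similar relative errors about equally.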