Variance Explanation Ratio

May 20, 2023

The Variance Explanation Ratio (VER) is a statistical measure used in machine learning to evaluate the performance of regression models. It measures the proportion of variance in the dependent variable that can be explained by the independent variable(s). The VER helps us understand how well a model fits the data, that is, how much of the variation in the dependent variable the model accounts for through the independent variable(s).

Definition

The Variance Explanation Ratio (VER) is defined as the proportion of the total variance in the dependent variable that is explained by the independent variable(s). Mathematically, it can be expressed as:



$$VER = \frac{SS_{reg}}{SS_{tot}}$$

Where SS_{reg} is the sum of squares due to regression and SS_{tot} is the total sum of squares. The sum of squares due to regression is the amount of variation in the dependent variable that is explained by the independent variable(s), while the total sum of squares is the total amount of variation in the dependent variable around its mean, explained or not.
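Written out, the two sums of squares are usually defined as follows. The notation is an assumption introduced here for illustration: y_i are the observed values, \hat{y}_i the model's predictions, \bar{y} the mean of the observed values, and n the number of observations.

$$SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

For a least squares fit that includes an intercept, the total splits into an explained part and a residual part, which is what makes the ratio a proportion:

$$SS_{tot} = SS_{reg} + SS_{res}, \qquad SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$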

For a least squares linear model with an intercept, evaluated on the data it was fit to, the VER takes values between 0 and 1, with higher values indicating that the independent variable(s) explain a larger proportion of the variation in the dependent variable.
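The two ends of that range can be illustrated with a minimal sketch. The helper name `variance_explanation_ratio` and the toy values are assumptions made for this example, not part of any library.

```python
import numpy as np

def variance_explanation_ratio(y_true, y_pred):
    """Return SS_reg / SS_tot for observed values and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_mean = y_true.mean()
    ss_reg = np.sum((y_pred - y_mean) ** 2)  # variation captured by the predictions
    ss_tot = np.sum((y_true - y_mean) ** 2)  # total variation around the mean
    return ss_reg / ss_tot

# Toy observations, made up for illustration
y = np.array([2.0, 4.0, 6.0, 8.0])

# Predicting the mean everywhere explains none of the variation
ver_mean = variance_explanation_ratio(y, np.full_like(y, y.mean()))

# Reproducing the observations exactly explains all of it
ver_perfect = variance_explanation_ratio(y, y)

print(ver_mean, ver_perfect)
```

A model that only ever predicts the mean sits at the bottom of the range, and one that reproduces every observation sits at the top. Fitted regression models usually fall somewhere in between.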

Calculation

To calculate the VER, we first need to fit a regression model to our data. Let’s consider a simple example where we want to predict a person’s weight based on their height. We collect data on the heights and weights of 10 individuals, and fit a linear regression model to the data:

import numpy as np
from sklearn.linear_model import LinearRegression

# create example data
height = np.array([1.6, 1.7, 1.8, 1.7, 1.6, 1.9, 1.8, 1.7, 1.6, 1.8])
weight = np.array([60, 70, 80, 70, 65, 90, 80, 70, 65, 75])

# scikit-learn expects a 2-D feature matrix, so reshape to a single column
X = height.reshape(-1, 1)

# fit linear regression model (an intercept is included by default)
model = LinearRegression().fit(X, weight)

# calculate VER from the fitted values
y_pred = model.predict(X)
SS_reg = np.sum((y_pred - np.mean(weight)) ** 2)  # explained variation
SS_tot = np.sum((weight - np.mean(weight)) ** 2)  # total variation
VER = SS_reg / SS_tot
print(f"VER: {VER:.2f}")

Running this code will output the VER for our model:

VER: 0.94

This tells us that, in this sample, height explains about 94% of the variation in weight.

Interpretation

The VER provides a measure of how well our regression model fits the data. A high VER indicates that the model accounts for most of the variation in the dependent variable, while a low VER indicates that much of the variation is left unexplained, for example because important variables are missing from the model.

In the example above, our model has a VER of 0.94, which is high. This suggests that height is a strong predictor of weight in this sample, and that the model captures the relationship between the two variables well. Because the VER is computed on the same 10 observations used to fit the model, it describes the fit to this sample and says little about how well the model would predict new individuals.
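As a cross-check on the example, a model with a single predictor and an intercept has a VER equal to the squared Pearson correlation between the predictor and the target. The sketch below reuses the example data and relies only on NumPy. The variable names `r` and `ver_from_r` are introduced here for illustration.

```python
import numpy as np

height = np.array([1.6, 1.7, 1.8, 1.7, 1.6, 1.9, 1.8, 1.7, 1.6, 1.8])
weight = np.array([60, 70, 80, 70, 65, 90, 80, 70, 65, 75])

# Pearson correlation between the single predictor and the target
r = np.corrcoef(height, weight)[0, 1]

# With one predictor and an intercept, SS_reg / SS_tot equals r squared
ver_from_r = r ** 2
print(f"r: {r:.2f}, r^2: {ver_from_r:.2f}")
```

The squared correlation printed here should match the VER computed from the fitted regression model, which gives a quick way to sanity-check the calculation.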

Use in Machine Learning

The VER is a useful tool in evaluating the performance of machine learning models. In the context of regression models, it can help us determine how well our model fits the data and, by comparing models fit with different sets of independent variables, get a rough sense of which variable(s) contribute most to explaining the variation in the dependent variable.
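One way to explore which variables matter is to compare the VER of models fit with different predictors. The following is a minimal sketch under stated assumptions: the data are synthetic and made up for illustration, the helper `fit_ver` is hypothetical, and the fit uses NumPy's least squares solver with an explicit intercept column.

```python
import numpy as np

def fit_ver(X, y):
    """Fit least squares with an intercept; return SS_reg / SS_tot on the same data."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    A = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_pred = A @ coef
    y_mean = y.mean()
    return np.sum((y_pred - y_mean) ** 2) / np.sum((y - y_mean) ** 2)

# Synthetic data (illustrative only): y depends strongly on x1 and weakly on x2
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 3.0 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=200)

ver_x1 = fit_ver(x1, y)
ver_x2 = fit_ver(x2, y)
ver_both = fit_ver(np.column_stack([x1, x2]), y)
print(f"x1 only: {ver_x1:.2f}, x2 only: {ver_x2:.2f}, both: {ver_both:.2f}")
```

Comparisons like this are only a rough guide. When predictors are correlated with each other, the variation they explain overlaps, so single-predictor VERs do not add up to the VER of the combined model. Adding a predictor also never lowers the VER on the training data, even if the predictor is irrelevant.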

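However, it is important to note that the VER is just one of many measures that can be used to evaluate the performance of a machine learning model. For a least squares linear model with an intercept, the VER computed on the training data coincides with the coefficient of determination (R^2), so the two carry the same information in that setting. Other measures, such as error-based metrics or R^2 computed on held-out data, may provide additional insights into the performance of the model.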
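The relationship between the VER and R^2 can be checked numerically. This sketch assumes the common definition R^2 = 1 - SS_res / SS_tot, reuses the height and weight data, and fits the line with `np.polyfit`. The shifted predictions `y_shifted` are a deliberately poor, made-up set of predictions used only to show where the two quantities part ways.

```python
import numpy as np

height = np.array([1.6, 1.7, 1.8, 1.7, 1.6, 1.9, 1.8, 1.7, 1.6, 1.8])
weight = np.array([60, 70, 80, 70, 65, 90, 80, 70, 65, 75])

def ver_and_r2(y_true, y_pred):
    """Return (SS_reg / SS_tot, 1 - SS_res / SS_tot)."""
    y_mean = y_true.mean()
    ss_reg = np.sum((y_pred - y_mean) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_mean) ** 2)
    return ss_reg / ss_tot, 1 - ss_res / ss_tot

# Least squares line with an intercept (polyfit returns slope, then intercept)
slope, intercept = np.polyfit(height, weight, deg=1)
y_pred = slope * height + intercept

ver, r2 = ver_and_r2(weight, y_pred)
print(np.isclose(ver, r2))  # the two agree for the least squares fit

# Predictions that are not a least squares fit: every value shifted by 10
y_shifted = y_pred + 10
ver_bad, r2_bad = ver_and_r2(weight, y_shifted)
print(ver_bad > 1, r2_bad < 0)  # the two no longer agree
```

For the fitted line, both quantities are the same. For the shifted predictions, SS_reg / SS_tot rises above 1 while 1 - SS_res / SS_tot drops below 0. This is why the ratio is best read as a proportion only for least squares fits with an intercept, evaluated on the data they were fit to.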