April 28, 2023

In statistics, the mean absolute deviation (MAD) is the average distance between each data point and the mean of the data set. It is a useful measure of the variability of a data set, and in machine learning it serves as a reference point when evaluating the performance of models.

## Calculation

The MAD is calculated by taking the absolute difference between each data point and the mean of the data set, and then taking the average of those differences. Mathematically, the formula for the MAD is:



$$MAD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$

where:

• $$MAD$$ is the mean absolute deviation
• $$n$$ is the number of data points in the data set
• $$x_i$$ is the i-th data point
• $$\bar{x}$$ is the mean of the data set
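
The formula translates directly into a few lines of Python. The sketch below is one minimal way to write it; the function name `mean_absolute_deviation` is illustrative and not taken from any library.

```python
def mean_absolute_deviation(values):
    """Return the average absolute distance of each value from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("values must contain at least one data point")
    mean = sum(values) / n  # x-bar in the formula
    # Average of |x_i - x-bar| over all n data points.
    return sum(abs(x - mean) for x in values) / n
```

The function rejects an empty input because the mean, and therefore the MAD, is undefined when there are no data points.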

## Example

Let’s say we have the following data set:

 Data Point | Value
------------+-------
     1      |   2
     2      |   4
     3      |   6
     4      |   8
     5      |  10


To calculate the MAD, we first find the mean of the data set:

$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10}{5} = 6$$

Next, we find the absolute difference between each data point and the mean:

 Data Point | Value | Absolute Difference
------------+-------+---------------------
     1      |   2   |          4
     2      |   4   |          2
     3      |   6   |          0
     4      |   8   |          2
     5      |  10   |          4


Finally, we take the average of those differences:

$$MAD = \frac{4 + 2 + 0 + 2 + 4}{5} = 2.4$$

Therefore, the mean absolute deviation of this data set is 2.4.
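
The same three steps can be checked in Python. This is a small sketch using only the standard library; the variable names are chosen here for illustration.

```python
from statistics import fmean

data = [2, 4, 6, 8, 10]

mean = fmean(data)                         # step 1: mean of the data set
abs_diffs = [abs(x - mean) for x in data]  # step 2: absolute differences
mad = fmean(abs_diffs)                     # step 3: average of the differences

print(mean)       # 6.0
print(abs_diffs)  # [4.0, 2.0, 0.0, 2.0, 4.0]
print(mad)        # 2.4
```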

## Use in Machine Learning

In machine learning, the MAD is useful when evaluating the performance of regression models, where it acts as a baseline for the size of the errors. A regression model predicts a continuous output variable from one or more input variables.

To evaluate the performance of a regression model, we use a metric called the mean absolute error (MAE), which is simply the average of the absolute differences between the predicted output and the actual output. However, the MAE can be difficult to compare across data sets, because it is expressed in the units of the output variable and therefore depends on its scale.
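
As a rough sketch, the MAE described above can be computed as follows. The sample values are made up for illustration, and the function is written in plain Python (scikit-learn provides a function with the same name, but it is not used here).

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical values, in the same units as the output variable.
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
print(mean_absolute_error(actual, predicted))  # 0.5
```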

To make the MAE easier to compare, we can divide it by the MAD of the output variable. This ratio is usually called the relative absolute error (RAE). It should not be confused with the mean absolute percentage error (MAPE), which instead averages each absolute error divided by the corresponding actual value. Expressed as a percentage, the MAE/MAD ratio is independent of the units of the output variable, and a value of 100% corresponds to a model that always predicts the mean of the actual values.
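
The division described above can be sketched in Python. The helper names (`mae`, `mad`, `mae_to_mad_ratio`) and the sample values are hypothetical, and the result is multiplied by 100 so that it reads as a percentage.

```python
def mae(actual, predicted):
    """Mean absolute error between paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mad(values):
    """Mean absolute deviation of the values around their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def mae_to_mad_ratio(actual, predicted):
    """MAE divided by the MAD of the actual values, as a percentage."""
    spread = mad(actual)
    if spread == 0:
        raise ValueError("MAD is zero: all actual values are identical")
    return mae(actual, predicted) / spread * 100

# Hypothetical values for illustration.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(mae_to_mad_ratio(actual, predicted))  # 25.0
```

Because always predicting the mean of the actual values produces an MAE equal to the MAD, a result below 100 indicates that the model does better than that baseline.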

## Example

Let’s say we have a regression model that predicts the price of a house based on its size (in square feet). We have a data set of 10 houses, and we want to evaluate the performance of our model.

 House | Size (in sq. ft.) | Price (in thousands of dollars)
-------+-------------------+--------------------------------
   1   |       1000        |              200
   2   |       1500        |              300
   3   |       2000        |              400
   4   |       2500        |              500
   5   |       3000        |              600
   6   |       3500        |              700
   7   |       4000        |              800
   8   |       4500        |              900
   9   |       5000        |             1000
  10   |       5500        |             1100


We train our regression model on this data set, and use it to predict the price of each house based on its size. We then calculate the mean absolute error (MAE) between the predicted prices and the actual prices:

 House | Predicted Price | Actual Price | Absolute Error
-------+-----------------+--------------+----------------
   1   |       170       |     200      |       30
   2   |       255       |     300      |       45
   3   |       340       |     400      |       60
   4   |       425       |     500      |       75
   5   |       510       |     600      |       90
   6   |       595       |     700      |      105
   7   |       680       |     800      |      120
   8   |       765       |     900      |      135
   9   |       850       |    1000      |      150
  10   |       935       |    1100      |      165


The MAE is simply the average of the absolute errors:

$$MAE = \frac{30 + 45 + 60 + 75 + 90 + 105 + 120 + 135 + 150 + 165}{10} = 97.5$$

To make this metric more interpretable, we divide it by the MAD of the actual prices. We first calculate the MAD of the actual prices:

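$$MAD = \frac{1}{10} \sum_{i=1}^{10} |p_i - \bar{p}| = \frac{1}{10} \sum_{i=1}^{10} |p_i - 650| = 250$$

where:

• $$MAD$$ is the mean absolute deviation of the actual prices
• $$p_i$$ is the i-th actual price
• $$\bar{p}$$ is the mean of the actual prices, which is 650

The ten absolute deviations from 650 are 450, 350, 250, 150, 50, 50, 150, 250, 350 and 450, which sum to 2500.

We then calculate the ratio of the MAE to the MAD, known as the relative absolute error (RAE), as a percentage:

$$RAE = \frac{MAE}{MAD} \times 100 = \frac{97.5}{250} \times 100 = 39\%$$

Therefore, the relative absolute error of our regression model is 39%. This means that, on average, our predictions are off by 39% of the variability of the actual prices. This figure is not the same as the mean absolute percentage error (MAPE), which averages each error as a fraction of its actual price: here every prediction is 15% below the actual price, so the MAPE is 15%.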
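
The full calculation for the house data can be reproduced in Python. The sketch below uses the actual and predicted prices from the tables above; the variable names are illustrative. For comparison, it also computes the average of the per-house percentage errors, |actual - predicted| / actual, which is the usual textbook definition of MAPE.

```python
# Prices in thousands of dollars, taken from the tables above.
actual = [200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
predicted = [170, 255, 340, 425, 510, 595, 680, 765, 850, 935]
n = len(actual)

# Mean absolute error of the predictions.
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# Mean absolute deviation of the actual prices around their mean.
mean_price = sum(actual) / n
mad = sum(abs(a - mean_price) for a in actual) / n

# MAE relative to MAD, as a percentage.
ratio_pct = mae / mad * 100

# Average per-house percentage error, for comparison.
pct_error = sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n * 100

print(f"MAE        = {mae:.1f}")         # MAE        = 97.5
print(f"mean price = {mean_price:.1f}")  # mean price = 650.0
print(f"MAD        = {mad:.1f}")         # MAD        = 250.0
print(f"MAE / MAD  = {ratio_pct:.1f}%")  # MAE / MAD  = 39.0%
print(f"avg % err  = {pct_error:.1f}%")  # avg % err  = 15.0%
```

The two percentages answer different questions: the first compares the model's error with the spread of the prices, while the second compares each error with the price of the house it belongs to.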