Central Absolute Deviation
May 20, 2023
Central absolute deviation (CAD) is a measure of the dispersion, or variability, of a set of data. It is a robust alternative to the standard deviation that is less sensitive to outliers and extreme values in the data. Central absolute deviation is also known as the median absolute deviation (MAD), because it is built from the median of the data.
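The central absolute deviation of a set of \(n\) data points \(x_1, x_2, \ldots, x_n\) is defined as follows:
$$\mathrm{CAD} = \operatorname{median}_{i}\bigl(\lvert x_i - \operatorname{median}(x) \rvert\bigr)$$
where \(\lvert \cdot \rvert\) denotes absolute value and \(\operatorname{median}(x)\) is the median of the data.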
The central absolute deviation measures the typical distance between each data point and the median value of the data. It is a useful measure of dispersion when the data contains outliers or extreme values that would significantly affect the standard deviation. The CAD is also useful when the data is not normally distributed, as it does not assume a particular shape of the distribution.
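The definition translates almost directly into code. Below is a minimal Python sketch that uses only the standard library's `statistics.median`. The function name `central_absolute_deviation` is our own choice for illustration, and the input is assumed to be a non-empty list of numbers.

```python
from statistics import median


def central_absolute_deviation(data):
    """Return median(|x_i - median(x)|) for a non-empty list of numbers."""
    if not data:
        raise ValueError("data must be non-empty")
    center = median(data)                         # median of the raw data
    deviations = [abs(x - center) for x in data]  # distance of each point from it
    return median(deviations)                     # median of those distances


print(central_absolute_deviation([1, 2, 3, 4, 100]))  # → 1
```

For array data, SciPy offers `scipy.stats.median_abs_deviation`. With its default `scale=1.0` it should compute the same quantity, but check the documentation for your installed version.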
Suppose we have the following dataset of 7 numbers:
2, 3, 4, 5, 6, 10, 20
The median value of this dataset is 5, and the absolute deviations from the median are:
|2 - 5| = 3, |3 - 5| = 2, |4 - 5| = 1, |5 - 5| = 0, |6 - 5| = 1, |10 - 5| = 5, |20 - 5| = 15
Sorted, these absolute deviations are 0, 1, 1, 2, 3, 5, 15. Their median is the fourth of the seven values, 2, so the central absolute deviation of the dataset is 2.
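Sorting the absolute deviations makes the middle value easy to read off. The following Python snippet reproduces the calculation step by step.

```python
from statistics import median

data = [2, 3, 4, 5, 6, 10, 20]
center = median(data)  # 5
deviations = sorted(abs(x - center) for x in data)

print(deviations)          # [0, 1, 1, 2, 3, 5, 15]
print(median(deviations))  # 2
```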
Advantages and Disadvantages
The central absolute deviation has several advantages over the standard deviation:
- It is a more robust measure of dispersion that is less sensitive to outliers or extreme values in the data.
- It does not require assumptions about the shape of the distribution of the data.
- It is relatively easy to calculate and interpret.
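The robustness claim in the first point is easy to see in a quick experiment. This Python sketch computes both measures on the example dataset before and after turning its largest value into an extreme outlier. The helper name and the replacement value 2000 are illustrative choices.

```python
from statistics import median, stdev


def central_absolute_deviation(data):
    """Median of the absolute distances from the median (non-empty list)."""
    center = median(data)
    return median([abs(x - center) for x in data])


clean = [2, 3, 4, 5, 6, 10, 20]
contaminated = [2, 3, 4, 5, 6, 10, 2000]  # the value 20 replaced by an extreme outlier

for name, data in [("clean", clean), ("contaminated", contaminated)]:
    cad = central_absolute_deviation(data)
    print(f"{name:>12}: CAD = {cad}, SD = {stdev(data):.2f}")
```

The CAD stays at 2 in both cases. Only one of the seven absolute deviations changes, and it was already the largest, so the median is unaffected. The standard deviation, by contrast, grows by a factor of more than 100.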
However, the central absolute deviation also has some disadvantages:
- It is less commonly used than the standard deviation, so it may be less familiar to some users.
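- It can be slower to compute than the standard deviation on large datasets, because it requires finding two medians (typically by sorting, which takes O(n log n) time), whereas the standard deviation needs only running sums.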
Comparison with Standard Deviation
The standard deviation is another measure of dispersion that is commonly used in statistics and machine learning. It is defined as the square root of the variance. For a population, the variance is the average squared deviation from the mean. For a sample, the sum of squared deviations is divided by n - 1 instead of n.
The formula for the sample standard deviation is:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where \(\bar{x}\) is the sample mean of the data and \(n\) is the sample size.
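As a check on the formula, here is a direct Python translation. It is compared against the standard library's `statistics.stdev`, which also uses the n - 1 denominator. The helper name `sample_std` is our own.

```python
import math
from statistics import mean, stdev


def sample_std(data):
    """Sample standard deviation with the n - 1 (Bessel) denominator."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = mean(data)                               # sample mean
    squared = sum((x - x_bar) ** 2 for x in data)    # sum of squared deviations
    return math.sqrt(squared / (n - 1))


data = [2, 3, 4, 5, 6, 10, 20]
print(sample_std(data))  # ≈ 6.23
print(stdev(data))       # same value from the standard library
```

On this dataset the standard deviation (about 6.23) is roughly three times the CAD of 2. Part of the gap comes from the single value 20, which contributes most of the sum of squared deviations.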
The standard deviation has some advantages over the central absolute deviation:
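- When the data are normally distributed, it is a more efficient estimator of the population standard deviation: for the same sample size, its estimates vary less from sample to sample than those based on the CAD.
- It has well-established statistical properties. For example, for normally distributed data with population standard deviation \(\sigma\), the quantity \((n-1)s^2/\sigma^2\) follows a chi-squared distribution with \(n - 1\) degrees of freedom, which underpins standard confidence intervals and hypothesis tests.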
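The efficiency claim can be checked with a small Monte Carlo simulation. The sketch below draws repeated normal samples and compares how much the sample standard deviation and a rescaled CAD vary around the true value σ = 1. The raw CAD of normal data is about 0.6745σ, so it is multiplied by the commonly used constant 1.4826 (≈ 1/0.6745) so that both estimators target σ. This rescaling, the sample size, the number of trials, and the seed are assumptions made for illustration.

```python
import random
from statistics import mean, median, pstdev, stdev

# Assumed constant: 1 / Phi^{-1}(0.75) ≈ 1.4826, which rescales the CAD so that
# it estimates sigma when the data are normally distributed.
NORMAL_CONSISTENCY = 1.4826


def central_absolute_deviation(data):
    """Median of the absolute distances from the median (non-empty list)."""
    center = median(data)
    return median([abs(x - center) for x in data])


rng = random.Random(0)  # fixed seed so the run is reproducible
n, trials = 100, 1000   # illustrative sample size and number of repetitions
sd_estimates, cad_estimates = [], []

for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true sigma is 1
    sd_estimates.append(stdev(sample))
    cad_estimates.append(NORMAL_CONSISTENCY * central_absolute_deviation(sample))

# Both estimators should average close to 1; the spread shows how much each
# estimate varies from sample to sample.
print(f"SD : mean {mean(sd_estimates):.3f}, spread {pstdev(sd_estimates):.3f}")
print(f"CAD: mean {mean(cad_estimates):.3f}, spread {pstdev(cad_estimates):.3f}")
```

With these settings the CAD-based estimates should show a noticeably larger spread. This is consistent with the standard deviation being the more efficient estimator for normal data. With heavy-tailed or contaminated samples, the comparison can reverse.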
However, the standard deviation also has some disadvantages:
- It is sensitive to outliers or extreme values in the data, which can significantly affect its value.
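- Its common interpretations assume the data are approximately normal, which may not be the case in practice. One example is the rule that about 68% of values lie within one standard deviation of the mean.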
- It can be difficult to interpret, especially for non-normal distributions.