10  Measures of Dispersion

Measures of dispersion are statistical tools that describe the spread, or variability, of a data set. Measures of central tendency (mean, median, mode) summarize the data with a single value at its center; measures of dispersion instead describe how much the data vary, that is, how “spread out” the data points are. Knowing the variability shows how well a central measure represents the data: a mean of 50 describes {49, 50, 51} much better than it describes {0, 50, 100}. The primary measures of dispersion are the Range, the Interquartile Range (IQR), the Variance, the Standard Deviation, and the Mean Absolute Deviation (MAD).

10.1 Range

The range is the simplest measure of dispersion and is calculated as the difference between the maximum and minimum values in the data set.

Example: For the data set {1, 2, 4, 7, 9}, the range is \(9 - 1 = 8\).
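In R, the range can be computed directly from `max()` and `min()`, as listed in the chapter summary. The sketch below reproduces the example; note that base R's `range()` returns the minimum and maximum as a pair rather than their difference, so `diff()` is needed to turn it into a single number.

```r
# Example data set from the text
x <- c(1, 2, 4, 7, 9)

# Range as a single number: largest value minus smallest value
range_value <- max(x) - min(x)
print(range_value)

# Base R's range() returns the pair c(min, max), not their difference;
# diff() converts that pair into the single-number range
print(range(x))
print(diff(range(x)))
```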

10.1.1 Interquartile Range (IQR)

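The IQR measures the spread of the middle 50% of the data. It is the difference between the 75th percentile (Q3) and the 25th percentile (Q1), so, unlike the range, it does not depend on how extreme the smallest and largest values are.

Example: For the data set {1, 2, 4, 7, 9}, Q1 is 2 and Q3 is 7 when quartiles are computed by linear interpolation between the sorted values (the default in R), so the IQR is \(7 - 2 = 5\). Other quartile conventions can give slightly different values for small data sets.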
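A minimal sketch of the IQR in R. It assumes R's default quantile definition (`type = 7`, linear interpolation), under which the quartiles match the example; the `type = 6` call is included only to show that a different, equally standard definition gives different quartiles for such a small data set.

```r
x <- c(1, 2, 4, 7, 9)

# quantile() uses type = 7 (linear interpolation) unless told otherwise
q <- quantile(x, probs = c(0.25, 0.75))
print(q)  # Q1 and Q3, labelled "25%" and "75%"

# IQR() uses the same default quantile type, so it equals Q3 - Q1
print(IQR(x))
print(unname(q[2] - q[1]))

# A different quantile definition (type = 6) yields different quartiles
# for this small data set, and therefore a different IQR
q6 <- quantile(x, probs = c(0.25, 0.75), type = 6)
print(q6)
print(IQR(x, type = 6))
```

When reporting an IQR, it can be worth stating which quartile definition was used, since software packages do not all share R's default.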

10.2 Variance

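Variance measures the average of the squared differences from the mean, giving a sense of how far the data points deviate from the mean. The population and sample formulas differ only in the denominator: the population variance divides by the number of values \(N\), whereas the sample variance divides by \(n - 1\) (Bessel's correction), which makes \(s^2\) an unbiased estimator of the population variance.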

  • Population Variance (\(\sigma^2\)): \(\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}\)
  • Sample Variance (\(s^2\)): \(s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\)
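The two formulas can be written as small R functions to make the difference in denominators explicit. The function names `population_variance` and `sample_variance` are hypothetical helpers for illustration; in practice base R's `var()` already computes the sample version.

```r
# Hypothetical helper: population variance, dividing by N
population_variance <- function(x) {
  mu <- mean(x)                       # population mean
  sum((x - mu)^2) / length(x)         # divide by N
}

# Hypothetical helper: sample variance, dividing by n - 1
sample_variance <- function(x) {
  x_bar <- mean(x)                    # sample mean
  sum((x - x_bar)^2) / (length(x) - 1)  # divide by n - 1
}

x <- c(1, 2, 4, 7, 9)
print(population_variance(x))
print(sample_variance(x))

# Base R's var() uses the n - 1 denominator, i.e. the sample variance
print(all.equal(sample_variance(x), var(x)))

# The two versions differ only by the factor (n - 1) / n
n <- length(x)
print(all.equal(population_variance(x), var(x) * (n - 1) / n))
```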

Example: For the data set {1, 2, 4, 7, 9}, with a mean of 4.6, the sample variance is calculated as follows:

\[s^2 = \frac{(1-4.6)^2 + (2-4.6)^2 + (4-4.6)^2 + (7-4.6)^2 + (9-4.6)^2}{5-1}\]

\[= \frac{12.96 + 6.76 + 0.36 + 5.76 + 19.36}{4} = \frac{45.2}{4} = 11.3\]
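The arithmetic of the worked example can be checked term by term in R. This sketch prints each squared deviation, their sum (the numerator of \(s^2\)), and the resulting sample variance, and cross-checks the result against `var()`.

```r
x <- c(1, 2, 4, 7, 9)

x_bar  <- mean(x)           # sample mean, 4.6
sq_dev <- (x - x_bar)^2     # squared deviation of each value from the mean
print(sq_dev)

numerator <- sum(sq_dev)    # sum of squared deviations
s2 <- numerator / (length(x) - 1)
print(c(numerator = numerator, s2 = s2))

# Cross-check against base R's sample variance
print(all.equal(s2, var(x)))
```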

10.2.1 Standard Deviation

The standard deviation is the square root of the variance. Taking the square root returns the measure to the same units as the data, which makes the standard deviation easier to interpret than the variance and is one reason it is among the most commonly used measures of dispersion.

  • Population Standard Deviation (\(\sigma\)): \(\sigma = \sqrt{\sigma^2}\)
  • Sample Standard Deviation (\(s\)): \(s = \sqrt{s^2}\)

Example: Continuing from the variance example, the sample standard deviation of {1, 2, 4, 7, 9} is \(\sqrt{11.3} \approx 3.36\).
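In R, `sd()` returns the sample standard deviation (the square root of `var()`). Base R has no dedicated function for the population standard deviation, so this sketch computes it by hand from the \(N\)-denominator variance; the variable names are illustrative only.

```r
x <- c(1, 2, 4, 7, 9)

# Sample standard deviation: square root of the n - 1 variance
s <- sd(x)
print(s)
print(all.equal(s, sqrt(var(x))))

# Population standard deviation: square root of the N-denominator variance,
# computed by hand because base R has no built-in for it
sigma <- sqrt(mean((x - mean(x))^2))
print(sigma)

# Rounded to two decimals, matching the worked example
print(round(s, 2))
```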

10.2.2 Absolute Deviation / Mean Absolute Deviation (MAD)

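The mean absolute deviation measures the average distance between each data point and the mean, ignoring the direction (positive or negative) of each deviation. Because the deviations are not squared, extreme values influence it less than they influence the variance and standard deviation.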

Example: For the data set {1, 2, 4, 7, 9} with a mean of 4.6, the MAD is calculated as follows:

\[MAD = \frac{|1-4.6| + |2-4.6| + |4-4.6| + |7-4.6| + |9-4.6|}{5}\]

\[= \frac{3.6 + 2.6 + 0.6 + 2.4 + 4.4}{5} = \frac{13.6}{5} = 2.72\]
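The mean absolute deviation has no dedicated base R function, but the expression `mean(abs(x - mean(x)))` from the chapter summary computes it directly. One caution worth showing: base R's `mad()` computes a different statistic, the *median* absolute deviation about the median, and by default multiplies it by the constant 1.4826, so it should not be used as a drop-in replacement here.

```r
x <- c(1, 2, 4, 7, 9)

# Mean absolute deviation about the mean, as in the worked example
mean_abs_dev <- mean(abs(x - mean(x)))
print(mean_abs_dev)

# Caution: base R's mad() is the median absolute deviation about the median,
# scaled by constant = 1.4826 unless told otherwise
print(mad(x))                # scaled median absolute deviation
print(mad(x, constant = 1))  # unscaled: median of |x - median(x)|
```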

10.3 Importance of Measures of Dispersion

Measures of dispersion are crucial in statistical analysis for understanding the variability within a data set. They complement measures of central tendency by providing a fuller picture of the data’s distribution. Which measure to use depends on the characteristics of the data and the objectives of the analysis. Variance and standard deviation underpin many statistical methods, including statistical modeling and hypothesis testing, while the range and IQR give quick insight into the spread of the data. The MAD offers an alternative that is less affected by outliers than the standard deviation.
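To see how differently these measures react to an outlier, the sketch below compares them on the example data and on a hypothetical copy in which the largest value 9 is replaced by 100 (an illustrative choice, not data from the text). The helper name `spread_summary` is likewise hypothetical.

```r
# Original example data and a hypothetical copy with one outlier (100 instead of 9)
x     <- c(1, 2, 4, 7, 9)
x_out <- c(1, 2, 4, 7, 100)

# Hypothetical helper: the four measures of dispersion for one data set
spread_summary <- function(v) {
  c(range = max(v) - min(v),
    IQR   = IQR(v),
    sd    = sd(v),
    MAD   = mean(abs(v - mean(v))))
}

# One row per data set, one column per measure
result <- rbind(original = spread_summary(x), with_outlier = spread_summary(x_out))
print(round(result, 2))

# Factor by which each measure changes once the outlier is introduced
print(round(result["with_outlier", ] / result["original", ], 2))
```

In this particular example the IQR is unchanged, while the range, standard deviation and MAD all increase sharply; the MAD grows by a somewhat smaller factor than the standard deviation because its deviations are not squared.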

Summary

Core Idea
  • Dispersion: The collection of statistics that describe how varied or spread out the values in a data set are
Range-Based Measures
  • Range: The difference between the largest and smallest values; a quick but outlier-sensitive indicator of spread
  • Interquartile Range (IQR): The distance between the 75th and 25th percentiles, covering the middle 50 percent of the data
Variance and Standard Deviation
  • Population Variance: Average squared deviation from the population mean, computed by dividing by N
  • Sample Variance: Sum of squared deviations from the sample mean divided by n minus 1, which corrects the bias in estimating the population variance
  • Population Standard Deviation: Square root of the population variance, expressed in the original units of measurement
  • Sample Standard Deviation: Square root of the sample variance, the most widely reported spread statistic in business analytics
Robust Measure
  • Mean Absolute Deviation (MAD): The average absolute distance of each observation from the mean, an alternative to the standard deviation that is less affected by outliers
In R
  • max() and min(): Return the maximum and minimum of a numeric vector, used to compute the range
  • IQR(): Built-in R function that returns the interquartile range of a numeric vector
  • var(): Built-in R function that returns the sample variance using the n minus 1 denominator
  • sd(): Built-in R function that returns the sample standard deviation
  • mean(abs(x - mean(x))): Idiomatic R expression for computing the mean absolute deviation from the mean