11 Measures of Skewness
The shape of a dataset’s distribution is characterized by its skewness and kurtosis, offering insights into the data’s symmetry and peakedness.
- Skewness: Indicates the asymmetry of the distribution, with positive skew showing a tail on the right, and negative skew a tail on the left.
- Kurtosis: Measures the “tailedness” of the distribution, with high kurtosis indicating that more of the variance comes from rare extreme deviations.
Skewness describes how symmetric the distribution is, and kurtosis describes its peakedness and tail weight. Both are useful for judging how the data behave before drawing conclusions from them.
11.1 Skewness
Skewness measures the degree of asymmetry or deviation from symmetry in the distribution of data (Karl Pearson, 1895). A distribution is symmetrical if it looks the same to the left and right of the center point.
- Zero Skewness: Indicates a perfectly symmetrical distribution.
- Positive Skewness: Indicates a distribution with a tail that stretches out more towards the positive side of the scale.
- Negative Skewness: Indicates a distribution with a tail that stretches out more towards the negative side of the scale.
Formula for Skewness: \[ Skewness = \frac{N \sum (X_i - \overline{X})^3}{(N-1)(N-2)S^3} \]
Where:
- \(N\) is the number of observations,
- \(X_i\) is each individual observation,
- \(\overline{X}\) is the mean of the observations,
- \(S\) is the sample standard deviation (computed with \(N-1\) in the denominator).
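The formula above can be translated almost term by term into code. The following is a minimal sketch in plain Python, assuming \(S\) is the sample standard deviation with \(N-1\) in the denominator; the function name `sample_skewness` is hypothetical and chosen only for illustration.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness: N * sum((x - mean)^3) / ((N-1)(N-2) * S^3)."""
    n = len(values)
    if n < 3:
        # The (N - 2) factor in the denominator requires at least three observations.
        raise ValueError("need at least three observations")

    mean = sum(values) / n

    # Sample standard deviation S (assumption: N - 1 denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")

    # Sum of cubed deviations from the mean; the cube keeps the sign of each deviation.
    cubed = sum((x - mean) ** 3 for x in values)

    return n * cubed / ((n - 1) * (n - 2) * s ** 3)
```

Because the deviations are cubed, a few large values above the mean push the result upward, while a few large values below the mean push it downward.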
The sign of the computed value gives the direction of the asymmetry:
- Skewness > 0 → Positively skewed (Right-skewed)
- Skewness = 0 → Symmetric (for example, a normal distribution)
- Skewness < 0 → Negatively skewed (Left-skewed)
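The sign rule can be illustrated with three small made-up samples. This is only a sketch: the helper `skew_direction`, its tolerance `tol`, and the sample values are assumptions introduced for illustration, and the skewness is computed with the adjusted formula from this section.

```python
import math


def skew_direction(values, tol=1e-9):
    """Label a sample by the sign of its adjusted sample skewness."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with an N - 1 denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    skew = n * sum((x - mean) ** 3 for x in values) / ((n - 1) * (n - 2) * s ** 3)

    if skew > tol:
        return "right-skewed"
    if skew < -tol:
        return "left-skewed"
    return "no clear skew"


# Illustrative data: one large value creates a long right tail.
right_tail = [1, 2, 2, 3, 3, 3, 20]
# Mirroring the sample flips the tail to the left.
left_tail = [-x for x in right_tail]
# Evenly spaced values are balanced around their mean.
balanced = [1, 2, 3, 4, 5]

for name, sample in [("right_tail", right_tail),
                     ("left_tail", left_tail),
                     ("balanced", balanced)]:
    print(name, "->", skew_direction(sample))
```

A tolerance is used instead of an exact comparison with zero because floating-point arithmetic rarely returns exactly 0 for real data.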
11.2 Example problem
Consider a dataset of exam scores: [55, 60, 65, 65, 70, 75, 80]. The mean (about 67.1) lies slightly above the median (65), which suggests a mild positive skew. If the scores were concentrated at the lower end with only a few high scores, the tail would stretch to the right and the distribution would be more strongly positively skewed. If instead most scores were high with only a few low ones, it would be negatively skewed.
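As a rough check, the skewness formula from Section 11.1 can be applied to these seven scores by hand. The intermediate values below are rounded to two decimals, and \(S\) is taken to be the sample standard deviation:

\[ \overline{X} = \frac{470}{7} \approx 67.14, \qquad S = \sqrt{\frac{\sum (X_i - \overline{X})^2}{N-1}} = \sqrt{\frac{442.86}{6}} \approx 8.59 \]

\[ \sum (X_i - \overline{X})^3 \approx 459.18 \]

\[ Skewness = \frac{7 \times 459.18}{6 \times 5 \times 8.59^3} \approx 0.17 \]

The result is a small positive number, so the scores are only mildly right-skewed.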
Summary
| Concept | Description |
|---|---|
| Core Idea | |
| Distribution Shape | The way data points are distributed, described by symmetry and peakedness |
| Skewness Concepts | |
| Skewness Definition | The degree of asymmetry of a distribution relative to a symmetric reference |
| Zero Skewness | A distribution is perfectly symmetric, with mean, median, and mode aligning |
| Positive Skewness | The tail extends further to the right, pulled by a few larger values |
| Negative Skewness | The tail extends further to the left, pulled by a few smaller values |
| Computation | |
| Skewness Formula | Sample skewness is N times the sum of cubed deviations, divided by (N-1)(N-2) and by the cube of the standard deviation |
| In R and Python | |
| moments::skewness() | Computes the moment-based skewness of a numeric vector in R, provided by the moments package; it does not apply the (N-1)(N-2) small-sample adjustment |
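| scipy.stats.skew() | Computes skewness in Python, available through the SciPy statistics module; the default is the unadjusted moment estimator, and bias=False applies the small-sample adjustment |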