30 Repeated Measures ANOVA

Repeated Measures ANOVA (Analysis of Variance) is a statistical technique for comparing means across three or more measurements taken from the same subjects. It is particularly useful for correlated data, such as measurements taken from the same subjects over time or under different conditions. By accounting for variability between subjects, it detects differences with more power than an analysis that treats the measurements as independent groups.

Purpose

The primary purpose of a Repeated Measures ANOVA is to determine if there are significant differences between multiple measurements (or conditions) taken from the same subjects. It is often used in experiments where the same subjects are subjected to different treatments or conditions over time.

30.1 Assumptions

Repeated Measures ANOVA relies on several key assumptions:

Sphericity: The variances of the differences between all combinations of related groups (levels) must be equal. This assumption can be tested with Mauchly’s test of sphericity.
Normality: The differences between treatment levels for each subject should be normally distributed.
Independence: Although the repeated measurements from one subject are related to each other, the subjects themselves must be independent of one another.
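To make the sphericity assumption concrete, the sketch below computes the sample variance of the subject-wise differences for every pair of conditions. The scores and the helper name `difference_variances` are hypothetical, chosen only for illustration, and the comparison is an informal inspection rather than Mauchly's test.

```python
from itertools import combinations
from statistics import variance

# Hypothetical scores: one list per condition, one entry per subject
# (subjects appear in the same order in every list).
scores = {
    "A": [10, 12, 9, 14, 11],
    "B": [12, 15, 10, 15, 14],
    "C": [15, 16, 14, 19, 15],
}


def difference_variances(scores):
    """Return the sample variance of the paired differences for each pair of conditions."""
    result = {}
    for first, second in combinations(scores, 2):
        diffs = [x - y for x, y in zip(scores[first], scores[second])]
        result[(first, second)] = variance(diffs)
    return result


diff_vars = difference_variances(scores)
for pair, var in diff_vars.items():
    print(pair, round(var, 3))
```

Under sphericity these variances would be roughly equal; widely differing values suggest the assumption deserves a formal test.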

30.2 Hypotheses

The hypotheses for a Repeated Measures ANOVA are as follows:

Null Hypothesis (H₀): There are no differences in the means across the measurements; any observed differences are due to random variation.
Alternative Hypothesis (H₁): At least one of the means differs from the others across the measurements.

30.3 Calculations

The analysis involves several key calculations:

Within-Subjects Sum of Squares (SSW): Measures the variability within subjects across the different time points or conditions. It splits further into the variability due to the conditions and a residual (error) component.
Between-Subjects Sum of Squares (SSB): Measures the variability between the subjects' overall means. It is removed from the error term rather than tested.
Total Sum of Squares (SST): The sum of SSW and SSB.
Degrees of Freedom (DF): For n subjects and k conditions, k − 1 for the condition effect, n − 1 for subjects, and (n − 1)(k − 1) for the residual.
F-statistic: The ratio of the mean square for conditions to the residual mean square, following an F-distribution under the null hypothesis.
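The calculations listed above can be sketched in plain Python. The function name `rm_anova_table`, the dictionary keys, and the five-subject dataset are hypothetical and serve only as an illustration; the sketch assumes a balanced design with a non-zero residual.

```python
from statistics import mean


def rm_anova_table(rows):
    """Partition the sums of squares for a one-way repeated measures design.

    rows: one list per subject, holding that subject's score under each condition.
    """
    n = len(rows)      # number of subjects
    k = len(rows[0])   # number of conditions
    grand_mean = mean(x for row in rows for x in row)
    subject_means = [mean(row) for row in rows]
    condition_means = [mean(row[j] for row in rows) for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = ss_total - ss_between          # SST = SSB + SSW
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_error = ss_within - ss_conditions       # residual part of SSW

    df_conditions = k - 1
    df_error = (n - 1) * (k - 1)
    # Assumes ss_error > 0; a zero residual would make the ratio undefined.
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_conditions": ss_conditions,
        "ss_error": ss_error,
        "df_conditions": df_conditions,
        "df_error": df_error,
        "F": f_stat,
    }


# Hypothetical data: 5 subjects (rows) measured under 3 conditions (columns).
rows = [
    [10, 12, 15],
    [12, 15, 16],
    [9, 10, 14],
    [14, 15, 19],
    [11, 14, 15],
]
table = rm_anova_table(rows)
for name, value in table.items():
    print(name, round(value, 4))
```

Removing the between-subjects sum of squares before forming the error term is what distinguishes this F-ratio from the one in an independent-groups ANOVA.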

30.4 Interpretation

The result of a Repeated Measures ANOVA is typically reported as an F-statistic and its corresponding p-value. This helps to determine if the observed differences across the repeated measures are statistically significant:

If the F-statistic is larger than the critical value (or the p-value is smaller than the significance level, typically 0.05), it suggests significant differences across the repeated measures, leading to the rejection of the null hypothesis.
If the F-statistic is smaller than the critical value (or the p-value is at least the significance level), the null hypothesis is not rejected: the data do not provide sufficient evidence of differences across the measures.
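As a rough illustration of the decision rule, the sketch below converts an F-statistic into a p-value using only the standard library and compares it with the significance level. The helper names are hypothetical. The incomplete beta function is evaluated with a standard continued-fraction expansion, and in practice a statistics library would normally supply this value. The example call uses the F-statistic and degrees of freedom reported later in this chapter.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-12):
    """Continued-fraction part of the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_survival(f_value, df_num, df_den):
    """P(F > f_value) for an F distribution with (df_num, df_den) degrees of freedom."""
    if f_value <= 0:
        return 1.0
    x = df_den / (df_den + df_num * f_value)
    return regularized_incomplete_beta(df_den / 2, df_num / 2, x)


def decide(f_value, df_num, df_den, alpha=0.05):
    """Return the p-value and the decision about H0 at significance level alpha."""
    p = f_survival(f_value, df_num, df_den)
    return p, ("reject H0" if p < alpha else "fail to reject H0")


p_value, decision = decide(0.6293, 3, 27)
print(round(p_value, 4), decision)
```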

For the example problem involving the effectiveness of a new drug on heart rate recovery at different time points post-exercise, we provide a complete dataset below and illustrate how to analyze it using Repeated Measures ANOVA in R and Python.

30.5 Repeated Measures ANOVA Example Problem

Let’s assume there are 10 subjects in the study, and each subject’s heart rate is recorded at four time points: immediately after exercise (T1), 1 minute after (T2), 3 minutes after (T3), and 5 minutes after (T4).

Subject	T1	T2	T3	T4
1	120	110	100	90
2	130	120	110	100
3	135	125	115	105
4	140	130	120	110
5	125	115	105	95
6	118	108	98	88
7	123	113	103	93
8	128	118	108	98
9	133	123	113	103
10	138	128	118	108
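Most repeated measures routines expect one row per subject and time point, so a natural first step is to reshape the table above into that long format. The sketch below does this with the standard library and also computes the mean heart rate at each time point; the variable names are illustrative assumptions.

```python
from statistics import mean

TIME_POINTS = ("T1", "T2", "T3", "T4")

# Heart rates from the table above, keyed by subject.
heart_rate = {
    1: (120, 110, 100, 90),
    2: (130, 120, 110, 100),
    3: (135, 125, 115, 105),
    4: (140, 130, 120, 110),
    5: (125, 115, 105, 95),
    6: (118, 108, 98, 88),
    7: (123, 113, 103, 93),
    8: (128, 118, 108, 98),
    9: (133, 123, 113, 103),
    10: (138, 128, 118, 108),
}

# Long format: one (subject, time point, heart rate) row per measurement.
long_rows = [
    (subject, time_point, value)
    for subject, values in heart_rate.items()
    for time_point, value in zip(TIME_POINTS, values)
]

# Mean heart rate at each time point, across subjects.
time_means = {
    time_point: mean(values[j] for values in heart_rate.values())
    for j, time_point in enumerate(TIME_POINTS)
}

print(len(long_rows), "rows in long format")
print(time_means)
```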

30.6 Repeated Measures ANOVA using R and Python

We’ll prepare the data and run a Repeated Measures ANOVA in R, using the afex package, and in Python, using statsmodels.

Python
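As a minimal sketch of the statsmodels workflow, assuming pandas and statsmodels are installed, the block below reshapes a wide table into long format and fits `AnovaRM`. The five-subject, three-time-point values are hypothetical and chosen only so that the residual term is non-zero; they are not the study data, so the printed table will not match the output discussed next.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical wide-format data: one row per subject, one column per time point.
wide = pd.DataFrame(
    {
        "Subject": [1, 2, 3, 4, 5],
        "T1": [120, 130, 135, 140, 125],
        "T2": [112, 119, 127, 128, 116],
        "T3": [101, 111, 113, 122, 104],
    }
)

# AnovaRM expects long format: one row per subject and time point.
long = wide.melt(id_vars="Subject", var_name="Time", value_name="HeartRate")

# Fit the one-way repeated measures ANOVA with Time as the within-subject factor.
result = AnovaRM(long, depvar="HeartRate", subject="Subject", within=["Time"]).fit()
anova_table = result.anova_table
print(anova_table)
```

`AnovaRM` requires a balanced design, with exactly one observation per subject in each cell, and it does not apply a sphericity correction.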

The output from the Repeated Measures ANOVA provides the following statistical values related to the effect of time on heart rate across multiple measurements:

Output Details:

F Value: 0.6293
Num DF (Numerator Degrees of Freedom): 3.0000
Den DF (Denominator Degrees of Freedom): 27.0000
Pr > F (p-value): 0.6024

Interpretation:

F Value: The F-value of 0.6293 is low, indicating that the variability between the mean heart rates at the four time points is no larger than the residual variability left after subject differences are removed.
p-value: The p-value is 0.6024, well above the typical alpha level of 0.05 used to determine statistical significance.
Statistical Significance: With a p-value of 0.6024, we fail to reject the null hypothesis: this output shows no statistically significant difference in heart rate across the four time points (T1, T2, T3, T4). Note that these figures do not match the table in Section 30.5, where every subject's heart rate falls by exactly 10 between consecutive time points; data with that pattern leave no residual variability, so the F-ratio would have a zero denominator rather than a value below 1.

Summary

Concept	Description
Foundations
Repeated Measures ANOVA	An ANOVA for designs where each subject is measured under three or more conditions or time points
Same Subjects Across Conditions	The same units provide all measurements, creating dependencies that the analysis explicitly models
Power Advantage Over Independent ANOVA	Modelling subject effects removes between-subject variability, increasing sensitivity
Assumptions
Sphericity	The variances of all pairwise differences between conditions should be equal
Mauchly's Test of Sphericity	A formal test of the sphericity assumption, run before trusting the unadjusted F-statistic
Normality of Differences	The differences between paired conditions should be approximately normally distributed
Independence of Subjects	Subjects must be independent of one another, even though their repeated measurements are not
Hypotheses
Null Hypothesis	States that the means at every condition or time point are equal
Alternative Hypothesis	States that at least one of the within-subject means differs from the others
Computation
Within-Subjects SS	Variability within each subject across the repeated conditions, the signal of interest
Between-Subjects SS	Variability between subjects, partitioned out so it does not inflate the error term
F-Statistic	Ratio of mean squares for the within-subject factor to the residual mean square
Greenhouse-Geisser Correction	Adjusts the degrees of freedom downward when sphericity is violated, preserving Type I error control
Decision
Decision Rule	Reject H0 when F exceeds the critical value or when p (corrected if needed) is below alpha
In R and Python
R via aov_ez() (afex)	Use aov_ez('Subject', 'y', data, within = 'Time') from the afex package to fit the model in R
Python via AnovaRM (statsmodels)	Use AnovaRM(data, 'y', 'Subject', within = ['Time']).fit() from statsmodels in Python