29  Post Hoc Tests for ANOVA

Overview

ANOVA (Analysis of Variance) is a statistical method used to test for differences among two or more group means. When an ANOVA indicates a significant difference, it does not tell you which specific groups differ. Post hoc tests are used to conduct pairwise comparisons between group means after a significant ANOVA result.

Purpose

The main purpose of post hoc tests is to control the Type I error rate, which increases with multiple comparisons. These tests apply various corrections so that the overall significance level stays at its nominal value (for example, 0.05).

Why Conduct Post Hoc Tests:

Identify Differences: A significant ANOVA result only tells us that at least one group mean differs from the others; it does not specify which groups differ from which. Post hoc tests are conducted to pinpoint exactly which pairs of groups differ.

Control Type I Error: When making multiple comparisons, the chance of committing a Type I error (false positive) increases. Post hoc tests adjust for this multiple comparison problem to maintain the overall Type I error rate at the desired level.
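
To quantify the inflation: if each of m independent comparisons is tested at level α, the probability of at least one false positive is 1 − (1 − α)^m. For the six pairwise comparisons among four groups at α = 0.05, this is 1 − 0.95^6 ≈ 0.26, far above the nominal 5% level.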

29.1 Common Post Hoc Tests

  1. Tukey’s Honest Significant Difference (HSD) (John W. Tukey, 1949): This is one of the most popular post hoc tests when all groups have equal sample sizes. It controls the family-wise error rate and is robust across a range of scenarios.

  2. Bonferroni Correction: This method adjusts the significance threshold by dividing it by the number of comparisons. For example, with four groups there are six pairwise comparisons, so each comparison is tested at 0.05/6 ≈ 0.0083. It is very conservative, reducing the power to detect differences when numerous comparisons are made (see the pairwise.t.test() sketch after this list).

  3. Scheffé’s Test: Another conservative test, Scheffé’s test is particularly useful when exploring all possible contrasts among group means, not just pairwise comparisons.

  4. Dunnett’s Test: This test compares a control group against all other groups and is useful in clinical trials.

  5. Fisher’s Least Significant Difference (LSD): This test does not adjust for multiple comparisons, so it has higher power but also a higher risk of Type I errors; the sketch after this list shows it as unadjusted pairwise t-tests.
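
To see how the choice of correction plays out, base R’s pairwise.t.test() can apply several of these adjustments. A minimal sketch with made-up data (the names score and group are hypothetical):

```r
# Hypothetical data: three groups of five observations each
score <- c(10, 12, 11, 13, 12,
           14, 15, 16, 14, 15,
           13, 12, 14, 13, 12)
group <- factor(rep(c("G1", "G2", "G3"), each = 5))

# Bonferroni: each p-value is multiplied by the number of comparisons
# (equivalent to testing each pair at alpha divided by that number)
pairwise.t.test(score, group, p.adjust.method = "bonferroni")

# No adjustment (essentially Fisher's LSD): every pair is tested at
# the nominal alpha, so the family-wise error rate is inflated
pairwise.t.test(score, group, p.adjust.method = "none")
```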

29.2 Example: Tukey’s HSD Test in a One-Way ANOVA

Scenario

Suppose a botanist wants to compare plant growth under different fertilizers. They have four types of fertilizer and measure growth (in cm) after a set period.

Data
  • Fertilizer A: 15, 14, 16, 14, 15
  • Fertilizer B: 22, 20, 21, 22, 21
  • Fertilizer C: 28, 25, 27, 30, 29
  • Fertilizer D: 15, 13, 14, 15, 14

29.3 Post Hoc Tests for One-Way ANOVA Using R and Python

First, we conduct a one-way ANOVA to see if there are significant differences among the means of these groups.
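
A minimal R sketch of this step, using the fertilizer data above (the names growth, fertilizer, and model are illustrative):

```r
# Enter the fertilizer data from the list above
growth <- c(15, 14, 16, 14, 15,   # Fertilizer A
            22, 20, 21, 22, 21,   # Fertilizer B
            28, 25, 27, 30, 29,   # Fertilizer C
            15, 13, 14, 15, 14)   # Fertilizer D
fertilizer <- factor(rep(c("A", "B", "C", "D"), each = 5))

# Fit the one-way ANOVA and inspect the omnibus F test
model <- aov(growth ~ fertilizer)
summary(model)
```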

If the ANOVA is significant, we follow up with Tukey’s Honest Significant Difference (HSD) test for multiple comparisons of means; the components of its output are broken down below.
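
In R, this is base R’s TukeyHSD(), applied to the aov object fitted in the sketch above:

```r
# Tukey's HSD on the fitted model: one row per pairwise comparison,
# with columns diff, lwr, upr, and p adj at a 95% family-wise level
TukeyHSD(model, conf.level = 0.95)
```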

29.3.1 Output Breakdown

  • diff: The difference in means between the groups being compared.
  • lwr: The lower bound of the 95% confidence interval for the mean difference.
  • upr: The upper bound of the 95% confidence interval for the mean difference.
  • p adj: The p-value adjusted for multiple comparisons.

Interpretation

Tukey’s test compares every pair of fertilizers. A pairwise difference is flagged as significant when it exceeds the honest significant difference (the test’s critical value); equivalently, when the confidence interval for the difference excludes zero or the adjusted p-value falls below the chosen alpha. The output includes a confidence interval and an adjusted p-value for each comparison.
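
The same comparisons can be run in Python; a sketch using statsmodels (assuming the package is installed):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Fertilizer data from the list above
growth = np.array([15, 14, 16, 14, 15,   # Fertilizer A
                   22, 20, 21, 22, 21,   # Fertilizer B
                   28, 25, 27, 30, 29,   # Fertilizer C
                   15, 13, 14, 15, 14])  # Fertilizer D
fertilizer = np.repeat(["A", "B", "C", "D"], 5)

# Every pairwise comparison at a family-wise alpha of 0.05; the summary
# reports the mean difference, adjusted p-value, confidence bounds,
# and a reject flag for each pair
result = pairwise_tukeyhsd(endog=growth, groups=fertilizer, alpha=0.05)
print(result.summary())
```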

Practical Application

The botanist can use the results to determine which fertilizers significantly enhance growth compared to others, guiding future experimental designs or agricultural practices.

Post hoc tests are crucial for making informed decisions after an ANOVA. They identify the specific differences between groups, letting researchers draw conclusions beyond the initial omnibus result. When selecting a post hoc test, weigh the trade-off between controlling Type I errors and maintaining statistical power in light of the study design and objectives.


Summary

Foundations
  • Post Hoc Tests: Pairwise comparisons performed after a significant ANOVA to locate the source of differences.
  • Why They Follow ANOVA: ANOVA only signals that some difference exists; post hoc tests identify which specific groups differ.
  • Family-Wise Error Rate: The probability of making at least one Type I error across a family of comparisons.
  • Type I Error Inflation: Running many uncorrected pairwise tests inflates the chance of at least one false positive.

Common Tests
  • Tukey’s HSD: Compares all pairs of group means while controlling the family-wise error rate; ideal with equal sample sizes.
  • Bonferroni Correction: Divides alpha by the number of comparisons; very conservative and loses power when many tests are run.
  • Scheffé’s Test: A conservative test that supports any contrast among group means, not just pairwise comparisons.
  • Dunnett’s Test: Compares each group against a single control group; common in clinical trials.
  • Fisher’s LSD: Pairwise t-tests with no correction; more powerful but with an inflated false-positive risk.

Reading the Output
  • Mean Difference (diff): The estimated difference in means between the two groups being compared.
  • Confidence Interval (lwr, upr): The lower and upper bounds of the 95% confidence interval for the mean difference.
  • Adjusted p-value (p adj): The p-value after adjustment for multiple comparisons, used to judge significance.

Choosing a Test
  • Equal vs. Unequal Sample Sizes: The choice of test depends on whether the groups have equal sample sizes and on the study design.
  • Power vs. Conservatism Trade-off: More conservative tests reduce false positives but lower the chance of detecting true differences.

In R and Python
  • TukeyHSD() in R: Use TukeyHSD(aov_object) after fitting an ANOVA model with aov() in R.
  • pairwise_tukeyhsd() in Python: Use statsmodels.stats.multicomp.pairwise_tukeyhsd(endog, groups, alpha) in Python.