18 Cochran’s Q Test and Post-hoc Tests
Cochran’s Q Test
Cochran’s Q Test (William G. Cochran, 1950) is a non-parametric statistical test used to determine whether there are significant differences in the frequencies of a binary outcome across three or more related groups or conditions. It is an extension of the McNemar test for scenarios involving more than two related groups and is commonly used for repeated measures where the response variable is dichotomous. This test is useful for analyzing data from studies where the same subjects are observed under different conditions, such as different time points or different treatments.
18.1 Understanding Cochran’s Q Test:
1. Null and Alternative Hypotheses:
- Null Hypothesis (H0): The null hypothesis states that the proportions of the binary outcome are the same across all groups or conditions.
- Alternative Hypothesis (H1): The alternative hypothesis states that the proportion of the binary outcome differs in at least one of the conditions.
2. Test Statistic:
- The Cochran’s Q test statistic is built from two sets of totals: the number of successes in each condition (column totals) and the number of successes for each subject across all conditions (row totals).
- Under the null hypothesis, and with a reasonably large number of subjects, the test statistic approximately follows a chi-squared distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of related groups or conditions.
3. Calculation of Test Statistic:
- Let \(n\) be the total number of subjects and \(k\) the number of conditions. The Cochran’s Q test statistic compares the variation among the condition totals with the variation expected by chance, given how many successes each subject had.
- The formula for Cochran’s Q test statistic is: \[ Q = \frac{(k-1)\left(k\sum_{j=1}^{k} T_j^2 - \left(\sum_{j=1}^{k} T_j\right)^2\right)}{k\sum_{i=1}^{n} t_i - \sum_{i=1}^{n} t_i^2} \] where \(T_j\) is the total number of times the characteristic appears in the \(j\)-th condition and \(t_i\) is the total number of times the characteristic appears for the \(i\)-th subject. Both sums of totals equal the grand total of successes, \(\sum_j T_j = \sum_i t_i\).
4. Interpretation of Results:
- If the calculated \(Q\) value is greater than the critical value from the chi-squared distribution with \(k - 1\) degrees of freedom at the chosen significance level (commonly \(\alpha = 0.05\)), or equivalently if the p-value is below \(\alpha\), the null hypothesis is rejected, indicating that the proportion of successes differs in at least one condition.
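The calculation and decision rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `cochrans_q` and `chi2_sf` are hypothetical, the input is assumed to be a list of subject rows coded 0/1, and the chi-squared upper-tail probability is computed from the closed-form expressions for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        return math.exp(-h) * sum(h**i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i - 1/2) / Gamma(i + 1/2)
    tail = sum(h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def cochrans_q(table: list[list[int]]) -> tuple[float, int, float]:
    """Return (Q, df, p) for 0/1 responses: one row per subject, one column per condition."""
    k = len(table[0])
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("every row needs the same number (>= 2) of conditions")
    if any(v not in (0, 1) for row in table for v in row):
        raise ValueError("responses must be coded 0/1")

    col_totals = [sum(col) for col in zip(*table)]  # T_j, successes per condition
    row_totals = [sum(row) for row in table]        # t_i, successes per subject
    grand = sum(row_totals)                         # sum of T_j == sum of t_i

    numerator = (k - 1) * (k * sum(T * T for T in col_totals) - grand**2)
    denominator = k * grand - sum(t * t for t in row_totals)  # uses sum of t_i^2
    if denominator == 0:
        # Every subject gave the same response in all conditions, so Q is undefined.
        raise ValueError("Q is undefined when no subject's responses vary")

    q = numerator / denominator
    df = k - 1
    return q, df, chi2_sf(q, df)


# Tiny illustrative table (4 subjects x 3 conditions). It is far too small for the
# chi-squared approximation to be trusted; it only shows the mechanics.
q, df, p = cochrans_q([[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
print(f"Q = {q:.3f}, df = {df}, p = {p:.4f}")  # Q = 4.667, df = 2, p = 0.0970
alpha = 0.05
print("reject H0" if p < alpha else "do not reject H0")  # do not reject H0
```

Comparing the p-value with \(\alpha\) gives the same decision as comparing \(Q\) with the tabulated critical value, which is why the sketch returns the p-value rather than looking up a table.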
18.2 Applications of Cochran’s Q Test:
A. Medical Research:
- In clinical trials, the Cochran’s Q test is used to assess the consistency of treatment effects observed at different time points or under different conditions within the same group of patients.
B. Psychology:
- Psychologists may apply the Cochran’s Q test to evaluate the consistency of binary responses (like success or failure) across repeated measures or different experimental conditions.
C. Quality Control:
- In industrial settings, the Cochran’s Q test can be used to compare the pass/fail rates of products or processes across different shifts or batches.
Considerations:
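- Cochran’s Q test assumes that subjects are independent of one another, while the repeated observations within each subject are related; this within-subject dependence is exactly what the test is designed to handle.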
- The test is only applicable to binary (dichotomous) outcomes.
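- Subjects who give the same response in every condition (all 0 or all 1) do not affect \(Q\), so the effective sample size is the number of subjects whose responses vary. When that number is small, the chi-squared approximation can be inaccurate and the test has little power; an exact or permutation version of the test should be considered in such cases.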
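One such alternative is a permutation (randomization) version of the test. Under H0 the \(k\) responses of each subject are exchangeable, so shuffling the values within each subject's row leaves the null distribution unchanged; the p-value is the share of shuffled tables whose \(Q\) is at least as large as the observed one. The sketch below is a Monte Carlo approximation; the function names, the default of 20,000 resamples and the fixed seed are illustrative assumptions.

```python
import random


def q_statistic(table):
    """Cochran's Q for a list of 0/1 rows (rows = subjects, columns = conditions)."""
    k = len(table[0])
    col_totals = [sum(col) for col in zip(*table)]  # T_j
    row_totals = [sum(row) for row in table]        # t_i
    grand = sum(row_totals)
    denominator = k * grand - sum(t * t for t in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when no subject's responses vary")
    return (k - 1) * (k * sum(T * T for T in col_totals) - grand**2) / denominator


def permutation_cochrans_q(table, n_resamples=20_000, seed=0):
    """Return (observed Q, Monte Carlo permutation p-value).

    Under H0 each subject's k responses are exchangeable, so every resample
    shuffles the responses within each row and recomputes Q.
    """
    rng = random.Random(seed)
    q_obs = q_statistic(table)
    rows = [list(row) for row in table]  # work on copies; the input is not modified
    hits = 0
    for _ in range(n_resamples):
        for row in rows:
            rng.shuffle(row)  # keeps each subject's total t_i fixed
        if q_statistic(rows) >= q_obs - 1e-12:  # small tolerance for float ties
            hits += 1
    # Add-one correction counts the observed table as one of the permutations.
    return q_obs, (hits + 1) / (n_resamples + 1)
```

Applied to the ten-tester table in 18.3, the estimate should land near 0.038: enumerating all \(3^8 = 6561\) distinct within-tester arrangements (testers 8 and 9 passed every version, so shuffling their rows changes nothing) gives an exact permutation p-value of \(252/6561 \approx 0.038\), a little larger than the chi-squared value of about 0.030.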
In summary, Cochran’s Q test offers a straightforward method for analyzing differences in binary outcomes across more than two related groups or conditions. It is especially valuable for repeated-measures designs where the same subjects are exposed to different conditions, allowing researchers to investigate the consistency of an effect or response across those conditions.
18.3 Example Problem: Cochran’s Q test
A software company wants to test the reliability of three versions of a software application (Version A, Version B, and Version C) under the same conditions. They have 10 testers that each test all three versions for reliability. The outcome is binary: Pass (if the software version works reliably during the test) or Fail (if it does not). The results are as follows:
| Tester | Version A | Version B | Version C |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 0 | 0 |
| 4 | 1 | 0 | 0 |
| 5 | 1 | 0 | 0 |
| 6 | 1 | 1 | 0 |
| 7 | 1 | 1 | 0 |
| 8 | 1 | 1 | 1 |
| 9 | 1 | 1 | 1 |
| 10 | 0 | 1 | 1 |
(Pass=1, Fail=0)
The company wants to know if there is a significant difference in reliability between the three software versions.
18.3.1 Calculation of Cochran’s Q Test:
1. Calculate the totals for each version (sum across testers):
- \(T_A = 1+1+1+1+1+1+1+1+1+0 = 9\)
- \(T_B = 0+0+0+0+0+1+1+1+1+1 = 5\)
- \(T_C = 0+0+0+0+0+0+0+1+1+1 = 3\)
2. Calculate the totals for each tester (sum across versions):
- \(t_1 = 1 + 0 + 0 = 1\)
- \(t_2 = 1 + 0 + 0 = 1\)
- \(t_3 = 1 + 0 + 0 = 1\)
- \(t_4 = 1 + 0 + 0 = 1\)
- \(t_5 = 1 + 0 + 0 = 1\)
- \(t_6 = 1 + 1 + 0 = 2\)
- \(t_7 = 1 + 1 + 0 = 2\)
- \(t_8 = 1 + 1 + 1 = 3\)
- \(t_9 = 1 + 1 + 1 = 3\)
- \(t_{10} = 0 + 1 + 1 = 2\)
3. Compute the sums needed for the Q statistic:
- \(\sum T_j^2 = 9^2 + 5^2 + 3^2 = 115\)
- \((\sum T_j)^2 = (9+5+3)^2 = 17^2 = 289\)
- \(\sum t_i = 1+1+1+1+1+2+2+3+3+2 = 17\)
- \(\sum t_i^2 = 1^2+1^2+1^2+1^2+1^2+2^2+2^2+3^2+3^2+2^2 = 35\)
4. Compute the denominator, using the number of versions \(k = 3\):
- \(k\sum t_i - \sum t_i^2 = 3 \times 17 - 35 = 16\)
5. Calculate the Q statistic: \[ Q = (k-1)\;\frac{k\sum T_j^2 - (\sum T_j)^2}{k\sum t_i - \sum t_i^2} \]
Substituting values:
\[ Q = (3-1)\;\frac{3(115) - 289}{16} = 2 \times \frac{345 - 289}{16} = 2 \times \frac{56}{16} = 7 \]
6. Result:
- \(Q = 7\) with \(df = k-1 = 2\)
- The critical value of \(\chi^2_2\) at \(\alpha = 0.05\) is 5.991, and \(Q = 7\) exceeds it. The corresponding p-value is \(p = P(\chi^2_2 \ge 7) = e^{-7/2} \approx 0.030\), since with 2 degrees of freedom the chi-squared upper tail is simply \(e^{-x/2}\).
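The hand calculation above can be checked with a few lines of Python. This is only an arithmetic check on the table from 18.3; the variable names are illustrative, and the p-value line relies on the fact that a chi-squared variable with 2 degrees of freedom has upper tail \(P(\chi^2_2 \ge x) = e^{-x/2}\), so it is valid only for \(k = 3\).

```python
import math

# Pass = 1, Fail = 0; one row per tester, columns = Version A, B, C.
results = [
    [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0],
    [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1], [0, 1, 1],
]
k = len(results[0])

T = [sum(col) for col in zip(*results)]  # version totals T_A, T_B, T_C
t = [sum(row) for row in results]        # tester totals t_1 ... t_10

sum_T_sq = sum(x * x for x in T)         # sum of T_j^2
grand = sum(T)                           # sum of T_j (equals sum of t_i)
sum_t_sq = sum(x * x for x in t)         # sum of t_i^2

numerator = k * sum_T_sq - grand**2
denominator = k * grand - sum_t_sq
Q = (k - 1) * numerator / denominator
p = math.exp(-Q / 2)                     # chi-squared upper tail for df = 2 only

print("T =", T)  # -> T = [9, 5, 3]
print("t =", t)  # -> t = [1, 1, 1, 1, 1, 2, 2, 3, 3, 2]
print(f"sum T^2 = {sum_T_sq}, (sum T)^2 = {grand**2}, sum t^2 = {sum_t_sq}")
# -> sum T^2 = 115, (sum T)^2 = 289, sum t^2 = 35
print(f"Q = {Q}, df = {k - 1}, p = {p:.4f}")  # -> Q = 7.0, df = 2, p = 0.0302
```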
Interpretation
Since \(p < 0.05\), the Cochran’s Q test indicates a statistically significant difference in reliability among the three software versions.
Conclusion
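At least one software version differs in reliability. In this sample Version A passed most often (9 of 10 testers), Version B next (5 of 10) and Version C least (3 of 10), so the performance of the versions is not the same. The Q test by itself, however, does not establish which pairs of versions differ; that calls for the post-hoc comparisons described in 18.5.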
18.4 Cochran’s Q Test in R and Python
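In Python, one way to reproduce the example is `scipy.stats.friedmanchisquare`: on 0/1 data, Friedman's statistic with its correction for ties equals Cochran's Q. The sketch below assumes SciPy is installed and passes each version's column as a separate sample, with position \(i\) in every list belonging to the same tester. `statsmodels.stats.contingency_tables.cochrans_q` is a dedicated alternative, and in R, `friedman.test` applied to the 10 × 3 pass/fail matrix applies the same tie correction.

```python
from scipy.stats import friedmanchisquare

# Pass = 1, Fail = 0 for testers 1-10 (table in 18.3).
version_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
version_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
version_c = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# Each argument is one condition; the lists must be aligned by tester.
result = friedmanchisquare(version_a, version_b, version_c)
print(f"Q = {result.statistic:.3f}, p = {result.pvalue:.4f}")  # Q = 7.000, p = 0.0302
```

Because `friedmanchisquare` requires at least three samples, the two-condition case is better handled directly with McNemar's test.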
18.5 Post-hoc Tests for Cochran’s Q Test:
After performing Cochran’s Q test, if the result is significant, it implies that there are differences in the binary outcomes across the related groups. However, Cochran’s Q test does not specify which groups differ from each other. To identify the specific groups between which these differences occur, you would perform post-hoc pairwise comparisons.
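For Cochran’s Q test, one common approach for post-hoc analysis is to run a McNemar test on every pair of conditions and apply a Bonferroni correction for multiple testing: with \(m = k(k-1)/2\) pairwise comparisons, each one is tested at \(\alpha/m\) (equivalently, each p-value is multiplied by \(m\) and capped at 1).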
Let’s consider a hypothetical example involving Cochran’s Q test and its subsequent post-hoc analysis:
18.6 Post-hoc Tests for Cochran’s Q Test in R and Python
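A minimal Python sketch of the pairwise procedure, applied to the software example from 18.3. It assumes exact (binomial) McNemar tests, which suit the small numbers of discordant pairs here, and a Bonferroni adjustment that multiplies each p-value by the number of pairs, capped at 1. The helper names are hypothetical; `statsmodels.stats.contingency_tables.mcnemar` with `exact=True` is a library alternative for the individual tests.

```python
from itertools import combinations
from math import comb


def exact_mcnemar_p(x, y):
    """Two-sided exact McNemar p-value for paired 0/1 lists x and y."""
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # success in x only
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # success in y only
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Under H0 discordant pairs split 50/50, so min(b, c) ~ Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def pairwise_mcnemar_bonferroni(conditions):
    """Return {(name1, name2): (raw p, Bonferroni-adjusted p)} for every pair."""
    pairs = list(combinations(conditions, 2))
    m = len(pairs)  # number of comparisons, k(k-1)/2
    out = {}
    for name1, name2 in pairs:
        p = exact_mcnemar_p(conditions[name1], conditions[name2])
        out[(name1, name2)] = (p, min(1.0, m * p))
    return out


versions = {
    "A": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "B": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "C": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
}
for (name1, name2), (p_raw, p_adj) in pairwise_mcnemar_bonferroni(versions).items():
    print(f"{name1} vs {name2}: p = {p_raw:.3f}, Bonferroni p = {p_adj:.3f}")
```

With these data the raw p-values are 14/64 (A vs B), 18/256 (A vs C) and 0.5 (B vs C), and none of the adjusted values falls below 0.05 (the smallest, A vs C, is about 0.21) even though the overall Q was significant. This can happen with small samples, because each pairwise test uses only the discordant testers (at most 8 here) and the Bonferroni adjustment is conservative.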
Summary
| Concept | Description |
|---|---|
| Foundations | |
| Cochran's Q Test | Non-parametric test for three or more related groups measuring a binary outcome on the same subjects |
| Extension of McNemar | Generalises McNemar's test from two paired conditions to any number of paired conditions |
| Binary Outcome | Applicable only when the outcome takes one of two possible values, such as pass and fail |
| Hypotheses and Formula | |
| Null Hypothesis | All conditions share the same proportion of successes, so any condition is exchangeable with any other |
| Alternative Hypothesis | At least one condition differs in success proportion from the others |
| Q Statistic | Formula based on condition totals and subject totals that approximately follows a chi-squared distribution under the null hypothesis |
| Degrees of Freedom (k-1) | Reference degrees of freedom equal the number of conditions minus one |
| Condition Totals (T_j) | Column totals representing the number of successes in each condition |
| Subject Totals (t_i) | Row totals representing the number of successful responses for each subject |
| Applications | |
| Medical Research | Compares treatment effects at multiple time points within the same patient cohort |
| Psychology | Tests whether repeated binary responses differ across experimental conditions |
| Quality Control | Compares pass and fail rates across shifts, batches, or machines |
| Social Sciences | Analyses repeated survey outcomes across multiple waves of the same respondents |
| Post-hoc Analysis | |
| Post-hoc Pairwise Tests | Pairwise McNemar tests that identify which conditions differ when the overall Q is significant |
| Bonferroni Correction | Divides the alpha level by the number of comparisons to control the family-wise error rate |
| In R and Python | |
| R friedman.test / cochrans_q | In R, Cochran's Q can be computed via friedman.test or the cochrans_q helper |
| Python scipy.stats.friedmanchisquare | In Python, Cochran's Q is reproduced by scipy.stats.friedmanchisquare on binary data |
| Caveats | |
| Sample Size Sensitivity | With few subjects whose responses vary, the chi-squared approximation is unreliable and power is low, so exact or permutation versions of the test are preferable |