34  Mann-Whitney U Test

The Mann-Whitney U test (Henry B. Mann & Donald R. Whitney, 1947), also known as the Wilcoxon rank-sum test, is a non-parametric test for comparing two independent groups when normality of the data cannot be assumed. It is often used as an alternative to the independent-samples t-test when data are not normally distributed.

Purpose

The Mann-Whitney U test applies when two independent groups are compared on a dependent variable that is ordinal, or continuous but not normally distributed. It is the non-parametric alternative to the independent two-sample t-test.

How it Works

  • The test works by ranking all the values from both groups together. The ranks are then used to calculate the U statistic (a measure of the number of times a score from one group precedes a score from the other group).

  • The test essentially assesses whether one group tends to have higher or lower values than the other, without assuming a specific distribution of the scores.
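The counting idea behind U can be sketched directly, without any ranking. The snippet below is a minimal illustration, not a library routine: the function name `count_precedences` and the two small samples are hypothetical, and giving half a point to tied pairs is an assumed (though common) convention.

```python
def count_precedences(group1, group2):
    """Count pairs (a, b) in which a value from group1 precedes
    (is smaller than) a value from group2. Tied pairs count as 0.5,
    which is an assumed convention for this sketch."""
    u = 0.0
    for a in group1:
        for b in group2:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Illustrative values only (not from the chapter's example).
x = [3, 5, 8]
y = [4, 6, 7, 9]

u_xy = count_precedences(x, y)  # times an x value precedes a y value
u_yx = count_precedences(y, x)  # times a y value precedes an x value
print(u_xy, u_yx)  # 8.0 4.0
```

The two counts always add up to the number of pairs, here 3 × 4 = 12, so a very small or very large count signals that one group tends to sit below the other.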

34.1 Assumptions

The Mann-Whitney U test is based on the following assumptions:

  1. Independence of Samples: The samples from the two groups must be independent of each other.
  2. Ordinal or Continuous Data: The data must be at least ordinal so that the values can be ranked; they do not need to be normally distributed.
  3. Similarity of Shape: The distributions of the two groups should have the same shape, allowing for a difference in medians.

34.1.1 Hypotheses

The hypotheses for the Mann-Whitney U test are framed as follows:

  • Null Hypothesis (H₀): There is no difference in the medians of the two groups.
  • Alternative Hypothesis (H₁): There is a difference in the medians of the two groups.

34.2 Formula

The U statistic is calculated by first ranking all the data from both groups together. Each data point gets a rank, and the ranks for each group are summed. The U statistic is then computed using these rank sums. The formula for U is:

\[ U = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1 \]

Where:

  • \(n_1\) and \(n_2\) are the sample sizes of the two groups.
  • \(R_1\) is the sum of the ranks in the first group.
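Because a U value is computed for each group and the smaller one is used, it may help to write the formula once per group. The subscripted names \(U_1\) and \(U_2\), and \(R_2\) for the sum of the ranks in the second group, are notation introduced here for illustration:

\[ U_1 = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1, \qquad U_2 = n_1n_2 + \frac{n_2(n_2+1)}{2} - R_2 \]

Since the ranks of all \(n_1 + n_2\) observations sum to \((n_1+n_2)(n_1+n_2+1)/2\), the two values are tied together by

\[ U_1 + U_2 = n_1n_2 \]

so once one U is known, the other follows by subtraction, and the identity doubles as an arithmetic check.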

34.2.1 Calculation Steps

  1. Combine all observations from both groups into a single dataset.
  2. Rank all observations from the lowest to the highest, handling ties by assigning to each tied value the average of the ranks they would have otherwise occupied.
  3. Calculate the sum of ranks for each group.
  4. Use the sum of ranks to compute the U statistic for each group.
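The four steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the helper names `average_ranks` and `u_statistics` and the small samples are hypothetical, and the U formula is the one given in Section 34.2, applied once per group.

```python
def average_ranks(values):
    """Rank values from lowest to highest (1-based).
    Tied values share the average of the ranks they would occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def u_statistics(group1, group2):
    """Return (R1, R2, U1, U2) following the four calculation steps."""
    n1, n2 = len(group1), len(group2)
    pooled = list(group1) + list(group2)      # step 1: combine
    ranks = average_ranks(pooled)             # step 2: rank, averaging ties
    r1 = sum(ranks[:n1])                      # step 3: rank sum per group
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1     # step 4: U per group
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


# Illustrative values only; the two 5s form a tie and each get rank 2.5.
x = [3, 5, 8]
y = [5, 6, 7, 9]
r1, r2, u1, u2 = u_statistics(x, y)
print(r1, r2, u1, u2)  # 9.5 18.5 8.5 3.5
```

The two U values add up to 3 × 4 = 12, the number of cross-group pairs, which is a quick way to catch ranking mistakes.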

34.2.2 Interpretation

The smaller of the two U values is used as the test statistic. It is compared with a critical value from a Mann-Whitney U table or, for large samples, converted to a p-value using an approximation. If the calculated U is less than or equal to the critical value from the table, or if the p-value is less than the chosen alpha level, the null hypothesis is rejected, indicating a significant difference between the groups.
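For large samples, the approximation usually meant here is a normal approximation: under the null hypothesis U has mean \(n_1n_2/2\) and, when there are no ties, variance \(n_1n_2(n_1+n_2+1)/12\). The sketch below applies it with the standard library only. The function name and the input values are hypothetical, and the sketch deliberately omits the tie and continuity corrections that statistical packages typically apply, so its p-value can differ slightly from software output.

```python
import math


def normal_approx_two_sided_p(u, n1, n2):
    """Two-sided p-value for U from the large-sample normal approximation.
    Assumes no ties and applies no continuity correction."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical inputs: U = 50 with 15 observations per group.
alpha = 0.05
z, p = normal_approx_two_sided_p(u=50, n1=15, n2=15)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(round(z, 3), round(p, 4), decision)
```

With small samples, such as six observations per group, the tabulated critical value or an exact p-value is the safer choice.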

34.3 Example Problem

Consider two groups of patients treated with different methods to reduce symptoms. Each group consists of 6 patients. Their scores are:

  • Group A: 120, 101, 130, 115, 100, 130
  • Group B: 85, 90, 110, 115, 120, 125

Hypotheses:

  • Null Hypothesis (H₀): The median symptom reduction is equal between both treatments.
  • Alternative Hypothesis (H₁): The median symptom reduction differs between the treatments.
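As a hand check, the calculation steps from Section 34.2.1 can be applied to these scores. This is a worked sketch using the chapter's formula for U; the subscripts A and B are labels introduced here for the two groups.

  • Pooled and sorted: 85 (B), 90 (B), 100 (A), 101 (A), 110 (B), 115 (A), 115 (B), 120 (A), 120 (B), 125 (B), 130 (A), 130 (A)
  • Ranks, with ties averaged: 1, 2, 3, 4, 5, 6.5, 6.5, 8.5, 8.5, 10, 11.5, 11.5

\[ R_A = 3 + 4 + 6.5 + 8.5 + 11.5 + 11.5 = 45, \qquad R_B = 1 + 2 + 5 + 6.5 + 8.5 + 10 = 33 \]

\[ U_A = 6 \cdot 6 + \frac{6 \cdot 7}{2} - 45 = 12, \qquad U_B = 6 \cdot 6 + \frac{6 \cdot 7}{2} - 33 = 24 \]

The two values sum to \(6 \cdot 6 = 36\), as they should. The smaller value, U = 12, is the test statistic to be compared with the critical value for \(n_1 = n_2 = 6\) at the chosen alpha level.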


34.4 Mann-Whitney U Test using R and Python

NOTE:

  • In R, the test is performed with the wilcox.test function, which calls it the Wilcoxon rank sum test when it is applied to two independent samples. The naming can cause some confusion, but the two are the same test.
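A minimal Python sketch of the test on the example data from Section 34.3, assuming SciPy is installed. The variable names are illustrative. One point to watch: as far as recent SciPy versions are concerned, the reported statistic is the U value for the first sample computed as \(R_1 - n_1(n_1+1)/2\), which is \(n_1n_2\) minus the value given by the formula in Section 34.2, so the smaller U is recovered explicitly below.

```python
from scipy.stats import mannwhitneyu

# Scores from the example problem in Section 34.3.
group_a = [120, 101, 130, 115, 100, 130]
group_b = [85, 90, 110, 115, 120, 125]

# Two-sided test: H1 is that the groups differ in either direction.
result = mannwhitneyu(group_a, group_b, alternative="two-sided")

n1, n2 = len(group_a), len(group_b)
# SciPy's statistic refers to the first sample; take the smaller of the
# two complementary U values to match the table-based decision rule.
u_small = min(result.statistic, n1 * n2 - result.statistic)

alpha = 0.05
print("U (first sample):", result.statistic)
print("smaller U:", u_small)
print("p-value:", result.pvalue)
print("reject H0" if result.pvalue < alpha else "fail to reject H0")
```

Because these data contain ties, SciPy is expected to fall back on its asymptotic method rather than an exact one, so the p-value is approximate for samples this small. In R, the corresponding call would be wilcox.test(group_a, group_b) on two numeric vectors.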

This test allows researchers and analysts to assess the evidence against the null hypothesis in a manner that is robust to non-normal data distributions.


Summary

| Concept | Description |
|---|---|
| Foundations | |
| Mann-Whitney U Test | A non-parametric test that compares two independent groups based on the ranks of their values |
| Also Known As Wilcoxon Rank-Sum | Equivalent to the Wilcoxon rank-sum test; the two names refer to the same procedure |
| Non-parametric Counterpart of t-test | Used in place of the independent-samples t-test when normality cannot be assumed |
| Assumptions | |
| Independence of Samples | The two groups must be independent of each other |
| Ordinal or Continuous Outcome | The outcome must be at least ordinal so the values from both groups can be ranked together |
| Similarity of Distribution Shape | The two groups should have similarly shaped distributions, allowing for a difference in medians |
| Hypotheses | |
| Null Hypothesis | States that the two populations have the same median (or, more strictly, the same distribution) |
| Alternative Hypothesis | States that one population tends to produce larger values than the other |
| Computation | |
| Combined Ranking | All observations from both groups are pooled and ranked from smallest to largest |
| Sum of Ranks | Ranks are added separately within each group; rank sums drive the test statistic |
| U Statistic Formula | \(U = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1\), where \(R_1\) is the sum of ranks in group 1 |
| Handling Ties | Tied values are assigned the average of the ranks they would have occupied |
| Decision | |
| Decision Rule | Reject H₀ when U is small enough or when the corresponding p-value is below alpha |
| In R and Python | |
| R via wilcox.test() | Use wilcox.test(x, y) on two vectors in R; the function defaults to the two-sample form |
| Python via mannwhitneyu() | Use scipy.stats.mannwhitneyu(x, y, alternative='two-sided') in Python |