27  One-Way ANOVA

One-way ANOVA (Analysis of Variance) is a statistical technique used to compare the means of three or more independent (unrelated) groups to determine whether there are statistically significant differences among the group means. It extends the two-sample t-test to more than two groups, allowing such comparisons in a single test without inflating the risk of a Type I error (incorrectly rejecting the null hypothesis).

Purpose

The primary purpose of a one-way ANOVA is to test if at least one group mean is different from the others, which suggests that at least one treatment or condition has an effect that is not common to all groups.

27.1 Assumptions

One-way ANOVA makes several key assumptions:

  1. Independence of Observations: Each group’s observations must be independent of the observations in other groups.
  2. Normality: Data in each group should be approximately normally distributed.
  3. Homogeneity of Variances: All groups should have the same variance, often assessed with Levene’s test for equality of variances.

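These assumptions can be checked programmatically. The sketch below (assuming SciPy is available) applies the Shapiro-Wilk test for normality and Levene’s test for homogeneity of variances to the example data from Section 27.5:

```python
# Sketch: checking the normality and equal-variance assumptions with SciPy,
# using the three selection-method groups from the worked example.
from scipy import stats

group_a = [11, 15, 18, 19, 22]   # Emp Referral
group_b = [17, 18, 21, 22, 27]   # Job Portals
group_c = [15, 16, 18, 19, 22]   # Consultancy

# Normality within each group (Shapiro-Wilk; H0: the data are normal)
for name, g in [("A", group_a), ("B", group_b), ("C", group_c)]:
    w, p_norm = stats.shapiro(g)
    print(f"Shapiro-Wilk group {name}: W = {w:.3f}, p = {p_norm:.3f}")

# Homogeneity of variances (Levene's test; H0: all variances are equal)
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene: W = {stat:.3f}, p = {p:.3f}")
```

A large p-value in each test is consistent with the assumption holding; a small p-value (below the chosen α) flags a potential violation.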
27.2 Hypotheses

The hypotheses for a one-way ANOVA are formulated as:

  • Null Hypothesis (H₀): The means of all groups are equal, implying no effect of the independent variable on the dependent variable across the groups.
  • Alternative Hypothesis (H₁): At least one group mean is different from the others, suggesting an effect of the independent variable.

27.3 Calculations

The analysis involves several key calculations:

  • Total Sum of Squares (SST): Measures the total variability in the dependent variable.
  • Sum of Squares Between (SSB): Reflects the variability due to differences between the group means.
  • Sum of Squares Within (SSW): Captures the variability within each group.
  • Degrees of Freedom (DF): Varies for each sum of squares; DF between = \(k - 1\) (where \(k\) is the number of groups) and DF within = \(N - k\) (where \(N\) is the total number of observations).
  • Mean Squares: Each sum of squares is divided by its respective degrees of freedom to obtain mean squares (MSB and MSW).
  • F-statistic: The ratio of MSB to MSW, which follows an F-distribution under the null hypothesis.
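The calculations above can be sketched in plain Python for an arbitrary number of groups:

```python
# Sketch: the one-way ANOVA decomposition computed by hand.
def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    N, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / N
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations around each group's own mean
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    msb = ssb / (k - 1)   # mean square between, df = k - 1
    msw = ssw / (N - k)   # mean square within,  df = N - k
    return ssb, ssw, msb, msw, msb / msw
```

Running this on the data from Section 27.5 reproduces the values derived by hand there (SSB ≈ 43.333, SSW = 162, F ≈ 1.605).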

27.4 Interpretation

The result of a one-way ANOVA is typically reported as an F-statistic and its corresponding p-value. The F-statistic indicates whether the variability among the group means is large enough, relative to the variability within groups, to be considered statistically significant:

  • If the F-statistic is larger than the critical value (or if the p-value is less than the significance level, typically 0.05), the null hypothesis is rejected, indicating significant differences among the means.
  • If the F-statistic is smaller than the critical value, the null hypothesis is not rejected, suggesting no significant difference among the group means.
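The p-value side of this decision rule can be sketched with SciPy’s F-distribution; the numbers plugged in below come from the worked example in Section 27.5:

```python
# Sketch: p-value based decision, assuming F and the degrees of freedom
# have already been computed (values from the Section 27.5 example).
from scipy import stats

F, df1, df2, alpha = 1.605, 2, 12, 0.05
p_value = stats.f.sf(F, df1, df2)   # right-tail area under F(df1, df2)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p = {p_value:.3f} -> {decision}")
```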

27.5 One-Way ANOVA Example Problem

A company wants to know the impact of three different selection methods on employee performance. An HR analyst chose 15 employees at random and collected the sales volume reached by each employee, with 5 employees drawn from each selection method. The data obtained are given below.

No.   Emp Referral   Job Portals   Consultancy
1          11             17            15
2          15             18            16
3          18             21            18
4          19             22            19
5          22             27            22

At the 0.05 level of significance, do the selection methods have different effects on the performance of employees?

Calculations:

To perform a one-way ANOVA test to see if there are significant differences in the performance of employees based on their selection method (Emp Referral, Job Portals, Consultancy), we need to calculate several components including the group means, the overall mean, the sum of squares between groups (SSB), the sum of squares within groups (SSW), and the total sum of squares (SST). Additionally, we’ll calculate the F-statistic and compare it to the critical F-value from an F-distribution table.

Data Organization:

Group A (Emp Referral): \([11, 15, 18, 19, 22]\)
Group B (Job Portals): \([17, 18, 21, 22, 27]\)
Group C (Consultancy): \([15, 16, 18, 19, 22]\)

Calculate the Means for Each Group:

\[ \bar{x}_A = \frac{11 + 15 + 18 + 19 + 22}{5} = 17 \] \[ \bar{x}_B = \frac{17 + 18 + 21 + 22 + 27}{5} = 21 \] \[ \bar{x}_C = \frac{15 + 16 + 18 + 19 + 22}{5} = 18 \]

Calculate the Overall Mean:

\[ \bar{x} = \frac{11 + 15 + 18 + 19 + 22 + 17 + 18 + 21 + 22 + 27 + 15 + 16 + 18 + 19 + 22}{15} = 18.667 \]

Calculate Sum of Squares Between Groups (SSB):

\[ SSB = 5[(\bar{x}_A - \bar{x})^2 + (\bar{x}_B - \bar{x})^2 + (\bar{x}_C - \bar{x})^2] \] \[ = 5[(17 - 18.667)^2 + (21 - 18.667)^2 + (18 - 18.667)^2] \] \[ = 5[(-1.667)^2 + (2.333)^2 + (-0.667)^2] \] \[ = 5[2.778 + 5.444 + 0.444] = 5 \times 8.667 \] \[= 43.333 \]

Calculate Sum of Squares Within Groups (SSW):

\[ SSW = \sum_{i=1}^{5} (x_{Ai} - \bar{x}_A)^2 + \sum_{i=1}^{5} (x_{Bi} - \bar{x}_B)^2 + \sum_{i=1}^{5} (x_{Ci} - \bar{x}_C)^2 \] \[ = [(11-17)^2 + (15-17)^2 + (18-17)^2 + (19-17)^2 + (22-17)^2] \] \[ \;\;\;\; + [(17-21)^2 + (18-21)^2 + (21-21)^2 + (22-21)^2 + (27-21)^2] \] \[ \;\;\;\; + [(15-18)^2 + (16-18)^2 + (18-18)^2 + (19-18)^2 + (22-18)^2] \] \[ = [36 + 4 + 1 + 4 + 25 + 16 + 9 + 0 + 1 + 36 + 9 + 4 + 0 + 1 + 16] \] \[= 162 \]

Calculate the Total Sum of Squares (SST):

\[ SST = SSB + SSW = 43.333 + 162 = 205.333 \]

Calculate Mean Squares:

\[ \text{between groups: } MSB = \frac{SSB}{k-1} = \frac{43.333}{3-1} = 21.667 \] \[ \text{within groups: } MSW = \frac{SSW}{N-k} = \frac{162}{15-3} = 13.5 \]

Calculate F-statistic:

\[ F = \frac{MSB}{MSW} = \frac{21.667}{13.5} = 1.605 \]
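The hand computation can be cross-checked with scipy.stats.f_oneway (assuming SciPy is available):

```python
# Sketch: verifying the worked example with SciPy's one-way ANOVA.
from scipy import stats

referral    = [11, 15, 18, 19, 22]
job_portals = [17, 18, 21, 22, 27]
consultancy = [15, 16, 18, 19, 22]

result = stats.f_oneway(referral, job_portals, consultancy)
print(f"F = {result.statistic:.3f}, p = {result.pvalue:.3f}")   # F = 1.605
```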

Determine the Degrees of Freedom:

  1. Degrees of freedom for the numerator (df1): This corresponds to the number of groups minus one. With three groups (Emp Referral, Job Portals, Consultancy), \(df1 = 3 - 1 = 2\).
  2. Degrees of freedom for the denominator (df2): This corresponds to the total number of observations minus the number of groups. With 15 employees and 3 groups, \(df2 = 15 - 3 = 12\).
  3. Significance level (α): Typically set at 0.05, implying a 95% confidence level in the results.

27.5.1 Critical F-value Interpretation

The critical value is read from an F-table at the intersection of \(df1 = 2\) and \(df2 = 12\) for \(α = 0.05\). Such tables are available in statistics textbooks and online resources.

The critical F-value for \(df1 = 2\), \(df2 = 12\) at \(α = 0.05\) is approximately 3.89. Since \(1.605 < 3.89\), we fail to reject the null hypothesis, concluding that the selection method has no significant effect on employee performance at the 0.05 significance level.

This means that, based on the ANOVA results, the different selection methods do not have a statistically significant impact on employee sales performance.
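Instead of reading a printed table, the critical value can be computed directly from SciPy’s F-distribution:

```python
# Sketch: computing the critical F-value rather than looking it up in a table.
from scipy import stats

crit = stats.f.ppf(0.95, dfn=2, dfd=12)   # upper 5% point of F(2, 12)
print(round(crit, 2))                     # 3.89
```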

27.6 One-Way ANOVA Test using R and Python

27.7 Example Research Articles on ANOVA

  1. Factors Affecting Customer Satisfaction in Fast Food Restaurant “Jollibee” during the COVID-19 Pandemic — Sustainability, 2022.
  2. The Influence of Perceived Value, Customer Satisfaction, and Trust on Loyalty in Entertainment Platforms — Applied Sciences, 2024.

Summary

Foundations

  • One-Way ANOVA: A test that compares the means of three or more independent groups defined by one categorical factor.
  • Single Factor: The grouping variable is a single categorical factor with two or more levels.

Assumptions

  • Independence of Observations: Observations must be independent between and within groups.
  • Normality Within Groups: The outcome should be approximately normally distributed within each group.
  • Homogeneity of Variances: The variances across groups should be approximately equal, tested with Levene’s or Bartlett’s test.

Hypotheses

  • Null Hypothesis: All group means are equal.
  • Alternative Hypothesis: At least one group mean differs from the others.

Computation

  • Sum of Squares Between (SSB): Measures variability of the group means around the overall mean; the between-group signal.
  • Sum of Squares Within (SSW): Measures variability within each group around its own mean; the within-group noise.
  • Total Sum of Squares (SST): The total variability of the outcome, equal to SSB plus SSW.
  • Mean Squares (MSB and MSW): Each sum of squares divided by its respective degrees of freedom.
  • Degrees of Freedom: df between = \(k - 1\), df within = \(N - k\), where \(k\) is the number of groups and \(N\) the total number of observations.
  • F-Statistic: The ratio of MSB to MSW, compared against an F-distribution with the given degrees of freedom.

Decision

  • Decision Rule: Reject H₀ when F exceeds the critical value, or when the p-value is below α.

In R and Python

  • R via aov(): Use aov(outcome ~ factor, data = df) and pass the object to summary() to read the F-statistic and p-value.
  • Python via ols and anova_lm: Use ols('outcome ~ C(factor)', data=df).fit() and statsmodels.stats.anova.anova_lm() to get the ANOVA table.