(This article was first published on R-exercises, and kindly contributed to R-bloggers)
When we want to know whether the means of two groups differ statistically, we use the t test. When we have more than two groups we cannot use the t test; instead we use analysis of variance (ANOVA). In one way ANOVA we have one continuous dependent variable and one independent grouping variable, or factor. When there are only two groups, the equal-variance t test and one way ANOVA are equivalent.
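The two-group equivalence can be checked numerically. The sketch below uses made-up data (the names `d`, `y` and `g` are purely illustrative) and assumes the pooled-variance form of the t test, since R's default Welch test would not match ANOVA exactly.

```r
# Synthetic two-group data (made up for illustration only)
set.seed(42)
d <- data.frame(
  y = c(rnorm(20, mean = 10), rnorm(20, mean = 11)),
  g = factor(rep(c("a", "b"), each = 20))
)

# Pooled-variance t test; var.equal = TRUE is needed because
# R's default is the Welch test, which does not match ANOVA exactly
tt <- t.test(y ~ g, data = d, var.equal = TRUE)

# One way ANOVA on the same data
fit <- aov(y ~ g, data = d)
f_value <- summary(fit)[[1]][["F value"]][1]
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

# The F statistic equals the squared t statistic, and the p-values agree
all.equal(unname(tt$statistic)^2, f_value)
all.equal(tt$p.value, p_anova)
```

Both comparisons should return TRUE, because with two groups the ANOVA F statistic is the square of the pooled t statistic.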
For our one way ANOVA results to be valid there are several assumptions that need to be satisfied. These assumptions are listed below.
The dependent variable is required to be continuous.
The independent variable is required to be categorical with two or more categories.
The dependent and independent variables have values for each row of data.
Observations in each group are independent.
The dependent variable is approximately normally distributed in each group.
There is approximate equality of variance in all the groups.
There should be no outliers in any group.
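The distributional assumptions in this list can be checked with functions from base R. The following is a minimal sketch on synthetic data, not the solution to the exercises; the data frame `d` and its columns are invented, and Bartlett's test is only one of several possible checks for equal variance.

```r
# Synthetic three-group data (made up for illustration only)
set.seed(1)
d <- data.frame(
  y = c(rnorm(15, mean = 50, sd = 5),
        rnorm(15, mean = 55, sd = 5),
        rnorm(15, mean = 60, sd = 5)),
  g = factor(rep(c("g1", "g2", "g3"), each = 15))
)

# Normality within each group: Shapiro-Wilk p-value per group
shapiro_p <- tapply(d$y, d$g, function(x) shapiro.test(x)$p.value)
shapiro_p

# Equality of variance across groups: Bartlett's test
bartlett_p <- bartlett.test(y ~ g, data = d)$p.value
bartlett_p

# Outliers: points lying beyond the boxplot whiskers in each group
outliers <- tapply(d$y, d$g, function(x) boxplot.stats(x)$out)
outliers
```

A small p-value from either test is evidence against the corresponding assumption. Calling `boxplot(y ~ g, data = d)` draws the same whisker rule used by `boxplot.stats`.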
When our data shows non-normality, unequal variance or outliers, we can either transform the data or use a non-parametric test such as Kruskal-Wallis. Note that Kruskal-Wallis does not require normally distributed data, but it still requires similar variance across the groups.
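Both options can be sketched in a few lines. The example assumes right-skewed synthetic data and a log transformation, which is a common choice for skewed positive values but not the only one; all object names are illustrative.

```r
# Synthetic right-skewed data in three groups (made up for illustration)
set.seed(7)
d <- data.frame(
  y = c(rlnorm(15, meanlog = 5, sdlog = 1),
        rlnorm(15, meanlog = 6, sdlog = 1),
        rlnorm(15, meanlog = 7, sdlog = 1)),
  g = factor(rep(c("g1", "g2", "g3"), each = 15))
)

# Option 1: transform, then re-check the assumptions on the new scale
d$log_y <- log(d$y)
tapply(d$log_y, d$g, function(x) shapiro.test(x)$p.value)
bartlett.test(log_y ~ g, data = d)

# Option 2: rank-based Kruskal-Wallis test on the untransformed values
kw <- kruskal.test(y ~ g, data = d)
kw

# Ranks are unchanged by a monotone transformation, so the test
# statistic is the same on the log scale
kw_log <- kruskal.test(log_y ~ g, data = d)
all.equal(unname(kw$statistic), unname(kw_log$statistic))
```

Because Kruskal-Wallis works on ranks, transforming the data first does not change its result; the transformation only matters for the parametric ANOVA.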
For this exercise we will use data on patients having stomach, colon, ovary, bronchus, or breast cancer. The objective of the study was to identify whether the number of days a patient survived was influenced by the organ affected. Our dependent variable is Survival, measured in days. Our independent variable is Organ. The data is available at http://lib.stat.cmu.edu/DASL/Datafiles/CancerSurvival.html and a cancer-survival file has been uploaded.
Solutions to these exercises can be found here
Load the data into R
Create summary statistics for each organ
Check if we have any outliers using a boxplot
Check for normality using the Shapiro-Wilk test
Check for equality of variance
Transform your data and check for normality and equality of variance.
Run a one way ANOVA test
Perform a Tukey HSD post hoc test
Use a Kruskal-Wallis test
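As a starting point for the exercises, here is a rough sketch of the summary, ANOVA and post hoc steps. It runs on synthetic values rather than the real study data: only the column names Survival and Organ come from the description above, while the file name in the comment, the simulated numbers and the group sizes are assumptions made for illustration.

```r
# With the real data you would load the file instead, for example:
# cancer <- read.csv("cancer-survival.csv")  # hypothetical file name

# Synthetic stand-in with the same two columns (values are made up)
set.seed(123)
organs <- c("Stomach", "Colon", "Ovary", "Bronchus", "Breast")
cancer <- data.frame(
  Survival = rlnorm(50, meanlog = 6, sdlog = 1),
  Organ = factor(rep(organs, each = 10))
)

# Summary statistics for each organ
aggregate(Survival ~ Organ, data = cancer,
          FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

# One way ANOVA: does mean survival differ between organs?
fit <- aov(Survival ~ Organ, data = cancer)
summary(fit)

# Tukey HSD post hoc test: all pairwise comparisons between organs
tukey <- TukeyHSD(fit)
tukey
```

With five organs the Tukey output contains ten pairwise comparisons, each with an adjusted p-value and a confidence interval for the difference in means.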