Psy 242
Handout for ANOVA and post-hoc tests
Statistical Inference:
“Just by chance” means just due to sampling error, when no real difference or real relationship exists in the population.
Statistical tests of significance are used to answer the question “How often would I get this result just by chance?”
How do all these statistical tests work?
Basically, each of them calculates a ratio based on your sample data. The ratios are different for the various test statistics, but in general you can think of them as being: effect / error.
If the null hypothesis is true (Null Hypothesis = “There is no real effect”), then statistical theory can tell us exactly how probable a certain value of our test statistic is. If the value of our test statistic that we calculate from our sample is very unlikely to happen when the null hypothesis is true, then we reject the null hypothesis. In other words, we conclude that “It is not true that there is no real effect.”
How do we know what the probability of getting a certain value of a test statistic is? We have to refer to the sampling distribution of that statistic. Take the t statistic for example. We know (from statistical theory) that if the null hypothesis is true, then the sampling distribution of t will have a mean of zero. Values of t that are close to zero are very likely to happen when the null hypothesis is true. Values far from zero are very unlikely. You can use a table in a statistics book, or use a computer stats package to calculate exactly how often a given value of t occurs when the null hypothesis is true. The probability of getting a value of t that far from zero is called p. When p is less than .05, we say that this value of t is so unlikely to happen when the null hypothesis is true, that we conclude that the null hypothesis must be false. Then we say that the difference between the two means is statistically significant.
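As a minimal sketch of the idea above, here is how a computer stats package finds p from the sampling distribution of t. The t value and degrees of freedom are made up for illustration; SciPy's `stats.t.sf` gives the probability in one tail of the t distribution.

```python
# Hypothetical example: how likely is a t value at least this far from
# zero when the null hypothesis is true? (two-tailed p, using SciPy)
from scipy import stats

t_value = 2.5   # an illustrative t statistic (made up for this example)
df = 18         # degrees of freedom, e.g. two groups of 10: 10 + 10 - 2

# Under the null hypothesis, the sampling distribution of t is centered
# on zero. sf() gives the one-tail probability; double it for two tails.
p = 2 * stats.t.sf(abs(t_value), df)
print(round(p, 4))   # p is below .05, so we would reject the null
```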
A general rule of thumb for how all of these tests of significance work is:
The larger this ratio is, the less likely it is that what we observed happened just by chance.
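The effect / error idea can be made concrete with the independent-samples t test, where t = (difference between the two means) / (standard error of that difference). The data below are made up; the hand computation is checked against SciPy's built-in test.

```python
# A sketch (with made-up data) of the effect/error ratio behind the
# independent-samples t test.
import numpy as np
from scipy import stats

group1 = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
group2 = np.array([3.0, 4.0, 5.0, 6.0, 7.0])

effect = group1.mean() - group2.mean()   # the observed effect: 2.0

# Standard error of the difference (pooled, equal-variance formula).
n1, n2 = len(group1), len(group2)
sp2 = ((n1 - 1) * group1.var(ddof=1) +
       (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
error = np.sqrt(sp2 * (1 / n1 + 1 / n2))

t_by_hand = effect / error                     # effect / error
t_scipy, p = stats.ttest_ind(group1, group2)   # SciPy's version agrees
print(t_by_hand, t_scipy)                      # both are 2.0
```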
Comparing two means: t tests
Comparing more than two means: ANOVA (F test)
The F test is a generalization of the t test that can compare more than two means at once. When you are comparing just two means, either the F test (also called ANOVA) or a t test can be used. In fact, if you square the value of t it will be exactly the value of F for the same data; F = t².
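You can verify F = t² directly with any stats package. Here is a quick check in Python with made-up data, running both tests on the same two groups:

```python
# Quick check (made-up data) that F = t**2 when comparing two means.
import numpy as np
from scipy import stats

group1 = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
group2 = np.array([3.0, 4.0, 5.0, 6.0, 7.0])

t, _ = stats.ttest_ind(group1, group2)   # t test
f, _ = stats.f_oneway(group1, group2)    # one-way ANOVA (F test)
print(t ** 2, f)                         # the two values match
```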
With ANOVA, not only can you compare more than two means at once, you can also test for the effects of more than one independent variable at once.
You can also test for the interaction of two or more variables. An interaction is when the effect of one variable changes depending on the level of a second variable. If, for example, a study found that married men live longer than single men, but unmarried women live longer than married women, there would be an interaction of the effects of marriage and gender on the dependent variable longevity. Being married increases longevity for men, but decreases it for women. An ANOVA can be used to test whether an interaction is statistically significant.
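The marriage-and-gender example can be put in numbers. The cell means below are invented for illustration; the point is that the simple effect of marriage differs between the genders, which is exactly what an interaction is.

```python
# Numeric sketch of the marriage x gender interaction described above,
# using made-up mean longevities (in years) for the four cells:
#               married   single
#   men            78       74
#   women          76       80
mean_men_married, mean_men_single = 78.0, 74.0
mean_women_married, mean_women_single = 76.0, 80.0

# Simple effect of marriage for each gender:
effect_for_men = mean_men_married - mean_men_single        # +4 years
effect_for_women = mean_women_married - mean_women_single  # -4 years

# An interaction means these simple effects differ; here they even have
# opposite signs, which would show up as crossing lines on a plot.
interaction = effect_for_men - effect_for_women
print(effect_for_men, effect_for_women, interaction)
```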
So what statistical tests do we usually use to analyze the results of an experiment?
SPSS HOWTO - Between-Subjects Design:
SPSS HOWTO - Within-Subjects Design:
What are post-hoc tests and why are they necessary?
An ANOVA can only tell us that there is some significant difference among the means somewhere – it cannot tell us which pairs of means are significantly different. For that, we must use some sort of post-hoc (“after the fact”) test to compare the individual pairs of means we are interested in. We could simply run a separate t test for each pair of means we want to compare, so why not? Because running more than one statistical test of significance on the same set of data increases the chance that we will reject the null hypothesis when we should not. The post-hoc tests in SPSS for between-subjects designs correct for this. Unfortunately, SPSS does not do post-hoc adjustments for within-subjects designs. A good rule of thumb is to specify ahead of time which paired comparisons or contrasts you plan to do. Then the comparisons are planned rather than post-hoc, and therefore less problematic.
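The inflation problem above is easy to see with arithmetic: if each test has a .05 chance of a false positive and you run k independent tests, the chance of at least one false positive is 1 − (1 − .05)^k. A simple (and conservative) fix, the Bonferroni correction, divides alpha by the number of tests:

```python
# Why many t tests on the same data are risky: the familywise chance of
# at least one Type I error grows with the number of tests k.
alpha = 0.05

for k in (1, 3, 6):  # e.g. 6 pairwise comparisons among 4 means
    familywise = 1 - (1 - alpha) ** k        # chance of >= 1 false positive
    bonferroni = alpha / k                   # adjusted per-test alpha
    print(k, round(familywise, 3), round(bonferroni, 4))
```

With six comparisons, the familywise error rate is already about .26, more than five times the nominal .05 level.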