Psy 242
Handout for ANOVA and post-hoc tests
Statistical Inference:
“Just by chance” means just due to sampling error, when no real difference or real relationship exists in the population.
Statistical tests of significance are used to answer the question “How often would I get this result just by chance?”
How do all these statistical tests work?
Basically, each of them calculates a ratio based on your sample data. The ratios are different for the various test statistics, but in general you can think of them as being: effect / error.
If the null hypothesis is true (Null Hypothesis = “There is no real effect”), then statistical theory can tell us exactly how probable a certain value of our test statistic is. If the value of our test statistic that we calculate from our sample is very unlikely to happen when the null hypothesis is true, then we reject the null hypothesis. In other words, we conclude that “It is not true that there is no real effect.”
How do we know what the probability of getting a certain value of a test statistic is? We have to refer to the sampling distribution of that statistic. Take the t statistic for example. We know (from statistical theory) that if the null hypothesis is true, then the sampling distribution of t will have a mean of zero. Values of t that are close to zero are very likely to happen when the null hypothesis is true. Values far from zero are very unlikely. You can use a table in a statistics book, or use a computer stats package to calculate exactly how often a given value of t occurs when the null hypothesis is true. The probability of getting a value of t that far from zero is called p. When p is less than .05, we say that this value of t is so unlikely to happen when the null hypothesis is true, that we conclude that the null hypothesis must be false. Then we say that the difference between the two means is statistically significant.
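As a minimal sketch of the idea above, here is how a computer stats package finds p from the sampling distribution of t. The t value and degrees of freedom are made up for illustration; SciPy's `stats.t.sf` gives the probability in one tail of the t distribution.

```python
# Hypothetical example: how likely is a t value at least this far from
# zero when the null hypothesis is true? (two-tailed p, using SciPy)
from scipy import stats

t_value = 2.5   # an illustrative t statistic (made up for this example)
df = 18         # degrees of freedom, e.g. two groups of 10: 10 + 10 - 2

# Under the null hypothesis, the sampling distribution of t is centered
# on zero. sf() gives the one-tail probability; double it for two tails.
p = 2 * stats.t.sf(abs(t_value), df)
print(round(p, 4))   # p is below .05, so we would reject the null
```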
A general rule of thumb for how all of these tests of significance work is:
The larger this ratio is, the less likely it is that what we observed happened just by chance.
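The effect / error idea can be made concrete with the independent-samples t test, where t = (difference between the two means) / (standard error of that difference). The data below are made up; the hand computation is checked against SciPy's built-in test.

```python
# A sketch (with made-up data) of the effect/error ratio behind the
# independent-samples t test.
import numpy as np
from scipy import stats

group1 = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
group2 = np.array([3.0, 4.0, 5.0, 6.0, 7.0])

effect = group1.mean() - group2.mean()   # the observed effect: 2.0

# Standard error of the difference (pooled, equal-variance formula).
n1, n2 = len(group1), len(group2)
sp2 = ((n1 - 1) * group1.var(ddof=1) +
       (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
error = np.sqrt(sp2 * (1 / n1 + 1 / n2))

t_by_hand = effect / error                     # effect / error
t_scipy, p = stats.ttest_ind(group1, group2)   # SciPy's version agrees
print(t_by_hand, t_scipy)                      # both are 2.0
```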
Comparing two means: t tests
Comparing more than two means: ANOVA (F test)
The F test is a generalization of the t test that can compare more than two means at once. When you are comparing just two means, either the F test (also called ANOVA) or a t test can be used. In fact, if you square the value of t it will be exactly the value of F for the same data; F = t².
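You can verify F = t² directly with any stats package. Here is a quick check in Python with made-up data, running both tests on the same two groups:

```python
# Quick check (made-up data) that F = t**2 when comparing two means.
import numpy as np
from scipy import stats

group1 = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
group2 = np.array([3.0, 4.0, 5.0, 6.0, 7.0])

t, _ = stats.ttest_ind(group1, group2)   # t test
f, _ = stats.f_oneway(group1, group2)    # one-way ANOVA (F test)
print(t ** 2, f)                         # the two values match
```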
With ANOVA, not only can you compare more than two means at once, you can also test for the effects of more than one independent variable at once.
You can also test for the interaction of two or more variables. An interaction is when the effect of one variable changes depending on the level of a second variable. If, for example, a study found that married men live longer than single men, but unmarried women live longer than married women, there would be an interaction of the effects of marriage and gender on the dependent variable longevity. Being married increases longevity for men, but decreases it for women. An ANOVA can be used to test whether an interaction is statistically significant.
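The marriage-and-gender example can be put in numbers. The cell means below are invented for illustration; the point is that the simple effect of marriage differs between the genders, which is exactly what an interaction is.

```python
# Numeric sketch of the marriage x gender interaction described above,
# using made-up mean longevities (in years) for the four cells:
#               married   single
#   men            78       74
#   women          76       80
mean_men_married, mean_men_single = 78.0, 74.0
mean_women_married, mean_women_single = 76.0, 80.0

# Simple effect of marriage for each gender:
effect_for_men = mean_men_married - mean_men_single        # +4 years
effect_for_women = mean_women_married - mean_women_single  # -4 years

# An interaction means these simple effects differ; here they even have
# opposite signs, which would show up as crossing lines on a plot.
interaction = effect_for_men - effect_for_women
print(effect_for_men, effect_for_women, interaction)
```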
So what statistical tests do we usually use to analyze the results of an experiment?
SPSS HOWTO - Between-Subjects Design:
SPSS HOWTO - Within-Subjects Design:
What are post-hoc tests and why are they necessary?
An ANOVA can only tell us that there is some significant difference among the means somewhere – it cannot tell us which pairs of means are significantly different. For that, we must use some sort of post-hoc (“after the fact”) test to compare the individual pairs of means we are interested in. We could simply run a separate t test for each pair of means we want to compare, so why not? Because running more than one statistical test of significance on the same set of data increases the chance that we will reject the null hypothesis when we should not. The post-hoc tests in SPSS for between-subjects designs correct for this. Unfortunately, SPSS does not do post-hoc adjustments for within-subjects designs. A good rule of thumb is to specify ahead of time which paired comparisons or contrasts you plan to do. Then the comparisons are planned rather than post-hoc, and therefore less problematic.
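The inflation problem above is easy to see with arithmetic: if each test has a .05 chance of a false positive and you run k independent tests, the chance of at least one false positive is 1 − (1 − .05)^k. A simple (and conservative) fix, the Bonferroni correction, divides alpha by the number of tests:

```python
# Why many t tests on the same data are risky: the familywise chance of
# at least one Type I error grows with the number of tests k.
alpha = 0.05

for k in (1, 3, 6):  # e.g. 6 pairwise comparisons among 4 means
    familywise = 1 - (1 - alpha) ** k        # chance of >= 1 false positive
    bonferroni = alpha / k                   # adjusted per-test alpha
    print(k, round(familywise, 3), round(bonferroni, 4))
```

With six comparisons, the familywise error rate is already about .26, more than five times the nominal .05 level.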