Psychology 242

Data Analysis for Surveys

 

This tutorial covers the following aspects of data analysis for surveys, with instructions for doing each analysis in SPSS:

  1. Reversing the coding of an item
  2. Combining items to define a “scale” that measures a variable
  3. Reliability analysis of a scale using Cronbach’s alpha
  4. Using a correlation coefficient to test a hypothesis about the relationship between two variables

 

The initial data file for the results of a survey should contain one row per participant and one column per question. I also assume that the responses to each question have been coded numerically, and that each question is either dichotomous (only two possible answers, coded as “0” and “1” in the data file) or measured on an ordinal, interval, or ratio scale. The methods described here are not appropriate for categorical (nominal) questions with more than two possible values.
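As a rough illustration outside SPSS, the layout described above can be sketched in Python, assuming each participant’s row is stored as a dictionary keyed by question name. The question names q1 to q3 and all of the responses are hypothetical, and is_dichotomous is an illustrative helper that only checks whether a question uses nothing but the codes 0 and 1.

```python
# Hypothetical survey data: one row (dict) per participant,
# one key (column) per question. Names and values are made up.
survey = [
    {"q1": 1, "q2": 0, "q3": 4},
    {"q1": 0, "q2": 1, "q3": 2},
    {"q1": 1, "q2": 1, "q3": 5},
]


def column(rows, name):
    """Return every participant's response to one question."""
    return [row[name] for row in rows]


def is_dichotomous(values):
    """True if a question only ever uses the codes 0 and 1."""
    return set(values) <= {0, 1}


print(column(survey, "q1"))                  # [1, 0, 1]
print(is_dichotomous(column(survey, "q2")))  # True
print(is_dichotomous(column(survey, "q3")))  # False (a hypothetical 1-5 rating item)
```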

 

A survey can be used either to estimate the proportion of a population that holds a certain opinion or to test for a relationship between two variables. In this tutorial, we will do the latter. Specifically, I will show how survey data could be used to test the following hypothesis: people who care more about the internal characteristics of a television show (who the writers, directors, and actors are) will be less likely to care about external factors (what else is on, what other people think of the show, etc.).

 

The data file used in this example is tv-survey.sav, which comes with SPSS as a sample data file. If you cannot find it on your computer, you may download it from the link provided with this handout. This file contains 7 variables (corresponding to 7 questions on a survey) and responses from 906 participants. Each variable is dichotomous, with values of 0 and 1 only. For the purposes of this example, we will assume that each variable represents a yes (1) or no (0) question about whether the participant watches TV shows for a given reason, with each item asking about a different reason. The items are:

  1. Any - Any reason
  2. Bored - There are no other popular shows on at that time
  3. Critics - The critics still give the show good reviews
  4. Peers - Other people still watch the show
  5. Writers - The original screenwriters stay with the show
  6. Directors - The original directors stay with the show
  7. Cast - The original cast stays with the show

 

Only the last three items seem to refer to things that are integral (internal) to the show itself. The rest seem to refer to external factors (except for item 1, which does not really fit in either category but which we will call “external” for now). Remember that our hypothesis was about a relationship between two variables: caring about the internal features of a show and caring about its external features. We will use a correlation coefficient to test whether the hypothesized relationship exists. But first we must define two scales to measure the two variables. Each scale will be defined as a combination of items from the survey. Once we have defined the scales and checked that they are reliable, we can calculate the correlation between the two scales to test our hypothesis. Here are the steps we will follow:

  1. Recode any items that are reverse-coded from the other items
  2. Define a scale (a new variable in SPSS) for each variable in our hypothesis
  3. Analyze the reliability of each scale
  4. Test our hypothesis by calculating the correlation of the two scales

 

1.  Recode reverse-coded items. In this dataset, all of the items are coded the same way: yes=1 and no=0, and higher numbers always mean “more likely to watch for this reason.” But what if item 4 had instead read “I am less likely to watch a show if other people still watch it, yes (1) or no (0)”? Then it would be reverse-coded relative to the other items: 0 and 1 would mean the opposite of what they mean for the other items. When you have reverse-coded items, you must transform the responses before you combine those items with the others to create a scale. For a 0/1 item, subtracting each response from 1 swaps the codes (0 becomes 1 and 1 becomes 0). Here are the steps you would follow in SPSS to reverse-code item 4:

·       Transform -> Compute

·       Click “reset”

·       In the “Target Variable” box, type peers_r

·       In the “Numeric Expression” box enter 1 - peers

·       Click “OK”

You should now see a new column in the SPSS data window labeled “peers_r” that contains the reverse-coded data for “peers.” We will not actually use this recoded item in the analysis, however, because the real “peers” item is not reverse-coded.
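Outside SPSS, the same reversal can be sketched in a few lines of Python. This is a minimal sketch, assuming responses are integers on a scale that runs from a known lowest code to a known highest code; reverse_code is a hypothetical helper name and the responses are made up. Each response x is replaced by (lowest + highest) - x, which for a 0/1 item is simply 1 - x.

```python
def reverse_code(values, lowest=0, highest=1):
    """Reverse-code responses on a scale running from lowest to highest.

    Each response x becomes (lowest + highest) - x, so the lowest code
    becomes the highest and vice versa. With the default 0/1 coding this
    is just 1 - x.
    """
    return [(lowest + highest) - x for x in values]


# Hypothetical answers to the reworded "peers" item (1 = yes, 0 = no).
peers = [1, 0, 0, 1, 1]
peers_r = reverse_code(peers)
print(peers_r)  # [0, 1, 1, 0, 0]

# The same rule works for a hypothetical item rated from 1 to 5.
print(reverse_code([1, 2, 5], lowest=1, highest=5))  # [5, 4, 1]
```

Reversing twice returns the original responses, which is a quick way to check that the lowest and highest codes were set correctly.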

 

2.  Define a scale for each variable in the hypothesis. We will define one scale for each of the two variables. Each scale will be the sum of the survey items related to that variable, that is, a linear combination in which every item has a weight of 1.

For the variable external:

·       Transform -> Compute

·       Click “reset”

·       In the “Target Variable” box, type external

·       In the “Numeric Expression” box enter any + bored + critics + peers (hint: you can use the arrow button to place the variable names into the equation)

·       Click “OK”

For the variable internal:

·       Transform -> Compute

·       Click “reset”

·       In the “Target Variable” box, type internal

·       In the “Numeric Expression” box enter writers + director + cast 

·       Click “OK”
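The two Compute steps above amount to adding up each participant’s answers over a set of items. A minimal Python sketch, assuming the responses are stored as one dictionary per participant with the SPSS variable names as keys (the names follow the Compute expressions above; the three participants’ responses are invented for illustration, not taken from tv-survey.sav):

```python
# Hypothetical responses from three participants (0 = no, 1 = yes).
participants = [
    {"any": 1, "bored": 1, "critics": 0, "peers": 1,
     "writers": 0, "director": 0, "cast": 1},
    {"any": 0, "bored": 0, "critics": 1, "peers": 0,
     "writers": 1, "director": 1, "cast": 1},
    {"any": 1, "bored": 0, "critics": 0, "peers": 0,
     "writers": 1, "director": 0, "cast": 0},
]

# Items making up each scale, matching the Compute expressions.
EXTERNAL_ITEMS = ["any", "bored", "critics", "peers"]
INTERNAL_ITEMS = ["writers", "director", "cast"]


def scale_score(row, items):
    """Unweighted sum of one participant's responses to a scale's items."""
    return sum(row[item] for item in items)


external = [scale_score(p, EXTERNAL_ITEMS) for p in participants]
internal = [scale_score(p, INTERNAL_ITEMS) for p in participants]
print(external)  # [3, 1, 1]
print(internal)  # [1, 3, 1]
```

With 0/1 items, the external scale can range from 0 to 4 and the internal scale from 0 to 3.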

 

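3.  Analyze the reliability of each scale. To be of any use, a measure must be reliable. We will test the internal consistency of the scales in our survey using a statistic called “Cronbach’s alpha.” Alpha is similar to a correlation coefficient: its maximum value is 1.0, and by a common rule of thumb a good (reliable) scale should have an alpha of .80 or higher. The following instructions are for analyzing the reliability of the “external” scale:

·       Analyze -> Scale -> Reliability Analysis

·       Move any, bored, critics, and peers into the “Items” box

·       Make sure the “Model” is set to “Alpha”

·       Click “OK”

Cronbach’s alpha is reported in the “Reliability Statistics” table of the output.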

Analysis of the reliability of the “internal” scale is left as an exercise for the reader.
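For reference, Cronbach’s alpha for a scale made of k items is usually written as follows, where σ²_i is the variance of item i and σ²_X is the variance of the total (summed) scale score:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{i}}{\sigma^2_{X}}\right)$$

The Python sketch below computes this formula directly. It is an illustration under stated assumptions, not a replica of SPSS’s procedure: it assumes complete data (no missing responses) and a total score that actually varies, stores the responses as one list per item, and uses sample variances throughout (the n - 1 divisor cancels in the ratio). The helper name cronbach_alpha and the item values are hypothetical. For 0/1 items like the ones in this survey, alpha is equivalent to the Kuder-Richardson formula 20 (KR-20).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items[i][j] is participant j's response to item i. Assumes at least
    two items, at least two participants, and no missing responses.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each participant's total score on the scale (the summed items).
    totals = [sum(responses) for responses in zip(*items)]
    # Sum of the individual item variances, and the variance of the totals.
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Three hypothetical 0/1 items answered by four participants.
item_a = [1, 1, 0, 0]
item_b = [1, 1, 0, 1]
item_c = [1, 0, 0, 0]
print(round(cronbach_alpha([item_a, item_b, item_c]), 2))  # 0.75
```

With these made-up items alpha comes out at 0.75, which would fall short of the .80 rule of thumb.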

 

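4.  Calculate the correlation coefficient between the two scales. Does the hypothesis predict a positive or a negative correlation between “internal” and “external”? Here are the steps for producing a scatterplot and calculating the correlation coefficient:

To produce the scatterplot:

·       Graphs -> Legacy Dialogs -> Scatter/Dot (in older versions of SPSS, Scatter/Dot is directly under the Graphs menu)

·       Choose “Simple Scatter” and click “Define”

·       Move internal into the “X Axis” box and external into the “Y Axis” box

·       Click “OK”

To calculate the correlation coefficient:

·       Analyze -> Correlate -> Bivariate

·       Move internal and external into the “Variables” box

·       Make sure “Pearson” is checked

·       Click “OK”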
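The Pearson correlation coefficient that SPSS reports can also be computed by hand. The following Python sketch assumes two equal-length lists of scale scores with no missing values; pearson_r is a hypothetical helper name, and the scores are invented, not taken from tv-survey.sav. It does not compute the significance test (p-value) that SPSS also reports.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scale scores for five participants.
internal = [0, 1, 1, 2, 3]
external = [4, 3, 2, 2, 1]
print(round(pearson_r(internal, external), 2))  # -0.92
```

The sign of r tells you the direction of the relationship and its size tells you how strong it is. Python 3.10 and later also provide statistics.correlation, which computes the same Pearson coefficient.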