
IT 223 -- 1/24/11

 

Review Questions

  1. What percent of observations are in the following bins of the standard normal distribution:

    Ans: a. 1.0000 = 100%     b. 0.0000 = 0%     c. 0.6827 = 68%     d. 0.9545 = 95%     e. 0.9973 = 99.7%     f. 0.9999367 = 99.994%
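As a cross-check on answers c through f, here is a minimal Python sketch using the standard library's statistics.NormalDist. It assumes those four bins are [-k, k] for k = 1, 2, 3, 4; the bins themselves are not listed above, so that reading is an assumption inferred from the answers.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1


def within(k):
    """Proportion of observations in the bin [-k, k] (assumed bin shape)."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3, 4):
    print(f"within {k} SD: {within(k):.7f}")
```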

  2. What percent of observations are in the following bins of the standard normal distribution:

    Ans: Whether the interval endpoint is included or not does not change the probability. The answer is 0.6827 in each case.

  3. The heights of a population of women have mean x̄ = 64" with SD = 2.5". What proportion of women have a height of 6 feet even (between 71.5 and 72.5 inches)? Solve this problem using the normal table and with SPSS.
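A third way to check the table and SPSS results for this problem is a short Python sketch. It is not part of the original notes; it assumes the heights are exactly normal with mean 64 and SD 2.5, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: heights in inches, normal with mean 64 and SD 2.5.
heights = NormalDist(mu=64, sigma=2.5)

# z-scores of the two endpoints, as they would be looked up in the table.
z_low = (71.5 - 64) / 2.5
z_high = (72.5 - 64) / 2.5

# Area between the endpoints = proportion with a height of "6 feet even".
p_six_feet = heights.cdf(72.5) - heights.cdf(71.5)
print(f"z from {z_low} to {z_high}: proportion = {p_six_feet:.5f}")
```

The proportion comes out to roughly 0.1% of the population.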

  4. Compute Q1, Q3, and IQR for a standard normal distribution.

    Ans: In the standard normal table, look up the z-score that corresponds to a proportion of 0.2500. (Recall that Q1 is the 25th percentile.) This z-score is z = -0.67. The z-score corresponding to Q3, the 75th percentile, is z = 0.67. IQR = Q3 - Q1 = 0.67 - (-0.67) = 1.34.
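The table lookup can be mirrored with the inverse CDF. The sketch below uses Python's statistics.NormalDist, which is an illustration choice and not the method used in class.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

q1 = Z.inv_cdf(0.25)  # z-score with 25% of the area to its left
q3 = Z.inv_cdf(0.75)  # z-score with 75% of the area to its left
iqr = q3 - q1

print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.4f}")
```

Without rounding the quartiles to two decimals, the IQR is about 1.349, which agrees with the table-based 1.34 up to rounding.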

  5. What proportion of the observations are outliers (mild or extreme) for a normal distribution using z-scores?

    Ans: Outliers are points that lie beyond the inner fences, which are at Q1 - 1.5 × IQR = -0.67 - 1.5 × 1.34 = -2.68 and Q3 + 1.5 × IQR = 0.67 + 1.5 × 1.34 = 2.68. The proportion of data points less than -2.68 is 0.00368 ≈ 0.4%. The proportion of data points greater than 2.68 is also 0.00368 ≈ 0.4%, so the total proportion of points that lie outside the inner fences is 2 × 0.00368 = 0.00736, or about 0.7%.
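The fence calculation can be sketched in Python as follows. This is an illustration only: it uses unrounded quartiles from statistics.NormalDist, so the fences land near ±2.70 instead of the table-based ±2.68 and the resulting proportion differs slightly from the table-based figure.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

q1, q3 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
iqr = q3 - q1

# Inner fences: 1.5 IQRs below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Proportion of observations beyond either fence.
outlier_fraction = Z.cdf(lower_fence) + (1 - Z.cdf(upper_fence))
print(f"fences: {lower_fence:.3f}, {upper_fence:.3f}")
print(f"outlier fraction: {outlier_fraction:.5f}")
```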

  6. Assuming that IQ scores are normally distributed with μ = 100 and σ = 15, how many persons out of 1 billion (10^9) have an IQ greater than 175?

    Ans: μ = 100, σ = 15, x = 175, z = (x - μ) / σ = (175 - 100) / 15 = 5. The proportion of persons with z > 5 equals the proportion of persons with z < -5 because the normal histogram is symmetric. Use the extreme normal table to look up -5. The proportion of persons with z < -5 is 2.867 × 10^-7. The number of persons out of 1 billion (10^9) with z < -5 is 10^9 × 2.867 × 10^-7 = 2.867 × 10^2 ≈ 287.
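The same steps (standardize, use symmetry, scale by the population size) can be sketched in Python. The use of statistics.NormalDist in place of the extreme normal table is an assumption for illustration.

```python
from statistics import NormalDist

mu, sigma, x = 100, 15, 175
z = (x - mu) / sigma  # standardize the cutoff

# Symmetry: the area above z equals the area below -z.
p_upper = NormalDist().cdf(-z)

population = 10**9
count = p_upper * population
print(f"z = {z}, expected count = {count:.1f}")
```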

  7. A tropical island receives an average of 80 inches of rain per year with an SD of 10 inches per year. The yearly totals are normally distributed. Out of 100,000 years, how many years have less than 40 inches of rain?

    Ans: z = (x - x̄) / SD = (40 - 80) / 10 = -4. The area under the normal curve for the bin (-∞, -4] is 3.167 × 10^-5. The number of years out of 100,000 having rainfall of less than 40 inches per year is 3.167 × 10^-5 × 10^5 = 3.167, or about 3 years.
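Because this is a lower-tail question, the area can be read directly from the CDF without a symmetry step. A minimal sketch, assuming yearly rainfall is exactly normal and using an illustrative helper name:

```python
from statistics import NormalDist


def expected_years_below(mean, sd, cutoff, n_years):
    """Expected number of years (out of n_years) with rainfall below cutoff."""
    return NormalDist(mu=mean, sigma=sd).cdf(cutoff) * n_years


dry_years = expected_years_below(mean=80, sd=10, cutoff=40, n_years=100_000)
print(f"expected dry years: {dry_years:.3f}")
```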

  8. In a scatterplot, the variable v1 is plotted vs. v2.

    1. Which is the x-variable? which is the y-variable?

      Ans: v2 is the x-variable, v1 is the y-variable.

    2. Which is the independent variable? which is the dependent variable?

      Ans: v2 is the independent variable, v1 is the dependent variable.

  9. Construct the normal plot by hand using Van der Waerden's Method for this dataset:

    Ans: The data is already sorted. (If not, sort the data before forming the normal plot.) To find the normal scores with Van der Waerden's method, find z-scores that will divide the standard normal curve into n+1 = 7+1 = 8 equal areas. Look up 1/8 = 0.125 in the body of the standard normal table to find the z-score of -1.15. Look up 2/8 = 0.250 to find -0.67; 3/8 = 0.375 → -0.32; 4/8 = 0.500 → 0.00; 5/8 = 0.625 → 0.32; 6/8 = 0.750 → 0.67; 7/8 = 0.875 → 1.15. Now plot the normal scores vs. the actual data points in a scatterplot.
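The normal scores for any n follow the same pattern: the i-th score has area i/(n+1) to its left. A short Python sketch of that rule (the function name is illustrative, and the inverse CDF stands in for the table lookup):

```python
from statistics import NormalDist


def van_der_waerden_scores(n):
    """Return z-scores whose left-tail areas are i/(n+1) for i = 1..n."""
    z = NormalDist()
    return [z.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]


scores = van_der_waerden_scores(7)
print([round(s, 2) for s in scores])
```

Pairing each score with the sorted data value of the same rank gives the points of the normal plot.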

  10. Use SPSS to create the normal plot for the dataset in Problem 9.

    Ans: Use Analyze >> Descriptive Statistics >> Q-Q Plots, and choose Van der Waerden's Method.

  11. Create the normal plot for the following:
     
    1. 500 generated normal random numbers with μ = 6 and σ = 4.

    2. 500 generated uniform random numbers with μ = -1 and σ = 1.

    We have already seen two ways to generate the observation numbers (variable i): (1) typing the numbers in by hand, and (2) copying and pasting a column from Excel. A third way to generate the observation numbers is described in "Syntax to enter a Range into a Dataset".
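For comparison with the SPSS workflow, the two datasets in Problem 11 and their normal scores could be sketched in Python as below. Two assumptions are made here that are not in the notes: the seed value is arbitrary, and "uniform with μ = -1 and σ = 1" is read as uniform on the interval [μ - √3·σ, μ + √3·σ], which has that mean and SD.

```python
import random
from statistics import NormalDist, mean

random.seed(223)  # arbitrary seed so the run is repeatable
n = 500

# 11.1: normal random numbers with mean 6 and SD 4, sorted for plotting.
normal_data = sorted(random.gauss(6, 4) for _ in range(n))

# 11.2: a uniform on [a, b] has SD (b - a) / sqrt(12), so a half-width of
# sqrt(3) * sigma gives SD = sigma (assumed reading of the problem).
mu_u, sigma_u = -1, 1
half_width = 3 ** 0.5 * sigma_u
uniform_data = sorted(
    random.uniform(mu_u - half_width, mu_u + half_width) for _ in range(n)
)

# Van der Waerden normal scores for observation numbers i = 1..n.
z = NormalDist()
scores = [z.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Points of each normal plot: (normal score, sorted data value).
normal_plot = list(zip(scores, normal_data))
uniform_plot = list(zip(scores, uniform_data))
print(mean(normal_data), mean(uniform_data))
```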

 

Confidence Intervals for μ

 

Linear Correlation

 

Practice Problems

  1. Estimate the correlation r in these situations:

    1. Height of father, height of son.

        i. -0.30    ii. 0.05    iii. 0.70    iv. 0.99

    2. IQ of husband, IQ of wife.

        i. -0.70    ii. 0.00    iii. 0.60    iv. 1.00

    3. Height of husband, height of wife if men always married women that were exactly 6 inches shorter.

        i. -0.60    ii. 0.60    iii. 0.99    iv. 1.00

    4. Weight of husband, weight of wife if men always married women that weighed 70% of their husband's weight.

        i. 0.00    ii. 0.50    iii. 0.70    iv. 1.00

  2. Match the correlation to the dataset:

    1. GPA in freshman year, GPA in sophomore year.

    2. GPA in freshman year, GPA in senior year.

    3. Length and weight of 2 by 4 boards.

  3. What would happen to the correlation r if

    1. x were replaced by x + 10.

    2. y were replaced by 2 times y.

    3. x and y were interchanged.

  4. How large must r be to be considered meaningful?

  5. Use SPSS to compute the pairwise correlations of the variables in the Nielsen Dataset. Interpret them.

  6. Give a situation where there is strong causality between two variables x and y, but the correlation is close to 0.

  7. Compute the correlation r of this dataset "by hand" using SPSS (not using Analyze >> Correlate >> Bivariate).

  8. If x̄ = 5, ȳ = 3, SDx = 0.5, SDy = 2, and r = 0.75, compute the regression line.

  9. Using the regression line in Question 8, what is the predicted value for y if x = 4?
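For Problems 8 and 9, here is a minimal sketch of the regression line built from summary statistics. It assumes the usual least-squares formulas, slope = r × SDy / SDx and intercept = ȳ - slope × x̄, which these notes do not state explicitly; the function names are illustrative.

```python
def regression_line(x_mean, y_mean, sd_x, sd_y, r):
    """Return (slope, intercept) of the regression line of y on x."""
    slope = r * sd_y / sd_x
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted y for a given x on the regression line."""
    return intercept + slope * x


slope, intercept = regression_line(x_mean=5, y_mean=3, sd_x=0.5, sd_y=2, r=0.75)
y_hat = predict(slope, intercept, 4)
print(f"y = {slope} x + ({intercept}); predicted y at x = 4: {y_hat}")
```

Note that the line passes through the point of averages (x̄, ȳ) = (5, 3), which is a quick sanity check on any slope and intercept computed this way.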

 

Bivariate Normal Datasets

 

More about Correlations

 

Linear Regression

 

Project 3