Programming Assignment 3

CSC 323 - Data Analysis and Statistical Software

Due: 3/22/2001

Your colleague is interested in developing a model to predict the performance of workstations and believes that the Tower of Hanoi (TOH) benchmark is an appropriate measure of performance. She decides to conduct an experiment to develop this model and selects a sample of workstations, based on different CPUs, from different manufacturers. She executes the TOH benchmark on each workstation and also records the Clock Rating and the Matrix Inversion (MI) score reported by the manufacturer.

Note: The TOH benchmark is the time in microseconds (µs) to make 25 TOH moves. The MI score is the time in microseconds (µs) to complete a standard matrix inversion.

Use simple linear regression methods to conduct a thorough analysis of the data collected for this experiment. Each observation in the file consists of the following values:

Consider TOH Benchmark to be the response variable. Clock Rating and MI Score are candidate explanatory variables.

  1. Your program should accomplish the following: (30%)
    Note: Code and output for the income/gpa problem discussed in class are available.
    1. Read your data from an external file.
    2. Execute the PRINT procedure.
    3. Produce a scatterplot of the response variable vs. each of the candidate explanatory variables.
    4. Execute PROC CORR for the response variable and each of the candidate explanatory variables.
    5. For the best explanatory variable:
      1. Generate estimates of your slope and intercept using PROC REG.
      2. Execute PROC UNIVARIATE with the appropriate options for your residuals.
      3. Produce a residual plot.

      Note: For PROC PRINT, be sure to use labels rather than variable names for column headings. Use meaningful names for data sets and variables. You should generate an appropriate title for the output of each of these procedures.
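
The quantities that PROC CORR and PROC REG report in steps 4 and 5 can be cross-checked by hand. The Python sketch below is not part of the assignment and does not replace the required SAS program. It computes the Pearson correlation of the response with each candidate, picks the candidate with the larger absolute correlation, and then computes the least-squares intercept, slope, and residuals. The numbers are made up for illustration and are not the assignment data, and the names `toh`, `clock`, `mi`, `cross_dev`, `pearson_r`, and `fit_line` are hypothetical.

```python
from math import sqrt

# Made-up illustrative values, NOT the assignment data.
toh = [52.0, 41.0, 33.0, 24.0, 20.0]         # response: TOH benchmark time
clock = [100.0, 150.0, 200.0, 250.0, 300.0]  # candidate: clock rating
mi = [30.0, 22.0, 25.0, 15.0, 18.0]          # candidate: MI score


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def cross_dev(x, y):
    """Sum of products of deviations from the means (S_xy)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    return cross_dev(x, y) / sqrt(cross_dev(x, x) * cross_dev(y, y))


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    slope = cross_dev(x, y) / cross_dev(x, x)
    intercept = mean(y) - slope * mean(x)
    return intercept, slope


candidates = {"clock": clock, "mi": mi}
correlations = {name: pearson_r(x, toh) for name, x in candidates.items()}

# With a single explanatory variable, the larger |r| gives the larger r-squared.
best_name = max(correlations, key=lambda name: abs(correlations[name]))

intercept, slope = fit_line(candidates[best_name], toh)

# Residual = observed response minus fitted response.
residuals = [y - (intercept + slope * x)
             for x, y in zip(candidates[best_name], toh)]

print(correlations)
print(best_name, intercept, slope)
print(residuals)
```

Comparing absolute correlations works here because each candidate model has exactly one explanatory variable. The residual list is what a residual plot and a normality check would be based on.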

  2. Your analysis should address the following. Note that an example of the format required for your analysis is provided. Remember that your analysis must include an additional section that identifies the best explanatory variable. (70%)
    1. Identify the best explanatory variable, that is, the explanatory variable that does the best job of explaining variability in the response variable.
    2. For the best explanatory variable:
      1. State the regression model. Use appropriate symbols for all of its parameters.
      2. Conduct a residual analysis. You must state and address all relevant hypotheses.
      3. Provide estimates of the model parameters and state the regression equation. You must interpret the coefficients of the regression equation.
    3. Summarize the results of your analysis.
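
For reference, the simple linear regression model and its least-squares estimates are often written as follows. This is one common notation, given as a sketch. The symbols $\beta_0$, $\beta_1$, $\varepsilon_i$, $b_0$, $b_1$, and $e_i$ are assumptions here, and the format example provided for the course may use different ones. Here $y_i$ is the response (TOH Benchmark) and $x_i$ is the chosen explanatory variable for workstation $i$.

```latex
% Model: parameters are the intercept \beta_0, the slope \beta_1, and the error variance \sigma^2
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}, \quad i = 1, \dots, n

% Least-squares estimates of the slope and intercept
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}

% Fitted regression equation and residuals
\hat{y}_i = b_0 + b_1 x_i,
\qquad
e_i = y_i - \hat{y}_i
```

Under this notation, the residual analysis examines whether the $e_i$ are consistent with the assumptions placed on $\varepsilon_i$: normality, constant variance, and no systematic pattern against $x_i$.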