Programming Assignment 3

CSC 323 - Data Analysis and Statistical Software

Due: Section 601 - 6/9/99; Section 602 - 6/11/99

The quality assurance manager at your company has asked you to analyze the data collected for a quality assurance experiment. The manager explains that a sample of modules was selected and submitted to the software validation and verification team for evaluation, and that a quality score was assigned to each module. You have been asked to study the relationship between quality and two factors that are thought to affect it: module complexity and testing coverage.

The number of decisions (e.g., if...then...else, do...while) in a module that are exercised by a testing procedure is known as the coverage of the testing procedure. The complexity of a module may be determined by examining the possible paths (i.e., edges and nodes) through the module:

complexity = # of edges - # of nodes + 1
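
As a small illustration of the formula above, the sketch below counts the edges and nodes of a made-up flow graph for a module containing a single if...then...else. The graph, the node names, and the function name `complexity` are hypothetical and are not part of the assignment data. Python is used only to show the arithmetic.

```python
def complexity(num_edges: int, num_nodes: int) -> int:
    """Complexity as defined in this handout: edges - nodes + 1."""
    return num_edges - num_nodes + 1


# Hypothetical flow graph: each pair is a directed edge (from_node, to_node).
edges = [
    ("entry", "decision"),
    ("decision", "then"),
    ("decision", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

# The node set is every distinct endpoint that appears in an edge.
nodes = {node for edge in edges for node in edge}

example_complexity = complexity(len(edges), len(nodes))
print(len(edges), len(nodes), example_complexity)  # 5 edges, 5 nodes, complexity 1
```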

Use simple linear regression methods to conduct a thorough analysis of the data collected for this experiment. Each observation in the file consists of the following values:

Consider Quality to be the response variable. Coverage and Complexity are candidate explanatory variables.
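
For orientation, a simple linear regression of Quality on one candidate explanatory variable is usually written as below. The symbols are common textbook notation and are an assumption here, so use the notation adopted in class if it differs: y_i is the Quality score of module i, x_i is its value of the chosen explanatory variable (Coverage or Complexity), and n is the number of modules.

```latex
\[
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently},
  \qquad i = 1, \dots, n
\]
\[
  \hat{y}_i = b_0 + b_1 x_i,
  \qquad e_i = y_i - \hat{y}_i
\]
```

Here b_0 and b_1 denote the least-squares estimates of the intercept and slope, and e_i is the residual for module i.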

An example of the format required for your analysis is provided. In addition to what the example shows, your analysis must include a section that identifies the best explanatory variable.

  1. Your program should accomplish the following (code and output for the income/GPA problem discussed in class are provided):
    1. Read your data from an external file.
    2. Execute the PRINT procedure.
    3. Produce a scatterplot of the response variable vs. each of the candidate explanatory variables.
    4. Execute PROC CORR for the response variable and each of the candidate explanatory variables.
    5. For the best explanatory variable:
      1. Generate estimates of your slope and intercept using PROC REG.
      2. Execute PROC UNIVARIATE with the appropriate options for your residuals.
      3. Produce a residual plot.

      Note: For PROC PRINT, be sure to use labels rather than variable names for column headings. Use meaningful names for data sets and variables. Generate an appropriate title for the output of each of these procedures.

  2. Your analysis should at least address the following:
    1. Identify the best explanatory variable, that is, the explanatory variable that does the best job of explaining variability in the response variable.
    2. For the best explanatory variable:
      1. State the regression model. Use appropriate symbols for all of its parameters.
      2. Conduct a residual analysis. You must state and address all relevant hypotheses.
      3. Provide estimates of the model parameters and state the regression equation. You must interpret the coefficients of the regression equation.
    3. Summarize the results of your analysis.
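
The quantities behind the steps above (the correlation of the response with each candidate, the least-squares slope and intercept, and the residuals) can be sketched as follows. This is only an illustration of the arithmetic, not a substitute for the SAS program the assignment requires. The data values are made up and are not the experiment's data, and the function names are hypothetical. The sketch picks the candidate with the largest absolute correlation, relying on the fact that in simple linear regression the squared correlation is the proportion of variability in the response explained by the fitted line.

```python
from statistics import mean


def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy): corrected sums of squares and cross products."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def correlation(x, y):
    """Pearson correlation coefficient r between x and y."""
    sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    sxx, _, sxy = sums_of_squares(x, y)
    slope = sxy / sxx
    intercept = mean(y) - slope * mean(x)
    return intercept, slope


# Made-up illustrative values only (NOT the assignment's data).
quality = [2.0, 4.1, 5.9, 8.2, 9.8]
candidates = {
    "coverage": [1.0, 2.0, 3.0, 4.0, 5.0],
    "complexity": [3.0, 1.0, 4.0, 2.0, 5.0],
}

# Correlation of the response with each candidate explanatory variable.
r_values = {name: correlation(x, quality) for name, x in candidates.items()}

# The best single explanatory variable has the largest |r| (largest r squared).
best = max(r_values, key=lambda name: abs(r_values[name]))

# Fit the line for the best variable and compute its residuals.
intercept, slope = fit_line(candidates[best], quality)
resid = [y - (intercept + slope * x) for x, y in zip(candidates[best], quality)]

print(best, round(intercept, 4), round(slope, 4))
print([round(e, 4) for e in resid])
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is a quick sanity check before examining a residual plot or a normality test.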