Programming Assignment 3

CSC423/324 - Data Analysis
Due: 3/20/2003

 

The new director of software development at a large local company has decided to investigate software quality. The director is particularly interested in developing a model that will allow her to predict software quality from metrics that may be obtained from the software module and the testing process.

She discusses the matter with the quality assurance management team and decides to examine several randomly selected modules from the software portfolio. She has been advised to use decision coverage, the proportion of Boolean expressions in a program that have been exercised during testing, as the testing process metric, and complexity as the software module metric. The complexity metric is based on cyclomatic complexity, which is the number of independent paths in a module. The director decides that the quality score will be the execution time to first failure after the module has been released. She scrutinizes project management records and incident reports to get this data and then asks for your help in developing the model.

She provides this data file. Each observation consists of the following values:

Note: Do not edit the data in any way. Write your program to access the data as defined above.

  1. Write a SAS program to analyze this dataset. Your program should do the following: (40%)
    Note: The code for the fde/dc problem discussed in class is available.
    1. Read your data from an external file.
    2. Execute the PRINT procedure.
    3. Produce two scatterplots, one of the dependent variable against each of the independent variables.
    4. Use PROC REG to generate estimates of the model parameters and to generate residuals.
    5. Use PROC UNIVARIATE to analyze the normality assumption.

    Note: For PROC PRINT, be sure to use labels rather than variable names for column headings. Use meaningful names for data sets and variables, and generate an appropriate title for the output of these procedures.
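
The five steps above might be laid out along the following lines. This is only a sketch under stated assumptions: the file name quality.dat, the data set names, the variable names (fail_time, coverage, complexity), the space-delimited layout, and the column order are all hypothetical and must be replaced with the actual file definition. PROC PLOT is used here for the scatterplots, though another plotting procedure may be preferred.

```sas
/* Sketch only: file name, variable names, and column order are   */
/* hypothetical. Match them to the actual data file definition.   */
data quality;
    infile 'quality.dat';                  /* assumed external file */
    input fail_time coverage complexity;   /* assumed column order  */
    label fail_time  = 'Execution Time to First Failure'
          coverage   = 'Decision Coverage'
          complexity = 'Complexity';
run;

title 'Software Quality Study: Raw Data';
proc print data=quality label;   /* LABEL shows labels as column headings */
run;

title 'Software Quality Study: Scatterplots';
proc plot data=quality;
    /* dependent variable against each independent variable */
    plot fail_time*coverage fail_time*complexity;
run;

title 'Software Quality Study: Regression';
proc reg data=quality;
    model fail_time = coverage complexity;
    /* save residuals and predicted values for the normality check */
    output out=quality_resid r=resid p=pred;
run;
quit;

title 'Software Quality Study: Residual Normality';
proc univariate data=quality_resid normal plot;
    var resid;
run;
```

The NORMAL option on PROC UNIVARIATE requests tests for normality, and the PLOT option adds a normal probability plot of the residuals.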

  2. Write a report to summarize your findings. Your report should address the following. (60%)
    1. State the regression model. Use appropriate symbols for all of its parameters.
      Hint: Your regression model has two explanatory variables.
    2. Provide estimates of the model parameters.
    3. State the regression equation and interpret the coefficients of the regression equation.
    4. Assess normality for the residuals.
    5. If normality is reasonable, complete the following problems:
      1. Discuss the p-values in the Parameter Estimates section of your PROC REG output. That is, what do these p-values tell you about the model?
      2. The quality assurance management team has contended for some time that a unit increase in decision coverage will lead to a 30-minute increase in execution time to first failure. You have always believed that the benefit of this additional coverage is greater than they contend. Formulate the hypotheses needed to resolve this issue and, given your findings, conduct the hypothesis test.
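
As a notational aid for the report, one possible way to write the two-variable model and the one-sided test on the decision coverage coefficient is sketched below. The symbols are illustrative choices, not ones prescribed by the assignment, and the value 30 assumes that execution time to first failure is recorded in minutes, which should be confirmed against the data definition.

```latex
% y   = execution time to first failure (assumed to be in minutes)
% x_1 = decision coverage, x_2 = complexity
\[
  y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i ,
  \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}
\]

% The team's claim is the null value; the belief that the benefit
% is greater becomes the one-sided alternative.
\[
  H_0 : \beta_1 = 30
  \qquad \text{versus} \qquad
  H_a : \beta_1 > 30
\]

% b_1 is the estimate of beta_1 and s_{b_1} its standard error,
% both read from the Parameter Estimates table of PROC REG.
\[
  t = \frac{b_1 - 30}{s_{b_1}} ,
  \qquad \text{with } n - 3 \text{ degrees of freedom}
\]
```

Under this formulation, the null hypothesis is rejected at level \alpha when t exceeds the upper-\alpha critical value of the t distribution with n - 3 degrees of freedom, where n is the number of modules. Note that the t statistic and p-value printed by PROC REG for each coefficient test the null value 0, not 30, so this statistic has to be computed from the reported estimate and standard error.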