Programming Assignment 3
CSC423/324 - Data Analysis
Due: 11/24/2003
The new director of software development at a large local company has
decided to investigate software quality. The director is particularly
interested in developing a model that will allow her to predict
software quality from metrics that may be obtained from the software
module and the testing process.
She discusses the matter with the quality assurance management team and
decides to examine several randomly selected modules from the software
portfolio. She has been advised to use decision coverage as the testing
process metric. Decision coverage is the proportion of boolean
expressions in a program that have been exercised during testing.
She has been advised to use complexity as the software module metric.
The complexity metric used is based on cyclomatic complexity, which
is the number of linearly independent paths through a module.
The director decides that the quality score will be the execution time
to first failure after the module has been released.
The director scrutinizes project management records and incident
reports to get this data and then asks for your help in developing
this model.
She provides this data file.
Each observation consists of the following values, listed with the
column positions they occupy in each record:
- Decision Coverage; columns 1-2
- Complexity; columns 3-5
- Quality Score (hours); columns 6-10
- Module Name; columns 11-16
- Quality Team; column 17 (1 = Team #1, 2 = Team #2)
Note: Do not edit the data in any way. Write your program to access
the data as defined above.
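The column layout above can be sanity-checked outside SAS before writing the SAS input step. The following is a minimal Python sketch, not part of the required SAS program. The record in `sample` is invented purely to illustrate the layout and does not come from the data file, and the names `ModuleRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class ModuleRecord(NamedTuple):
    decision_coverage: float  # columns 1-2
    complexity: float         # columns 3-5
    quality_hours: float      # columns 6-10
    module_name: str          # columns 11-16
    quality_team: int         # column 17


def parse_record(line: str) -> ModuleRecord:
    """Split one fixed-width record using the 1-based column positions above."""
    line = line.rstrip("\n").ljust(17)  # pad short lines so every slice exists
    return ModuleRecord(
        decision_coverage=float(line[0:2]),
        complexity=float(line[2:5]),
        quality_hours=float(line[5:10]),
        module_name=line[10:16].strip(),
        quality_team=int(line[16:17]),
    )


# Invented record for illustration only; it is NOT taken from the data file.
sample = "85 12140.5MODA011"
record = parse_record(sample)
print(record)
```

Python slices are 0-based and exclude the end index, so columns 3-5 become `line[2:5]`. In SAS the same layout would be expressed directly with the 1-based column ranges.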
- Write a SAS program to analyze this dataset. Your program should do
the following: (40%)
Note: This code is similar to the code for the function point problem
discussed in class.
- Read your data from an external file.
- Execute the PRINT procedure.
- Produce two scatterplots: plot the dependent variable against each
of the independent variables.
- Use PROC REG to generate estimates of the model parameters and to
generate residuals.
- Use PROC UNIVARIATE to analyze the normality assumption.
Note: For PROC PRINT, be sure to use labels rather than variable names
for column headings. Use meaningful names for data sets and variables.
Give the output of each procedure an appropriate title.
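To make concrete what "estimates of the model parameters" and "residuals" mean for a model with two explanatory variables, here is a small least-squares sketch in plain Python. It is only an illustration of the computation, not a substitute for PROC REG. The function names are hypothetical, and the numbers are synthetic values constructed to lie exactly on a plane. They are not the assignment data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_two_predictor_model(x1, x2, y):
    """Return ([b0, b1, b2], residuals) from the normal equations X'X b = X'y."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # intercept column plus predictors
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return coef, residuals


# Synthetic values for illustration only (NOT the assignment data):
# y is built exactly as 5 + 0.8*x1 - 1.5*x2, so the fit should recover these.
x1 = [50.0, 60.0, 70.0, 80.0, 90.0, 65.0]
x2 = [10.0, 25.0, 15.0, 30.0, 20.0, 12.0]
y = [5.0 + 0.8 * a - 1.5 * b for a, b in zip(x1, x2)]

coef, residuals = fit_two_predictor_model(x1, x2, y)
print([round(c, 6) for c in coef])
```

With real data the residuals will not be zero, and it is those residuals that the normality check examines.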
- Write a report to summarize your findings. Your report should
address the following. (60%)
- State the regression model. Use appropriate symbols for
all of its parameters.
Hint: Your regression model has two explanatory variables.
- Provide estimates of the model parameters.
- State the regression equation and interpret its coefficients.
- Assess normality for the residuals.
- If normality is reasonable, complete the following problems:
- Discuss the p-values in the Parameter Estimates
section of your PROC REG output. That is, what do these p-values
tell you about the model?
- The quality assurance management team has contended for some
time that a unit increase in decision coverage will lead to an increase
in execution time to first failure of 30 minutes. You have always
believed that the benefit of this additional coverage is greater than
they contend.
Formulate the necessary hypotheses to resolve this issue.
Given your findings, conduct a test of hypotheses.
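As a notational sketch only, the model and the one-sided test described in the report items might be written as follows. The symbols are assumptions not fixed by the assignment: y is the quality score in hours, x_1 is decision coverage, x_2 is complexity, b_1 is the estimate of beta_1, s{b_1} is its standard error, and n is the number of modules. The value 0.5 assumes the 30-minute claim is converted to hours to match the quality score, and that a "unit" of decision coverage is one unit as recorded in the data file.

```latex
% Two-predictor regression model (symbol choices are assumptions)
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}

% One-sided test of the 30-minute (0.5-hour) claim
H_0 : \beta_1 = 0.5
\qquad \text{versus} \qquad
H_a : \beta_1 > 0.5

% Test statistic, compared with a t distribution on n - 3 degrees of freedom
t^{*} = \frac{b_1 - 0.5}{s\{b_1\}}
```

The p-values printed in the Parameter Estimates table are for two-sided tests of whether each coefficient equals zero, so the statistic for this claim has to be computed from the reported estimate and standard error.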