Programming Assignment 3
CSC 323 Data Analysis and Statistical Software
Due: 6/4/98
Use simple linear regression methods to conduct a thorough
analysis of the following dataset.
- Fault Dataset:
  Each observation consists of the following variables, in the order presented:
  - Fault Detection Effectiveness
  - Decision Coverage
  - Block Coverage
  - All-Uses Coverage
  - Utility Name (Unix utilities Cal & Col)
Consider Fault Detection Effectiveness to be the dependent variable. Decision, Block, and All-Uses Coverage are the explanatory variables.
An example of the format required for your analysis is provided. However, your analysis will contain more: it will have a section that identifies the best explanatory variable, and your Discussion section will discuss the 95% confidence intervals for your parameter estimates.
Your analysis should address the following points:
- Identify the best explanatory variable, that is, the explanatory variable that does the best job of explaining variability in the dependent variable.
- For the best explanatory variable:
  - Identify the regression model, including symbols for all of its parameters.
  - Conduct a residual analysis. You must state the hypotheses and p-value for the normality test.
  - Give the estimates of the model parameters, state the regression equation, and interpret the coefficients of the equation.
  - Construct and interpret 95% confidence intervals for the slope and intercept parameters.
- Summarize the results of your analysis.
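For reference, the quantities named in the points above can be written out as follows. This is only a sketch in conventional notation: the symbols (beta_0, beta_1, b_0, b_1, SE) are an assumed choice rather than notation fixed by this assignment, and x stands for whichever explanatory variable you identify as best.

```latex
% Simple linear regression model (symbols are an assumed, conventional choice)
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\varepsilon_i \sim N(0, \sigma^2) \text{ independently}, \quad i = 1, \dots, n

% Fitted regression equation, with b_0 and b_1 the estimates of \beta_0 and \beta_1
\hat{y} = b_0 + b_1 x

% Hypotheses for the normality test on the residuals
H_0: \text{the errors are normally distributed} \qquad
H_a: \text{the errors are not normally distributed}

% 95% confidence interval for each parameter (j = 0: intercept, j = 1: slope)
b_j \pm t_{0.975,\, n-2} \cdot \mathrm{SE}(b_j)
```

Here n is the number of observations, and the t critical value uses n - 2 degrees of freedom because two parameters are estimated.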
Your program should accomplish the following:
- Produce a scatterplot of the dependent variable vs. each of the explanatory variables.
- Execute PROC CORR for the dependent variable and each of the explanatory variables.
- For the best explanatory variable:
  - Generate estimates of your slope and intercept using PROC REG.
  - Produce a residual plot.
  - Execute PROC UNIVARIATE with the NORMAL and PLOT options for the residuals.
  - Optionally, produce a normal plot of the residuals using the RANK and PLOT procedures.
    Note: You may want to do this if the normality test is not definitive.
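The program steps listed above might be laid out roughly as follows. This is a minimal sketch, not a required solution. The file name fault.dat, the dataset names, and the variable names (fde, decision, block, alluses, utility) are hypothetical. The data are assumed to be whitespace-delimited with one observation per line. The variable block is used in PROC REG only as a placeholder for whichever explanatory variable you identify as best.

```sas
/* Hypothetical file and variable names; adjust to the actual dataset. */
/* Assumes whitespace-delimited values in the order listed in the handout. */
data fault;
    infile 'fault.dat';
    input fde decision block alluses utility $;
run;

/* Scatterplots of the dependent variable vs. each explanatory variable */
proc plot data=fault;
    plot fde*decision fde*block fde*alluses;
run;

/* Correlation of the dependent variable with each explanatory variable */
proc corr data=fault;
    var fde;
    with decision block alluses;
run;

/* Regression on the chosen variable (block is only a placeholder).   */
/* CLB requests confidence limits for the parameter estimates;        */
/* the OUTPUT statement saves predicted values and residuals.         */
proc reg data=fault;
    model fde = block / clb;
    output out=resids p=pred r=resid;
run;
quit;

/* Residual plot: residuals vs. predicted values, reference line at 0 */
proc plot data=resids;
    plot resid*pred / vref=0;
run;

/* Normality test and diagnostic plots for the residuals */
proc univariate data=resids normal plot;
    var resid;
run;

/* Optional normal plot: normal scores from PROC RANK vs. residuals */
proc rank data=resids normal=blom out=ranked;
    var resid;
    ranks nscore;
run;

proc plot data=ranked;
    plot resid*nscore;
run;
```

If the CLB option is not available in your SAS release, the same intervals can be computed by hand from the parameter estimates and standard errors that PROC REG prints.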