Programming Assignment 1

CSC 323 - Data Analysis and Statistical Software

Due: 10/7/99

A local financial institution has discovered that a small portfolio of its COBOL programs is not Y2K compliant. These programs must either be made Y2K compliant or replaced by a compliant system that provides equivalent functionality.

A local consulting firm has been brought in to evaluate these alternatives and has seven days to submit a report to management. The firm has been told that cost will be the most important factor in choosing between the alternatives. Given the time constraint, the consultants decide to estimate the cost of making the portfolio compliant from a random sample of programs selected from the portfolio. Forty-one programs are selected and carefully examined to determine the cost of fixing each one. The following details are recorded for each program in the sample:

You have been asked to help with the analysis of this data.

Your analysis will involve estimating the mean cost per line of code to make the portfolio Y2K compliant. You have also been asked to compare this estimated population mean with a suggested population mean of $2.20 per line of code. See the additional details below.
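As a cross-check on the SAS output, here is a minimal Python sketch of the estimate described above. It assumes, hypothetically, that each sampled program records a total fix cost and a line count (the recorded fields are not listed in this handout), converts each program to a cost per line of code, and reports the sample mean and the sample standard deviation. The function names and the three toy records are illustrative only, not the assignment's data.

```python
import statistics


def cost_per_line(costs, lines):
    """Convert each program's total fix cost into a cost per line of code."""
    if len(costs) != len(lines):
        raise ValueError("costs and lines must have the same length")
    return [c / n for c, n in zip(costs, lines)]


def summarize(values):
    """Return (n, sample mean, sample standard deviation using an n - 1 divisor)."""
    return len(values), statistics.mean(values), statistics.stdev(values)


# Hypothetical toy records; substitute the 41 sampled programs.
costs = [2200.0, 3600.0, 1500.0]
lines = [1000, 1500, 600]

per_line = cost_per_line(costs, lines)
n, mean, sd = summarize(per_line)
print(f"n = {n}, mean = {mean:.3f}, sd = {sd:.3f}")
```

The mean of the per-program ratios is used here because the questions below treat the cost per line of each program as the quantity whose distribution is examined.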

  1. Write a SAS program to analyze these data. Your program should accomplish the following:
    1. Access your data from an external file.
    2. Execute the PRINT and UNIVARIATE procedures with appropriate options.
    3. For PROC PRINT, use labels with meaningful names for your column headings, and generate an appropriate title for your output.
      Note: If necessary, see guide 2.
  2. Write a short analysis (no more than one page) of your output. Your analysis should address the following:
    1. Given this sample, is it reasonable to assume normality for the population? Justify your answer.
      Note: In other words, if you were to examine the cost per line of code to make every program in the portfolio Y2K compliant, would these costs be normally distributed?
    2. Assuming that your sample is representative of the portfolio, estimate the mean cost per line of code to make the portfolio Y2K compliant. Your discussion should also provide an estimate of the population standard deviation and, assuming normality, discuss the implication of a standard deviation of this magnitude.
    3. The CIO has suggested that the mean cost per line of code to make the portfolio Y2K compliant is $2.20. Assume that the CIO is correct. Given your findings, and considering the sampling distribution of the mean, what proportion of samples would you expect to produce a mean more extreme than yours? What does this proportion tell you about your sample, or about your assumption that the CIO is correct?
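One rough numerical check for the normality question above, sketched in Python as a cross-check on the SAS output: the bias-adjusted sample skewness and excess kurtosis, both of which should be near 0 for data from a normal population. These are the usual adjusted formulas; PROC UNIVARIATE should report matching values under its default divisor, but compare against your own output. The helper names are hypothetical, and these two statistics supplement, rather than replace, the plots and formal tests of normality (for example, via the NORMAL option) that PROC UNIVARIATE can produce.

```python
import statistics


def adjusted_skewness(values):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum(z**3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation, n - 1 divisor
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def adjusted_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (about 0 for a normal population)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    total = sum(((x - mean) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
```

For example, `adjusted_skewness([1, 1, 1, 10])` returns approximately 2.0, reflecting the single large value in the right tail, while `adjusted_excess_kurtosis([1, 2, 3, 4, 5])` returns approximately -1.2, indicating a flatter shape than the normal curve.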
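The standard-deviation and CIO questions above can also be cross-checked with a short Python sketch. Assuming a normal population, `share_within` gives the fraction of programs expected within k standard deviations of the mean (about 68% for k = 1 and about 95% for k = 2). `two_sided_tail` treats the hypothesized mean (here, $2.20) as the true mean and returns the z statistic together with the approximate proportion of samples of size n whose mean would fall at least as far from it as yours, using the normal approximation to the sampling distribution of the mean with standard error s / sqrt(n). Both function names are hypothetical. PROC UNIVARIATE's MU0= option reports the corresponding Student t test, whose proportion will be somewhat larger than this normal approximation.

```python
import math
from statistics import NormalDist


def share_within(mean, sd, k):
    """Fraction of a normal population lying within k standard deviations of its mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


def two_sided_tail(sample_mean, sample_sd, n, mu0):
    """Return (z, proportion) comparing a sample mean with a hypothesized mean mu0.

    The proportion approximates the share of samples of size n whose mean would
    lie at least as far from mu0 as sample_mean, in either direction, using a
    normal approximation with standard error sample_sd / sqrt(n).
    """
    se = sample_sd / math.sqrt(n)
    z = (sample_mean - mu0) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical inputs for illustration only: mean 3.0, sd 2.0, n = 4, mu0 = 2.0.
z, proportion = two_sided_tail(3.0, 2.0, 4, 2.0)
print(f"z = {z:.2f}, proportion more extreme = {proportion:.4f}")
print(f"share within 1 sd: {share_within(2.2, 0.5, 1):.4f}")
```

With these made-up inputs the standard error is 1, so z = 1 and the proportion is about 0.32. For the assignment, substitute your sample mean, sample standard deviation, n = 41, and mu0 = 2.20.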