Programming Assignment 1

CSC 323 - Data Analysis and Statistical Software

Due: Section 601 - 4/21/99; Section 602 - 4/22/99

A local financial institution discovers that a small portfolio of COBOL programs is not Y2K compliant. These programs must either be made Y2K compliant or replaced by a compliant system that provides equivalent functionality.

A local consulting firm has been brought in to evaluate these alternatives and has seven days to submit a report to management. The consultants have been told that cost will be the most important factor in deciding between the alternatives. Given the time constraint, they decide to use a random sample of programs selected from the portfolio to estimate the cost of making the portfolio compliant. Forty-one programs are selected and carefully examined to determine the cost to fix each program. The following details are recorded for each program in the sample:

You have been asked to help with the analysis of these data.

Your analysis will involve the determination of the mean cost, per line of code, to make the portfolio Y2K compliant. You have also been asked to include in your analysis a comparison of this estimated population mean with the industry mean, which you have been told is $2.10 per line of code. See additional details below.

  1. Write a SAS program to analyze these data. Your program should accomplish the following:
    1. Access your data from an external file.
    2. Execute the PRINT and UNIVARIATE procedures with appropriate options.
    3. For PROC PRINT, be sure to use labels for your column headings, and choose names that are meaningful. You should also generate an appropriate title for your output.
      Note: If necessary, see guide 2.
  2. Write a short analysis (no more than one page) of your output. Your analysis should address the following:
    1. Given this sample, is the normality assumption reasonable? Justify your answer.
      Note: That is, if you were to examine the cost per line of code to make each program in the portfolio Y2K compliant, would these costs be normally distributed?
    2. Assuming that your sample is representative of the portfolio, estimate the mean cost per line of code to make the portfolio Y2K compliant. Your discussion should also provide an estimate of the population standard deviation and, assuming normality, discuss the implication of a standard deviation of this magnitude.
    3. By considering the sampling distribution of means for this situation, compare your sample mean with the industry mean.
      Note: To do this, assume the industry mean to be the mean for your population and use your sample standard deviation as an estimate of the population standard deviation. Then, if your mean is larger than the industry mean, determine the proportion of samples that would result in a mean larger than yours. Similarly, if your mean is smaller than the industry mean, determine the proportion of samples that would result in a mean smaller than yours. Discuss the implication of the size of your computed proportion.
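
As a rough starting point for part 1, the skeleton below sketches one way the three program requirements could fit together. The file name, the input layout, and every variable name are assumptions made for illustration; they are not taken from the assignment data, so adjust them to match the file you are given.

```sas
/* Sketch only: the file name, input layout, and variable names are
   hypothetical. Change them to match the data file you receive. */
data y2k;
    infile 'y2k_sample.dat';            /* hypothetical external file   */
    input program $ lines cost;         /* hypothetical list-input layout */
    cost_per_line = cost / lines;       /* assumes total cost and line
                                           count are both recorded      */
    label program       = 'Program'
          lines         = 'Lines of Code'
          cost          = 'Cost to Fix ($)'
          cost_per_line = 'Cost per Line of Code ($)';
run;

title 'Y2K Compliance Cost Sample';

/* The LABEL option makes PROC PRINT use labels as column headings. */
proc print data=y2k label;
    var program lines cost cost_per_line;
run;

/* NORMAL requests tests for normality; PLOT requests the
   stem-and-leaf, box, and normal probability plots. */
proc univariate data=y2k normal plot;
    var cost_per_line;
run;
```

If your data file already records cost per line of code directly, the computed variable is unnecessary and the INPUT statement can read it as is.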
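
The comparison described in the note for item 2.3 can be written out as follows. This is a sketch under the note's own assumptions (population mean equal to the industry mean, sample standard deviation used in place of the population standard deviation); the symbols $\bar{x}$, $s$, $\mu_0$, and $n$ are notation introduced here for the sample mean, sample standard deviation, industry mean, and sample size.

```latex
% Under the stated assumptions, sample means are approximately
% normal with mean \mu_0 and standard error s / \sqrt{n}.
\[
  z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}},
  \qquad \mu_0 = 2.10, \quad n = 41
\]
% Proportion of samples with a mean more extreme than yours:
\[
  \text{proportion} =
  \begin{cases}
    P(Z > z), & \bar{x} > \mu_0, \\
    P(Z < z), & \bar{x} < \mu_0,
  \end{cases}
  \qquad Z \sim N(0, 1)
\]
```

A small proportion means that a sample mean as far from $2.10 as yours would rarely occur if the portfolio's true mean cost matched the industry mean.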