Programming Assignment 1

CSC 323 - Data Analysis and Statistical Software

Due: Section 601 - 5/1/2000; Section 901 - 4/26/2000

A local financial institution is interested in investigating the cost of maintaining a large portfolio of legacy COBOL programs. The CIO is prepared to fund a project to replace these programs if the maintenance cost is too high. However, this cost information is not readily available.

A local consulting firm has been brought in to investigate this issue and has seven days to submit a report to management. Given the time constraint, the consultants decide to examine the project management records of a simple random sample of programs selected from the portfolio. The idea is to determine the maintenance effort, in hours, over the past year for each program in the sample and to use this information to make inferences about the portfolio. Twenty-six programs are selected, and the effort, along with other useful details, is recorded for each. The following details are available for each program in the sample:

You have been asked to help with the analysis of this data.

Your analysis will involve the determination of the mean effort per line of code (if necessary, see "DATA step statements", point 6, SAS Review). You have also been asked to include in your analysis a comparison of your estimated population mean with a population mean suggested by the Director of Software Development. See additional details below.
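As a starting point, the DATA step computation mentioned above might be sketched as follows. This is only an illustration under stated assumptions: the file name (maintenance.dat), the record layout, and the variable names (progid, loc, hours, effperloc) are hypothetical and must be replaced with the details of the actual data file. The PROC steps show the general shape of the two procedures the program is asked to run.

```sas
/* ASSUMPTION: hypothetical file name and layout. Each record is  */
/* assumed to hold a program ID, its lines of code, and the       */
/* maintenance effort in hours, separated by blanks.              */
filename maint 'maintenance.dat';

data effort;
    infile maint;
    input progid $ loc hours;
    /* Derived variable: maintenance effort per line of code */
    effperloc = hours / loc;
    label progid    = 'Program ID'
          loc       = 'Lines of Code'
          hours     = 'Maintenance Effort (Hours)'
          effperloc = 'Effort per Line of Code (Hours)';
run;

title 'Maintenance Effort for a Sample of COBOL Programs';

/* The LABEL option prints the labels above as column headings */
proc print data=effort label;
    var progid loc hours effperloc;
run;

/* NORMAL requests tests of normality; PLOT requests the       */
/* stem-and-leaf, box, and normal probability plots            */
proc univariate data=effort normal plot;
    var effperloc;
run;
```

The ratio is computed per program inside the DATA step, so PROC UNIVARIATE summarizes the derived variable directly rather than the raw effort or size columns.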

  1. Write a SAS program to analyze these data. Your program should accomplish the following:
    1. Access your data from an external file.
    2. Execute the PRINT and UNIVARIATE procedures with appropriate options.
    3. For PROC PRINT, be sure to use a label for each column heading. Use names that are meaningful. You should also generate an appropriate title for your output.
      Note: If necessary, see guide 2.
  2. Write a short analysis (no more than one page) of your output. Your analysis should address the following:
    1. Given this sample, is it reasonable to assume normality for the population? Justify your answer.
      Note: That is, if you were to examine the effort per line of code to maintain each program in the portfolio, would effort be normally distributed?
    2. Provide an estimate of the mean effort per line of code to maintain the portfolio. Your discussion should also provide an estimate of the population standard deviation and, assuming normality, discuss the implication of a standard deviation of this magnitude.
    3. The Director of Software Development has speculated that the mean effort per line of code to maintain the portfolio is 0.28 man-hours. Assume that this is correct (i.e., use 0.28 as the population mean). Given your findings, and considering the sampling distribution of means, what proportion of samples would you expect to result in a mean more extreme than yours? What does this proportion tell you about your sample or about your assumption that the Director is correct?
      Note: If necessary, use your sample statistics to estimate population parameters that are required for your computation.
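One way to set up the computation in the last item is sketched below. It assumes the sampling distribution of the mean is approximately normal, uses the sample standard deviation s in place of the unknown population standard deviation (as the note permits), and reads "more extreme" as two-sided. All three are assumptions, not requirements stated in the assignment. Here x-bar is the sample mean of effort per line of code and n = 26 is the sample size.

```latex
% Sampling distribution of the mean under the Director's value
\[
\bar{X} \;\sim\; N\!\left(\mu_0,\ \frac{\sigma^2}{n}\right),
\qquad \mu_0 = 0.28,\quad n = 26,\quad \sigma \approx s
\]

% Standardized distance of the observed sample mean from mu_0
\[
z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
\]

% Proportion of samples with a mean at least this far from mu_0
% (two-sided reading of "more extreme")
\[
P(\text{more extreme}) = 2\,\bigl[\,1 - \Phi(|z|)\,\bigr]
\]
```

If "more extreme" is read as one-sided, the proportion is half of this value. Because s is estimated from only 26 observations, a t distribution with 25 degrees of freedom could be used in place of the standard normal.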