Programming Assignment 1

CSC 323 - Data Analysis and Statistical Software

Due: 10/10/2000

A local financial institution is interested in investigating the cost of maintaining a large portfolio of legacy C++ programs. The CIO is prepared to fund a project to replace these programs if the maintenance cost is too high. However, this cost information is not readily available.

A local consulting firm has been brought in to investigate this issue and has seven days to submit a report to management. Given the time constraint, the firm decides to examine the project management records of a simple random sample of programs selected from the portfolio. The idea is to determine the maintenance effort per line of code, in hours, over the past year for each program in the sample, and to use this information to make inferences about the portfolio. The following details are available for each program in the sample:

You have been asked to help with the analysis of this data.

The requirements for your analysis are detailed below (if necessary, see "DATA step statements", point 6, SAS Review).

  1. Write a SAS program to analyze these data. In particular, your program should analyze effort per line of code and accomplish the following:
    1. Access your data from an external file.
    2. Compute effort per line of code.
    3. Execute the PRINT and MEANS procedures with appropriate options.
    4. For PROC PRINT, use labels for the column headings (i.e., Program Size, Effort, Module Name, Effort per Line of Code). Use meaningful names, and generate an appropriate title for your output.
      Note: If necessary, see guide 2.
  2. Write a short report (no more than a couple of paragraphs) discussing your findings. Your report should address the following:
    1. Provide an estimate of the population mean, that is, the mean effort per line of code to maintain the portfolio.
    2. Provide an estimate of the population standard deviation, that is, the standard deviation of effort per line of code for the portfolio.
    3. The CIO would like to know the proportion of programs in the portfolio that are expensive to maintain. Using 0.31 man-hours per line of code as the threshold that separates expensive from non-expensive programs, and using your estimates of the population parameters, estimate this proportion assuming a normal distribution.
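
The requirements above can be sketched as a single SAS program. This is only a rough outline under stated assumptions, not a model solution. The file name `maintenance.dat`, the record layout (module name, size in lines of code, effort in hours, separated by blanks), and every data set and variable name (`programs`, `stats`, `proportion`, `module`, `size`, `effort`, `effort_per_loc`, `xbar`, `s`, `p_expensive`) are hypothetical and should be adapted to the data you are actually given.

```sas
/* ASSUMPTION: hypothetical file name and layout. Each record holds a
   module name, program size (lines of code), and effort (hours). */
filename maint 'maintenance.dat';

data programs;
    infile maint;
    length module $ 20;
    input module $ size effort;
    /* Effort per line of code, in hours. A size of 0 would yield a
       missing value and a division-by-zero note in the log. */
    effort_per_loc = effort / size;
    label module         = 'Module Name'
          size           = 'Program Size'
          effort         = 'Effort'
          effort_per_loc = 'Effort per Line of Code';
run;

title 'Maintenance Effort per Line of Code';

/* The LABEL option makes PROC PRINT use labels as column headings. */
proc print data=programs label noobs;
    var module size effort effort_per_loc;
run;

/* Sample mean and standard deviation serve as the estimates of the
   population parameters; they are also saved for the next step. */
proc means data=programs n mean std min max maxdec=4;
    var effort_per_loc;
    output out=stats mean=xbar std=s;
run;

/* Estimated proportion above the 0.31 threshold under a normal model:
   P(X > 0.31) = 1 - Phi((0.31 - xbar) / s). */
data proportion;
    set stats;
    threshold   = 0.31;
    p_expensive = 1 - probnorm((threshold - xbar) / s);
run;

proc print data=proportion noobs;
    var xbar s threshold p_expensive;
run;
```

Saving the PROC MEANS statistics with an OUTPUT statement avoids retyping the estimates by hand when computing the tail probability. The same proportion can instead be found by standardizing the threshold and using a normal table.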