Programming Assignment 1

CSC 323 - Data Analysis and Statistical Software

Due: 10/3/2001

A local financial institution is interested in investigating the cost of maintaining a large portfolio of legacy C++ programs. The CIO is prepared to fund a project to replace these programs if the maintenance cost is too high. However, this cost information is not readily available.

A local consulting firm has been brought in to investigate this issue and has seven days to submit a report to management. Given the time constraint, the consultants decide to examine the project management records of a simple random sample of programs selected from the portfolio. The idea is to determine the maintenance effort per line of code, in hours, over the past year for each program in the sample, and to use this information to make inferences about the portfolio. The following details are available for each program in the sample:

You have been asked to help with the analysis of this data.

The requirements for your analysis are detailed below:

  1. Write a SAS program to analyze these data. In particular, your program should analyze effort per line of code. (50%)
    Your program should accomplish the following:
    1. Access your data from an external file.
    2. Compute effort per line of code (if necessary, see "DATA step statements", point 6, SAS Review).
    3. Execute the PRINT and MEANS procedures with appropriate options.
    4. For PROC PRINT, be sure to use labels for column headings (e.g. Program Size, Effort, Module Name, Effort per Line of Code). Use names that are meaningful. You should generate an appropriate title for your output.
    Note: If necessary, see the SAS program used for the in-class SAS demonstration.
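
    The steps above can be sketched as follows. This is only an outline under stated assumptions, not the in-class program: the file name maint.dat, the data set name maint, the variable names, the input order (name, size, effort), and the title text are all hypothetical and should be adapted to the actual data file.

```sas
/* Sketch only: file name, variable names, and input layout are assumptions. */
data maint;
    infile 'maint.dat';               /* hypothetical external data file      */
    input module $ size effort;       /* assumed order: name, LOC, hours      */
    effort_loc = effort / size;       /* effort per line of code, in hours    */
    label module     = 'Module Name'
          size       = 'Program Size'
          effort     = 'Effort'
          effort_loc = 'Effort per Line of Code';
run;

title 'Maintenance Effort for Sampled Programs';   /* hypothetical title */

/* The LABEL option makes PROC PRINT use the labels as column headings. */
proc print data=maint label;
run;

/* MAXDEC=3 limits the printed statistics to 3 decimal places. */
proc means data=maint n mean std maxdec=3;
    var effort_loc;
run;
```

    If the file already contains effort per line of code, the assignment statement for effort_loc would be dropped and the value read directly in the INPUT statement.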

  2. Write a short report (no more than a couple of paragraphs) discussing your findings. (50%)
    Your report should address the following:
    1. Provide an estimate of the population mean (to 3 decimal places), that is, the mean effort per line of code to maintain the portfolio.
    2. Provide an estimate of the population standard deviation (to 3 decimal places), that is, the standard deviation of effort per line of code for the portfolio.
    3. The CIO has asked you to address the following in your report:
      • Program nlnreg is considered to be problematic by several staff members. That is, they contend that it requires more maintenance than the average program in the portfolio. Program nlnreg is 900 LOC and has required 279 hours of maintenance effort. Determine its percentile rank with respect to effort per line of code.
      • Determine the effort per line of code that distinguishes the most expensive programs from others in the portfolio. Assume that most expensive refers to the top 15% of the portfolio with respect to effort per line of code.
      Note: Assume that effort per line of code is normally distributed and use your estimates of the relevant population parameters to address these questions.
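
      Under the normality assumption in the note, both questions reduce to standard normal calculations. The outline below is a sketch only. The symbols are introduced here for illustration: \bar{x} and s stand for the sample mean and standard deviation reported by PROC MEANS, and \Phi is the standard normal cumulative distribution function.

```latex
\begin{align*}
x_{\text{nlnreg}} &= \frac{279}{900} = 0.31 \ \text{hours per line of code} \\
z &= \frac{x_{\text{nlnreg}} - \bar{x}}{s},
  \qquad \text{percentile rank} = 100 \cdot \Phi(z) \\
x_{0.85} &= \bar{x} + z_{0.85}\, s,
  \qquad z_{0.85} = \Phi^{-1}(0.85) \approx 1.036
\end{align*}
```

      Here x_{0.85} is the effort per line of code above which a program falls in the top 15% of the portfolio. In SAS, the PROBNORM and PROBIT functions evaluate \Phi and \Phi^{-1} respectively, so both values can be computed in a DATA step instead of being read from a table.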