Programming Assignment 1

CSC 323 - Data Analysis and Statistical Software

Due: 7/3/2002

A local financial institution has decided to upgrade its existing proprietary database management system to a relational database management system. They are in the process of deciding which software systems should be rewritten and which ones should be migrated to the new database management system. They are particularly interested in investigating the effort required to convert a large portfolio of C++ programs. The CIO would like to know if the conversion cost will be too high.

A local consulting firm has been brought in to investigate this issue and has seven days to submit a report to the CIO. Given the time constraint, the firm decides to examine a simple random sample of these C++ programs. The idea is to determine the effort required to convert each program in the sample. The conversion effort for a program is the time, in man-hours, to analyze, modify, test, and install the program. Since the programs differ in size, conversion effort per line of code will be needed for each program in the sample. The information gathered from the sample will be used to make inferences about the portfolio. The following details are available for each program in the sample:

You have been asked to help with the analysis of this data.

The requirements for your analysis are detailed below:

  1. Write a SAS program to analyze these data. Remember, your program should analyze effort per line of code, not total effort. (50%)
    Your program should accomplish the following:
    1. Access your data from an external file.
    2. Compute effort per line of code.
      Note: You will need an assignment statement in your data step. If necessary, see "DATA step statements", point 6, SAS Review.
    3. Execute the PRINT and MEANS procedures with appropriate options.
    4. For PROC PRINT, be sure to use labels for column headings (e.g., Programmer, Module Name, Program Size, Effort, Effort per Line of Code). Use names that are meaningful. Also generate an appropriate title for your output.
    Note: If necessary, see the SAS program used for the in-class SAS demonstration.
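
    The steps above might fit together roughly as in the following sketch. It is not a model solution: the file name programs.dat, the space-delimited column order, and the variable names (programmer, module, loc, effort, effort_per_loc) are all assumptions made for illustration, so adjust the INFILE and INPUT statements to match the actual data file.

```sas
/* Sketch only: file name, column order, and variable names are assumptions. */
data conversion;
    infile 'programs.dat';                   /* hypothetical external file */
    input programmer $ module $ loc effort;  /* assumed space-delimited layout */
    effort_per_loc = effort / loc;           /* assignment statement: hours per line of code */
    label programmer     = 'Programmer'
          module         = 'Module Name'
          loc            = 'Program Size (LOC)'
          effort         = 'Effort (hours)'
          effort_per_loc = 'Effort per Line of Code';
run;

title 'Conversion Effort for Sampled C++ Programs';

/* The LABEL option makes PROC PRINT use the labels as column headings. */
proc print data=conversion label;
run;

/* CLM with ALPHA=0.10 requests two-sided 90% confidence limits for the mean;
   MAXDEC=3 limits the printed statistics to three decimal places. */
proc means data=conversion n mean std stderr clm alpha=0.10 maxdec=3;
    var effort_per_loc;                      /* analyze effort per LOC, not raw effort */
run;
```

    With simple list input, character variables read with $ are limited to eight characters by default, so longer programmer or module names would need a LENGTH statement or an informat.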

  2. Write a short report (no more than a couple of paragraphs) discussing your findings. (50%)
    Your report should address the following:
    1. Provide an estimate of the population mean (to three decimal places), that is, the mean effort per line of code required to convert a program in the portfolio.
    2. Construct and interpret a 90% confidence interval for your estimate of the population mean.
    3. The CIO has asked you to address the following in your report (include all computations and diagrams to support your answer):
      • Determine the effort per line of code that distinguishes the most expensive programs from the others in the portfolio. Assume that "most expensive" refers to the top 10% of the portfolio with respect to effort per line of code.
      • Program uniq is considered to be a maintenance nightmare by several staff members. That is, they contend that it is more difficult to modify and test uniq than it is to modify and test the average program in the portfolio. Program uniq is 1500 LOC and will require 105 hours of conversion effort. Determine its percentile rank with respect to effort per line of code.
      Note: To address these questions, assume that effort per line of code is normally distributed. Also, use your estimates of the relevant population parameters.
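
    The two normal-distribution questions above can be sketched in SAS as follows. This is one possible approach, not a prescribed one. It assumes a data set named conversion containing a variable effort_per_loc already exists (both names are hypothetical), and it uses the sample mean and standard deviation as the estimates of the population parameters, as the note suggests.

```sas
/* Sketch only: data set and variable names are assumptions. */
proc means data=conversion noprint;
    var effort_per_loc;
    output out=stats mean=xbar std=s;        /* sample estimates of mu and sigma */
run;

data normal_calc;
    set stats;
    /* 90th percentile of a normal(xbar, s) distribution:
       the cutoff separating the top 10% from the rest */
    cutoff_top10 = xbar + probit(0.90) * s;
    /* Program uniq: 105 hours of effort over 1500 lines of code */
    uniq_epl  = 105 / 1500;
    uniq_z    = (uniq_epl - xbar) / s;       /* standardized effort per LOC */
    uniq_rank = 100 * probnorm(uniq_z);      /* percentile rank under normality */
run;

proc print data=normal_calc;
    var xbar s cutoff_top10 uniq_epl uniq_z uniq_rank;
    format xbar s cutoff_top10 uniq_epl uniq_z 8.3 uniq_rank 6.1;
run;
```

    Here PROBIT(0.90) returns the standard normal quantile for the 90th percentile, and PROBNORM returns the standard normal cumulative probability. For uniq, effort per line of code is 105 / 1500 = 0.07 hours per line, which is then compared against the estimated distribution.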