Programming Assignment 3

CSC 323 - Data Analysis and Statistical Software

Due: Section 702: 11/18/98 & Section 403: 11/17/98.

 

On average, the cost per line of code to fix the Y2K problem is $1.70 for the financial industry. You have recently been hired as a project manager at a large bank and have been told that the cost being incurred by the Y2K team is in line with the industry average. From what you have observed, you dispute this claim and believe that the bank's cost is greater than the industry average. You have shared your reservations with the CIO, who has asked you to conduct an experiment to address this issue.

For your experiment, you decide to select a sample of 25 programs that have already been fixed by the Y2K team and examine project management records for each program. You determine the cost per line of code for each program and decide to conduct a "test of hypothesis".
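
As a reminder of the arithmetic involved, and assuming a one-sample t test turns out to be the procedure you choose (the assignment does not prescribe one), the test statistic compares the sample mean with the industry figure. Here \bar{x} and s denote the sample mean and sample standard deviation of the 25 cost-per-line values; these symbols are introduced only for illustration.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mu_0 = 1.70, \quad n = 25, \quad \text{df} = n - 1 = 24
```

Under the normality assumption and the null hypothesis, this statistic follows a t distribution with 24 degrees of freedom, and the p-value is computed from that distribution.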

  1. Write a SAS program to analyze this dataset. Your program should do the following:
    1. Read your data from an external file.
    2. Execute the PRINT procedure.
    3. Use the appropriate SAS procedures to produce the p-values to test your hypotheses.

    Note: For PROC PRINT, be sure to use labels for column headings rather than variable names. Use names for data sets and variables that are meaningful. You should generate an appropriate title for the output of these procedures.
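
The requirements above might be laid out along the following lines. This is only a skeleton under stated assumptions, not a model solution: the file name y2kcost.dat, the data set name y2kcost, the variable names program and linecost, and the two-column file layout are all hypothetical, and the choice of PROC UNIVARIATE is one possibility among several.

```sas
/* Sketch only: file name, data set name, and variable names are assumptions. */
/* Assumed layout: one program per line, name then cost, e.g.  PAY01 1.85     */
data y2kcost;
    infile 'y2kcost.dat';              /* hypothetical external file */
    input program $ linecost;
    label program  = 'Program Name'
          linecost = 'Cost per Line of Code ($)';
run;

title 'Y2K Repair Cost per Line of Code: Sample of 25 Programs';

/* The LABEL option prints the labels as column headings. */
proc print data=y2kcost label;
run;

/* NORMAL requests tests of normality; MU0= sets the hypothesized mean */
/* used by the tests for location.                                     */
proc univariate data=y2kcost normal mu0=1.70;
    var linecost;
run;
```

Note that the p-value PROC UNIVARIATE reports for the Student's t test is two-sided, so it may need to be converted if your alternative hypothesis is one-sided. In more recent SAS releases, PROC TTEST with the H0= and SIDES= options is another way to obtain a one-sample test; whether it is available depends on the version installed.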

  2. Write a short analysis (no more than one page) of the output of your SAS program. Your analysis should at least address the following:
    1. State the NULL and ALTERNATIVE hypotheses for the problem stated above.
    2. State all other hypotheses needed for your analysis.
    3. If necessary, assess the normality assumption.
    4. The significance of the p-value obtained for the hypotheses stated in item 1 above.
    5. Is the bank's cost greater than the industry average? If appropriate, comment on the sensitivity of your conclusion to the normality assumption.
    6. If appropriate, comment on the issue of practical vs. statistical significance in this case.