Programming Assignment 2

CSC323 - Data Analysis and Statistical Software

Due: 5/21/2003

 

You are a recent hire at a company that develops next-generation technologies for the consumer electronics industry. You have been assigned to the embedded/firmware software development team and will be working with DSP software engineers on testing firmware for a new product.
Note: See the "What is Embedded Computing" article (IEEE Computer, Jan 2002, vol. 35, no. 1) and the DSP FAQ if you are interested in knowing more about embedded computing and DSPs.

You have been asked to analyze the data from a series of tests on two different implementations of a decoding algorithm. The algorithm was designed for high-end audiophile electronics products that must decode encoded audio datastreams in real time. The encoding is done by a standard lossless encoding codec. One of these implementations is an off-the-shelf (OTS) implementation of the decoding algorithm and the other was developed by the in-house software development (IHD) team. The test datastreams were provided by an independent Validation and Verification team.
Note: See the Meridian Lossless Packing article if you are interested in knowing more about one approach to lossless encoding.

The independent Validation and Verification team has informed the CEO that the IHD implementation is, on average, 0.9 seconds faster than the OTS implementation. The CEO has indicated that the IHD implementation will be used only if it is more than 0.9 seconds faster than the OTS implementation. The IHD team believes that its implementation is more than 0.9 seconds faster than the OTS implementation but has been told to provide empirical evidence to demonstrate this. The IHD team is convinced that your analysis will provide the necessary evidence.

You have been provided with this data file. Each observation consists of the following values:

Notice that a Test Status indicator has been provided. You are only interested in observations where Test Status is C.

Note: Do not edit the data in any way. You must code your SAS program to read each observation as defined above and ignore those observations that are not needed.
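The idea of filtering at read time, rather than editing the file, can be sketched outside SAS. The Python fragment below is only an illustration of the logic; the assignment itself must be written in SAS. The record layout it uses (stream id, OTS time, IHD time, Test Status, separated by whitespace) and the sample values are hypothetical and are not taken from the actual data file.

```python
# Hypothetical record layout (NOT the real file definition):
#   stream id, OTS decode time (s), IHD decode time (s), Test Status
SAMPLE_LINES = [
    "1 12.4 11.2 C",
    "2 13.0 12.5 F",
    "3 11.8 10.7 C",
    "4 12.9 11.9 C",
]

def read_completed(lines):
    """Return (stream_id, ots, ihd) tuples for records whose status is 'C'."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        stream_id, ots, ihd, status = fields
        if status == "C":  # keep only the observations of interest
            kept.append((int(stream_id), float(ots), float(ihd)))
    return kept

completed = read_completed(SAMPLE_LINES)
print(len(completed), "of", len(SAMPLE_LINES), "records kept")
```

The source lines are never modified; unwanted records are simply not carried forward, which is the same effect a subsetting condition has when a SAS DATA step reads the file.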

Conduct a thorough analysis of these data. You will need to conduct a test of hypotheses and submit a report summarizing your findings. You must analyze the difference in decoding time for each observation. Remember that each observation represents a test datastream. See additional details below.

  1. Write a SAS program to analyze this dataset. (50%)

    Your program should do the following:

    1. Read your data from an external file.
    2. Compute the time difference for each observation.
    3. Execute the PRINT procedure.
    4. Use the appropriate SAS procedures to produce the statistics needed to conduct your hypothesis test.

    Note: For PROC PRINT, be sure to use labels for column headings rather than variable names. Use names for data sets and variables that are meaningful. You should generate an appropriate title for the output of these procedures.
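As a rough illustration of steps 2 and 4, the Python sketch below computes the per-observation time difference and the summary statistics a one-sample t-test on those differences would need. It is not the required SAS program. The paired times are made up, the difference is assumed to be OTS time minus IHD time (so a positive value means IHD is faster), and the reference value 0.9 comes from the scenario described above.

```python
import math
import statistics

# Made-up (OTS seconds, IHD seconds) pairs, for illustration only.
pairs = [(12.4, 11.2), (11.8, 10.7), (12.9, 11.9), (13.1, 11.8), (12.2, 11.1)]

def paired_t_statistic(pairs, mu0):
    """Return differences, their mean and sample SD, and the t statistic vs mu0."""
    diffs = [ots - ihd for ots, ihd in pairs]  # positive means IHD is faster
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD, n - 1 denominator
    t = (mean_diff - mu0) / (sd_diff / math.sqrt(n))
    return diffs, mean_diff, sd_diff, t

diffs, mean_diff, sd_diff, t_stat = paired_t_statistic(pairs, mu0=0.9)
print(f"n={len(diffs)} mean={mean_diff:.3f} sd={sd_diff:.3f} t={t_stat:.3f}")
```

The sketch stops at the test statistic. The p-value comes from the t distribution with n - 1 degrees of freedom, which the appropriate SAS procedure reports directly.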

  2. Write a short analysis (no more than two pages) of the output of your SAS program. (50%)

    Remember that your analysis is a test of hypotheses and so should at least address the following:

    1. State the primary hypotheses. That is, the NULL and ALTERNATIVE hypotheses for the experiment described above.
    2. Address the normality issue. That is, do you need to establish normality in order to address your primary hypotheses? Justify your answer.
      Note: If you think normality must be established, do not assume normality. Instead, state and address the normality hypotheses. By doing so, you will know whether normality is reasonable.
    3. Determine the p-value for your primary hypotheses (i.e., compute the p-value as outlined in Step 3 of the Hypothesis Testing lecture notes). Remember to discuss the significance of the p-value obtained for the primary hypotheses.
    4. Given your findings, briefly discuss which of the implementations will be included in the final product.
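
For reference, the quantities involved in item 3 can be written out as follows. This is a sketch under stated assumptions, not a substitute for Step 3 of the lecture notes: it assumes a one-sample t-test on the per-observation differences $d_i$ (OTS time minus IHD time), a reference value $\mu_0$, and an upper-tailed alternative. The symbols $d_i$, $\bar{d}$, $s_d$, and $\mu_0$ are notation introduced here.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2},
\qquad
t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}},
\qquad
p = P\left(T_{n-1} \ge t\right)
```

Here $T_{n-1}$ denotes a random variable following the t distribution with $n - 1$ degrees of freedom, and $n$ is the number of observations with Test Status C.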