Programming Assignment 1

CSC423/324 - Data Analysis

Due: 10/20/2003

 

You are a recent hire at a company that develops software for the consumer electronics industry. You have been assigned to the embedded/firmware software development team and will be working with DSP software engineers on testing firmware for a new product.
Note: See the article "What is Embedded Computing" (IEEE Computer, Jan 2002, vol. 35, no. 1) and the DSP FAQ if you are interested in learning more about embedded computing and DSPs. The Analog Devices paper "The Rise of Embedded Media Processing" also offers an interesting and informative perspective on embedded computing.

You have been asked to analyze the data from a series of tests on two different implementations of a lossy encoding algorithm. The algorithm was designed for encoding high-definition video datastreams, where space and time efficiency are of paramount importance. One implementation was developed by a team of software consultants (C) and the other by an in-house software development team (I). The test datastreams were provided by an independent Validation and Verification team.
Note: See the Microsoft WM9 page as well as the Analog Devices WM9 DSP page if you are interested in learning more about a lossy encoding algorithm for high-definition video. Also, this recent EE Times news report provides some insight into the new generation of competing video encoding standards.

The CEO has indicated that speed will be one criterion for choosing between implementations. That is, the (I) implementation will be chosen only if it is faster than the (C) implementation. The (I) team believes that its implementation is faster than the (C) implementation but understands that it must demonstrate this with empirical evidence. The (I) team is convinced that your analysis will provide the necessary evidence.

You have been provided with this data file. Each observation consists of the following values:

Notice that a Test Status indicator has been provided. You are only interested in observations where Test Status is C.

Note: Do not edit the data in any way. You must code your SAS program to read each observation as defined above and ignore those observations that are not needed.

Conduct a thorough analysis of these data. You will need to conduct a test of hypotheses and submit a report summarizing your findings. This is a paired data problem (see Ott, 6.4) and so you must analyze the difference in encoding time for each observation. Remember that each observation represents a test datastream. See additional details below.
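As a reminder of what "analyzing the difference" means for paired data, the standard paired t statistic can be written as follows. This is a sketch, not the required write-up: the symbols $t_{C,i}$ and $t_{I,i}$ (the encoding times of the two implementations on datastream $i$) and the choice to subtract (I) from (C) are assumptions made here for illustration. Consult Ott, 6.4, for the notation used in the course.

```latex
% Assumed convention: d_i > 0 means (I) encoded datastream i faster than (C).
\[
  d_i = t_{C,i} - t_{I,i}, \qquad i = 1, \dots, n
\]
\[
  \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad
  s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2}
\]
\[
  t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1
\]
```

Here $n$ is the number of observations retained after filtering on Test Status. The statistic is referred to a t distribution with $n - 1$ degrees of freedom when the null value of the mean difference is 0.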

  1. Write a SAS program to analyze this dataset. (50%)

    Your program should do the following:

    1. Read your data from an external file.
    2. Compute the time difference for each observation.
    3. Execute the PRINT procedure.
    4. Use the appropriate SAS procedures to produce the statistics needed to conduct your hypothesis test.

    Note: For PROC PRINT, be sure to use labels rather than variable names for column headings. Use meaningful names for data sets and variables. You should also generate an appropriate title for the output of these procedures.

  2. Write a short analysis (no more than two pages) of the output of your SAS program. (50%)

    Remember that your analysis is a test of hypotheses and so should at least address the following:

    1. State the primary hypotheses. That is, the NULL and ALTERNATIVE hypotheses for the experiment described above.
    2. Address the normality issue. That is, do you need to establish normality in order to address your primary hypotheses? Justify your answer.
      Note: If you think normality must be established, do not assume normality. Instead, state and address the normality hypotheses. By so doing, you will know if normality is reasonable.
    3. Determine the p-value for your primary hypotheses (i.e., compute the p-value as outlined in Step 3 of the Hypothesis Testing lecture notes). Remember to discuss the significance of the p-value obtained for the primary hypotheses.
    4. Given your findings, briefly discuss which of the implementations will be included in the final product (if necessary, conduct a post hoc analysis). Also, if you find that the (I) implementation is significantly faster than the (C) implementation, provide a point estimate of the average difference in speed with a 90% confidence interval.
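The program steps in part 1 might be organized along the following lines. This is a minimal sketch under stated assumptions, not a model solution: the file name `encoding_tests.dat`, the list-input column layout, and every data set and variable name (`encoding_times`, `stream_id`, `test_status`, `time_c`, `time_i`, `time_diff`) are hypothetical, because the actual observation layout is defined with the data file. The INPUT statement must be adapted to that definition.

```sas
/* Sketch only: file name, column layout, and all names are hypothetical. */
data encoding_times;
    infile 'encoding_tests.dat';                     /* hypothetical external data file */
    input stream_id $ test_status $ time_c time_i;   /* hypothetical list-input layout  */
    if test_status = 'C';                            /* subsetting IF: keep Test Status C only */
    time_diff = time_c - time_i;                     /* assumed convention: (C) minus (I) */
    label stream_id   = 'Test Datastream'
          test_status = 'Test Status'
          time_c      = 'Encoding Time, Consultant (C)'
          time_i      = 'Encoding Time, In-House (I)'
          time_diff   = 'Time Difference (C - I)';
run;

title 'Encoding Time Comparison: Consultant (C) vs. In-House (I)';

/* LABEL option prints the labels above as column headings */
proc print data=encoding_times label;
run;

/* NORMAL option requests tests for normality of the differences */
proc univariate data=encoding_times normal;
    var time_diff;
run;

/* Paired t test of time_c - time_i; ALPHA=0.10 gives 90% confidence limits */
proc ttest data=encoding_times h0=0 alpha=0.10;
    paired time_c*time_i;
run;
```

The subsetting IF leaves the data file itself unedited and drops unneeded observations as they are read. PROC TTEST reports a two-sided p-value by default, so a one-sided alternative requires either adjusting that p-value by hand or, in newer SAS releases, the SIDES= option. Which approach applies depends on the hypotheses you state and the SAS version available.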