CSC423/324 - Data Analysis
Quiz #7

There are two questions for a total of 40 points. Plan on spending about 20 minutes on all questions.

You may use your notes to answer the questions. Please submit answers by email to jmorgan1@condor.depaul.edu no later than Monday, 3/24. Do not send attachments. Embed your answers in the body of your email.

  1. Consider the Microsoft Word training problem. That is, a training consultant believes that training can improve the efficiency of casual Microsoft Word users and so decides to conduct an experiment to investigate. She randomly selects two groups of Microsoft Word users from her organization. One group is the Control group and the other is the Treatment group (i.e., the group that receives training). She assigns a suite of tasks to each individual in each group and records the time required to complete the suite. (20 pts)

    1. Given the problem statement above, identify and state the null and alternative hypotheses.
    2. Examine the proc reg output below. Let us say that x=0 identifies times for the treatment group and x=1 identifies times for the control group. Given this output, and your hypotheses, complete the following:
      1. Derive the sample means.
      2. Conduct a test of hypotheses. Remember to state all necessary assumptions.
      3. Comment on the consultant's point of view.
                               Parameter Estimates
      
                    Parameter     Standard
       Variable     Estimate        Error    t Value    Pr > |t|
      
       Intercept     6.35112      1.08170    5.87142      <.0001
       x             2.01050      1.52975    1.31426      0.0950
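
For checking the arithmetic, here is a minimal Python sketch of how the estimates in a regression on a 0/1 indicator relate to the two group means, and how a t value is formed from an estimate and its standard error. The function names `group_means` and `t_statistic` are hypothetical helpers, not part of proc reg. The output above does not report the sample sizes or degrees of freedom, so the sketch stops at the test statistic and leaves the comparison with a critical value, and the direction of the test, to you.

```python
# Sketch only: values are copied from the Parameter Estimates table above.
# group_means and t_statistic are hypothetical helper names.

def group_means(intercept, slope):
    """Return (mean when x=0, mean when x=1) for a 0/1 indicator regressor."""
    return intercept, intercept + slope

def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

intercept, slope = 6.35112, 2.01050
slope_se = 1.52975

mean_x0, mean_x1 = group_means(intercept, slope)
t_slope = t_statistic(slope, slope_se)

print(f"sample mean, x=0 group: {mean_x0:.5f}")
print(f"sample mean, x=1 group: {mean_x1:.5f}")
print(f"t value for slope:      {t_slope:.4f}")
```

The slope is the difference between the two sample means, so a test on the slope is a test on the difference in mean completion times.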
      
      

  2. Consider the quality score prediction problem. That is, you are interested in deriving a multiple regression model to predict quality score from two independent variables. Quality score is the time to the first failure in hours. However, in this case, the independent variables are all-uses coverage, x1 (i.e. a coverage measure that is similar to decision coverage), and programmer experience, x2 (i.e. programming experience in years). (20 pts)

    Examine the proc reg output below. Given this output, complete the following:

    1. Comment on the estimates of the slope parameters. That is, interpret the estimates and comment on the importance of each parameter to the model.
    2. Your colleagues argue that a unit increase in all-uses coverage will lead to an increase in execution time to first failure of 240 minutes. You believe that the benefit of this additional coverage is better than they contend. Formulate the necessary hypotheses to resolve this issue. Given the proc reg output, conduct a test of hypotheses.
                             Parameter Estimates
    
                     Parameter       Standard
        Variable     Estimate          Error    t Value    Pr > |t|
    
        Intercept     17.33146       5.94571        8.50     0.0172
        x1             5.42188       1.87820        8.33     0.0180
        x2             3.00957       1.47685        4.15     0.0720
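
The test in part 2 compares the x1 estimate against a nonzero value, so the t value printed by proc reg (which tests against zero) does not apply directly. The Python sketch below shows one way to set up the arithmetic, assuming the hypotheses are read as H0: beta1 = 4 versus Ha: beta1 > 4 once the 240 minutes are converted to hours, the unit of the quality score. The helper name `t_statistic` is hypothetical, and because the output does not report the error degrees of freedom, the comparison with a critical value is left out.

```python
# Sketch only: estimate and standard error are copied from the table above.
# The hypothesized slope assumes the reading H0: beta1 = 4 hours per unit.

def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

MINUTES_PER_HOUR = 60
claimed_minutes = 240
beta1_null = claimed_minutes / MINUTES_PER_HOUR  # quality score is in hours

b1, se_b1 = 5.42188, 1.87820
t_b1 = t_statistic(b1, se_b1, hypothesized=beta1_null)

print(f"hypothesized slope (hours): {beta1_null}")
print(f"t value against that slope: {t_b1:.4f}")
```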