Hypothesis Testing & Proc Univariate

Remember that a normality test will be necessary if your sample is small. In this case you will have two sets of hypotheses. In addition to the hypotheses for the stated problem (i.e. the primary hypotheses) you will need normality hypotheses (i.e. the secondary hypotheses). Your normality hypotheses should be stated thus:

H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution

Proc univariate may be used to test these hypotheses, and it also reports the statistics required for your primary hypotheses. It can even test your primary hypotheses directly, but additional work is needed to ensure that the correct statistics and p-values are produced.

Proc univariate assumes the following hypotheses:

H0:m=0
Ha:m<>0

Because univariate tests these hypotheses by default, adjustments are needed to put your problem into this form. First, the alternative hypothesis above is two sided, so the reported p-value must be halved if your alternative is one sided, provided the sample mean falls on the side that your alternative specifies. Second, in most cases you will not be interested in testing whether the population mean is zero. When the hypothesized value is not zero, it is easy to transform your data so that your null hypothesis is of this form.
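The one sided adjustment can be written out for both cases. This is a sketch: the symbols t (the T:Mean=0 statistic) and p_2 (the two sided Pr>|T| value that univariate reports) are introduced here for illustration, and m is the population mean as in the hypotheses above.

```latex
\[
p_{\text{one sided}} =
\begin{cases}
p_2 / 2, & \text{if the sign of } t \text{ agrees with } H_a
          \ (t > 0 \text{ for } H_a\colon m > 0,\ \ t < 0 \text{ for } H_a\colon m < 0),\\[4pt]
1 - p_2 / 2, & \text{otherwise.}
\end{cases}
\]
```

In the second case the data point away from the alternative, so the one sided p-value is always larger than 0.5.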

Consider the following hypotheses:

H0:m=42
Ha:m>42

Let us say you subtracted 42 from each observation in your sample. The hypotheses for your adjusted data would be:

H0:m-42=0
Ha:m-42>0

Note that the p-value obtained in this case will be the same as the p-value for the original hypotheses. So, since proc univariate assumes a null hypothesis of this form, all we need to do to use proc univariate for the original hypotheses is to adjust our observations, as mentioned above, and as done in the code below.
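The centering step can be sketched on its own. The block below rebuilds a small data set from the five fault counts listed in the annotated output, centers them at 42, and runs univariate on the centered variable. It also shows two alternatives that avoid centering: the MU0= option of proc univariate and the H0= and SIDES= options of proc ttest. These options belong to more recent SAS releases than the one that produced the output in this handout, so whether your installation supports them is an assumption to check. The data set name demo is hypothetical.

```sas
/* Hypothetical data set: the five fault counts shown in the annotated output */
data demo;
  input fault @@;
  centered = fault - 42;   /* shift so that H0: m=42 becomes H0: mean of centered = 0 */
  datalines;
40 41 42 43 46
;
run;

/* Approach used in this handout: test the centered variable against zero */
proc univariate data=demo normal;
  var centered;
run;

/* Assumed alternative (newer releases): test fault against 42 directly */
proc univariate data=demo mu0=42 normal;
  var fault;
run;

/* Assumed alternative: one sided t test of H0: m=42 against Ha: m>42 */
proc ttest data=demo h0=42 sides=u;
  var fault;
run;
```

All three steps should report the same t statistic, because subtracting a constant changes neither the standard deviation nor the distance between the sample mean and the hypothesized value. Only the proc ttest step with SIDES=U reports a one sided p-value that needs no further adjustment.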

To correctly use your proc univariate output note the following:

  1. The normality test is only required if the sample size is small.
  2. The Pr>|T| p-value is the p-value used for your primary hypotheses and is used if the sample size is small and normality is reasonable. It is also used if the sample size is large.
  3. The Pr>=|S| p-value (signed rank test) is used only if normality is not reasonable, or as a sensitivity check on the t test.
  4. Remember that the default hypothesis tested by univariate (i.e. the Pr>|T| p-value) is two sided and must be adjusted if your alternative is one sided.
  5. The Pr>=|S| p-value is also two sided and needs the same adjustment.
  6. The p-value for the normality test (i.e. the Pr<W p-value) is always used as is and is never adjusted.

The following code and output illustrate how proc univariate may be used to test both your primary and secondary hypotheses. The output is annotated and identifies the p-values that are required.

Code:

options pagesize=53 linesize=76;
data faults;
 * Only the first value on each data line is read into fault;
 input fault;
 centered=fault-42;
 label fault='Discovered Faults'
       centered='Faults - 42';
 datalines;
  42 36
  46 43
  41 35
  43 45
  40 39
 run;
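* Sort and list the data so the centering can be checked by eye;
proc sort data=faults;
 by fault;
run;
title 'Testing for Population Mean Faults';
proc print data=faults label;
 var centered fault;
run;
* t test of H0: mean of centered = 0, which is H0: mean of fault = 42;
proc means data=faults n nmiss mean std stderr t prt maxdec=3;
 var centered;
run;
* NORMAL requests the normality test and PLOT requests the stem-and-leaf, box and normal probability plots;
proc univariate data=faults normal plot;
 var centered;
run;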





Annotated Output:


                     Testing for Population Mean Faults                     

                               Faults    Discovered
                        OBS     - 42       Faults

                         1       -2          40    
                         2       -1          41    
                         3        0          42    
                         4        1          43    
                         5        4          46    

 Analysis Variable : CENTERED Faults - 42


 N  Nmiss          Mean       Std Dev     Std Error             T  Prob>|T|
 --------------------------------------------------------------------------
 5      0         0.400         2.302         1.030         0.389    0.7174
 --------------------------------------------------------------------------

                            Univariate Procedure

Variable=CENTERED      Faults - 42

                                  Moments

                  N                 5  Sum Wgts          5
                  Mean            0.4  Sum               2
                  Std Dev    2.302173  Variance        5.3
                  Skewness   1.032659  Kurtosis   1.128515
                  USS              22  CSS            21.2
                  CV         575.5432  Std Mean   1.029563
                  T:Mean=0   0.388514  Pr>|T|       0.7174 <--2 sided p-value
                  Num ^= 0          4  Num > 0           2
                  M(Sign)           0  Pr>=|M|      1.0000
                  Sgn Rank        0.5  Pr>=|S|      1.0000 <--altern to Pr>|T|
                  W:Normal    0.94258  Pr<W         0.6922 <--Normality p-value


                              Quantiles(Def=5)

                   100% Max         4       99%         4
                    75% Q3          1       95%         4
                    50% Med         0       90%         4
                    25% Q1         -1       10%        -2
                     0% Min        -2        5%        -2
                                             1%        -2
                   Range            6                    
                   Q3-Q1            2                    
                   Mode            -2                    


                                  Extremes

                     Lowest    Obs     Highest    Obs
                         -2(       1)       -2(       1)
                         -1(       2)       -1(       2)
                          0(       3)        0(       3)
                          1(       4)        1(       4)
                          4(       5)        4(       5)



                            Univariate Procedure

Variable=CENTERED      Faults - 42

              Stem Leaf                     #             Boxplot
                 4 0                        1                |   
                 3                                           |   
                 2                                           |   
                 1 0                        1             +-----+
                 0 0                        1             *--+--*
                -0                                        |     |
                -1 0                        1             +-----+
                -2 0                        1                |   
                   ----+----+----+----+              


                               Normal Probability Plot              
             4.5+                                     *   ++++      
                |                                     ++++          
                |                                +++++              
                |                            ++*+                   
                |                        +*++                       
                |                   +*+++                           
                |             * ++++                                
            -2.5+           ++++                                    
                 +----+----+----+----+----+----+----+----+----+----+
                     -2        -1         0        +1        +2
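
As a worked reading of the output above, the following data step applies the six notes to the printed values for the hypotheses H0:m=42 against Ha:m>42. It is a sketch only: the 0.05 significance level and the variable names are assumptions made for illustration, and the numbers are copied by hand from the output.

```sas
data _null_;
  /* Values copied from the annotated output */
  n        = 5;          /* N */
  t        = 0.388514;   /* T:Mean=0 */
  p_two    = 0.7174;     /* Pr>|T|, two sided */
  p_normal = 0.6922;     /* Pr<W, used as is */
  alpha    = 0.05;       /* assumed significance level */

  /* Secondary hypotheses: normality is reasonable if H0 is not rejected */
  normal_ok = (p_normal > alpha);

  /* Primary hypotheses, Ha: m>42, so a positive t points toward Ha */
  if t > 0 then p_one = p_two / 2;
  else p_one = 1 - p_two / 2;

  /* Cross-check against the t distribution with n-1 degrees of freedom */
  p_check = 1 - probt(t, n - 1);

  reject = (p_one < alpha);
  put normal_ok= p_one= p_check= reject=;
run;
```

With these values normality is reasonable (0.6922 exceeds 0.05), the one sided p-value is 0.7174/2 = 0.3587, and H0 is not rejected at the assumed level. The p_check value should agree with p_one up to rounding of the printed statistics.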