Hypothesis Testing & SAS

Remember that a normality test is necessary when your sample is small, because the t test assumes that the data come from a normal population. In this case you will have two sets of hypotheses: the hypotheses for the stated problem (the primary hypotheses) and the normality hypotheses (the secondary hypotheses). State your normality hypotheses as follows:

H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution

Proc univariate, run with the normal option, may be used to test these hypotheses and will also provide the statistics required for testing your primary hypotheses.

Note: As a bonus, proc univariate may also be used to test your primary hypotheses. However, additional work, which we will not cover, is necessary to ensure that it produces the correct statistics and p-values.

Example

Suppose you want to determine whether a population mean is zero, and you suspect that the mean may be greater than zero. Consider the following hypotheses:

H0: μ = 0
Ha: μ > 0

To use your proc univariate output correctly, note the following:

  1. The normality test is required only if the sample size is small.
  2. The p-value for the normality test is the number adjacent to Pr < W in the Shapiro-Wilk row. Notice that it is expressed as a decimal, not a percentage.

The following code and output illustrate how proc univariate may be used. The output is annotated and identifies the p-value for the normality test.

Code:

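options pagesize=53 linesize=76;

/* Read one numeric value per record from the raw data file */
data new;
 infile 'in.dat';
 input x;
 label x='Descriptive Label';
run;

/* Sort so that the listing appears in ascending order */
proc sort data=new;
 by x;
run;

title 'Testing for Population Mean';

/* List the data, using the label as the column heading */
proc print data=new label;
 var x;
run;

/* Summary statistics and the two-sided t test of mean = 0 */
proc means data=new n nmiss mean std stderr t prt maxdec=3;
 var x;
run;

/* The normal option requests the Tests for Normality table */
proc univariate data=new normal;
 var x;
run;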
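
Optional variation: the sketch below is not part of the original program. It assumes you do not have in.dat at hand and substitutes the five values that appear in the proc print listing of the annotated output, entered inline with datalines. It also assumes the ODS SELECT statement is available in your SAS release and uses it to keep only the two proc univariate tables that this handout reads. The mu0=0 option simply states the default null value explicitly.

```sas
/* Hypothetical stand-in for in.dat: the five values from the listing */
data new;
 input x;
 label x='Descriptive Label';
 datalines;
-2
-1
0
1
4
;
run;

/* Keep only the location tests and the normality tests */
ods select TestsForLocation TestsForNormality;
proc univariate data=new normal mu0=0;
 var x;
run;
```

With the selection in place, the Moments, Quantiles, and Extreme Observations tables are suppressed for this one step; the statistics themselves are unchanged.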





Annotated Output:


                        Testing for Population Mean                        1

                                    Descriptive
                             Obs       Label

                              1          -2    
                              2          -1    
                              3           0    
                              4           1    
                              5           4    



                        Testing for Population Mean                        2

                            The MEANS Procedure

                   Analysis Variable : x Descriptive Label
 
        N
 N   Miss           Mean        Std Dev      Std Error   t Value   Pr > |t|
 --------------------------------------------------------------------------
 5      0          0.400          2.302          1.030      0.39     0.7174
 --------------------------------------------------------------------------



                        Testing for Population Mean                        3

                          The UNIVARIATE Procedure
                     Variable:  x  (Descriptive Label)

                                  Moments

      N                           5    Sum Weights                  5
      Mean                      0.4    Sum Observations             2
      Std Deviation      2.30217289    Variance                   5.3
      Skewness           1.03265854    Kurtosis            1.12851549
      Uncorrected SS             22    Corrected SS              21.2
      Coeff Variation    575.543222    Std Error Mean      1.02956301


                         Basic Statistical Measures
 
               Location                    Variability

           Mean     0.400000     Std Deviation            2.30217
           Median   0.000000     Variance                 5.30000
           Mode      .           Range                    6.00000
                                 Interquartile Range      2.00000


                         Tests for Location: Mu0=0
 
              Test           -Statistic-    -----p Value------

              Student's t    t  0.388514    Pr > |t|    0.7174
              Sign           M         0    Pr >= |M|   1.0000
              Signed Rank    S       0.5    Pr >= |S|   1.0000


                            Tests for Normality
 
         Test                  --Statistic---    -----p Value------

         Shapiro-Wilk          W      0.94273    Pr < W      0.3853 <--use this
         Kolmogorov-Smirnov    D     0.197191    Pr > D     >0.1500    p-value
         Cramer-von Mises      W-Sq  0.035726    Pr > W-Sq  >0.2500
         Anderson-Darling      A-Sq  0.240319    Pr > A-Sq  >0.2500


                          Quantiles (Definition 5)
 
                           Quantile      Estimate

                           100% Max             4
                           99%                  4
                           95%                  4
                           90%                  4


                        Testing for Population Mean                        4

                          The UNIVARIATE Procedure
                     Variable:  x  (Descriptive Label)

                          Quantiles (Definition 5)
 
                           Quantile      Estimate

                           75% Q3               1
                           50% Median           0
                           25% Q1              -1
                           10%                 -2
                           5%                  -2
                           1%                  -2
                           0% Min              -2


                            Extreme Observations
 
                    ----Lowest----        ----Highest---
 
                    Value      Obs        Value      Obs

                       -2        1           -2        1
                       -1        2           -1        2
                        0        3            0        3
                        1        4            1        4
                        4        5            4        5
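

Reading the output against the hypotheses: the Shapiro-Wilk p-value (0.3853) is larger than 0.05, so at that level normality is not rejected and the t test may be used. The 0.05 level is an assumption here, since this handout does not fix a significance level. Note also that Pr > |t| = 0.7174 is a two-sided p-value, while the alternative hypothesis in this example is one-sided (mean greater than zero). The data step below is a rough sketch, not part of the original program, of how the one-sided p-value might be obtained from the printed statistics with the PROBT function. The variable names are illustrative only.

```sas
/* Sketch: recompute t and a one-sided p-value from the printed output */
data _null_;
 n    = 5;            /* N from the Moments table              */
 xbar = 0.4;          /* Mean                                  */
 s    = 2.30217289;   /* Std Deviation                         */
 mu0  = 0;            /* hypothesized mean under H0            */
 t  = (xbar - mu0) / (s / sqrt(n));  /* Student's t statistic  */
 df = n - 1;                         /* degrees of freedom     */
 p_upper = 1 - probt(t, df);         /* upper-tail Pr > t      */
 put t= 8.4 p_upper= 8.4;            /* written to the SAS log */
run;
```

Because the t statistic is positive, the upper-tail p-value should be half of the two-sided value, about 0.3587, and the recomputed t should agree with the 0.3885 shown under Tests for Location. Either way, the p-value is large, so H0 would not be rejected at the assumed 0.05 level.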