Remember that a normality test is necessary when your sample is small. In that case you will have two sets of hypotheses: the hypotheses for the stated problem (the primary hypotheses) and the normality hypotheses (the secondary hypotheses). State your normality hypotheses as follows:
H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution
Proc univariate may be used to test these hypotheses and will also provide the statistics required for testing your primary hypotheses.
Note: As a bonus, univariate may also be used to test your primary hypotheses. However, additional work, which we will not cover, will be necessary to ensure that the correct statistics and p-values are produced.
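Because proc univariate prints many tables, it can help to restrict the output to the ones needed here. The following is a minimal sketch and not part of the original example. It assumes a data set named new with a numeric variable x (as in the example below) and uses the ODS table names TestsForNormality and Moments to display only the normality tests and the summary statistics.

```sas
/* Sketch: show only the normality tests and the moments.   */
/* Assumes a data set NEW with a numeric variable X.        */
ods select TestsForNormality Moments;
proc univariate data=new normal;
   var x;
run;
```

The normal option is what requests the Tests for Normality table; without it that table is not produced.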
Example
Suppose you want to determine whether the population mean is zero, and you suspect that it may be greater than zero. Consider the following hypotheses:
H0: Population mean = 0
Ha: Population mean > 0
To correctly use your proc univariate output note the following:
The following code and output illustrate how proc univariate may be used. The output is annotated and identifies the p-value for the normality test.
Code:
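options pagesize=53 linesize=76;

/* Read one numeric variable per record from in.dat */
data new;
   infile 'in.dat';
   input x;
   label x='Descriptive Label';
run;

/* Sort so the listing appears in ascending order */
proc sort;
   by x;

title 'Testing for Population Mean';

proc print label;
   var x;

/* Summary statistics and the two-sided t test of mean = 0 */
proc means n nmiss mean std stderr t prt maxdec=3;
   var x;

/* The normal option requests the Tests for Normality table */
proc univariate normal;
   var x;
run;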
Output:
Testing for Population Mean                                              1

       Descriptive
Obs       Label
  1         -2
  2         -1
  3          0
  4          1
  5          4
Testing for Population Mean 2
The MEANS Procedure
Analysis Variable : x Descriptive Label
       N
  N  Miss      Mean     Std Dev    Std Error   t Value   Pr > |t|
--------------------------------------------------------------------------
  5     0     0.400       2.302        1.030      0.39     0.7174
--------------------------------------------------------------------------
Testing for Population Mean 3
The UNIVARIATE Procedure
Variable: x (Descriptive Label)
Moments
N                          5    Sum Weights                5
Mean                     0.4    Sum Observations           2
Std Deviation     2.30217289    Variance                 5.3
Skewness          1.03265854    Kurtosis          1.12851549
Uncorrected SS            22    Corrected SS            21.2
Coeff Variation   575.543222    Std Error Mean    1.02956301
Basic Statistical Measures
    Location                    Variability
Mean     0.400000     Std Deviation            2.30217
Median   0.000000     Variance                 5.30000
Mode      .           Range                    6.00000
                      Interquartile Range      2.00000
Tests for Location: Mu0=0
Test           -Statistic-    -----p Value------
Student's t    t  0.388514    Pr > |t|    0.7174
Sign           M         0    Pr >= |M|   1.0000
Signed Rank    S       0.5    Pr >= |S|   1.0000
Tests for Normality
Test                  --Statistic---    -----p Value------
Shapiro-Wilk          W     0.94273     Pr < W      0.3853   <-- use this p-value
Kolmogorov-Smirnov    D     0.197191    Pr > D     >0.1500
Cramer-von Mises      W-Sq  0.035726    Pr > W-Sq  >0.2500
Anderson-Darling      A-Sq  0.240319    Pr > A-Sq  >0.2500
Quantiles (Definition 5)
Quantile Estimate
100% Max 4
99% 4
95% 4
90% 4
Testing for Population Mean 4
The UNIVARIATE Procedure
Variable: x (Descriptive Label)
Quantiles (Definition 5)
Quantile Estimate
75% Q3 1
50% Median 0
25% Q1 -1
10% -2
5% -2
1% -2
0% Min -2
Extreme Observations
----Lowest----        ----Highest---
Value      Obs        Value      Obs
   -2        1           -2        1
   -1        2           -1        2
    0        3            0        3
    1        4            1        4
    4        5            4        5
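
The Tests for Location table above reports a two-sided p-value (Pr > |t| = 0.7174), whereas the example supposes that the mean may be greater than zero, which is a one-sided alternative. One possible way to obtain the upper-tailed p-value is sketched below. It is an illustration under stated assumptions, not the procedure this handout prescribes: it assumes the data set new and variable x from the code above, and the names stats, df and p_upper are hypothetical.

```sas
/* Sketch: upper-tailed p-value for H0: mean = 0 vs Ha: mean > 0. */
/* Assumes data set NEW with numeric variable X.                  */
/* STATS, DF and P_UPPER are illustrative names.                  */
proc means data=new noprint;
   var x;
   output out=stats n=n t=t;     /* sample size and t statistic   */
run;

data stats;
   set stats;
   df = n - 1;                   /* degrees of freedom            */
   p_upper = 1 - probt(t, df);   /* Pr(T > t) under H0            */
run;

proc print data=stats;
   var n t df p_upper;
run;
```

Because the t statistic here is positive (0.388514), the upper-tailed p-value is half the two-sided value: 0.7174 / 2 = 0.3587.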