Remember that a normality test is necessary when your sample is small. In that case you will have two sets of hypotheses: the hypotheses for the stated problem (the primary hypotheses) and the normality hypotheses (the secondary hypotheses). State your normality hypotheses as follows:
H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution
Proc univariate may be used to test these hypotheses and will also provide the statistics required for testing your primary hypotheses.
Note: As a bonus, univariate may also be used to test your primary hypotheses. However, additional work, which we will not cover, will be necessary to ensure that the correct statistics and p-values are produced.
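As a minimal sketch of such a call, the step below reads five values inline (the same values that appear in the example in this section) and requests the normality tests. The data set name sample is hypothetical, and the ods select line is an optional assumption that limits the output to the two test tables; drop it to get the full listing.

```sas
/* Hypothetical data set name: sample. Variable x holds the data. */
data sample;
  input x @@;          /* @@ reads several values from one line */
  datalines;
-2 -1 0 1 4
;
run;

/* Optional: keep only the test tables in the output. */
ods select TestsForNormality TestsForLocation;

/* NORMAL requests the tests for normality.
   MU0= sets the null value for the tests for location (0 is the default). */
proc univariate data=sample normal mu0=0;
  var x;
run;
```

The Tests for Normality table answers the secondary hypotheses; the Tests for Location table reports two-sided p-values for the null value given in mu0=.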
Example
Suppose that, in some hypothesis testing problem, you want to determine whether the population mean is zero, and you think the mean may be greater than zero. Consider the following hypotheses:
H0: mu = 0
Ha: mu > 0
To correctly use your proc univariate output note the following:
The following code and output illustrate how proc univariate may be used. The output is annotated and identifies the p-value for the normality test.
Code:
options pagesize=53 linesize=76;

data new;
  infile 'in.dat';
  input x;
  label x='Descriptive Label';
run;

proc sort;
  by x;

title 'Testing for Population Mean';

proc print label;
  var x;

proc means n nmiss mean std stderr t prt maxdec=3;
  var x;

proc univariate normal;
  var x;
run;
Output:

                      Testing for Population Mean                      1

                             Descriptive
                      Obs       Label

                        1         -2
                        2         -1
                        3          0
                        4          1
                        5          4

                      Testing for Population Mean                      2

                          The MEANS Procedure

                Analysis Variable : x Descriptive Label

    N    N Miss       Mean    Std Dev    Std Error    t Value    Pr > |t|
  --------------------------------------------------------------------------
    5         0      0.400      2.302        1.030       0.39      0.7174
  --------------------------------------------------------------------------

                      Testing for Population Mean                      3

                        The UNIVARIATE Procedure
                   Variable:  x  (Descriptive Label)

                                Moments

    N                           5    Sum Weights                 5
    Mean                      0.4    Sum Observations            2
    Std Deviation      2.30217289    Variance                  5.3
    Skewness           1.03265854    Kurtosis           1.12851549
    Uncorrected SS             22    Corrected SS             21.2
    Coeff Variation    575.543222    Std Error Mean     1.02956301

                       Basic Statistical Measures

          Location                    Variability

      Mean     0.400000     Std Deviation            2.30217
      Median   0.000000     Variance                 5.30000
      Mode      .           Range                    6.00000
                            Interquartile Range      2.00000

                       Tests for Location: Mu0=0

        Test           -Statistic-    -----p Value------

        Student's t    t  0.388514    Pr > |t|    0.7174
        Sign           M         0    Pr >= |M|   1.0000
        Signed Rank    S       0.5    Pr >= |S|   1.0000

                          Tests for Normality

    Test                  --Statistic---    -----p Value------

    Shapiro-Wilk          W      0.94273    Pr < W      0.3853   <-- use this p-value
    Kolmogorov-Smirnov    D     0.197191    Pr > D     >0.1500
    Cramer-von Mises      W-Sq  0.035726    Pr > W-Sq  >0.2500
    Anderson-Darling      A-Sq  0.240319    Pr > A-Sq  >0.2500

                        Quantiles (Definition 5)

                        Quantile      Estimate

                        100% Max             4
                        99%                  4
                        95%                  4
                        90%                  4

                      Testing for Population Mean                      4

                        The UNIVARIATE Procedure
                   Variable:  x  (Descriptive Label)

                        Quantiles (Definition 5)

                        Quantile      Estimate

                        75% Q3               1
                        50% Median           0
                        25% Q1              -1
                        10%                 -2
                        5%                  -2
                        1%                  -2
                        0% Min              -2

                          Extreme Observations

                  ----Lowest----        ----Highest---

                  Value      Obs        Value      Obs

                     -2        1           -2        1
                     -1        2           -1        2
                      0        3            0        3
                      1        4            1        4
                      4        5            4        5
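The Student's t line in Tests for Location reports Pr > |t|, which is a two-sided p-value, while the example's alternative is that the mean is greater than zero. One possible way to get the matching upper-tail p-value is sketched below. It is an illustration, not part of the original output: the values of t and n are copied by hand from the listing, and the variable names are hypothetical.

```sas
/* Sketch: upper-tail p-value for Ha: mu > 0, using values read off the listing. */
data _null_;
  t  = 0.388514;               /* Student's t from Tests for Location */
  n  = 5;                      /* N from Moments */
  df = n - 1;                  /* degrees of freedom for the one-sample t test */
  p_upper = 1 - probt(t, df);  /* P(T >= t) under H0 */
  put p_upper=;                /* writes the result to the SAS log */
run;
```

Because t is positive here, the result should equal half of the two-sided value 0.7174, that is, about 0.3587. If t were negative, the upper-tail p-value would instead be one minus half of the two-sided value.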