Remember that a normality test will be necessary if your sample is small. In this case you will have two sets of hypotheses: in addition to the hypotheses for the stated problem (the primary hypotheses), you will need normality hypotheses (the secondary hypotheses). Your normality hypotheses should be stated thus:

H0: The sample comes from a normally distributed population.
H1: The sample does not come from a normally distributed population.
Proc univariate may be used to test these hypotheses and will also provide the statistics required for testing your primary hypotheses. As a bonus, univariate may also be used to test your primary hypotheses. However, additional work will be necessary to ensure that the correct statistics and p-values are produced.
Proc univariate assumes the following hypotheses:

H0: μ = 0
H1: μ ≠ 0
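By default, univariate always tests these hypotheses, so adjustments will be needed to transform your problem into this form. (Releases of SAS whose proc univariate statement accepts the mu0= option can test a non-zero null value directly; the approach described here does not rely on it.) First, the alternative hypothesis above is two-sided, so if your alternative is one-sided the reported p-value must be divided by two, provided the sample mean falls on the side named by your alternative; if it falls on the opposite side, the one-sided p-value is one minus half the reported value. Second, in most cases you will not be interested in testing whether the population mean is zero. Even so, it is easy to transform your data so that your null hypothesis is of this form.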
Consider the following hypotheses:
Let us say you subtracted 42 from each observation in your sample. The hypotheses for your adjusted data would be:
Note that the p-value obtained in this case will be the same as the p-value for the original hypotheses. So, since proc univariate assumes a null hypothesis of this form, all we need to do to use proc univariate for the original hypotheses is to adjust our observations, as mentioned above, and as done in the code below.
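The reason the p-value does not change can be sketched in a few lines. The notation here is introduced only for illustration and is not part of the original notes: x_i are the original observations, y_i = x_i - 42 the centered ones, s the sample standard deviation, and n the sample size. Subtracting a constant shifts the mean but leaves the standard deviation alone, so the two t statistics are identical:

```latex
\begin{aligned}
\bar{y} &= \bar{x} - 42, \qquad s_y = s_x, \\
t &= \frac{\bar{x} - 42}{s_x/\sqrt{n}} \;=\; \frac{\bar{y} - 0}{s_y/\sqrt{n}} .
\end{aligned}
```

With the values reported in the output later in this section (n = 5, centered mean 0.4, standard deviation 2.302173), this gives t = 0.4 / (2.302173 / √5) = 0.4 / 1.029563 ≈ 0.3885, which agrees with the T:Mean=0 entry.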
To correctly use your proc univariate output note the following:
The following code and output illustrate how proc univariate may be used to test both your primary and secondary hypotheses. The output is annotated and identifies the p-values that are required.
Code:
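options pagesize=53 linesize=76;

/* Read the sample and center it at the hypothesized mean of 42.       */
/* Note: "input fault;" reads only the FIRST value on each data line,  */
/* so the five observations used are 42, 46, 41, 43 and 40, matching   */
/* the output shown below.                                             */
data faults;
  input fault;
  centered = fault - 42;
  label fault    = 'Discovered Faults'
        centered = 'Faults - 42';
datalines;
42 36
46 43
41 35
43 45
40 39
run;

proc sort;
  by fault;

title 'Testing for Population Mean Faults';

/* List the centered and original values */
proc print label;
  var centered fault;

/* Summary statistics and the two-sided t test of mean = 0 */
proc means n nmiss mean std stderr t prt maxdec=3;
  var centered;

/* Normality test (normal), plots (plot) and the same t test */
proc univariate normal plot;
  var centered;
run;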
Output:

                 Testing for Population Mean Faults

                        Faults    Discovered
                 OBS     - 42       Faults

                   1       -2         40
                   2       -1         41
                   3        0         42
                   4        1         43
                   5        4         46

Analysis Variable : CENTERED Faults - 42

  N  Nmiss       Mean    Std Dev  Std Error         T  Prob>|T|
 --------------------------------------------------------------------------
  5      0      0.400      2.302      1.030     0.389    0.7174
 --------------------------------------------------------------------------

                        Univariate Procedure

Variable=CENTERED        Faults - 42

                              Moments

        N                 5    Sum Wgts          5
        Mean            0.4    Sum               2
        Std Dev    2.302173    Variance        5.3
        Skewness   1.032659    Kurtosis   1.128515
        USS              22    CSS            21.2
        CV         575.5432    Std Mean   1.029563
        T:Mean=0   0.388514    Pr>|T|       0.7174   <--2 sided p-value
        Num ^= 0          4    Num > 0           2
        M(Sign)           0    Pr>=|M|      1.0000
        Sgn Rank        0.5    Pr>=|S|      1.0000   <--altern to Pr>|T|
        W:Normal    0.94258    Pr<W         0.6922   <--Normality p-value

                          Quantiles(Def=5)

        100% Max     4      99%     4
         75% Q3      1      95%     4
         50% Med     0      90%     4
         25% Q1     -1      10%    -2
          0% Min    -2       5%    -2
                             1%    -2
        Range        6
        Q3-Q1        2
        Mode        -2

                              Extremes

        Lowest    Obs     Highest    Obs
            -2(     1)         -2(     1)
            -1(     2)         -1(     2)
             0(     3)          0(     3)
             1(     4)          1(     4)
             4(     5)          4(     5)

                        Univariate Procedure

Variable=CENTERED        Faults - 42

        Stem Leaf          #      Boxplot
           4 0             1         |
           3                         |
           2                         |
           1 0             1      +-----+
           0 0             1      *--+--*
          -0                      |     |
          -1 0             1      +-----+
          -2 0             1         |
             ----+----+----+----+

[Normal Probability Plot: observed values (*) plotted against the normal reference line (+); vertical axis from -2.5 to 4.5, horizontal axis from -2 to +2]
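If your primary alternative is one-sided, the two-sided Pr>|T| in the output above still has to be converted. The data step below is a minimal sketch of that conversion, not part of the original program: the data set name onesided and the variables p_two, p_upper and p_lower are made up for illustration, and the two alternatives shown (mean greater than 42, mean less than 42) are assumptions, since the direction of your alternative depends on your problem. It uses the SAS function probt, the cumulative distribution function of Student's t.

```sas
/* Sketch: one-sided p-values from the t statistic reported above.   */
/* The names onesided, p_two, p_upper, p_lower are hypothetical.     */
data onesided;
  t  = 0.388514;      /* T:Mean=0 from the proc univariate output    */
  df = 5 - 1;         /* degrees of freedom, N - 1                   */

  /* Two-sided p-value; should agree with Pr>|T| = 0.7174            */
  p_two = 2 * (1 - probt(abs(t), df));

  /* Assumed alternative H1: mean > 42 (upper tail)                  */
  p_upper = 1 - probt(t, df);

  /* Assumed alternative H1: mean < 42 (lower tail)                  */
  p_lower = probt(t, df);

  put p_two= 6.4 p_upper= 6.4 p_lower= 6.4;
run;
```

Because t is positive here, the upper-tail value is half of Pr>|T| and the lower-tail value is one minus that half, so only one of the two can be small for a given sample.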