The normal plot is a graphical procedure used to assess normality. It
is provided by proc univariate when the plot option is specified.
Unfortunately, as produced by proc univariate, this plot is not really
usable since it is hard to read. Fortunately, we do not have to rely on
proc univariate. A readable normal plot may be quite easily produced
from scratch by using proc rank in conjunction with proc plot.
The following code produces a normal plot for the variable "time". Notice
that two options have been specified on the proc rank statement,
normal=blom and out=new2. The first option tells proc rank to use a
procedure developed by the statistician Blom to produce normal scores
(i.e. z-values). The second option specifies the name of a dataset which
will contain all the original data plus one new variable. The ranks
statement gives that new variable the name nscores, and it is the
variable which will contain the normal scores for "time".
proc rank bases its computation of the normal scores on the ranks of the
values of "time". Statistical theory tells us that if a set of values is
drawn from a normal population, we can estimate the z-score of each
value from its rank alone. This is the theory being applied here.
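For reference, the approximation usually attributed to Blom can be written as below. The symbols are introduced here for illustration and do not appear in the code: r_i is the rank of the i-th value (1 for the smallest), n is the number of nonmissing values, and Phi^{-1} is the inverse of the standard normal distribution function.

```latex
z_i = \Phi^{-1}\!\left( \frac{r_i - 3/8}{\,n + 1/4\,} \right)
```

As a rough check, the smallest of n = 10 values has r_i = 1, so the fraction is 0.625/10.25, or about 0.061, and the corresponding normal score is about -1.55.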
Hence, if the values of "time" are drawn from a normal population, we
should expect an approximately straight line when we plot these values
against their estimated normal scores. We use proc plot to produce this
plot. A plot that is nearly linear suggests agreement with normality,
whereas a plot that departs substantially from linearity suggests that
the values are not drawn from a normal distribution.
Code:
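proc rank normal=blom out=new2;   /* Blom normal scores, saved in new2 */
  var time;                       /* variable to be scored */
  ranks nscores;                  /* name of the new score variable */
run;
proc plot data=new2;              /* plot the dataset proc rank created */
  plot time*nscores='*';
  label nscores='Normal Scores';
run;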
Output:
Plot of TIME*NSCORES. Symbol used is '*'.
[Plot: TIME on the vertical axis (scale -8 to 8) against Normal Scores
on the horizontal axis (scale -2 to 2), each point marked with '*'.]
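To see what normal=blom is doing, the scores can also be reproduced by hand. The following is only a sketch under stated assumptions: the dataset demo and its values are made up for illustration, and the names ranked, byhand, compare, r and manual are hypothetical. It assumes there are no tied or missing values of "time", in which case the hand-computed column should agree with the one proc rank produces.

```sas
/* Hypothetical data: the dataset name "demo" and its values are made up. */
data demo;
  input time @@;
  datalines;
3.1 4.7 5.0 2.2 6.4 4.1 3.8 5.6
;
run;

/* Step 1: ordinary ranks (1 = smallest value), stored in r. */
proc rank data=demo out=ranked;
  var time;
  ranks r;
run;

/* Step 2: Blom's formula applied by hand.                      */
/* nobs=n puts the number of observations in n; probit is the   */
/* inverse of the standard normal distribution function.        */
data byhand;
  set ranked nobs=n;
  manual = probit((r - 0.375) / (n + 0.25));
run;

/* Step 3: the same scores from proc rank, for comparison. */
proc rank data=byhand normal=blom out=compare;
  var time;
  ranks nscores;
run;

/* Print both columns side by side. */
proc print data=compare;
  var time r manual nscores;
run;
```

Printing the two columns side by side makes it clear that the normal scores depend only on the ranks of the values and on the number of observations, not on the values themselves.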