### The Standard Deviation (SD)

"It is difficult to understand why statisticians commonly limit their enquiries to averages and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be gotten rid of at once." (Sir Francis Galton, 1822-1911)

• The standard deviation SD is defined as

SD = √ average of squared deviations from the average

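$$\mathrm{SD} = \sqrt{\frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}$$

where $\bar{x}$ is the average of the observations $x_1, \ldots, x_n$.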

• SPSS uses a slightly different definition for the SD:

$$\mathrm{SD}^{+} = \sqrt{\frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$

• Most statisticians prefer SD+ for theoretical reasons; the argument goes like this:
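because the true center of the dataset is unknown, you must estimate it using the sample average $\bar{x}$. Deviations measured from $\bar{x}$ tend to be slightly smaller than deviations from the true center, so the SD comes out a little too small. Dividing by n-1 instead of n compensates for this. (Dividing by n-1 instead of n makes SD+ a little larger than SD.)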

• If n is large, SD and SD+ are very close in value.
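
The relationship between the two definitions can be checked numerically. The sketch below is illustrative only: the function names `sd`, `sd_plus`, and `ratio` are made up for this example and are not part of SPSS or any library. It implements both formulas directly and shows that the ratio SD+ / SD depends only on n and shrinks toward 1 as n grows.

```python
import math

def sd(values):
    """SD: divide the sum of squared deviations by n."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_plus(values):
    """SD+: divide by n - 1 instead of n (needs at least 2 values)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def ratio(n):
    """SD+ / SD for any dataset of size n: sqrt(n / (n - 1))."""
    return math.sqrt(n / (n - 1))

# The gap between SD+ and SD shrinks as n grows.
for n in (4, 30, 1000):
    print(n, round(ratio(n), 4))
```

For n = 4 the two values differ by roughly 15%, while for n = 1000 the difference is well under 0.1%.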

• The SD describes roughly how far an observation tends to be from the sample average.

• Example 1: What is the SD of these observations?

20   10   15   15

1. Average = (20 + 10 + 15 + 15) / 4 = 60 / 4 = 15

2. Deviations from Average = 5   -5   0   0

3. Average of Squared Deviations = (5² + (-5)² + 0² + 0²) / 4 = 50 / 4 = 12.5

4. SD = √(average of squared deviations) = √12.5 ≈ 3.54
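
The four steps of Example 1 can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library. The variable names are chosen for this illustration, and the last line uses `statistics.stdev`, which divides by n-1, to show the corresponding SD+ for comparison.

```python
import math
import statistics

observations = [20, 10, 15, 15]

# Step 1: the average
average = sum(observations) / len(observations)
# Step 2: deviations from the average
deviations = [x - average for x in observations]
# Step 3: average of the squared deviations (dividing by n)
mean_sq = sum(d ** 2 for d in deviations) / len(observations)
# Step 4: the SD is the square root of that average
sd = math.sqrt(mean_sq)

print(average)       # 15.0
print(deviations)    # [5.0, -5.0, 0.0, 0.0]
print(mean_sq)       # 12.5
print(round(sd, 2))  # 3.54

# SD+ (divide by n - 1) for the same data
print(round(statistics.stdev(observations), 2))  # 4.08
```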

• The SD gives a plus or minus estimate of a measurement. If we measure the width of a part 100 times and obtain an average measurement of 3.4544 cm, with an SD of 0.0037 cm, we can say that a typical measurement of the width is 3.4544 ± 0.0037 cm.

• For a bell-shaped histogram, the SD is the best estimate of the spread.

• Warning: Outliers can cause an inflated value of the SD. For this reason, SD and SD+ are not resistant statistics for estimating the spread of a histogram.
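
To see how strongly a single outlier can inflate the SD, the sketch below reuses the data from Example 1 and appends one extreme value. The value 100 is invented purely for illustration. `statistics.pstdev` divides by n, matching the definition of SD used here.

```python
import statistics

observations = [20, 10, 15, 15]      # the data from Example 1
with_outlier = observations + [100]  # 100 is a made-up outlier

sd_clean = statistics.pstdev(observations)
sd_outlier = statistics.pstdev(with_outlier)

print(round(sd_clean, 2))    # 3.54
print(round(sd_outlier, 2))  # 34.15
```

One added observation makes the SD almost ten times larger, even though the other four values are unchanged.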

• If the histogram of a dataset is bell-shaped, it can be described with only two summary statistics: the average $\bar{x}$, which estimates the center, and the SD, which estimates the spread.

• The average and SD form a parsimonious summary of the data.

• If a histogram is not bell-shaped, then the five statistics Q0, Q1, Q2, Q3 and Q4 (the minimum, first quartile, median, third quartile and maximum) form a better summary.
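
A five-number summary can be sketched with Python's standard library as follows. The helper name `five_number_summary` and the skewed dataset are invented for this illustration. Quartile conventions differ between software packages, so the `method="inclusive"` choice is an assumption here and may not match the values SPSS reports.

```python
import statistics

def five_number_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4): min, quartiles, and max."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Made-up, right-skewed data: one large value stretches the upper tail.
data = [1, 2, 2, 3, 4, 7, 40]

print(five_number_summary(data))
print(round(statistics.mean(data), 2), round(statistics.pstdev(data), 2))
```

For skewed data like this, the quartiles stay close to the bulk of the observations, while the average and SD are pulled toward the large value.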