Inferences - Two Populations (μ) contd.
Readings:
Ott; 6.2 (pg 275-278), 6.3, 6.5
Case 2: n1 small or n2 small:
We continue with the second of the three cases, where we have a small sample.
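Theorem 2b: Let y1 and y2 be normally distributed but with $\sigma_{y_1} \neq \sigma_{y_2}$. In this case $(\bar{y}_1 - \bar{y}_2)$ has a Behrens-Fisher distribution with mean and standard deviation:
$$\mu_{(\bar{y}_1 - \bar{y}_2)} = \mu_{y_1} - \mu_{y_2}$$
$$\sigma_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{\sigma^2_{y_1}/n_1 + \sigma^2_{y_2}/n_2}$$
In 1938, Welch showed that we may approximate this distribution by a Student t distribution with degrees of freedom k computed from the sample variances:
$$k = \frac{\left(s^2_{y_1}/n_1 + s^2_{y_2}/n_2\right)^2}{\dfrac{\left(s^2_{y_1}/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s^2_{y_2}/n_2\right)^2}{n_2 - 1}}$$
Note: Sometimes referred to as the Satterthwaite approximation.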
Definition 2b: Given Theorem 2b, an a% confidence interval for $(\mu_{y_1} - \mu_{y_2})$ may be obtained thus:
L: $(\bar{y}_1 - \bar{y}_2) - t_{k;(100-a)/2} \, s_{(\bar{y}_1 - \bar{y}_2)}$
U: $(\bar{y}_1 - \bar{y}_2) + t_{k;(100-a)/2} \, s_{(\bar{y}_1 - \bar{y}_2)}$
where k is the degrees of freedom determined as in Theorem 2b above and:
$$s_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{s^2_{y_1}/n_1 + s^2_{y_2}/n_2}$$
Note: We will not do these computations manually. Instead, we will use the SAS PROC TTEST procedure to do the computations.
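Although we rely on PROC TTEST for these computations, the quantities in Theorem 2b and Definition 2b are easy to check by hand. The following is a minimal Python sketch, not part of the course's SAS material; the function name `welch_summary` is our own, and the inputs are the sample sizes, means, and standard deviations that PROC UNIVARIATE reports for the usability problem later in this section.

```python
import math

def welch_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Return (standard error, t statistic, Welch degrees of freedom)
    for the difference of two independent sample means."""
    v1 = sd1 ** 2 / n1          # s^2_{y1} / n1
    v2 = sd2 ** 2 / n2          # s^2_{y2} / n2
    se = math.sqrt(v1 + v2)     # s_(y1bar - y2bar) from Definition 2b
    t = (mean1 - mean2) / se    # test statistic under H0: difference = 0
    # Welch (Satterthwaite) approximate degrees of freedom
    k = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, k

# Group a: N = 12, group b: N = 10 (values copied from the SAS output)
se, t, k = welch_summary(12, 74.0833333, 11.6576805, 10, 46.5, 23.8245905)
print(f"SE = {se:.4f}, t = {t:.2f}, k = {k:.1f}")
```

The t statistic and degrees of freedom should agree, after rounding, with the Satterthwaite line of the PROC TTEST output (t Value 3.34, DF 12.5).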
Problem:
A colleague is interested in testing the usability of two competing GUI products (i.e. product a and product b). She believes that product b is faster than product a but several experts argue that, on average, the products require the same time to complete a standard task suite. Your colleague randomly selects two groups of users and assigns product a to one group and product b to the other group. She then provides each user with a standard task suite and measures the time that each user takes to complete the task suite. Given the data collected, conduct a test of hypotheses.
Solution:
The required null and alternative hypotheses are:
H0: $\mu_a - \mu_b = 0$
Ha: $\mu_a - \mu_b > 0$
To solve this problem we must address several issues. It is clear that this is an independent sample problem and, from the data, we can see that we have two small samples. So the question to be addressed is whether this is a Case 2a, 2b, 2c, or 2d problem. That is, is normality reasonable? Also, are the population standard deviations equal? You may think of this as addressing the following two additional sets of hypotheses:
H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution
H0: $\sigma_a = \sigma_b$
Ha: $\sigma_a \neq \sigma_b$
This SAS program may be used to get the necessary p-values.
Output:
The SAS System 1
------------------------------- group=a --------------------------------
The UNIVARIATE Procedure
Variable: time
Moments
N 12 Sum Weights 12
Mean 74.0833333 Sum Observations 889
Std Deviation 11.6576805 Variance 135.901515
Skewness -0.5102858 Kurtosis 0.4082119
Uncorrected SS 67355 Corrected SS 1494.91667
Coeff Variation 15.7359017 Std Error Mean 3.36528249
Basic Statistical Measures
Location Variability
Mean 74.08333 Std Deviation 11.65768
Median 74.00000 Variance 135.90152
Mode 74.00000 Range 40.00000
Interquartile Range 13.00000
NOTE: The mode displayed is the smallest of 2 modes with a count of 2.
Tests for Location: Mu0=0
Test -Statistic- -----p Value------
Student's t t 22.014 Pr > |t| <.0001
Sign M 6 Pr >= |M| 0.0005
Signed Rank S 39 Pr >= |S| 0.0005
Tests for Normality
Test --Statistic--- -----p Value------
Shapiro-Wilk W 0.94964 Pr < W 0.6317
Kolmogorov-Smirnov D 0.164734 Pr > D >0.1500
Cramer-von Mises W-Sq 0.040129 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.270727 Pr > A-Sq >0.2500
Quantiles (Definition 5)
Quantile Estimate
100% Max 90.0
99% 90.0
95% 90.0
90% 90.0
75% Q3 82.5
50% Median 74.0
25% Q1 69.5
10% 60.0
5% 50.0
1% 50.0
0% Min 50.0
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
50 12 76 5
60 11 80 4
69 10 85 3
70 9 90 1
71 8 90 2
------------------------------- group=b --------------------------------
The UNIVARIATE Procedure
Variable: time
Moments
N 10 Sum Weights 10
Mean 46.5 Sum Observations 465
Std Deviation 23.8245905 Variance 567.611111
Skewness 0.33636882 Kurtosis 0.24923925
Uncorrected SS 26731 Corrected SS 5108.5
Coeff Variation 51.2356784 Std Error Mean 7.53399702
Basic Statistical Measures
Location Variability
Mean 46.50000 Std Deviation 23.82459
Median 46.50000 Variance 567.61111
Mode . Range 82.00000
Interquartile Range 24.00000
Tests for Location: Mu0=0
Test -Statistic- -----p Value------
Student's t t 6.172023 Pr > |t| 0.0002
Sign M 5 Pr >= |M| 0.0020
Signed Rank S 27.5 Pr >= |S| 0.0020
Tests for Normality
Test --Statistic--- -----p Value------
Shapiro-Wilk W 0.975638 Pr < W 0.9376
Kolmogorov-Smirnov D 0.176456 Pr > D >0.1500
Cramer-von Mises W-Sq 0.033885 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.202176 Pr > A-Sq >0.2500
Quantiles (Definition 5)
Quantile Estimate
100% Max 90.0
99% 90.0
95% 90.0
90% 82.5
75% Q3 54.0
50% Median 46.5
25% Q1 30.0
10% 16.0
5% 8.0
1% 8.0
0% Min 8.0
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
8 10 50 5
24 9 51 4
30 8 54 3
40 7 75 2
43 6 90 1
The TTEST Procedure
Statistics
                            Lower CL            Upper CL  Lower CL
Variable  group          N      Mean      Mean      Mean   Std Dev
time      a             12    66.676    74.083     81.49    8.2582
time      b             10    29.457      46.5    63.543    16.387
time      Diff (1-2)          11.354    27.583    43.813    13.902
Statistics
                                  Upper CL
Variable  group          Std Dev   Std Dev   Std Err   Minimum   Maximum
time      a               11.658    19.793    3.3653        50        90
time      b               23.825    43.494     7.534         8        90
time      Diff (1-2)      18.171     26.24    7.7802
T-Tests
Variable  Method         Variances      DF   t Value   Pr > |t|
time      Pooled         Equal          20      3.55     0.0020
time      Satterthwaite  Unequal      12.5      3.34     0.0055
Equality of Variances
Variable  Method      Num DF  Den DF  F Value   Pr > F
time      Folded F         9      11     4.18   0.0293
Analysis:
The proc univariate output indicates that normality is reasonable for both populations (the Shapiro-Wilk p-values are 0.6317 for group a and 0.9376 for group b). The Equality of Variances section of the proc ttest output provides the p-value (0.0293) for the equal standard deviation hypothesis. Since this p-value is small (2.93%), we reject the hypothesis of equal standard deviations and conclude that the standard deviations are different.
Note: We can now say that this is a Case 2b problem.
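The F Value on the Folded F line can be reproduced from the two sample variances. The sketch below is in Python rather than SAS and assumes the usual definition of the folded F statistic (larger sample variance divided by the smaller one); the variable names are ours, and the variances are copied from the PROC UNIVARIATE Moments tables.

```python
# Sample variances and sizes from the PROC UNIVARIATE output
var_a, n_a = 135.901515, 12
var_b, n_b = 567.611111, 10

# Folded F: put the larger variance in the numerator, so F >= 1.
if var_a >= var_b:
    f_value, num_df, den_df = var_a / var_b, n_a - 1, n_b - 1
else:
    f_value, num_df, den_df = var_b / var_a, n_b - 1, n_a - 1

print(f"F = {f_value:.2f} with {num_df} and {den_df} degrees of freedom")
```

Here group b has the larger variance, so its degrees of freedom (9) go in the numerator, which matches Num DF 9 and Den DF 11 in the output.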
Before proceeding we should check consistency, that is, whether the sample means differ in the direction stated by the alternative hypothesis. The sample means are provided in the Statistics section of the proc ttest output: 74.083 for product a and 46.5 for product b, a difference of 27.583. Since the difference in our sample means is consistent with our alternative hypothesis, we may now use the T-Tests section of the proc ttest output to determine the p-value for our original hypotheses. Since we have established that the population standard deviations are different (which means the variances are different), we use the p-value on the line with Unequal (i.e. 0.0055). However, proc ttest always provides p-values for two-sided alternative hypotheses, and since our alternative is one-sided we must divide 0.0055 by 2 to obtain the p-value for our problem, giving 0.00275, or 0.275%. This p-value is highly significant, so we reject the null hypothesis and conclude that product b is indeed faster than product a.
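The relationship between the two-sided and one-sided p-values can be checked numerically. The following Python sketch is an illustration only, not how SAS computes the value: it integrates the Student t density with Simpson's rule (the helper names `t_pdf` and `t_upper_tail` are ours) and recomputes the Satterthwaite t statistic and degrees of freedom from the summary statistics in the output above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the probability lies above 0; subtract the area between 0 and |t|.
    return 0.5 - total * h / 3

# Summary statistics copied from the PROC UNIVARIATE output
v_a = 11.6576805 ** 2 / 12
v_b = 23.8245905 ** 2 / 10
t_value = (74.0833333 - 46.5) / math.sqrt(v_a + v_b)
k = (v_a + v_b) ** 2 / (v_a ** 2 / 11 + v_b ** 2 / 9)

one_sided = t_upper_tail(t_value, k)   # Ha: mu_a - mu_b > 0
two_sided = 2 * one_sided              # what PROC TTEST prints as Pr > |t|
print(f"two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.5f}")
```

The two-sided value should be close to the 0.0055 on the Unequal line, and the one-sided value is exactly half of it because the t distribution is symmetric and the observed difference lies in the direction of the alternative.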
Note: As an exercise, look at this dataset. This is the dataset that was used to do the in class demo. It is for the same problem but is slightly different. Replace the data following the datalines statement in the SAS program, execute the program and redo the analysis.
Non-Parametric Methods:
We will not develop a Theorem 2c or a Theorem 2d since these are cases where y1 is not normally distributed or y2 is not normally distributed. When normality is not reasonable, we must use Non-Parametric (otherwise known as distribution-free) methods to solve the problems. In particular, we will use the Wilcoxon Rank Sum Non-Parametric method. Again, we will not do the computations manually. We will use the SAS NPAR1WAY procedure to do the computations.
Note: We will illustrate for Case 2c only. Remember that Case 2c requires equal population variances. We will relax this somewhat and require instead that the population distributions are approximately the same shape. Case 2d will not be considered in this class.
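To give a feel for what the Wilcoxon Rank Sum method computes, here is a minimal Python sketch. It pools the two samples, ranks the pooled values (tied values share the average of the ranks they occupy), and sums the ranks within each group. The function name `rank_sums` and the two toy samples are hypothetical and purely illustrative; they are not the class data.

```python
def rank_sums(sample_a, sample_b):
    """Return the sum of pooled-sample ranks for each group.
    Tied values receive the average of the ranks they occupy."""
    pooled = sorted(sample_a + sample_b)

    def midrank(value):
        first = pooled.index(value) + 1   # 1-based rank of first occurrence
        count = pooled.count(value)       # number of tied observations
        return first + (count - 1) / 2    # average rank over the tie

    return (sum(midrank(v) for v in sample_a),
            sum(midrank(v) for v in sample_b))

# Hypothetical toy samples, chosen only to show how ties are handled
toy_a = [3.1, 4.0, 4.0]
toy_b = [1.2, 4.0, 5.5]
w_a, w_b = rank_sums(toy_a, toy_b)

n_a, n_total = len(toy_a), len(toy_a) + len(toy_b)
expected_a = n_a * (n_total + 1) / 2      # rank sum expected under H0
print(w_a, w_b, expected_a)
```

Under the null hypothesis each group's rank sum should be near its expected value; a rank sum far from that value is evidence that one population tends to produce larger observations than the other.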
Problem:
A colleague is interested in assessing a software development methodology that supposedly leads to higher quality software. Your colleague randomly selects two groups of programmers and provides one group (group b) with training in this new methodology. The other group (group a) is used as the control group and so does not receive training in the methodology. She then provides each programmer in each group with a programming assignment and, at the end of coding and unit testing, carefully inspects the delivered code for defects. She computes the number of defects per 100 lines of code and asks you to help with the analysis.
Given the data collected, conduct a test of hypotheses.
Solution:
The required null and alternative hypotheses are:
H0: $\mu_a - \mu_b = 0$
Ha: $\mu_a - \mu_b > 0$
To solve this problem we must again address the issues mentioned for the problem above. It is clear that this is an independent sample problem and, from the data, we can see that we have two small samples. So the question to be addressed again, is whether this is a Case 2a, 2b, 2c, or 2d problem. That is, is normality reasonable? Also, are the population standard deviations equal? You may think of this as addressing the following two additional sets of hypotheses:
H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution
H0: $\sigma_a = \sigma_b$
Ha: $\sigma_a \neq \sigma_b$
This SAS program may be used to get the necessary p-values.
Output:
The SAS System 1
------------------------------- group=a --------------------------------
The UNIVARIATE Procedure
Variable: errors
Moments
N 17 Sum Weights 17
Mean 8.02941176 Sum Observations 136.5
Std Deviation 4.22104322 Variance 17.8172059
Skewness 1.67416729 Kurtosis 3.52424225
Uncorrected SS 1381.09 Corrected SS 285.075294
Coeff Variation 52.5697691 Std Error Mean 1.02375336
Basic Statistical Measures
Location Variability
Mean 8.029412 Std Deviation 4.22104
Median 7.400000 Variance 17.81721
Mode 7.400000 Range 17.70000
Interquartile Range 2.40000
Tests for Location: Mu0=0
Test -Statistic- -----p Value------
Student's t t 7.843112 Pr > |t| <.0001
Sign M 8.5 Pr >= |M| <.0001
Signed Rank S 76.5 Pr >= |S| <.0001
Tests for Normality
Test --Statistic--- -----p Value------
Shapiro-Wilk W 0.832123 Pr < W 0.0058
Kolmogorov-Smirnov D 0.246125 Pr > D <0.0100
Cramer-von Mises W-Sq 0.201647 Pr > W-Sq <0.0050
Anderson-Darling A-Sq 1.134096 Pr > A-Sq <0.0050
Quantiles (Definition 5)
Quantile Estimate
100% Max 19.9
99% 19.9
95% 19.9
90% 15.7
75% Q3 8.7
50% Median 7.4
25% Q1 6.3
10% 4.1
5% 2.2
1% 2.2
0% Min 2.2
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
2.2 17 8.7 5
4.1 16 9.4 4
4.2 15 9.5 3
5.0 14 15.7 2
6.3 13 19.9 1
Stem Leaf                     #  Boxplot
  18 9                        1     *
  16
  14 7                        1     0
  12
  10
   8 2745                     4  +--+--+
   6 3891447                  7  *-----*
   4 120                      3     |
   2 2                        1     0
     ----+----+----+----+
Normal Probability Plot
[Normal probability plot of errors for group a; vertical axis from 3 to 19, horizontal axis from -2 to +2.]
------------------------------- group=b --------------------------------
The UNIVARIATE Procedure
Variable: errors
Moments
N 10 Sum Weights 10
Mean 5.45 Sum Observations 54.5
Std Deviation 3.75625404 Variance 14.1094444
Skewness 1.85338027 Kurtosis 3.82689713
Uncorrected SS 424.01 Corrected SS 126.985
Coeff Variation 68.9220926 Std Error Mean 1.18783182
Basic Statistical Measures
Location Variability
Mean 5.450000 Std Deviation 3.75625
Median 4.650000 Variance 14.10944
Mode . Range 12.90000
Interquartile Range 2.10000
Tests for Location: Mu0=0
Test -Statistic- -----p Value------
Student's t t 4.588192 Pr > |t| 0.0013
Sign M 5 Pr >= |M| 0.0020
Signed Rank S 27.5 Pr >= |S| 0.0020
Tests for Normality
Test --Statistic--- -----p Value------
Shapiro-Wilk W 0.808401 Pr < W 0.0183
Kolmogorov-Smirnov D 0.30531 Pr > D <0.0100
Cramer-von Mises W-Sq 0.142663 Pr > W-Sq 0.0241
Anderson-Darling A-Sq 0.81031 Pr > A-Sq 0.0235
Quantiles (Definition 5)
Quantile Estimate
100% Max 14.60
99% 14.60
95% 14.60
90% 11.70
75% Q3 5.40
50% Median 4.65
25% Q1 3.30
10% 2.05
5% 1.70
1% 1.70
0% Min 1.70
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
1.7 10 5.0 5
2.4 9 5.1 4
3.3 8 5.4 3
3.9 7 8.8 2
4.3 6 14.6 1
Stem Leaf                     #  Boxplot
  14 6                        1     *
  12
  10
   8 8                        1     0
   6
   4 3014                     4  +--+--+
   2 439                      3  +-----+
   0 7                        1     |
     ----+----+----+----+
Normal Probability Plot
[Normal probability plot of errors for group b; vertical axis from 1 to 15, horizontal axis from -2 to +2.]
The SAS System 7
23:25 Friday, February 21, 2003
The UNIVARIATE Procedure
Variable: errors
Schematic Plots
[Side-by-side box plots of errors for group a and group b; vertical axis from 0 to 20.]
The NPAR1WAY Procedure
Wilcoxon Scores (Rank Sums) for Variable errors
Classified by Variable group
                 Sum of    Expected     Std Dev        Mean
group     N      Scores    Under H0    Under H0       Score
---------------------------------------------------------------------
a        17      277.50       238.0   19.910412   16.323529
b        10      100.50       140.0   19.910412   10.050000
Average scores were used for ties.
Wilcoxon Two-Sample Test
Statistic 100.5000
Normal Approximation
Z -1.9588
One-Sided Pr < Z 0.0251
Two-Sided Pr > |Z| 0.0501
t Approximation
One-Sided Pr < Z 0.0305
Two-Sided Pr > |Z| 0.0609
Z includes a continuity correction of 0.5.
Kruskal-Wallis Test
Chi-Square 3.9358
DF 1
Pr > Chi-Square 0.0473
Analysis:
The proc univariate output indicates that normality is NOT reasonable for either population (the Shapiro-Wilk p-values are 0.0058 for group a and 0.0183 for group b). For this class, we do not have an analytic procedure to determine if the shapes of the distributions are equal. Instead, we will use the box plots and stem-and-leaf plots produced by proc univariate for each sample to see if this is reasonable. Since these plots look similar (both samples are skewed to the right, with a few large values), we will conclude that the population distributions are approximately the same shape.
Note: We can now say that this is a Case 2c problem.
Before proceeding we should check consistency. To do this, we compare the entries under the Mean Score heading from the npar1way procedure: 16.32 for group a and 10.05 for group b. Since the difference is consistent with our alternative hypothesis, we may now use a One-Sided Pr p-value from the Wilcoxon Two-Sample Test section of the output to assess our original hypotheses. Using the t Approximation, the p-value for our methodology problem is 3.05% (the Normal Approximation gives 2.51%). This p-value is significant, so we reject the null hypothesis and conclude that the methodology is effective.
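The Normal Approximation lines of the NPAR1WAY output can be reproduced from the rank-sum table. The Python sketch below is an illustration under stated assumptions: the standard deviation under H0 is taken directly from the output (it already reflects the tie adjustment), the continuity correction of 0.5 moves the statistic toward its expected value, and the variable names are ours.

```python
from statistics import NormalDist

n_a, n_b = 17, 10
rank_sum_b = 100.5          # "Statistic" in the Wilcoxon Two-Sample Test
sd_under_h0 = 19.910412     # "Std Dev Under H0", copied from the output

n_total = n_a + n_b
expected_b = n_b * (n_total + 1) / 2    # "Expected Under H0" for group b

# Continuity correction of 0.5, applied toward the expected value
diff = rank_sum_b - expected_b
correction = 0.5 if diff < 0 else -0.5
z = (diff + correction) / sd_under_h0

p_one_sided = NormalDist().cdf(z)       # Pr < Z
p_two_sided = 2 * p_one_sided           # Pr > |Z|
print(f"Z = {z:.4f}, one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

These values should match the Normal Approximation lines (Z of -1.9588, one-sided 0.0251, two-sided 0.0501). The 3.05% figure comes from the separate t Approximation lines, which this sketch does not reproduce.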