Inferences - Two Populations
(μ) contd.
Readings:
Ott; 6.2(pg 275-278), 6.3, 6.5
Case 2: n1 small or n2 small:
We continue with the second of the three cases where we have a small sample.
Theorem 2b: Let y1 and y2 be normally distributed, but with σy1 not equal to σy2. In this case (y1bar - y2bar) has a Behrens-Fisher distribution with mean and standard deviation:
μ(y1bar-y2bar) = μy1 - μy2
σ(y1bar-y2bar) = sqrt(σ²y1/n1 + σ²y2/n2)
In 1938, Welch showed that we may approximate this distribution with a Student t distribution whose degrees of freedom k are given by the following expression, where s²y1 and s²y2 are the sample variances:
k = (s²y1/n1 + s²y2/n2)² / [ (s²y1/n1)²/(n1 - 1) + (s²y2/n2)²/(n2 - 1) ]
Note: Sometimes referred to as the Satterthwaite approximation.
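As an illustration of the approximation, the short Python sketch below computes the unequal-variance t statistic and the approximate degrees of freedom k = (s²y1/n1 + s²y2/n2)² / [(s²y1/n1)²/(n1 - 1) + (s²y2/n2)²/(n2 - 1)] from summary statistics alone. The function name welch_t_and_df is hypothetical, and the inputs are the sample means, variances, and sizes reported for the GUI usability data later in these notes. It is only a check on the arithmetic, not a replacement for PROC TTEST.

```python
import math

def welch_t_and_df(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, approximate degrees of freedom) for unequal variances."""
    v1 = var1 / n1                      # estimated variance of the first sample mean
    v2 = var2 / n2                      # estimated variance of the second sample mean
    std_err = math.sqrt(v1 + v2)        # standard error of (y1bar - y2bar)
    t_stat = (mean1 - mean2) / std_err
    # Welch/Satterthwaite approximate degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Summary statistics copied from the PROC UNIVARIATE output for groups a and b.
t_stat, df = welch_t_and_df(74.0833333, 135.901515, 12, 46.5, 567.611111, 10)
print(round(t_stat, 2), round(df, 1))
```

The rounded values can be compared with the Satterthwaite line of the T-Tests table in the PROC TTEST output.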
Definition 2b: Given Theorem 2b, an a% confidence interval for (μy1 - μy2) may be obtained thus:
L: (y1bar - y2bar) - t(k; (100-a)/2) × s(y1bar-y2bar)
U: (y1bar - y2bar) + t(k; (100-a)/2) × s(y1bar-y2bar)
where k is the degrees of freedom determined as in Theorem 2b above and:
s(y1bar-y2bar) = sqrt(s²y1/n1 + s²y2/n2)
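The interval arithmetic can be sketched in Python as follows. The function name welch_confidence_interval is hypothetical, and the critical value t_crit must be supplied by the reader from a Student t table (or software) for k degrees of freedom. The value 2.0 used here is only a placeholder to show the calculation, not the correct critical value for any particular confidence level.

```python
import math

def welch_confidence_interval(mean1, var1, n1, mean2, var2, n2, t_crit):
    """Return (L, U) for the difference in means using the unequal-variance standard error."""
    std_err = math.sqrt(var1 / n1 + var2 / n2)   # s(y1bar - y2bar)
    diff = mean1 - mean2                          # y1bar - y2bar
    return diff - t_crit * std_err, diff + t_crit * std_err

# Summary statistics from the GUI usability data; t_crit=2.0 is a placeholder (assumption).
lower, upper = welch_confidence_interval(
    74.0833333, 135.901515, 12, 46.5, 567.611111, 10, t_crit=2.0
)
print(round(lower, 2), round(upper, 2))
```

The interval is always centered on the observed difference in sample means; only its half-width depends on the critical value.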
Note: We will not do these computations manually. Instead, we will use the SAS PROC TTEST procedure to do the computations.
Problem:
A colleague is interested in testing the usability of two competing GUI products (i.e. product a and product b). She believes that product b is faster than product a but several experts argue that, on average, the products require the same time to complete a standard task suite. Your colleague randomly selects two groups of users and assigns product a to one group and product b to the other group. She then provides each user with a standard task suite and measures the time that each user takes to complete the task suite. Given the data collected, conduct a test of hypotheses.
Solution:
The required null and alternative hypotheses are:
H0: μa - μb = 0
Ha: μa - μb > 0
To solve this problem we must address several issues. It is clear that this is an independent sample problem and, from the data, we can see that we have two small samples. So the question to be addressed is whether this is a Case 2a, 2b, 2c, or 2d problem. That is, is normality reasonable? Also, are the population standard deviations equal? You may think of this as addressing the following two additional sets of hypotheses:
H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution
H0: σa = σb
Ha: σa ≠ σb
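The way these two questions select a case can be summarized in a small Python sketch. The function name classify_case is hypothetical, and the mapping assumes (as the surrounding notes suggest) that Case 2a is normal with equal standard deviations, 2b is normal with unequal standard deviations, 2c is non-normal with similar spread or shape, and 2d is non-normal with differing spread.

```python
def classify_case(normality_reasonable, equal_spread):
    """Map the answers to the two preliminary questions onto Case 2a, 2b, 2c, or 2d."""
    if normality_reasonable:
        # Normal populations: the equal-variance question picks 2a versus 2b.
        return "2a" if equal_spread else "2b"
    # Non-normal populations: non-parametric methods apply.
    return "2c" if equal_spread else "2d"

# Normality reasonable, standard deviations judged different.
print(classify_case(True, False))
# Normality not reasonable, distributions judged similar in shape.
print(classify_case(False, True))
```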
This SAS program may be used to get the necessary p-values.
Output:
The SAS System

------------------------------- group=a --------------------------------

The UNIVARIATE Procedure
Variable: time

Moments
  N                        12    Sum Weights               12
  Mean             74.0833333    Sum Observations         889
  Std Deviation    11.6576805    Variance          135.901515
  Skewness         -0.5102858    Kurtosis           0.4082119
  Uncorrected SS        67355    Corrected SS      1494.91667
  Coeff Variation  15.7359017    Std Error Mean    3.36528249

Basic Statistical Measures
  Location               Variability
  Mean     74.08333      Std Deviation          11.65768
  Median   74.00000      Variance              135.90152
  Mode     74.00000      Range                  40.00000
                         Interquartile Range    13.00000
  NOTE: The mode displayed is the smallest of 2 modes with a count of 2.

Tests for Location: Mu0=0
  Test           Statistic         p Value
  Student's t    t   22.014        Pr > |t|    <.0001
  Sign           M   6             Pr >= |M|   0.0005
  Signed Rank    S   39            Pr >= |S|   0.0005

Tests for Normality
  Test                  Statistic           p Value
  Shapiro-Wilk          W      0.94964      Pr < W      0.6317
  Kolmogorov-Smirnov    D      0.164734     Pr > D      >0.1500
  Cramer-von Mises      W-Sq   0.040129     Pr > W-Sq   >0.2500
  Anderson-Darling      A-Sq   0.270727     Pr > A-Sq   >0.2500

Quantiles (Definition 5)
  Quantile      Estimate
  100% Max      90.0
  99%           90.0
  95%           90.0
  90%           90.0
  75% Q3        82.5
  50% Median    74.0
  25% Q1        69.5
  10%           60.0
  5%            50.0
  1%            50.0
  0% Min        50.0

Extreme Observations
  ----Lowest----    ----Highest---
  Value    Obs      Value    Obs
     50     12         76      5
     60     11         80      4
     69     10         85      3
     70      9         90      1
     71      8         90      2

------------------------------- group=b --------------------------------

The UNIVARIATE Procedure
Variable: time

Moments
  N                        10    Sum Weights               10
  Mean                   46.5    Sum Observations         465
  Std Deviation    23.8245905    Variance          567.611111
  Skewness         0.33636882    Kurtosis          0.24923925
  Uncorrected SS        26731    Corrected SS          5108.5
  Coeff Variation  51.2356784    Std Error Mean    7.53399702

Basic Statistical Measures
  Location               Variability
  Mean     46.50000      Std Deviation          23.82459
  Median   46.50000      Variance              567.61111
  Mode     .             Range                  82.00000
                         Interquartile Range    24.00000

Tests for Location: Mu0=0
  Test           Statistic         p Value
  Student's t    t   6.172023      Pr > |t|    0.0002
  Sign           M   5             Pr >= |M|   0.0020
  Signed Rank    S   27.5          Pr >= |S|   0.0020

Tests for Normality
  Test                  Statistic           p Value
  Shapiro-Wilk          W      0.975638     Pr < W      0.9376
  Kolmogorov-Smirnov    D      0.176456     Pr > D      >0.1500
  Cramer-von Mises      W-Sq   0.033885     Pr > W-Sq   >0.2500
  Anderson-Darling      A-Sq   0.202176     Pr > A-Sq   >0.2500

Quantiles (Definition 5)
  Quantile      Estimate
  100% Max      90.0
  99%           90.0
  95%           90.0
  90%           82.5
  75% Q3        54.0
  50% Median    46.5
  25% Q1        30.0
  10%           16.0
  5%             8.0
  1%             8.0
  0% Min         8.0

Extreme Observations
  ----Lowest----    ----Highest---
  Value    Obs      Value    Obs
      8     10         50      5
     24      9         51      4
     30      8         54      3
     40      7         75      2
     43      6         90      1

The TTEST Procedure

Statistics (means)
  Variable  group        N   Lower CL Mean    Mean     Upper CL Mean
  time      a           12       66.676       74.083       81.49
  time      b           10       29.457       46.5         63.543
  time      Diff (1-2)           11.354       27.583       43.813

Statistics (standard deviations)
  Variable  group        Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err  Minimum  Maximum
  time      a                 8.2582       11.658       19.793        3.3653      50       90
  time      b                16.387        23.825       43.494        7.534        8       90
  time      Diff (1-2)       13.902        18.171       26.24         7.7802

T-Tests
  Variable  Method         Variances    DF     t Value   Pr > |t|
  time      Pooled         Equal        20      3.55      0.0020
  time      Satterthwaite  Unequal      12.5    3.34      0.0055

Equality of Variances
  Variable  Method     Num DF   Den DF   F Value   Pr > F
  time      Folded F      9       11       4.18     0.0293

Analysis:
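The proc univariate output indicates that normality is reasonable for both
populations: every normality test has a large p-value (for example, the
Shapiro-Wilk p-values are 0.6317 for group a and 0.9376 for group b). The
Equality of Variances section of the proc ttest output provides the p-value
(0.0293) for the equal standard deviation hypothesis. Since 2.93% is below the
conventional 5% significance level, we reject the hypothesis of equal standard
deviations and conclude that the standard deviations are different.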
Note: We can now say that
this is a Case 2b problem.
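The Folded F statistic reported by proc ttest is the ratio of the larger sample variance to the smaller one, with degrees of freedom taken from the corresponding sample sizes. A minimal Python sketch (the function name folded_f is hypothetical) reproduces the F value and degrees of freedom from the sample variances in the proc univariate output; the p-value itself still comes from SAS.

```python
def folded_f(var1, n1, var2, n2):
    """Return (F, numerator df, denominator df) with the larger variance in the numerator."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Sample variances and sizes for groups a and b from the PROC UNIVARIATE output.
f_value, num_df, den_df = folded_f(135.901515, 12, 567.611111, 10)
print(round(f_value, 2), num_df, den_df)
```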
Before proceeding we should check consistency. To do this, we need the sample means. Notice that these means are provided in the Statistics section of proc ttest. Since the difference in our sample means is consistent with our alternative hypothesis, we may now use the T-Tests section of the proc ttest output to determine the p-value for our original hypotheses. Since we have established that the population standard deviations are different (which means the variances are different) we use the p-value on the line with Unequal (i.e. 0.0055). However, proc ttest always provides p-values for two-sided alternative hypotheses and since our alternative is one-sided we must divide 0.0055 by 2 (i.e. 0.00275) to obtain the p-value for our problem. That is, the p-value for our usability problem is 0.275%. This p-value is highly significant and so we reject the null hypothesis and conclude that product b is indeed faster than product a.
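The conversion from the two-sided p-value to the one-sided p-value can be sketched in Python. The function name one_sided_p is hypothetical. The halving step is valid only when the observed difference points in the direction of the alternative hypothesis, which is why the consistency check comes first; otherwise the one-sided p-value for a symmetric test statistic such as t is one minus half the two-sided value.

```python
def one_sided_p(two_sided_p, observed_diff, alternative="greater"):
    """Convert a two-sided p-value from a symmetric test into a one-sided p-value."""
    if alternative == "greater":
        consistent = observed_diff > 0   # Ha: difference > 0
    else:
        consistent = observed_diff < 0   # Ha: difference < 0
    half = two_sided_p / 2
    return half if consistent else 1 - half

# Two-sided p-value from the Unequal line; observed difference is mean(a) - mean(b).
p_value = one_sided_p(0.0055, 74.083 - 46.5, alternative="greater")
print(p_value)
```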
Note: As an exercise, look at this dataset. This is the dataset that was used to do the in class demo. It is for the same problem but is slightly different. Replace the data following the datalines statement in the SAS program, execute the program and redo the analysis.
Non-Parametric Methods:
We will not develop a Theorem 2c or a
Theorem 2d since these are cases where
y1 is not normally distributed or
y2 is not normally distributed. When normality is not
reasonable, we must use Non-Parametric (otherwise known as
distribution free) methods to solve the problems. In
particular we will use the Wilcoxon Rank Sum Non-Parametric
method. Again, we will
not do the computations manually. We will use the SAS NPAR1WAY procedure
to do the computations.
Note: We will illustrate for
Case 2c only. Remember that Case 2c requires equal population variances. We
will relax this somewhat and require instead that the population distributions
are approximately the same shape.
Case 2d will not be considered in this class.
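To show what the Wilcoxon Rank Sum method computes, the Python sketch below pools two samples, ranks the pooled values (giving tied values the average of the ranks they occupy), and sums the ranks for each group. The function name rank_sums is hypothetical and the two small samples are made-up toy values chosen only to show a tie; they are not data from this class.

```python
def rank_sums(sample_a, sample_b):
    """Return (rank sum of a, rank sum of b), using average ranks for tied values."""
    pooled = sorted([(v, "a") for v in sample_a] + [(v, "b") for v in sample_b])
    sums = {"a": 0.0, "b": 0.0}
    i = 0
    while i < len(pooled):
        # Find the block of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # The block occupies ranks i+1 through j; each member gets their average.
        avg_rank = (i + 1 + j) / 2
        for _, label in pooled[i:j]:
            sums[label] += avg_rank
        i = j
    return sums["a"], sums["b"]

# Toy data (hypothetical): the value 5 appears in both groups and shares ranks 4 and 5.
sum_a, sum_b = rank_sums([3, 5, 8], [1, 2, 5])
print(sum_a, sum_b)
```

The two rank sums always add up to N(N+1)/2, where N is the total number of observations, which makes a quick arithmetic check.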
Problem:
A colleague is interested in assessing a software development methodology that supposedly leads to higher quality software. Your colleague randomly selects two groups of programmers and provides one group (group b) with training in this new methodology. The other group (group a) is used as the control group and so does not receive training in the methodology. She then provides each programmer in each group with a programming assignment and, at the end of coding and unit testing, carefully inspects the delivered code for defects. She computes the number of defects per 100 lines of code and asks you to help with the analysis.
Given the data collected, conduct a test of hypotheses.
Solution:
The required null and alternative hypotheses are:
H0: μa - μb = 0
Ha: μa - μb > 0
To solve this problem we must again address the issues mentioned for the problem above. It is clear that this is an independent sample problem and, from the data, we can see that we have two small samples. So the question to be addressed again, is whether this is a Case 2a, 2b, 2c, or 2d problem. That is, is normality reasonable? Also, are the population standard deviations equal? You may think of this as addressing the following two additional sets of hypotheses:
H0: Sample is drawn from a normal distribution
Ha: Sample is not drawn from a normal distribution
H0: σa = σb
Ha: σa ≠ σb
This SAS program may be used to get the necessary p-values.
Output:
The SAS System

------------------------------- group=a --------------------------------

The UNIVARIATE Procedure
Variable: errors

Moments
  N                        17    Sum Weights               17
  Mean             8.02941176    Sum Observations       136.5
  Std Deviation    4.22104322    Variance          17.8172059
  Skewness         1.67416729    Kurtosis          3.52424225
  Uncorrected SS      1381.09    Corrected SS      285.075294
  Coeff Variation  52.5697691    Std Error Mean    1.02375336

Basic Statistical Measures
  Location               Variability
  Mean     8.029412      Std Deviation           4.22104
  Median   7.400000      Variance               17.81721
  Mode     7.400000      Range                  17.70000
                         Interquartile Range     2.40000

Tests for Location: Mu0=0
  Test           Statistic         p Value
  Student's t    t   7.843112      Pr > |t|    <.0001
  Sign           M   8.5           Pr >= |M|   <.0001
  Signed Rank    S   76.5          Pr >= |S|   <.0001

Tests for Normality
  Test                  Statistic           p Value
  Shapiro-Wilk          W      0.832123     Pr < W      0.0058
  Kolmogorov-Smirnov    D      0.246125     Pr > D      <0.0100
  Cramer-von Mises      W-Sq   0.201647     Pr > W-Sq   <0.0050
  Anderson-Darling      A-Sq   1.134096     Pr > A-Sq   <0.0050

Quantiles (Definition 5)
  Quantile      Estimate
  100% Max      19.9
  99%           19.9
  95%           19.9
  90%           15.7
  75% Q3         8.7
  50% Median     7.4
  25% Q1         6.3
  10%            4.1
  5%             2.2
  1%             2.2
  0% Min         2.2

Extreme Observations
  ----Lowest----    ----Highest---
  Value    Obs      Value    Obs
    2.2     17        8.7      5
    4.1     16        9.4      4
    4.2     15        9.5      3
    5.0     14       15.7      2
    6.3     13       19.9      1

  Stem Leaf       #   Boxplot
    18 9          1      *
    16
    14 7          1      0
    12
    10
     8 2745       4   +--+--+
     6 3891447    7   *-----*
     4 120        3      |
     2 2          1      0

[Normal Probability Plot for errors, group a: vertical axis from 3 to 19, horizontal axis from -2 to +2; plot layout not recoverable]

------------------------------- group=b --------------------------------

The UNIVARIATE Procedure
Variable: errors

Moments
  N                        10    Sum Weights               10
  Mean                   5.45    Sum Observations        54.5
  Std Deviation    3.75625404    Variance          14.1094444
  Skewness         1.85338027    Kurtosis          3.82689713
  Uncorrected SS       424.01    Corrected SS         126.985
  Coeff Variation  68.9220926    Std Error Mean    1.18783182

Basic Statistical Measures
  Location               Variability
  Mean     5.450000      Std Deviation           3.75625
  Median   4.650000      Variance               14.10944
  Mode     .             Range                  12.90000
                         Interquartile Range     2.10000

Tests for Location: Mu0=0
  Test           Statistic         p Value
  Student's t    t   4.588192      Pr > |t|    0.0013
  Sign           M   5             Pr >= |M|   0.0020
  Signed Rank    S   27.5          Pr >= |S|   0.0020

Tests for Normality
  Test                  Statistic           p Value
  Shapiro-Wilk          W      0.808401     Pr < W      0.0183
  Kolmogorov-Smirnov    D      0.30531      Pr > D      <0.0100
  Cramer-von Mises      W-Sq   0.142663     Pr > W-Sq   0.0241
  Anderson-Darling      A-Sq   0.81031      Pr > A-Sq   0.0235

Quantiles (Definition 5)
  Quantile      Estimate
  100% Max      14.60
  99%           14.60
  95%           14.60
  90%           11.70
  75% Q3         5.40
  50% Median     4.65
  25% Q1         3.30
  10%            2.05
  5%             1.70
  1%             1.70
  0% Min         1.70

Extreme Observations
  ----Lowest----    ----Highest---
  Value    Obs      Value    Obs
    1.7     10        5.0      5
    2.4      9        5.1      4
    3.3      8        5.4      3
    3.9      7        8.8      2
    4.3      6       14.6      1

  Stem Leaf       #   Boxplot
    14 6          1      *
    12
    10
     8 8          1      0
     6
     4 3014       4   +--+--+
     2 439        3   +-----+
     0 7          1      |

[Normal Probability Plot for errors, group b: vertical axis from 1 to 15, horizontal axis from -2 to +2; plot layout not recoverable]

The SAS System                                23:25 Friday, February 21, 2003

The UNIVARIATE Procedure
Variable: errors

[Schematic Plots: side-by-side box plots of errors for groups a and b, vertical axis from 0 to 20; plot layout not recoverable]

The NPAR1WAY Procedure

Wilcoxon Scores (Rank Sums) for Variable errors
Classified by Variable group
  group    N    Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
  a       17       277.50            238.0             19.910412       16.323529
  b       10       100.50            140.0             19.910412       10.050000
  Average scores were used for ties.

Wilcoxon Two-Sample Test
  Statistic                 100.5000

  Normal Approximation
  Z                          -1.9588
  One-Sided Pr < Z            0.0251
  Two-Sided Pr > |Z|          0.0501

  t Approximation
  One-Sided Pr < Z            0.0305
  Two-Sided Pr > |Z|          0.0609

  Z includes a continuity correction of 0.5.

Kruskal-Wallis Test
  Chi-Square                  3.9358
  DF                               1
  Pr > Chi-Square             0.0473

Analysis:
The proc univariate output indicates that normality is NOT
reasonable for either
population. For this class, we do not have
an analytic procedure to determine if the shapes of the distributions are
equal. Instead, we
will use the box-plots and histograms produced
by proc univariate for each sample to see
if this is reasonable. Since these plots look the same we will
conclude that the population distributions are approximately the same shape.
Note: We can now say that
this is a Case 2c problem.
Before proceeding we should check consistency. To do this, we compare the entries under the Mean Score heading from the npar1way procedure. The mean score for group a (16.32) is larger than the mean score for group b (10.05), which is consistent with our alternative hypothesis, so we may now use the One-Sided Pr p-value from the Wilcoxon Two-Sample Test section of the output to assess our original hypotheses. Using the t Approximation, the p-value for our methodology problem is 0.0305 (i.e. 3.05%). Since this is below the conventional 5% significance level, the result is significant: we reject the null hypothesis and conclude that the methodology is effective.
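The Normal Approximation lines of the npar1way output can be checked with a short Python sketch. The function name wilcoxon_normal_approx is hypothetical. The expected rank sum under H0 is n(N+1)/2 for a group of size n out of N observations in total. The standard deviation under H0 is copied from the SAS output rather than recomputed, because its tie correction requires the raw data. Note that this reproduces the normal approximation (0.0251), not the t approximation (0.0305) used in the analysis above.

```python
import math

def wilcoxon_normal_approx(stat, n_stat, n_other, std_dev):
    """Return (expected rank sum, Z with 0.5 continuity correction, lower-tail p-value)."""
    total = n_stat + n_other
    expected = n_stat * (total + 1) / 2      # expected rank sum under H0
    diff = stat - expected
    # Continuity correction: move the difference 0.5 toward zero.
    if diff > 0:
        diff -= 0.5
    elif diff < 0:
        diff += 0.5
    z = diff / std_dev
    p_lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal Pr < Z
    return expected, z, p_lower

# Statistic is the rank sum for group b (n=10); group a has n=17.
# std_dev=19.910412 is taken from the SAS output (includes the tie correction).
expected, z, p_lower = wilcoxon_normal_approx(100.5, 10, 17, 19.910412)
print(expected, round(z, 4), round(p_lower, 4))
```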