CSC423/324 Data Analysis

Hypothesis Testing (m_y)

Readings:

Ott; 5.4, 5.7-5.8

Recall:

The hypothesis testing problem involves two contrasting points of view (i.e. hypotheses) about a population. The problem is to determine which hypothesis is more reasonable.

e.g. Consider the CTI-02 problem, where a journalist reports that on average CTI-02 graduates receive starting salaries of $50000 but the CTI Dean believes that CTI-02 graduates got better offers.

To determine which hypothesis is more reasonable we use a strategy that is similar to proof by contradiction. Remember that this is a technique where, if you wish to prove something false, you assume it to be true and then, by a logically consistent argument, see if you are led to a contradiction of the initial assumption. Similarly, for hypothesis testing, we assume one hypothesis to be true, select a random sample from the population in question, and see if the sample provides evidence to refute the assumption. Bear in mind that hypothesis testing is based on sampling distribution theory.

Terminology:

For our purposes, a hypothesis is a statement about a population that may be expressed in terms of one or more population parameters.

e.g. m_y=50000; m_y>50000

Given any hypothesis testing problem you will always identify two hypotheses:

Null hypothesis:

Denoted by H₀ and, for this class, will be of the following form:

H₀: m_y= m

Note: This is the point of view of no change; the point of view of the skeptic; the point of view to be challenged.

For the CTI-02 problem: H₀: m_y= 50000

Alternative hypothesis:

Denoted by H_a or H₁ or H_r. It is sometimes referred to as the research hypothesis hence H_r. It will be of the following form:

H_a: m_y> m; H_a: m_y< m or H_a: m_y≠ m

Note: This is the point of view of change; the point of view of the optimist; the point of view of the challenger.

For the CTI-02 problem: H_a: m_y> 50000

Hypothesis Testing Procedure (m_y)

To conduct a hypothesis test for m_y, follow this four-step procedure:

Examine the problem and identify and state the null and alternative hypotheses.

e.g. Let us say that after examination of a problem statement you identify the null hypothesis to be m_y= m and you think that m_y> m, then you would state this as follows:

H₀: m_y= m
H_a: m_y> m

Examine your sample and do the following:

Determine the sample size n and the statistics ybar and s_y

Ensure that ybar is consistent with your alternative hypothesis (i.e. if m_y> m then ensure that ybar > m and if m_y< m then ensure that ybar < m).

Note: An inconsistent ybar means that your sample does not support your alternative and so you cannot proceed.

If ybar is consistent then do the following:

Assume H₀ true.
Given that H₀ is assumed true, determine the p-value.

Note: The p-value is the proportion of samples of size n that would result in a ybar at least as extreme as the one observed if H₀ is true. To find it you must consider the sampling distribution of ybar. Remember that the sample size determines the sampling distribution:

n large:
Since we assume H₀ true then:

m_ybar=m_y=m.

Since the population standard deviation σ_y is not known, we estimate it with s_y, and so we estimate the standard error σ_ybar by:

s_ybar=s_y/sqrt(n).

Compute:

z=(ybar - m_ybar)/s_ybar

Depending on your alternative hypothesis, determine the desired proportion (i.e. the p-value).
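
The large-sample computation above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not part of the course materials: the helper name `z_test_upper` and the sample numbers are made up, and the sketch assumes the alternative hypothesis is m_y > m, so the p-value is the area to the right of z under the standard normal curve.

```python
import math

def z_test_upper(ybar, s_y, n, m):
    """Large-sample test of H0: m_y = m against Ha: m_y > m (hypothetical helper)."""
    s_ybar = s_y / math.sqrt(n)                   # estimated standard error of ybar
    z = (ybar - m) / s_ybar                       # test statistic; m_ybar = m under H0
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # area to the right of z under N(0, 1)
    return z, p_value

# Made-up sample for illustration: n = 100, ybar = 51, s_y = 5, hypothesized m = 50
z, p = z_test_upper(ybar=51.0, s_y=5.0, n=100, m=50.0)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

For the alternative m_y < m the area to the left of z would be used instead, and for a two-sided alternative the smaller tail area would be doubled.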

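n small:
1. If y is normally distributed then:
  Since we assume H₀ true:
  
  m_ybar=m_y=m.
  
  Estimate the standard error by s_ybar=s_y/sqrt(n) and compute:
  
  t=(ybar - m_ybar)/s_ybar
  
  Using the t-table with df=n-1 and depending on your alternative hypothesis, determine the desired proportion (i.e. the p-value).

Note: The z and t values computed for hypothesis testing problems are sometimes referred to as test statistics.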

Apply the following decision rule to your p-value.

If p-value is <= 1% then the p-value is highly significant and so you reject H₀.
If p-value is > 1% but <= 5% then the p-value is significant and so you reject H₀.
If p-value is > 5% then the p-value is non-significant and so you have insufficient evidence to reject H₀.
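
The decision rule above can be written as a small function. This is only a sketch: the name `decide` is hypothetical, and it assumes the p-value is given as a proportion (0.01 for 1%) rather than as a percentage.

```python
def decide(p_value):
    """Apply the three-level decision rule to a p-value given as a proportion."""
    if p_value <= 0.01:
        return "highly significant: reject H0"
    if p_value <= 0.05:
        return "significant: reject H0"
    return "non-significant: insufficient evidence to reject H0"

# Made-up p-values, one for each branch of the rule
for p in (0.004, 0.03, 0.20):
    print(p, "->", decide(p))
```

Checking the 1% threshold first means a p-value that passes both thresholds is reported at the stronger level.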

Problem:

Consider the CTI-02 graduates problem. You select a sample of twenty-five from the graduating class and determine that the mean starting salary is $52K with a standard deviation of $4K.

Conduct a test of hypotheses.

Solution:

Applying the procedure:

Identify and state the null and alternative hypotheses.

H₀: m_y= $50K
H_a: m_y> $50K

Examining the sample:

Determine the sample size n and the statistics ybar and s_y

n=25; ybar=$52K and s_y=$4K

Ensure that ybar is consistent with alternative hypothesis:

Since ybar>$50K then ybar is consistent with H_a and we may proceed.

If ybar is consistent then:
1. Assume H₀ true (i.e. m_y= $50K).
2. Given that H₀ is true, determine the p-value.
  n small:
  
  Since we are not using SAS let us assume that y is normally distributed in order to proceed.
  m_ybar=m_y=50;
  
  s_ybar=s_y/sqrt(n)=4/sqrt(25)=0.8;
  
  t=(52 - 50)/0.8=2.5
  
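  Hence the t test statistic is 2.5. From the t-table with df=24, 2.5 falls between the critical values 2.492 (upper-tail area 1%) and 2.797 (upper-tail area 0.5%), so the desired proportion (i.e. the p-value) is between 0.5% and 1%.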
Apply the decision rule to your p-value.
Since the p-value is <= 1%, it is highly significant, so we reject H₀ and conclude that the mean starting salary is higher than reported (i.e. m_y> $50K).
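
As a cross-check on the table lookup, the worked solution can be reproduced numerically. The sketch below is one possible way to do it with Python's standard library only: it integrates the Student's t density with Simpson's rule instead of reading a table. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and, as in the solution, y is assumed to be normally distributed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # half the area lies to the right of 0

# Values from the problem, in $K
n, ybar, s_y, m = 25, 52.0, 4.0, 50.0
s_ybar = s_y / math.sqrt(n)          # 4 / 5 = 0.8
t_stat = (ybar - m) / s_ybar         # (52 - 50) / 0.8 = 2.5
p_value = t_upper_tail(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

The computed p-value should fall between 0.5% and 1%, in agreement with the range read from the t-table.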