Learning Objectives
 To learn how to construct a confidence interval for the difference in the proportions of two distinct populations that have a particular characteristic of interest.
 To learn how to perform a test of hypotheses concerning the difference in the proportions of two distinct populations that have a particular characteristic of interest.
Suppose we wish to compare the proportions of two populations that have a specific characteristic, such as the proportion of men who are left-handed compared to the proportion of women who are left-handed. Figure 9.7 “Independent Sampling from Two Populations In Order to Compare Proportions” illustrates the conceptual framework of our investigation. Each population is divided into two groups, the group of elements that have the characteristic of interest (for example, being left-handed) and the group of elements that do not. We arbitrarily label one population as Population 1 and the other as Population 2, and subscript the proportion of each population that possesses the characteristic with the number 1 or 2 to tell them apart. We draw a random sample from Population 1 and label the sample statistic it yields with the subscript 1. Without reference to the first sample we draw a sample from Population 2 and label its sample statistic with the subscript 2.
Figure 9.7 Independent Sampling from Two Populations In Order to Compare Proportions
Our goal is to use the information in the samples to estimate the difference p_{1}-p_{2} in the two population proportions and to make statistically valid inferences about it.
Confidence Intervals
Since the sample proportion \hat{p}_{1} computed using the sample drawn from Population 1 is a good estimator of the population proportion p_{1} of Population 1, and the sample proportion \hat{p}_{2} computed using the sample drawn from Population 2 is a good estimator of the population proportion p_{2} of Population 2, a reasonable point estimate of the difference p_{1}-p_{2} is \hat{p}_{1}-\hat{p}_{2}. In order to widen this point estimate into a confidence interval we suppose that both samples are large, as described in Section 7.3 “Large Sample Estimation of a Population Proportion” in Chapter 7 “Estimation” and repeated below. If so, then the following formula for a confidence interval for p_{1}-p_{2} is valid.
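100(1-\alpha)\% Confidence Interval for the Difference Between Two Population Proportions
(\hat{p}_{1}-\hat{p}_{2}) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}
The samples must be independent, and each sample must be large: each of the intervals
\left[\hat{p}_{1}-3\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}},\ \hat{p}_{1}+3\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}}\right]
and
\left[\hat{p}_{2}-3\sqrt{\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}},\ \hat{p}_{2}+3\sqrt{\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}\right]
must lie wholly within the interval [0,1].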
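The quantity under the square root is an estimate of the variance of the difference of the two sample proportions. As a brief sketch of where it comes from (a standard fact about independent binomial samples, added here for orientation rather than taken from the text): because the two samples are independent, the variances of the two sample proportions add, and replacing the unknown p_{1} and p_{2} by their estimates gives the expression used in the interval.

```latex
\operatorname{Var}(\hat{p}_{1}-\hat{p}_{2})
  = \operatorname{Var}(\hat{p}_{1})+\operatorname{Var}(\hat{p}_{2})
  = \frac{p_{1}(1-p_{1})}{n_{1}}+\frac{p_{2}(1-p_{2})}{n_{2}}
  \approx \frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}
```

The first equality is exactly where independence of the samples is used; without it a covariance term would appear.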
Example 10
The department of code enforcement of a county government issues permits to general contractors to work on residential projects. For each permit issued, the department inspects the result of the project and gives a “pass” or “fail” rating. A failed project must be reinspected until it receives a pass rating. The department had been frustrated by the high cost of reinspection and decided to publish the inspection records of all contractors on the web. It was hoped that public access to the records would lower the reinspection rate. A year after the web access was made public, two samples of records were randomly selected. One sample was selected from the pool of records before the web publication and one after. The proportion of projects that passed on the first inspection was noted for each sample. The results are summarized below. Construct a point estimate and a 90% confidence interval for the difference in the passing rate on first inspection between the two time periods.
No public web access:  n_{1}=500,  \hat{p}_{1}=0.67
Public web access:     n_{2}=100,  \hat{p}_{2}=0.80
Solution:
The point estimate of p_{1}-p_{2} is
\hat{p}_{1}-\hat{p}_{2}=0.67-0.80=-0.13
Because the “No public web access” population was labeled as Population 1 and the “Public web access” population was labeled as Population 2, in words this means that we estimate that the proportion of projects that passed on the first inspection increased by 13 percentage points after records were posted on the web.
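The sample sizes are sufficiently large for constructing a confidence interval since for sample 1:
3\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}}=3\sqrt{\frac{(0.67)(0.33)}{500}}=0.06
so that
\left[\hat{p}_{1}-3\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}},\ \hat{p}_{1}+3\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}}\right]=[0.67-0.06,\ 0.67+0.06]=[0.61,0.73]\subset[0,1]
and for sample 2:
3\sqrt{\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}=3\sqrt{\frac{(0.8)(0.2)}{100}}=0.12
so that
\left[\hat{p}_{2}-3\sqrt{\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}},\ \hat{p}_{2}+3\sqrt{\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}\right]=[0.8-0.12,\ 0.8+0.12]=[0.68,0.92]\subset[0,1]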
To apply the formula for the confidence interval, we first observe that the 90% confidence level means that \alpha=1-0.90=0.10, so that z_{\alpha/2}=z_{0.05}. From Figure 12.3 “Critical Values of “ we read directly that z_{0.05}=1.645. Thus the desired confidence interval is
(\hat{p}_{1}-\hat{p}_{2}) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}=-0.13 \pm 1.645\sqrt{\frac{(0.67)(0.33)}{500}+\frac{(0.8)(0.2)}{100}}=-0.13 \pm 0.07
The 90% confidence interval is [-0.20,-0.06]. We are 90% confident that the difference in the population proportions lies in the interval [-0.20,-0.06], in the sense that in repeated sampling 90% of all intervals constructed from the sample data in this manner will contain p_{1}-p_{2}. Taking into account the labeling of the two populations, this means that we are 90% confident that the proportion of projects that pass on the first inspection is between 6 and 20 percentage points higher after public access to the records than before.
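The arithmetic of this example can be checked numerically. What follows is a minimal Python sketch, not part of the original text: the function names `is_large_sample` and `two_proportion_ci` are illustrative assumptions, and the critical value is computed with the standard library's `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist


def is_large_sample(p_hat, n):
    """Check that [p_hat - 3*se, p_hat + 3*se] lies wholly within [0, 1]."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 3 * se >= 0 and p_hat + 3 * se <= 1


def two_proportion_ci(p1_hat, n1, p2_hat, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (independent samples)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Data of Example 10: no web access (sample 1) vs. web access (sample 2)
samples_ok = is_large_sample(0.67, 500) and is_large_sample(0.80, 100)
lower, upper = two_proportion_ci(0.67, 500, 0.80, 100, 0.90)
print(f"samples large enough: {samples_ok}")
print(f"90% CI: [{lower:.2f}, {upper:.2f}]")
```

Rounded to two decimal places, the endpoints agree with the interval [-0.20,-0.06] obtained by hand.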
Hypothesis Testing
In hypothesis tests concerning the relative sizes of the proportions p_{1} and p_{2} of two populations that possess a particular characteristic, the null and alternative hypotheses will always be expressed in terms of the difference of the two population proportions. Hence the null hypothesis is always written
H_{0}: p_{1}-p_{2}=D_{0}
The three forms of the alternative hypothesis, with the terminology for each case, are:
Form of H_{a}                    Terminology
H_{a}: p_{1}-p_{2}<D_{0}         Left-tailed
H_{a}: p_{1}-p_{2}>D_{0}         Right-tailed
H_{a}: p_{1}-p_{2}\neq D_{0}     Two-tailed
As long as the samples are independent and both are large the following formula for the standardized test statistic is valid, and it has the standard normal distribution.
Standardized Test Statistic for Hypothesis Tests Concerning the Difference Between Two Population Proportions
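Z=\frac{(\hat{p}_{1}-\hat{p}_{2})-D_{0}}{\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}}
The test statistic has the standard normal distribution.
The samples must be independent, and each sample must be large: each of the intervals
\left[\hat{p}_{1}-3\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}},\ \hat{p}_{1}+3\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}}\right]
and
\left[\hat{p}_{2}-3\sqrt{\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}},\ \hat{p}_{2}+3\sqrt{\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}\right]
must lie wholly within the interval [0,1].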
Example 11
Using the data of Note 9.25 “Example 10”, test whether there is sufficient evidence to conclude that public web access to the inspection records has increased the proportion of projects that passed on the first inspection by more than 5 percentage points. Use the critical value approach at the 10% level of significance.
Solution:

Step 1. Taking into account the labeling of the populations, an increase in passing rate at the first inspection by more than 5 percentage points after public access on the web may be expressed as p_{2}>p_{1}+0.05, which by algebra is the same as p_{1}-p_{2}<-0.05. This is the alternative hypothesis. Since the null hypothesis is always expressed as an equality, with the same number on the right as is in the alternative hypothesis, the test is
H_{0}: p_{1}-p_{2}=-0.05 vs. H_{a}: p_{1}-p_{2}<-0.05 @ \alpha=0.10

Step 2. Since the test is with respect to a difference in population proportions the test statistic is
Z=\frac{(\hat{p}_{1}-\hat{p}_{2})-D_{0}}{\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}}

Step 3. Inserting the values given in Note 9.25 “Example 10” and the value D_{0}=-0.05 into the formula for the test statistic gives
Z=\frac{(\hat{p}_{1}-\hat{p}_{2})-D_{0}}{\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}}=\frac{(-0.13)-(-0.05)}{\sqrt{\frac{(0.67)(0.33)}{500}+\frac{(0.8)(0.2)}{100}}}=-1.770
 Step 4. Since the symbol in H_{a} is “<” this is a left-tailed test, so there is a single critical value, -z_{\alpha}=-z_{0.10}. From the last row in Figure 12.3 “Critical Values of “ z_{0.10}=1.282, so -z_{0.10}=-1.282. The rejection region is (-\infty,-1.282].

Step 5. As shown in Figure 9.8 “Rejection Region and Test Statistic for Note 9.27 “Example 11”” the test statistic falls in the rejection region. The decision is to reject H_{0}. In the context of the problem our conclusion is:
The data provide sufficient evidence, at the 10% level of significance, to conclude that the rate of passing on the first inspection has increased by more than 5 percentage points since records were publicly posted on the web.
Figure 9.8 Rejection Region and Test Statistic for Note 9.27 “Example 11”
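The critical value approach can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `two_proportion_z` and `in_rejection_region` and the tail labels `"left"`, `"right"`, `"two"` are assumptions made here, and critical values come from `statistics.NormalDist` rather than from the table. The rejection regions follow the three forms of the alternative hypothesis listed earlier.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1_hat, n1, p2_hat, n2, d0):
    """Standardized test statistic for H0: p1 - p2 = d0."""
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d0) / se


def in_rejection_region(z, alpha, tail):
    """Critical value decision; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return z <= std_normal.inv_cdf(alpha)           # (-inf, -z_alpha]
    if tail == "right":
        return z >= std_normal.inv_cdf(1 - alpha)       # [z_alpha, inf)
    if tail == "two":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Data of Example 10 with D0 = -0.05, left-tailed test at alpha = 0.10
z = two_proportion_z(0.67, 500, 0.80, 100, -0.05)
critical = NormalDist().inv_cdf(0.10)
reject = in_rejection_region(z, 0.10, "left")
print(f"Z = {z:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```

To three decimal places the computed statistic and critical value match the values -1.770 and -1.282 found above, and the decision is again to reject H_{0}.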
Example 12
Perform the test of Note 9.27 “Example 11” using the p-value approach.
Solution:
The first three steps are identical to those in Note 9.27 “Example 11”.
 Step 4. Because the test is left-tailed the observed significance or p-value of the test is just the area of the left tail of the standard normal distribution that is cut off by the test statistic Z=-1.770. From Figure 12.2 “Cumulative Normal Probability” the area of the left tail determined by -1.77 is 0.0384. The p-value is 0.0384.
 Step 5. Since the p-value 0.0384 is less than \alpha=0.10, the decision is to reject the null hypothesis: The data provide sufficient evidence, at the 10% level of significance, to conclude that the rate of passing on the first inspection has increased by more than 5 percentage points since records were publicly posted on the web.
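The p-value computation can also be sketched in Python. As before this is only an illustration under stated assumptions: the helper name `p_value` and its tail labels are invented for this sketch, and the normal tail area is computed with `statistics.NormalDist` rather than looked up in the cumulative normal table.

```python
import math
from statistics import NormalDist


def p_value(z, tail):
    """Observed significance of a standard normal test statistic z.

    tail is 'left', 'right', or 'two' (labels chosen for this sketch).
    """
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * cdf(-abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Test statistic of Example 11, recomputed from the data of Example 10
se = math.sqrt(0.67 * 0.33 / 500 + 0.80 * 0.20 / 100)
z = ((0.67 - 0.80) - (-0.05)) / se
p = p_value(z, "left")
print(f"p-value = {p:.4f}, reject H0 at alpha = 0.10: {p < 0.10}")
```

Because the statistic is not rounded to two decimal places before the tail area is computed, the result may differ from the table value 0.0384 in the last digit; the decision to reject is unaffected.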
Finally, a common misuse of the formulas given in this section must be mentioned. Suppose a large pre-election survey of potential voters is conducted. Each person surveyed is asked to express a preference between, say, Candidate A and Candidate B. (Perhaps “no preference” or “other” are also choices, but that is not important.) In such a survey, estimators \hat{p}_{A} and \hat{p}_{B} of p_{A} and p_{B} can be calculated. It is important to realize, however, that these two estimators were not calculated from two independent samples. While \hat{p}_{A}-\hat{p}_{B} may be a reasonable estimator of p_{A}-p_{B}, the formulas for confidence intervals and for the standardized test statistic given in this section are not valid for data obtained in this manner.
Key Takeaways
 A confidence interval for the difference in two population proportions is computed using a formula in the same fashion as was done for a single population mean.
 The same five-step procedure used to test hypotheses concerning a single population proportion is used to test hypotheses concerning the difference between two population proportions. The only difference is in the formula for the standardized test statistic.
Exercises
Basic

Construct the confidence interval for p_{1}-p_{2} for the level of confidence and the data given. (The samples are sufficiently large.)

90% confidence, n_{1}=1670, \hat{p}_{1}=0.42, n_{2}=900, \hat{p}_{2}=0.38

95% confidence, n_{1}=600, \hat{p}_{1}=0.84, n_{2}=420, \hat{p}_{2}=0.67


Construct the confidence interval for p_{1}-p_{2} for the level of confidence and the data given. (The samples are sufficiently large.)

98% confidence, n_{1}=750, \hat{p}_{1}=0.64, n_{2}=800, \hat{p}_{2}=0.51

99.5% confidence, n_{1}=250, \hat{p}_{1}=0.78, n_{2}=250, \hat{p}_{2}=0.51


Construct the confidence interval for p_{1}-p_{2} for the level of confidence and the data given. (The samples are sufficiently large.)

80% confidence, n_{1}=300, \hat{p}_{1}=0.255, n_{2}=400, \hat{p}_{2}=0.193

95% confidence, n_{1}=3500, \hat{p}_{1}=0.147, n_{2}=3750, \hat{p}_{2}=0.131


Construct the confidence interval for p_{1}-p_{2} for the level of confidence and the data given. (The samples are sufficiently large.)

99% confidence, n_{1}=2250, \hat{p}_{1}=0.915, n_{2}=2525, \hat{p}_{2}=0.858

95% confidence, n_{1}=120, \hat{p}_{1}=0.650, n_{2}=200, \hat{p}_{2}=0.505


Perform the test of hypotheses indicated, using the data given. Use the critical value approach. Compute the p-value of the test as well. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0 vs. H_{a}: p_{1}-p_{2}>0 @ \alpha=0.10, n_{1}=1200, \hat{p}_{1}=0.42, n_{2}=1200, \hat{p}_{2}=0.40

Test H_{0}: p_{1}-p_{2}=0 vs. H_{a}: p_{1}-p_{2}\neq 0 @ \alpha=0.05, n_{1}=550, \hat{p}_{1}=0.61, n_{2}=600, \hat{p}_{2}=0.67


Perform the test of hypotheses indicated, using the data given. Use the critical value approach. Compute the p-value of the test as well. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0.05 vs. H_{a}: p_{1}-p_{2}>0.05 @ \alpha=0.05, n_{1}=1100, \hat{p}_{1}=0.57, n_{2}=1100, \hat{p}_{2}=0.48

Test H_{0}: p_{1}-p_{2}=0 vs. H_{a}: p_{1}-p_{2}\neq 0 @ \alpha=0.05, n_{1}=800, \hat{p}_{1}=0.39, n_{2}=900, \hat{p}_{2}=0.43


Perform the test of hypotheses indicated, using the data given. Use the critical value approach. Compute the p-value of the test as well. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0.25 vs. H_{a}: p_{1}-p_{2}<0.25 @ \alpha=0.005, n_{1}=1400, \hat{p}_{1}=0.57, n_{2}=1200, \hat{p}_{2}=0.37

Test H_{0}: p_{1}-p_{2}=0.16 vs. H_{a}: p_{1}-p_{2}\neq 0.16 @ \alpha=0.02, n_{1}=750, \hat{p}_{1}=0.43, n_{2}=600, \hat{p}_{2}=0.22


Perform the test of hypotheses indicated, using the data given. Use the critical value approach. Compute the p-value of the test as well. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0.08 vs. H_{a}: p_{1}-p_{2}>0.08 @ \alpha=0.025, n_{1}=450, \hat{p}_{1}=0.67, n_{2}=200, \hat{p}_{2}=0.52

Test H_{0}: p_{1}-p_{2}=0.02 vs. H_{a}: p_{1}-p_{2}\neq 0.02 @ \alpha=0.001, n_{1}=2700, \hat{p}_{1}=0.837, n_{2}=2900, \hat{p}_{2}=0.854


Perform the test of hypotheses indicated, using the data given. Use the p-value approach. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0 vs. H_{a}: p_{1}-p_{2}<0 @ \alpha=0.005, n_{1}=1100, \hat{p}_{1}=0.22, n_{2}=1300, \hat{p}_{2}=0.27

Test H_{0}: p_{1}-p_{2}=0 vs. H_{a}: p_{1}-p_{2}\neq 0 @ \alpha=0.01, n_{1}=650, \hat{p}_{1}=0.35, n_{2}=650, \hat{p}_{2}=0.41


Perform the test of hypotheses indicated, using the data given. Use the p-value approach. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0.15 vs. H_{a}: p_{1}-p_{2}>0.15 @ \alpha=0.10, n_{1}=950, \hat{p}_{1}=0.41, n_{2}=500, \hat{p}_{2}=0.23

Test H_{0}: p_{1}-p_{2}=0.10 vs. H_{a}: p_{1}-p_{2}\neq 0.10 @ \alpha=0.10, n_{1}=220, \hat{p}_{1}=0.92, n_{2}=160, \hat{p}_{2}=0.78


Perform the test of hypotheses indicated, using the data given. Use the p-value approach. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0.22 vs. H_{a}: p_{1}-p_{2}>0.22 @ \alpha=0.05, n_{1}=90, \hat{p}_{1}=0.72, n_{2}=75, \hat{p}_{2}=0.40

Test H_{0}: p_{1}-p_{2}=0.37 vs. H_{a}: p_{1}-p_{2}\neq 0.37 @ \alpha=0.02, n_{1}=425, \hat{p}_{1}=0.772, n_{2}=425, \hat{p}_{2}=0.331


Perform the test of hypotheses indicated, using the data given. Use the p-value approach. (The samples are sufficiently large.)

Test H_{0}: p_{1}-p_{2}=0.50 vs. H_{a}: p_{1}-p_{2}<0.50 @ \alpha=0.10, n_{1}=40, \hat{p}_{1}=0.65, n_{2}=55, \hat{p}_{2}=0.24

Test H_{0}: p_{1}-p_{2}=0.30 vs. H_{a}: p_{1}-p_{2}\neq 0.30 @ \alpha=0.10, n_{1}=7500, \hat{p}_{1}=0.664, n_{2}=1000, \hat{p}_{2}=0.319

Applications
In all the remaining exercises the samples are sufficiently large (so this need not be checked).

Voters in a particular city who identify themselves with one or the other of two political parties were randomly selected and asked if they favor a proposal to allow citizens with proper license to carry a concealed handgun in city parks. The results are:
                      Party A    Party B
Sample size, n        150        200
Number in favor, x    90         140
 Give a point estimate for the difference in the proportion of all members of Party A and all members of Party B who favor the proposal.
 Construct the 95% confidence interval for the difference, based on these data.
 Test, at the 5% level of significance, the hypothesis that the proportion of all members of Party A who favor the proposal is less than the proportion of all members of Party B who do.
 Compute the p-value of the test.

To investigate a possible relation between gender and handedness, a random sample of 320 adults was taken, with the following results:
                           Men    Women
Sample size, n             168    152
Number of left-handed, x   24     9
 Give a point estimate for the difference in the proportion of all men who are left-handed and the proportion of all women who are left-handed.
 Construct the 95% confidence interval for the difference, based on these data.
 Test, at the 5% level of significance, the hypothesis that the proportion of men who are left-handed is greater than the proportion of women who are.
 Compute the p-value of the test.

A local school board member randomly sampled private and public high school teachers in his district to compare the proportions of National Board Certified (NBC) teachers in the faculty. The results were:
                                      Private Schools    Public Schools
Sample size, n                        80                 520
Proportion of NBC teachers, \hat{p}   0.175              0.150
 Give a point estimate for the difference in the proportion of all teachers in area public schools and the proportion of all teachers in private schools who are National Board Certified.
 Construct the 90% confidence interval for the difference, based on these data.
 Test, at the 10% level of significance, the hypothesis that the proportion of all public school teachers who are National Board Certified is less than the proportion of private school teachers who are.
 Compute the p-value of the test.

In professional basketball games, the fans of the home team always try to distract free throw shooters on the visiting team. To investigate whether this tactic is actually effective, the free throw statistics of a professional basketball player with a high free throw percentage were examined. During the entire last season, this player had 656 free throws, 420 in home games and 236 in away games. The results are summarized below.
                              Home     Away
Sample size, n                420      236
Free throw percent, \hat{p}   81.5%    78.8%
 Give a point estimate for the difference in the proportion of free throws made at home and away.
 Construct the 90% confidence interval for the difference, based on these data.
 Test, at the 10% level of significance, the hypothesis that there exists a home advantage in free throws.
 Compute the p-value of the test.

Randomly selected middle-aged people in both China and the United States were asked if they believed that adults have an obligation to financially support their aged parents. The results are summarized below.
                   China    USA
Sample size, n     1300     150
Number of yes, x   1170     110
Test, at the 1% level of significance, whether the data provide sufficient evidence to conclude that there exists a cultural difference in attitude regarding this question.

A manufacturer of walk-behind push mowers receives refurbished small engines from two new suppliers, A and B. It is not uncommon that some of the refurbished engines need to be lightly serviced before they can be fitted into mowers. The mower manufacturer recently received 100 engines from each supplier. In the shipment from A, 13 needed further service. In the shipment from B, 10 needed further service. Test, at the 10% level of significance, whether the data provide sufficient evidence to conclude that there exists a difference in the proportions of engines from the two suppliers needing service.
Large Data Set Exercises

Large Data Sets 6A and 6B record results of a random survey of 200 voters in each of two regions, in which they were asked to express whether they prefer Candidate A for a U.S. Senate seat or prefer some other candidate. Let the population of all voters in region 1 be denoted Population 1 and the population of all voters in region 2 be denoted Population 2. Let p_{1} be the proportion of voters in Population 1 who prefer Candidate A, and p_{2} the proportion in Population 2 who do.
http://www.flatworldknowledge.com/sites/all/files/data6A.xls
http://www.flatworldknowledge.com/sites/all/files/data6B.xls
 Find the relevant sample proportions \hat{p}_{1} and \hat{p}_{2}.
 Construct a point estimate for p_{1}-p_{2}.
 Construct a 95% confidence interval for p_{1}-p_{2}.
 Test, at the 5% level of significance, the hypothesis that the same proportion of voters in the two regions favor Candidate A, against the alternative that a larger proportion in Population 2 do.

Large Data Set 11 records the results of samples of real estate sales in a certain region in the year 2008 (lines 2 through 536) and in the year 2010 (lines 537 through 1106). Foreclosure sales are identified with a 1 in the second column. Let all real estate sales in the region in 2008 be Population 1 and all real estate sales in the region in 2010 be Population 2.
http://www.flatworldknowledge.com/sites/all/files/data11.xls
 Use the sample data to construct point estimates \hat{p}_{1} and \hat{p}_{2} of the proportions p_{1} and p_{2} of all real estate sales in this region in 2008 and 2010 that were foreclosure sales. Construct a point estimate of p_{1}-p_{2}.
 Use the sample data to construct a 90% confidence interval for p_{1}-p_{2}.
 Test, at the 10% level of significance, the hypothesis that the proportion of real estate sales in the region in 2010 that were foreclosure sales was greater than the proportion of real estate sales in the region in 2008 that were foreclosure sales. (The default is that the proportions were the same.)
Answers

 (0.0068,0.0732)
 (0.1163,0.2237)

 (0.0210,0.1030)
 (0.0001,0.0319)

 Z = 0.996, z_{0.10}=1.282, p-value = 0.1587, do not reject H_{0}
 Z = -2.120, \pm z_{0.025}=\pm 1.960, p-value = 0.0340, reject H_{0}

 Z = -2.602, -z_{0.005}=-2.576, p-value = 0.0047, reject H_{0}
 Z = 2.020, \pm z_{0.01}=\pm 2.326, p-value = 0.0434, do not reject H_{0}

 Z = -2.85, p-value = 0.0022, reject H_{0}
 Z = -2.23, p-value = 0.0258, do not reject H_{0}

 Z = 1.36, p-value = 0.0869, do not reject H_{0}
 Z = 2.32, p-value = 0.0204, do not reject H_{0}

 -0.10
 -0.10 \pm 0.101
 Z = -1.943, -z_{0.05}=-1.645, reject H_{0} (fewer in Party A favor)
 p-value = 0.0262

 0.025
 0.025 \pm 0.0745
 Z = 0.552, z_{0.10}=1.282, do not reject H_{0} (as many public school teachers are certified)
 p-value = 0.2912

Z = 4.498, \pm z_{0.005}=\pm 2.576, reject H_{0} (different)

 \hat{p}_{1}=0.355 and \hat{p}_{2}=0.41
 \hat{p}_{1}-\hat{p}_{2}=-0.055
 (-0.1501,0.0401)
 H_{0}: p_{1}-p_{2}=0 vs. H_{a}: p_{1}-p_{2}<0. Test Statistic: Z = -1.1335. Rejection Region: (-\infty,-1.645]. Decision: Fail to reject H_{0}.