Data & Analysis Services for Education & Industry Brighton Webs Ltd.

Comparison of Two Sample Means - StDev of Population Unknown

A common problem facing researchers is to determine whether two samples come from the same population or from two separate populations when the only data available are the two samples themselves.

An example of this type of problem is salinity data from two adjacent groups of boreholes. If the mean salinity of the two groups is the same, the inference is that all the boreholes have penetrated the same reservoir (the null hypothesis is that any variation is due to random fluctuations). If the means are different, it may be that each group of wells is in a different reservoir (the alternative hypothesis is that the difference in means is not due to chance).

Hypotheses and t statistic

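It is not the intention of this page to work through the underlying maths, but the process is based on the null hypothesis that the two samples, A and B, are drawn from populations with the same mean:

$$H_0: \mu_A = \mu_B$$

Which can be rearranged to:

$$H_0: \mu_A - \mu_B = 0$$

And this leads to the t statistic, where $\bar{x}_A$ and $\bar{x}_B$ are the sample means, $n_A$ and $n_B$ are the sample sizes, and $s$ is the estimate of the population standard deviation described under Estimation of Standard Deviation:

$$t = \frac{\bar{x}_A - \bar{x}_B}{s\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}$$

If the null hypothesis is true, t will follow a t distribution with $n_A + n_B - 2$ degrees of freedom.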

Estimation of Standard Deviation

The estimate of the population's standard deviation, $s$, is obtained by pooling data from both samples. Its square is an unbiased estimate of the population variance:

$$s^2 = \frac{\sum (x_A - \bar{x}_A)^2 + \sum (x_B - \bar{x}_B)^2}{n_A + n_B - 2}$$
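As a rough illustration, the pooled estimate can be sketched in Python: the squared deviations of each sample about its own mean are summed, and the total is divided by the degrees of freedom (the two sample sizes added together, minus 2). The function name pooled_std is hypothetical and chosen only for this sketch. The data are Sample A and Sample B of Set 1 from the Example section.

```python
import math

def pooled_std(sample_a, sample_b):
    """Pooled estimate of the population standard deviation from two samples.

    Returns the estimate together with its degrees of freedom.
    """
    mean_a = sum(sample_a) / len(sample_a)
    mean_b = sum(sample_b) / len(sample_b)
    # Sum of squared deviations of each sample about its own mean
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    dof = len(sample_a) + len(sample_b) - 2
    return math.sqrt((ss_a + ss_b) / dof), dof

# Set 1 from the Example section
sample_a = [10.26, 9.66, 9.84, 10.24, 10.56, 9.32, 10.41, 10.82, 10.65, 10.29]
sample_b = [10.38, 9.46, 9.90, 9.46, 11.18, 10.10, 10.36, 9.69]

s, dof = pooled_std(sample_a, sample_b)
print(f"pooled StDev = {s:.2f}, DOF = {dof}")
```

The result should agree with the StDev and DOF rows for Set 1 in the Example table, allowing for rounding.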

Sample Data

The sample data consists of a pair of datasets, each with two samples. Set 1 is composed of two samples, both drawn from a population with a mean of 10.00 and a standard deviation of 0.5. For Set 2, Sample A is the same as Sample A in Set 1; however, Sample B is drawn from a population with a mean of 11.00 and a standard deviation of 0.5.
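Data of this kind could be generated along the following lines. This is only a sketch under stated assumptions: the page does not say how its samples were produced, so the normal distribution, the seed, and the helper name draw_sample are all assumptions, and the numbers it produces will not match the table in the Example section. The sample sizes of 10 and 8 are taken from that table.

```python
import random

def draw_sample(n, mu, sigma, rng):
    """Draw n values from a normal population (assumed), rounded to 2 d.p."""
    return [round(rng.gauss(mu, sigma), 2) for _ in range(n)]

rng = random.Random(2006)  # arbitrary seed, used only to make the run repeatable

sample_a = draw_sample(10, 10.0, 0.5, rng)  # Sample A, shared by Set 1 and Set 2
set1_b = draw_sample(8, 10.0, 0.5, rng)     # Set 1, Sample B: same population as A
set2_b = draw_sample(8, 11.0, 0.5, rng)     # Set 2, Sample B: population mean shifted to 11

print("Sample A:       ", sample_a)
print("Set 1, Sample B:", set1_b)
print("Set 2, Sample B:", set2_b)
```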

Without resorting to any form of statistical analysis, knowledge of the composition of the data suggests that the samples from Set 1 will have similar means and that those of Set 2 will be significantly different.

Example

Some intermediate values are omitted from the calculation below; they can easily be recreated using Excel if required. Values are shown to 2 d.p., so rounding errors may be present.

Properties             Set 1                  Set 2
                       Sample A   Sample B    Sample A   Sample B
Values                 10.26      10.38       10.26      11.15
                       9.66       9.46        9.66       11.68
                       9.84       9.90        9.84       10.46
                       10.24      9.46        10.24      11.11
                       10.56      11.18       10.56      11.01
                       9.32       10.10       9.32       11.24
                       10.41      10.36       10.41      11.98
                       10.82      9.69        10.82      10.81
                       10.65                  10.65
                       10.29                  10.29
N                      10         8           10         8
Mean                   10.20      10.07       10.20      11.18
Sum of (x - mean)^2    1.98       2.32        1.98       1.58
StDev (pooled)         0.52                   0.47
DOF                    16                     16
t                      0.56                   -4.37
t crit (5%)            2.12                   2.12
Null Hypothesis        Accept                 Reject

For Set 1, the null hypothesis is accepted (i.e. the difference in the means of the samples is consistent with chance) because the absolute value of t, 0.56, is less than the critical value of 2.12 (the two-tailed 5% point of the t distribution with 16 degrees of freedom). However, it is rejected for Set 2 (i.e. the difference between the means is greater than would be expected from chance alone) because the absolute value of t, 4.37, is greater than the critical value.
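The whole calculation can be reproduced with a short Python sketch, offered here as an alternative to Excel. The function name pooled_t is hypothetical. The critical value of 2.12 is copied from the table rather than computed, and the comparison uses the absolute value of t, which assumes a two-tailed test.

```python
import math
from statistics import mean

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic using a pooled estimate of the standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = mean(sample_a), mean(sample_b)
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    dof = n_a + n_b - 2
    s = math.sqrt((ss_a + ss_b) / dof)  # pooled standard deviation
    t = (mean_a - mean_b) / (s * math.sqrt(1 / n_a + 1 / n_b))
    return t, dof

T_CRIT = 2.12  # critical value for 16 degrees of freedom, taken from the table

sample_a = [10.26, 9.66, 9.84, 10.24, 10.56, 9.32, 10.41, 10.82, 10.65, 10.29]
set1_b = [10.38, 9.46, 9.90, 9.46, 11.18, 10.10, 10.36, 9.69]
set2_b = [11.15, 11.68, 10.46, 11.11, 11.01, 11.24, 11.98, 10.81]

for label, sample_b in (("Set 1", set1_b), ("Set 2", set2_b)):
    t, dof = pooled_t(sample_a, sample_b)
    decision = "reject" if abs(t) > T_CRIT else "do not reject"
    print(f"{label}: t = {t:.2f}, DOF = {dof}, {decision} the null hypothesis")
```

Because the listed values are rounded to 2 d.p., the t values may differ from the table in the second decimal place. If SciPy is installed, scipy.stats.ttest_ind(sample_a, sample_b, equal_var=True) should give the same t statistic, and scipy.stats.t.ppf(0.975, 16) gives the critical value.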

Page updated: 26-Sep-2006

 

For more information: info@brighton-webs.co.uk