Data & Analysis Services for Industry & Education Brighton Webs Ltd.

Chi-Squared Distribution

The chi-squared distribution is best known for its application to contingency tables and goodness-of-fit problems, where a hypothesis is tested using the chi-squared statistic calculated for a dataset with v degrees of freedom. It is a special case of the gamma distribution where:

The scale parameter is a constant with the value 2

The shape parameter has the value v/2
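As a quick numerical check of this relationship, the sketch below evaluates the gamma density with shape v/2 and scale 2 and compares it with the chi-squared density written out directly. The function names are illustrative, and only the Python standard library is assumed.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of the gamma distribution at x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def chi2_pdf(x, v):
    """Density of the chi-squared distribution with v degrees of freedom."""
    half_v = v / 2
    return (x ** (half_v - 1) * math.exp(-x / 2)
            / (2 ** half_v * math.gamma(half_v)))


# The two densities should agree for any v and any x > 0.
for v in (2, 5, 10):
    for x in (0.5, 3.0, 12.0):
        assert math.isclose(chi2_pdf(x, v), gamma_pdf(x, v / 2, 2.0))
```

The check passes silently when the two forms agree, which is what the substitution of shape v/2 and scale 2 predicts.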

Profile

Figure: Chi-squared, variation in probability density with degrees of freedom

Parameters

Parameter   Description          Characteristics
v           Degrees of freedom   An integer >= 2

Range

From zero to positive infinity

Functions

The formula for P(x) shown below is derived from the probability density function for the Gamma distribution with an integer shape parameter. However, BW D-Calc 1.0 treats the Chi-Squared distribution as a special case of the Gamma distribution, hence the use of the Incomplete Gamma function for F(x).

Chi Squared Distribution - Formula for Probability Density and Cumulative Probability
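The formula image itself is not reproduced here. In standard notation, which is assumed rather than known to match the original figure, the density P(x) and cumulative probability F(x) for v degrees of freedom can be written as:

```latex
P(x) = \frac{x^{v/2 - 1}\, e^{-x/2}}{2^{v/2}\, \Gamma(v/2)}, \qquad x > 0

F(x) = \frac{\gamma(v/2,\; x/2)}{\Gamma(v/2)},
\qquad
\gamma(a, z) = \int_0^z t^{a-1} e^{-t}\, dt
```

Here \Gamma is the Gamma function and \gamma is the lower Incomplete Gamma function. P(x) is the Gamma density with shape v/2 and scale 2.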

Properties

Chi Squared - formula for properties
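The properties image is not reproduced here, and which properties it listed is not known. The standard results for a chi-squared distribution with v degrees of freedom are:

```latex
\text{mean} = v, \qquad \text{variance} = 2v, \qquad \text{mode} = v - 2 \;\; (v \ge 2)

\text{skewness} = \sqrt{8 / v}, \qquad \text{excess kurtosis} = 12 / v
```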

Example

The chi-squared statistic is used to measure the quality of fit of a set of observed values to the expected values derived from some model. The distribution of this statistic approximately follows the chi-squared distribution:

Chi Squared Statistic
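The image of the statistic is not reproduced here. In its usual form, with symbols chosen for illustration (O_i for the observed count in category i, E_i for the expected count, and k for the number of categories), it is:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Each term measures how far one observed count lies from its expected count, scaled by the expected count, so large departures in any category increase the total.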

For example, the daily average wind speeds in a given year might be compared with those predicted by the Rayleigh distribution. If the quality of the fit is high enough, then a model based on the distribution might be used in some form of simulation, say, to estimate the number of days when wind power falls below the site's requirements, thus allowing the size of buffer storage to be estimated.

The example is based on a computer simulation of 60,000 dice throws. For sixty throws of the dice, assuming it to be "fair", the expectation is that there will be an equal number of ones, twos, etc. However, the distribution of values over 60 throws is unlikely to be exactly uniform, and the results may be as shown in the table:

Value    Expected   Observed   (O - E)^2 / E
1        10         8          0.4
2        10         11         0.1
3        10         12         0.4
4        10         7          0.9
5        10         10         0.0
6        10         12         0.4
Totals   60         60         2.2
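The final column of the table can be reproduced with a few lines of Python. This is a minimal sketch using the counts from the table. The variable names are illustrative.

```python
# Counts taken from the table: one entry per face of the dice.
expected = [10, 10, 10, 10, 10, 10]
observed = [8, 11, 12, 7, 10, 12]

# Contribution of each face: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)

for face, (o, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(f"{face}  {e:>3}  {o:>3}  {c:.1f}")
print(f"chi-squared = {chi_squared:.1f}")
```

The six contributions sum to the total of 2.2 shown in the last row.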

If the process of throwing the dice 60 times and calculating chi-squared is repeated 1,000 times, the distribution of chi-squared is similar to that shown in the graph. The blue values are the observed values, whilst the red ones are the values predicted by a chi-squared distribution with five degrees of freedom:
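The repeated experiment can be sketched as follows. This is not the original simulation: the random number generator, the seed, and the function name are assumptions made for illustration. A chi-squared distribution with five degrees of freedom has a mean of 5 and a variance of 10, so the simulated values should have summary figures close to these.

```python
import random


def chi_squared_for_throws(rng, throws=60, sides=6):
    """Throw a fair dice `throws` times and return the chi-squared statistic."""
    counts = [0] * sides
    for _ in range(throws):
        counts[rng.randrange(sides)] += 1
    expected = throws / sides
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(2005)  # fixed seed (arbitrary) so the run is repeatable
stats = [chi_squared_for_throws(rng) for _ in range(1000)]

mean_stat = sum(stats) / len(stats)
var_stat = sum((s - mean_stat) ** 2 for s in stats) / (len(stats) - 1)

print(f"mean of chi-squared:     {mean_stat:.2f}")
print(f"variance of chi-squared: {var_stat:.2f}")
```

Sorting `stats` into bins and plotting the relative frequencies would give a histogram comparable to the observed values described above.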

In practical terms, the graph shows the probability of a value of chi-squared being exceeded due to random fluctuations, and this follows through to its use in hypothesis testing. In the above example, the probability of chi-squared exceeding 11.07 due to random fluctuations is 5%. If the value of chi-squared is very large, in this case, say, 20, the probability of this being exceeded by random fluctuations is very low, which suggests that the model is a poor one.

Typically in hypothesis testing, an upper value for chi-squared is set. The hypothesis that the model is a good one is accepted if the value of chi-squared is below that value and rejected if it is above it. Suppose the hypothesis is that the dice is fair (generates a discrete uniform distribution with a minimum of one and a maximum of six). If we say that we will accept the hypothesis when the value of chi-squared is less than 11.07, then the sample set of sixty throws with a chi-squared score of 2.2 is accepted as coming from a fair dice.
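The probabilities used in this test can be computed from the Incomplete Gamma function. The sketch below is one possible approach, not the method used by BW D-Calc: it evaluates the regularized lower Incomplete Gamma function by its power series and finds the 5% critical value by bisection. All function names are illustrative.

```python
import math


def regularized_lower_gamma(a, x, terms=500):
    """P(a, x) from its power series; adequate for the moderate x used here."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_tail(x, v):
    """Probability that chi-squared with v degrees of freedom exceeds x."""
    return 1.0 - regularized_lower_gamma(v / 2, x / 2)


def chi2_critical(alpha, v, lo=0.0, hi=200.0):
    """Value exceeded with probability alpha, located by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_tail(mid, v) > alpha:
            lo = mid  # tail still too large: critical value lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


critical = chi2_critical(0.05, 5)
print(f"5% critical value, 5 degrees of freedom: {critical:.2f}")
print(f"P(chi-squared > 2.2): {chi2_tail(2.2, 5):.3f}")
print(f"P(chi-squared > 20):  {chi2_tail(20.0, 5):.5f}")
```

For five degrees of freedom the critical value comes out close to 11.07. The score of 2.2 from the table lies well below it, while a score of 20 would be exceeded by chance only very rarely.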

Page Modified: 10-May-2005

For more information: info@brighton-webs.co.uk