Brighton Webs Ltd.
statistical and data services for industry

Goodness of Fit

This page gives a brief description of two means of testing how well a data model fits a dataset. Typically, this is done with a hypothesis test, where the Null Hypothesis is that the dataset has the same distribution as the model and the Alternative Hypothesis is that it does not.

Chi-Squared Test

This is a parametric test based on "binning" a dataset and comparing the expected number of values in each interval with the observed number. The term "binning" is derived from industrial grading processes, where items within a given size range are placed in designated bins. The example below is based on 1,000 numbers from a pseudo random number generator (PRNG). In the interval 0 ≤ x < 0.1, the expectation is that there are 100 values; however, because of random sampling variation, the observed count will rarely be exactly 100. If the chi-squared statistic is within a given range, we can assume that the differences between the expected and observed counts are due to chance; in other words, we accept (strictly, fail to reject) the Null Hypothesis. Chi-squared is calculated using this formula:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where k is the number of intervals, O_i is the observed number of values in interval i, and E_i is the expected number of values in interval i.

The sample data set is shown below:

The calculated value of 5.52 is well below the 5% critical value of 16.1, so the Null Hypothesis is not rejected and we can accept that the sample is consistent with a uniformly distributed set of numbers.
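The binning and summation described above can be sketched in Python. This is a minimal illustration, assuming the model is a uniform distribution on [0, 1) and the data are split into 10 equal-width intervals, as in the 0 ≤ x < 0.1 example. The function name `chi_squared_uniform`, the seed, and the use of Python's built-in `random` module as the PRNG are illustrative assumptions, not the generator used on this page, so the statistic it prints will differ from 5.52.

```python
import random
from typing import List, Sequence, Tuple


def chi_squared_uniform(sample: Sequence[float], k: int = 10) -> Tuple[float, int, List[int]]:
    """Chi-squared goodness-of-fit statistic for H0: the sample is Uniform(0, 1).

    Assumes k equal-width bins on [0, 1); returns (statistic, degrees of freedom, observed counts).
    """
    observed = [0] * k
    for x in sample:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"value outside [0, 1): {x}")
        # Bin index for equal-width intervals; min() guards against rounding at the top edge.
        observed[min(int(x * k), k - 1)] += 1
    # Under H0 every bin has the same expected count, e.g. 1000 / 10 = 100.
    expected = len(sample) / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    # k - 1 degrees of freedom, because no model parameters are estimated from the data.
    dof = k - 1
    return stat, dof, observed


# Usage: 1,000 values from Python's built-in PRNG (seed chosen arbitrarily).
rng = random.Random(2005)
sample = [rng.random() for _ in range(1000)]
stat, dof, observed = chi_squared_uniform(sample)
print("observed counts:", observed)
print(f"chi-squared = {stat:.2f} with {dof} degrees of freedom")
```

The statistic would then be compared with the 5% critical value from a chi-squared table for the returned degrees of freedom; a value below it means the Null Hypothesis is not rejected.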

Kolmogorov-Smirnov Test

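The Kolmogorov-Smirnov test is a non-parametric test for continuous distributions. It is based on the maximum difference between the expected and observed cumulative frequencies. The test statistic D is defined as:

$$D = \max_{x} \left| F_N(x) - F(x) \right|$$

where F_N(x) is the observed (empirical) cumulative frequency of the sample of N values and F(x) is the cumulative distribution function of the model.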

It has two attractive features relative to other methods. First, because the data are not binned, it is possible to test hypotheses with as few as four values. Second, the analyst does not have to make any arbitrary decisions on interval size.
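The Kolmogorov-Smirnov statistic D can be computed directly from the sorted sample, with no binning. The observed cumulative frequency steps from (i-1)/N to i/N at the i-th smallest value, so the largest gap always occurs at one side of one of those steps. This is a minimal sketch; the function names `ks_statistic` and `uniform_cdf` and the seed are illustrative assumptions, and Python's `random` module stands in for the PRNG used on this page.

```python
import random
from typing import Callable, Sequence, Tuple


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> Tuple[float, float]:
    """Return (D, x at which D occurs) for a sample tested against a continuous model CDF."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d_max, x_at_max = 0.0, xs[0]
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Check the gap just after the step (i/n) and just before it ((i-1)/n).
        gap = max(i / n - f, f - (i - 1) / n)
        if gap > d_max:
            d_max, x_at_max = gap, x
    return d_max, x_at_max


def uniform_cdf(x: float) -> float:
    """CDF of the Uniform(0, 1) model, clipped to [0, 1]."""
    return min(max(x, 0.0), 1.0)


# Usage: ten values from Python's built-in PRNG (seed chosen arbitrarily).
rng = random.Random(7)
sample = [rng.random() for _ in range(10)]
d, x = ks_statistic(sample, uniform_cdf)
print(f"D = {d:.3f} at x = {x:.3f}")
```

Passing the model as a `cdf` callable keeps the function usable for any fully specified continuous distribution, not only the uniform case.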

The graphic below shows a sample calculation based on the output of the same pseudo random number generator that was used for the Chi-Squared example.

In this example, D reaches its maximum at x = 0.29, with a value of 0.208. This is less than the 5% critical value for N = 10, which is 0.41, so the Null Hypothesis can be accepted (strictly, not rejected): the output of the PRNG is consistent with a uniform distribution.
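The critical value of 0.41 is normally read from published Kolmogorov-Smirnov tables. As a cross-check, it can be approximated by simulation: draw many samples of N = 10 values for which the Null Hypothesis is true, compute D for each, and take the 95th percentile. This Monte Carlo approach is a swapped-in technique, not the method used on this page; the function names, trial count and seed below are arbitrary assumptions.

```python
import random


def ks_statistic_uniform(sample):
    """D for a sample tested against the Uniform(0, 1) CDF, F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = min(max(x, 0.0), 1.0)  # uniform CDF on [0, 1]
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def simulated_critical_value(n, alpha=0.05, trials=20000, seed=12345):
    """Estimate the (1 - alpha) quantile of D under H0 by simulating uniform samples."""
    rng = random.Random(seed)
    stats = sorted(
        ks_statistic_uniform([rng.random() for _ in range(n)]) for _ in range(trials)
    )
    index = min(int((1 - alpha) * trials), trials - 1)
    return stats[index]


crit = simulated_critical_value(10)
d_observed = 0.208  # value of D reported in the worked example above
print(f"simulated 5% critical value for N=10: {crit:.3f}")
print("reject H0" if d_observed > crit else "do not reject H0")
```

With 20,000 trials the simulated value should land close to the tabulated 0.41; increasing `trials` reduces the simulation error.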

Page updated: 24-May-2005

 

For more information: info@brighton-webs.co.uk