Brighton Webs Ltd. Statistical and data services for industry
Mode

For discrete distributions, the mode is the value with the greatest frequency; for continuous ones, it is the point where the probability density is at a maximum. It is possible for a distribution to have two or more modes.

The advantage of the mode over the mean is that it is less sensitive to the presence of anomalous values in the data. For example, in the sample 2, 2, 2, 3, 1000 the mean is 201.8 but the mode is still 2.

The calculation of the mode for discrete distributions is straightforward: examine each value and select the one with the greatest frequency. To identify a single mode, make one pass over the distinct values, keeping track of the value with the highest frequency seen so far.
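The single-mode search for discrete data can be sketched in Python as follows. This is a minimal illustration, not the original pseudo code: the function name `discrete_mode` is hypothetical, and the choice to return the first value encountered when two values tie is an assumption.

```python
from collections import Counter


def discrete_mode(values):
    """Return the value with the greatest frequency (None for empty input).

    Assumption: on a tie, the value that appears first in the data wins.
    """
    counts = Counter(values)  # value -> frequency, in first-seen order
    best_value = None
    best_count = 0
    for value, count in counts.items():
        # Strict '>' keeps the earlier value when frequencies are equal.
        if count > best_count:
            best_value, best_count = value, count
    return best_value


print(discrete_mode([3, 1, 2, 3, 2, 3]))  # 3
```

Because only frequencies are compared, a single anomalous value such as 1000 in `[2, 2, 2, 3, 1000]` does not move the result.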
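Unlike the calculation of the mean and median, there is no single accepted method for calculating the mode of a sample of values from a continuous distribution. Two methods, which can give different results, are described below.

Mode from Histogram

The simplest way of calculating the mode is based on a histogram: divide the range of the data into intervals of equal size, count the values falling in each interval, and take the interval with the highest count (typically its midpoint) as the estimate of the mode.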
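A histogram-based estimate might look like the sketch below. The details are assumptions made for illustration: the intervals start at the smallest value, the estimate is the midpoint of the fullest interval, the lowest interval wins a tie, and the names `histogram_mode` and `interval_size` are hypothetical.

```python
import math


def histogram_mode(values, interval_size):
    """Estimate the mode as the midpoint of the fullest histogram interval."""
    if not values:
        raise ValueError("values must not be empty")
    lowest = min(values)
    counts = {}
    for v in values:
        # Index of the interval [lowest + k*size, lowest + (k+1)*size) holding v.
        k = math.floor((v - lowest) / interval_size)
        counts[k] = counts.get(k, 0) + 1
    # Scan intervals in ascending order; max() keeps the first on a tie.
    best = max(sorted(counts), key=counts.get)
    return lowest + (best + 0.5) * interval_size


sample = [1.0, 1.2, 2.1, 2.2, 2.3, 2.4, 3.5, 4.9]
print(histogram_mode(sample, 1.0))
print(histogram_mode(sample, 0.5))
```

Running the function on the same sample with two different values of `interval_size` gives two different estimates, since the result can only ever be the midpoint of an interval.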
[Figure: example of a mode estimated from a histogram]

This approach is sensitive to the size of the interval chosen. With large datasets it may be possible to increase the precision of the estimate by using small interval sizes; with a small dataset, however, small intervals may make it difficult to get a reliable result.

Mode by Successive Bisection

An alternative approach is based on the successive bisection of an array of values sorted into ascending order. The logic for this approach is based on the probability density curve. For a given interval, the probability density will be proportional to intsize / range, where intsize is the number of values being inspected, i is the index at which the interval starts, and range is the difference between the last and the first of the intsize values starting at i. This function will have its maximum value around the mode; equivalently, for a given value of intsize, the range will have its minimum value there. We can use this to construct a search algorithm that successively refines the estimate of the mode.
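The steps of the search are not spelled out in the text, so the Python sketch below is one possible reading rather than the author's implementation. Assumptions: each pass inspects every window of intsize consecutive values, with intsize set to half the current search slice (rounded up); the window with the smallest range is kept, the first such window winning a tie; and the estimate is the mean of the one or two values that remain. The function name `bisection_mode` is hypothetical.

```python
def bisection_mode(values):
    """Estimate the mode by repeatedly keeping the densest half of the data."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    lo, hi = 0, len(data)  # current search slice is data[lo:hi]
    while hi - lo > 2:
        intsize = (hi - lo + 1) // 2  # number of values inspected per window
        best_start = lo
        best_range = data[lo + intsize - 1] - data[lo]
        # Slide the window one position at a time across the slice.
        for i in range(lo + 1, hi - intsize + 1):
            window_range = data[i + intsize - 1] - data[i]
            # Smallest range means highest density; '<' keeps the first window.
            if window_range < best_range:
                best_start, best_range = i, window_range
        lo, hi = best_start, best_start + intsize
    return sum(data[lo:hi]) / (hi - lo)


print(bisection_mode([1.0, 2.0, 2.9, 3.0, 3.1, 3.2, 5.0, 9.0]))
```

Each pass roughly halves the slice, so the number of passes grows with the logarithm of the sample size, and the estimate always lies between two adjacent values of the densest region found.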
At the time of writing, this algorithm has been tested on a dataset derived from a skewed distribution, with x values spaced so that the difference in cumulative probability between successive values was 0.001, 0.0001, 0.00001, etc. This has produced results close to the analytical value for the distribution. Further testing on random datasets is in progress.

In its current form, the algorithm identifies only the first modal interval that it encounters, and further development is needed to handle multi-modal datasets. It may also be possible to increase the computational efficiency by performing the search with increments larger than one during the early passes.
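A test dataset of this kind could be built as sketched below. The text does not name the skewed distribution, so the lognormal used here, its parameters `mu` and `sigma`, and the function names are all assumptions chosen for illustration. The lognormal is convenient because its mode has the closed form exp(mu - sigma^2). The check at the end applies only the minimum-range criterion for a single fixed intsize, not the full successive search.

```python
import math
from statistics import NormalDist


def quantile_dataset(step, mu=0.0, sigma=0.5):
    """x values at cumulative probabilities step, 2*step, ... (all below 1).

    Assumption: a lognormal distribution stands in for the unnamed
    skewed distribution.
    """
    std_normal = NormalDist()
    n = round(1.0 / step) - 1
    return [math.exp(mu + sigma * std_normal.inv_cdf((k + 1) * step))
            for k in range(n)]


def densest_window_midpoint(sorted_values, intsize):
    """Midpoint of the window of intsize values that has the smallest range."""
    best_start, best_range = 0, math.inf
    for i in range(len(sorted_values) - intsize + 1):
        window_range = sorted_values[i + intsize - 1] - sorted_values[i]
        if window_range < best_range:
            best_start, best_range = i, window_range
    first = sorted_values[best_start]
    last = sorted_values[best_start + intsize - 1]
    return (first + last) / 2


xs = quantile_dataset(0.001)
analytical_mode = math.exp(0.0 - 0.5 ** 2)  # lognormal mode: exp(mu - sigma^2)
print(analytical_mode, densest_window_midpoint(xs, 50))
```

Smaller steps such as 0.0001 give a finer grid of x values, at the cost of a longer search.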
For more information: info@brighton-webs.co.uk