Brighton Webs Ltd.
Statistics for Energy and the Environment
Quartiles

A set of numbers (e.g. test scores, wind speeds, or the heights of women wearing red high heels) can be divided into four equally sized groups by three values: the lower quartile, the median and the upper quartile. If your value is between the lower quartile and the median, you are in the second quartile. 25% of values are less than or equal to the lower quartile and 25% are greater than or equal to the upper quartile. 50% of values lie between the lower quartile and the upper quartile, and so on.

Example

Quartiles are an intuitive and convenient way of summarising a set of numbers. Three values give an indication of the range, central tendency and dispersion, and offer a simple way to compare datasets. The graph below shows the distribution of wind speed at some location in the Northern Hemisphere. By adding the minimum and maximum values to give the extremes, the data can be summarised in five values:
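As a rough illustration of the five-value summary, the Python sketch below uses the standard library's statistics.quantiles with the "inclusive" method. The function name and the wind speed figures are hypothetical and are not taken from the dataset described on this page.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # n=4 splits the data into four groups and returns the three cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


# Hypothetical wind speeds in m/s, for illustration only.
wind_speeds = [2.1, 3.4, 4.0, 4.8, 5.5, 6.1, 6.9, 7.7, 9.2, 12.5]
print(five_number_summary(wind_speeds))
```

Other packages may use a different interpolation rule, so the quartile values can differ slightly between tools.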
Quartiles are often displayed graphically using a box-and-whisker diagram. The diagram has been drawn to show the maximum and minimum values from the example dataset. In some cases, the maximum and minimum values are anomalies and it may be appropriate to use the 5th/95th or 10th/90th percentiles instead.

In the graphic below the monthly wind speed has been summarised as box-and-whisker diagrams. This clearly shows the seasonality of the wind speed at this location. The median value is more or less constant; however, there is much greater variation in the upper quartile and maximum values.

Calculation

MS Excel and the Google Docs spreadsheet both have quartile functions. These offer a relatively accessible way of creating consistent results. There may be differences in the methods used by various software packages. The description below is of more interest to programmers who are building the functionality into software.

The first step in deriving the quartiles for a set of numbers is to sort them into ascending order. If the dataset is large (say, greater than 1,000 values), approximate values can be obtained from the sorted array of N items, using integer division to get the array indexes:
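A minimal Python sketch of this approximation, assuming the quartiles are read from zero-based positions N/4, N/2 and 3N/4 computed with integer division. The function name is illustrative and the exact index formula is an assumption, not a reproduction of the original listing.

```python
def approximate_quartiles(values):
    """Return approximate (lower quartile, median, upper quartile).

    No interpolation is done: each quartile is simply the item found at
    an index obtained by integer division, which is adequate when the
    dataset is large.
    """
    data = sorted(values)  # quartiles are defined on ascending data
    n = len(data)
    if n == 0:
        raise ValueError("at least one value is required")
    return data[n // 4], data[n // 2], data[(3 * n) // 4]


# Usage: 1,000 values from 0 to 999.
print(approximate_quartiles(range(1000)))
```

For a large dataset the error from skipping interpolation is at most the gap between two neighbouring sorted values, which is usually negligible.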
For smaller datasets, the two-stage Excel algorithm can be used. The first stage is to determine the position from which to select the items; as this is unlikely to be an integer value, the second stage uses linear interpolation to get the required values. The sample data is a set of ten numbers:
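The two stages can be sketched in Python as follows, assuming the "inclusive" convention in which quartile k of N sorted items sits at the zero-based position (N - 1) * k / 4. The function name and the ten sample numbers are hypothetical stand-ins, not the values from the original example.

```python
import math


def quartile_inclusive(values, quart):
    """Return quartile `quart` (0=min, 1, 2, 3, 4=max) by linear interpolation."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3 or 4")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("at least one value is required")

    # Stage 1: fractional (zero-based) position of the quartile.
    position = (n - 1) * quart / 4
    lower = math.floor(position)
    fraction = position - lower

    # Stage 2: interpolate between the two neighbouring items.
    if lower + 1 >= n:  # the position is the last item
        return float(data[lower])
    return data[lower] + fraction * (data[lower + 1] - data[lower])


# Hypothetical set of ten numbers, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print([quartile_inclusive(sample, k) for k in range(5)])
```

With ten items the lower quartile falls at position 2.25, so the result lies a quarter of the way between the third and fourth sorted values.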
Spreadsheets

Both MS Excel and the Google Docs spreadsheet have a quartile function which returns the 1st, 2nd and 3rd quartiles together with the maximum and minimum values, e.g.
Similar results can also be obtained using the percentile function.
Page updated: 02-May-2012