Brighton Webs Ltd.
Statistics for Energy and the Environment

The Median

The median divides a population into two halves: half the values are less than the median and the other half are greater than the median. For example, if the median time of the journey to work is 20 minutes, then you know that 50% of journeys will take less than 20 minutes and the rest will take longer.

The median is less sensitive to extreme values than the mean (average) because it depends only on the position of the values once they are sorted, not on their magnitude.

When the set contains an odd number of values, the median is the central value, for example:

values = 2,3,4      Median=3.0

When the set contains an even number of values, the median is the mean of the two central values, for example:

values = 2,3,4,5     Median=3.5
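As a quick check of the two worked examples, Python's standard-library function statistics.median applies the same odd/even rule. This is a minimal sketch assuming Python 3; note that for an odd count it returns the central element itself, so the first result prints as 3 rather than 3.0.

```python
import statistics

# Odd number of values: the central value is returned as-is.
odd_median = statistics.median([2, 3, 4])
# Even number of values: the mean of the two central values.
even_median = statistics.median([2, 3, 4, 5])

print(odd_median)   # 3
print(even_median)  # 3.5
```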

The median, the mean and the mode are often described as measures of central tendency.

Related topics

The Mean
The Mode
Quartiles

Calculation

Unlike the mean (average), there is no single formula for calculating the median; it is found using a two-step procedure:

Step 1

Sort the data set into ascending order.

Step 2

Locate the mid point(s) and determine the median value.
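The two steps can be sketched in Python as follows. This is a minimal sketch, assuming the data fit in memory as a list of numbers; the function name median_two_step is hypothetical. Because Python lists are 0-based, the central positions are found with integer division on the count.

```python
def median_two_step(values):
    """Hypothetical helper: median via sort, then locate the mid point(s)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    # Step 1: sort into ascending order (sorted() returns a new list).
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # 0-based index of the upper central value
    # Step 2: locate the mid point(s) and determine the median.
    if n % 2 == 1:
        return float(ordered[mid])  # odd count: the single central value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two

print(median_two_step([4, 2, 3]))     # 3.0
print(median_two_step([5, 2, 4, 3]))  # 3.5
```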

Data for Example Calculation

The sample calculation is based on a 38-year time series of the annual rainfall in a humid sub-tropical area. The frequency diagram (histogram) for the data set is shown below:

Figure: The Median - Example Data - Histogram

The first table shows the data in year order. The second table has been sorted so that the rainfall values are in ascending order, which allows them to be ranked.

Year Rain (mm)
1974 1463
1975 1313
1976 1319
1977 888
1978 1390
1979 1573
1980 1051
1981 1564
1982 1288
1983 1357
1984 1182
1985 1442
1986 1264
1987 1160
1988 670
1989 1185
1990 1035
1991 1625
1992 1417
1993 1527
1994 1251
1995 1251
1996 922
1997 1389
1998 1373
1999 716
2000 1200
2001 1931
2002 1332
2003 1157
2004 1590
2005 1088
2006 1510
2007 1730
2008 1432
2009 1249
2010 1147
2011 625

Rank Year Rain (mm)
1 2011 625
2 1988 671
3 1999 717
4 1977 888
5 1996 922
6 1990 1035
7 1980 1052
8 2005 1089
9 2010 1147
10 2003 1157
11 1987 1161
12 1984 1182
13 1989 1185
14 2000 1200
15 2009 1249
16 1995 1251
17 1994 1252
18 1986 1264
19 1982 1288
20 1975 1314
21 1976 1320
22 2002 1333
23 1983 1357
24 1998 1374
25 1997 1390
26 1978 1391
27 1992 1417
28 2008 1433
29 1985 1443
30 1974 1463
31 2006 1511
32 1993 1528
33 1981 1564
34 1979 1573
35 2004 1590
36 1991 1625
37 2007 1730
38 2001 1931

The minimum and maximum values are 625 mm and 1931 mm respectively, and the mean (average) is 1279 mm.
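These summary figures can be recomputed from the ranked table with a few lines of Python. This sketch assumes the values as printed in the ranked table; several of them differ by 1 mm from the year-order table, which suggests rounding, so the computed mean (about 1279.5 mm) may differ slightly in the last digit from the quoted figure.

```python
# Ranked annual rainfall (mm), copied from the ranked table above.
rain_mm = [
    625, 671, 717, 888, 922, 1035, 1052, 1089, 1147, 1157,
    1161, 1182, 1185, 1200, 1249, 1251, 1252, 1264, 1288, 1314,
    1320, 1333, 1357, 1374, 1390, 1391, 1417, 1433, 1443, 1463,
    1511, 1528, 1564, 1573, 1590, 1625, 1730, 1931,
]

n = len(rain_mm)
minimum = min(rain_mm)
maximum = max(rain_mm)
mean = sum(rain_mm) / n  # arithmetic mean of the rounded table values

print(n, minimum, maximum)      # 38 625 1931
print(f"mean = {mean:.1f} mm")  # mean = 1279.5 mm
```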

Example calculation

Step 1 - Sort the data

The ranked table above shows the sorted data.

Step 2 - Select the mid points

Many statistical methods involve picking values out of tables by their position. Locating that position often relies on integer division, where the result is an integer quotient and a remainder. Some programming languages use the symbol "\" as the operator for integer division, and it is used in the examples below:

3\2 = 1 remainder 1

4\2 = 2 remainder 0

The symbol "/" denotes normal division where the result is a fraction e.g. 3/2=1.5.
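For readers working in Python, the same distinction can be sketched with the built-in operators: // gives the integer quotient, % the remainder, divmod() both at once, and / normal division. This is an illustration under the assumption that only non-negative values (such as counts of data items) are involved; for those, // agrees with the "\" operator used on this page.

```python
# Integer division: quotient and remainder (non-negative operands).
print(3 // 2, 3 % 2)  # 1 1
print(4 // 2, 4 % 2)  # 2 0
print(divmod(3, 2))   # (1, 1)

# Normal division returns a fraction.
print(3 / 2)          # 1.5
```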

In selecting the mid-points, we ignore the remainder. Writing n for the number of values and data(k) for the k-th value in the sorted table, the median is:

Even number of items:

median = (data(n\2)+data(n\2+1))/2

Odd number of items:

median = data(n\2+1)
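The same two rules can be restated in conventional notation. Here x_(k) stands for the k-th smallest value (the page's data(k)) and x̃ for the median; both symbols are introduced only for this restatement. Since n\2 equals n/2 when n is even and (n-1)/2 when n is odd, data(n\2+1) for an odd count is the value at position (n+1)/2:

```latex
\tilde{x} =
\begin{cases}
\dfrac{x_{(n/2)} + x_{(n/2+1)}}{2}, & n \text{ even},\\[6pt]
x_{((n+1)/2)}, & n \text{ odd}.
\end{cases}
```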

In our example we have 38 data values (an even number), so the median is:

median = (data(38\2)+data(38\2+1))/2
       = (data(19)+data(20))/2
       = (1288+1314)/2
       = 1301
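The calculation above can be transcribed almost literally into Python. In this sketch the helper data(k) is hypothetical: it mimics the page's 1-based numbering by reading element k-1 of a 0-based Python list holding the ranked table values.

```python
# Ranked annual rainfall (mm), copied from the ranked table above.
rain_mm = [
    625, 671, 717, 888, 922, 1035, 1052, 1089, 1147, 1157,
    1161, 1182, 1185, 1200, 1249, 1251, 1252, 1264, 1288, 1314,
    1320, 1333, 1357, 1374, 1390, 1391, 1417, 1433, 1443, 1463,
    1511, 1528, 1564, 1573, 1590, 1625, 1730, 1931,
]

def data(k):
    """Hypothetical helper: k-th sorted value, using 1-based numbering."""
    return rain_mm[k - 1]  # Python lists are 0-based

n = len(rain_mm)  # 38, an even number of items
median = (data(n // 2) + data(n // 2 + 1)) / 2
print(data(19), data(20), median)  # 1288 1314 1301.0
```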

If we added a 39th data value of 2500 to the table, the calculation would look like this:

median = data(39\2+1)
       = data(20)
       = 1314

The value of 2500 was chosen arbitrarily as an anomalous value. Adding it to the data set causes only a small increase in the median, from 1301 to 1314, whereas the effect on the mean is larger: it changes from 1279 to 1311.
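The comparison can be reproduced with Python's statistics.median and statistics.mean (standard library). This sketch appends the arbitrary value of 2500 to the ranked table values; with those rounded values the mean moves from about 1279.5 mm to about 1310.8 mm, while the median moves only from 1301 to 1314.

```python
import statistics

# Ranked annual rainfall (mm), copied from the ranked table above.
rain_mm = [
    625, 671, 717, 888, 922, 1035, 1052, 1089, 1147, 1157,
    1161, 1182, 1185, 1200, 1249, 1251, 1252, 1264, 1288, 1314,
    1320, 1333, 1357, 1374, 1390, 1391, 1417, 1433, 1443, 1463,
    1511, 1528, 1564, 1573, 1590, 1625, 1730, 1931,
]
with_outlier = rain_mm + [2500]  # the arbitrary anomalous 39th value

median_before = statistics.median(rain_mm)      # 1301.0
median_after = statistics.median(with_outlier)  # 1314
mean_before = statistics.mean(rain_mm)          # about 1279.5
mean_after = statistics.mean(with_outlier)      # about 1310.8

print(f"median: {median_before} -> {median_after}")
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
```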

Spreadsheets

Spreadsheets offer a relatively painless way of carrying out many statistical procedures. Both Microsoft Excel and Google's spreadsheet application have a median function:

=median(arg1,arg2,arg3,.....)

The arguments can be either ranges or valid numbers, e.g.

=median(A1:A10)

=median(A1:A10,B1:B10)

=median(A1:A10,7.5)

Empty cells are ignored.
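To make the argument rules concrete, here is a rough Python analogue of =median(...). It is only an illustration under stated assumptions, not how either spreadsheet is implemented: the function spreadsheet_median is hypothetical, a list stands in for a cell range, and None stands in for an empty cell, which is skipped as described above.

```python
import statistics
from collections.abc import Iterable

def spreadsheet_median(*args):
    """Hypothetical analogue of =median(arg1, arg2, ...).

    Each argument is either a single number or an iterable standing in
    for a cell range; None stands in for an empty cell and is ignored.
    """
    values = []
    for arg in args:
        items = arg if isinstance(arg, Iterable) else [arg]
        values.extend(v for v in items if v is not None)
    return statistics.median(values)

a_range = [1, 2, None, 4]  # pretend A1:A4, with A3 empty
b_range = [5, 6]           # pretend B1:B2

print(spreadsheet_median(a_range))           # 2
print(spreadsheet_median(a_range, b_range))  # 4
print(spreadsheet_median(a_range, 7.5))      # 3.0
```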

Note for Programmers

Much of the statistical literature is based on tables of data where the first entry is referenced as 1, e.g. x(1), and the last entry as n, e.g. x(n), where n is also the number of items in the table. In programming languages such as C++ and VB, tables are often implemented as arrays where the first entry is referenced as 0, e.g. x(0), and the last item is x(n-1). Some care is therefore required when implementing statistical methods in code.
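One way to keep the two conventions apart is to compute the 1-based positions exactly as the formulas state and convert them to 0-based array indices in a single place. The sketch below does this in Python; the function name median_positions is hypothetical.

```python
def median_positions(n):
    """Hypothetical helper: 1-based positions used by the median formulas
    for a sorted table of n items, and the matching 0-based indices."""
    if n % 2 == 0:
        positions = (n // 2, n // 2 + 1)  # even: two central values
    else:
        positions = (n // 2 + 1,)         # odd: one central value
    indices = tuple(p - 1 for p in positions)  # shift by one for 0-based arrays
    return positions, indices

print(median_positions(38))  # ((19, 20), (18, 19))
print(median_positions(39))  # ((20,), (19,))
```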

Page updated: 24-Apr-2012