Calculating the Median

<< Click to Display Table of Contents >>

Navigation:  Other Topics > Smart Access 1996-2006 > May-2003 >

Calculating the Median

Peter Vogel

 

In this column Peter Vogel answers just one query—but it’s

a good one. Along the way, he demonstrates how to “think

in SQL.”

 

I need to calculate the sum, median, and average values of a

set of numbers. Doing the sum and average in SQL is easy (I

just use the Avg and Sum functions). Is there any way to do

the median in SQL?

 

I have an answer, but explaining how it works is going to

take this whole column—you’ll be sorry that you asked.

Calculating the median is difficult in SQL. A median,

like an average, is “a measure of central tendency” and is

determined by finding the number in the middle of a

series. So, for instance, for the numbers 1, 2, 4, 8, 20, the

average is 7 (1+2+4+8+20 = 35/5). The median is 4

because there are two numbers lower than 4 in the series

and two numbers greater than 4. For a sequence with an

even number of digits, the median is the average of the

two middle numbers. For the sequence 1, 2, 4, 8, the

median would be 3 (2 + 4 = 6/2). Since half the numbers

in the sequence are larger than the median, comparing

medians to averages is often useful. If the median is

significantly different from the average, it indicates that

the data is weighted to one side of the average. For

instance, in my first example, while the average was 7, the

median was only 4. The much larger average reflects the

very large value of 35 (compared to the other values).

The problem with calculating the median in SQL is

that it assumes that the numbers are in order (a concept

that’s foreign to SQL) and depends upon calculating the

position of a number in a sequence (position is a concept

that’s foreign to sets, the basis of SQL). Fortunately,

there’s a way in SQL to determine where in a sequence

a record appears.

 

Determining position

Let me start with a solution that works for the cases

where there’s an odd number of unique values. I begin

by joining the Orders table to itself, using a non-equijoin

(in other words, a join that uses something other than a

equals sign). I join each record in the Orders table to all of

the records in the Orders table that have an equal or lower

Freight charge:

 

SELECT Or1.Freight

FROM Orders AS Or1 INNER JOIN Orders AS Or2

ON Or1.Freight <= Or2.Freight;

 

This will result in each Freight charge being repeated