Computing Interquartile Range
There are two ways of measuring the spread of a data distribution. Standard Deviation measures the spread of a data distribution from the mean and is thus influenced by outliers. Interquartile Range (IQR) measures the spread of a data distribution from the median and is thus not influenced by outliers.
When you look at boxplots, the box represents the IQR.
The formula for IQR is simple:
IQR = Q3 - Q1
where Q3 is the 75th Quantile and Q1 is the 25th Quantile. You can used the SUMMARY function provided by the PL/SQL package DBMS_STAT_FUNCS to compute Q3 and Q1. Unfortunately, the DBMS_STAT_FUNCS package does not provide a way to subset the data.
So, today we will use SQL to compute Q3 and Q1. First, divide the data set into a high and low group split at the median. Q3 is the median of the high group and Q1 is the median of the low group.
The only catch is that if the data set has an odd number of values, we have to include the median of the entire data set in the high and low groups.
I am going to use the Oracle sample HR data set to illustrate IQR computation.
Step 1: Compute the median of the data set and determine whether the number of elements in the data set is even or odd
select median(salary), mod(count(1), 2) as isOdd from employees
The median salary is $6,200 and isOdd is 1 which means that we have an odd number values in our data set. There are 107 employees in this data set.
Step 2: Compute Q1
select median(salary) as Q1 from employees where salary <= 6200
If we had an even number of data values, we would do the following:
select median(salary) as Q1 from employees where salary <>
Step 3: Compute Q3
select median(salary) as Q3 from employees where salary >= 6200
If we had an even number of data values, we would do the following:
select median(salary) as Q3 from employees where salary > 6200
Step 4: Compute IQR
Step 2 and Step 3 result in a Q1 of $3,100 and a Q3 of $8,900. Thus,
IQR = Q3 - Q1 = 8900 - 3100 = 5800
When you look at boxplots, the box represents the IQR.
The formula for IQR is simple:
IQR = Q3 - Q1
where Q3 is the 75th Quantile and Q1 is the 25th Quantile. You can used the SUMMARY function provided by the PL/SQL package DBMS_STAT_FUNCS to compute Q3 and Q1. Unfortunately, the DBMS_STAT_FUNCS package does not provide a way to subset the data.
So, today we will use SQL to compute Q3 and Q1. First, divide the data set into a high and low group split at the median. Q3 is the median of the high group and Q1 is the median of the low group.
The only catch is that if the data set has an odd number of values, we have to include the median of the entire data set in the high and low groups.
I am going to use the Oracle sample HR data set to illustrate IQR computation.
Step 1: Compute the median of the data set and determine whether the number of elements in the data set is even or odd
select median(salary), mod(count(1), 2) as isOdd from employees
The median salary is $6,200 and isOdd is 1 which means that we have an odd number values in our data set. There are 107 employees in this data set.
Step 2: Compute Q1
select median(salary) as Q1 from employees where salary <= 6200
If we had an even number of data values, we would do the following:
select median(salary) as Q1 from employees where salary <>
Step 3: Compute Q3
select median(salary) as Q3 from employees where salary >= 6200
If we had an even number of data values, we would do the following:
select median(salary) as Q3 from employees where salary > 6200
Step 4: Compute IQR
Step 2 and Step 3 result in a Q1 of $3,100 and a Q3 of $8,900. Thus,
IQR = Q3 - Q1 = 8900 - 3100 = 5800
0 Comments:
Post a Comment
<< Home