Monday, March 14, 2005

Computing Statistical Summaries using PL/SQL DBMS_STAT_FUNCS package

The PL/SQL DBMS_STAT_FUNCS package can be used to compute statistical summary of numerical column of a table.

The following PL/SQL code fragemnt summarizes employee salaries in the Oracle 10g HR schema.

DECLARE
sigma number := 3;
S DBMS_STAT_FUNCS.summaryType;
item number;
cnt number;
BEGIN
DBMS_STAT_FUNCS.SUMMARY ('hr', 'employees', 'salary', sigma, S);
DBMS_OUTPUT.PUT_LINE('COUNT = ' || S.count);
DBMS_OUTPUT.PUT_LINE('MIN = ' || S.min);
DBMS_OUTPUT.PUT_LINE('MAX = ' || S.max);
DBMS_OUTPUT.PUT_LINE('RANGE = ' || S.range);
DBMS_OUTPUT.PUT_LINE('VARIANCE = ' || S.variance);
DBMS_OUTPUT.PUT_LINE('STDDEV = ' || S.stddev);
DBMS_OUTPUT.PUT_LINE('5th QUANTILE = ' || S.quantile_5);
DBMS_OUTPUT.PUT_LINE('25th QUANTILE = ' || S.quantile_25);
DBMS_OUTPUT.PUT_LINE('MEDIAN = ' || S.median);
DBMS_OUTPUT.PUT_LINE('75th QUANTILE = ' || S.quantile_75);
DBMS_OUTPUT.PUT_LINE('95th QUANTILE = ' || S.quantile_95);
DBMS_OUTPUT.PUT_LINE('PLUS X SIGMA = ' || S.plus_x_sigma);
DBMS_OUTPUT.PUT_LINE('MINUS X SIGMA = ' || S.minus_x_sigma);

FOR item IN S.cmode.FIRST..S.cmode.LAST
LOOP
DBMS_OUTPUT.PUT_LINE('MODE [' || item || '] = ' || S.cmode(item));
END LOOP;

FOR item IN S.top_5_values.FIRST..S.top_5_values.LAST
LOOP
DBMS_OUTPUT.PUT_LINE('TOP ' || item || ' VALUE = ' || S.top_5_values(item));
END LOOP;

cnt := S.bottom_5_values.LAST;
FOR item IN S.bottom_5_values.FIRST..S.bottom_5_values.LAST
LOOP
DBMS_OUTPUT.PUT_LINE('BOTTOM ' || cnt || ' VALUE = ' || S.bottom_5_values(item));
cnt := cnt - 1;
END LOOP;

END;
/


The DBMS_STAT_FUNCS.SUMMARY() function's argument signature is:


DBMS_STAT_FUNCS.SUMMARY (
ownername IN VARCHAR2,
tablename IN VARCHAR2,
columnname IN VARCHAR2,
sigma_value IN NUMBER DEFAULT 3,
s OUT SummaryType);


SummaryType is an object defined by the DBMS_STAT_FUNCS package. It's structure is:

TYPE n_arr IS VARRAY(5) of NUMBER;
TYPE num_table IS TABLE of NUMBER;
TYPE summaryType IS RECORD (
count NUMBER,
min NUMBER,
max NUMBER,
range NUMBER,
mean NUMBER,
cmode num_table,
variance NUMBER,
stddev NUMBER,
quantile_5 NUMBER,
quantile_25 NUMBER,
median NUMBER,
quantile_75 NUMBER,
quantile_95 NUMBER,
plus_x_sigma NUMBER,
minus_x_sigma NUMBER,
extreme_values num_table,
top_5_values n_arr,
bottom_5_values n_arr);


The limitation of the DBMS_STAT_FUNCS.SUMMARY() function is that there is no way to subset the data. The SUMMARY() function computes the summary across all rows in the table for the specified column.

The DBMS_STAT_FUNCS package also provides exponentional, normal, poisson, uniform and weibull distribution fitting functions. We will explore those in the future.

0 Comments:

Post a Comment

<< Home