APPROX_MEDIAN

Purpose

APPROX_MEDIAN is an approximate inverse distribution function that assumes a continuous distribution model. It takes a numeric or datetime value and returns an approximate middle value or an approximate interpolated value that would be the middle value once the values are sorted. Nulls are ignored in the calculation.

This function provides an alternative to the MEDIAN function, which returns the exact middle value or interpolated value. APPROX_MEDIAN processes large amounts of data significantly faster than MEDIAN, with negligible deviation from the exact result.

For expr, specify the expression for which the approximate median value is being calculated. The acceptable data types for expr, and the return value data type for this function, depend on the algorithm that you specify with the DETERMINISTIC clause.

DETERMINISTIC

This clause lets you specify the type of algorithm this function uses to calculate the approximate median value.

  • If you specify DETERMINISTIC, then this function calculates a deterministic approximate median value. In this case, expr must evaluate to a numeric value, or to a value that can be implicitly converted to a numeric value. The function returns the same data type as the numeric data type of its argument.

  • If you omit DETERMINSTIC, then this function calculates a nondeterministic approximate median value. In this case, expr must evaluate to a numeric or datetime value, or to a value that can be implicitly converted to a numeric or datetime value. The function returns the same data type as the numeric or datetime data type of its argument.

ERROR_RATE | CONFIDENCE

These clauses let you determine the accuracy of the value calculated by this function. If you specify one of these clauses, then instead of returning the approximate median value for expr, the function returns a decimal value from 0 to 1, inclusive, which represents one of the following values:

  • If you specify ERROR_RATE, then the return value represents the error rate for the approximate median value calculation for expr.

  • If you specify CONFIDENCE, then the return value represents the confidence level for the error rate that is returned when you specify ERROR_RATE.

See Also:

  • MEDIAN

  • APPROX_PERCENTILE which returns, for a given percentile, the approximate value that corresponds to that percentile by way of interpolation. APPROX_MEDIAN is the specific case of APPROX_PERCENTILE where the percentile value is 0.5.

Examples

The following query returns the deterministic approximate median salary for each department in the hr.employees table:

SELECT department_id "Department",
       APPROX_MEDIAN(salary DETERMINISTIC) "Median Salary"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;
 
Department Median Salary
---------- -------------
        10          4400
        20          6000
        30          2765
        40          6500
        50          3100
        60          4800
        70         10000
        80          9003
        90         17000
       100          7739
       110          8300
                    7000

The following query returns the error rates for the approximate median salaries that were returned by the previous query:

SELECT department_id "Department",
       APPROX_MEDIAN(salary DETERMINISTIC, 'ERROR_RATE') "Error Rate"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Department Error Rate
---------- ----------
        10 .002718282
        20 .021746255
        30 .021746255
        40 .002718282
        50 .019027973
        60 .019027973
        70 .002718282
        80 .021746255
        90 .021746255
       100 .019027973
       110 .019027973
           .002718282

The following query returns the confidence levels for the error rates that were returned by the previous query:

SELECT department_id "Department",
       APPROX_MEDIAN(salary DETERMINISTIC, 'CONFIDENCE') "Confidence Level"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Department Confidence Level
---------- ----------------
        10       .997281718
        20       .999660215
        30       .999660215
        40       .997281718
        50       .999611674
        60       .999611674
        70       .997281718
        80       .999660215
        90       .999660215
       100       .999611674
       110       .999611674
                 .997281718

The following query returns the nondeterministic approximate median hire date for each department in the hr.employees table:

SELECT department_id "Department",
       APPROX_MEDIAN(hire_date) "Median Hire Date"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Department Median Hire Date
---------- ----------------
        10        17-SEP-03
        20        17-FEB-04
        30        24-JUL-05
        40        07-JUN-02
        50        15-MAR-06
        60        05-FEB-06
        70        07-JUN-02
        80        23-MAR-06
        90        17-JUN-03
       100        28-SEP-05
       110        07-JUN-02
                  24-MAY-07