APPROX_PERCENTILE

Purpose

APPROX_PERCENTILE is an approximate inverse distribution function. It takes a percentile value and a sort specification, and returns the value that would fall into that percentile value with respect to the sort specification. Nulls are ignored in the calculation

This function provides an alternative to the PERCENTILE_CONT and PERCENTILE_DISC functions, which returns the exact results. APPROX_PERCENTILE processes large amounts of data significantly faster than PERCENTILE_CONT and PERCENTILE_DISC, with negligible deviation from the exact result.

The first expr is the percentile value, which must evaluate to a numeric value between 0 and 1.

The second expr, which is part of the ORDER BY clause, is a single expression over which this function calculates the result. The acceptable data types for expr, and the return value data type for this function, depend on the algorithm that you specify with the DETERMINISTIC clause.

DETERMINISTIC

This clause lets you specify the type of algorithm this function uses to calculate the return value.

  • If you specify DETERMINISTIC, then this function calculates a deterministic result. In this case, the ORDER BY clause expression must evaluate to a numeric value, or to a value that can be implicitly converted to a numeric value, in the range -2,147,483,648 through 2,147,483,647. The function rounds numeric input to the closest integer. The function returns the same data type as the numeric data type of the ORDER BY clause expression. The return value is not necessarily one of the values of expr

  • If you omit DETERMINSTIC, then this function calculates a nondeterministic result. In this case, the ORDER BY clause expression must evaluate to a numeric or datetime value, or to a value that can be implicitly converted to a numeric or datetime value. The function returns the same data type as the numeric or datetime data type of the ORDER BY clause expression. The return value is one of the values of expr.

ERROR_RATE | CONFIDENCE

These clauses let you determine the accuracy of the result calculated by this function. If you specify one of these clauses, then instead of returning the value that would fall into the specified percentile value for expr, the function returns a decimal value from 0 to 1, inclusive, which represents one of the following values:

  • If you specify ERROR_RATE, then the return value represents the error rate for calculating the value that would fall into the specified percentile value forexpr.

  • If you specify CONFIDENCE, then the return value represents the confidence level for the error rate that is returned when you specify ERROR_RATE.

DESC | ASC

Specify the sort specification for the calculating the value that would fall into the specified percentile value. Specify DESC to sort the ORDER BY clause expression values in descending order, or ASC to sort the values in ascending order. ASC is the default.

See Also:

Examples

The following query returns the deterministic approximate 25th percentile, 50th percentile, and 75th percentile salaries for each department in the hr.employees table. The salaries are sorted in ascending order for the interpolation calculation.

SELECT department_id "Department",
       APPROX_PERCENTILE(0.25 DETERMINISTIC)
         WITHIN GROUP (ORDER BY salary ASC) "25th Percentile Salary",
       APPROX_PERCENTILE(0.50 DETERMINISTIC)
         WITHIN GROUP (ORDER BY salary ASC) "50th Percentile Salary",
       APPROX_PERCENTILE(0.75 DETERMINISTIC)
         WITHIN GROUP (ORDER BY salary ASC) "75th Percentile Salary"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Department 25th Percentile Salary 50th Percentile Salary 75th Percentile Salary
---------- ---------------------- ---------------------- ----------------------
        10                   4400                   4400                   4400
        20                   6000                   6000                  13000
        30                   2633                   2765                   3100
        40                   6500                   6500                   6500
        50                   2600                   3100                   3599
        60                   4800                   4800                   6000
        70                  10000                  10000                  10000
        80                   7400                   9003                  10291
        90                  17000                  17000                  24000
       100                   7698                   7739                   8976
       110                   8300                   8300                  12006
                             7000                   7000                   7000

The following query returns the error rates for the approximate 25th percentile salaries that were calculated in the previous query:

SELECT department_id "Department",
       APPROX_PERCENTILE(0.25 DETERMINISTIC, 'ERROR_RATE')
         WITHIN GROUP (ORDER BY salary ASC) "Error Rate"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Department Error Rate
---------- ----------
        10 .002718282
        20 .021746255
        30 .021746255
        40 .002718282
        50 .019027973
        60 .019027973
        70 .002718282
        80 .021746255
        90 .021746255
       100 .019027973
       110 .019027973
           .002718282

The following query returns the confidence levels for the error rates that were calculated in the previous query:

SELECT department_id "Department",
       APPROX_PERCENTILE(0.25 DETERMINISTIC, 'CONFIDENCE')
         WITHIN GROUP (ORDER BY salary ASC) "Confidence"
FROM employees
GROUP BY department_id
ORDER BY department_id; 

Department Confidence
---------- ----------
        10 .997281718
        20 .999660215
        30 .999660215
        40 .997281718
        50 .999611674
        60 .999611674
        70 .997281718
        80 .999660215
        90 .999660215
       100 .999611674
       110 .999611674
           .997281718

The following query returns the nondeterministic approximate 25th percentile, 50th percentile, and 75th percentile salaries for each department in the hr.employees table. The salaries are sorted in ascending order for the interpolation calculation.

SELECT department_id "Department",
       APPROX_PERCENTILE(0.25)
         WITHIN GROUP (ORDER BY salary ASC) "25th Percentile Salary",
       APPROX_PERCENTILE(0.50)
         WITHIN GROUP (ORDER BY salary ASC) "50th Percentile Salary",
       APPROX_PERCENTILE(0.75)
         WITHIN GROUP (ORDER BY salary ASC) "75th Percentile Salary"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Department 25th Percentile Salary 50th Percentile Salary 75th Percentile Salary
---------- ---------------------- ---------------------- ----------------------
        10                   4400                   4400                   4400
        20                   6000                   6000                  13000
        30                   2600                   2800                   3100
        40                   6500                   6500                   6500
        50                   2600                   3100                   3600
        60                   4800                   4800                   6000
        70                  10000                  10000                  10000
        80                   7300                   8800                  10000
        90                  17000                  17000                  24000
       100                   7700                   7800                   9000
       110                   8300                   8300                  12008
                             7000                   7000                   7000