3.2.7 Ranking Data

The ore.rank function analyzes the distribution of values in numeric columns of an ore.frame.

The ore.rank function supports useful functionality, including:

Ranking within groups
Partitioning rows into groups based on rank tiles
Calculation of cumulative percentages and percentiles
Treatment of ties
Calculation of normal scores from ranks

The ore.rank function syntax is simpler than the corresponding SQL queries.

The ore.rank function returns an ore.frame in all instances.

You can use these R scoring methods with ore.rank:

To compute exponential scores from ranks, use savage.
To compute normal scores, use one of blom, tukey, or vw (van der Waerden).
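The scoring methods can be illustrated on a local vector in base R. This is a minimal sketch that assumes the conventional textbook formulas for Blom, Tukey, van der Waerden, and Savage scores; the exact definitions that ore.rank uses are not stated here, so check help(ore.rank). The sample values are hypothetical.

```r
# Base R sketch of the conventional rank-score formulas. That ore.rank
# computes exactly these values is an assumption, not a documented fact.
age <- c(23, 35, 41, 52, 67)        # hypothetical sample values, no ties
n <- length(age)
r <- rank(age)                      # ranks 1..n in ascending order

blom  <- qnorm((r - 3/8) / (n + 1/4))   # Blom normal scores
tukey <- qnorm((r - 1/3) / (n + 1/3))   # Tukey normal scores
vw    <- qnorm(r / (n + 1))             # van der Waerden normal scores

# Savage (exponential) scores: sum of 1/j for j = n-r+1, ..., n, minus 1
savage <- sapply(r, function(ri) sum(1 / ((n - ri + 1):n)) - 1)

scores <- data.frame(age, rank = r, blom, tukey, vw, savage)
print(scores)
```

Under these formulas the middle rank receives a normal score of zero, and the Savage scores sum to zero across the sample.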

For details about the function arguments, invoke help(ore.rank).

The following examples illustrate using ore.rank. The examples use the NARROW data set.

Example 3-40 Ranking Two Columns

This example ranks the two columns AGE and CLASS and reports the results as the derived columns RankOfAge and RankOfClass. Values are ranked in the default order, which is ascending.

x <- ore.rank(data=NARROW, var='AGE=RankOfAge, CLASS=RankOfClass')

Example 3-41 Handling Ties in Ranking

This example ranks the two columns AGE and CLASS. If there is a tie, the smallest of the corresponding ranks is assigned to all tied values.

x <- ore.rank(data=NARROW, var='AGE=RankOfAge, CLASS=RankOfClass', ties='low')
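The effect of this kind of tie handling can be seen on a small local vector in base R. The sketch below uses rank with ties.method="min"; that this corresponds to ties='low' in ore.rank is an assumption based on the description above, and the sample values are hypothetical.

```r
# Base R analogue of "lowest rank for ties" (assumed to mirror ties='low').
age <- c(30, 25, 30, 41, 25, 30)    # hypothetical values containing ties

rank_low  <- rank(age, ties.method = "min")      # tied values share the lowest rank
rank_mean <- rank(age, ties.method = "average")  # tied values share the mean rank

print(data.frame(age, rank_low, rank_mean))
```

With the lowest-rank rule, the three values of 30 all receive rank 3 and the next distinct value receives rank 6, so ranks 4 and 5 are skipped.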

Example 3-42 Ranking by Groups

This example ranks the two columns AGE and CLASS separately within each group of rows that share the same COUNTRY value.

x <- ore.rank(data=NARROW, var='AGE=RankOfAge, CLASS=RankOfClass', group.by='COUNTRY')
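Ranking by groups restarts the ranking for every group. A minimal base R sketch of the same idea on a local data.frame follows; the data values are hypothetical, and the code illustrates the concept rather than the ore.rank implementation.

```r
# Base R analogue of group-wise ranking on a small, hypothetical data.frame.
df <- data.frame(
  COUNTRY = c("US", "US", "US", "FR", "FR"),
  AGE     = c(40, 25, 33, 51, 29)
)

# ave() applies rank() separately to the AGE values of each COUNTRY
df$RankOfAge <- ave(df$AGE, df$COUNTRY, FUN = rank)

print(df)
```

Each country has its own rank 1, so the youngest person in the US rows and the youngest person in the FR rows both receive rank 1.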

Example 3-43 Partitioning into Deciles

This example ranks the two columns AGE and CLASS and partitions the columns into deciles (10 partitions). To use a different number of partitions, change the value of groups. For example, groups=4 partitions the columns into quartiles.

x <- ore.rank(data=NARROW, var='AGE=RankOfAge, CLASS=RankOfClass', groups=10)
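One common way to turn ranks into rank tiles is floor(rank * k / (n + 1)), which yields tile numbers from 0 to k - 1. The base R sketch below applies that formula to hypothetical data with k = 4; whether ore.rank uses this formula and zero-based tile numbering is an assumption, so compare against actual ore.rank output before relying on it.

```r
# Base R sketch of rank-tile assignment. The formula and the zero-based
# tile numbers are assumptions, not confirmed ore.rank behavior.
set.seed(1)
age <- sample(18:90, 20)            # 20 distinct hypothetical ages
n <- length(age)
k <- 4                              # number of tiles (quartiles)

r <- rank(age)
tile <- floor(r * k / (n + 1))      # tile numbers 0 .. k-1

print(table(tile))
```

With 20 distinct values and k = 4, each tile contains five rows.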

Example 3-44 Estimating Cumulative Distribution Function

This example ranks the two columns AGE and CLASS and estimates the cumulative distribution function for both columns.

x <- ore.rank(data=NARROW, var='AGE=RankOfAge, CLASS=RankOfClass', nplus1=TRUE)
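A rank-based estimate of the cumulative distribution function is commonly computed as rank / (n + 1), which keeps every estimate strictly between 0 and 1. The base R sketch below shows that calculation next to the rank / n variant on hypothetical values; that nplus1=TRUE selects the n + 1 denominator in ore.rank is an assumption suggested by the argument name.

```r
# Base R sketch of rank-based CDF estimates on hypothetical values.
age <- c(23, 35, 41, 52, 67)
n <- length(age)

cdf_nplus1 <- rank(age) / (n + 1)   # denominator n + 1: values stay below 1
cdf_n      <- rank(age) / n         # denominator n: the largest value maps to 1

print(data.frame(age, cdf_nplus1, cdf_n))
```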

Example 3-45 Scoring Ranks

This example ranks the two columns AGE and CLASS and scores the ranks in two different ways. The first command uses the savage scoring method, which calculates exponential scores, and partitions the columns into percentiles (100 groups) within each COUNTRY group. The second command uses blom scoring, which calculates normal scores.

x <- ore.rank(data=NARROW, var='AGE=RankOfAge, CLASS=RankOfClass',
              score='savage', groups=100, group.by='COUNTRY')
x <- ore.rank(data=NARROW, var='AGE=RankOfAge, CLASS=RankOfClass', score='blom')