5.2 Application Requirements

To use Perfect Balance successfully, your application must meet the following requirements:

  • The job is distributive, so that splitting a group of records associated with a reduce key does not produce incorrect results for the application.

    To balance a load, Perfect Balance subpartitions the values of large reduce keys and sends each subpartition to a different reducer. This distribution contrasts with the standard Hadoop practice of sending all values for a single reduce key to the same reducer. As a result, each reducer may emit only a partial aggregate for a large key. Your application must be able to handle reducer output that is not fully aggregated without producing incorrect results.

    This partitioning of values is called chopping. Applications that support chopping have distributive reduce functions. See "About Chopping".

    If your application is not distributive, then you can still run Perfect Balance after disabling the key-splitting (chopping) feature. The job still benefits from using Perfect Balance, but the load is not as evenly balanced as it is when key splitting is in effect. To disable key splitting, see the oracle.hadoop.balancer.keyLoad.minChopBytes configuration property.

  • The job does not use combiners. This release does not support them: Perfect Balance detects the presence of combiners and does not balance the job when they are present.
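
The distributive requirement in the first item can be illustrated with a small sketch in plain Java. It does not use the Hadoop or Perfect Balance APIs. The class ChoppingSketch and the methods chop, reduceChopped, sum, and median are hypothetical names chosen for this illustration, and the contiguous split in chop is only an assumption about how values might be divided, not a description of how Perfect Balance chops keys. The sketch applies a reduce function to each subpartition of one key's values, then applies it again to the partial results, and compares that with reducing all values at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.ToLongFunction;

public class ChoppingSketch {

    /** Distributive reduce function: the sum of partial sums equals the full sum. */
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Non-distributive reduce function: returns the (upper) middle element. */
    static long median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    /** Hypothetical stand-in for chopping: splits the values into contiguous subpartitions. */
    static List<List<Long>> chop(List<Long> values, int parts) {
        List<List<Long>> out = new ArrayList<>();
        int chunk = (values.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < values.size(); i += chunk) {
            out.add(values.subList(i, Math.min(i + chunk, values.size())));
        }
        return out;
    }

    /** Reduces each subpartition separately, then reduces the partial results. */
    static long reduceChopped(List<Long> values, int parts, ToLongFunction<List<Long>> fn) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> part : chop(values, parts)) {
            partials.add(fn.applyAsLong(part));
        }
        return fn.applyAsLong(partials);
    }

    public static void main(String[] args) {
        // All values for one large reduce key.
        List<Long> values = Arrays.asList(1L, 2L, 3L, 100L, 200L, 300L, 4L);

        System.out.println("sum, unchopped:    " + sum(values));
        System.out.println("sum, chopped:      " + reduceChopped(values, 2, ChoppingSketch::sum));
        System.out.println("median, unchopped: " + median(values));
        System.out.println("median, chopped:   " + reduceChopped(values, 2, ChoppingSketch::median));
    }
}
```

With these sample values, the chopped and unchopped sums agree, while the chopped and unchopped medians differ. A job whose reduce function behaves like median here is the kind of job for which key splitting would need to be disabled.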