Pre-General Availability: 2024-09-02

Processing Bulk Data Best Practices

Your automation work might involve processing bulk data, such as a file of input data or a JSON object containing a number of items. You have several options for processing bulk data.

Choose a processing option based on whether you need to process the records in order and how long they take to process:

  • If you don't need to process the records in order, create an integration with a loop that sends records to one or more robot instances to process in parallel.

  • If you need to process the records in order and all records will process within 30 minutes, create an integration that sends all the records to one robot instance to process sequentially.

  • If you need to process the records in order and the records will take longer than 30 minutes to process, create an integration with a loop that sends records to one or more robot instances to process sequentially.
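The decision described in this section can be sketched as a small Python function: records that can be processed in any order go to parallel branches; records that must be processed in order go to a single robot instance when the total processing time fits within the time limit, and otherwise to an integration loop that invokes robot instances sequentially. This is an illustration only, not Oracle Integration code; the `ProcessingOption` enum and `choose_processing_option` function are hypothetical names, and the 30-minute default is the Oracle-recommended limit that your organization can change.

```python
from enum import Enum


class ProcessingOption(Enum):
    """The three processing options (hypothetical names for illustration)."""
    PARALLEL_LOOP = "integration loop; robot instances process records in parallel"
    SEQUENTIAL_LOOP = "integration loop; robot instances process records sequentially"
    SINGLE_ROBOT = "one robot instance processes all records sequentially"


# Oracle-recommended limit; your organization can choose a different value.
DEFAULT_TIME_LIMIT_SECONDS = 30 * 60


def choose_processing_option(
    must_process_in_order: bool,
    estimated_total_seconds: float,
    time_limit_seconds: float = DEFAULT_TIME_LIMIT_SECONDS,
) -> ProcessingOption:
    """Ask about ordering first, then about total processing time."""
    if not must_process_in_order:
        return ProcessingOption.PARALLEL_LOOP
    if estimated_total_seconds <= time_limit_seconds:
        return ProcessingOption.SINGLE_ROBOT
    return ProcessingOption.SEQUENTIAL_LOOP
```

For example, `choose_processing_option(True, 50 * 60)` returns `ProcessingOption.SEQUENTIAL_LOOP`, because 50 minutes exceeds the default 30-minute limit.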

Questions

Understand how the answers to the questions in the flow chart help inform your decision-making process.

Question: Do you need to process the records in order?

Example: Do you need to update record 1, then record 2, and so on? Or can you update 5 records simultaneously?

Why the question matters: Robots can process records in sequence without any issues. However, when your business requirements allow it, you'll find opportunities for efficiency by processing records in parallel.

Question: Will all records process within 30 minutes?

Example: If you need to update 100 records, and each update takes 30 seconds, the total processing time is 50 minutes.

Why the question matters: In general, when the total processing time for all records exceeds 30 minutes, Oracle recommends using an integration to manage the distribution of work across multiple robots.

On the other hand, when the total processing time for all records is less than 30 minutes, you can allow the robot to manage the distribution of its own work, and you don't need a more robust solution architecture.

The 30-minute time limit is an Oracle-recommended limit. Your organization can choose a different time period. When you choose a limit, consider the service limits as well as the amount of time you're willing to wait to determine whether a set of records processed successfully. See Service Limits in Provisioning and Administering Oracle Integration 3.
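The 100-record example can be checked with a few lines of Python. This is a minimal sketch under the same simplification the example makes: it multiplies the number of records by the time per record and ignores robot overhead. The function names are hypothetical.

```python
TIME_LIMIT_MINUTES = 30  # Oracle-recommended limit; your organization can choose another


def total_processing_minutes(record_count: int, seconds_per_record: float) -> float:
    """Total time to process every record one after another, in minutes.

    Like the example in the text, this ignores robot overhead.
    """
    return record_count * seconds_per_record / 60


def exceeds_time_limit(record_count: int, seconds_per_record: float,
                       limit_minutes: float = TIME_LIMIT_MINUTES) -> bool:
    """True when the records won't all process within the time limit."""
    return total_processing_minutes(record_count, seconds_per_record) > limit_minutes


# The example from the text: 100 records at 30 seconds each.
print(total_processing_minutes(100, 30))  # 50.0
print(exceeds_time_limit(100, 30))        # True
```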

Processing Options

The flow chart presents several processing options. Review them in more detail.

Processing option: Create an integration that sends all the records to one robot instance to process sequentially

Description: Create an integration that passes the entire data set to a single robot instance. In the robot, create a foreach loop that iterates over all of the records, one at a time.

Use cases: This solution is straightforward and is best for records that can be processed relatively quickly and must be processed in a specific order.

Processing option: Create an integration with a loop that sends records to one or more robot instances to process sequentially

Description: Create an integration with a foreach loop that handles sequential iterations. The integration iterates over all of the records, one at a time, and invokes a robot instance for each record in turn.

For guidance on the number of robot instances to invoke and the number of records to pass to each robot, see Additional Factors: Number of Robots and Records.

Use cases: This solution is ideal for the following scenarios:

  • Each record takes a long time to process.

  • You want to include error handling.

    If a single robot instance fails, you can add error handling to the integration so that it can continue sending records to other robot instances.

Processing option: Create an integration with a loop that sends records to one or more robot instances to process in parallel

Description: Create an integration with a foreach loop that handles parallel iterations. The integration processes the data in parallel. Each branch invokes a robot instance to process one or more records.

For example, consider a data set with 100 records. The integration supports 5 parallel branches, and each branch calls 1 robot instance with 1 record. Therefore, the integration and robots process 5 records at a time.

For guidance on the number of robot instances to invoke and the number of records to pass to each robot, see Additional Factors: Number of Robots and Records.

Use cases: This solution is efficient when the total processing time for your records is high, either because you have many records to process or because each record takes a long time to process, and your business requirements allow you to process the records in any order.
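Integrations in Oracle Integration aren't written in Python, so the following is purely an analogy: a Python sketch of the control flow behind the two loop-based options. `dispatch_sequentially` invokes one robot instance at a time in order, `dispatch_in_parallel` runs up to 5 at once (the branch count from the example above), and both catch a failure and keep going, which mirrors the error-handling use case. `invoke_robot` and `fake_robot` are hypothetical stand-ins for the step that calls a robot instance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Record = dict
Batch = List[Record]
RobotCall = Callable[[Batch], None]  # hypothetical stand-in for "invoke a robot instance"


def _run_one(invoke_robot: RobotCall, batch: Batch) -> str:
    """Invoke one robot instance; report a failure instead of stopping the loop."""
    try:
        invoke_robot(batch)
        return "succeeded"
    except Exception as error:  # error handling lets the loop continue
        return f"failed: {error}"


def dispatch_sequentially(batches: Sequence[Batch], invoke_robot: RobotCall) -> Dict[int, str]:
    """Like a sequential foreach loop: one robot instance at a time, in order."""
    return {index: _run_one(invoke_robot, batch) for index, batch in enumerate(batches)}


def dispatch_in_parallel(batches: Sequence[Batch], invoke_robot: RobotCall,
                         branches: int = 5) -> Dict[int, str]:
    """Like a parallel foreach loop: up to `branches` robot instances at once."""
    with ThreadPoolExecutor(max_workers=branches) as pool:
        # map() yields results in input order, so indexes match the batches.
        outcomes = pool.map(lambda batch: _run_one(invoke_robot, batch), batches)
        return dict(enumerate(outcomes))


def fake_robot(batch: Batch) -> None:
    """Hypothetical robot that fails on any record marked as bad."""
    for record in batch:
        if record.get("bad"):
            raise RuntimeError(f"could not update record {record['id']}")
```

With `batches = [[{"id": 1}], [{"id": 2, "bad": True}], [{"id": 3}]]`, both functions report the second batch as failed and still process the third, which is the behavior the error-handling use case describes.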

Additional Factors: Number of Robots and Records

Two of the processing options require you to determine the number of robot instances that process records and the number of records that each robot instance processes:

  • Create an integration with a loop that sends records to one or more robot instances to process sequentially

  • Create an integration with a loop that sends records to one or more robot instances to process in parallel

Consider the following factors.

Factor: Overhead for calling a robot instance

Each robot accomplishes one or more specific goals, such as updating a record. However, to achieve its goal, a robot must complete other tasks, such as opening an application, signing in, and navigating to the right page. All of the tasks that a robot does to prepare for its specific goal are the robot's overhead.

For example, a robot that takes one minute and 15 seconds (1:15) to run might spend 1 minute navigating to the right page and then 15 seconds accomplishing its goal. That robot has 1 minute of overhead.

Factor: Total processing time for all records

The following components determine the total processing time for all records:

  • Number of records to process

  • Processing time for each record

  • Overhead for the robot

  • Number of robot instances that process records

For example, if you pass 3 records to one robot instance instead of passing 1 record to each of 3 robot instances, you eliminate the overhead for 2 robot instances, but you also increase the running time for that robot instance.

Factor: 30-minute time limit (or a different limit that your organization chooses)

The time limit is the amount of time that you're willing to wait before knowing whether a robot has succeeded. This value becomes the maximum processing time for a set of records and helps you calculate the number of records to send to a given robot instance. To maximize the efficiency of your automation, Oracle recommends passing each robot instance the maximum number of records that it can process within the time limit, which limits the overhead time.

Additionally, you can use parallel processing to reduce the clock time that passes before all records are processed. However, remember that each branch of the parallel processing incurs the overhead costs. Depending on the overhead duration and other components, distributing records to 5 branches might be less efficient than distributing records to only 3 branches.
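The overhead trade-off can be sketched in Python, assuming the simple model used throughout this section: each robot instance pays its overhead once and then spends a fixed time on each record. The function names are hypothetical, and the model ignores real-world variation such as network latency.

```python
import math


def instance_seconds(records: int, overhead_s: float, per_record_s: float) -> float:
    """Running time of one robot instance: overhead once, then each record."""
    return overhead_s + records * per_record_s


def total_seconds(record_count: int, records_per_instance: int,
                  overhead_s: float, per_record_s: float) -> float:
    """Total processing time across all robot instances (not clock time)."""
    instances = math.ceil(record_count / records_per_instance)
    return instances * overhead_s + record_count * per_record_s


# 3 records, 60 seconds of overhead, 15 seconds per record
one_record_each = total_seconds(3, 1, 60, 15)  # 3 instances: 3 x 60 + 3 x 15 = 225
all_in_one = total_seconds(3, 3, 60, 15)       # 1 instance: 60 + 3 x 15 = 105
print(one_record_each - all_in_one)            # 120, the overhead of 2 robot instances
print(instance_seconds(3, 60, 15))             # 105, longer than one record's 75
```

Passing all 3 records to one instance saves 120 seconds of total processing time, while that single instance runs for 105 seconds instead of 75.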

Sample Calculations

Sample calculations help you understand how to calculate the optimal number of robots to use and the number of records to send them.

Simple Scenarios

A robot takes one minute and 15 seconds (1:15) to run. The robot spends 1 minute navigating to the right page and then 15 seconds accomplishing its goal. Different numbers of records and robot instances impact the total processing time for this work.

Scenario: Five robot instances each process one record, either sequentially or in parallel

Each robot requires 1:15 to run, resulting in a total processing time of 6:15:

1:15 processing time x 5 robots = 6:15 processing time

The total processing time is the same whether the robots run sequentially or in parallel.

Scenario: One robot instance processes five records sequentially

The robot requires 1 minute of overhead, and then 15 seconds of processing time for each record, resulting in a total processing time of 2:15:

1 minute overhead + (15 seconds processing time per record x 5 records) = 2:15 processing time

Scenario: One robot instance processes 150 records sequentially

Reducing overhead costs improves the efficiency of your automation, but passing too many records to a single robot instance can result in longer-than-preferred processing times.

For instance, if one robot updates 150 records, you save 149 minutes of overhead time compared with using 150 robot instances. However, the total processing time is 38:30, which might be longer than you want to wait to determine whether all the updates completed successfully.

1 minute overhead + (15 seconds processing time per record x 150 records) = 38:30 processing time
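The three simple scenarios can be reproduced with a short Python sketch that uses the same 1-minute overhead and 15-second processing time and prints results in the minutes:seconds notation used above. The function names are hypothetical.

```python
OVERHEAD_S = 60      # 1 minute navigating to the right page
PER_RECORD_S = 15    # 15 seconds to update one record


def mm_ss(seconds: int) -> str:
    """Format seconds as minutes:seconds, matching the notation in the text."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def one_record_per_instance(records: int) -> int:
    """Each record gets its own robot instance, so each pays the overhead."""
    return records * (OVERHEAD_S + PER_RECORD_S)


def all_records_in_one_instance(records: int) -> int:
    """One robot instance pays the overhead once, then processes every record."""
    return OVERHEAD_S + records * PER_RECORD_S


print(mm_ss(one_record_per_instance(5)))        # 6:15
print(mm_ss(all_records_in_one_instance(5)))    # 2:15
print(mm_ss(all_records_in_one_instance(150)))  # 38:30
```

The overhead saved in the 150-record scenario is `one_record_per_instance(150) - all_records_in_one_instance(150)`, which is 8,940 seconds, or 149 minutes.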

Sequential Processing

If your business requires you to process records in sequence, determine the optimal number of records that each robot instance should process.

Here's how to complete these calculations.

  1. Determine the overhead time

    For example, consider a robot that spends 1 minute navigating to the right page and then 15 seconds accomplishing its goal. This robot has 1 minute, or 60 seconds, of overhead.

  2. Determine the maximum time to process records

    The 30-minute time limit contains 1,800 seconds:

    30 minutes x 60 seconds = 1,800 seconds

    Each robot instance requires 60 seconds of overhead. You must subtract the overhead time from the maximum processing time:

    1,800 seconds - 60 seconds = 1,740 seconds

    This calculation assumes that in 30 minutes, you complete the overhead one time and then use the rest of the time to process records.

  3. Calculate the number of records that you can process

    A robot needs 15 seconds to process each record.

    To calculate the maximum number of records that a robot can process, divide the maximum time to process records by the time to process each record:

    1,740 seconds maximum time / 15 seconds per record = 116 records

Theoretically, the optimal number of records for each robot to process is 116.

Note:

This conclusion is theoretical because it makes several potentially faulty assumptions. For instance, the calculation assumes that the processing time never changes, but response times vary significantly in the real world, and a calculated optimum doesn't reflect that variability. When making these decisions, build in some wiggle room for factors that these calculations don't consider, such as network latency.
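The three sequential-processing steps can be sketched as one Python function. It pays the overhead once, divides the remaining time by the time per record, and rounds down. The `reserve_s` parameter is a hypothetical way to build in the wiggle room that the note recommends; the 5-minute reserve in the example is an arbitrary illustration, not an Oracle recommendation.

```python
def max_records_per_instance(time_limit_s: int, overhead_s: int,
                             per_record_s: int, reserve_s: int = 0) -> int:
    """Largest number of records one robot instance can process within the limit.

    reserve_s is optional wiggle room (for example, for network latency)
    that is set aside before any records are scheduled.
    """
    usable_s = time_limit_s - reserve_s - overhead_s  # overhead is paid once
    if usable_s < per_record_s:
        return 0
    return usable_s // per_record_s


print(max_records_per_instance(30 * 60, 60, 15))                 # 116
print(max_records_per_instance(30 * 60, 60, 15, reserve_s=300))  # 96
```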

Parallel Processing

If your business allows you to process records in parallel, you can distribute the work in a way that minimizes the time that passes before all jobs complete.

  1. Calculate the total potential overhead time

    For example, if you have 100 records to process, and each robot instance requires 60 seconds of overhead time, the total potential overhead (if you send each record to its own robot instance) is 6,000 seconds:

    100 records x 60 seconds of overhead = 6,000 seconds of potential overhead

    You can reduce this value by processing multiple records using a single robot instance.

  2. Calculate the processing time without any overhead

    For example, if each record requires 15 seconds to process (without its overhead time), the total processing time is 1,500 seconds:

    15 seconds of processing time x 100 records = 1,500 seconds

    You cannot reduce this time. However, you can reduce the amount of time that passes on the clock by processing records in parallel.

  3. Consider several scenarios to find your preferred number of parallel branches (up to 5) and the number of records that each branch processes

    To minimize the processing time, including overhead, you need to reduce your overhead time as much as possible while staying within the 30-minute time limit (or whatever time limit your organization chooses). Processing the records in parallel also minimizes the total time that passes on the clock before the jobs complete.

    Calculate several scenarios to find your preferred combination. For example:

    Scenario: 2 branches, 50 records per branch
    Total processing time per branch: 810 seconds (13 ½ minutes)
    Calculation: 60 seconds of overhead + (50 records x 15 seconds of processing time) = 810 seconds

    Scenario: 3 branches, 33 or 34 records per branch
    Total processing time per branch: 570 seconds (9 ½ minutes)
    Calculation: 60 seconds of overhead + (34 records x 15 seconds of processing time) = 570 seconds

    Scenario: 4 branches, 25 records per branch
    Total processing time per branch: 435 seconds (7 ¼ minutes)
    Calculation: 60 seconds of overhead + (25 records x 15 seconds of processing time) = 435 seconds

    Scenario: 5 branches, 20 records per branch
    Total processing time per branch: 360 seconds (6 minutes)
    Calculation: 60 seconds of overhead + (20 records x 15 seconds of processing time) = 360 seconds

    Scenario: 5 branches, 20 records per branch, sent in 2 different batches
    Total processing time per branch: 420 seconds (7 minutes); each batch finishes in 210 seconds (3 ½ minutes)
    Calculation: [60 seconds of overhead + (10 records x 15 seconds of processing time)] x 2 = 420 seconds

    With more records or longer processing times, you might need to send records to each branch in batches. This approach reduces the processing time for each batch of records but increases the total processing time for each branch, because every batch incurs the overhead. The following table provides sample calculations for processing 500 records in parallel.

    Scenario: 2 branches, 250 records per branch
    Total processing time per branch: 3,810 seconds (63 ½ minutes), which exceeds the 30-minute time limit
    Calculation: 60 seconds of overhead + (250 records x 15 seconds of processing time) = 3,810 seconds

    Scenario: 3 branches, 166 or 167 records per branch
    Total processing time per branch: 2,565 seconds (42 ¾ minutes), which exceeds the 30-minute time limit
    Calculation: 60 seconds of overhead + (167 records x 15 seconds of processing time) = 2,565 seconds

    Scenario: 4 branches, 125 records per branch
    Total processing time per branch: 1,935 seconds (32 ¼ minutes), which exceeds the 30-minute time limit
    Calculation: 60 seconds of overhead + (125 records x 15 seconds of processing time) = 1,935 seconds

    Scenario: 4 branches, 125 records per branch, sent in 2 different batches
    Total processing time per branch: 2,010 seconds (33 ½ minutes); each batch finishes in 1,005 seconds (16 ¾ minutes)
    Calculation: [60 seconds of overhead + (63 records x 15 seconds of processing time)] x 2 = 2,010 seconds

    Scenario: 5 branches, 100 records per branch
    Total processing time per branch: 1,560 seconds (26 minutes)
    Calculation: 60 seconds of overhead + (100 records x 15 seconds of processing time) = 1,560 seconds

    Scenario: 5 branches, 100 records per branch, sent in 2 different batches
    Total processing time per branch: 1,620 seconds (27 minutes); each batch finishes in 810 seconds (13 ½ minutes)
    Calculation: [60 seconds of overhead + (50 records x 15 seconds of processing time)] x 2 = 1,620 seconds

  4. Choose the right scenario for your requirements

    Consider your calculations. Build in some wiggle room for periods of higher-than-usual volume, network latency, and other unforeseen issues. Then, choose an approach.

    Oracle recommends testing an integration and robot under load before going live to confirm that your approach will succeed in the real world.
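The parallel-processing scenarios can be generated with a Python sketch instead of by hand. It assumes the same model as the sample tables: 60 seconds of overhead per robot instance, 15 seconds per record, records split as evenly as possible with the largest share used in the calculation (as in "33 or 34 records per branch"), and at most 5 branches, following the examples. It also assumes that the time limit applies to each batch, that is, to each robot instance run. `ParallelScenario`, `evaluate`, and `candidates` are hypothetical names, and the results still need the wiggle room and load testing described in step 4.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParallelScenario:
    branches: int
    batches: int              # robot instances invoked one after another per branch
    records_per_batch: int    # size of the largest batch any branch receives
    seconds_per_batch: int
    seconds_per_branch: int   # clock time until the branch finishes all batches


def evaluate(total_records: int, branches: int, batches: int = 1,
             overhead_s: int = 60, per_record_s: int = 15) -> ParallelScenario:
    """Apply the per-branch formula used in the sample tables."""
    records_per_branch = math.ceil(total_records / branches)
    records_per_batch = math.ceil(records_per_branch / batches)
    seconds_per_batch = overhead_s + records_per_batch * per_record_s
    return ParallelScenario(branches, batches, records_per_batch,
                            seconds_per_batch, seconds_per_batch * batches)


def candidates(total_records: int, time_limit_s: int = 30 * 60,
               max_branches: int = 5, max_batches: int = 2) -> List[ParallelScenario]:
    """Scenarios whose batches each finish within the limit, fastest branch first."""
    scenarios = (evaluate(total_records, branch_count, batch_count)
                 for branch_count in range(1, max_branches + 1)
                 for batch_count in range(1, max_batches + 1))
    fits = [s for s in scenarios if s.seconds_per_batch <= time_limit_s]
    return sorted(fits, key=lambda s: s.seconds_per_branch)
```

For 500 records, `candidates(500)` ranks 5 branches with 1 batch first (1,560 seconds per branch), followed by 5 branches with 2 batches, 4 branches with 2 batches, and 3 branches with 2 batches. Ranking by time per branch favors the shortest clock time; if you care more about how soon you learn whether a batch succeeded, sort by `seconds_per_batch` instead.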