What are the best practices for high volume data import?

Here are some best practices for high volume data import.

At some point, you might need to load large volumes of data into your CX Sales and Fusion Service instance: for example, during initial roll-out, when expanding use of CX Sales and Fusion Service to other users or divisions, or when expanding its use to include other object types (for example, adding support for Service objects).

When loading large volumes of data into CX Sales and Fusion Service, one of the key considerations is the time taken to import the data. Reducing that time is crucial if you have a limited window to run the data imports (for example, during a weekend migration). Oracle recommends certain best practices that you can use to improve the performance of the data import.

Actual throughput varies for users based on a number of factors including the volume of data contained in each record, the number of relationships being created during data import, and the number of records in your source file.

Before executing the data import in a production environment, Oracle recommends that you perform these functional validation and performance optimization procedures in your test environment:

  • Data Cleansing

  • Validate Functional Import

  • Optimization Process

Performing these exercises in your test environment ensures that they don't impact existing operations or the data in your production environment.

Data Cleansing

Before proceeding with high volume imports, you need to thoroughly clean the data being imported. Ensuring that your data is in the correct format reduces the number of errors encountered during data import. This benefits your overall throughput in two ways:

  • Processing time isn't wasted on attempting to import records that can't be imported.

  • You don't have to review error logs, fix issues, and reimport data. It costs less to clean your data before attempting a data import than it does to fix issues that arise during data import.

When cleaning the data, note the following (a validation sketch follows this list):

  • Field Length - ensure that the data doesn't exceed the maximum field length in CX Sales and Fusion Service

  • Data Type - ensure that the data is of the correct type for the field to which it's being imported

  • Required - ensure that a value is provided for all mandatory fields

  • Conditionally Required - in some cases, a value is required based on the value provided for another field (for example, an Opportunity requires a Close Date if its Status is set to Closed)

  • Foreign Keys - provide a valid value for foreign keys to create relationships between records. The foreign keys can vary for every object.
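
As an illustration of these checks, the following is a minimal validation sketch in Python. The field names, length limits, and status codes are hypothetical placeholders rather than actual CX Sales and Fusion Service metadata; substitute the definitions for the objects and fields you're importing.

```python
# Minimal pre-import validation sketch. All field names, length limits, and
# status codes below are hypothetical examples; substitute the definitions
# for the objects you're importing.
import csv
from datetime import datetime

MAX_LENGTHS = {"Name": 275, "OptyNumber": 30}   # assumed maximum field lengths
REQUIRED = ["Name", "OwnerEmailAddress"]        # assumed mandatory fields
CLOSED_STATUSES = {"WON", "LOST"}               # assumed "closed" status codes

def validate_row(row, line_no):
    errors = []
    value = lambda f: (row.get(f) or "").strip()
    # Required: every mandatory field must have a value
    for field in REQUIRED:
        if not value(field):
            errors.append(f"line {line_no}: missing required field {field}")
    # Field length: values must fit within the target field
    for field, max_len in MAX_LENGTHS.items():
        if len(value(field)) > max_len:
            errors.append(f"line {line_no}: {field} exceeds {max_len} characters")
    # Data type: CloseDate must parse as a date
    close_date = value("CloseDate")
    if close_date:
        try:
            datetime.strptime(close_date, "%Y-%m-%d")
        except ValueError:
            errors.append(f"line {line_no}: CloseDate isn't in YYYY-MM-DD format")
    # Conditionally required: closed opportunities need a Close Date
    if value("Status").upper() in CLOSED_STATUSES and not close_date:
        errors.append(f"line {line_no}: CloseDate is required when Status is closed")
    return errors

with open("opportunities.csv", newline="", encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f), start=2):  # line 1 is the header
        for error in validate_row(row, i):
            print(error)
```

Running a check like this against your extract before each import attempt gives you the cheap, early error report that the points above describe.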

By ensuring that your data is clean, you can avoid many issues associated with importing your data from your legacy application into CX Sales and Fusion Service.

Validate Functional Import

To verify that you have successfully cleaned your data, Oracle recommends that you validate your import in your test environment with a small set of data. Follow these guidelines to validate the functional import:

  • Before you import large volumes of data, ensure that your import completes successfully and that the resulting data matches your expectations. This includes ensuring that the data is imported to the correct object, fields have been mapped correctly, and relationships are imported properly.

  • Import a small number of records including values for each attribute. Oracle recommends that you use the smallest set of input data that still captures the variations present in your full data set (see the sampling sketch after this list).

  • Identify any data issues that are reported in the import log files. Fix any specific issues reported for rows with errors. If a row doesn't have an error reported, then don't reimport that row. After you resolve all the issues and successfully import your subset of data, apply any changes to your larger data set (for example, if you determine that your time stamp format is incorrect, fix this for all records in the input data file for your full data set).
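
To build that small but representative input file, one approach is to keep the first record for each combination of the attributes that vary in your data. A minimal sketch, assuming hypothetical Status and Type attributes:

```python
# Sketch: select the smallest subset of rows that still covers the value
# combinations in the full data set. "Status" and "Type" are assumed example
# attributes; use whichever fields vary meaningfully in your data.
import csv

def representative_sample(path, variation_fields):
    seen, sample = set(), []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = tuple((row.get(field) or "") for field in variation_fields)
            if key not in seen:  # first row exhibiting this combination
                seen.add(key)
                sample.append(row)
    return sample

rows = representative_sample("opportunities.csv", ["Status", "Type"])
print(f"{len(rows)} records cover all Status/Type combinations")
```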

Optimization Process

Importing a high volume of data is a project in its own right and should be included in the project plan for your implementation. You need to allocate time for the actual import of data and for the design and optimization of the import process for your data set.

When optimizing the data import, you identify the number of records to include in each input data file and the number of import activities to run concurrently. You trigger concurrent import activities by running multiple data import operations simultaneously; the number of concurrent import activities is the number of data imports that have started and haven't yet completed. Running import activities serially, or running too many concurrently, can lower your throughput. Finding the right values for these two parameters is a key aspect of designing your import process, and there's no single right answer for every situation: every data set is different, so you need to repeat this process for each new data import operation that you plan to perform.

Note: Don't import more than 100,000 records in a single import operation or perform more than five concurrent import activities.
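
If your extract exceeds these limits, split it into multiple input files before importing. A minimal sketch, using illustrative file names:

```python
# Sketch: split a large extract into input files of at most 100,000 records,
# per the note above. File names are illustrative placeholders.
import csv

MAX_RECORDS = 100_000

def write_batch(header, rows, batch_no):
    with open(f"accounts_{batch_no}.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(header)   # repeat the header in every batch file
        writer.writerows(rows)

with open("accounts_full.csv", newline="", encoding="utf-8") as src:
    reader = csv.reader(src)
    header = next(reader)
    batch, batch_no = [], 1
    for row in reader:
        batch.append(row)
        if len(batch) == MAX_RECORDS:
            write_batch(header, batch, batch_no)
            batch, batch_no = [], batch_no + 1
    if batch:                     # write any remaining records
        write_batch(header, batch, batch_no)
```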

Oracle recommends that you adhere to this process when planning high volume data import:

  • Establish baseline performance

  • Scale up import volume

  • Optimize concurrent import activities

Each of these steps is explained in greater detail below.

Establish Baseline Performance

Oracle recommends that after you have cleaned your data and validated that your import is functionally correct, you begin the process of scaling up your import volumes. The first step in this process is to establish a performance baseline by performing a set of serial imports.

Oracle recommends that you begin with a modest number of records (a few hundred to a few thousand) per import activity, run serially. For example, you might import 10,000 records using four serial import activities of 2,500 records each. After these data import activities have completed, note the time required to complete the import and calculate the overall throughput (the number of records imported per second).

Note: You may also want to track the time between data import operations.
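
As a worked example of the baseline calculation, suppose the four serial activities of 2,500 records each take 30 minutes end to end (an assumed figure for illustration only; substitute your own measurements):

```python
# Baseline throughput: total records imported divided by elapsed time.
records_per_activity = 2_500
activities = 4
elapsed_seconds = 30 * 60       # assumed 30-minute end-to-end run, for illustration

total_records = records_per_activity * activities   # 10,000 records
throughput = total_records / elapsed_seconds
print(f"{throughput:.2f} records/second")           # prints 5.56 records/second
```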

Scale Up Import Volume

After you have established a baseline, you can increase the number of records in your input file and begin to run multiple imports concurrently. Scale the volumes in a controlled manner and note your throughput at every step.

Optimize Concurrent Import Activities

You can run multiple import activities simultaneously. This can improve your overall throughput versus serially running the same imports. The optimal number of concurrent import activities can vary across objects and use cases; you must perform testing prior to beginning your production data load to identify the most efficient data import plan.

You run data import activities concurrently by submitting a new import activity while the first activity is still being processed. CX Sales and Fusion Service processes both import activities without waiting for the first to complete before starting the second. You can submit additional import activities in the same way to increase concurrency further.
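
How you submit an import activity depends on your tooling, so the sketch below wraps that step in a hypothetical submit_import_activity() placeholder; only the concurrency pattern is the point here:

```python
# Illustrative sketch of running several import activities concurrently.
# submit_import_activity() is a hypothetical placeholder for whatever
# submission mechanism you use; it's assumed to block until the given
# input file finishes importing.
from concurrent.futures import ThreadPoolExecutor
import time

def submit_import_activity(input_file):
    time.sleep(1)                # placeholder: call your real mechanism here
    return f"{input_file}: completed"

input_files = ["batch_1.csv", "batch_2.csv", "batch_3.csv", "batch_4.csv"]

# max_workers stays within the five-concurrent-activity ceiling noted above.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(submit_import_activity, input_files):
        print(result)
```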

For example, you first run a test to measure the throughput when importing 200,000 records using four serial activities of 50,000 records each. You may then run a series of tests that import a total of 600,000 records, as shown in this table:

| Test Number | Total Number of Records | Number of Records per Activity | Number of Activities | Concurrency | Average Throughput (records/second) |
|-------------|-------------------------|--------------------------------|----------------------|-------------|--------------------------------------|
| 1           | 200,000                 | 50,000                         | 4                    | Serial      | (record your measurement)            |
| 2.1         | 100,000                 | 50,000                         | 2                    | Concurrent  | (record your measurement)            |
| 2.2         | 200,000                 | 50,000                         | 4                    | Concurrent  | (record your measurement)            |
| 2.3         | 300,000                 | 50,000                         | 6                    | Concurrent  | (record your measurement)            |

At every step, you should note the average throughput. If you reach a point where throughput is decreasing, then you may have exceeded your optimal concurrency. You may want to repeat the tests to ensure that you have sufficient data points to draw a correct conclusion.

You can import file attachments to some record types. Importing attachments impacts import performance and should be considered when planning your data import activities.

You can also adjust the number of records in each input data file. For example, start with 25,000 records and increase by 25,000 records with every run. As you increase the number of records in each batch, note the average throughput. If you reach a point where throughput is decreasing, then you may have exceeded your optimal batch size. Oracle doesn't recommend importing more than 100,000 records in a single import activity or running more than five concurrent import activities.
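
Once you've collected throughput measurements across these runs, finding the turning point is a simple comparison. A sketch with hypothetical numbers:

```python
# Sketch: given (setting, throughput) measurements from the scale-up tests,
# find where increasing concurrency (or batch size) stopped helping.
# The figures below are hypothetical illustrations, not expected results.
measurements = [(1, 4.2), (2, 7.9), (4, 12.5), (6, 11.1)]  # (activities, records/sec)

best_setting, best_rate = max(measurements, key=lambda m: m[1])
print(f"Throughput peaked at {best_setting} concurrent activities "
      f"({best_rate} records/second); beyond that it declined.")
```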

Pod Size

Significant changes to your usage patterns, such as adding many new users or executing high volume data imports, may change your pod sizing assessment. Ensure that your environment is properly sized to handle the import data volumes generated by your organization along with the other operations performed by your users. Before starting a high volume import process or one-time data load, you may want to have Oracle review your sizing. Contact the Oracle help desk to start this review process.