Design for Resiliency
Use these best practices when designing your integrations for resiliency.
Design for Restartability
What does designing for restartability mean for Oracle Data Integrator?
- Use variables to keep track of the identifier of the process that is being loaded, and if possible add these values to the intermediate data stages (additional columns in your staging tables).
- Use these variables to set a different session name (SESS_NAME) for each scenario execution: this lets operators immediately identify failed processes and know exactly where to look. See Use Unique or Dynamic Names for Scenario Sessions. A hedged invocation sketch follows this list.
- Taking the previous example where the same process is used to load hundreds of files, it is more practical to have a job named after the file that is being processed than to have the same generic job name for all files.
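As an illustration, the sketch below starts a scenario from the standalone agent's startscen.sh command line, deriving the session name and a startup variable value from the file being processed. The install path, scenario name (LOAD_FILE), context (GLOBAL), project code (DEMO), and variable name (FILE_NAME) are all assumptions for this example, not values prescribed by this document.

```java
import java.io.IOException;

// Hedged sketch: invoke a scenario once per file, with a session name
// that identifies the file being loaded. All paths and names below are
// assumptions chosen for illustration.
public class StartScenarioPerFile {
    public static void main(String[] args) throws IOException, InterruptedException {
        String fileName = "orders_20240115.csv";        // file driving this run (assumed)
        ProcessBuilder pb = new ProcessBuilder(
                "/u01/odi/agent/bin/startscen.sh",      // agent install path: an assumption
                "LOAD_FILE",                            // scenario name: an assumption
                "001",                                  // scenario version
                "GLOBAL",                               // execution context code: an assumption
                "-SESSION_NAME=LOAD_FILE_" + fileName,  // unique, self-describing session name
                "-DEMO.FILE_NAME=" + fileName);         // startup value for a project variable (assumed names)
        pb.inheritIO();                                 // show agent output in this console
        int exitCode = pb.start().waitFor();
        System.out.println("startscen.sh exited with code " + exitCode);
    }
}
```

With a session name built this way, a failed run for one file shows up in Operator under that file's name rather than under a generic job name shared by hundreds of loads.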
Design to Limit Outage Impacts
Running massive extracts at pre-defined times has a large impact on your entire infrastructure.
For example:
- The source system is impacted as it has to serve the request.
- The network is impacted as bandwidth is consumed moving the requested data.
- Overall integration jobs take longer to run because of the sheer volume of data that has to be processed.
- A small blip or outage in your network at integration time has a major impact on your integration task.
Integration jobs do not have to be either batch or real time; often they can be both. If the final load must be a batch operation (because data is consolidated or aggregated, for instance), the extraction and some pre-integration processes can still be performed in a more real-time manner. This reduces the load on the overall infrastructure and limits the impact of an outage when you try to access the source system at integration time. If the data has been extracted and prepared in a streaming manner, you do not need to access the source system at all when it is time for the final integration.
Oracle Data Integrator provides a number of tools that can be used in the construction of packages to detect that new data is available. See Use Event-Driven Tools for a list.
For integration with true real-time replication, Oracle Data Integrator can create an infrastructure that allows for the consumption of changes, leveraging the tools mentioned above.
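To make the event-driven idea concrete, here is a minimal stand-alone sketch of the pattern such tools implement: instead of extracting at a fixed time, wait until new rows actually arrive, then trigger the load. Inside Oracle Data Integrator you would use a package tool such as OdiWaitForData rather than hand-written code; the connection details, staging table (stg_orders), and flag column below are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged sketch of event-driven detection: poll a staging table until
// unprocessed rows appear, then hand off to the integration job.
// Requires the Oracle JDBC driver on the classpath; all names are assumed.
public class WaitForNewData {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/PDB1", "odi_user", "secret"); // assumed connection details
             Statement stmt = con.createStatement()) {
            while (true) {
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT COUNT(*) FROM stg_orders WHERE processed_flag = 'N'")) { // assumed table/column
                    rs.next();
                    if (rs.getInt(1) > 0) {
                        break; // new rows have arrived
                    }
                }
                Thread.sleep(30_000); // wait 30 seconds before polling again
            }
            System.out.println("New data detected; trigger the integration scenario here.");
        }
    }
}
```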
Choose Between Inserting and Merging Data
There are tradeoffs between an INSERT and a MERGE approach to data loading. Beyond the integration strategy, you may want to consider what happens when a load fails partially.
Depending on how you are loading the data into your target system, differentiating what was properly loaded from what failed, or identifying the elements of a partial load, can be quite complex. Even though from a design perspective all you are doing is appending data to the target system, it can be useful from a recoverability perspective to consider the benefits of merging the incoming data with the data that is already in the target system.
If you choose this approach, double-check the impact the strategy has on the performance of your loads. But keep in mind that a fully optimized INSERT load that fails is not faster than a less efficient MERGE that succeeds.
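As a minimal sketch of why MERGE aids recoverability, the JDBC snippet below re-applies a load idempotently: rows that landed before a failure are updated, missing rows are inserted, so simply re-running the job completes the load. The connection details and the target_orders/stg_orders tables and columns are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hedged sketch: a MERGE-based load that is safe to re-run after a
// partial failure. Requires the Oracle JDBC driver; all names assumed.
public class MergeLoadSketch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/PDB1", "odi_user", "secret"); // assumed connection details
             Statement stmt = con.createStatement()) {
            int rows = stmt.executeUpdate(
                "MERGE INTO target_orders t "
              + "USING stg_orders s ON (t.order_id = s.order_id) "
              + "WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.status = s.status "
              + "WHEN NOT MATCHED THEN INSERT (order_id, amount, status) "
              + "VALUES (s.order_id, s.amount, s.status)");
            System.out.println(rows + " rows merged; the load can be re-run safely after a partial failure.");
        }
    }
}
```

A plain INSERT of the same staging rows would raise duplicate-key errors (or create duplicates) on the second attempt, which is exactly the cleanup burden the MERGE avoids.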
From an Oracle Data Integrator perspective, changing from an INSERT strategy to a MERGE strategy is a very simple operation: you only have to change your integration strategy and select the appropriate Knowledge Module. This said, changing knowledge modules for a large number of mappings can be a daunting task. You can automate such a task by using the Oracle Data Integrator SDK.
Design to Limit Planned Outages
Planned outages are usually required for patching and upgrades.
In a cloud environment where patching and upgrades are more and more seamless from an end-user perspective, the last thing we want is to be forced into an outage because we are patching the code of the integration processes. This means that patching must be part of the development strategy going forward, to ensure that outages are kept to a minimum.
The execution unit for Oracle Data Integrator is a scenario. When a scenario is generated, it is associated with a version number (starting with 001). Scenarios can be re-generated (to overwrite the current version), or a new version can be generated (002, 003, etc.).
When invoking a scenario, Oracle recommends that you always specify version number -1. This has two benefits:
- Oracle Data Integrator will always use the latest version of your scenario: you will not have to change how you invoke these scenarios as you generate new versions.
- Shortly after you generate a new version of a scenario, that version becomes the one Oracle Data Integrator runs (Configure the Agent Blueprint Cache Timeout describes how to control possible delays). You do not have to stop and restart Oracle Data Integrator, or whatever external orchestration tool you are using, for Oracle Data Integrator to pick up the latest version of your integration scenarios. A hedged invocation sketch follows this list.
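For example, an external scheduler might always start the most recently generated version of a scenario by passing -1 as the version. In this hedged sketch, the agent path, scenario name (DAILY_LOAD), and context (GLOBAL) are assumptions.

```java
import java.io.IOException;

// Hedged sketch: launch whatever the latest generated version of a
// scenario is, by passing -1 as the version number. Paths and names
// below are assumptions chosen for illustration.
public class RunLatestScenario {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "/u01/odi/agent/bin/startscen.sh", // agent install path: an assumption
                "DAILY_LOAD",                      // scenario name: an assumption
                "-1",                              // -1 = run the most recently generated version
                "GLOBAL")                          // execution context code: an assumption
                .inheritIO()
                .start();
        System.exit(p.waitFor());                  // propagate the scenario's exit code
    }
}
```

Because the version is resolved at invocation time, regenerating the scenario is enough to patch the process; the scheduler's command never changes.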
Note: This approach is only possible if you do not have infinite loops in your Oracle Data Integrator processes. Infinite loops are never recommended (they actually qualify as an anti-pattern):
- They clog the Oracle Data Integrator logs: log purges never affect a running job, and because an infinite loop is always running, the corresponding logs can never be purged.
- They prevent live patching of scenarios: for Oracle Data Integrator to pick up the new version of a scenario, it must have an opportunity to start that scenario. An infinite loop never ends, and as such never has a chance to be restarted.
- Rather than using an infinite loop, you can finish your scenario by invoking that same scenario asynchronously: the last step before ending is to start a new copy of itself in a new session, as in the sketch after this list.
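Within a package, the usual way to do this is a final OdiStartScen step running in asynchronous mode. As a stand-alone analogy, the hedged Java sketch below does one cycle of work, launches the next cycle without waiting for it, and then exits; the agent path, scenario name (POLL_AND_LOAD), and context are assumptions.

```java
import java.io.IOException;

// Hedged analogy of the "restart yourself instead of looping" pattern:
// each run is finite, so its logs can be purged and any newly generated
// scenario version is picked up on the next start. Names are assumed.
public class RelaunchInsteadOfLoop {
    public static void main(String[] args) throws IOException {
        processOneBatch();
        // Launch the next cycle asynchronously (do not wait for it),
        // then let this run end.
        new ProcessBuilder("/u01/odi/agent/bin/startscen.sh", // path: an assumption
                "POLL_AND_LOAD", "-1", "GLOBAL")              // scenario/context: assumptions
                .inheritIO()
                .start();
    }

    private static void processOneBatch() {
        System.out.println("Processing one batch of work...");
    }
}
```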