A Configuration Guide for Exadata

This guide describes the Exadata configuration for Oracle Healthcare Foundation. It contains minimum parameter requirements for a large deployment of about 10 million patients.

The contents of this guide apply only to the Oracle Healthcare Foundation ETL platform and does not consider any application stack.

This guide contains the following sections:

A.1 System Configuration

A.1.1 Exadata Component Configuration

Table A-1 Exadata Components and Configuration

Component Configuration

Exa check

Run the Exachk utility and verify the adherence to the recommended Exadata best practices.

For more information, see article ID 1070954.1 on My Oracle Support (https://support.oracle.com).

RAC

Use defaults and native load balance.

High performance disks

We recommend 15000 revolutions per minute (RPM) high performance disks.


A.1.2 Database Configuration

A.1.2.1 Database I/O Calibration

Run the I/O calibration tool for each database to set the automatic degree of parallelism.

For more information on I/O calibration and automatic degree of parallelism, see article IDs 727062.1 and 1269321.1 on My Oracle Support (https://support.oracle.com).

A.1.2.2 Resource Manager

Use the Oracle Resource Manager to manage and prioritize the resource allocation for different applications, databases, instances, users, and so on.

For more information, see article ID 1339769.1 on My Oracle Support (https://support.oracle.com) and the White Paper Using Oracle Database Resource Manager at http://www.oracle.com/technetwork/database/database-technologies/performance/resource-manager-twp-133705.pdf.

A.1.2.3 Database Parameters

Table A-2 Recommended Settings for Database Parameters

Database Parameter Recommended Settings

processes

1500

parallel_max_servers

500-1280

parallel_min_servers

128

parallel_degree_policy

LIMITED

parallel_adaptive_multi_user

TRUE

parallel_degree_limit

CPU

parallel_force_local

TRUE

sga_target

40 GB

sga_max_size

40 GB

pga_aggregate_target

If HugePages is used, set this parameter to 20 GB. Else, set it to 12 GB.

memory_target

If HugePages is used, set this parameter to 0. Else, set it to 60 GB.

memory_max_target

If HugePages is used, set this parameter to 0. Else, set it to 60 GB.

db_block_size

8192

workarea_size_policy

AUTO

db_cache_size

2 GB

shared_pool_size

2 GB

optimizer_features_enable

See Optimizer Features Enable

optimizer_secure_view_merging

False

_px_join_skew_handling

FALSE

result size max

0

_optimizer_dsdir_usage_control

0


A.1.2.4 Parallel Degree Policy

Set the parallel_degree_policy parameter to LIMITED to use the automatic degree of parallelism and smart scans.

  • The Omics Data Bank (ODB) application uses Exadata SQL processing innovations. Setting this parameter to LIMITED, disables the in-memory parallel execution that leads to better use of smart scans.

  • Statement queuing is disabled to avoid any waits under heavy load where there are chances of draining all PX servers thereby impacting response times.

A.1.2.4.1 Parallel Degree Limit

Set the parallel_degree_limit parameter to CPU or below to improve the response time. A resource-intensive query may request too many PX servers, draining available resources. Use the Resource Manager to limit the resource allocation to intensive or long running processes.

A.1.2.4.2 Adaptive Tuning

Set the parallel_adaptive_multi_user parameter to TRUE, which lets Oracle downgrade PX requests under stress. You should not over-burden the database CPUs by allocating all requested PX servers, especially when the database nodes are running at over 80% CPU load.

Alternatively, you can use the Resource Manager to control resources.

A.1.2.4.3 Parallel Max Servers

Set the parallel_max_servers parameter between 500-1280 to make sure that there are enough PX server processes to allocate to concurrent user requests. In benchmark tests, this was set to 512.

A.1.2.4.4 Parallel Min Servers

Set the parallel_min_servers parameter to CPU to ensure the availability of a specific number of PX server processes at startup.

A.1.2.4.5 Memory Settings

Set the Automatic Memory Management (AMM) using the memory_target parameter to 60 GB.

Also, you have to set the Program Global Area (PGA) (pga_aggregate_target parameter) to a minimum of 20/10 GB depending upon whether Hugepages is used and the System Global Area (SGA) (sga_max_size parameter) to a minimum of 40 GB.

Note:

If you use HugePages, set all DB memory parameters appropriately as AMM does not work with HugePages. If you encounter any shared pool issues with HugePages, switch to alternate recommended settings.
A.1.2.4.6 Optimizer Features Enable

Although Oracle Healthcare Foundation is benchmarked with 12.2.0.1, we recommend you to upgrade to the latest version of the optimizer when it is available.

If the SQL execution plans are not optimized in the later version, consider using the 12.2.0.1 optimizer.

A.1.2.4.7 Processes

Set the processes parameter to a sizeable number based on the expected concurrency. The minimum value should be 1024. In internal benchmarks, it was set to 2000 based on concurrent loads.

A.1.2.5 Oracle Automatic Stats Job

Disable the Oracle Automatic Stats job if it is configured.

Automatic statistics scheduler jobs run under different maintenance windows (such as daily, weekly, and so on). This may potentially generate bad statistics due to data availability (for example, the application may have transient data or no data in stage tables).

A.1.2.6 Tablespace and Redo Log Recommendations

Table A-3 Tablespace Recommendations

Tablespace Recommended

TEMP

500 GB across all nodes.

UNDO

500 GB per instance.


Note:

Use the BigFile tablespace for TEMP, UNDO, and user (HDI, HDM, and HCD) tablespaces to ease tablespace maintenance.

Configure the redo logs to have at least five groups per instance with each member having a size of 500 MB or higher.

A.1.2.7 ASM Recommendations

Redundancy - Oracle uses NORMAL redundancy for all benchmarks. You can use HIGH redundancy as per the data mirroring requirements. However, note that the usable storage is lower when using ASM with HIGH redundancy disk groups.

ASM AU Size - We recommend you to use the default setting for the disk sequential reads.

A.1.2.8 Maximum Availability Recommendations

You may want to consider the recommendations in the White Paper Best Practices for Database Consolidation on Exadata Database Machine at http://www.oracle.com/technetwork/database/features/availability/exadata-consolidation-522500.pdf.

A.2 Application Considerations

A.2.1 General Recommendations

  • Oracle Table or Schema Statistics:

    Configure the scheduled jobs to collect the Oracle statistics using the following method. This should be configured after the initial load is complete.

    exec dbms_stats.gather_schema_stats (ownname=> '<OWNER>', method_ opt=> 'for all columns size auto', estimate_percent=> dbms_ stats.auto_sample_ size, cascade=>true, block_sample => true, options => 'GATHER STALE')

  • Data files or flat files (for example, ODB result and reference data files) should be staged on Database File System (DBFS) as the Oracle DB provides much better security, availability, robustness, transactions, and scalability than the traditional file systems. For setting up DBFS, see article ID 1954431.1 on My Oracle Support (https://support.oracle.com).

A.2.2 Oracle Healthcare Foundation

  • See Data Model Installation for Exadata-specific configurable compression options and index reduction notes.

  • If advanced compression is used during installation (for certain business cases), you may still want to consider HCC compression for cold (least DML activity) or old table partitions. Use the following method to implement HCC and for better compression ratios for cold or old table partitions:

    1. Perform annual or biannual maintenance to change the table partition compression mode to HCC Query High.

    2. Rebuild the table partitions using the alter table .. move .. SQL command.

      Rebuilding all the unique and non-unique indexes on these tables is necessary as the table or partition rebuild invalidates index structure. See Oracle Healthcare Foundation Administrator's Guide (hdm_hcc_maintenance_exadata.sql)to change the compression for partitioned tables to hybrid columnar compression (QUERY HIGH).

  • Oracle Table or Schema Statistics:

    HDI source tables should have proper table statistics before the initial load. See Oracle Healthcare Foundation Administrator's Guide for details on gathering statistics and for ETL specific recommendations regarding parallel options.

  • Oracle Healthcare Foundation Data loads:

    • When you load HDI data, make sure that you load data using the direct path insert to obtain a high compression ratio. For more information about improving INSERT performance with Direct-Path INSERT, see the Oracle Database Administrator Guide, and implement accordingly for your ETL loads to HDI.

    • If direct path insert is not possible due to code or ETL limitations, rebuild the tables or partitions using the alter table .. move .. SQL command (indexes on these tables should be rebuilt) to achieve better compression. However, this leads to regular maintenance of these tables to achieve maximum compression. Oracle recommends you to create scripts or ETL to support direct path insert rather than resorting to regular maintenance. For more details, see Oracle Healthcare Foundation Programmer's Guide.

A.2.3 Cohort Data Model

See Oracle Health Sciences Translational Research Center Installation Guide for Exadata specific configurable partitioning and parallel options.

A.2.4 Omics Loaders

  • Run the result files in parallel to achieve maximum performance.

    Note:

    You can execute maximum 4gVCF loaders on a single Exa node.
  • For better storage savings, load more than 60 samples or 200 million rows per result table. Small sample sets may not compress up to 90%. Run the post processing script, load_exadata.sh, at the end of the day after loading a minimum of 60 samples or more than 200 million rows for high compression using HCC.

  • See Data Model Installation for Exadata specific configurable partitioning and parallel options.