Oracle® Health Sciences Translational Research Center Installation Guide Release 3.1 E66212-06
This chapter describes security considerations and the installation process for the database tier.
Following are the prerequisites for installing or upgrading the database tier:
Download the TRC installation package from the Media Pack and extract the files into a directory.
The Database folder contains all the scripts required for the TRC database tier installation. The master install script install_TRC.sh is in the master_install directory.
If an Oracle Exadata Appliance is used for the database tier, see Oracle Health Sciences Translational Research Center Suite Exadata Configuration Guide (Doc ID 1557221.1).
TRC requires several roles that control access to different database features. Create the following roles; the database tier installation checks that they exist:
create role CohortDatamartUser;
This is a read-only role to access the CDM objects. This role should be assigned to a user that needs to use SQL to query the CDM data.
create role CohortDatamartAdmin;
This is a role that can be used for ETL operations if the CDM schema account cannot be used directly to run the ETL code.
create role OmicsDatamartUser;
This is a read-only role to access the ODB objects. This role should be assigned to a user that needs to use SQL to query the ODB data.
create role OmicsDatamartAdmin;
This is a privileged role that lets a schema user run the reference loaders, which insert data into the reference tables.
create role OmicsDatamartContributor;
This is a privileged role that lets a schema user run the result loaders, which load data for patients or subjects.
create role TRC_APPLICATION_ROLE;
This role is used by the application to access various objects in other schemas. This role is not intended to be assigned to any user.
create role JOBENGINE_ADMIN;
All JOB_ENGINE roles are reserved for future use with the JOB_ENGINE schema code. The job engine schema is not directly accessible at this time, and these roles should not be assigned to any schema user.
create role JOBENGINE_MANAGER;
create role JOBENGINE_PROGRAMER;
create role JOBENGINE_USER;
create role JOBENGINE_VIEWER;
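Once the roles exist, the read-only roles can be granted to end-user accounts that query the data marts with SQL. For example (analyst1 is a hypothetical account name used only for illustration):

```sql
-- analyst1 is a hypothetical account; substitute a real database user.
GRANT CohortDatamartUser TO analyst1;
GRANT OmicsDatamartUser TO analyst1;
```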
The following five database schemas are required to store TRC objects. It is recommended that each schema have a separate default tablespace and quota granted on the required tablespaces:
CDM - Cohort Data Mart (CDM) schema stores the clinical information that is refreshed by ETLs.
ODB - Omics Data Bank (ODB) schema stores the omics data linked to patients, as well as the genetic reference data it links to.
APPS - Application (APPS) schema stores information used by the user interface.
ENT - The Enterprise (ENT) schema stores system-wide configuration information like product version.
JOB_ENGINE - Job Engine (JOB_ENGINE) schema tracks all jobs executed and provides templates for jobs.
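As an illustration, a schema user with a dedicated default tablespace and quota might be created as follows; the user name, password, and tablespace names are placeholders to be replaced with your own:

```sql
-- Illustrative only: substitute your own user name, password, and tablespaces.
CREATE USER cdm IDENTIFIED BY "password_placeholder"
  DEFAULT TABLESPACE CDM_DATA
  QUOTA UNLIMITED ON CDM_DATA
  QUOTA UNLIMITED ON CDM_INDEX;
-- Repeat for the ODB, APPS, ENT, and JOB_ENGINE schema users.
```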
Create TRC tablespaces to store schema objects. Each schema should be granted quota on each of these tablespaces. Each tablespace should be created with options to match the anticipated size and use. For details on creating tablespaces, see the Oracle® Database Storage Administrator's Guide 11g Release 1 (11.1).
Database Schema | Tablespaces |
---|---|
ENT | ENT_DATA (default) |
CDM | CDM_DATA (default), CDM_INDEX |
ODB | ODB_DATA (default), ODB_INDEX, ODB_LOB |
APPS | APPS_DATA (default) |
JOB_ENGINE | JOB_DATA (default), JOB_INDEX, DBFS_TBS, DBFS_LOB |
Note: The CDM schema now stores HIPAA identifiable attributes. Make sure that tablespaces are encrypted to protect data at rest; contact your database administrator for encrypting the tablespaces. Using encrypted tablespaces requires a license for the Advanced Security Option. For details, see the Oracle Database Administrator's Guide 11g Release 2 at https://docs.oracle.com/cd/E11882_01/server.112/e25494/toc.htm. |
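As a sketch, a tablespace can be created, and with the Advanced Security Option and an open encryption wallet also encrypted, along these lines; the data file path and sizes are placeholders:

```sql
-- Placeholder path and sizes; the ENCRYPTION clause requires the
-- Advanced Security Option and an open encryption wallet.
CREATE TABLESPACE CDM_DATA
  DATAFILE '/u01/oradata/trc/cdm_data01.dbf' SIZE 10G AUTOEXTEND ON
  ENCRYPTION USING 'AES256'
  DEFAULT STORAGE (ENCRYPT);
```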
Run the following SQL script to grant all required system privileges and roles to the schema users. The script can be executed as the SYS database account and is available in the master_install directory (replace each placeholder below with the schema name used):
@grant_schema_priv.sql cdm_schema odb_schema app_schema job_schema ent_schema
Edit the three partitioning script files to match the expected user requirements. These scripts should be modified to reflect the expected amount of data so that each partition is used efficiently; queries against these tables use the partition keys to limit the amount of data scanned. Take extreme care to set the correct values. Each script that may need modification is described below:
Note: For ODB, partitioning is either by GENE or by STUDY, so update only the script that applies. The specimen partitioning is mandatory and its script must be updated. |
cdm_install/cdm_define_partitions.sql: This script defines the number of hash partitions on the bridge tables in the CDM schema. The bridge tables have the most amount of data since they store patient specific information for various dimensions. The default number of partitions is set to 1024. Any number you use must be a power of 2.
Note: Each definition can be set to an empty string if no partitions are used, and the definition for the partition index clause should also be set to an empty string. |
odb_install/define_partition_hash_gene.sql: This script defines the number of hash partitions for each result table when gene partitioning is used to create the schema. The gene partitioning mode uses a hash subpartition for all result records linked to specific GENE records. The value used should be a power of 2 and reflect the anticipated amount of data. There is also a hash value for the number of partitions used for the W_EHA_VARIANT_BROWSER_AGG table, which collects information for each patient.
odb_install/define_partition_hash_study.sql: This script defines the number of hash partitions used for each result table when study partitioning is used to create the schema. The study partitioning mode uses an interval partition on the study foreign key and then a hash subpartition for all result records using the foreign key to the GENE field. The values used restrict the number of studies that can be supported. Each Oracle table has a limit of 1048575 (1024K - 1) total physical partitions, so the maximum value of the STUDY key cannot exceed the maximum number of partitions divided by the hash value. For the default hash value of 128, the maximum STUDY value is 8191. If a large number of studies is anticipated, adjust the hash value to allow for the total number of studies. The value used should be a power of 2. There is also a hash value for the number of partitions used for the W_EHA_VARIANT_BROWSER_AGG table, which collects information for each patient.
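The arithmetic above can be checked directly; for a hash value of 128, the highest supported STUDY key is:

```sql
-- 1048575 total physical partitions divided by 128 hash subpartitions
SELECT FLOOR(1048575 / 128) AS max_study_key FROM DUAL;
-- Returns 8191
```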
odb_install/define_partition_hash_spec.sql: This script defines the number of hash partitions for the specimen partitioning used for all result tables. The specimen partitioning will create a chromosome based hash main partition (if chromosome exists in the data) and then a specimen based hash subpartition. The values used should reflect the anticipated amount of data to keep the amount of data in each partition manageable. For details on the specimen partitioning, see the Oracle® Health Sciences Translational Research Center Programmer's Guide.
Oracle recommends creating each TRC schema with a dedicated default tablespace. The TRC installer places schema tables into these default tablespaces. Oracle also recommends creating separate tablespaces for each TRC schema to store index and LOB segments. These tablespace names are supplied as installer parameters during installation.
The ODB loaders in TRC use large global temporary tables, which require an adequately sized temporary tablespace. Some of the ODB loaders use very large transactions that require an adequately sized UNDO tablespace and redo logs. For more information on allocating and managing these objects, see the Oracle Database documentation.
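The current temporary and UNDO tablespace capacities can be reviewed with queries such as the following sketch, which assumes access to the DBA views:

```sql
-- Current temporary tablespace capacity, in MB
SELECT tablespace_name, SUM(bytes)/1024/1024 AS mb
FROM   dba_temp_files
GROUP  BY tablespace_name;

-- Current UNDO tablespace capacity, in MB
SELECT tablespace_name, SUM(bytes)/1024/1024 AS mb
FROM   dba_data_files
WHERE  tablespace_name IN
       (SELECT value FROM v$parameter WHERE name = 'undo_tablespace')
GROUP  BY tablespace_name;
```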
Perform the following steps to install the TRC database:
Log in to an operating system account that can run the bash shell, and change to the directory where the TRC installer was extracted. This environment should have the Oracle client software installed and be configured to run sqlplus.
Navigate to the master_install directory.
Execute install_TRC.sh from the master_install directory with the following parameters:
Note: Parameters marked with a * are mandatory. |
Table 2-1 Installation Parameters
Parameter | Description |
---|---|
-cdm_schema <CDM Schema User> * |
Username of the CDM schema user that was created above. |
-odb_schema <ODB Schema User> * |
Username of the ODB schema user that was created above. |
-app_schema <APPS Schema User> * |
Username of the APPS schema user that was created above. |
-enterprise_schema <Enterprise Schema User> |
Username of the Enterprise schema user that was created above. |
-db_conn <Database Instance Name> * |
The SID or the service name of the database instance. |
-cdm_indx_tablespace <CDM Index Tablespace Name> * |
Name of the tablespace to be used for CDM indexes. |
-index_tablespace <Index Tablespace Name> * |
Name of the tablespace to be used for ODB indexes. |
-lob_tablespace <LOB Tablespace Name> * |
Name of the tablespace to be used for ODB LOB. |
-promoter_offset * |
A promoter region is a region upstream of the gene start position (or end position for complement genes). The value of the promoter_offset parameter defines the default length of this promoter region in ODB when gene definitions do not have promoter regions in the reference data. |
-flanking_offset |
Flanking_offset is defined as the length of the upstream and downstream regions of a gene. This value, stored in the FLANKING_OFFSET column, is used to associate result records to specific genes based on the positions of the result data. Oracle does not recommend changing this parameter after the result tables are populated since all of the result data will have to be reloaded to have the proper gene mapping. |
-partition_mode * |
This parameter lets you specify the result partitioning for the ODB schema as either study or gene. The recommended value is gene. If queries should retrieve data from all studies, then use the gene option. If queries are restricted to specific studies, then select the study option. |
-db_platform <Database Platform> * |
This parameter is used to install a specific DDL for the Exadata database platform. This must be set to exadata or default (case sensitive). |
-cdm_compress |
The compression mode to be applied to CDM tables. Values are nocompress, oltp, or query high. Use the query high value only when the files for the CDM default tablespace reside on storage that supports Hybrid Columnar Compression. |
-odb_compress |
The compression mode to be applied to ODB tables. Values are nocompress or oltp. Skip this parameter for Exadata installations. |
-job_schema <JOB Schema User> * |
The JOB_ENGINE schema user created earlier. |
-job_store <store name> * |
A string value used to create a new DBFS Content store used by the JOB_ENGINE schema. |
-job_index_tbs <Index Tablespace Name of JOB schema> * |
Name of the tablespace to be used for indexes for JOB_ENGINE schema. |
-dbfs_tbs <default table space> * |
Name of the tablespace to be used for the DBFS store created for the JOB_ENGINE schema. |
-dbfs_lob_tbs <LOB Tablespace Name> * |
Name of the tablespace to be used for storing LOB data in the DBFS store created for the JOB_ENGINE schema user. |
-max_parallel_degree |
Maximum parallel degree value. This value should be set to match the capabilities of the database server used. For more information on determining the optimal value, see Appendix A, "Parallelization". |
-log_file <Log File Name> |
The name of the log file to which the installation log will be written. This parameter defaults to install_TRC.log and is used to log all output from scripts that install the database tier. |
-err_file <Error Log File Name> |
The file where installation errors will be logged. This parameter defaults to install_TRC.err and is used to log specific database errors that make the installation invalid. |
Some examples of executing install_TRC.sh follow. The same command is used for both install and upgrade mode. For details on the upgrade matrix, see Section 2.4.
Installation on Exadata:
sh install_TRC.sh -cdm_schema cdm03st2 -odb_schema odb03st2 -app_schema apps03st2 -enterprise_schema ent3st2 -db_conn slc04lx3 -cdm_indx_tablespace CDM3_INDEX -index_tablespace ODB3_INDEX -lob_tablespace ODB3_LOB -promoter_offset 200 -flanking_offset 200 -partition_mode gene -job_schema je03st2 -job_store JOBSTORE2 -job_index_tbs INDEX_JE -dbfs_tbs TMP_JE -dbfs_lob_tbs LOB_JE -db_platform exadata -max_parallel_degree 8 -cdm_compress oltp
Installation on non-Exadata:
sh install_TRC.sh -cdm_schema cdm03st2 -odb_schema odb03st2 -app_schema apps03st2 -enterprise_schema ent03st2 -db_conn slc04lx3 -cdm_indx_tablespace CDM3_INDEX -index_tablespace ODB3_INDEX -lob_tablespace ODB3_LOB -promoter_offset 200 -flanking_offset 200 -partition_mode gene -job_schema je03st2 -job_store JOBSTORE2 -job_index_tbs INDEX_JE -dbfs_tbs TMP_JE -dbfs_lob_tbs LOB_JE -db_platform default -max_parallel_degree 8 -odb_compress oltp -cdm_compress oltp
The install_TRC.sh script will prompt for the following passwords so that no passwords are passed on the command line:
ODB schema password
APPS schema password
CDM schema password
ENTERPRISE Schema password
JOB_ENGINE schema password
The upgrade portion of the installer supports upgrading the ODB schema from any of the existing versions to the current version. The CDM schema must use the following upgrade matrix for TRC 3.1.
Note: If there is a need to avoid these upgrade paths, the CDM schema can be dropped and recreated. Then a full refresh from HDWF can be performed. |
Following is the upgrade matrix for TRC 3.1:
Table 2-2 Upgrade Matrix
Source Version | Upgrade Path | Destination Version |
---|---|---|
2.5.0 | => 3.0 => 3.0.2 => 3.0.2.1 | 3.1.0 |
3.0.1 | => 3.0.2 => 3.0.2.1 | 3.1.0 |
3.0.2.1 | direct | 3.1.0 |
The same installation scripts are used for an upgrade or a new installation, and all of the prerequisites apply to both. The grant script should also be executed when upgrading an existing installation, in case privileges differ from the earlier installation. Upgrading strips existing schemas of all database objects except tables, sequences, and primary keys. The installation scripts handle errors for existing tables and include ALTER statements to bring existing tables up to the current version. Any custom user-defined views, indexes, packages, and so on must be recreated after the upgrade.
The installation scripts check the existing data during upgrade and transform any data that needs to be updated for the current version. The scripts are written to handle every anticipated set of data; if the upgrade process fails for some reason, the errors are logged for analysis. After the data errors are fixed, the same install scripts, which are reentrant, can be started again and will continue the data transformations.
In previous versions of TRC, DBFS had to be mounted on each database node and Java had to be installed. TRC 3.1 does not require this. Java can be removed and any commands to mount DBFS can be removed.
ODB schema loaders require an Oracle directory object to be passed to each loader. The database administrator should create the required Oracle directory objects and ensure that appropriate permissions are set to let ODB schema user load result files. For more information, see the Oracle Health Sciences Translational Research Center Programmer's Guide.
If desired, a set of synonyms for TRC objects can be created in any additional database schema. First, grant the appropriate TRC role(s) to this database user. Then connect as that user and execute the following script from the master_install directory:
Note: The DB user should have CREATE SYNONYM and CREATE SESSION privileges. |
@create_synonym_for_user.sql &&cdm_schema_name &&odb_schema_name &&apps_schema_name &&jobengine_schema_name
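For example, preparing a hypothetical read-only account named report_user before running the script might look like this:

```sql
-- report_user is a hypothetical account; run as a privileged user.
GRANT CREATE SESSION, CREATE SYNONYM TO report_user;
GRANT CohortDatamartUser, OmicsDatamartUser TO report_user;
-- Then connect as report_user and run create_synonym_for_user.sql
-- from the master_install directory.
```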
Oracle recommends that you apply the following RDBMS patches to prevent known RDBMS issues. At the time of this guide's release, these patches have not been incorporated into any available critical patch update.
ORA-07445
ORA-07445: exception encountered: core dump [kkexGetUStat()+68] after statistics are collected using the GLOBAL_STATS option. Apply patch 13537901, available on My Oracle Support. For details on downloading patches from My Oracle Support, see Section *, "Finding Information and Patches on My Oracle Support".
ORA-01008
ORA-01008: not all variables bound in parallel queries while running a Cohort Report.
For Exadata systems, apply patch 17258090 available on My Oracle Support. For details on downloading patches from My Oracle Support, see Section *, "Finding Information and Patches on My Oracle Support".
For Linux 64-bit and Linux x86-64 systems, apply patch 18332139 available on My Oracle Support. For details on downloading patches from My Oracle Support, see Section *, "Finding Information and Patches on My Oracle Support".
ORA-4030
ORA-4030 (ORA-04030: out of process memory when trying to allocate 4024 bytes (koh dur heap d,rworotdTransDeq)) when running ODB Loaders. Follow the procedure in Doc ID 399497.1 available on My Oracle Support at the following location.
https://support.oracle.com/epmos/faces/DocumentDisplay?id=399497.1
The following SQL was observed to be a high consumer of CPU resources in Performance Benchmarking runs.
SQL_ID | SQL_TEXT |
---|---|
ctwj1682wsfsf | select plc.*, u.username from TRC_USER u, (select count(plm.patient_list_id) patient_count, pl.patient_list_id,pl.patient_list_name,pl.patient_list_desc,pl.user_id,pl.changed_on_dt,pl.privacy from TRC_PATIENT_LIST pl,TRC_PATIENT_LIST_MEMBER plm where pl.user_id=:USERIDBV and pl.CONTEXT_FLAG = :contextFlag and pl.DELETE_FLG = 'N' and plm.patient_list_id=pl.patient_list_id group by pl.patient_list_id, pl.patient_list_name, pl.patient_list_desc, pl.user_id, pl.changed_on_dt, pl.privacy order by pl.changed_on_dt ) plc where plc.user_id = u.user_id order by plc.changed_on_dt desc |
Resolution
To tune the performance of the Login Query, use Oracle's DBMS_SQLTUNE feature. See the Oracle documentation available at the following location to create and manage plan baselines.
https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/d_sqltun.htm#CHDFECGH
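As a sketch, a tuning task for the Login Query can be created and reported with DBMS_SQLTUNE; the task name below is arbitrary, and the session requires the ADVISOR privilege with the statement present in the cursor cache or AWR:

```sql
-- 'login_query_task' is an arbitrary task name; requires the ADVISOR privilege.
DECLARE
  l_task VARCHAR2(64);
BEGIN
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              sql_id    => 'ctwj1682wsfsf',
              task_name => 'login_query_task');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/
-- Review the advisor's findings and recommendations.
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('login_query_task') FROM DUAL;
```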
Verification
To verify that the Login Query performance issue has been fixed, generate AWR reports after running a representative workload on the system. The Login Query should no longer appear as a top resource consumer.