2 Introduction to Application Development in the Oracle Life Sciences Data Hub

This section contains information on the following topics:

About Application Development
Defining Objects
Developing a Business Application
Developing Standard Definitions and Modular Applications
Ensuring Data Currency

For an overview of Oracle LSH, see "Overview" in the Oracle Life Sciences Data Hub Implementation Guide.

About Application Development

The Oracle Life Sciences Data Hub (Oracle LSH) allows you to load data from different external systems (and from different trials on the same system), merge data and produce reports for interim analysis and clinical trial reporting, all in compliance with industry regulations.

To accomplish any of these goals you must develop an application. An application typically loads data from an external system or reads Oracle LSH data, merges and/or transforms the data, and produces one or more of the following outputs:

Reports. An application may generate simple reports—figures, listings, and tabulations—on data in any standard format such as TXT or EPS.
Report Sets. An application may create a complex single- or multi-volume set of reports with a single table of contents. With Oracle XML Publisher you can create custom templates and use them to produce PDF outputs.
Data Marts. An application may generate large data files in any of the following formats: text, Oracle Export, SAS X-port, SAS C-port, or SAS data sets. (The SAS formats are available only if you have SAS).
Data Visualizations. An application may make data available so that users can create ad hoc data queries in a tabular or graphical display using Oracle Business Intelligence Enterprise Edition.

Integrated Development Environments (IDEs) To transform data or produce reports, including the reports in Report Sets, you can use one of the several development environments, including SAS, SQL*Plus, SQL Developer, Oracle Reports, Oracle Business Intelligence Publisher (BIP) or any other system for which your company has built or bought an Oracle LSH adapter (products not produced by Oracle are not included with Oracle LSH and all Oracle products may not be included either; check with Oracle Sales for information). In each environment, you write a program or create a report as usual in that environment. You then store the source code in Oracle LSH under version control. When you run the program in Oracle LSH, the system sends the job to the appropriate engine for execution and stores the results in Oracle LSH with the security you specify.

For example, if you have SAS installed on your personal computer you can launch SAS directly from Oracle LSH and view Oracle LSH data from SAS as you develop your SAS program.

Using Existing Programs If you have already developed SAS, PL/SQL, or Oracle Reports programs that you want to continue to use, you can upload them to Oracle LSH.

Using Existing Data Sets, Tables, and Data Text Files You can load SAS data sets, Oracle tables, or text data files into Oracle LSH. Oracle LSH can read the structure of data sets and tables in an external system and automatically create Oracle LSH Tables with the same structure: Columns with the same data types, length, and name as the external table columns or data set variables. (Every Oracle LSH Table has all the required attributes of a SAS data set and an Oracle table.)

You can load data into Oracle LSH using text files in either fixed or delimited format. However in this case you must manually create the target Oracle LSH Tables before loading the data into them.

If you use Oracle Clinical, you can import all your Oracle Clinical Global Library questions, question groups, and discrete value groups, which Oracle LSH converts automatically to Oracle LSH Tables and other data structures.

Defining Objects

This section contains the following topics:

Predefined Object Types
Object Definitions and Instances, and their Containers

To develop an application in Oracle LSH, you define objects. For example, in Oracle LSH a Program is a defined object. In addition to writing program source code in a development environment, you must define a Program and each of its components (Parameters, Source Code, and other components) as objects in the Oracle LSH user interface.

Oracle LSH stores the objects you define as metadata in the Oracle LSH database under version control. This approach has the following advantages:

Validation. All defined objects have a validation status that allows you to track whether an object you are using has been fully tested and validated, for regulatory compliance. Changes to validated objects are tracked so that they can be revalidated. Your company must develop validation standards. You can validate legacy programs that you have migrated into Oracle LSH as well as Programs and other defined objects that you create in Oracle LSH. See "Validating Objects and Outputs".
Reuse. You can reuse object definitions, which ultimately saves work, minimizes the validation required, and promotes consistency. You can design and create a set of modular, small-scope Programs to promote reuse and validation (see "Object Definitions and Instances, and their Containers" and "Developing Standard Definitions and Modular Applications").
Security. The Oracle LSH security system allows you to control access to, and operations on, defined objects and their outputs (primarily reports and report sets); see "Applying Security to Objects and Outputs".
Classification. Your company can develop a classification system to label defined objects and their outputs and to create a customized user interface for outputs. Users can use these classifications to search for objects and outputs (see "Classifying Objects and Outputs").

Predefined Object Types

Oracle LSH includes a set of predefined object types. Each object type is specialized for a particular data handling function. There is a detailed chapter on each type of object in this book. For an overview of how all the object types fit together, see "Overview" in the Oracle Life Sciences Data Hub Implementation Guide. You can define the following types of objects:

Tables. Oracle LSH Tables have characteristics of both Oracle tables and SAS data sets, and can hold data from both SAS and Oracle source systems at the same time. See Chapter 4, "Defining Tables".
Programs. Programs transform and/or report data using source code of a single technology type: SAS, PL/SQL, or Oracle Reports. See Chapter 5, "Defining Programs".
Load Sets. Load Sets load data into Oracle LSH Tables from external systems that are integrated with Oracle LSH by an adapter or set of adapters. See Chapter 7, "Defining Load Sets".
Report Sets. Report Sets allow you to define the hierarchical structure of a set of reports, with each chapter and subchapter represented by an Entry in the Report Set. Each Report Set Entry can contain a Program that produces the content of that Chapter (for example, a SAS program that generates a summary table). The Report Set Entry can also contain text (called a narrative) that is included in the output.

When you submit a Report Set for execution, the system executes all the Programs on current data (or a data snapshot, if you have specified a snapshot) and creates a table of contents. If you use Oracle XML Publisher, the Report Set collates the outputs into one or more PDF documents based on custom templates. See Chapter 9, "Defining Report Sets".
Data Marts. Data Marts allow you to export data stored in Oracle LSH to a file. These files can be in a variety of different formats depending upon where you would like to use the Data Mart's output, including Oracle Export, SAS data sets, SAS transport, or text. See Chapter 8, "Defining Data Marts".
Workflows. You can use a Workflow to string together other Oracle LSH executables—Load Sets, Programs, Report Sets, or Data Marts—so that they execute as part of a single process in a predefined order. For example, if you have one SAS Program that converts raw data sets into merged data sets and another SAS Program that produces a summary report on the merged data, you can use a Workflow to execute those Programs in order.

A Workflow can contain conditional branches to handle the success or failure of each program or other activity. A Workflow can also generate Notifications— customized email messages and Oracle LSH Home Page messages to users—at specified points in the Workflow. There are two types of Notifications: those that request an action from the recipient (Approvals) and those that simply provide information (FYI). See Chapter 10, "Defining Workflows".
Business Areas. Business Areas provide a view onto Oracle LSH data for Oracle Business Intelligence Enterprise Edition for the purpose of allowing nonprogramming personnel to create ad hoc data visualizations. See Chapter 11, "Defining Business Areas for Visualizations".

Each of these objects contains subcomponent objects that you must also define. Details are included in the chapter on each type of object. Parameters and Variables have their own chapter, Chapter 6, "Defining Variables and Parameters".

Object Definitions and Instances, and their Containers

To promote the reuse of object definitions, Oracle LSH allows you to create multiple instances of the same object definition. The object instance is also a defined object, but it consists primarily of a pointer to the definition.

For example, if you have a standard demography table, you can create instances of the Table definition for use in many different studies. You may also have a standard Program that reads from the Demography table to create a report. You can create instances of the Program definition for use in different studies as well.

Organizational Objects

Oracle LSH has three types of container objects that hold object definitions and instances. These containers are nested, or contained within one another, as follows:

Domains. Domains contain Application Areas and object definitions that have been explicitly moved into the Domain.

In addition, Domains can contain child Domains that themselves contain child Domains, up to nine (9) levels, depending on the value of the Domain Nest Value profile setting for your Oracle LSH implementation. A Domain can contain any number of Domains at a single level. For example, if the Domain Nest Value profile is set to one (1), a top-level Domain can contain any number of child Domains, but those child Domains cannot contain any child Domains of their own.

You can use child and grandchild Domains to more finely organize object definitions and applications.

In general, Domain security should be set up so that only a few people can create, modify, or move object definitions in Domains. Only object definitions that have been thoroughly tested and approved for reuse should be moved into Domains.
Application Area. An Application Area contains a library of definitions developed specifically for a particular application and the Work Areas necessary to develop, test, and put the application into production.
Work Area. A Work Area contains instances of all the object definitions required for a particular application.

Figure 2-1 Oracle LSH Organizational Structure

Description of "Figure 2-1 Oracle LSH Organizational Structure"

For information on planning an organizational structure of Domains, Libraries, Application Areas and Work Areas for your company, see "Designing an Organizational Structure" in the Oracle Life Sciences Data Hub Implementation Guide.

Object Definitions and Instances

In order to use an object you must create both the object definition and an instance of it in a Work Area, and install the instance. If you work in a Work Area, the system automatically creates both the definition and the instance at the same time.

The primary objects you define—Tables, Programs, Load Sets, Report Sets, Data Marts, and Workflows—contain secondary objects. For example, a Table contains Columns and Constraints, and a Program contains Source Code, Parameters, Table Descriptors, Planned Outputs, and Execution Templates.

Object instances also contain objects, notably Execution Setups and Mappings. See Appendix B, "Object Ownership" for further information.

Modifying Object Definitions To modify an object definition, you must first check it out. Checking it out creates a new version of the definition—the one you are working on. No one else can check out the definition until you either check it in or undo the checkout. Existing versions of the object continue to function as before while you have the definition checked out. See "Understanding Object Versions and Checkin/Checkout" for further information.

The exception to this rule is Report Sets. Report Set definitions have a shared checkout system, so that after one user checks out a Report Set, anyone with the necessary security privileges on the Report Set can modify it. The system locks the sections people are working on to prevent conflicting modifications.

If you modify an object contained in the definition, the system implicitly checks out any secondary object you modify and implicitly checks it in again when you apply your changes. For example, if you check out a Program, modify and upload your Source Code, the system implicitly checks the Source Code out and then in again when you apply your changes.

Modifying Object Instances Anyone with Modify privileges on an object instance can modify it without checking it out. However, when you click the Update button to begin modifying the object instance, the system creates a lock on the object so that no one else can modify it at the same time. The system also implicitly checks out the underlying object definition and increments its version number. When you click Apply, the system releases the lock and implicitly checks in the definition.

Developing a Business Application

This section contains the following topics:

Working in a Work Area
Mapping Executables to Tables
Example of a Business Application

An application may be as complex as storing all the data and performing all the analysis required for a study, or as simple as running a single set of reports.

An application typically includes the following general steps, each of which requires a particular defined object type:

Make data available for processing by doing one or both of the following:
- Load data from one or more external systems using Load Sets.
- Use Programs to read data already stored in Oracle LSH.
Use Programs to transform the data in one or more steps.
Report and/or transport data out of Oracle LSH. Use Programs and Report Sets to generate reports. Use Data Marts to write data to files for transport out of Oracle LSH.

Use Tables to store data at each stage of processing in Oracle LSH.

You can use a Workflow to combine some or all of these steps into a single executable process. See "Predefined Object Types" for a description of each of these object types.

Figure 2-2 Example of Data Flow in a Work Area

Description of "Figure 2-2 Example of Data Flow in a Work Area"

Working in a Work Area

As an Oracle LSH application developer, or Definer, you work in a Work Area, creating instances of all the object definitions—Tables, Programs, Load Sets, Report Sets, Workflows, Data Marts, and/or Business Areas—required by the application.

Work Areas contain only object instances, but you can see the properties of the underlying object definition in the Work Area as well. In general, the instance properties are displayed in the upper part of the Properties screen for the object, while the definition properties are displayed in the lower part of the screen.

Creating New Object Definitions When you create an object instance in a Work Area, you have the choice between creating an instance of an existing definition or creating a new definition and instance at the same time. When you create a new definition and instance at the same time, the system simultaneously creates the object definition in the Application Area that contains the Work Area, and an instance of that definition in the Work Area. You work on the definition through the instance in the Work Area.

Follow your company's standard operating procedures (SOPs) for testing and validating each object you create (see "Validating Objects and Outputs").

Creating Instances of Existing Definitions When your company begins to use Oracle LSH, you will need to create many new object definitions. However, as time goes by your company will develop libraries of tested and validated object definitions. Your company may choose to move validated definitions that are suitable for reuse in other applications into Domain libraries.

If you create an instance of an existing validated object and do not modify it, you do not need to revalidate the definition. If you need to modify the definition because it is flawed or needs to be updated, you can do so. You can then update some or all instances of the definition. If you need to modify a definition because you need different functionality you can copy the definition and modify the copy. See "Reusing Existing Definitions" for further information.

Installing Work Areas You must install a Work Area and all its object instances to the database in order to run executable objects (Programs, Load Sets, Report Sets, Workflows, and Data Marts) or write data into Tables. See "Installing a Work Area and Its Objects" for further information.

Mapping Executables to Tables

To promote the reusability of Programs and other executable object definitions, Oracle LSH requires adding a subcomponent called a Table Descriptor to definitions of executable objects for each Table instance the executable reads from or writes to.

A Table Descriptor is very similar to a Table instance in that it consists of a pointer to a Table definition. The Table Descriptor acts as the interface between the executable and its source and target Table instances. You must map the Table Descriptor to the source or target Table to enable the executable to read from or write to the Table instance.

This extra definitional layer facilitates the reuse of executable definitions. The executable source code refers to source and target tables using the names of the Table Descriptors included in the executable definition. Those Table Descriptors can be mapped to Table instances with the same name and structure or a different name and, if Column data types and lengths are compatible, a different structure.

For example, if the demographic table is called DEMOG in one study and DEMO in another, you can use the same program to generate a demography report in both studies, if the data types and lengths of the corresponding Columns are compatible.

Also, if a Program reads from some but not all Columns of a table, you can create a Table Descriptor with only the necessary Columns and map it only to the required Columns in the Table instance. In that case it does not matter if the nonmapped Columns are incompatible.

In many cases, Oracle LSH can automatically map Table Descriptors to Table instances. If necessary, you can map manually. See "Mapping Table Descriptors to Table Instances" for further information.

Example of a Business Application

You have both a SAS and an Oracle system integrated with Oracle LSH, with demography and adverse events data for Study A in the SAS system and demography and adverse events data for Study B in the Oracle system. You want to merge and compare the data from the two studies.

To accomplish this you can do the following (see Figure 2-3 ):

Define Load Set definitions and instances: you need one Oracle-type Load Set to load both Oracle tables. You need one or two Load Sets for the SAS data sets. If you use a SAS transport file containing both data sets you can load both data sets with a single Load Set. If you load the data set files directly you need two Load Sets, one for each. During Load Set definition, you specify the external tables and data sets to load. The system does the following:
- Oracle LSH creates target Table Descriptors with the same metadata structure as the source tables and data sets.
- When you invoke the Table Instances from Existing Table Descriptors job, Oracle LSH creates a matching Table definition and instance for each source table and data set—a total of four Table definitions and instances for Study A Adverse Events, Study A Demography, Study B Adverse Events, and Study B Demography.
- When you invoke the Automatic Mapping By Name job, Oracle LSH maps each Table Descriptor to its corresponding Table instance.
Create instances of two previously defined standard Oracle LSH Tables, Demography and Adverse Events, to hold the merged data.
Define a Program definition and instance to merge the data from the SAS Demography data set and the Oracle Demography table into your standard Oracle LSH Demography Table. Name the Program "Merge Oracle and SAS Demog."

In the Program definition, create two source Table Descriptors based on the same Table definition as the Load Set target Table instance. Invoke the Automatic Mapping By Name job to map the Table Descriptors to the Table instances with the same name.

In the Program definition, also create one target Table Descriptor based on your standard Oracle LSH Table called Demography. In addition, create an instance of the standard Demography Table to receive the data generated by the Program. Invoke automatic mapping by name.
Define a Program definition and instance to merge the data from the SAS Adverse Events data set and the Oracle Adverse Events table into your standard Oracle LSH Adverse Events Table. Name the Program "Merge Oracle and SAS AE."

In the Program definition, create two source Table Descriptors based on the same Table definition as the Load Set target Table instance. Invoke automatic mapping by name.

In the Program definition, create one target Table Descriptor based on your standard Oracle LSH Table called Adverse Events. In addition, create an instance of the standard Adverse Events Table to receive the data generated by the Program. Invoke automatic mapping by name.
Define a Program definition and instance to read data from both the Demography and Adverse Events Table instances and produce a report comparing adverse events in Study A to those of Study B. Name the Program "Compare Adverse Events in Two Studies."

In the Program definition, create two source Table Descriptors based on your standard Oracle LSH Tables, one for Adverse Events and one for Demography. Invoke automatic mapping by name.

The Program definition does not need any target Table Descriptors because it does not write data to any Table instances. It does need a Planned Output to create the actual report.
Define another Program definition and instance to read data from the Demography Table instance and compare patient information from Study A to patient information from Study B. Name the Program "Compare Demography from Two Studies."

In the Program definition, create one source Table Descriptor based on the standard Demography Table. Invoke automatic mapping by name.

The Program definition does not have any target Table Descriptors because it does not write data to any Table instances. It does have a Planned Output to create the actual report.

You can update data periodically by rerunning the Load Set instances. You can then rerun the other Program instances in turn to update the report data.

If you would like to load the data, run the Programs, and generate the reports in a single process, you can create a Workflow definition containing all the Load Set and Program instances and defining the order in which they must be executed. You can also include an email notification as part of the workflow to be automatically sent to a group of people to alert them when the report is generated.

Figure 2-3 Business Application Example

Description of "Figure 2-3 Business Application Example"

Good Practice in Application Design Although it is possible to write one Program instead of four in the above example to accomplish the same result, it is good practice to divide the operations needed into discrete parts and define smaller Programs to perform each part. These smaller-scope Programs are more generic and therefore more reusable than one big Program. See "Developing Standard Definitions and Modular Applications".

Developing Standard Definitions and Modular Applications

You can reduce the amount of time required to develop and validate applications by creating standard object definitions and reusing them as much as possible. To encourage Definers to reuse standard definitions, make it clear which definitions are standard by storing them in a library created especially for that purpose and/or classifying them as Standard, for example.

Following are a few strategies for developing standard object definitions:

Develop standard Table definitions. If you have a standard set of Tables, with standard data types and lengths for corresponding Columns in different Tables, then you can easily reuse the same Table definition at different points in the data flow, and write standard Programs to read from and write to the standard Tables.

You can develop your own company standards or use external standards such as the CDISC data model.

The standard data model you use must be compatible with the data model of the source system(s) you use. For example, if the Patient ID column or variable has a length of 8 in the source system, the Patient ID Column in a standard Oracle LSH Table must have a length of 8 or more.
Define more Programs with a smaller scope instead of fewer Programs with a larger scope. You can then use the smaller-scope Programs as modules in multiple applications.

For example, rather than define a single Program to merge data from different studies and then analyze the data, define two separate Programs, one to merge data and the other to analyze data. Both Programs are reusable in more situations. You can reuse the combining Program before different analytical Programs, and you can reuse the analytical Program, with minor modifications, on data from either a single study or multiple studies.

You can include a series of Programs in a single Workflow, so that you can run all the Programs in a single process, so that there is no time lost by running separate Programs; the successful execution of one Program triggers the next in the series, and some Programs can run in parallel.
Use Parameters. Make executable object definitions more reusable by defining Parameters to contain information that may clearly change from one use of the definition to another.

For example, if you want to use the same Program to generate a Demography Report in several different studies, create a Parameter to contain the study name. You can either make the Study Parameter enterable by the user who runs the report, or, in the Execution Setup for each instance, bind the Study Parameter to the value appropriate for each study.
Create Parameter Sets based on standard Table definitions. Both Oracle LSH Parameter and Column definitions are based on a data structure called an Oracle LSH Variable, which determines their data type and length. You can create Parameter Sets containing Parameters based on the same Variables on which a Table's Columns are based, and give the Parameter Set the same name as the Table so that, when you are defining Program Parameters, you can easily find Parameters with the same data type and length as the corresponding Table Column. This approach has two advantages:
- It promotes data type and length consistency along the data flow.
- It makes automatic Parameter value propagation in Workflows and Report Sets easier to set up.

Ensuring Data Currency

Oracle LSH provides two different ways to ensure that data in an Oracle LSH Table or report reflects the most current source data: Workflows and backchaining execution. A Workflow pushes the most current data forward, while a backchain goes backward along the data flow to find more recent data and pull it into the current job. In both cases, you must explicitly define the entire data flow from the source of the most current data to the job to be run on the most current data.

A Workflow is a defined object. Backchaining is a type of execution that you must specify as part of the Execution Setup for each executable along the data flow.

To ensure that the most current data is displayed in the report in Figure 2-4 below, you can use either a Workflow or a Backchain; see below.

Figure 2-4 Example of a Workflow

Description of "Figure 2-4 Example of a Workflow"

Workflow To use a Workflow to ensure data currency, add all the executables —Load Sets and Programs—in Work Area 2 in Figure 2-4 to a Workflow in the same Work Area and define the order in which they should be executed. The Load Sets and Programs write to and read from the Tables as they do without a Workflow, but the Tables are not part of the Workflow. When you execute the Workflow, the Load Set and the Program that pulls data in from Work Area 1 can both run at the same time. The next Program waits until both complete successfully, and then it runs. When it completes successfully, the final Program runs, generating the report.

You can schedule a Workflow to run at regular intervals. If the external system is Oracle Clinical, or if you have set up XML messaging in a different external system, you can schedule a Workflow to run when triggered by the successful completion of a job in the external system, such as batch validation in Oracle Clinical.

See Chapter 10, "Defining Workflows" for further information.

Backchain You must define at least one Execution Setup for each executable object definition in any circumstances. The Execution Setup serves as the basis for the submittal form that users need to run the executable.

To use backchaining to ensure data currency in this example, define an Execution Setup especially for backchaining for each of the Programs and the Load Set that feed data into the final Program. When a user runs the final Program (the one that generates the report) with the Data Currency system parameter set to Most Current Available, the system checks the Programs that feed data into the Report Program's source Table instances to see if those Program instances have Execution Setups with backchaining enabled.

If so, the system continues to check upstream for backchain Execution Setups in the Load Sets or Programs that feed data into those source Table instances. The system then compares the currency of the source and target data for each Program or Load Set that has a backchain Execution Setup, and executes the Program or Load Set if its source data is more current than its target data.

The system then runs each Program downstream in the data flow until the report Program's source Table instances have the most current possible data, and then executes the program that generates the report.

See "Backchaining" for further information.