8 Advanced topics

This section includes details on requirements and how to perform advanced tasks (for example, assign user groups to objects). See the following topics for more information.

Naming restrictions

Make an object's name descriptive to help other users understand its purpose, but keep it short.

Avoid special characters and reserved words

Do not use special characters such as ( ) - & @ * $ | % ~ in object names; the underscore (_) is allowed. Also do not use Oracle SQL or PL/SQL reserved words, especially in the Oracle name. For the latest information on reserved words, you can generate a list of all keywords and reserved words from the V$RESERVED_WORDS view, described in the Oracle® Database Reference.

Note:

  • The system uses the value you enter in the Name field as the default Oracle name.
  • DUPLICATE is a reserved word in DMW for columns. It is used in the Default Listings page to allow filtering on duplicate records.
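As an illustration only (not DMW code), a name check along these lines could enforce the rules above; the function name and the reserved-word sample are hypothetical, and a real check should read the full list from Oracle's V$RESERVED_WORDS view:

```python
import re

# Partial sample only; generate the real list from V$RESERVED_WORDS.
# DUPLICATE is included because it is reserved in DMW for columns.
RESERVED_WORDS = {"SELECT", "TABLE", "DATE", "DUPLICATE"}

def is_valid_oracle_name(name: str) -> bool:
    """Allow letters, digits, and underscore; start with a letter;
    stay within Oracle's 30-character identifier limit; reject reserved words."""
    if not name or len(name) > 30:
        return False
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        return False
    return name.upper() not in RESERVED_WORDS

print(is_valid_oracle_name("LAB_RESULTS"))  # True
print(is_valid_oracle_name("LAB-RESULTS"))  # False: hyphen not allowed
print(is_valid_oracle_name("DUPLICATE"))    # False: reserved in DMW
```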

Keep it short

Short names work better both for display and for technical reasons.

  • Keep all table Oracle names to 25 characters or less. This is because target tables contain an internal column for surrogate key information that contains all source table names.

  • Windows has a maximum file path length of 256 characters that may be a problem for users who develop custom programs in Oracle LSH; see Keep container and object names short for integrated development environments.

Keep container and object names short for integrated development environments

When you open an integrated development environment (IDE) such as SAS or run a SAS program on your personal computer, the system uses the actual full path for source code definitions, table instances, and the SAS runtime script. If the full path exceeds 256 characters, you get an error and cannot open the IDE or run the program. You can use a package to limit name length or display an error when the path is too long; see Customizable naming validation package.

The full path begins with the username of the person who opens the IDE followed by the directory name cdrwork. It also includes DMW_DOMAIN, the study grouping domain, and the study. See also Figure 8-3.

  • Source Code definitions path: username>cdrwork>DMW_DOMAIN>Study_Domain_name>Application_Area_name>Program_definition_name>Source_Code_definition_name>version_number>fileref>source_code_filename

  • Table instances path: username>cdrwork>DMW_DOMAIN>Study_Domain_name>Application_Area_name>Work_Area_name>Program_instance_name>program_version>Table_instance_libname>Table_instance_SAS_name

  • SAS runtime script path: username>cdrwork>DMW_DOMAIN>Study_Domain_name>Application_Area_name>Work_Area_name>Program_instance_name>version_number>setup
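As a hypothetical sketch (the component names are invented for illustration), you can estimate whether a full IDE path stays under the 256-character Windows limit by summing the lengths of the path components plus one separator between each pair:

```python
def full_path_length(components) -> int:
    # One ">" separator between each pair of components, as in the
    # path patterns above.
    return sum(len(c) for c in components) + len(components) - 1

# Hypothetical component names for a table-instance path.
path = ["jsmith", "cdrwork", "DMW_DOMAIN", "StudyDomain01", "AppArea01",
        "WorkArea01", "ProgInstance01", "1", "libref", "LAB_RESULTS"]
length = full_path_length(path)
print(length, "ok" if length <= 256 else "too long")
```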

Automatic name truncation

When you create a new table by uploading a file, the system truncates the Oracle Name to 30 characters and also replaces the last two characters with the number 01 or the next sequential number.

Duplicate names: system appends _1

The system enforces unique naming for each object of the same type in the same container. For example, you cannot have two tables with the same name in the same model.

If you try to create a second object of the same type and name in the same container, the system creates the object but appends an underscore and the number one (_1) to the name. If you add a third object of the same type and name, the system increments the number (_2), and so on.
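The duplicate-name rule above can be sketched as follows; this is an illustration of the behavior, not DMW code:

```python
def unique_name(requested: str, existing: set) -> str:
    """Append _1, _2, ... until the name is unique within the container."""
    if requested not in existing:
        return requested
    n = 1
    while f"{requested}_{n}" in existing:
        n += 1
    return f"{requested}_{n}"

print(unique_name("DEMOG", {"DEMOG"}))             # DEMOG_1
print(unique_name("DEMOG", {"DEMOG", "DEMOG_1"}))  # DEMOG_2
```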

Naming studies and libraries

Studies and Libraries in DMW are Oracle LSH domain objects. If you plan to export a domain to another DMW instance, avoid using spaces in its name. Domain names that contain spaces must be enclosed in escaped quotation marks in the Import-Export Utility; for example: \"domain name\". See the Oracle Life Sciences Data Hub System Administrator's Guide for more information.

Customizable naming validation package

Object creation and modification code includes a call to a predefined validation package from every object name field. By default, this package performs no validation and returns a value of TRUE, allowing users to enter any name in the field. However, you can customize the package to enforce your own naming conventions, full path length, or other standards. See "Customizing Object Validation Requirements" in the Oracle Life Sciences Data Hub System Administrator's Guide.

Checkout and checkin

To modify a clinical data model, table, transformation, or validation check batch, you must check it out. To save your changes, check it in.

If the Check Out option is not active, either the object is already checked out or you do not have the required privileges. To see who checked it out, look at the Checked Out By value near the top of the page. You may need to click the arrows.

When you check out an object, the system creates a new version of it in the Development lifecycle stage. You can promote the new version to Quality Control or Production. The installed version in the QC or Production lifecycle stage continues to function as before until you promote the new version to that stage and install it there.

Validation status and lifecycle stages

To install an object in the Quality Control or Production lifecycle stage, you must first promote it to the corresponding validation status.

  1. On the Home page, select the study and the Development lifecycle stage.

  2. After you select a study, click the Study Configuration icon in the navigation bar.

  3. Click the tab for the type of object you want to change the status of:

    • Clinical Data Models

    • Transformations

    • Validation Checks

  4. Select the object and select Modify Validation Status from the Actions drop-down.

    Tip:

    The object must be checked in and installed.

  5. From the Validation Status drop-down list, select the new status.

    • Development: Assigned by default to all new and checked out objects.

    • Quality Control (QC): (Optional, corresponds to UAT in InForm.) For objects that are undergoing formal testing.

    • Production: For objects that are suitable for use in an active study. The system prevents destructive changes to tables and models in a production environment.

  6. (Optional) Click the Add icon in the Supporting Documents pane to add a document such as test results, a log file, a requirements document, or a signoff document.

  7. Click OK.

    Note:

    When a transformation is promoted to QC or Production, the system automatically promotes its source and target models to the same lifecycle stage.

  8. Install the object in the new lifecycle stage:

    1. At the top of the page, change the lifecycle context to the same as the object's new validation status.

    2. Select and install the object.

      Tip:

      In QC and Production you can choose to display either the latest installed version or an installable version, if any: a newer version that has been installed in a lower lifecycle stage and promoted to the current lifecycle stage but not yet installed there.

Assign user groups to objects

Administrators can assign user groups to study groupings. Study configurators can assign user groups to studies, clinical data models, transformations, or validation check batches.

To have access to any object, a user must be in a user group that is assigned to it. The user's role in the group determines the user's privileges on the object. Studies, clinical data models, transformations, and validation check batches are all objects.

  1. Select the object and click the Apply Security icon.

    The Apply Security window lists all user groups associated with the object in any way. The Assignment Status column shows the current state of each user group's access to the object:

    • Inherited: The user group has access to the object through inheritance.

      When a user group is assigned to a study grouping, all studies in the grouping, and all objects in those studies (clinical data models, transformations, and validation check batches), inherit the assignment.

      When a user group is assigned to a study, all clinical data models, transformations, and validation check batches in the study inherit the assignment.

    • Assigned: The user group has access to the object because it was explicitly assigned to the object.

    • Revoked: The user group does not have access to the object. Its inherited access has been revoked.

    • Unassigned: The user group does not have access to the object. Its explicit assignment was undone.

    User groups that never had access to the object are not displayed.

  2. In the Assign To field, select one:
    • Metadata: Required to create or modify studies, clinical data models, tables, validation checks, transformations, and custom listings.

    • Development: Required for creating, modifying, or executing the object in the Development lifecycle stage and loading or viewing data in the Development lifecycle stage.

    • QC: Required for testing or executing the object in the QC lifecycle stage and loading or viewing data in the QC lifecycle stage.

    • Production: Required for executing the object in the Production lifecycle stage and loading or viewing data in the Production lifecycle stage.

  3. Make a user group assignment change:
    • To assign a user group that is not currently associated with the object, click Assign. The Assign User Group window appears. Query for the user group, select it, and click Apply.

    • To revoke the access of a user group that has an Inherited status, select the group and click Revoke.

    • To reinstate the access of a group whose status is Revoked, select the group and click Unrevoke.

    • To remove the access of a group whose status is Assigned, select the group and click Unassign.

Snapshot labels

You can use Oracle Life Sciences Data Hub to apply snapshot labels to data in a clinical data model in a particular lifecycle stage.

Apply snapshot labels directly to data in tables

  1. Log in to Oracle LSH.

  2. Expand the Life Sciences Data Hub node in the main menu on the left or from the Navigator drop-down. Select Applications.

    Or, if Oracle LSH is already open, go to the Applications tab.

  3. Navigate to the tables to which you want to apply a snapshot label:

    1. In the Application Development window, click the Search icon next to the Select Domain field.

    2. Select Search By: Domain Name, enter DMW_DOMAIN, then click Go.

    3. Click the Quick Select icon for DMW_DOMAIN.

    4. Expand the node for the study grouping domain that contains the study.

    5. Expand the node for the study.

    6. Click the link for the lifecycle application area. A list of work areas, one for each installed clinical data model, appears. The model name is in the work area description.

    7. Click the link for the work area. A list of tables and other objects in the clinical data model appears.

  4. In the Actions list, select Manage Snapshot Labels and click Go.

  5. Query for the data:

    1. Select either:

      - Dummy to label masking values and nonblinded data.

      - Real to label sensitive, blinded data and nonblinded data.

    2. Select either:

      - Most recent timestamp, and then select a timestamp from the drop-down.

      - Snapshot label, and then select a snapshot label from the drop-down.

    3. Click Search. The tables that match the criteria are listed below.

  6. In the Snapshot Label to Add or Remove box, enter a new label or search for an existing one.

  7. Select the tables.

  8. Click Remove Snapshot Label, Add Snapshot Label, or Add/Move Snapshot Label.

Apply a snapshot label during job submission

You can apply a label to source and/or target tables when you submit a program (transformation or validation check) or load set (data load) in Oracle LSH.

  1. Log in to Oracle LSH.

  2. Expand the Life Sciences Data Hub node in the main menu on the left or from the Navigator drop-down. Select Applications.

    Or, if Oracle LSH is already open, go to the Applications tab.

  3. Navigate to the tables to which you want to apply a snapshot label:

    1. In the Application Development window, click the Search icon next to the Select Domain field.

    2. Select Search By: Domain Name, enter DMW_DOMAIN, then click Go.

    3. Click the Quick Select icon for DMW_DOMAIN.

    4. Expand the node for the study grouping domain that contains the study.

    5. Expand the node for the study.

    6. Click the link for the lifecycle application area. A list of work areas, one for each installed clinical data model, appears. The model name is in the work area description.

    7. Click the link for the work area. A list of tables and other objects in the clinical data model appears. DMW transformations and validation check batches are shown as programs. DMW file loads are shown as load sets.

  4. Click the link of the program or load set in the Name column.

  5. Click Submit.

  6. In Apply Snapshot Label, select Targets, Sources, or Both.

  7. In the Label field, enter the label.

  8. Set all other parameters and click Submit.

Automatic triggering of transformations and validation checks

To ensure that DMW data is always as up-to-date as possible:

  1. For each transformation, click the Add or Remove Source Models icon and select Can Trigger for each source model.
  2. When adding or modifying each validation check batch, select Can Be Triggered.

Each time data is loaded into an input data model, it triggers the transformation to the next target model, and then the next, and so on to all target models. In addition, all validation check batches are executed as soon as data is loaded into the model they read from.

Required syntax for metadata files

You can define tables in a data model by uploading a .zip file that contains one .mdd file per table. Each .mdd file must have the syntax described here. This is the only automatic way you can create tables with all constraints and blinding attribute values.

Tip:

You can create a table initially from a metadata file and then load data into it from a SAS file.

Note:

DMW supports a limited number of columns in clinical data model tables. The limit varies depending on whether the model is an InForm input model and on how many columns in a table are blinded.

For file-based input data models and target models:

  • 260 if all columns are blinded

  • 370 if the table is not blinded, has only row blinding, or is completely blinded

For InForm data models:

  • 339 regardless of how many columns are blinded

The limit is due to the system's use of internal columns and the Oracle Database 11 limit of 1000 columns per table. Although it is possible to create tables with more columns, having too many columns may cause issues while loading InForm data, loading text files, transforming data, and displaying data in the Listings pages.

Delimiter: (Optional) Must begin with lsh_delimiter= . If you do not specify a delimiter, the default is a comma (,).

Table: (Optional) Must begin with lsh_table= . If you do not specify a table name in the file, the system uses the file name (without the extension) as the table name and follows the default behavior for the attribute values.

Columns: The system expects a set of column attribute values, one column per row in the file, optionally preceded by a row identifying the delimiter and a row defining Table attribute values, each of which must begin with a key word. Column position is determined by the order in which the column rows in the file are processed. Default behavior for the attribute values applies.

Constraints: Each constraint must have its own row starting with the string CONSTRAINT followed by values you supply for the constraint name, description, and constraint type, followed by other values depending on the constraint type. See syntax below.

Comments: A row beginning with two dashes is treated as a comment.

Syntax:

--This is a comment.
lsh_delimiter = ,
lsh_table= Name, Description, Oracle Name, SAS Name, SAS Label, Process Type, Allow Snapshot?, Blinding Flag?, Blinding Status, SAS Library Name, Is Target?, Target as Dataset?, SDTM Identifier (SUBJECT/SUBJECTVISIT), Table Alias, Blinding Type (TABLE/COLUMN/ROW), Blinding Criteria
--Column details:
Name, Data Type, Length, Precision, Oracle Name, SAS Name, SAS Format, Description, SAS Label, Nullable?, Default Value, Date Format, SDTM Identifier, Column Alias, Masking Level (COLUMN/CELL), Masking Value, Masking Criteria
--Constraints must start with the string "CONSTRAINT". Requirements for each type:
CONSTRAINT,Name,Description,PRIMARYKEY,Duplicate_PK_Support_Flag (YES/NO),Surrogate_Key_Flag (YES/NO),[delimited_list_of_columns_in_key]
CONSTRAINT,Name,Description,UNIQUE,,,[delimited_list_of_columns_in_key]
CONSTRAINT,Name,Description,NONUNIQUE,,,[delimited_list_of_columns_in_key]
CONSTRAINT,Name,Description,BITMAP,,,[delimited_list_of_columns_in_key]
CONSTRAINT,Name,Description,CHECK,,,[column_name],{delimited_list_of_values}

Note that all constraints require square brackets ([]) around the column name(s). In addition, the check constraint requires curly brackets ({}) around the list of values.

See Table 8-1 and Table 8-2.

Note:

If you are using a metadata file to create a table in LSH, set attributes Duplicate_PK_Support_Flag and Surrogate_Key_Flag to NO or installation will fail. They are relevant only to DMW.

Table 8-1 Table attributes with reference codelist values

  • Processing Type: UOW, Reload
  • Allow Snapshot: YES, NO
  • Blinding Flag: YES if the table may contain sensitive data at some point in time; NO if the table will never contain blinded data.
  • Blinding Status: If Blinding Flag is set to YES, the data may have a status of either BLINDED or UNBLINDED. (Users can set Blinding Status to Authorized, but not in the table itself.) If Blinding Flag is set to NO, the data must have a Blinding Status of NOT APPLICABLE.
  • Is Target: YES, NO. You can safely use the default value (YES).
  • Target as Dataset: YES, NO. You can safely use the default value (NO).
  • SDTM Identifier: For tables: SUBJECT, SUBJECTVISIT
  • Blinding Type: TABLE, COLUMN, ROW

Table 8-2 Column Attributes with Reference Codelist Values

  • Data Type: VARCHAR2, NUMBER, DATE
  • Nullable: YES, NO
  • SDTM Identifier: For valid values, see Use SDTM identifiers to support important functionality.
  • Blinding Type: TABLE, COLUMN, ROW
  • Masking Level: COLUMN, CELL

Example 8-1 Metadata file

lsh_delimiter = |
--This section is for Table attributes
--Name, Description, Oracle Name, SAS Name, SAS Label, Process Type, Allow Snapshot?, Blinding Flag?, Blinding Status, SAS Library Name, Is Target?, Target as Dataset?, SDTM Identifer (SUBJECT/SUBJECTVISIT), Table Alias, Blinding Type (TABLE/COLUMN/ROW), Blinding Criteria
lsh_table=S_QS|S_QS Table|S_QS|S_QS|S_QS|Staging with Audit|Yes|Yes|Blinded|Target|Yes|Yes||qs|ROW|(VISITDY < 100)
--
--This section is for columns
--Name, Data Type, Length, Precision, Oracle Name, SAS Name, SAS Format, Description, SAS Label, Nullable, Default Value, Date Format, SDTM Identifier, Column Alias, Masking Level(COLUMN/CELL), Masking Value, Masking Criteria
-- 
STUDYID|VARCHAR2|12||STUDYID|STUDYID|$12.||Study Identifier|Yes||
USUBJID|VARCHAR2|11||USUBJID|USUBJID|$11.||Unique Subject Identifier|Yes||
QSTESTCD|VARCHAR2|7||QSTESTCD|QSTESTCD|$7.||Question Short Name|Yes||
VISITNUM|NUMBER|||VISITNUM|VISITNUM|8.||Visit Number|Yes||
DOMAIN|VARCHAR2|2||DOMAIN|DOMAIN|$2.||Domain Abbreviation|Yes||
QSSEQ|NUMBER|||QSSEQ|QSSEQ|8.||Sequence Number|Yes||
QSTEST|VARCHAR2|40||QSTEST|QSTEST|$40.||Question Name|Yes||
QSCAT|VARCHAR2|70||QSCAT|QSCAT|$70.||Category for Question|Yes||
QSSCAT|VARCHAR2|26||QSSCAT|QSSCAT|$26.||Sub-Category for Question|Yes||
QSORRES|VARCHAR2|20||QSORRES|QSORRES|$20.||Finding in Original Units|Yes||
QSORRESU|VARCHAR2|7||QSORRESU|QSORRESU|$7.||Original Units|Yes||
QSSTRESC|VARCHAR2|4||QSSTRESC|QSSTRESC|$4.||Character Result/Finding in Std Format|Yes||
QSSTRESN|NUMBER|||QSSTRESN|QSSTRESN|8.||Numeric Finding in Standard Units|Yes||
QSSTRESU|VARCHAR2|7||QSSTRESU|QSSTRESU|$7.||Standard Units|Yes||
QSBLFL|VARCHAR2|1||QSBLFL|QSBLFL|$1.||Baseline Flag|Yes||
QSDRVFL|VARCHAR2|1||QSDRVFL|QSDRVFL|$1.||Derived Flag|Yes||
VISIT|VARCHAR2|19||VISIT|VISIT|$19.||Visit Name|Yes||
VISITDY|NUMBER|||VISITDY|VISITDY|8.||Planned Study Day of Visit|Yes||
QSDTC|VARCHAR2|10||QSDTC|QSDTC|$10.||Date/Time of Finding|Yes||
QSDY|NUMBER|||QSDY|QSDY|8.||Study Day of Finding|Yes||
--
--This section is for constraints
--
CONSTRAINT|pk_1|pk|PRIMARYKEY|Yes|YES|[STUDYID|USUBJID]

Example 8-2 Constraint metadata

CONSTRAINT,pk_1,pk,PRIMARYKEY,YES,YES,[Study]
CONSTRAINT,uk,uniq,UNIQUE,,,[DCMNAME]
CONSTRAINT,pk_22,nuq,NONUNIQUE,,,[DOCNUM]
CONSTRAINT,bmap_invsite,bitmap on INVSITE,BITMAP,,,[INVSITE]
CONSTRAINT,check_inv,check on INV,CHECK,,,[INV],{1,2,3,4}
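As an illustration only (not part of DMW), the constraint rows shown in Example 8-2 can be parsed with a short routine; field positions follow the Syntax section above, and error handling is minimal:

```python
import re

def parse_constraint_row(row: str, delimiter: str = ","):
    """Hedged sketch of parsing one CONSTRAINT row from an .mdd file.

    Fields before the [bracketed] column list are delimiter-separated;
    a CHECK constraint also carries a {curly-braced} value list.
    """
    head, _, tail = row.partition("[")
    fields = [f.strip() for f in head.split(delimiter)]
    if fields[0] != "CONSTRAINT":
        raise ValueError("not a constraint row")
    columns = [c.strip() for c in tail.partition("]")[0].split(delimiter)]
    match = re.search(r"\{([^}]*)\}", row)   # value list, CHECK only
    values = match.group(1).split(delimiter) if match else []
    return {"name": fields[1], "type": fields[3],
            "columns": columns, "values": values}

row = "CONSTRAINT,check_inv,check on INV,CHECK,,,[INV],{1,2,3,4}"
print(parse_constraint_row(row))
```

The same routine handles a pipe-delimited row such as the primary key in Example 8-1 when called with `delimiter="|"`.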

Use SDTM identifiers to support important functionality

In all clinical data model tables, it's important to link columns to SDTM identifiers in the Filter/SDTM field to support key DMW functionality:

  • The Automap feature for transformations uses all SDTM identifiers.

  • Filtering in the Listings pages requires the USUBJID and VISITNUM SDTM identifiers.

  • Filtering in the Discrepancies pages requires the SUBJID and VISIT SDTM identifiers.

  • To display the subject ID and visit name in TMS for TMS discrepancies, assign those SDTM IDs to the appropriate columns in any table you map to a TMS Set.

  • Subject Visit unit of work processing requires the USUBJID and VISITNUM SDTM identifiers.

  • Subject unit of work processing requires the SUBJID SDTM identifier.

Table 8-3 SDTM Column Identifiers Available for Use in DMW

Each entry lists the SDTM ID, its meaning, its data type, and whether a column for it is included in the default Subject Visit table and in the default Subject table.

  • ~ : Actual Visit Date (not an SDTM variable but used as one by the DMW Automap feature). Data type: date. Default Subject Visit table: No. Default Subject table: No.
  • COUNTRY: Country of Investigator Site. Data type: varchar2. Default Subject Visit table: No. Default Subject table: Yes.
  • INVNAM: Investigator Name. Data type: varchar2. Default Subject Visit table: No. Default Subject table: No.
  • INVID: Investigator ID. Data type: varchar2. Default Subject Visit table: No. Default Subject table: No.
  • SITEID: Site ID. Data type: varchar2. Default Subject Visit table: No. Default Subject table: Yes.
  • ~ : Site Name (not an SDTM variable but used as one by the DMW Automap feature). Data type: varchar2. Default Subject Visit table: No. Default Subject table: No.
  • STUDYID: Study Identifier, the unique ID of the study. Data type: varchar2. Default Subject Visit table: Yes. Default Subject table: Yes.
  • SUBJID: Subject ID, unique within the study. Data type: varchar2. Default Subject Visit table: No. Default Subject table: No.
  • USUBJID: Unique Subject ID, unique across all studies for all applications or submissions involving the product. Data type: varchar2. Default Subject Visit table: Yes. Default Subject table: Yes.
  • EPOCH: Visit Cycle, an interval of time in the planned conduct of a study, associated with a purpose such as screening, randomization, treatment, or follow-up, that applies across all arms of a study. Data type: varchar2. Default Subject Visit table: No. Default Subject table: Yes.
  • VISIT: Visit Name, the protocol-defined description of the clinical encounter or description of an unplanned visit. Data type: varchar2. Default Subject Visit table: Yes. Default Subject table: No.
  • VISITDY: Visit Day, the planned study day of the visit. Data type: varchar2. Default Subject Visit table: Yes. Default Subject table: No.
  • VISITNUM: Visit Number, a numeric version of VISIT, used for sorting. Decimal numbering may be useful for inserting unplanned visits. Data type: number. Default Subject Visit table: Yes. Default Subject table: No.
  • SVSTDTC: Visit Start Date (Start Date/Time of Visit), the start date/time of a subject's visit, represented in ISO 8601 character format. Data type: varchar2. Default Subject Visit table: Yes. Default Subject table: No.

How subject and visit filters work

When a user creates a Subject or Visit filter, he or she must specify a filter driver model, the clinical data model that contains the Subject Visit table to use. The table can be in any clinical data model in the study, but it must have the SDTM table identifier SUBJECTVISIT and the user must have View privileges on the table.

Tip:

Associate only one table per study with the SDTM table identifier SUBJECTVISIT, so that users have only one choice when they create filters. If users create filters using different driver models and apply them at the same time, the system does not apply either filter in the Discrepancies page.

Alternatively, create public filters with the filter driver model set to the same model. Data reviewers can then modify the public filters instead of creating their own.

The filter logic checks this Subject Visit table and finds each distinct subject/visit combination that meets the criteria of the filter. It then creates a join with the table the user is viewing in the Listings page and displays those records. The filter logic uses different join columns on the Listings page and Discrepancies page, as shown below:

Table 8-4 Join columns required for subject and visit filters

  Page            Join column for subject filters    Join column for visit filters
  Listings        USUBJID                            VISITNUM
  Discrepancies   SUBJID                             VISIT

Therefore, all tables with data that users may need to view must have columns linked to SDTM identifiers USUBJID, SUBJID, VISITNUM, and VISIT. Note that some of these names are used differently in InForm; see SDTM column equivalents in InForm.

In addition, to support all possible filters, the Subject Visit table must have columns linked to the SDTM identifiers COUNTRY, SITENAME, SITEID, INVNAM, INVID, SUBJID, USUBJID, EPOCH, VISITNUM, VISITDY, and VISIT. Users can filter on one of these columns only if it exists in the Subject Visit table and is mapped to the corresponding SDTM identifier.
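The join logic described above can be sketched as follows, using the Listings page join columns USUBJID and VISITNUM; the sample data is invented for illustration:

```python
# Hypothetical Subject Visit table: one row per subject/visit combination.
subject_visit = [
    {"USUBJID": "S1", "VISITNUM": 1, "VISITDY": 1},
    {"USUBJID": "S1", "VISITNUM": 2, "VISITDY": 14},
    {"USUBJID": "S2", "VISITNUM": 1, "VISITDY": 1},
]

def filter_listing(listing, criteria):
    """Find distinct subject/visit pairs that meet the filter criteria,
    then keep only listing records that join to one of those pairs."""
    keys = {(r["USUBJID"], r["VISITNUM"]) for r in subject_visit if criteria(r)}
    return [rec for rec in listing
            if (rec["USUBJID"], rec["VISITNUM"]) in keys]

listing = [
    {"USUBJID": "S1", "VISITNUM": 2, "QSORRES": "12"},
    {"USUBJID": "S2", "VISITNUM": 1, "QSORRES": "9"},
]
# Filter to visits on or after planned study day 14: only S1 visit 2 remains.
print(filter_listing(listing, lambda r: r["VISITDY"] >= 14))
```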

SDTM column equivalents in InForm

InForm was developed before SDTM and uses different names for equivalent columns.

Table 8-5 SDTM column equivalents in InForm

  SDTM/DMW identifier    Equivalent InForm column name
  USUBJID                SUBJID
  SUBJID                 SUBNUMBERSTR
  VISITNUM               VISID
  VISIT                  VISITMNEUMONIC

How the system tracks data lineage

As data flows from input clinical data models to successive target models, the system stores information linking each data item to the data items that contributed to it from "upstream" input and target models and the data items it contributes to in "downstream" target models. Data reviewers can see a data item's source and target lineage in the Listings pages.

Maintaining this context, or data lineage, is required to pass discrepancies back and forth between DMW and its data sources and to recognize a discrepancy as the same discrepancy in all models.

The system uses the following:

  • Mappings: The system uses the table and column mappings you define as part of a transformation to generate record-level data mappings during transformation and validation check execution.

  • Generated Surrogate Keys: The system generates a surrogate key value for each record by concatenating a generated table identifier and the values in the primary key columns in the order specified in the primary key constraint, separated by tildes (~). For example, table_ID~subject~visit~crf~test.

  • Generated Columns to Store Surrogate Keys

    • When a clinical data model is installed, the system adds one auxiliary column named CDR$SKEY to each table to store the surrogate key value for each record in the table.

    • When a transformation program is installed, it adds one auxiliary column to each target table for each of its source tables, to store the surrogate key value of source records.
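The surrogate-key scheme described above can be sketched as follows; the table identifier and record values are invented for illustration:

```python
def surrogate_key(table_id: str, record: dict, pk_columns: list) -> str:
    """Concatenate the table identifier and the primary key values,
    in primary key column order, separated by tildes."""
    return "~".join([table_id] + [str(record[c]) for c in pk_columns])

rec = {"SUBJID": "1007", "VISITNUM": 4, "CRF": "VITALS", "TEST": "BMI"}
print(surrogate_key("T123", rec, ["SUBJID", "VISITNUM", "CRF", "TEST"]))
# T123~1007~4~VITALS~BMI
```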

Data lineage example

In this example, multiple data items in several data models contribute to the calculation of each subject's Body Mass Index (BMI).

Figure 8-1 Body Mass Index Calculation Example


The calculations shown above result in the following data in table BMI in the Analysis model:

Table 8-6 Data in Table BMI

  Study    Subject    Visit    Site      Gender    Age    BMI     BMIU
  BMI      1005       4        McLean    MALE      29     24.6    KG/M**2
  BMI      1006       4        McLean    MALE      54     28.8    KG/M**2
  BMI      1007       4        McLean    FEMALE    42     33.1    KG/M**2

A BMI value over 30 is outside the normal range. To investigate the source data for Subject 1007, a female aged 42 with a BMI of 33.1, the reviewer selects the BMI data value 33.1 for Subject 1007 in the Listings page and clicks View Source Data.

Figure 8-2 Source Data Lineage Display


The Trace Data Lineage window displays the selected data item in the top row, with its sources displayed below. Expand any source to see its source. Columns are displayed in the format data_model.table.column.

The user can see downstream data by clicking View Target Data. The display shows the selected item at the top and the downstream data below.

Note:

If the system cannot display the entire lineage, it displays an asterisk (*) with a message about possible reasons.

Data lineage for deleted data

When a record for which a discrepancy was created on a target data model in DMW is deleted in InForm, data lineage is broken. After the target data model is reloaded, the discrepancy is no longer displayed in the source model in the Listings page.

The data reviewer can close or cancel such discrepancies in the Discrepancies page.

Data blinding and authorization

The system supports blinding sensitive data at several levels: whole tables, whole columns, or whole rows or cells meeting specified criteria. You can specify masking values for blinded data.

You define blinding in input data models either manually or when you create tables using metadata files. You can then cascade blinding and masking attributes to downstream tables as part of defining transformations.

If a transformation reads from a blinded source table, you must either blind the target table or, if you are sure it does not contain any data that should be blinded, authorize it. If a target table has a blinded source table and is not authorized or explicitly blinded, the system completely blinds the target table. Only users with Blind Break privileges can see any data in the table.

Copying transformations with authorizations

When you copy a transformation from another study or model or as part of applying a study template, authorized tables are copied as Not Authorized. You must manually authorize them if appropriate.

Authorization and side transformations

When you create a side model or merge a side model back to the main transformation, the system copies the target table as Authorized if both of the following are true:

  • There are blinded source tables and you have Blind Break privileges on all of them.

  • The target table is not blinded.

Discrepancies on blinded data

At the time a discrepancy is created, the system checks if it should be viewable by all users with access to the table of the underlying data item or only to users with Blind Break privileges, and sets a flag for the discrepancy accordingly. The value of this flag does not change even if the table is subsequently blinded, unblinded, authorized, or unauthorized.

Data processing types and modes

DMW supports two types of data processing: Reload processing and Unit of Work processing (UOW). The unit of work can be either Subject or Subject Visit.

You set the data processing type for a table in its UOW Processing Type attribute in the clinical data model. Possible values include: Non UOW (Reload), Subject, or Subject and Visit. In a target table, this value determines the preferred processing type for transformations writing to the table. For UOW processing to actually occur in transformations, source tables must also be defined with UOW processing.

Both Reload and UOW processing can be run in either Full or Incremental mode. UOW Load mode is available for loading data. You select the mode—Full, Incremental, or Load—when you schedule or submit a job.

Both types of processing require a primary key on the target Table instance. It is not necessary to have a primary or unique key defined in an external source. For example, external SAS files that do not have a primary key defined can be loaded as long as the target table has a primary key.

The target tables of validation checks, which are created by the system, use Reload processing.

Reload processing

In Reload processing, the system processes all records in the source tables or table-level files and compares the primary keys of the source records with the primary keys of the records already in the target tables, if any, to determine which records to insert, update, and (in Full mode only) delete. If a reloaded record does not include any data changes, the system simply changes its refresh timestamp to the timestamp of the current job.
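The reconciliation described above can be sketched as follows. This is an illustrative model only; the record and action structures are hypothetical stand-ins, not DMW's internal representation:

```python
def reload_table(source, target, full_mode, job_ts):
    """Sketch of Reload processing. 'source' maps primary key -> record data;
    'target' maps primary key -> {'data', 'refresh_ts', 'deleted'}.
    Returns the new target state and the actions taken."""
    actions = {"insert": [], "update": [], "refresh": [], "soft_delete": []}
    new_target = dict(target)
    for pk, data in source.items():
        if pk not in target:
            actions["insert"].append(pk)
            new_target[pk] = {"data": data, "refresh_ts": job_ts, "deleted": False}
        elif target[pk]["data"] != data:
            actions["update"].append(pk)
            new_target[pk] = {"data": data, "refresh_ts": job_ts, "deleted": False}
        else:
            # Reloaded record with no data change: only the refresh
            # timestamp is moved to the current job's timestamp.
            actions["refresh"].append(pk)
            new_target[pk] = dict(target[pk], refresh_ts=job_ts)
    if full_mode:
        # Full mode only: soft-delete target records absent from the input.
        for pk, rec in target.items():
            if pk not in source and not rec["deleted"]:
                actions["soft_delete"].append(pk)
                new_target[pk] = dict(rec, deleted=True)
    return new_target, actions
```

In Incremental mode (`full_mode=False`), the soft-delete step is simply skipped.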

Full

Full Reload processing performs insertions, updates, refreshes, and deletions: if a record is not included as input but its table is included, the system soft-deletes the record—the record still exists in the database but is no longer available for use. The audit trail for the record is available.

If an entire table is not included in a data load, full reload does not delete the data in the target table. To delete all data in a target table, load an empty file for the table.

Note:

Use Full mode only if you need to delete data in a particular table or you are confident that the data being loaded or read is the complete set of current data; otherwise, records that are not reloaded are soft-deleted.

Incremental

Incremental Reload processing performs insertions, updates, and refreshes but no deletions. If a record is not reloaded it remains available but its timestamp is not updated.

Incremental processing is faster than full, so you may want to use incremental processing frequently and full processing less frequently but regularly, to ensure that data is appropriately deleted.

Unit of Work processing

A Unit of Work (UOW) is all records associated with either a particular subject or a particular subject visit. Tables with a UOW processing type must have either the subject or subject and visit columns marked as SDTM identifiers and the UOW Processing Type must be set to either Subject or Subject and Visit.

UOW processing reads all source records and notes the UOW key—the value of the subject or subject and visit columns—for each record, and adds the UOW key to an execution set—the set of subjects or subject visits to be processed. The system then processes only records for the units of work represented in the execution set and no records for other units.
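The execution-set logic can be sketched like this; the record layout and column names are hypothetical, chosen only to illustrate the two steps:

```python
def process_uow(records, last_run_ts, key_cols):
    """Sketch of Unit of Work selection. 'records' are dicts carrying a
    'modified_ts' plus the subject (and optionally visit) columns named
    in key_cols, e.g. ('SUBJECT',) or ('SUBJECT', 'VISIT')."""
    # Step 1: note the UOW key of every record changed since the last run;
    # these keys form the execution set.
    execution_set = {tuple(r[c] for c in key_cols)
                     for r in records if r["modified_ts"] > last_run_ts}
    # Step 2: process every record belonging to a unit of work in the
    # execution set, and no records from other units.
    return [r for r in records
            if tuple(r[c] for c in key_cols) in execution_set]
```

Note that a single changed record pulls that subject's (or subject visit's) entire set of records into the processing run.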

Target tables defined for Unit of Work processing also accept Reload processing.

See Table 8-7.

Full UOW

Full UOW processing examines timestamps in the source UOW tables to determine which records have changed since the last Full UOW or Full Reload processing job and creates an execution set for those records' UOW key (subject or subject visit) and no others. It inserts, updates, and refreshes all records belonging to subjects or subject visits in the execution set and no others. However, if the program has never been run in any mode, all units of work are included in the execution set. If records previously existed in the target tables it also deletes records:

  • Records in processed units of work (subject or subject visit) that are not reloaded are deleted.

  • Entire units of work are deleted if they exist in the target table but are not reloaded.

    For example, suppose a transformation reads from an Adverse Events table, writes to a Severe Adverse Events table, and has previously inserted records with a Serious flag of Y for a particular subject. If the Serious flag changes to N on all of that subject's records, no records are inserted and, because the subject is in the UOW execution set, all of the subject's records are deleted from the target.

Incremental UOW

Incremental UOW processing examines records' timestamps in the source UOW tables to determine which records have changed since the last processing job in any mode (UOW or Reload, Full or Incremental), and creates the execution set for those records' UOW key (subject or subject visit) and no others. If the program has never been run in any mode, all units of work are included in the execution set. It performs insertions, updates, and refreshes but no deletions.

As with Reload processing, UOW's Incremental mode is faster than Full mode, so you may want to use incremental UOW processing frequently and full UOW processing less frequently but regularly, to ensure that data is appropriately deleted.

Note:

The end result of Incremental UOW processing is the same as Incremental Reload processing. Both process all new and changed records and delete no records. The end result of the two Full modes is also the same.

Which process will be faster depends on the volume of changed data being processed and whether changes are concentrated in specific units of work or spread fairly evenly across all units. Compared to Reload, UOW processing has overhead costs in detecting affected subjects or subject/visits, but it is more efficient in that it processes records only in units with changes, not all records.

In general, UOW will probably be faster than Reload when the number of incremental changes is small or concentrated in relatively few units of work.

In addition, if you use custom programs in transformations, using UOW processing takes care of finding incremental data changes; your code does not need to handle that.

UOW Load

After loading data the system identifies the distinct set of UOW keys for the records that were inserted, modified, or refreshed in the load, creates an execution set consisting of those units of work, and processes all records within those units and no others. Any record in a processed unit of work that is not included in the file is deleted.

This is the only type of UOW processing available during data loading. If you want different deletion behavior you can use Reload processing; see Table 8-8.

Note:

Use UOW Load mode only if the file being loaded contains the full current set of data (new, modified and unchanged) for subjects or subject visits with any new or modified data because any data not reloaded in processed units of work will be deleted.

Instead, use Incremental Reload processing to load data containing just new or modified records.

Table 8-7 Deletion Behavior in Unit of Work Processing Modes

Unit of Work Mode | Delete within Reloaded Unit? | Delete All Records for Nonreloaded Unit?
UOW Load          | Yes                          | No
Full UOW          | Yes                          | Yes
Incremental UOW   | No                           | No
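The deletion rules in Table 8-7 can be expressed as a small function; the mode names and data structures here are hypothetical stand-ins for illustration:

```python
def uow_deletions(mode, target_units, reloaded_units):
    """Sketch of the Table 8-7 deletion rules. Both arguments map a UOW key
    (subject or subject visit) to a set of record primary keys;
    'reloaded_units' covers only the units in the execution set.
    Returns the primary keys of records to delete."""
    deletes = set()
    if mode == "incremental_uow":
        return deletes                     # Incremental UOW never deletes
    for unit, pks in target_units.items():
        if unit in reloaded_units:
            # UOW Load and Full UOW both delete nonreloaded records
            # within a reloaded (processed) unit.
            deletes |= pks - reloaded_units[unit]
        elif mode == "full_uow":
            # Only Full UOW also deletes entire nonreloaded units.
            deletes |= pks
    return deletes
```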

Data processing during data loading

The system supports loading data from SAS or text files or from InForm.

See also Format checks on files being loaded.

You set up data loading when you define input clinical data models.

Processing data loads from files

You can load text or SAS files (data sets, XPORT, or CPORT files) into an input data model. You set up a File Watcher for each input data model. The File Watcher then detects when a data file appears in a specified location and proceeds to load the data.

The system supports three processing modes for loading data from files:

  • Full Reload

  • Incremental Reload

  • UOW Load

    Note:

    • Do not use Full mode if the table-level file being loaded contains only new data or a subset of data because any data not reloaded will be deleted.
    • Do not use UOW Load mode if the table-level file being loaded contains only new data or a subset of data for subjects or subject visits because any data not reloaded in processed units of work will be deleted. Instead, use Incremental (Reload) processing.
    • All deletions are "soft" deletions: records have an end timestamp equal to the load's date and time and are no longer available in the system. However, they still exist in the database and have an audit trail.

Table 8-8 Deletion Behavior in Data Loading Processing Modes

Processing Mode    | Uses UOW Logic? | Deletion Behavior
UOW Load           | Yes             | Deletes nonreloaded records within units of work—subjects or subject visits—with new, changed, or refreshed records.
Incremental Reload | No              | Does not delete any data.
Full Reload        | No              | Deletes all records that are not reloaded in tables that are reloaded.

Processing InForm data loads

Each InForm study has one InForm input clinical data model for each lifecycle stage. You set up a connection and schedule. See Create a clinical data model for InForm data for more information.

Format checks on files being loaded

In any data load processing mode, the system checks incoming data against the format requirements of target columns, including data type, length, codelist values (if the column is associated with a codelist), and the nullable and check constraints.

When you configure File Watcher for a table, you define the Reported Errors parameter. During a data load, the system counts errors in the file until it reaches the number set in this parameter plus one. It then stops processing the file, reports the errors in the log file, and the data load fails. If the number of errors detected for each file in the text data load is not more than Reported Errors, then the data load completes with a Warning status.

Records that are rejected are captured and validated for other errors. All the errors found in each record are reported in a file called ComprehensiveErrRpt.csv.

For each record with an error, the error file contains one row that says ORIGINAL_ERROR in the Column Name column, another row for the original error itself, and additional rows for any other errors found in the same record. For the purposes of calculating the number of reported errors, the logic treats each record that cannot be loaded as one error, even if it contains multiple errors.
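The counting rule, where each rejected record counts once against Reported Errors no matter how many column errors it contains, can be sketched as follows. The validate callback and row structures are hypothetical:

```python
def load_file(records, reported_errors, validate):
    """Sketch of the Reported Errors logic. 'validate' returns a list of
    (column, message) errors for a record; a record with any error is
    rejected and counts as exactly one error against the threshold."""
    error_rows = []               # rows destined for ComprehensiveErrRpt.csv
    bad_records = 0
    for rec_num, rec in enumerate(records, start=1):
        errors = validate(rec)
        if errors:
            bad_records += 1
            # One ORIGINAL_ERROR row, then one row per individual error.
            error_rows.append((rec_num, "ORIGINAL_ERROR", errors[0][1]))
            error_rows.extend((rec_num, col, msg) for col, msg in errors)
            if bad_records > reported_errors:
                # Threshold plus one reached: stop and fail the load.
                return "Failure", error_rows
    return ("Warning" if bad_records else "Success"), error_rows
```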

Table 8-9 shows the error file entries for two records. Record 1 has two errors and Record 32 has one error. The string "CDR_W37_1D0156B9.TXT_TV486627301.Name" represents the full path of the erroring field, Name, for Record 1. The first part, "CDR_W37_1D0156B9," is the schema name and the second, "TXT_TV486627301," is the view based on the target table.

Table 8-9 Error file example

TABLE_NAME | FILE_NAME | REC_NUM | COLUMN_NAME    | VALUE                        | ERROR_MESSAGE
DEMOG      | DEMOG.txt | 1       | ORIGINAL_ERROR | Captured in the TextLoad.log | ORA-20100: ORA-01400: cannot insert NULL into ("CDR_W37_1D0156B9.TXT_TV486627301.NAME")
DEMOG      | DEMOG.txt | 1       | NAME           |                              | ORA-01400: cannot insert NULL into ("CDR_W37_1D0156B9.TXT_TV486627301.NAME")
DEMOG      | DEMOG.txt | 1       | HT             |                              | ORA-01400: cannot insert NULL into ("CDR_W37_1D0156B9.TXT_TV486627301.HT")
DEMOG      | DEMOG.txt | 32      | ORIGINAL_ERROR | Captured in the TextLoad.log | ORA-20100: ORA-02290: check constraint ("CDR_W37_1D0156B9.TXT_TV486627301.AGE_CK) violated
DEMOG      | DEMOG.txt | 32      | AGE            | 05                           | ORA-02290: check constraint ("CDR_W37_1D0156B9.TXT_TV486627301.AGE_CK) violated

Supporting duplicate primary key values in a load

In rare cases you may need to allow a single load of source data to contain records with duplicate primary key values—for example, when loading data from a small lab that may not be able to guarantee uniqueness. In those cases you can define a composite key, but it may not be sufficient to ensure uniqueness.

If you need to support duplicate primary key values within a single data load, check Supports Duplicate when you define the primary key for the table in the input data model. Selecting this option ensures that all records are loaded and not deleted but requires careful checking of the data.

When you select Supports Duplicate, the system adds the column CDR$DUP_NUM to the target table. During each data load, the system detects whether multiple incoming records have an identical primary key value and:

  • The system inserts a value of 1 into the CDR$DUP_NUM column of the first record with a given set of primary key values.

  • If another record with the same primary key values is loaded in the same data load, the system inserts a value of 2 into its CDR$DUP_NUM column, and 3 for the third record with the same primary key values, and so on.

  • During subsequent data loads, the system assumes that the first record with a particular set of primary key values is the same as the existing one with the CDR$DUP_NUM value 1, the second is the same as 2, and so on. If two records exist with values 1 and 2, and the next load contains three records with the same values, the system gives the third record a CDR$DUP_NUM value of 3.

    Note:

    For this system to work, the lab must always reload all records in the same order, adding new records to the end of the file.

The CDR$DUP_NUM value becomes part of the surrogate key as well as the primary key and is used for data lineage tracing; see How the system tracks data lineage.
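The numbering scheme can be sketched as follows; the record layout is hypothetical, and only the CDR$DUP_NUM column name comes from the text above:

```python
from collections import defaultdict

def assign_dup_num(records, key_cols):
    """Sketch of CDR$DUP_NUM assignment when Supports Duplicate is checked:
    within one load, records sharing the same primary key value are numbered
    1, 2, 3, ... in file order."""
    counters = defaultdict(int)
    out = []
    for rec in records:
        key = tuple(rec[c] for c in key_cols)
        counters[key] += 1             # nth occurrence of this key value
        out.append(dict(rec, **{"CDR$DUP_NUM": counters[key]}))
    return out
```

Because matching across loads relies purely on file order, this sketch also shows why the source must always reload records in the same order, appending new records at the end.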

Installation

After creating or modifying a clinical data model, transformation, validation check batch, or custom listing, you must install it to make it usable.

What happens during installation?

When you install a clinical data model for the first time, the installation process creates a database schema for the model. As related objects are installed, the system adds them to the same schema.

Clinical data model installation

The installation process:

  • Creates or updates a database schema for the model.

  • Creates or updates database tables based on DMW table metadata if the most recent version is not already installed.

    • Full installation drops and recreates all tables and deletes all data.

      To safeguard your data, full installation is not allowed in the Production lifecycle stage.

    • Regular installation upgrades tables and does not delete data unless destructive changes have been made to a table. See Destructive and nondestructive changes.

  • Creates or updates a query that selects all columns and rows from all tables in the model, including any masked data. The system uses this view to display data on the Listings pages.

  • Checks the compatibility of validation checks and transformations that read from it or write to it. If changes to the model affect a validation check or transformation, the installation process gives a warning. Even if there are no issues, the installation process sets Upgrade to Required for the validation check and transformation to keep all current versions synchronized.

    Note:

    Clinical data models are also installed when the transformation that writes to the model is installed.

Destructive and nondestructive changes

During a regular, upgrade type of installation of a clinical data model, if there have been destructive changes to any table, all data in that table is deleted.

Destructive changes include:

  • Removing a column.

  • Decreasing the length of a column.

  • Changing the processing type.

  • Changing the CREATE_AS_VIEW flag.

  • Changing the tablespace name.

  • Changing the primary key.

  • Changing the "allow duplicate rows" setting.

Nondestructive changes include:

  • Increasing the length of a column.

  • Changing the data type of a column from number or date to character.

  • Changing the Blinding attribute of a table from Yes to No or No to Yes.

  • Changing a table's Supports Duplicate attribute from Yes to No or No to Yes.

  • Adding a unique constraint.
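The rule above reduces to a simple membership test; the change labels below are hypothetical names for the cases listed, not DMW identifiers:

```python
# Hypothetical labels for the destructive change kinds listed above.
DESTRUCTIVE_CHANGES = {
    "remove_column", "decrease_column_length", "change_processing_type",
    "change_create_as_view_flag", "change_tablespace_name",
    "change_primary_key", "change_allow_duplicate_rows",
}

def upgrade_deletes_table_data(changes):
    """A regular (upgrade) installation deletes a table's data if and only
    if the table has at least one destructive change."""
    return any(change in DESTRUCTIVE_CHANGES for change in changes)
```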

Transformation installation

The installation process:

  • Generates or updates a PL/SQL program for each table-level transformation and corresponding packages in the database.

  • Links the transformation programs to the source and target models and upgrade-installs the models if they are not already installed.

  • Checks out the target tables and generates the auxiliary columns required to maintain the source system context and data lineage tracing for each record, if they have not already been created. See How the system tracks data lineage.

    Note:

    Because of this, if the table-level transformation has never been installed, the target model must be either:

    • Checked in.

    • Checked out by the same user who is installing the transformation.

    If not, the installation fails. So if you are installing a table-level transformation for the first time, see if the target model is checked out. If it is checked out by a different user, ask him or her to check it in before you install the transformation.

  • Installs the target model in upgrade (regular) mode.

    Even if you select Full installation to install the transformation, the behavior is the same as for regular installation: if and only if a table has had destructive changes, the installation job drops and recreates that table, deleting all its data. See Destructive and nondestructive changes.

Validation check batch installation

The installation process:

  • Creates or updates database packages for the generated programs.

  • Maps the validation check batch to the latest installed version of the source clinical data model.

  • Creates or updates and installs the target tables to store the discrepant data rows.

    Note:

    Installing a validation check batch does not install the source clinical data model. You cannot install a validation check batch until the source model is installed.

Installation requirements

Objects are not installable until they meet certain requirements.

Clinical data model installation requirements
  • All tables must be installable. A table is not installable if it has no columns.

  • All tables must have a primary key constraint.

  • All tables identified as blinded must have blinding completely defined.
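The requirements above can be checked mechanically; this sketch uses hypothetical table attributes chosen to mirror the three bullets:

```python
def model_install_problems(tables):
    """Sketch of the model installation checks above. Each table is a
    hypothetical dict with 'columns', 'primary_key', 'blinded', and
    'blinding_defined' entries. Returns a list of problems; an empty
    list means the model is installable."""
    problems = []
    for name, t in tables.items():
        if not t["columns"]:
            problems.append(f"{name}: table has no columns")
        if not t["primary_key"]:
            problems.append(f"{name}: no primary key constraint")
        if t["blinded"] and not t["blinding_defined"]:
            problems.append(f"{name}: blinding not completely defined")
    return problems
```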

Transformation installation requirements
  • The transformation's status must be Complete. If it is Incomplete, at least one table or column in the target model is neither mapped nor marked Not Used.

  • All source and target tables must be installable. (See Clinical data model installation requirements.)

  • If it has a custom program, the program must be installable. A program is not installable if it does not have source code or source and target tables.

  • All expressions must have valid code.

Validation check batch installation requirements
  • It must have at least one validation check.

  • Its source tables must be installable.

  • If it has a custom program, the program must be installable.

InForm metadata change detection and synchronization

When a protocol change requires changing study metadata (for example, a CRF item is added in Central Designer and then in InForm), DMW can detect and propagate the change.

Compare DMW and InForm metadata on demand

You can compare the metadata—table and column structure—in any DMW InForm model lifecycle stage to any InForm lifecycle database for the same study.

  1. After you select a study, click the Study Configuration icon (a gear with a pencil) in the navigation bar. Then click the Clinical Data Model tab.
  2. Select the InForm clinical data model.
  3. In the InForm Configuration tab, click the Compare Metadata icon (a box with lines).
  4. In the Metadata Comparison window, select the metadata to compare:
    • For the DMW InForm data model: Development, Quality Control, or Production.

    • For InForm: Development, UAT, or Production.

  5. Click Compare. The report appears on screen.

    To save the report, click the Export All to Excel icon.

    If the system finds no differences, a message appears.

To make the changes in DMW, click the Load InForm Metadata icon (a folder with an up arrow).

Automatic metadata change detection

The system compares metadata during the following processes:

Table 8-10 Automatic Metadata Comparison and Results

Comparison Done During | Result If Significant Differences Exist
Data loading. | The message "Metadata needs to be reloaded" is displayed and data loading is suspended. If it is in the production lifecycle, a message is displayed on the Home page.
Promoting an InForm model to a higher validation status. DMW compares model metadata to metadata in the InForm UAT or Production environment specified in the InForm clinical data model. | A message is displayed and the promotion is not allowed.
Saving changes to a remote location or study account. | Results of the comparison are displayed.
Metadata loading. | See Metadata change detection during metadata loading.

If there are differences and you accept them, click the Load InForm Metadata icon (a folder with an up arrow) in the InForm Configuration tab of the InForm clinical data model to synchronize DMW with InForm.

Note:

The InForm reporting database extract (RDE) views are the source of truth for DMW. For example, if a protocol change occurs that results in forms and items being removed from a trial, the RDE views may still retain the views and columns that represent the removed items. In that case, they also remain in DMW even after reloading metadata.

Metadata change detection during metadata loading

During metadata loading, the system:

  • Reloads the required static metadata tables: IRV tables and RD_DATADICTIONARY.

  • Compares the static metadata tables with the data stored in DME_IA_SRC_TABLES and DME_IA_SRC_COLUMNS to detect significant changes including:

    • New, missing, or changed reference path for an item

    • Changed data type

    • Increased data length for VARCHAR2 columns

    • Changes to column (item) blinding status

  • If significant changes are found, updates DME_IA_SRC_TABLES and DME_IA_SRC_COLUMNS, other extended metadata mapping tables, and the InForm input clinical data model tables.

    Note:

    • Tables are never dropped. If an RDE view is missing, the system marks the corresponding DMW table as Not Used.

    • Columns are never dropped. If an existing item is not included, it is marked Not Used. If a column data type is changed, a new column is created with the new data type.

    • Updating a column's blinding status may require updating its table's blinding status as well.

  • Displays the message "Metadata needs to be reloaded" and suspends data loading until it is done.
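The column-level synchronization rules (never drop a column, mark missing items Not Used, add a new column on a data type change) can be sketched as follows. The column dicts and the "_2" suffix for replacement columns are hypothetical illustrations, not DMW's actual naming:

```python
def sync_columns(stored, reloaded):
    """Sketch of column synchronization during metadata loading. Both
    arguments map item name -> {'data_type': ...}; all other details,
    including the '_2' suffix for replacement columns, are assumptions."""
    out = {}
    for name, col in stored.items():
        # Columns are never dropped: a missing item is marked Not Used.
        out[name] = dict(col, used=name in reloaded)
    for name, col in reloaded.items():
        if name not in stored:
            out[name] = dict(col, used=True)         # new item, new column
        elif col["data_type"] != stored[name]["data_type"]:
            # Changed data type: keep the old column, add a new one.
            out[name] = dict(stored[name], used=False)
            out[name + "_2"] = dict(col, used=True)
    return out
```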

Understanding internal objects and error messages about them

Many DMW error messages and log files refer to internal objects. To help you understand these messages, here is a description of the objects. See Figure 8-3 for more information.

Primary objects

Each time you set up a clinical data model, transformation, validation check, or custom program, the system creates a specialized object to do the work.

Each object listed in Table 8-11 is really two objects:

  • The object definition is located directly in the study domain and includes almost all the object's metadata.

  • The object instance is located in a work area and contains a reference to the object definition. The object instance is installed, creating a database table, view, and/or package.

Table 8-11 Primary internal objects

Object | Purpose | Created when a user...
Table | Holds data. | Creates a clinical data model or table. Validation check target tables are created when a user installs a validation check batch.
Load set | Loads data into all tables in one clinical data model. | Creates an input clinical data model. (Load sets are used for file data loading and for InForm internal metadata and operational data models, but not InForm study data or metadata, which are handled by the DMW InForm Connector.)
Program | Contains the source code for a transformation, validation check, or custom listing. | Creates a transformation, validation check, or custom listing.
Data mart | Contains a set of views, one for each table in a clinical data model. When generated, produces a set of files containing all the data in the model. | Creates a clinical data model with the data extract option selected.
Business area | Contains a view to each table in a clinical data model for read-only access by data visualization tools. | Creates or modifies a clinical data model with the Business Area option selected.
Transformation maps | Used for mapping by transformations, validation checks, and custom listings. For transformations, model-level maps contain table-level maps, which contain column-level maps. For validation checks, batch-level maps contain validation check-level maps, which contain column-level maps. For custom listings, listing-level maps contain column-level maps. | Creates a model- or table-level transformation, validation check batch, validation check, or custom listing.
Codelist | Contains a list of valid values that can be assigned to a table column. | Creates a codelist.

Container objects

Some objects are simply containers, or namespaces. See Figure 8-3 for more information.

Table 8-12 Container internal objects

Object | Purpose | Created when a user...
Domain | Contains all other objects, including other domains. The shipped DMW_Domain contains all DMW objects. There is a domain for each study grouping, and a domain for each study inside a study grouping domain. DMW REFDATA and DMW UTILS are both shipped domains. Study domains contain the primary object definitions for clinical data models, transformations, validation check batches, and custom listings for their study. | Creates a study or study grouping.
Application area | DMW creates an application area for each lifecycle stage, inside each study domain. | Creates a study.
Work area | DMW creates a work area for each clinical data model in each lifecycle stage, inside the lifecycle application area. Contains primary object instances for the study lifecycle area, including tables, the load set that writes data to its tables (if it is a file input model), the programs that read from its tables (including validation checks, data marts, and business areas), and the transformation programs that write to its tables (if it is a target model). | Installs a clinical data model.

Secondary objects

The primary objects contain secondary objects.

Table 8-13 Secondary internal objects

Object | Purpose | Created when a user...
Execution setup | Contains parameters used for running an executable object. | Creates an executable object: load set (input clinical data model), program (transformation, validation check, custom listing), or data mart (data model with the data mart option).
Parameters and parameter sets | Parameter sets contain parameters and are contained in all executable objects. A parameter contains a reference to a variable and additional metadata. | Creates an executable object.
Table descriptor | Table metadata that is owned by executable objects and used to map them to source and target tables. | Creates an executable object.
Source code | Container for the actual program code for an executable object. | Creates an executable object.
Column | Contains column metadata and a reference to a variable. | Creates a table.
Constraint | Contains constraint metadata. | Creates a constraint.
Variable | Contains metadata used by both columns and parameters. | Creates a table or executable object.