Oracle® Clinical Conducting a Study Release 5.1 E53564-01
The Batch Data Loader can load large amounts of data from sources such as labs, avoiding manual data entry.
The Batch Data Loader is designed only to update response data, not header keys. To update the header keys, you must perform a hard delete and reload the data. See "Batch-Deleting Data" for instructions and cautions about this procedure.
The Batch Data Loader cannot handle the following types of data:
DCI Modules that specify clinical planned events
DCI/DCM combinations that have different collect time flags; for example, both the DCI and the DCM should collect time data, or neither should.
DCI/DCM combinations with different dates and times
Optional DCMs
In a flexible study, after completing the batch data load, the system automatically runs the incremental expectedness calculation job for all patients whose data changed. See Oracle Clinical Creating a Study for more information.
To prevent exceeding the rollback segment space and causing the entire job to fail, you can choose to run the batch data loader in its own rollback segment. Define the rollback segment as follows:
Define and tune a rollback segment and bring it online, following instructions in the Oracle SQL Reference manual.
Enable the segment for use in Oracle Clinical by adding the entry BDL_R_SE to the reference codelist OCL_STATE. For further information on this reference codelist, see the Oracle Clinical Administrator's Guide.
To batch-load data into Oracle Clinical, you must create or select a format mask with which you specify how the data in your files corresponds to fields in Oracle Clinical. Masks achieve this correspondence in two different ways:
Positional masks define a length for each column in the data file; for example, the STANDARD format mask (described below) assigns the first ten characters to the Investigator, the following ten characters to the site name, and so on.
Delimited masks have no set column length and separate column values with a user-defined delimiter character such as a comma (,). Flexibility in field length can save space.
Note:
You cannot load multibyte data using positional format masks, including the STANDARD format mask supplied by Oracle. Use delimited masks for this purpose.

Oracle Clinical ships with a predefined format mask of the fixed, or positional, type. You can define your own format masks as required.
The STANDARD mask is the format used by default when a new data file is created. Table 7-1 lists the expected order of column values for each record, with the first ten characters (1-10) allotted to the first column value, the second ten characters (11-20) allotted to the second column value, and so on. Columns with an M displayed are mandatory and columns with an O displayed are optional.
Table 7-1 Default Batch Data Load Fields
Note:
Either dci_date or subevent_number must have a value.

You can specify your own input formats to stipulate different column positions and different lengths from the defaults. You can also specify the delimiter to separate fields, removing the restriction of the fixed field format.
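As an illustration of the two mask styles, the sketch below reads one record with a positional layout and one with a delimited layout. The field names and 10-character widths are hypothetical placeholders standing in for the STANDARD layout in Table 7-1, not the actual mask definition.

```python
# Sketch: reading one batch-load record with either mask style.
# The field layout here (10-character columns, field order) is illustrative;
# the authoritative layout is the STANDARD mask in Table 7-1.

POSITIONAL_FIELDS = [          # (name, start, length) -- hypothetical subset
    ("investigator", 0, 10),
    ("site", 10, 10),
    ("patient", 20, 10),
]

def parse_positional(record: str) -> dict:
    """Slice fixed-width columns; blank (space-padded) fields come back empty."""
    return {name: record[start:start + length].strip()
            for name, start, length in POSITIONAL_FIELDS}

def parse_delimited(record: str, delimiter: str = ",") -> list:
    """Split on the user-defined delimiter; fields may be any length."""
    return record.rstrip("\n").split(delimiter)

print(parse_positional("INV01     SITE01    1001      "))
print(parse_delimited(",,2,,VISIT1,0,20110101"))
```

Note how the delimited form allows empty fields with no padding, which is where the space saving described above comes from.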
To define a format:
From the Conduct menu, select Batch Data Load, then select Batch Load Formats.
Enter the name of the format in the Name field.
STANDARD is the name of the format mask provided by Oracle.
Specify a delimiter if you are creating a delimited mask or leave the Delimiter field blank if you are creating a positional mask.
The Valid box is not enterable. See Step 9.
Press Enter or Tab. The system enters values under Format Components: Seq, Field Name, Mandatory, and Field Length.
Start indicates the start column for this field in the data file. An entry of zero means the field is not provided, so the field is ignored. This field is filled automatically with a number and the delimiter if the entry is not zero.
Default Value is not used by the system.
Change the status of the format if you want, and if Valid=Y. The default is P for provisional.
Save. The system runs a check on your format definition. If it is valid, the system checks the Valid box. If the format definition is not valid, the system does not check the Valid box and the format does not appear in the LOV for Format Mask in the Batch Data Load window.
Table 7-2 Batch Load Format Fields: Upper Part
Name | Description | Size; Data Type |
---|---|---|
Name | Name of the format. | 16; free text |
Status | Status of the format. The default is P for provisional. Other possibilities are A for active, and R for retired. | 1; CHAR |
Delimiter | Special symbol you choose to mark the position of a format in the database column if you do not choose STANDARD. | 1; CHAR |
Valid | Whether the mask is valid; checked denotes validity. | box |
Table 7-3 Batch Load Format Fields: Format Components
Name | Description |
---|---|
Seq | Sequence of the format components in the database table. This order is automatically provided by the system. You can change order, which is particularly useful for delimited masks. |
Field Name | Name of the field to which the format applies; system-provided. |
Mandatory | A box to indicate whether the field is required. The information is automatically provided by the system. |
Start | Indicates the position in the database where the field to which you are applying the format will start. You can either choose a delimiter in the upper part of this window to indicate position, or, if the Delimiter field is zero, enter the position here. If you leave this field blank, it remains null. A zero means the field is not used, because a delimiter exists. |
Field Length | Number of characters or digits allowed for the field to which you are applying the format. The defaults are entered automatically and cannot be changed. |
Default Value | The system does not use the value in this field. |
You can identify data as normalized lab data by defining a repeating Question Group with a single pair of questions, one to collect the lab test name and the other to collect the lab test value; see the chapter on DCMs in Oracle Clinical Creating a Study.
When you batch load lab data in a normalized format, the system automatically generates two records from each incoming normalized lab data record, for both inserts and updates. The lab test name is then stored as a Question response and you can use it in data extract and in validation Procedures to identify the test name for each test result.
For example, from records with a normalized input file format as follows:
,,2,,VISIT1,0,20110101,,HEMA,HEMA,HEMA,LAB,HCT,0,1,43
,,2,,VISIT1,0,20110101,,HEMA,HEMA,HEMA,LAB,HGB,0,2,17
where the lab test names are HCT and HGB, the respective lab test values for the patient are 43 and 17, the lab test Question name is LTEST and the lab test result name is LVALUE, the Batch Data Load process converts the preceding two input records to the following four records:
,,2,,VISIT1,0,20110101,,HEMA,HEMA,HEMA,LAB,LTEST,0,1,HCT
,,2,,VISIT1,0,20110101,,HEMA,HEMA,HEMA,LAB,LVALUE,0,1,43
,,2,,VISIT1,0,20110101,,HEMA,HEMA,HEMA,LAB,LTEST,0,2,HGB
,,2,,VISIT1,0,20110101,,HEMA,HEMA,HEMA,LAB,LVALUE,0,2,17
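The expansion above can be sketched as follows. The field positions are taken from the delimited example records in the text; the function itself is an illustration of the transformation, not the actual Batch Data Load implementation.

```python
# Sketch of the normalized-lab expansion described above: each incoming
# record carrying a lab test name and value becomes two records, one for
# the test-name Question (LTEST) and one for the result (LVALUE).
# Field positions follow the delimited example records shown in the text.

def expand_normalized(record: str, test_q: str = "LTEST",
                      value_q: str = "LVALUE") -> tuple:
    fields = record.split(",")
    test_name, value = fields[12], fields[15]    # e.g. HCT and 43
    name_rec = fields[:12] + [test_q] + fields[13:15] + [test_name]
    value_rec = fields[:12] + [value_q] + fields[13:15] + [value]
    return ",".join(name_rec), ",".join(value_rec)

for rec in expand_normalized(",,2,,VISIT1,0,20110101,,HEMA,HEMA,HEMA,LAB,HCT,0,1,43"):
    print(rec)
```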
The batch load process includes several stages: Load, Prepare, and Transfer. You can choose to process data one stage at a time, checking error files after each stage and then proceeding to the next stage, or perform a combination of stages; see "Using the Batch Data Loader".
Load: Moves the data in the data files from the server to the Oracle Clinical database table Temporary Batch Data Items (TBDI).
Prepare: Updates the TBDI table and creates the Temporary Batch DCI and Temporary Batch DCM tables. During this stage the system checks the records for completeness and consistency, verifies that the keys are valid, and checks for locked RDCMs.
Also during this stage, records from the data files are divided into INSERT and UPDATE records. INSERT records are those that do not already exist in Oracle Clinical, and need to be created; UPDATE records are those that have updated information for Received DCIs and Received DCMs that already exist in Oracle Clinical. These records are termed pending inserts and pending updates, respectively.
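The split into pending inserts and pending updates could be sketched as below. The record shape and the `header_key` field are hypothetical; in Oracle Clinical the comparison is against the Received DCI and Received DCM tables.

```python
# Sketch of the Prepare-stage split: records whose header keys already exist
# as Received DCIs/DCMs become pending updates; the rest become pending
# inserts. "existing_keys" stands in for a lookup against those tables, and
# "header_key" is a hypothetical field name used for illustration.

def classify(records: list, existing_keys: set) -> tuple:
    pending_inserts, pending_updates = [], []
    for rec in records:
        target = pending_updates if rec["header_key"] in existing_keys \
                 else pending_inserts
        target.append(rec)
    return pending_inserts, pending_updates

inserts, updates = classify(
    [{"header_key": "PT1-V1"}, {"header_key": "PT2-V1"}],
    existing_keys={"PT2-V1"})
print(len(inserts), len(updates))
```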
You can run the Detail Load Impact Report, or the Summary version of this report, after the Prepare stage to get a list of how many RDCMs have been updated or inserted.
Oracle Clinical offers two batch data load processes. Both processes can handle multiple data files at a time.
Standard Batch Data Load This option can handle only a single DCM in a single DCI in each data file; this usually corresponds to a single, relatively simple CRF (see Oracle Clinical Creating a Study for information on DCMs and DCIs). All data must be for a single study.
To use this process, select Batch Data Load from the Conduct menu.
Out of Study Batch Data Load This option can handle multiple DCMs in a DCI, and multiple DCIs in a data file and can process data from multiple studies at the same time. You may choose to use the Out of Study option even for a single study because it allows multiple DCMs and DCIs per data file.
To use this process, select Out of Study Load from the Conduct menu.
Before you start loading data into Oracle Clinical, some preprocessing may be necessary to bring data into conformity with required format standards.
If you are using a positional, or fixed, format, including the shipped STANDARD mask, each value occupies a fixed column position, as listed in Table 7-1. If there is no data in any one field, that field must contain spaces to make the record conform to the fixed format. The fields for subevent number, Investigator, site, document number, and DCM question group name may be left blank, in which case the individual field is calculated automatically. (See "Subevent Number Not Supplied" for a discussion on what occurs when the subevent number is left blank.) If the data is not in the correct fixed format, batch data load will not succeed.
Note:
Do not leave any empty lines in the load file. The system treats each line in the load file as a record. If there is an empty line, the system takes that as an empty record and raises an error.

When a subevent number is not supplied in the data file, the system attempts to derive a subevent number by following an algorithm that matches existing data for the patient against data being processed in the current batch load. The algorithm comprises these steps:
For each patient/clinical planned event, the system derives a table of existing subevent numbers, and the minimum/maximum dates/times associated with them.
All data files being loaded as a group, as well as data in the Received DCMs and Received DCIs tables, are subject to the algorithm's actions.
The system processes each null subevent number in the incoming data file(s) in ascending date/time order. This is achieved by checking, in the order of existing or already allocated subevents, whether the DCM date and DCM time of the new data falls within the minimum/maximum dates/times of the existing subevents.
For the purpose of matching, null DCM times are treated as 00:00:00, or the earliest time for that date.
If the new data falls within the range of an existing subevent or one already allocated, it is assigned that subevent number.
If the new data does not fall within the range of an existing subevent, a new one is allocated, and the date and time of the new data is associated with it. Subsequently processed records that match are therefore allocated the same subevent number.
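The allocation steps above can be sketched as follows. This is a simplified model for illustration only: it handles a single patient/event, and it omits the collision and mismatched-update checks that the Prepare stage performs later.

```python
# Sketch of the subevent-derivation algorithm: incoming records with a null
# subevent are processed in ascending date/time order; each is matched
# against the min/max date-time range of existing or already-allocated
# subevents, and a new subevent number is allocated when no range matches.
# (Per the text, null DCM times would be treated as 00:00:00 before this.)

from datetime import datetime

def allocate_subevents(existing: dict, new_datetimes: list) -> list:
    """existing: {subevent: (min_dt, max_dt)}; returns [(dt, subevent), ...]."""
    ranges = dict(existing)
    allocated = []
    for dt in sorted(new_datetimes):          # ascending date/time order
        match = next((se for se, (lo, hi) in ranges.items() if lo <= dt <= hi),
                     None)
        if match is None:                     # no range matched: new subevent
            match = max(ranges, default=-1) + 1
            ranges[match] = (dt, dt)
        allocated.append((dt, match))
    return allocated
```

Records later in the sorted input that fall inside a newly allocated range receive the same subevent number, matching the last step above.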
You cannot use this matching algorithm to automatically allocate subevent numbers for duplicate DCMs, because it does not include a check on whether the new subevent assignment will cause a collision with existing Received DCMs. For example, if data for subevent 0 already exists at 01-DEC-99 00:00:00, and new data for an existing DCM is processed with the same date, it will be assigned subevent 0 and will be detected by later phases in the Prepare stage as an update of the existing data. If the new data has a different document number, the document will be rejected during Prepare as a mismatched update.
Similarly, new data with time information that falls within the range of existing data will be matched to the existing subevent and will be later rejected as a mismatched update on time. For example, if data for subevent 1 exists at 01-DEC-99 00:00:00 and 02-DEC-99 00:00:00, and new data for a DCM that exists (at 01-DEC-99 08:00:00, for instance) is processed at 01-DEC-99 12:00:00, it will be assigned subevent 1 and be detected by later phases in the Prepare stage as an update of the existing data. Since the times differ, it will be rejected.
You must log information about the files into Oracle Clinical—that is, create entries for the data files so that Oracle Clinical can access and process them.
From the Conduct menu, select Out of Study Load, then select Batch Load Data Files.
Note: For test mode, from the Definition menu, select Test a Study, choose Out of Study Load, and then select Batch Data Load Files.
The Maintain Out of Study Batch Load Data File window opens. The system automatically populates the Creation TS field with the current date and time and the Status field with RECEIVED. You cannot change these fields; they change automatically as the data files are processed.
Enter a data file name in the Name field.
The OS File Name field is prepopulated with the appropriate directory path. Add a slash or backslash, depending on your operating system, and then the filename.
The system validates the directory path when you submit the job.
If your data is lab data, enter the lab name in the Lab Name field.
If you have defined a position and size for your data in the database table from the Batch Load Formats menu entry, enter the name of the format mask in the Mask Name field.
If your data files have multiple DCMs per DCI or you want to process files from different studies at the same time, use Out of Study Batch Data Load instead; see "Creating Data File Entries for Out of Study Batch Data Loads".
To specify the file for a standard Batch Data Load:
From the Conduct menu, select Batch Data Load, then select Batch Load Data Files.
The Maintain Load Data File window opens. The system automatically populates the Creation TS field for the data file name with the current date and time and the Status field with LOAD. You cannot change these fields; they change automatically as the data files are processed.
Note: For test mode, from the Definition menu, select Test a Study, choose Batch Data Load, and then select Batch Data Load Files.
Set your study context to the study you want to work with.
Enter a data file group name in the Name field.
Enter a separate record for each data file to be loaded.
To select files from a directory, see "Selecting Data Files".
Enter the appropriate Format Mask for the files.
Save your input.
To continue, go to "Loading Data".
Table 7-4 Maintain Batch Load Data File Field Details
To select data files from a directory to include in a data file group:
Click the Select Server Files button in the Maintain Batch Load Data File window to reach the Select Server Files window.
Enter the directory that contains your data files.
Specify a query filter, if necessary. The wildcard % is allowed.
Click the Get Directory Information button for information on this directory.
Note:
If information is retrieved a second time for the same directory in the same session, and if Refresh? is selected, another query goes to the operating system; otherwise, you would see outdated information.

Click in the Select column next to each data file you want to include. You can use the Select All and Deselect All buttons and the navigation buttons at the bottom of the window.
Save your input.
If you are using the standard Batch Data Load, select Batch Data Load from the Conduct menu. If you are using Out of Study Batch Data Load, select Out of Study Load from the Conduct menu. After that the procedure is the same.
To successfully load data from an external source into Oracle Clinical tables you must perform all three batch data load stages, in order: Load, Prepare, and Transfer; see "Batch Data Load Stages". You can choose to perform a single stage or multiple stages at a time. Performing a single stage allows you to check the error file just for that stage and takes less time.
Select one of the following menu options:
Load Batch Data
Prepare Batch Data
Transfer Batch Data
Load/Prepare Batch Data
Load/Prepare/Transfer Batch Data
Load/Prepare/Privileged Transfer. This option allows you to update locked data, and is available only if you have special privileges. If you select this option the system automatically updates locked data without prompts if updates to additional data are loaded.
The appropriate Parameterized Submission (PSUB) form is displayed with the parameters required for the stage or stages you selected.
Enter values for the job's parameters. The parameters displayed depend on the stage or stages you selected. The complete list of parameters includes:
Table 7-5 Standard and Out of Study Batch Data Load Parameters
Stage | Parameter Name | Description |
---|---|---|
Load | Datafile Group | ID of the data file group you want to load. A list of values is available. |
Load | Max No. of Bad Data Allowed During Load | Maximum number of bad records allowed before the system ends the entire process. If fewer than the maximum number of bad records are found, the batch load continues; the bad records are not loaded, but are reported in the log file. |
Prepare | OS File ID for Prepare | ID of the data file group you want to prepare. A list of values is available. |
Prepare | Max No. of Errors Allowed During Prepare | Maximum number of bad records allowed before the system ends the entire process. If fewer than the maximum number of bad records are found, the batch load continues; the bad records are not loaded, but are reported in the log file. |
Prepare | Prepare to Completion? | Whether to complete the Prepare phase or to accept a partial completion. |
Prepare | Spool Directory for Reloadable Data Files | If Prepare to Completion is Y, the system populates this field with the appropriate value for the user, according to the values of BDL_ROOT and BDL_USERDIRS in the OCL_STATE local reference codelist, and the user's account name (minus "OPS$" if it contains OPS$). |
Transfer | Data File Group ID for Transfer | ID of the data file group you want to transfer. A list of values is available. |
Transfer | Transfer Mode | The mode in which the transfer should take place. Values: INSERT transfers only pending insert records; UPDATE transfers only pending update records; BOTH transfers both pending inserts and pending updates. Note: If you are using Out of Study load and are loading DCIs with multiple DCMs, select BOTH for proper processing. The process treats the records in the first Received DCM in a Received DCI as inserts and the records in subsequent Received DCMs as updates to the existing Received DCI. |
Click the Job Details button to display the Submission Details window and set the appropriate fields; see Oracle Clinical Getting Started for details. Then click Back to return to the Submission of Module window.
Click Submit Job.
Click the Job Status button to monitor progress of the batch job.
If there is a problem loading the data, the load stops and an error message appears. To review details about why the load was aborted, look at the log file created during the batch load. See "Understanding the Error Log".
Note:
After the Transfer stage you must indicate whether you want to purge pending or bad updates or inserts by displaying the Maintain Batch Load Data File form.

To delete inserted (new) records included in the batch file, select Purge Inserts from the Special menu. Update Status should be UPDATED, and Insert Status should be PENDING or BAD INSERT.
To delete updates (additions to existing records), select Purge Updates from the Special menu. Insert Status should be INSERTED, and Update Status should be PENDING or BAD UPDATE.
If necessary (that is, if you have selected one of the steps that does not execute all three stages), select one of the subsequent steps to continue processing the batch data load.
Repeat Steps 2 through 6 until the batch data load is completed successfully.
You can use the Batch Job window to query on your job ID to see when it completes, or you can use the Maintain Batch Load Data File window or the Maintain Out of Study Batch Data Load File window to view the status of the batch job; the system automatically updates the Data File Status (DF Status) depending on your menu selection and the success of the current batch load stage.
In the Maintain Batch Load Data File window you can also change the comment text (only before processing starts) for a data file or purge either the pending inserts or pending updates, which are records of the data files not successfully inserted into the Temporary Batch Data Items, Temporary Batch DCIs, or Temporary Batch DCMs tables.
Purging Pending Inserts and Updates in the Maintain Batch Data Load Window In the bottom part of this window, place the cursor on the data file group name with the appropriate data file status and then select either Purge Updates or Purge Inserts from the Special menu.
Purging Pending Inserts and Updates for Out of Study Load Files To purge pending inserts or pending updates of Out of Study Load files:
Query for the data file in the Maintain Out of Study Batch Load Data File window.
In the bottom half of the window the last part of each data file group name contains the study name for which the data was loaded. Note the study name and the data file group names for which you want to purge inserts and/or updates.
Navigate to the Maintain Batch Data Load Files window, select the appropriate study and query for the data file group name.
In the bottom part of this window, place the cursor on the data file group name with the appropriate data file status and then select either Purge Updates or Purge Inserts from the Special menu.
The log file keeps track of the status of every discrete job during the Transfer process, including the names of the data files being loaded, summary information, and errors. If the transfer is successful, the log shows the status at the beginning and at the end.
Note:
The default error log is overwritten at each stage of the batch data load. To retain the error log you must enter a new name in the Output File field of the Submission Details window of PSUB.

The batch data loader also performs error checking during the Prepare phase. To diagnose errors, follow these steps:
Open the .log file to get a general overview of the errors. When data files fail, the log file shows a status code of RECEIVED or PREPARED for them; the data file itself does not contain these status codes.
Open the .out file to see specific details about the errors.
Table 7-6 Error Messages During Prepare Phase
Table 7-7 shows an example of an Error Reporting Table that is displayed in the .out file if there are errors associated with the Load/Prepare phase. The example reports errors for Investigators. An equivalent Error Reporting Table is created for errors associated with patients.
Table 7-7 Error Reporting Table for Investigators
Investigator | Wrong Study | No Site | Wrong Site | Wrong Location | TRDCI Site | Investigator Site | Investigator Location |
---|---|---|---|---|---|---|---|
3260 | No | No | Yes | No | Site 2 | Site 1 | |
In Table 7-7, the "Yes" in the Wrong Site column indicates that Oracle Clinical was expecting a different site from the one it found in the data file. In this case, it was expecting Site 1 and found Site 2.
To correct this error, either:
change all occurrences of Site 2 to Site 1 for Investigator 3260 in the data file
change the site assignment for Investigator 3260 to Site 1 in Oracle Clinical
Table 7-8 Other Examples of Possible Errors
If "Yes" Under... | Then... |
---|---|
No Site | No site was found. Oracle Clinical shows the correct site under "Investigator Site." |
Wrong Location | The wrong location was given. Oracle Clinical shows the correct location under "Investigator Location." |
Wrong Study | The wrong study was given. In this case, Oracle Clinical cannot anticipate the right study. |
After the Transfer stage, the log shows the number of records updated and the number of records inserted.
You can use a batch job to delete study data when appropriate. For example:
After loading an Out of Study file, you can delete all data associated with a bad data file or data file group.
You can also delete batch data loads of externally maintained data where the audit history is maintained in an external system.
There are also two types of manually entered data that can be deleted from Oracle Clinical without causing problems with the audit trail:
First-pass data entered into a study but never made accessible.
Data in development databases where audit is not required.
You can navigate to the batch job for deleting study data from the Conduct menu, by selecting Security, and then selecting Delete Study Information or Delete Study Information with Audit. The PSUB window appears. If you do not specify any values for DCI, document number, data file, or data file group, all data in the study is deleted.
If the study is a flexible study, after completing the batch data delete job, the system automatically runs the full expectedness calculation job. See Oracle Clinical Creating a Study for more information.
Note:
The batch data delete that is not audited is a hard delete; you will not be able to roll back changes made after such a delete. Use this function with extreme care.

Note:

Most users have access only to the Batch Data Deletion with Audit job. However, some users may have access to the Batch Data Deletion job, which has an additional Audit parameter allowing the user to choose whether to audit the data deletion.

You can change data entry information and maintain an audit trail by selecting Update from the Data Entry menu and making changes in Data Entry Update mode. You can also use the batch job Delete Study Information with Audit.
The Oracle Clinical table STUDY_DATA_DELETE_AUDIT records each time a batch data delete job is run. The table records:
the parameters used to invoke the deletion
the actual number of each object that is deleted by the run
This audit table enables you to document the use and impact of the data deletion facility, and may help you diagnose whether a problem has been caused by a hard data delete.
When you execute batch data delete with audit, the records in the deleted CRFs are backed up in a set of audit tables. For each source table that is deleted, a corresponding audit table is created with a suffix of $BDD (for instance, RECEIVED_DCIS$BDD). Each audit table contains a batch_job_id column followed by the same columns as its source table; for example, RESPONSES$BDD contains batch_job_id followed by the same columns as RESPONSES.
The job generates audit tables corresponding to the following Oracle Clinical tables:
Depending on parameter values specified for data file and data file group id during batch data deletion, data from the source table may be deleted and therefore audited.
If you specify a data file, the corresponding record in DATA_FILES is deleted.
If you specify a data file group and no data file, the corresponding record in DATA_FILE_GROUPS and related records in DATA_FILES are deleted.
If you accept all default parameters, all records for the study in DATA_FILE_GROUPS and DATA_FILES are deleted.
The DATA_FILES table stores the names of files used in a batch data load.

The DATA_FILE_GROUPS table stores groups of files that are submitted together.
Table 7-10 DATA Tables Used in Batch Data Deletion
Source Table | Corresponding Audit Table |
---|---|
RESPONSES | RESPONSES$BDD |
RESPONSE_LOBS | RESPONSE_LOBS$BDD |
RECEIVED_DCIS | RECEIVED_DCIS$BDD |
RECEIVED_DCMS | RECEIVED_DCMS$BDD |
DISCREPANCY_ENTRIES | DISCREPANCY_ENTRIES$BDD |
VALIDATION_REPORTED_VALUES | VALIDATION_REPORTED_VALUES$BDD |
ACTUAL_EVENTS | ACTUAL_EVENTS$BDD |
RDCI_HISTORY | RDCI_HISTORY$BDD |
DISCREPANCY_ENTRY_REVIEW_HIST | DISCREPANCYENTRYREVIEWHIST$BDD |
RECEIVED_PAGES_HISTORY | RECEIVED_PAGES_HISTORY$BDD |
RECEIVED_PAGES | RECEIVED_PAGES$BDD |
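Audit table names such as DISCREPANCYENTRYREVIEWHIST$BDD appear to drop underscores so that the name plus the $BDD suffix fits within Oracle's traditional 30-byte identifier limit. The sketch below models that naming under this inference; the underscore-stripping rule is an assumption drawn from the names in the table above, not a documented algorithm.

```python
# Hypothetical sketch of deriving a $BDD audit-table name under Oracle's
# traditional 30-byte identifier limit. The underscore-stripping fallback is
# an inference from names such as DISCREPANCYENTRYREVIEWHIST$BDD in the
# table above, not a documented rule.

MAX_IDENTIFIER = 30
SUFFIX = "$BDD"

def audit_table_name(source: str) -> str:
    name = source + SUFFIX
    if len(name) > MAX_IDENTIFIER:
        name = source.replace("_", "") + SUFFIX   # drop underscores to fit
    return name

print(audit_table_name("RESPONSES"))                      # RESPONSES$BDD
print(audit_table_name("DISCREPANCY_ENTRY_REVIEW_HIST"))  # DISCREPANCYENTRYREVIEWHIST$BDD
```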
Data from definitional tables is not deleted during batch data delete. However, definitions related to the CRFs being deleted are audited in order to preserve identifying information. You can delete or modify definitional data in the tables below, especially if all related patient data is deleted. These tables are used to generate reports on the audited data.
As with definitional data, the form layout tables are audited in case users delete or modify relevant layouts after CRFs are deleted. Data from the source table is not deleted during a batch data delete, but you may be inclined to retire or modify layouts if all related patient data is deleted. You can use this layout information to reconstruct the patient data report for hard-deleted patient data.
Records from the tables listed below are not deleted during batch data delete. However, these tables will also be audited to preserve status at the time the patient data was deleted. You may be inclined to delete data from these tables if all related patient data is deleted. With auditing, if you delete all the data for a patient and then delete the patient, you can still find the patient enrollment data from the PATIENT_POSITIONS$BDD table.
Table 7-13 Reference Tables Used in Batch Data Deletion
Source Table | Corresponding Audit Table |
---|---|
CLINICAL_PLANNED_EVENTS | CLINICAL_PLANNED_EVENTS$BDD |
PATIENT_POSITIONS | PATIENT_POSITIONS$BDD |
OCL_STUDY_SITES | OCL_STUDY_SITES$BDD |
OCL_SITES | OCL_SITES$BDD |
STUDY_SITE_PATIENT_POSITIONS | STUDYSITEPATIENTPOSITIONS$BDD |
From the Conduct menu, select Conduct Reports, then select Batch Data Load to find several reports on batch loaded data.
This report would be better titled the Prepared Data Report because it displays data in the temporary tables following the successful completion of the Prepare stage of a batch data load. Data that has gone through the Transfer stage (after the Prepare stage) does not appear in the report.