Oracle® Health Sciences Translational Research Center User's Guide
Release 3.0.2.1

E35681-09

4 Cohort Query


4.1 Overview

Cohort Explorer (CE) in v3.0 contains numerous user interface enhancements that improve the search for patients or subjects based on the context selection. The tools are organized within the Cohort Query tab, which is the second tab from the left on the top row. In this tab, you create, run, and save queries that select subsets of patients or subjects from your patient or subject database.

You have a wide array of topics, each with unique data elements that you can select or specify to focus on the particular subset of patients or subjects you are looking for. Each time you specify criteria for a particular topic, CE recognizes your selection as a discrete statement and preserves the definition as part of the query. You can create as many criteria statements as you want for a single query to identify the patient or subject count or list that meets your requirements.

Along with defining data criteria (for example, Gender = Male), you also configure the logic of a particular criteria statement to be either inclusion or exclusion. For each query you create (or view), the entire definition displays in the main window, where you can make changes and re-run the query iteratively to view the impact of your query design in real time.

4.2 Cohort Criteria Selection

The query criteria are grouped into three broad data categories: Patient Information, Clinical Data, and Genomic Data. Each category is divided into multiple topics, where you can drill into the specific parameters for that particular topic. The categories and topics are summarized in the following table:

Table 4-1 Cohort Selection Criteria

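| Category | Topic |
| --- | --- |
| PATIENT INFORMATION | Demographics |
| PATIENT INFORMATION | Consent |
| CLINICAL DATA | Diagnosis |
| CLINICAL DATA | Clinical Encounter |
| CLINICAL DATA | Procedure |
| CLINICAL DATA | Medications |
| CLINICAL DATA | Patient History |
| CLINICAL DATA | Test or Observation |
| CLINICAL DATA | Specimen |
| CLINICAL DATA | Study |
| CLINICAL DATA | Relative Time Events |
| GENOMIC DATA | Sequence Variants |
| GENOMIC DATA | Copy Number Variation |
| GENOMIC DATA | Microarray Expression |
| GENOMIC DATA | RNA-seq Expression |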


To display the topics for a particular category, select the arrow to the left of that category name. The left-hand column slides open and reveals the available topic areas for that category.

Figure 4-1 Inclusion and Exclusions


To add criteria to a query, select the topic that has the particular type of data you want to use for your definition. Each topic provides unique drop-down lists, searchable code lists or other parameters that are appropriate for that particular topic.

For example, within the Demographics tab you can specify the count of patients with a particular gender or age range, or who live in a particular location. In contrast, the Diagnosis tab enables you to specify that you want patients with a specific diagnosis (by name or by code), which may have a particular onset date or date range.

To specify Female patients between the ages of 40 and 45:

  1. Click the arrow next to Patient Information to display the Demographics tab.

  2. Click Demographics.

  3. Select Female from the Gender drop-down list.

  4. Enter the range in years in Age (years).

  5. In the Insert as group, select Inclusion or Exclusion to indicate whether patients matching the definition are included or excluded.

  6. Click Submit.

At this point, the criteria you specified appear as a query statement in the appropriate section of the Query Patients page.

Figure 4-3 Query Statement


Each time you select your criteria, CE adds the definition statement to the display on the right. You can add as many criteria statements to your query as you want, and the query definition expands each time you add a new statement. When you run the query, CE considers all the criteria rows in combination.
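As a rough illustration of how inclusion and exclusion statements might combine, the following Python sketch models each criteria statement as a small object and evaluates a query against sample records. The `Criterion` class, the `run_query` function, and the sample patients are hypothetical and are not part of CE. The combination rule used here (a patient must match every inclusion statement and no exclusion statement) is an assumption made for the sketch, not CE's documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Patient = Dict[str, Any]


@dataclass
class Criterion:
    """One criteria statement (hypothetical model, not a CE API)."""
    description: str
    matches: Callable[[Patient], bool]
    include: bool = True  # True = inclusion, False = exclusion


def run_query(patients: List[Patient], criteria: List[Criterion]) -> List[Patient]:
    # Assumed rule: keep a patient only if every inclusion statement
    # matches and every exclusion statement does not match.
    return [
        p for p in patients
        if all(c.matches(p) == c.include for c in criteria)
    ]


# Made-up sample records for illustration only.
patients = [
    {"id": "P1", "gender": "Female", "age": 42, "smoker": False},
    {"id": "P2", "gender": "Female", "age": 50, "smoker": False},
    {"id": "P3", "gender": "Male", "age": 41, "smoker": False},
    {"id": "P4", "gender": "Female", "age": 44, "smoker": True},
]

criteria = [
    Criterion("Gender = Female", lambda p: p["gender"] == "Female"),
    Criterion("Age between 40 and 45", lambda p: 40 <= p["age"] <= 45),
    Criterion("Patient History = Smoking", lambda p: p["smoker"], include=False),
]

cohort = run_query(patients, criteria)
print([p["id"] for p in cohort])
```

Adding or removing a statement and re-running `run_query` mirrors the iterative refinement described above.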

4.2.1 Required Fields for Criteria

Most of the criteria prompts include required fields. These are identified with an asterisk. For example, you are required to indicate one or more Diagnoses for a patient, along with a date of onset or other parameters.

If you submit a criteria statement without addressing a required field, the system prompts you to complete it.

The criteria selection topics are summarized in the following sections.

4.2.2 Patient Information or Subject Information

4.2.2.1 Demographics

Figure 4-5 Patient Information Demographics options


Table 4-2 Demographics Details

| Field Name | Definition | Sample Value or Values |
| --- | --- | --- |
| Patient ID or Subject ID | Double-blinded unique identifier for the patient. Also known as the Oracle ID. | - |
| Gender | Patient's gender | Male, Female |
| Marital Status | Patient's marital status | Married, Single, Separated, Divorced |
| Age (in Years) | Patient's chronological age | - |
| Date of Birth | Patient's date of birth | - |
| Deceased Date | Patient's date of death | - |
| Ethnicity | Code to reflect patient's ethnicity | - |
| Race | Code to reflect patient's race | - |
| City | Code for patient's city of residence | - |
| State | Code for patient's state of residence | - |
| Zip Code | Code for patient's zip code | - |
| County | Code for patient's county | - |
| Country | Code for patient's country | - |


4.2.2.2 Consent

Table 4-3 Consent Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Consent Type Code | Authorization for specified medical care. | Procedure Consent, Specimen Consent |
| Consent Status | The status of the consent form. | Active, Pending, Refused |
| Consent Start Date | The period start date of the patient's consent. | - |
| Consent End Date | The period end date of the patient's consent. | - |


4.2.3 Clinical Data

4.2.3.1 Diagnosis

Note:

Click the magnifying glass icon for Diagnosis to search either by Diagnosis Code or Diagnosis Name.

Table 4-4 Diagnosis Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Diagnosis | Code that classifies the patient's clinical condition. | Most commonly identified with ICD codes. |
| Diagnosis Status | Code that reflects the status of the diagnosis. | Active, New, Recurring |
| Onset Date | Date of the onset. | - |
| Reported Date | Date when the diagnosis was recorded by a service provider. | - |
| End Date | Date of resolution. | - |
| Age at First Onset | Age of the patient at first onset (years). | - |
| Anatomical Site | Anatomical site or sites related to the diagnosis. | - |


4.2.3.2 Clinical Encounter

Figure 4-8 Clinical Encounter


Note:

Click the magnifying glass icon for Encounter Type to search either by Encounter Code or Encounter Name.

Table 4-5 Clinical Encounter Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Encounter Type | Type of clinical encounter that the individual has undergone. Valid values may include inpatient and outpatient. | Inpatient, Outpatient |
| Location | Facility where the encounter took place, generally the name of the hospital, clinic, doctor's office, and so on. | Sequoia Hospital, Medical Associates, Mass General Hospital Ear and Nose Dept |
| Time | Date when the encounter took place. | - |
| Datasource | Name of the data system that the clinical data comes from. | EMR1, EMR2 |


4.2.3.3 Procedure

Note:

Click the magnifying glass icon for Procedure to search either by Procedure Code or Procedure Name.

Table 4-6 Procedure Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Procedure | A discrete intervention performed by a clinician. Procedures are most commonly identified with CPT codes. | - |
| Procedure Type | A subcategorization of procedures. | Surgical, Radiology, Diagnostic |
| Procedure Start Date | The date when the procedure began. | - |
| Procedure End Date | The date when the procedure concluded. | - |


4.2.3.4 Medication

Note:

Click the magnifying glass icon for Medication to search either by Medication Code or Medication Name.

Table 4-7 Medication Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Medication | A pharmaceutical substance intended to provide therapeutic benefit. | NDC codes, RxNorm codes |
| Medication Start Date | The start date for the administration of the medication. | - |
| Medication End Date | The end date for the administration of the medication. | - |
| Dosage | The medication's dosage. | - |
| Dosage Units | Code for the medication dosage's units. | - |
| Medication Outcome | Clinical outcome of taking a given medication, as assessed by a clinician or medical professional. | Partial Recovery, Full Recovery, Adverse Reaction |


4.2.3.5 Patient History

Figure 4-11 Patient History


Note:

Click the magnifying glass icon for Patient History to search either by Patient History Code or Patient History Name.

Table 4-8 Patient History Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Patient History | A coded representation of the patient history. | Smoking, Obesity |
| Patient History Start Date | Date when this history was known to be active. | - |
| Patient History End Date | Date when this history is known to no longer be active. | - |
| Amount | Numerical value. | For example, 1 |
| Amount Units | Unit of measure. | For example, mmHg |
| Frequency | Numerical value. | For example, 12 |
| Frequency Units | Unit of measure. | For example, TPDAY (times per day) |
| Text Value or Code | Value of the history, represented as text or as a code. | Yes, No, Frequently, Rarely |
| History Applicable To | For familial history, the type of blood relative who may have a history value. | Father, Mother, Paternal Grandmother, Paternal Grandfather |


4.2.3.6 Test or Observation

Figure 4-12 Test or Observation


Note:

Click the magnifying glass for Test or Observation to search either by Test or Observation Code or Test or Observation Name.

Table 4-9 Test or Observation Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Test or Observation | Any kind of medical intervention performed to aid in the diagnosis or detection of disease. | Blood Pressure |
| Test Date | The date the test was performed. | May-12-2012 |
| Result (numeric) | The test result. | 140, 90 |
| Result (numeric) Units | The test units of measure. | mmHg |
| Result String (text) | Textual test result; also notes or remarks for the test. | 140/90 |


4.2.3.7 Specimen

Note:

The three fields Specimen Type, Specimen Number, and Specimen Vendor are used in combination to search for particular data. Click the magnifying glass icon next to each field to specify the appropriate criteria.

Table 4-10 Specimen Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Specimen Type Name | A coded description for the type of specimen. | Blood, Urine, Sputum |
| Specimen Number | A unique identifier for a specimen. | - |
| Specimen Vendor Number | A unique identifier for the source of the specimen, namely the lab where the specimen was analyzed. | Harvard-CancerInstitute01, MIT-Whitehead-Lodish5 |
| Anatomical Site | The target site of the intervention. | - |
| Specimen Collection Date | Date the specimen was collected. | - |
| Specimen Amount | The amount of the specimen collected. | - |
| Units | The units of measure for the specimen collected. | mg/dL, millimoles/liter |


4.2.3.8 Study

Note:

Click the magnifying glass icon for Study to search either by Study Code or Study Name.

Table 4-11 Study Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Study (Name or Identifier) | A reference to a particular study. | NOT-A-STUDY (any genomic data that is not explicitly tied to a study is categorized here), Glioblastoma Study |
| Study Start Date | The start date for the study. | - |
| Study End Date | The end date for the study. | - |


4.2.3.9 Relative Time Events

So far, this chapter has covered clinical and other patient-related criteria along with their metadata. For example, you can specify a diagnosis and a date range for that diagnosis to search for a cohort within a particular time period.

Frequently, however, cohort or patient set searches depend on how events relate to each other in time, such as a patient being diagnosed with a disease after taking a medication.

CE enables you to find such patients based on relative time event dependencies. With this criteria set, you can search for patients who have a history of one clinical event (or genomic change) relative to another. For example, you can search for patients who took a certain medication or set of medications within 30 days before being diagnosed with a specific disease, or who had a procedure performed two months after starting a medication. The interface is intended to let you specify the criteria in near-natural language. On the back end, the Query Engine generates a complex query, but you stay within the user-friendly front-end interface. The Relative Time Events criterion is shown in the following figure.

Figure 4-15 Relative Time Events


This is a three-step process:

  1. Specify the event data to start your event.

  2. Specify the relative time condition. This is how the events in Step 1 and Step 3 relate to each other.

  3. Specify the related event to associate with the event in Step 1.

The result is a search criterion that is interpreted as: Search for patients that experienced Event X (at any time while, or a certain amount of time before or after) Event Y. Here X is the original event, Y is the associated event, and the time frame describes the time relationship between the two events for a given patient.

The events listed for selection in Step 1 and Step 3 come from the Clinical Data and Genomic Data categories.

The relative time conditions in Step 2, and how their meaning translates into SQL, are as follows:

  • at any time while: this condition finds events that overlap in time. For example, given two distinct events with start and end dates, the condition is satisfied as long as both events were in effect at some point in time.

    Example 1:

    Event in Step 1 - Started on May 1, 2010 - ended on June 29, 2011

    Event in Step 3 - Started on June 10, 2011 - till present

    The two events overlap between June 10 and June 29, 2011, so the condition is satisfied.

    Example 2:

    Event in Step 1 - Started on Dec 1, 2010 - ended on Dec 12, 2011

    Event in Step 3 - was performed on August 3, 2011

    The two events overlap on August 3, 2011, so the condition is satisfied.

    Example 3:

    Event in Step 1 - Started on May 1, 2010 - ended on June 29, 2011

    Event in Step 3 - Started on June 10, 2009 - till present

    The two events overlap between May 1, 2010 and June 29, 2011, so the condition is satisfied.

    The dates used to compare each event type are as follows:

    • Diagnosis: Reported Date - End Date (if null, use present date)

    • Procedure: Start Date - End Date (if null, use present date)

    • Medication: Start Date - End Date (if null, use present date)

    • Test or Observation: Test Date (single day event only)

    • Gene Variant: Specimen Collection Date - present date (the present date is used as the end date)

  • (more than / less than / exactly) [ ] (days / weeks / months / years) (before / after): this condition finds events whose start or occurrence dates are separated by the time period specified in the condition.

    Examples:

    Start date of Event in Step 1 is more than 5 weeks before Start date of Event in Step 3

    Occurrence date (for example, Test Date) of Test or Observation in Step 1 is less than 1 month before Start Date of Medication in Step 3

    Gene Variant Specimen Collection Date in Step 1 is exactly 1 day after Procedure Start Date in Step 3

    The dates used to compare each event type are as follows:

    • Diagnosis: Reported Date

    • Procedure: Start Date

    • Medication: Start Date

    • Test or Observation: Test Date

    • Gene Variant: Specimen Collection Date
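The two relative time conditions above can be sketched in Python as follows. This is only an illustration of the date logic, not the SQL that the Query Engine generates. The function names `overlaps` and `relative_match` and the fixed `TODAY` value are hypothetical. The sketch assumes that a missing end date means "till present", that a single-day event has the same start and end date, and that events on the same day do not count as "before" or "after" each other. Months and years are left out because the source does not state which calendar convention applies.

```python
from datetime import date
from typing import Optional

TODAY = date(2012, 1, 1)  # fixed "present date" so the examples are repeatable


def overlaps(start_1: date, end_1: Optional[date],
             start_3: date, end_3: Optional[date],
             today: date = TODAY) -> bool:
    """'at any time while': true if the two date ranges share at least one day."""
    end_1 = end_1 or today  # null end date: use present date
    end_3 = end_3 or today
    return start_1 <= end_3 and start_3 <= end_1


UNIT_DAYS = {"days": 1, "weeks": 7}  # months/years intentionally omitted


def relative_match(date_1: date, date_3: date, comparator: str,
                   amount: int, unit: str, direction: str) -> bool:
    """True if the Step 1 date is (comparator) amount unit (direction) the Step 3 date."""
    if direction == "before":
        gap = (date_3 - date_1).days
    elif direction == "after":
        gap = (date_1 - date_3).days
    else:
        raise ValueError(f"unknown direction: {direction}")
    if gap <= 0:  # Step 1 event is not on the requested side of Step 3
        return False
    threshold = amount * UNIT_DAYS[unit]
    if comparator == "more than":
        return gap > threshold
    if comparator == "less than":
        return gap < threshold
    if comparator == "exactly":
        return gap == threshold
    raise ValueError(f"unknown comparator: {comparator}")


# Examples 1 to 3 from the text above.
example_1 = overlaps(date(2010, 5, 1), date(2011, 6, 29), date(2011, 6, 10), None)
example_2 = overlaps(date(2010, 12, 1), date(2011, 12, 12),
                     date(2011, 8, 3), date(2011, 8, 3))
example_3 = overlaps(date(2010, 5, 1), date(2011, 6, 29), date(2009, 6, 10), None)
print(example_1, example_2, example_3)

# "more than 5 weeks before": January 1 is 59 days before March 1, 2011.
print(relative_match(date(2011, 1, 1), date(2011, 3, 1),
                     "more than", 5, "weeks", "before"))
```

The overlap test relies on the fact that two closed date ranges intersect exactly when each one starts on or before the day the other one ends.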

4.2.4 Genomic Data

4.2.4.1 Microarray Expression

Microarray expression is one of the growing set of genomic criteria that can be used to stratify patients. The genomic data driving this filter comes from the ODB model, specifically the W_EHA_RSLT_GENE_EXPR table and related tables. Gene expression data is associated with patients through the specimen used for genomic study analyses collected during study participation or other genomic testing.

The upper section of the criteria gives you the choice to specify Specimen Type and Anatomical site to further filter the results.

In the popup for Gene Expression, you are given two options (Array data types to choose from). For each option, you must specify the criteria when selecting patients based on their gene expression data.

  • One-channel: Single-channel data can be filtered on the following data fields:

    • Intensity: as a cutoff based on aggregates or on values, across one experiment or multiple experiments.

    • P-value: as a cutoff on values.

    • Call: on any of the three call types.

  • Two-channel: Data can be filtered on the following data field:

    • Log2Ratio: as a cutoff on values.

In Expression for Genes From, select at least one unique gene from any of the three specified sources:

  1. Ad-hoc list

  2. Pathway

  3. Gene Set

Figure 4-16 Microarray Expression


Figure 4-17 Microarray Expression


Note:

Click the magnifying glass icon for Specimen Type to search either by Specimen Name or Specimen Code. This also applies to the other search criteria. Criteria identified with * are required.

Table 4-12 Microarray Expression Screen Fields

| Prompt Heading | Definition | Sample Value |
| --- | --- | --- |
| Specimen Type | The specimen type description that identifies the specimen used for genomic tests. | Normal sample, tumor sample |
| Anatomical Site | Anatomical site name or code corresponding to the specimen collected. | Left lung, kidney, small intestine |
| Intensity | This field and the following two fields are specific to single-channel data. This field is a gene expression value range specification. Data driving this selection is in the W_EHA_RSLT_GENE_EXP table in ODB and is generally expected to be normalized to take full advantage of this criteria selection. Additionally, the data values should all be positive. The criteria conditions allow you to narrow down patients based on upregulated gene expression (for example, using a greater than 1.0 times mean condition), either across columns (same hybridization) or rows (single result file) of the data in the particular experiment. | Downregulated gene in hybridization example: Intensity is less than 2.0 times mean in the gene expression within a particular hybridization. |
| P-value | Significance value associated with the specific experiment. Optional. | P-value < 0.00001 |
| Call | Call made on the particular value, if present. All are taken unless specified. Optional. | P - Present, A - Absent, M - Marginal |
| Log2Ratio | Specific to dual-channel differential gene expression, relating the difference between, for example, the control sample and the tested sample intensity. Data for this selection is found in the W_EHA_RSLT_2CHANNEL_GXP table. | Floating-point value representing the log base 2 of a ratio. |
| Expression for genes from* | Selection of genes that are to be used for patient stratification based on the expression. At least one of the criteria below must be specified. | N/A |
| Ad-hoc List: Gene | List of one or more genes. | - |
| Pathway | Reference to a pathway stored on the reference side of the ODB model. This in turn corresponds to a list of genes whose Intensity values are to be compared. | - |
| Gene Set | User-defined collection of genes that you can reference across any UI instead of having to build ad-hoc lists of genes each time. | MyGeneSet1, GlioblastomaSmithLabGenes |
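To make the "times mean" Intensity condition more concrete, here is a minimal Python sketch of a cutoff relative to the mean intensity within one hybridization. The function `passes_intensity_cutoff`, the gene names, and the intensity values are made up for illustration. How CE actually computes its aggregates is not described in this guide, so the use of a simple arithmetic mean over all genes in the hybridization is an assumption.

```python
from statistics import mean
from typing import Dict


def passes_intensity_cutoff(intensities: Dict[str, float], gene: str,
                            factor: float, comparator: str = "greater than") -> bool:
    """Compare one gene's intensity with factor * mean of the hybridization.

    Assumption: the aggregate is the arithmetic mean of all intensities
    in the same hybridization (hypothetical, for illustration only).
    """
    cutoff = factor * mean(intensities.values())
    if comparator == "greater than":
        return intensities[gene] > cutoff
    if comparator == "less than":
        return intensities[gene] < cutoff
    raise ValueError(f"unknown comparator: {comparator}")


# Made-up normalized, positive intensities for one hybridization (mean = 300.0).
hybridization = {"TP53": 120.0, "EGFR": 480.0, "MYC": 300.0}

# EGFR is above 1.0 times the mean, so it meets an upregulation-style condition.
print(passes_intensity_cutoff(hybridization, "EGFR", 1.0, "greater than"))
# TP53 is below 0.5 times the mean (150.0).
print(passes_intensity_cutoff(hybridization, "TP53", 0.5, "less than"))
```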


For Metadata Filter details for Microarray Expression, see Metadata Filters.

4.2.4.2 Sequence Variants

The Variants criteria selection interface falls under the category of Genomic criteria and enables you to query for patients based on results present in the W_EHA_RSLT_SEQUENCING table and related tables in the ODB model.

In the Sequence Variant screen, you have the option to choose Specimen Type and Anatomical Site to associate with the variant criteria selected in this screen. Currently, the following five main options are available for searching:

  • having Variants in selected Genes: this enables you to search for variants in specific genes.

  • having selected Genomic Variants: this enables you to search for specific known variants by their COSMIC or dbSNP identifiers.

  • having Variants within specified Genomic Region: this enables you to search for variants in a specified genomic location, such as chr1:19094593-29302393 or the whole of chr1.

  • having Zygosity: this enables you to search for variants with a specific zygosity.

  • having Genotypes: this enables you to search for specific genotypes such as AT, AA, or wildtype (same as reference), and so on.

Additionally, there are other parameters, attributes, and metrics associated with variants that you can use to filter further on specific variants. These options are:

  • Specifying variants by their attributes.

  • Specifying variants by their non-synonymous substitution scores.

  • Specifying quality metrics filters depending on the sequencing file type, such as VCF, MAF, and CGI masterVar.

Figure 4-18 Sequence Variants


Table 4-13 Variants Screen Fields

| Field Name | Definition | Sample Value |
| --- | --- | --- |
| Specimen Type, Anatomical Site | See the descriptions of these fields in the Microarray Expression section. | As in Microarray Expression section. |
| Having Variants in selected Genes | Variant mode selection: specify variants in specified genes. | N/A |
| Having selected Genomic Variants | Variant mode selection: specify known variants by their identifier. At least one mode of variant specification must be given. | N/A |
| Having Variants within Specified Genomic Region | Variant mode selection: specify variants in a specific genomic location, such as chr1:1234324-3434333, chr7:1000, chr3, or chr2:1000+200. | N/A |
| Genomic Variant ID | Known gene variant reference identifiers, such as dbSNP or COSMIC. Allows selecting multiple values. | rs56289060, 905944 |
| DNA Reference Version | Specifies variants in a selected reference version. Allows multiple selection, except for the 'having Genotype' option, where only a single selection is allowed. | GRCh37 |
| At Genomic Position | Specifies a genomic region in which variants are to be searched. | chr7, chr7:1000, chr3:1000-2000, chr2:1000+200 |
| Genotype | Lists the available genotype values for the selected genomic position or genomic variant based on the position. Also displays the wildtype base (same as the reference base). You can select a combination of two genotypes, or just one genotype, and search for patients with the selected genotypes. | N/A |
| Additional Variant Information (Optional) | Additional criteria used to filter variants based on variant type and variant impact features. Used in the context of any of the radio button selections above. | N/A |
| Variant Type and corresponding Variant Impact | You can select any of the supported variant types along with additional variant metadata, such as each variant type's impact on the resulting protein. Note: not all Variant Impact annotations apply to each Variant Type. | Variant Type: Substitution, Insertion, Deletion, Indel, Complex. Variant Impact: Synonymous, Missense, Nonsense, Unknown. |
| Variant Status | Specifies which variants to consider: whether the variant should be known or novel. The default considers all variants. | Known, Novel |
| Strand | Gene transcription direction attribute. By default, all directions of transcription are included. | + means forward, - means reverse |
| Ad-hoc List: Gene; Pathway; Gene Set | See the descriptions of these fields in the Microarray Expression section. | As in Microarray Expression section. |
| At Genomic Position | Specify the genomic location for the variants to occur in. The format should be chr#:from-to, or chr# if the entire chromosome is to be used. | chrX:13000-120000, chr7 |
| Non-synonymous Substitution Scores | Data for this section is loaded into the reference side of the ODB model from Ensembl, based on either the Polyphen algorithm or the SIFT algorithm. The prediction value can be specified numerically or, alternatively, using a Polyphen-specific or SIFT-specific annotation. Note: SIFT and Polyphen predictions are only available for known variants. | With Polyphen, a prediction between 0 and 1, or labeled as Benign, Probably Damaging, Possibly Damaging, or Unknown. With SIFT, a prediction between 0 and 1, or labeled as Deleterious or Tolerated. |
| Variant Parameters Depending on Sequencing File Type | You can specify more detailed filtering criteria based on data from three sequencing file formats: VCF, MAF, and CGI masterVar. Because each input file format uses different metadata to describe stored entities, the available parameters depend on the sequencing input format selected. VCF (Variant Call Format): Format.GQ range; you can specify upper or lower numeric values for this parameter. MAF (Mutation Annotation Format): Score; you can specify upper or lower numeric values for this parameter. Other parameters: Somatic Status, Somatic Score, Allele Read Count, Reference Read Count, Total Read Count, RMS Base Quality, RMS Mapping Quality, AD/DP Ratio. CGI masterVar (Complete Genomics masterVariation format): the available fields to search by are Allele Zygosity, Score VAF, Score EAF, Allele read count, and Reference read count. | See the documentation for each file format for appropriate value ranges (VCF 4.2 format, MAF 2.0-2.2 format, Complete Genomics masterVar format). For example, Allele Zygosity for CGI masterVar includes the het-alt, hom, half, and het-ref options. |


For Metadata Filter details for Sequence Variants, see Metadata Filters.

4.2.4.2.1 At Genomic Position

You can opt to specify genomic data selection using genomic coordinates. You specify the chromosome region in a standard format for the Variation and CNV data. You can specify a complete chromosome or part of a chromosome as criteria. Currently, a search can use only one chromosome region at a time.

The following chromosome region formats are supported.

  • CHR15:10000-200000: Considers the region between positions 10000 and 200000 in chromosome 15.

  • CHR15:1,200,000+5000: Considers 5000 bases upstream from position 1,200,000 in chromosome 15.

  • CHR15: Considers the whole of chromosome 15.

  • CHR15:1000: Considers the 1000th nucleotide position of chromosome 15.
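The four region formats listed above can be recognized with a small parser. The following Python sketch is a hypothetical helper, not part of CE. The `Region` type and `parse_region` function are invented for this example, as is the list of accepted chromosome names. The sketch assumes that the `+` form means a region that begins at the given position and extends the given number of bases beyond it. The guide does not define the exact boundaries, so treat that interpretation as an assumption.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chromosome: str
    start: Optional[int]  # None means the whole chromosome
    end: Optional[int]


# chr<name>, optionally followed by :<start>, optionally followed by -<end> or +<length>
_REGION = re.compile(
    r"^chr(?P<chrom>[0-9]{1,2}|X|Y|M|MT)"
    r"(?::(?P<start>[\d,]+)(?:(?P<op>[-+])(?P<second>[\d,]+))?)?$",
    re.IGNORECASE,
)


def _to_int(text: str) -> int:
    return int(text.replace(",", ""))  # allow thousands separators such as 1,200,000


def parse_region(text: str) -> Region:
    m = _REGION.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported region format: {text!r}")
    chrom = "chr" + m.group("chrom").upper()
    if m.group("start") is None:
        return Region(chrom, None, None)   # whole chromosome
    start = _to_int(m.group("start"))
    if m.group("op") is None:
        return Region(chrom, start, start)  # single nucleotide position
    second = _to_int(m.group("second"))
    # Assumption: "+N" extends N bases beyond the start position.
    end = second if m.group("op") == "-" else start + second
    if end < start:
        raise ValueError(f"region end precedes start: {text!r}")
    return Region(chrom, start, end)


for example in ("CHR15:10000-200000", "CHR15:1,200,000+5000", "CHR15", "CHR15:1000"):
    print(example, "->", parse_region(example))
```

Returning `None` for both bounds keeps "whole chromosome" distinct from any numeric range, which avoids needing chromosome lengths in the sketch.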

4.2.4.3 Copy Number Variation

The Copy Number Variation criteria selection interface falls under the category of Genomic criteria, where you can query for patients based on results present in the W_EHA_RSLT_COPY_NBR_VAR table and related tables in the ODB model. Currently, this table contains data from two platforms. One data type is from Complete Genomics, and the other is from array-based systems such as the Affymetrix Genome Wide SNP 6 array, with data in SEG format.

As for Gene Expression, you can optionally select specimen type and anatomical site.

Next, select the CNV Result Type, which distinguishes data from numeric (array-based) platforms and categorized (sequencing-based) platforms.

You can then filter results based on the list of Copy Number Variation attributes. For the numeric CNV Result Type, the attribute is the SNP log2 Ratio. For the categorized CNV Result Type, the attributes are gain, loss, and equal, which indicate amplification, deletion, or no change in the copy number of a given gene or gene region.

Finally, specify the location of the Copy Number Variation, which is the gene or genomic position of interest.

Figure 4-19 Copy Number Variation - Numeric Based


Figure 4-20 Copy Number Variation - Categorized Based


Table 4-14 Copy Number Variation

| Prompt Heading | Definition | Sample Value or Values |
| --- | --- | --- |
| Study*, Specimen Type, Anatomical Site | See the descriptions of these fields in the Microarray Expression section. | As in Microarray Expression section. |
| CNV Result Type | Search CNV results belonging either to an array-based platform, such as the Genome_Wide_SNP_6 array, or to sequencing-based CNV data from Complete Genomics. | Array based or sequencing based |
| SNP Log2 Ratio (Segment Mean) | Values for segment mean for array-based CNV data, in the form of a range. You can also specify a single value in Log2 Ratio and search for results with a segment mean greater than the specified value. | Numeric value. It can accept negative values. |
| CNV Type | Copy Number Variation attribute indicating whether it is an amplification (gain), a deletion (loss), or no change (equal). | Gain, Loss, Equal |
| With confidence > (CNV Type score) | Copy number variation confidence score associated with CNV Type. The score is populated from the source file, and depending on the scoring method, the range can vary. | Numeric value; range can vary depending on source. |
| Called Ploidy | Values for ploidy can be given as a range; an upper bound, a lower bound, or both can be specified. For example, for duplication, called ploidy can be specified as 2. | Range of ploidy values to select on. For example, for duplication, it can be specified as between 1.5 and 2.5. |
| With confidence > (Ploidy score) | Confidence score associated with Called Ploidy. The score is populated from the source file, and depending on the scoring method, the range can vary. The higher the score, the greater the confidence that the called ploidy is correct. A lower bound can be specified. | Numeric value; range can vary depending on source. |
| CNV Location: in Genes from* | Selection of genes that are to be used for patient stratification based on Copy Number Variation. At least one of the criteria below must be specified. | N/A |
| Ad-hoc List: Gene; Pathway; Gene Set | See the descriptions of these fields in the Microarray Expression section. | As in Microarray Expression section. |
| At Genomic Position | Specify the genomic location for the variants to occur in. The format should be chr#:from-to, or chr# if the entire chromosome is to be used. | chrX:13000-120000, chr7 |
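As an illustration of the SNP Log2 Ratio (Segment Mean) filter, the Python sketch below selects array-based segments by a segment mean range and converts a log2 ratio into an approximate copy number. The `Segment` type, the function names, and the sample values are hypothetical. The conversion assumes a diploid baseline of two copies, which is a common convention rather than something stated in this guide. Treating both bounds as strict comparisons is likewise an assumption.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int
    end: int
    segment_mean: float  # log2 ratio; may be negative


def estimated_copy_number(log2_ratio: float, baseline: float = 2.0) -> float:
    # Assumes a diploid baseline: a log2 ratio of 0 corresponds to 2 copies.
    return baseline * 2 ** log2_ratio


def filter_segments(segments: List[Segment],
                    lower: Optional[float] = None,
                    upper: Optional[float] = None) -> List[Segment]:
    """Keep segments whose segment mean lies strictly inside the given bounds.

    Passing only `lower` mimics searching for results with a segment mean
    greater than a single specified value.
    """
    return [
        s for s in segments
        if (lower is None or s.segment_mean > lower)
        and (upper is None or s.segment_mean < upper)
    ]


# Made-up segments for illustration only.
segments = [
    Segment("chr7", 13000, 120000, 1.0),
    Segment("chr7", 150000, 300000, 0.0),
    Segment("chrX", 5000, 90000, -1.0),
]

amplified = filter_segments(segments, lower=0.5)
print(amplified)
print([estimated_copy_number(s.segment_mean) for s in segments])
```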


For Metadata Filter details for Copy Number Variation, see Metadata Filters.

4.2.4.4 RNA-seq Expression

The RNA-seq Expression criteria selection interface falls under the category of Genomic criteria where you can query for patients based on results present in the W_EHA_RSLT_RNA_SEQ table and related tables in the ODB model.

As for Gene Expression, you can optionally select specimen type and anatomical site. Next, you can filter results based on the RPKM values, Raw Counts, Median length and strand.

Finally, you should specify the location for searching RNA-seq expression results which is the gene or genomic position of interest.

Figure 4-21 RNA-seq Expression


Table 4-15 RNA-seq Expression

Prompt Heading | Definition | Sample Value or Values

Specimen Type, Anatomical Site

For more information, refer to the description of these fields under the Microarray Expression section.

As in Microarray Expression section.

RPKM

Represents 'Reads Per Kilobase of exon model per Million mapped reads': the calculated expression intensity, as a positive floating-point value or zero.

N/A

Raw Counts

Represents raw read counts as positive floating-point values, or zero if unavailable.

N/A

Median Length (Normalized)

A normalized region length calculation, as a positive floating-point value or zero.

N/A

Strand

Strand of stored gene.

N/A

RNA-seq Location: in Genes from*

Selection of genes that are to be used for patient stratification based on RNA-seq expression. At least one of the below criteria must be specified.

N/A

Ad-hoc List: Gene; Pathway; Gene Set

For more information, refer to the description under the Microarray Expression section.

As in Microarray Expression section.

For Transcript IDs

Search for Ensembl Transcript ID.

N/A

At Genomic Position

Specify the genomic location to search. The format should be chr#:from-to, or chr# if the entire chromosome is to be used.

chrX:13000-120000, chr7
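
The chr#:from-to and chr# formats described for At Genomic Position can be checked before you enter them. The following is a minimal sketch of such a check, not the application's parser. The function names are hypothetical, and two points are assumptions: that chromosome names are digits, X, Y, M, or MT, and that the comma in the sample value separates two independent positions.

```python
import re
from typing import List, NamedTuple, Optional

# chr# alone, or chr#:from-to. The accepted chromosome names are an assumption.
_POSITION = re.compile(r"^chr(\d+|X|Y|MT?)(?::(\d+)-(\d+))?$")


class GenomicPosition(NamedTuple):
    chromosome: str
    start: Optional[int]  # None means the entire chromosome
    end: Optional[int]


def parse_position(text: str) -> GenomicPosition:
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"expected chr#:from-to or chr#, got {text!r}")
    chromosome, start, end = match.groups()
    if start is None:
        return GenomicPosition(chromosome, None, None)
    start_i, end_i = int(start), int(end)
    if start_i > end_i:
        raise ValueError(f"'from' must not exceed 'to' in {text!r}")
    return GenomicPosition(chromosome, start_i, end_i)


def parse_positions(text: str) -> List[GenomicPosition]:
    # Assumption: several positions may be given, separated by commas.
    return [parse_position(part) for part in text.split(",") if part.strip()]


positions = parse_positions("chrX:13000-120000, chr7")
```

Here the first entry restricts the search to a range on chromosome X, and the second entry covers all of chromosome 7.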


4.2.4.5 Metadata Filters

All genomic criteria screens (Sequence Variant, Copy Number Variation, Microarray Expression, and RNA-seq Expression) have additional filter criteria based on the metadata associated with the specimens or patients. This option is present at the bottom of each of the genomic criteria screens.

Figure 4-22 Metadata Filters


Once you expand the Metadata Filters option, click Add Metadata Attribute to open the Select Metadata Attribute dialog.

Figure 4-23 Select Metadata Attributes


Then search based on Attribute Name, Scope, or Category to get a list of attributes associated with the metadata. Select the attribute to add as a Metadata Filter and assign a value for the added filter.

Table 4-16 Metadata Filters

Prompt Heading | Definition | Sample Value or Values

Attribute Name

Represents the metadata qualifier tag from the W_EHA_QUALIFIER table.

N/A

Scope

Scope is retrieved from the W_EHA_QLFR_TABLE table. Based on an internal mapping, the scope is shown as 'Per Result' when the value in the table is 'W_EHA_RSLT_FILE_SPEC_QLFR' and as 'Per Specimen' when the value in the table is 'W_EHA_RSLT_SPEC_QLFR'.

N/A

Category

Represents the metadata qualifier category tag from the W_EHA_QLFR_CATEGORY table.

N/A

Value

The datatype of this value depends on the selected metadata attribute. For a numeric attribute, enter a numeric value; for a date attribute, enter a date; for a character attribute, enter a character value.

N/A
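
The Scope mapping and the datatype-dependent Value described in Table 4-16 can be sketched as follows. The two table names and scope labels are taken from the table above. Everything else is an assumption for illustration: the function names, the datatype names "numeric", "date", and "character", and the ISO date format.

```python
from datetime import date, datetime
from typing import Union

# Mapping stated in the Scope row of Table 4-16.
SCOPE_LABELS = {
    "W_EHA_RSLT_FILE_SPEC_QLFR": "Per Result",
    "W_EHA_RSLT_SPEC_QLFR": "Per Specimen",
}


def scope_label(table_name: str) -> str:
    """Return the scope shown for a qualifier table name."""
    return SCOPE_LABELS[table_name]


def coerce_value(datatype: str, raw: str) -> Union[float, date, str]:
    """Convert the entered text to the type the selected attribute expects."""
    # Datatype names and the date format are assumptions, not product values.
    if datatype == "numeric":
        return float(raw)
    if datatype == "date":
        return datetime.strptime(raw, "%Y-%m-%d").date()
    if datatype == "character":
        return raw
    raise ValueError(f"unknown datatype: {datatype!r}")
```

Converting the value up front means that a mistyped number or date is rejected when the filter is defined, not when the query runs.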


4.3 Patient or Subject Count

Based on the Cohort Criteria, the application displays the Patient or Subject Count. The total number of patients or subjects in the database is displayed below it. This count is updated or refreshed each time you log in to the system and select the Query Patients tab.

Figure 4-24 Patient Count


4.4 Inclusion and Exclusion Criteria

Once you have added several criteria statements to your query, you can select the arrow to the left of the criteria statement to expand or collapse its detail. You can also expand or collapse the display for the Inclusions or Exclusions criteria. This enables you to focus your attention on one aspect of your query as required.

If you have specified multiple rows of criteria, CE considers the logic in combination when you run the query. The query logic differs slightly depending on whether the statements are designated as Inclusion or Exclusion.

You can continue to adjust the details of an existing statement without having to open the criteria selections on the left. Instead, you can select the icons on the right end of each statement.

Note:

Oracle recommends that you prepare the query definition in a structured format (line by line) prior to creating it in CE. Visualizing your definition may help you recognize a simple way to organize it and gives you a tool to validate the accuracy of the data you input in your system.
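
One way to prepare a definition line by line is to write each statement as the set of patients it matches and then combine the sets. The sketch below is illustrative only. This guide does not spell out the exact combination logic, so the sketch assumes that a patient must match every Inclusion statement and must not match any Exclusion statement. The function name, patient identifiers, and criteria are hypothetical.

```python
from typing import Iterable, Set


def run_query(all_patients: Set[str],
              inclusions: Iterable[Set[str]],
              exclusions: Iterable[Set[str]]) -> Set[str]:
    # Assumption: a patient must satisfy every Inclusion statement ...
    result = set(all_patients)
    for matched in inclusions:
        result &= matched
    # ... and must not satisfy any Exclusion statement.
    for matched in exclusions:
        result -= matched
    return result


# Hypothetical definition, one criteria statement per line.
all_patients = {"P1", "P2", "P3", "P4"}
male = {"P1", "P2", "P3"}      # Include: Gender = Male
has_cnv = {"P2", "P3", "P4"}   # Include: CNV in a gene of interest
on_drug = {"P3"}               # Exclude: a medication of interest

cohort = run_query(all_patients, [male, has_cnv], [on_drug])
```

Under these assumptions, only P2 remains: P1 and P4 each fail one Inclusion statement, and P3 is removed by the Exclusion statement.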

4.5 Final Patient or Specimen Count

Once you have defined at least one criteria statement, you can run the query. However, you can add as many additional criteria rows as you want. When you run a query, the system counts the number of patients that match the criteria as a subset of the Initial total count. CE also displays a count of the number of specimens for the patients identified by the query criteria. As you edit the criteria, the Patient Count and Specimen Count are updated each time you run the query.

To run a query and view the counts, perform the following steps:

  1. Select at least one row of criteria defined in the Inclusions or Exclusions options.

  2. Click Run Query.

  3. Click Patient Count to view the count of patients.

To the left of the Initial Patient Count display, there is an Options drop-down menu. If you select the down arrow, you can change the default display of the Specimen Count and hide the count.
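
The relationship between the Patient Count and the Specimen Count described in this section can be sketched as follows. This is a minimal illustration, assuming that the Specimen Count is the number of distinct specimens that belong to the patients matched by the query. The function name and the identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def cohort_counts(cohort: Set[str],
                  specimens_by_patient: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (patient count, specimen count) for the patients in the cohort."""
    specimens: Set[str] = set()
    for patient in cohort:
        # A patient with no recorded specimens contributes nothing.
        specimens |= specimens_by_patient.get(patient, set())
    return len(cohort), len(specimens)


specimens_by_patient = {
    "P1": {"S1"},
    "P2": {"S2", "S3"},
    "P3": set(),
}
patient_count, specimen_count = cohort_counts({"P2", "P3"}, specimens_by_patient)
```

In this example the two counts happen to be equal, but they are independent: a patient can have several specimens or none.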

Figure 4-27 Patient Count


4.6 Query Library

Once your query definition is complete, you may want to save it. When you save it, you create a unique name and have the option to enter a narrative description. CE preserves the criteria logic, but does not retain any information about the patients that were counted (or listed) when you last ran the query. As you create and save queries over time, you establish a valuable knowledge base that you can reference in the future.

You can open a query at a later date to review your logic and possibly reuse all or part of it in a new query. In the upper left corner of the main window, there is a Query Library drop-down menu. This menu provides the ability to save the name of a query you have defined as well as to access a query that you have previously saved.

4.6.1 Load Query

To rerun a query or edit its definition from the Query Library menu, perform the following steps:

  1. Select Load Query.

  2. Enter all or part of a query name in the right-hand text box.

  3. Search for your query by entering your criteria from the Query Name drop-down menu.

  4. Select the required query.

  5. Click Submit. The query definition is displayed on the main page.

4.6.2 Save Query

Once you have at least one query criteria statement, you can select Save Query from the Query Library menu. You have the option to save as a new query (with a new name), or overwrite an existing query from the library.

If you enter a new query name and that name is already recognized in your library, the system prompts you to determine if you want to overwrite your existing query. If not, you have the option of entering a new query name.

Note:

When you save a query, the only information that is preserved is the criteria definition itself. The criteria statements are saved as logical elements. It is important to understand that neither the patient count nor any of the underlying patient data is saved with a query. Each time the query is opened, it must be rerun to view current patient counts. Therefore, patient counts for the same query are expected to change over time.
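
The save and load behavior described in this section can be sketched with a small in-memory stand-in for the Query Library. This is not the application's storage model. The class, its methods, the JSON layout, and the case-insensitive partial-name match are all assumptions made for illustration.

```python
import json
from typing import Dict, List


class QueryLibrary:
    """In-memory stand-in for the Query Library (hypothetical)."""

    def __init__(self) -> None:
        self._queries: Dict[str, str] = {}

    def save(self, name: str, criteria: List[dict],
             description: str = "", overwrite: bool = False) -> bool:
        # Mirror the overwrite prompt: refuse unless the caller confirms.
        if name in self._queries and not overwrite:
            return False
        # Only the criteria definition is stored: no counts, no patient data.
        self._queries[name] = json.dumps(
            {"description": description, "criteria": criteria})
        return True

    def search(self, partial_name: str) -> List[str]:
        # Assumption: a case-insensitive match on all or part of the name.
        needle = partial_name.lower()
        return sorted(n for n in self._queries if needle in n.lower())

    def load(self, name: str) -> List[dict]:
        return json.loads(self._queries[name])["criteria"]


library = QueryLibrary()
criteria = [{"logic": "include", "topic": "Gender", "value": "Male"}]
library.save("Male cohort", criteria, description="Baseline")
```

Because a loaded query contains only criteria, you must run it again to obtain counts that reflect the current contents of the database.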