6 Use Case Examples

This chapter lists use cases for Oracle Health Sciences Omics Data Bank. It contains the following topics:

Overview of Use Cases
Use Cases Accompanied by Query Examples

6.1 Overview of Use Cases

This section contains the following use case scenarios:

Scenario 1—Find patients who are poor responders for drug A and have a mutation in the promoter region of gene A.
Scenario 2—Show expression level of TP53 mutant by cancer tissue.
Scenario 3—Ability to query subjects for established molecular tests. For example, the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Scenario 4—Ability to research a gene in the sample set.
Scenario 5—Enable researcher to select a patient cohort based on the expression level for a set of genes.
Scenario 6—Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm).
Scenario 7—I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Scenario 8—What is the frequency of co-mutation of two genes in a data set?
Scenario 9—Display all patients whose cancer cells had a deletion in gene X.
Scenario 10—Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).
Scenario 11—Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.

Note:

To run some of the use case queries, you have to create global temporary tables.

6.2 Use Cases Accompanied by Query Examples

6.2.1 Scenario 1

Use Case - Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.

Areas

Variant
Gene Annotation
Test and Results
Drug Info

Output queries tables from - ODB+CDM

Query

SELECT VRT.RESULT_SPEC_WID, CH.CHROMOSOME, VRT.START_POSITION, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING vrt, w_eha_chromosome ch,

(

SELECT V.ROW_WID, SG.GENE_WID, V.REPLACE_TAG FROM

W_EHA_VARIANT v,

(

SELECT GSG.SOURCE_WID, GSG.GENE_WID, (GSG.START_POSITION - PP.PROMOTER_OFFSET) as START_POSITION,

GSG.START_POSITION AS END_POSITION

FROM W_EHA_GENE G, W_EHA_GENE_SEGMENT GSG, W_EHA_PRODUCT_PROFILE PP

WHERE g.HUGO_NAME IN ('BRCA2') /*-- Enter the approved HUGO symbols of target genes here.*/

AND GSG.GENE_WID = G.ROW_WID

)SG

WHERE V.SOURCE_WID = SG.SOURCE_WID

AND (V.START_POSITION BETWEEN SG.START_POSITION AND SG.END_POSITION)

)vi, w_eha_rslt_study

WHERE VRT.VARIANT_WID = VI.ROW_WID

AND VRT.GENE_WID = VI.GENE_WID

AND W_EHA_RSLT_STUDY.RESULT_STUDY_NAME = 'STUDY1'

AND VRT.RESULT_STUDY_WID = W_EHA_RSLT_STUDY.ROW_WID

AND VRT.RESULT_SPEC_WID in (250); /*-- Select a list of specimen patients who are poor respondents of Drug A to test for mutation.*/

6.2.2 Scenario 2

Use Case - Show expression level of TP53 mutant by cancer tissue.

Areas

Variant
Gene Annotation
Gene Expression
Biospecimen Data

Output queries tables from - ODB+CDM

Query 2 -

SELECT RSLT_EXPR.RESULT_SPEC_WID, RSLT_EXPR.INTENSITY, RSLT_SEQ.START_POSITION, RSLT_SEQ.REPLACE_TAG

FROM (SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, V.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_GENE G, W_EHA_VARIANT V

WHERE VRT.VARIANT_WID = V.ROW_WID

AND VRT.GENE_WID = G.ROW_WID

AND

G.HUGO_NAME IN ('TP53') /*-- Enter the Approved HUGO SYMBOL of target genes here.*/

) RSLT_SEQ,

(SELECT R_EXP.RESULT_SPEC_WID, R_EXP.INTENSITY

FROM W_EHA_RSLT_GENE_EXP R_EXP

WHERE R_EXP.GENE_WID IN

(SELECT G.ROW_WID

FROM W_EHA_GENE G

WHERE G.HUGO_NAME IN ('TP53')) /*-- Enter the Approved HUGO SYMBOL of targeted genes.*/

) RSLT_EXPR

WHERE (RSLT_EXPR.RESULT_SPEC_WID = 1 AND RSLT_SEQ.RESULT_SPEC_WID = 2);

6.2.3 Scenario 3

Use Case - Ability to query subjects for established molecular tests, for example the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.

Areas

Variant
Chromosomal Rearrangement
Gene Annotation

Output queries tables from ODB+CDM

Query -

SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G,

W_EHA_RSLT_SPECIMEN SP

WHERE VRT.GENE_WID = G.ROW_WID

AND g.HUGO_NAME IN ('KRAS') /*-- Enter the Approved HUGO SYMBOL of target genes here.*/

AND VRT.VARIANT_WID = VI.ROW_WID

AND VRT.RESULT_SPEC_WID = SP.ROW_WID

AND (SP.SPECIMEN_NUMBER in ('TCGA-02-0001-XXX-XXX')); /*-- Give a target list of specimens here.*/

6.2.4 Scenario 4

Use Case - Ability to research a gene in the sample set.

Areas

Gene Annotation

Output queries tables from - ODB: REFERENCE + RESULT

Query -

SELECT VRT.RESULT_SPEC_WID, G.HUGO_NAME, VRT.START_POSITION, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G

WHERE VRT.VARIANT_WID = VI.ROW_WID

AND VRT.GENE_WID = G.ROW_WID

AND g.HUGO_NAME IN ('BRCA2');

6.2.5 Scenario 5

Use Case - Enable researcher to select a patient cohort based on the expression level for a set of genes.

Areas

Cancer Diagnosis
Demographic
Biospecimen Data
QC Data
Gene Annotation
Expression

Output queries tables from - ODB+CDM

Query -

SELECT R_EXP2.RESULT_SPEC_WID, G.HUGO_NAME, R_EXP2.INTENSITY

from

W_EHA_RSLT_GENE_EXP r_exp2, W_EHA_GENE G,

(SELECT AVG(R_EXP1.INTENSITY) EXP_AVG

FROM W_EHA_RSLT_GENE_EXP R_EXP1

WHERE R_EXP1.GENE_WID IN

(SELECT G1.ROW_WID

FROM W_EHA_GENE G1

WHERE G1.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP'))

) INTENSITY

WHERE R_EXP2.GENE_WID = G.ROW_WID

AND G.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP')

AND R_EXP2.INTENSITY > INTENSITY.EXP_AVG

ORDER BY R_EXP2.RESULT_SPEC_WID;

6.2.6 Scenario 6

Use Case - Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm)

Areas

Variant
Gene Annotation

Output queries tables from - ODB: REFERENCE

Query -

SELECT VG.REFERENCE_ID, VG.CODE_TYPE, VG.PREDICTION_SCORE, VG.CODE, VG.ABSOLUTE_POSITION, VG.CHROMOSOME, VG.REPLACE_TAG, PI.ACCESSION, P.AMINO_ACID_SEQUENCE

FROM W_EHA_GENE_COMPONENT GC, W_EHA_PROTEIN P, W_EHA_PROT_INFO PI,(

SELECT GCS.GENE_COMPONENT_WID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG, VP.PREDICTION_SCORE, PC.CODE, VP.STRUCTURE_WID, PC.CODE_TYPE

FROM W_EHA_GENE_COMP_SEGMENT GCS, W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, W_EHA_VARIANT_PREDICTION VP, W_EHA_PREDICTION_CODE PC

WHERE VP.VARIANT_WID = V.ROW_WID

AND VP.PREDICTION_CODE_WID = PC.ROW_WID

AND VX.VARIANT_WID = V.ROW_WID

AND V.SOURCE_WID = GCS.SOURCE_WID

AND V.START_POSITION <= GCS.END_POSITION

AND GCS.START_POSITION <= V.END_POSITION

AND PC.CODE_TYPE IN ('SIFT', 'polyphen')

AND PC.CODE IN (deleterious, possibly damaging, probably damaging) /*-- Filter by prediction code, confer W_EHA_PREDICTION_CODE table.*/

AND VP.PREDICTION_SCORE < '0.5' /*-- Filter by prediction score*/

)VG

WHERE GC.ROW_WID = VG.GENE_COMPONENT_WID

AND GC.COMPONENT_TYPE = 'CDS'

AND GC.PROTEIN_WID = P.ROW_WID

AND PI.PROTEIN_WID = P.ROW_WID

AND VG.STRUCTURE_WID = GC.STRUCTURE_WID;

6.2.7 Scenario 7

Use Case - I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?

Areas

Variant
Gene Annotation
Pathway Annotation

Output queries tables from - ODB: REFERENCE + RESULT

Query

SELECT S.RESULT_STUDY_NAME, VRT.RESULT_SPEC_WID, VI.PATHWAY_NAME, VI.PATHWAY_SOURCE_ID, VI.HUGO_NAME AS GENE_SYMBOL, VI.REFERENCE_ID, VI.ABSOLUTE_POSITION, VI.CHROMOSOME, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_RSLT_STUDY S, (

SELECT V.ROW_WID, GS.GENE_WID, GS.HUGO_NAME, GS.PATHWAY_NAME, GS.PATHWAY_SOURCE_ID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG

FROM W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, (

SELECT G.ROW_WID AS GENE_WID, G.HUGO_NAME, P.PATHWAY_NAME, P.PATHWAY_SOURCE_ID, GSG.START_POSITION, GSG.END_POSITION, GSG.SOURCE_WID

FROM W_EHA_GENE_SEGMENT GSG, W_EHA_GENE G, (

SELECT DISTINCT PH.PATHWAY_NAME, PH.PATHWAY_SOURCE_ID,PP.HUGO_SYMBOL

FROM W_EHA_PATHWAY_PROTEIN PP, W_EHA_PATHWAY PH

WHERE PP.PATHWAY_WID = PH.ROW_WID

AND PATHWAY_NAME LIKE ('%thyroid hormone%')

)P /*-- Select either a specific KEGG pathway or search for pathway name keywords. For example, All Thyroid hormone specific pathways.*/

WHERE G.HUGO_NAME = P.HUGO_SYMBOL

AND GSG.GENE_WID = G.ROW_WID

)GS

WHERE V.SOURCE_WID = GS.SOURCE_WID

AND V.START_POSITION <= GS.END_POSITION

AND GS.START_POSITION <= V.END_POSITION

AND VX.VARIANT_WID = V.ROW_WID

) VI

WHERE VRT.VARIANT_WID = VI.ROW_WID

AND VRT.GENE_WID = VI.GENE_WID

AND VRT.RESULT_STUDY_WID = S.ROW_WID;

6.2.8 Scenario 8

Use Case - What is the frequency of co-mutation of two genes in a data set?

Areas

Variant

Output queries tables from - ODB: REFERENCE + RESULT

Query -

select count(a_and_b.result_spec_wid) concurrent_cnt

FROM

(SELECT DISTINCT VRT.RESULT_SPEC_WID

FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g

WHERE VRT.GENE_WID = G.ROW_WID

AND g.HUGO_NAME IN ('KRAS')

intersect

SELECT DISTINCT VRT.RESULT_SPEC_WID

FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g

WHERE VRT.GENE_WID = G.ROW_WID

AND G.HUGO_NAME IN ('PTEN')

) a_and_b;

/* count all specimen with seq result */

SELECT COUNT(DISTINCT VRT_ALL.RESULT_SPEC_WID) CNT FROM W_EHA_RSLT_SEQUENCING VRT_ALL;

6.2.9 Scenario 9

Use Case - Display all patients whose cancer cells had a deletion in gene X.

Areas

Output queries tables from ODB+CDM

Query

SELECT COUNT(*) FROM (

SELECT DISTINCT R_CNV.RESULT_SPEC_WID

From W_EHA_GENE G,

(SELECT R_CNV1.*

FROM W_EHA_RSLT_COPY_NBR_VAR R_CNV1

WHERE R_CNV1.RESULT_SPEC_WID BETWEEN (200 +ROUND(DBMS_RANDOM.VALUE(1,3))) AND (200 +ROUND(DBMS_RANDOM.VALUE(5,12)))/*-- Restrict to a subset of the specimen list. A random list is taken here.*/

AND R_CNV1.CALLED_CNV_TYPE = 'gain') R_CNV /*-- Ensure W_EHA_RSLT_COPY_NBR_VAR column; called_cnv_type, is updated with type data.*/

WHERE R_CNV.GENE_WID = G.ROW_WID

AND G.HUGO_NAME in ('KRAS'));/*-- Enter gene symbol of gene X.*/

6.2.10 Scenario 10

Use Case - Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).

Areas

Output queries tables from

Query

select spec1.SPECIMEN_WID

From

(

select rnv.SPECIMEN_WID

from

(

select *

from

(

SELECT RS1.RESULT_SPEC_WID AS SPECIMEN_WID, RS1.*

FROM W_EHA_RSLT_NON_VARIANT_CHR RS1, W_EHA_VERSION V

WHERE

RS1.VERSION_WID = V.ROW_WID

AND V.VERSION_LABEL IN ('GRCH37.P8')

AND V.VERSION_TYPE = 'DNA'

AND RS1.ZYGOSITY IN ('hom-ref')

)

) rnv,

W_EHA_VARIANT_V VV1, W_EHA_VARIANT_XREF VX

WHERE

RNV.CHROMOSOME_WID = VV1.CHROMOSOME_WID

AND VX.VARIANT_WID = VV1.ROW_WID

AND VX.REFERENCE_ID = 'rs4733908'

and vv1.STATUS = 'KNOWN'

AND VV1.STRAND IN ('-','+')

AND VV1.ABSOLUTE_POSITION BETWEEN RNV.START_POSITION AND RNV.END_POSITION

) main1,

W_EHA_RSLT_SPECIMEN spec1

WHERE SPEC1.ROW_WID = MAIN1.SPECIMEN_WID

and spec1.SPEC_DATASRC_WID = 1;

6.2.11 Scenario 11

Use Case - Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.

Areas

Output queries tables from ODB

Query

SELECT DISTINCT ODBQ3.SPECIMEN_WID VARIANT_SPECIMEN, ODBQ4.SPECIMEN_WID RNASEQ_SPECIMEN

FROM

(

select spec1.SPECIMEN_WID

from

(

select rsq.SPECIMEN_WID

from

(

select *

from

(

select rs1.RESULT_SPEC_WID as SPECIMEN_WID, rs1.*

from W_EHA_RSLT_SEQUENCING_CHR rs1, W_EHA_GENE g, W_EHA_VERSION V

WHERE

RS1.GENE_WID = g.ROW_WID

AND g.HUGO_NAME i ('ADAM32')

AND RS1.version_wid = V.ROW_WID

AND V.VERSION_LABEL in ('GRCH37.P8')

AND V.VERSION_TYPE = 'DNA'

)) rsq,

(

select gs1.gene_wid, gcc1.start_position, gcc1.end_position

from W_EHA_GENE_STRUCTURE gs1,

(

select v.structure_wid, (v.start_position+s.start_position-1) as start_position, (v.end_position+s.start_position-1) as end_position

from W_EHA_PROMOTER_REGION_V v, W_EHA_DNA_SOURCE s

where s.ROW_WID = v.SOURCE_WID

) gcc1

where gcc1.structure_wid = gs1.row_wid

union all

select gseg1.gene_wid, (gseg1.start_position + ds1.start_position-1) as start_position, (gseg1.end_position + ds1.start_position-1) as end_position

from W_EHA_GENE_SEGMENT gseg1, W_EHA_DNA_SOURCE ds1

where gseg1.source_wid = ds1.row_wid

) gco1,

W_EHA_VARIANT_V vv1

WHERE

rsq.VARIANT_WID = vv1.ROW_WID

and rsq.gene_wid = gco1.gene_wid

and rsq.start_position between gco1.start_position and gco1.end_position

and vv1.STATUS = 'KNOWN'

and vv1.STRAND in ('-','+')

) main1,

W_EHA_RSLT_SPECIMEN spec1

where spec1.ROW_WID = main1.SPECIMEN_WID

AND SPEC1.SPEC_DATASRC_WID = 1) ODBQ3,

(select spec2.SPECIMEN_WID

from

(

select rnaseq2.SPECIMEN_WID

from

(

select *

from

(

select rs2.RESULT_SPEC_WID as SPECIMEN_WID, rs2.*

from W_EHA_RSLT_RNA_SEQ_CHR rs2

WHERE

RS2.GENE_WID IN (71499,14775)

AND RS2.VERSION_WID IN (8,18)

)) rnaseq2,

(

select gs2.gene_wid, gcc2.start_position, gcc2.end_position

from W_EHA_GENE_STRUCTURE gs2,

(

select gc2.structure_wid, (gcs2.start_position + ds2.start_position-1) as start_position, (gcs2.end_position + ds2.start_position-1) as end_position

from W_EHA_GENE_COMPONENT gc2, W_EHA_GENE_COMP_SEGMENT gcs2, W_EHA_DNA_SOURCE ds2

where gc2.component_type in ('CDS','mRNA','miscRNA','exon')

and gc2.row_wid = gcs2.gene_component_wid

and gcs2.source_wid = ds2.row_wid

) gcc2

where gcc2.structure_wid = gs2.row_wid

) gco2

WHERE

rnaseq2.gene_wid = gco2.gene_wid

and (gco2.end_position >= rnaseq2.start_position

and gco2.start_position <= rnaseq2.end_position)

and rnaseq2.RAW_COUNTS < 15.0

and rnaseq2.RPKM > 0.25

and rnaseq2.STRAND in ('-','+')

) main2,

W_EHA_RSLT_SPECIMEN spec2

WHERE SPEC2.ROW_WID = MAIN2.SPECIMEN_WID

AND SPEC2.SPEC_DATASRC_WID = 1) ODBQ4

WHERE

(ODBQ3.SPECIMEN_WID = 21284 /*-- SpecimenID for Mutation results of a patient*/

AND ODBQ4.SPECIMEN_WID = 20384); /*-- SpecimenId with Gene Expression results of the same patient.*/