6 Use Case Examples

This chapter lists use cases for Oracle Health Sciences Omics Data Bank. It contains the following topics:

Overview of Use Cases
Use Cases Accompanied by Query Examples

6.1 Overview of Use Cases

This section contains the following use case scenarios:

Scenario 1—Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.
Scenario 2—Show expression level of TP53 mutant by cancer tissue.
Scenario 3—Ability to query subjects for established molecular tests: for example, the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Scenario 4—Ability to research a gene in the sample set.
Scenario 5—The researcher should be able to select a patient cohort based on the expression level for a set of genes.
Scenario 6—Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm).
Scenario 7—I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Scenario 8—What is the frequency of co-mutation of two genes in a data set?
Scenario 9—Show me all patients whose cancer cells had a deletion in gene X.
Scenario 10—Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).
Scenario 11—Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.

Note:

To run some of the use case queries, you need to create global temporary tables.

6.2 Use Cases Accompanied by Query Examples

6.2.1 Scenario 1

Use Case - Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.

Areas

Variant
Gene Annotation
Test and Results
Drug Info

Output queries tables from - ODB+CDM

Query

SELECT VRT.RESULT_SPEC_WID, CH.CHROMOSOME, VRT.START_POSITION, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING vrt, w_eha_chromosome ch,

(

SELECT V.ROW_WID, SG.GENE_WID, V.REPLACE_TAG FROM

W_EHA_VARIANT v,

(

SELECT GSG.SOURCE_WID, GSG.GENE_WID, (GSG.START_POSITION - PP.PROMOTER_OFFSET) as START_POSITION,

GSG.START_POSITION AS END_POSITION

FROM W_EHA_GENE G, W_EHA_GENE_SEGMENT GSG, W_EHA_PRODUCT_PROFILE PP

WHERE g.HUGO_NAME IN ('BRCA2') — Give the approved HUGO symbols of target genes here.

AND GSG.GENE_WID = G.ROW_WID

)SG

WHERE V.SOURCE_WID = SG.SOURCE_WID

AND (V.START_POSITION BETWEEN SG.START_POSITION AND SG.END_POSITION)

)vi, w_eha_rslt_study

WHERE VRT.VARIANT_WID = VI.ROW_WID

AND VRT.GENE_WID = VI.GENE_WID

AND W_EHA_RSLT_STUDY.RESULT_STUDY_NAME = 'STUDY1'

AND VRT.RESULT_STUDY_WID = W_EHA_RSLT_STUDY.ROW_WID

AND VRT.RESULT_SPEC_WID in (250); — Select a list of specimen of patients who are poor respondents of Drug A to test for mutation.

6.2.2 Scenario 2

Use Case - Show expression level of TP53 mutant by cancer tissue.

Areas

Variant
Gene Annotation
Gene Expression
Biospecimen Data

Output queries tables from - ODB+CDM

Query 2 -

SELECT RSLT_EXPR.RESULT_SPEC_WID, RSLT_EXPR.INTENSITY, RSLT_SEQ.START_POSITION, RSLT_SEQ.REPLACE_TAG

FROM (SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, V.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_GENE G, W_EHA_VARIANT V

WHERE VRT.VARIANT_WID = V.ROW_WID

AND VRT.GENE_WID = G.ROW_WID

AND

G.HUGO_NAME IN ('TP53') — Give the Approved HUGO SYMBOL of target genes here.

) RSLT_SEQ,

(SELECT R_EXP.RESULT_SPEC_WID, R_EXP.INTENSITY

FROM W_EHA_RSLT_GENE_EXP R_EXP

WHERE R_EXP.GENE_WID IN

(SELECT G.ROW_WID

FROM W_EHA_GENE G

WHERE G.HUGO_NAME IN ('TP53')) — Give the Approved HUGO SYMBOL of targeted genes.

) RSLT_EXPR

WHERE (RSLT_EXPR.RESULT_SPEC_WID = 1 AND RSLT_SEQ.RESULT_SPEC_WID = 2);

6.2.3 Scenario 3

Use Case - Ability to query subjects for established molecular tests, for example the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.

Areas

Variant
Chromosomal Rearrangement
Gene Annotation

Output queries tables from ODB+CDM

Query -

SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G,

W_EHA_RSLT_SPECIMEN SP

WHERE VRT.GENE_WID = G.ROW_WID

AND g.HUGO_NAME IN ('KRAS') — Give the Approved HUGO SYMBOL of target genes here.

AND VRT.VARIANT_WID = VI.ROW_WID

AND VRT.RESULT_SPEC_WID = SP.ROW_WID

AND (SP.SPECIMEN_NUMBER in ('TCGA-02-0001-XXX-XXX')); — Give a target list of specimens here.

6.2.4 Scenario 4

Use Case - Ability to research a gene in the sample set.

Areas

Gene Annotation

Output queries tables from - ODB: REFERENCE + RESULT

Query -

SELECT VRT.RESULT_SPEC_WID, G.HUGO_NAME, VRT.START_POSITION, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G

WHERE VRT.VARIANT_WID = VI.ROW_WID

AND VRT.GENE_WID = G.ROW_WID

AND g.HUGO_NAME IN ('BRCA2');

6.2.5 Scenario 5

Use Case - The researcher should be able to select a patient cohort based on the expression level for a set of genes.

Areas

Cancer Diagnosis
Demographic
Biospecimen Data
QC Data
Gene Annotation
Expression

Output queries tables from - ODB+CDM

Query -

SELECT R_EXP2.RESULT_SPEC_WID, G.HUGO_NAME, R_EXP2.INTENSITY

from

W_EHA_RSLT_GENE_EXP r_exp2, W_EHA_GENE G,

(SELECT AVG(R_EXP1.INTENSITY) EXP_AVG

FROM W_EHA_RSLT_GENE_EXP R_EXP1

WHERE R_EXP1.GENE_WID IN

(SELECT G1.ROW_WID

FROM W_EHA_GENE G1

WHERE G1.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP'))

) INTENSITY

WHERE R_EXP2.GENE_WID = G.ROW_WID

AND G.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP')

AND R_EXP2.INTENSITY > INTENSITY.EXP_AVG

ORDER BY R_EXP2.RESULT_SPEC_WID;

6.2.6 Scenario 6

Use Case - Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm)

Areas

Variant
Gene Annotation

Output queries tables from - ODB: REFERENCE

Query -

SELECT VG.REFERENCE_ID, VG.CODE_TYPE, VG.PREDICTION_SCORE, VG.CODE, VG.ABSOLUTE_POSITION, VG.CHROMOSOME, VG.REPLACE_TAG, PI.ACCESSION, P.AMINO_ACID_SEQUENCE

FROM W_EHA_GENE_COMPONENT GC, W_EHA_PROTEIN P, W_EHA_PROT_INFO PI,(

SELECT GCS.GENE_COMPONENT_WID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG, VP.PREDICTION_SCORE, PC.CODE, VP.STRUCTURE_WID, PC.CODE_TYPE

FROM W_EHA_GENE_COMP_SEGMENT GCS, W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, W_EHA_VARIANT_PREDICTION VP, W_EHA_PREDICTION_CODE PC

WHERE VP.VARIANT_WID = V.ROW_WID

AND VP.PREDICTION_CODE_WID = PC.ROW_WID

AND VX.VARIANT_WID = V.ROW_WID

AND V.SOURCE_WID = GCS.SOURCE_WID

AND V.START_POSITION <= GCS.END_POSITION

AND GCS.START_POSITION <= V.END_POSITION

AND PC.CODE_TYPE IN ('SIFT', 'polyphen')

AND PC.CODE IN (deleterious, possibly damaging, probably damaging) —Filter by prediction code, confer W_EHA_PREDICTION_CODE table

AND VP.PREDICTION_SCORE < '0.5' — Filter by prediction score

)VG

WHERE GC.ROW_WID = VG.GENE_COMPONENT_WID

AND GC.COMPONENT_TYPE = 'CDS'

AND GC.PROTEIN_WID = P.ROW_WID

AND PI.PROTEIN_WID = P.ROW_WID

AND VG.STRUCTURE_WID = GC.STRUCTURE_WID;

6.2.7 Scenario 7

Use Case - I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?

Areas

Variant
Gene Annotation
Pathway Annotation

Output queries tables from - ODB: REFERENCE + RESULT

Query

SELECT S.RESULT_STUDY_NAME, VRT.RESULT_SPEC_WID, VI.PATHWAY_NAME, VI.PATHWAY_SOURCE_ID, VI.HUGO_NAME AS GENE_SYMBOL, VI.REFERENCE_ID, VI.ABSOLUTE_POSITION, VI.CHROMOSOME, VI.REPLACE_TAG

FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_RSLT_STUDY S, (

SELECT V.ROW_WID, GS.GENE_WID, GS.HUGO_NAME, GS.PATHWAY_NAME, GS.PATHWAY_SOURCE_ID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG

FROM W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, (

SELECT G.ROW_WID AS GENE_WID, G.HUGO_NAME, P.PATHWAY_NAME, P.PATHWAY_SOURCE_ID, GSG.START_POSITION, GSG.END_POSITION, GSG.SOURCE_WID

FROM W_EHA_GENE_SEGMENT GSG, W_EHA_GENE G,

(SELECT DISTINCT PH.PATHWAY_NAME, PH.PATHWAY_SOURCE_ID,PP.HUGO_SYMBOL

FROM W_EHA_PATHWAY_PROTEIN PP, W_EHA_PATHWAY PH

WHERE PP.PATHWAY_WID = PH.ROW_WID

AND PATHWAY_NAME LIKE ('%thyroid hormone%'))P — Select either a specific KEGG pathway or search for pathway name keywords. For example, All Thyroid hormone specific pathways.

WHERE G.HUGO_NAME = P.HUGO_SYMBOL

AND GSG.GENE_WID = G.ROW_WID

)GS

WHERE V.SOURCE_WID = GS.SOURCE_WID

AND V.START_POSITION <= GS.END_POSITION

AND GS.START_POSITION <= V.END_POSITION

AND VX.VARIANT_WID = V.ROW_WID

) VI

WHERE VRT.VARIANT_WID = VI.ROW_WID

AND VRT.GENE_WID = VI.GENE_WID

AND VRT.RESULT_STUDY_WID = S.ROW_WID;

6.2.8 Scenario 8

Use Case - What is the frequency of co-mutation of two genes in a data set?

Areas

Variant

Output queries tables from - ODB: REFERENCE + RESULT

Query -

select count(a_and_b.result_spec_wid) concurrent_cnt

FROM

(SELECT DISTINCT VRT.RESULT_SPEC_WID

FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g

WHERE VRT.GENE_WID = G.ROW_WID

AND g.HUGO_NAME IN ('KRAS')

intersect

SELECT DISTINCT VRT.RESULT_SPEC_WID

FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g

WHERE VRT.GENE_WID = G.ROW_WID

AND G.HUGO_NAME IN ('PTEN')

) a_and_b;

/* count all specimen with seq result */

SELECT COUNT(DISTINCT VRT_ALL.RESULT_SPEC_WID) CNT FROM W_EHA_RSLT_SEQUENCING VRT_ALL;

6.2.9 Scenario 9

Use Case - Show me all patients whose cancer cells had a deletion in gene X.

Areas

Output queries tables from ODB+CDM

Query

SELECT COUNT(*) FROM (

SELECT DISTINCT R_CNV.RESULT_SPEC_WID

From W_EHA_GENE G,

(SELECT R_CNV1.*

FROM W_EHA_RSLT_COPY_NBR_VAR R_CNV1

WHERE R_CNV1.RESULT_SPEC_WID BETWEEN (200 +ROUND(DBMS_RANDOM.VALUE(1,3))) AND (200 +ROUND(DBMS_RANDOM.VALUE(5,12)))—restrict to a subset of the specimen list. Here a random list is taken.

AND R_CNV1.CALLED_CNV_TYPE = 'gain') R_CNV — Ensure W_EHA_RSLT_COPY_NBR_VAR column; called_cnv_type, is updated with type data.

WHERE R_CNV.GENE_WID = G.ROW_WID

AND G.HUGO_NAME in ('KRAS'));— Give gene symbol of gene X.

6.2.10 Scenario 10

Use Case - Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).

Areas

Output queries tables from

Query

select spec1.SPECIMEN_WID

From

(

select rnv.SPECIMEN_WID

from

(

select *

from

(

SELECT RS1.RESULT_SPEC_WID AS SPECIMEN_WID, RS1.*

FROM W_EHA_RSLT_NON_VARIANT RS1, W_EHA_VERSION V

WHERE

RS1.VERSION_WID = V.ROW_WID

AND V.VERSION_LABEL IN ('GRCH37.P8')

AND V.VERSION_TYPE = 'DNA'

AND RS1.ZYGOSITY IN ('hom-ref')

)

) rnv,

W_EHA_VARIANT_V VV1, W_EHA_VARIANT_XREF VX

WHERE

RNV.CHROMOSOME_WID = VV1.CHROMOSOME_WID

AND VX.VARIANT_WID = VV1.ROW_WID

AND VX.REFERENCE_ID = 'rs4733908'

and vv1.STATUS = 'KNOWN'

AND VV1.STRAND IN ('-','+')

AND VV1.ABSOLUTE_POSITION BETWEEN RNV.START_POSITION AND RNV.END_POSITION

) main1,

W_EHA_RSLT_SPECIMEN spec1

WHERE SPEC1.ROW_WID = MAIN1.SPECIMEN_WID

and spec1.SPEC_DATASRC_WID = 1;

6.2.11 Scenario 11

Use Case - Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.

Areas

Output queries tables from ODB

Query

SELECT DISTINCT ODBQ3.SPECIMEN_WID VARIANT_SPECIMEN, ODBQ4.SPECIMEN_WID RNASEQ_SPECIMEN

FROM

(

select spec1.SPECIMEN_WID

from

(

select rsq.SPECIMEN_WID

from

(

select *

from

(

select rs1.RESULT_SPEC_WID as SPECIMEN_WID, rs1.*

from W_EHA_RSLT_SEQUENCING rs1, W_EHA_GENE g, W_EHA_VERSION V

WHERE

RS1.GENE_WID = g.ROW_WID

AND g.HUGO_NAME IN ('ADAM32')

AND RS1.version_wid = V.ROW_WID

AND V.VERSION_LABELin ('GRCH37.P8')

AND V.VERSION_TYPE = 'DNA'

)) rsq,

(

select gs1.gene_wid, gcc1.start_position, gcc1.end_position

from W_EHA_GENE_STRUCTURE gs1,

(

select v.structure_wid, (v.start_position+s.start_position-1) as start_position, (v.end_position+s.start_position-1) as end_position

from W_EHA_PROMOTER_REGION_V v, W_EHA_DNA_SOURCE s

where s.ROW_WID = v.SOURCE_WID

) gcc1

where gcc1.structure_wid = gs1.row_wid

union all

select gseg1.gene_wid, (gseg1.start_position + ds1.start_position-1) as start_position, (gseg1.end_position + ds1.start_position-1) as end_position

from W_EHA_GENE_SEGMENT gseg1, W_EHA_DNA_SOURCE ds1

where gseg1.source_wid = ds1.row_wid

) gco1,

W_EHA_VARIANT_V vv1

WHERE

rsq.VARIANT_WID = vv1.ROW_WID

and rsq.gene_wid = gco1.gene_wid

and rsq.start_position between gco1.start_position and gco1.end_position

and vv1.STATUS = 'KNOWN'

and vv1.STRAND in ('-','+')

) main1,

W_EHA_RSLT_SPECIMEN spec1

where spec1.ROW_WID = main1.SPECIMEN_WID

AND SPEC1.SPEC_DATASRC_WID = 1) ODBQ3,

(select spec2.SPECIMEN_WID

from

(

select rnaseq2.SPECIMEN_WID

from

(

select *

from

(

select rs2.RESULT_SPEC_WID as SPECIMEN_WID, rs2.*

from W_EHA_RSLT_RNA_SEQ rs2

WHERE

RS2.GENE_WID IN (71499,14775)

AND RS2.VERSION_WID IN (8,18)

)) rnaseq2,

(

select gs2.gene_wid, gcc2.start_position, gcc2.end_position

from W_EHA_GENE_STRUCTURE gs2,

(

select gc2.structure_wid, (gcs2.start_position + ds2.start_position-1) as start_position, (gcs2.end_position + ds2.start_position-1) as end_position

from W_EHA_GENE_COMPONENT gc2, W_EHA_GENE_COMP_SEGMENT gcs2, W_EHA_DNA_SOURCE ds2

where gc2.component_type in ('CDS','mRNA','miscRNA','exon')

and gc2.row_wid = gcs2.gene_component_wid

and gcs2.source_wid = ds2.row_wid

) gcc2

where gcc2.structure_wid = gs2.row_wid

) gco2

WHERE

rnaseq2.gene_wid = gco2.gene_wid

and (gco2.end_position >= rnaseq2.start_position

and gco2.start_position <= rnaseq2.end_position)

and rnaseq2.RAW_COUNTS < 15.0

and rnaseq2.RPKM > 0.25

and rnaseq2.STRAND in ('-','+')

) main2,

W_EHA_RSLT_SPECIMEN spec2

WHERE SPEC2.ROW_WID = MAIN2.SPECIMEN_WID

AND SPEC2.SPEC_DATASRC_WID = 1) ODBQ4

WHERE

(ODBQ3.SPECIMEN_WID = 21284 -- SpecimenID for Mutation results of a patient

AND ODBQ4.SPECIMEN_WID = 20384); -- SpecimenId with Gene Expression results of the same patient