Oracle® Health Sciences Omics Data Bank Programmer's Guide Release 2.5 E35680-04 |
|
|
PDF · Mobi · ePub |
This chapter lists use cases for Oracle Health Sciences Omics Data Bank. It contains the following topics:
This section contains the following use case scenarios:
Scenario 1—Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.
Scenario 2—Show expression level of TP53 mutant by cancer tissue.
Scenario 3—Ability to query subjects for established molecular tests: for example, the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Scenario 4—Ability to research a gene in the sample set.
Scenario 5—The researcher should be able to select a patient cohort based on the expression level for a set of genes.
Scenario 6—Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm).
Scenario 7—I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Scenario 8—What is the frequency of co-mutation of two genes in a data set?
Scenario 9—Show me all patients whose cancer cells had a deletion in gene X.
Scenario 10—Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).
Scenario 11—Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.
Note:
To run some of the use case queries, you need to create global temporary tables.Use Case - Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.
Areas
Variant
Gene Annotation
Test and Results
Drug Info
Output queries tables from - ODB+CDM
Query
SELECT VRT.RESULT_SPEC_WID, CH.CHROMOSOME, VRT.START_POSITION, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING vrt, w_eha_chromosome ch,
(
SELECT V.ROW_WID, SG.GENE_WID, V.REPLACE_TAG FROM
W_EHA_VARIANT v,
(
SELECT GSG.SOURCE_WID, GSG.GENE_WID, (GSG.START_POSITION - PP.PROMOTER_OFFSET) as START_POSITION,
GSG.START_POSITION AS END_POSITION
FROM W_EHA_GENE G, W_EHA_GENE_SEGMENT GSG, W_EHA_PRODUCT_PROFILE PP
WHERE g.HUGO_NAME IN ('BRCA2')
— Give the approved HUGO symbols of target genes here.
AND GSG.GENE_WID = G.ROW_WID
)SG
WHERE V.SOURCE_WID = SG.SOURCE_WID
AND (V.START_POSITION BETWEEN SG.START_POSITION AND SG.END_POSITION)
)vi, w_eha_rslt_study
WHERE VRT.VARIANT_WID = VI.ROW_WID
AND VRT.GENE_WID = VI.GENE_WID
AND W_EHA_RSLT_STUDY.RESULT_STUDY_NAME = 'STUDY1'
AND VRT.RESULT_STUDY_WID = W_EHA_RSLT_STUDY.ROW_WID
AND VRT.RESULT_SPEC_WID in (250);
— Select a list of specimen of patients who are poor respondents of Drug A to test for mutation.
Use Case - Show expression level of TP53 mutant by cancer tissue.
Areas
Variant
Gene Annotation
Gene Expression
Biospecimen Data
Output queries tables from - ODB+CDM
Query 2 -
SELECT RSLT_EXPR.RESULT_SPEC_WID, RSLT_EXPR.INTENSITY, RSLT_SEQ.START_POSITION, RSLT_SEQ.REPLACE_TAG
FROM (SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, V.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_GENE G, W_EHA_VARIANT V
WHERE VRT.VARIANT_WID = V.ROW_WID
AND VRT.GENE_WID = G.ROW_WID
AND
G.HUGO_NAME IN ('TP53')
— Give the Approved HUGO SYMBOL of target genes here.
) RSLT_SEQ,
(SELECT R_EXP.RESULT_SPEC_WID, R_EXP.INTENSITY
FROM W_EHA_RSLT_GENE_EXP R_EXP
WHERE R_EXP.GENE_WID IN
(SELECT G.ROW_WID
FROM W_EHA_GENE G
WHERE G.HUGO_NAME IN ('TP53'))
— Give the Approved HUGO SYMBOL of targeted genes.
) RSLT_EXPR
WHERE (RSLT_EXPR.RESULT_SPEC_WID = 1 AND RSLT_SEQ.RESULT_SPEC_WID = 2);
Use Case - Ability to query subjects for established molecular tests, for example the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Areas
Variant
Chromosomal Rearrangement
Gene Annotation
Output queries tables from ODB+CDM
Query -
SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G,
W_EHA_RSLT_SPECIMEN SP
WHERE VRT.GENE_WID = G.ROW_WID
AND g.HUGO_NAME IN ('KRAS')
— Give the Approved HUGO SYMBOL of target genes here.
AND VRT.VARIANT_WID = VI.ROW_WID
AND VRT.RESULT_SPEC_WID = SP.ROW_WID
AND (SP.SPECIMEN_NUMBER in ('TCGA-02-0001-XXX-XXX'));
— Give a target list of specimens here.
Use Case - Ability to research a gene in the sample set.
Areas
Gene Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query -
SELECT VRT.RESULT_SPEC_WID, G.HUGO_NAME, VRT.START_POSITION, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G
WHERE VRT.VARIANT_WID = VI.ROW_WID
AND VRT.GENE_WID = G.ROW_WID
AND g.HUGO_NAME IN ('BRCA2');
Use Case - The researcher should be able to select a patient cohort based on the expression level for a set of genes.
Areas
Cancer Diagnosis
Demographic
Biospecimen Data
QC Data
Gene Annotation
Expression
Output queries tables from - ODB+CDM
Query -
SELECT R_EXP2.RESULT_SPEC_WID, G.HUGO_NAME, R_EXP2.INTENSITY
from
W_EHA_RSLT_GENE_EXP r_exp2, W_EHA_GENE G,
(SELECT AVG(R_EXP1.INTENSITY) EXP_AVG
FROM W_EHA_RSLT_GENE_EXP R_EXP1
WHERE R_EXP1.GENE_WID IN
(SELECT G1.ROW_WID
FROM W_EHA_GENE G1
WHERE G1.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP'))
) INTENSITY
WHERE R_EXP2.GENE_WID = G.ROW_WID
AND G.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP')
AND R_EXP2.INTENSITY > INTENSITY.EXP_AVG
ORDER BY R_EXP2.RESULT_SPEC_WID;
Use Case - Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm)
Areas
Variant
Gene Annotation
Output queries tables from - ODB: REFERENCE
Query -
SELECT VG.REFERENCE_ID, VG.CODE_TYPE, VG.PREDICTION_SCORE, VG.CODE, VG.ABSOLUTE_POSITION, VG.CHROMOSOME, VG.REPLACE_TAG, PI.ACCESSION, P.AMINO_ACID_SEQUENCE
FROM W_EHA_GENE_COMPONENT GC, W_EHA_PROTEIN P, W_EHA_PROT_INFO PI,(
SELECT GCS.GENE_COMPONENT_WID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG, VP.PREDICTION_SCORE, PC.CODE, VP.STRUCTURE_WID, PC.CODE_TYPE
FROM W_EHA_GENE_COMP_SEGMENT GCS, W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, W_EHA_VARIANT_PREDICTION VP, W_EHA_PREDICTION_CODE PC
WHERE VP.VARIANT_WID = V.ROW_WID
AND VP.PREDICTION_CODE_WID = PC.ROW_WID
AND VX.VARIANT_WID = V.ROW_WID
AND V.SOURCE_WID = GCS.SOURCE_WID
AND V.START_POSITION <= GCS.END_POSITION
AND GCS.START_POSITION <= V.END_POSITION
AND PC.CODE_TYPE IN ('SIFT', 'polyphen')
AND PC.CODE IN
(deleterious, possibly damaging, probably damaging) —Filter by prediction code, confer W_EHA_PREDICTION_CODE table
AND VP.PREDICTION_SCORE < '0.5'
— Filter by prediction score
)VG
WHERE GC.ROW_WID = VG.GENE_COMPONENT_WID
AND GC.COMPONENT_TYPE = 'CDS'
AND GC.PROTEIN_WID = P.ROW_WID
AND PI.PROTEIN_WID = P.ROW_WID
AND VG.STRUCTURE_WID = GC.STRUCTURE_WID;
Use Case - I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Areas
Variant
Gene Annotation
Pathway Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query
SELECT S.RESULT_STUDY_NAME, VRT.RESULT_SPEC_WID, VI.PATHWAY_NAME, VI.PATHWAY_SOURCE_ID, VI.HUGO_NAME AS GENE_SYMBOL, VI.REFERENCE_ID, VI.ABSOLUTE_POSITION, VI.CHROMOSOME, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_RSLT_STUDY S, (
SELECT V.ROW_WID, GS.GENE_WID, GS.HUGO_NAME, GS.PATHWAY_NAME, GS.PATHWAY_SOURCE_ID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG
FROM W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, (
SELECT G.ROW_WID AS GENE_WID, G.HUGO_NAME, P.PATHWAY_NAME, P.PATHWAY_SOURCE_ID, GSG.START_POSITION, GSG.END_POSITION, GSG.SOURCE_WID
FROM W_EHA_GENE_SEGMENT GSG, W_EHA_GENE G
,
(SELECT DISTINCT PH.PATHWAY_NAME, PH.PATHWAY_SOURCE_ID,PP.HUGO_SYMBOL
FROM W_EHA_PATHWAY_PROTEIN PP, W_EHA_PATHWAY PH
WHERE PP.PATHWAY_WID = PH.ROW_WID
AND PATHWAY_NAME LIKE
('%thyroid hormone%'))P — Select either a specific KEGG pathway or search for pathway name keywords. For example, All Thyroid hormone specific pathways.
WHERE G.HUGO_NAME = P.HUGO_SYMBOL
AND GSG.GENE_WID = G.ROW_WID
)GS
WHERE V.SOURCE_WID = GS.SOURCE_WID
AND V.START_POSITION <= GS.END_POSITION
AND GS.START_POSITION <= V.END_POSITION
AND VX.VARIANT_WID = V.ROW_WID
) VI
WHERE VRT.VARIANT_WID = VI.ROW_WID
AND VRT.GENE_WID = VI.GENE_WID
AND VRT.RESULT_STUDY_WID = S.ROW_WID;
Use Case - What is the frequency of co-mutation of two genes in a data set?
Areas
Variant
Output queries tables from - ODB: REFERENCE + RESULT
Query -
select count(a_and_b.result_spec_wid) concurrent_cnt
FROM
(SELECT DISTINCT VRT.RESULT_SPEC_WID
FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g
WHERE VRT.GENE_WID = G.ROW_WID
AND g.HUGO_NAME IN ('KRAS')
intersect
SELECT DISTINCT VRT.RESULT_SPEC_WID
FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g
WHERE VRT.GENE_WID = G.ROW_WID
AND G.HUGO_NAME IN ('PTEN')
) a_and_b;
/* count all specimen with seq result */
SELECT COUNT(DISTINCT VRT_ALL.RESULT_SPEC_WID) CNT FROM W_EHA_RSLT_SEQUENCING VRT_ALL;
Use Case - Show me all patients whose cancer cells had a deletion in gene X.
Areas
Output queries tables from ODB+CDM
Query
SELECT COUNT(*) FROM (
SELECT DISTINCT R_CNV.RESULT_SPEC_WID
From W_EHA_GENE G,
(SELECT R_CNV1.*
FROM W_EHA_RSLT_COPY_NBR_VAR R_CNV1
WHERE R_CNV1.RESULT_SPEC_WID BETWEEN (200 +ROUND(DBMS_RANDOM.VALUE(1,3))) AND (200 +ROUND(DBMS_RANDOM.VALUE(5,12)))
—restrict to a subset of the specimen list. Here a random list is taken.
AND R_CNV1.CALLED_CNV_TYPE = 'gain') R_CNV
— Ensure W_EHA_RSLT_COPY_NBR_VAR column; called_cnv_type, is updated with type data.
WHERE R_CNV.GENE_WID = G.ROW_WID
AND G.HUGO_NAME in ('KRAS'));
— Give gene symbol of gene X.
Use Case - Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).
Areas
Output queries tables from
Query
select spec1.SPECIMEN_WID
From
(
select rnv.SPECIMEN_WID
from
(
select *
from
(
SELECT RS1.RESULT_SPEC_WID AS SPECIMEN_WID, RS1.*
FROM W_EHA_RSLT_NON_VARIANT RS1, W_EHA_VERSION V
WHERE
RS1.VERSION_WID = V.ROW_WID
AND V.VERSION_LABEL IN ('GRCH37.P8')
AND V.VERSION_TYPE = 'DNA'
AND RS1.ZYGOSITY IN ('hom-ref')
)
) rnv,
W_EHA_VARIANT_V VV1, W_EHA_VARIANT_XREF VX
WHERE
RNV.CHROMOSOME_WID = VV1.CHROMOSOME_WID
AND VX.VARIANT_WID = VV1.ROW_WID
AND VX.REFERENCE_ID = 'rs4733908'
and vv1.STATUS = 'KNOWN'
AND VV1.STRAND IN ('-','+')
AND VV1.ABSOLUTE_POSITION BETWEEN RNV.START_POSITION AND RNV.END_POSITION
) main1,
W_EHA_RSLT_SPECIMEN spec1
WHERE SPEC1.ROW_WID = MAIN1.SPECIMEN_WID
and spec1.SPEC_DATASRC_WID = 1;
Use Case - Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.
Areas
Output queries tables from ODB
Query
SELECT DISTINCT ODBQ3.SPECIMEN_WID VARIANT_SPECIMEN, ODBQ4.SPECIMEN_WID RNASEQ_SPECIMEN
FROM
(
select spec1.SPECIMEN_WID
from
(
select rsq.SPECIMEN_WID
from
(
select *
from
(
select rs1.RESULT_SPEC_WID as SPECIMEN_WID, rs1.*
from W_EHA_RSLT_SEQUENCING rs1, W_EHA_GENE g, W_EHA_VERSION V
WHERE
RS1.GENE_WID = g.ROW_WID
AND g.HUGO_NAME IN ('ADAM32')
AND RS1.version_wid = V.ROW_WID
AND V.VERSION_LABELin ('GRCH37.P8')
AND V.VERSION_TYPE = 'DNA'
)) rsq,
(
select gs1.gene_wid, gcc1.start_position, gcc1.end_position
from W_EHA_GENE_STRUCTURE gs1,
(
select v.structure_wid, (v.start_position+s.start_position-1) as start_position, (v.end_position+s.start_position-1) as end_position
from W_EHA_PROMOTER_REGION_V v, W_EHA_DNA_SOURCE s
where s.ROW_WID = v.SOURCE_WID
) gcc1
where gcc1.structure_wid = gs1.row_wid
union all
select gseg1.gene_wid, (gseg1.start_position + ds1.start_position-1) as start_position, (gseg1.end_position + ds1.start_position-1) as end_position
from W_EHA_GENE_SEGMENT gseg1, W_EHA_DNA_SOURCE ds1
where gseg1.source_wid = ds1.row_wid
) gco1,
W_EHA_VARIANT_V vv1
WHERE
rsq.VARIANT_WID = vv1.ROW_WID
and rsq.gene_wid = gco1.gene_wid
and rsq.start_position between gco1.start_position and gco1.end_position
and vv1.STATUS = 'KNOWN'
and vv1.STRAND in ('-','+')
) main1,
W_EHA_RSLT_SPECIMEN spec1
where spec1.ROW_WID = main1.SPECIMEN_WID
AND SPEC1.SPEC_DATASRC_WID = 1) ODBQ3,
(select spec2.SPECIMEN_WID
from
(
select rnaseq2.SPECIMEN_WID
from
(
select *
from
(
select rs2.RESULT_SPEC_WID as SPECIMEN_WID, rs2.*
from W_EHA_RSLT_RNA_SEQ rs2
WHERE
RS2.GENE_WID IN (71499,14775)
AND RS2.VERSION_WID IN (8,18)
)) rnaseq2,
(
select gs2.gene_wid, gcc2.start_position, gcc2.end_position
from W_EHA_GENE_STRUCTURE gs2,
(
select gc2.structure_wid, (gcs2.start_position + ds2.start_position-1) as start_position, (gcs2.end_position + ds2.start_position-1) as end_position
from W_EHA_GENE_COMPONENT gc2, W_EHA_GENE_COMP_SEGMENT gcs2, W_EHA_DNA_SOURCE ds2
where gc2.component_type in ('CDS','mRNA','miscRNA','exon')
and gc2.row_wid = gcs2.gene_component_wid
and gcs2.source_wid = ds2.row_wid
) gcc2
where gcc2.structure_wid = gs2.row_wid
) gco2
WHERE
rnaseq2.gene_wid = gco2.gene_wid
and (gco2.end_position >= rnaseq2.start_position
and gco2.start_position <= rnaseq2.end_position)
and rnaseq2.RAW_COUNTS < 15.0
and rnaseq2.RPKM > 0.25
and rnaseq2.STRAND in ('-','+')
) main2,
W_EHA_RSLT_SPECIMEN spec2
WHERE SPEC2.ROW_WID = MAIN2.SPECIMEN_WID
AND SPEC2.SPEC_DATASRC_WID = 1) ODBQ4
WHERE
(ODBQ3.SPECIMEN_WID = 21284 -- SpecimenID for Mutation results of a patient
AND ODBQ4.SPECIMEN_WID = 20384); -- SpecimenId with Gene Expression results of the same patient