Oracle® Health Sciences Omics Data Bank Programmer's Guide Release 3.0.2.1 E35680-12 |
|
|
PDF · Mobi · ePub |
This chapter lists use cases for Oracle Health Sciences Omics Data Bank. It contains the following topics:
This section contains the following use case scenarios:
Scenario 1—Find patients who are poor responders for drug A and have a mutation in the promoter region of gene A.
Scenario 2—Show expression level of TP53 mutant by cancer tissue.
Scenario 3—Ability to query subjects for established molecular tests. For example, the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Scenario 4—Ability to research a gene in the sample set.
Scenario 5—Enable researcher to select a patient cohort based on the expression level for a set of genes.
Scenario 6—Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm).
Scenario 7—I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Scenario 8—What is the frequency of co-mutation of two genes in a data set?
Scenario 9—Display all patients whose cancer cells had a deletion in gene X.
Scenario 10—Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).
Scenario 11—Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.
Note:
To run some of the use case queries, you have to create global temporary tables.Use Case - Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.
Areas
Variant
Gene Annotation
Test and Results
Drug Info
Output queries tables from - ODB+CDM
Query
SELECT VRT.RESULT_SPEC_WID, CH.CHROMOSOME, VRT.START_POSITION, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING vrt, w_eha_chromosome ch,
(
SELECT V.ROW_WID, SG.GENE_WID, V.REPLACE_TAG FROM
W_EHA_VARIANT v,
(
SELECT GSG.SOURCE_WID, GSG.GENE_WID, (GSG.START_POSITION - PP.PROMOTER_OFFSET) as START_POSITION,
GSG.START_POSITION AS END_POSITION
FROM W_EHA_GENE G, W_EHA_GENE_SEGMENT GSG, W_EHA_PRODUCT_PROFILE PP
WHERE g.HUGO_NAME IN ('BRCA2')
/*-- Enter the approved HUGO symbols of target genes here.*/
AND GSG.GENE_WID = G.ROW_WID
)SG
WHERE V.SOURCE_WID = SG.SOURCE_WID
AND (V.START_POSITION BETWEEN SG.START_POSITION AND SG.END_POSITION)
)vi, w_eha_rslt_study
WHERE VRT.VARIANT_WID = VI.ROW_WID
AND VRT.GENE_WID = VI.GENE_WID
AND W_EHA_RSLT_STUDY.RESULT_STUDY_NAME = 'STUDY1'
AND VRT.RESULT_STUDY_WID = W_EHA_RSLT_STUDY.ROW_WID
AND VRT.RESULT_SPEC_WID in (250);
/*-- Select a list of specimen patients who are poor respondents of Drug A to test for mutation.*/
Use Case - Show expression level of TP53 mutant by cancer tissue.
Areas
Variant
Gene Annotation
Gene Expression
Biospecimen Data
Output queries tables from - ODB+CDM
Query 2 -
SELECT RSLT_EXPR.RESULT_SPEC_WID, RSLT_EXPR.INTENSITY, RSLT_SEQ.START_POSITION, RSLT_SEQ.REPLACE_TAG
FROM (SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, V.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_GENE G, W_EHA_VARIANT V
WHERE VRT.VARIANT_WID = V.ROW_WID
AND VRT.GENE_WID = G.ROW_WID
AND
G.HUGO_NAME IN ('TP53')
/*-- Enter the Approved HUGO SYMBOL of target genes here.*/
) RSLT_SEQ,
(SELECT R_EXP.RESULT_SPEC_WID, R_EXP.INTENSITY
FROM W_EHA_RSLT_GENE_EXP R_EXP
WHERE R_EXP.GENE_WID IN
(SELECT G.ROW_WID
FROM W_EHA_GENE G
WHERE G.HUGO_NAME IN ('TP53'))
/*-- Enter the Approved HUGO SYMBOL of targeted genes.*/
) RSLT_EXPR
WHERE (RSLT_EXPR.RESULT_SPEC_WID = 1 AND RSLT_SEQ.RESULT_SPEC_WID = 2);
Use Case - Ability to query subjects for established molecular tests, for example the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Areas
Variant
Chromosomal Rearrangement
Gene Annotation
Output queries tables from ODB+CDM
Query -
SELECT VRT.RESULT_SPEC_WID, VRT.START_POSITION, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G,
W_EHA_RSLT_SPECIMEN SP
WHERE VRT.GENE_WID = G.ROW_WID
AND g.HUGO_NAME IN ('KRAS')
/*-- Enter the Approved HUGO SYMBOL of target genes here.*/
AND VRT.VARIANT_WID = VI.ROW_WID
AND VRT.RESULT_SPEC_WID = SP.ROW_WID
AND (SP.SPECIMEN_NUMBER in ('TCGA-02-0001-XXX-XXX'));
/*-- Give a target list of specimens here.*/
Use Case - Ability to research a gene in the sample set.
Areas
Gene Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query -
SELECT VRT.RESULT_SPEC_WID, G.HUGO_NAME, VRT.START_POSITION, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_VARIANT VI, W_EHA_GENE G
WHERE VRT.VARIANT_WID = VI.ROW_WID
AND VRT.GENE_WID = G.ROW_WID
AND g.HUGO_NAME IN ('BRCA2');
Use Case - Enable researcher to select a patient cohort based on the expression level for a set of genes.
Areas
Cancer Diagnosis
Demographic
Biospecimen Data
QC Data
Gene Annotation
Expression
Output queries tables from - ODB+CDM
Query -
SELECT R_EXP2.RESULT_SPEC_WID, G.HUGO_NAME, R_EXP2.INTENSITY
from
W_EHA_RSLT_GENE_EXP r_exp2, W_EHA_GENE G,
(SELECT AVG(R_EXP1.INTENSITY) EXP_AVG
FROM W_EHA_RSLT_GENE_EXP R_EXP1
WHERE R_EXP1.GENE_WID IN
(SELECT G1.ROW_WID
FROM W_EHA_GENE G1
WHERE G1.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP'))
) INTENSITY
WHERE R_EXP2.GENE_WID = G.ROW_WID
AND G.HUGO_NAME IN ('TP53','BRCA1','GPR4','PABPC1','SOBP')
AND R_EXP2.INTENSITY > INTENSITY.EXP_AVG
ORDER BY R_EXP2.RESULT_SPEC_WID;
Use Case - Select mutation with deep functional annotation (for example, high impact based on PolyPhen algorithm)
Areas
Variant
Gene Annotation
Output queries tables from - ODB: REFERENCE
Query -
SELECT VG.REFERENCE_ID, VG.CODE_TYPE, VG.PREDICTION_SCORE, VG.CODE, VG.ABSOLUTE_POSITION, VG.CHROMOSOME, VG.REPLACE_TAG, PI.ACCESSION, P.AMINO_ACID_SEQUENCE
FROM W_EHA_GENE_COMPONENT GC, W_EHA_PROTEIN P, W_EHA_PROT_INFO PI,(
SELECT GCS.GENE_COMPONENT_WID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG, VP.PREDICTION_SCORE, PC.CODE, VP.STRUCTURE_WID, PC.CODE_TYPE
FROM W_EHA_GENE_COMP_SEGMENT GCS, W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, W_EHA_VARIANT_PREDICTION VP, W_EHA_PREDICTION_CODE PC
WHERE VP.VARIANT_WID = V.ROW_WID
AND VP.PREDICTION_CODE_WID = PC.ROW_WID
AND VX.VARIANT_WID = V.ROW_WID
AND V.SOURCE_WID = GCS.SOURCE_WID
AND V.START_POSITION <= GCS.END_POSITION
AND GCS.START_POSITION <= V.END_POSITION
AND PC.CODE_TYPE IN ('SIFT', 'polyphen')
AND PC.CODE IN
(deleterious, possibly damaging, probably damaging) /*-- Filter by prediction code, confer W_EHA_PREDICTION_CODE table.*/
AND VP.PREDICTION_SCORE < '0.5'
/*-- Filter by prediction score*/
)VG
WHERE GC.ROW_WID = VG.GENE_COMPONENT_WID
AND GC.COMPONENT_TYPE = 'CDS'
AND GC.PROTEIN_WID = P.ROW_WID
AND PI.PROTEIN_WID = P.ROW_WID
AND VG.STRUCTURE_WID = GC.STRUCTURE_WID;
Use Case - I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Areas
Variant
Gene Annotation
Pathway Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query
SELECT S.RESULT_STUDY_NAME, VRT.RESULT_SPEC_WID, VI.PATHWAY_NAME, VI.PATHWAY_SOURCE_ID, VI.HUGO_NAME AS GENE_SYMBOL, VI.REFERENCE_ID, VI.ABSOLUTE_POSITION, VI.CHROMOSOME, VI.REPLACE_TAG
FROM W_EHA_RSLT_SEQUENCING VRT, W_EHA_RSLT_STUDY S, (
SELECT V.ROW_WID, GS.GENE_WID, GS.HUGO_NAME, GS.PATHWAY_NAME, GS.PATHWAY_SOURCE_ID, VX.REFERENCE_ID, V.ABSOLUTE_POSITION, V.CHROMOSOME, V.REPLACE_TAG
FROM W_EHA_VARIANT V, W_EHA_VARIANT_XREF VX, (
SELECT G.ROW_WID AS GENE_WID, G.HUGO_NAME, P.PATHWAY_NAME, P.PATHWAY_SOURCE_ID, GSG.START_POSITION, GSG.END_POSITION, GSG.SOURCE_WID
FROM W_EHA_GENE_SEGMENT GSG, W_EHA_GENE G
, (
SELECT DISTINCT PH.PATHWAY_NAME, PH.PATHWAY_SOURCE_ID,PP.HUGO_SYMBOL
FROM W_EHA_PATHWAY_PROTEIN PP, W_EHA_PATHWAY PH
WHERE PP.PATHWAY_WID = PH.ROW_WID
AND PATHWAY_NAME LIKE
('%thyroid hormone%')
)P
/*-- Select either a specific KEGG pathway or search for pathway name keywords. For example, All Thyroid hormone specific pathways.*/
WHERE G.HUGO_NAME = P.HUGO_SYMBOL
AND GSG.GENE_WID = G.ROW_WID
)GS
WHERE V.SOURCE_WID = GS.SOURCE_WID
AND V.START_POSITION <= GS.END_POSITION
AND GS.START_POSITION <= V.END_POSITION
AND VX.VARIANT_WID = V.ROW_WID
) VI
WHERE VRT.VARIANT_WID = VI.ROW_WID
AND VRT.GENE_WID = VI.GENE_WID
AND VRT.RESULT_STUDY_WID = S.ROW_WID;
Use Case - What is the frequency of co-mutation of two genes in a data set?
Areas
Variant
Output queries tables from - ODB: REFERENCE + RESULT
Query -
select count(a_and_b.result_spec_wid) concurrent_cnt
FROM
(SELECT DISTINCT VRT.RESULT_SPEC_WID
FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g
WHERE VRT.GENE_WID = G.ROW_WID
AND g.HUGO_NAME IN ('KRAS')
intersect
SELECT DISTINCT VRT.RESULT_SPEC_WID
FROM W_EHA_RSLT_SEQUENCING vrt, W_EHA_GENE g
WHERE VRT.GENE_WID = G.ROW_WID
AND G.HUGO_NAME IN ('PTEN')
) a_and_b;
/* count all specimen with seq result */
SELECT COUNT(DISTINCT VRT_ALL.RESULT_SPEC_WID) CNT FROM W_EHA_RSLT_SEQUENCING VRT_ALL;
Use Case - Display all patients whose cancer cells had a deletion in gene X.
Areas
Output queries tables from ODB+CDM
Query
SELECT COUNT(*) FROM (
SELECT DISTINCT R_CNV.RESULT_SPEC_WID
From W_EHA_GENE G,
(SELECT R_CNV1.*
FROM W_EHA_RSLT_COPY_NBR_VAR R_CNV1
WHERE R_CNV1.RESULT_SPEC_WID BETWEEN (200 +ROUND(DBMS_RANDOM.VALUE(1,3))) AND (200 +ROUND(DBMS_RANDOM.VALUE(5,12)))
/*-- Restrict to a subset of the specimen list. A random list is taken here.*/
AND R_CNV1.CALLED_CNV_TYPE = 'gain') R_CNV
/*-- Ensure W_EHA_RSLT_COPY_NBR_VAR column; called_cnv_type, is updated with type data.*/
WHERE R_CNV.GENE_WID = G.ROW_WID
AND G.HUGO_NAME in ('KRAS'));
/*-- Enter gene symbol of gene X.*/
Use Case - Find specimens with homozygous non-variants at the specified location (for example, rs12345 or chr1:13434).
Areas
Output queries tables from
Query
select spec1.SPECIMEN_WID
From
(
select rnv.SPECIMEN_WID
from
(
select *
from
(
SELECT RS1.RESULT_SPEC_WID AS SPECIMEN_WID, RS1.*
FROM W_EHA_RSLT_NON_VARIANT_CHR RS1, W_EHA_VERSION V
WHERE
RS1.VERSION_WID = V.ROW_WID
AND V.VERSION_LABEL IN ('GRCH37.P8')
AND V.VERSION_TYPE = 'DNA'
AND RS1.ZYGOSITY IN ('hom-ref')
)
) rnv,
W_EHA_VARIANT_V VV1, W_EHA_VARIANT_XREF VX
WHERE
RNV.CHROMOSOME_WID = VV1.CHROMOSOME_WID
AND VX.VARIANT_WID = VV1.ROW_WID
AND VX.REFERENCE_ID = 'rs4733908'
and vv1.STATUS = 'KNOWN'
AND VV1.STRAND IN ('-','+')
AND VV1.ABSOLUTE_POSITION BETWEEN RNV.START_POSITION AND RNV.END_POSITION
) main1,
W_EHA_RSLT_SPECIMEN spec1
WHERE SPEC1.ROW_WID = MAIN1.SPECIMEN_WID
and spec1.SPEC_DATASRC_WID = 1;
Use Case - Identify samples that have unacceptably low percentage of on-target reads, and exons that fall below threshold read depth. Filter variants with sufficient coverage and include only those that fall within a target region.
Areas
Output queries tables from ODB
Query
SELECT DISTINCT ODBQ3.SPECIMEN_WID VARIANT_SPECIMEN, ODBQ4.SPECIMEN_WID RNASEQ_SPECIMEN
FROM
(
select spec1.SPECIMEN_WID
from
(
select rsq.SPECIMEN_WID
from
(
select *
from
(
select rs1.RESULT_SPEC_WID as SPECIMEN_WID, rs1.*
from W_EHA_RSLT_SEQUENCING_CHR rs1, W_EHA_GENE g, W_EHA_VERSION V
WHERE
RS1.GENE_WID = g.ROW_WID
AND g.HUGO_NAME i ('ADAM32')
AND RS1.version_wid = V.ROW_WID
AND V.VERSION_LABEL in ('GRCH37.P8')
AND V.VERSION_TYPE = 'DNA'
)) rsq,
(
select gs1.gene_wid, gcc1.start_position, gcc1.end_position
from W_EHA_GENE_STRUCTURE gs1,
(
select v.structure_wid, (v.start_position+s.start_position-1) as start_position, (v.end_position+s.start_position-1) as end_position
from W_EHA_PROMOTER_REGION_V v, W_EHA_DNA_SOURCE s
where s.ROW_WID = v.SOURCE_WID
) gcc1
where gcc1.structure_wid = gs1.row_wid
union all
select gseg1.gene_wid, (gseg1.start_position + ds1.start_position-1) as start_position, (gseg1.end_position + ds1.start_position-1) as end_position
from W_EHA_GENE_SEGMENT gseg1, W_EHA_DNA_SOURCE ds1
where gseg1.source_wid = ds1.row_wid
) gco1,
W_EHA_VARIANT_V vv1
WHERE
rsq.VARIANT_WID = vv1.ROW_WID
and rsq.gene_wid = gco1.gene_wid
and rsq.start_position between gco1.start_position and gco1.end_position
and vv1.STATUS = 'KNOWN'
and vv1.STRAND in ('-','+')
) main1,
W_EHA_RSLT_SPECIMEN spec1
where spec1.ROW_WID = main1.SPECIMEN_WID
AND SPEC1.SPEC_DATASRC_WID = 1) ODBQ3,
(select spec2.SPECIMEN_WID
from
(
select rnaseq2.SPECIMEN_WID
from
(
select *
from
(
select rs2.RESULT_SPEC_WID as SPECIMEN_WID, rs2.*
from W_EHA_RSLT_RNA_SEQ_CHR rs2
WHERE
RS2.GENE_WID IN (71499,14775)
AND RS2.VERSION_WID IN (8,18)
)) rnaseq2,
(
select gs2.gene_wid, gcc2.start_position, gcc2.end_position
from W_EHA_GENE_STRUCTURE gs2,
(
select gc2.structure_wid, (gcs2.start_position + ds2.start_position-1) as start_position, (gcs2.end_position + ds2.start_position-1) as end_position
from W_EHA_GENE_COMPONENT gc2, W_EHA_GENE_COMP_SEGMENT gcs2, W_EHA_DNA_SOURCE ds2
where gc2.component_type in ('CDS','mRNA','miscRNA','exon')
and gc2.row_wid = gcs2.gene_component_wid
and gcs2.source_wid = ds2.row_wid
) gcc2
where gcc2.structure_wid = gs2.row_wid
) gco2
WHERE
rnaseq2.gene_wid = gco2.gene_wid
and (gco2.end_position >= rnaseq2.start_position
and gco2.start_position <= rnaseq2.end_position)
and rnaseq2.RAW_COUNTS < 15.0
and rnaseq2.RPKM > 0.25
and rnaseq2.STRAND in ('-','+')
) main2,
W_EHA_RSLT_SPECIMEN spec2
WHERE SPEC2.ROW_WID = MAIN2.SPECIMEN_WID
AND SPEC2.SPEC_DATASRC_WID = 1) ODBQ4
WHERE
(ODBQ3.SPECIMEN_WID = 21284 /*-- SpecimenID for Mutation results of a patient*/
AND ODBQ4.SPECIMEN_WID = 20384);
/*-- SpecimenId with Gene Expression results of the same patient.*/