Oracle® Health Sciences Omics Data Bank Programmer's Guide Release 1.0.1 Part Number E27509-02 |
|
|
View PDF |
This chapter lists use case for Oracle Health Sciences Omics Data Bank. It contains the following topics:
This section contains the following use case scenarios:
Note:
To run some of the use case queries you need to create global temporary tables. For more information, refer to Global Temporary Table Creation Code Snippets,Scenario 1 - Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.
Scenario 2 - Expression level of TP53 mutant by cancer tissue.
Scenario 3 - Show me how many patients with Moderate Alzheimer's, who have a smoking habit were treated with Doxorubicin? Which is there most recent Cholesterol and Hb counts? Of those patients, which have samples and what is the differential expression of genes in those samples compared to my controls?
Scenario 4 - MAGEA3 differential expression between basal cell carcinoma and metastatic carcinoma samples.
Scenario 5 - Compare CFL2 gene expresion between two cohorts. Cohort1: subjects with primary melanoma or basal cell carcinoma or squamous cell carcinoma; Cohort2: subjects with metastatic melanoma.
Scenario 6 - Ability to query subjects for established molecular tests, for example the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Scenario 7 - Ability to research a gene in the sample set.
Scenario 8 - Dr. Smith is evaluating gene expression data set for a group of tumors in his research project. He views pattern of gene expression data.
Scenario 9 - Dr. Smith started a trial for patients with metastatic melanoma. His hypothesis is that certain mutations make a patient a better candidate for a new treatment protocol compared to patients without the mutation. Dr. Storm searches the dat awarehouse using the HRI solution tools and identifies a group of patients clinically eligible for his study when identifies a subgroup of patients with certain gene mutaitons who also fit the clinical eligibility criteria for study. He contacts this group through the TCC protocol coordinator and commences his study.
Scenario 10 - Dr. Smith is attempting to identify clusters of genes that are activated with maglinant melanoma (skin cancer), in order to find successful interventions for prevention or treatment. Dr. Merdock is utilizing a Hidden Markov model to identify temporal patterns in the gene expression. He will visualize these patterns with a hierarchical clustering chart in order to more easily identify contributing factors for the illness. Dr. Merdock processes a large number of sequence results and uses the BI tools to identify data he wants to pull into his analysis. He runs statistical and advanced visualization of this data, archives the data utilized in the analysis and is able to find patterns that would be extremely difficult to find without these tools. These analyses would assist in determining patient prognosis, best therapies and potentially new druagable targets, especially in patients who are non-responsive to therapy.
Scenario 11 - CohortID: the researcher needs to be able to select a set of chips based on the certain criteria. The researcher will then devide the cohort into different groups (this can also be done at the cohort level where the researchers select several different cohorts and saves them) based on some criteria. The researcher will then see what genes (probes) are most differently regulated between these classes (cohorts). The research should also be able to select a sepcific gene and see its expression values across the samples. The results should also be shown in scatter plots or bar graphs.
Scenario 12 - The researcher should be able to select a patient cohort based on the expression level for a set of genes. The same selection as the example above is needed but with the addition filters for gene express levels.
Scenario 13 - I have a set of mutations meeting a set of criteria (for example, frequency, and so on), I need to annotated them with gene and function information.
Scenario 14 - Select mutation with deep functional annotation (for eample, polyphen, sift).
Scenario 15 - I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Scenario 16 - I have a mutation I have deemed interesting, I want to know if it perturbs gene expression (in the same project or in other projects).
Scenario 17 - Examine gene expression across cell lines within each panel.
Scenario 18 - Examine copy number across cell lines within each panel.
Scenario 19 - Do KRAS mutant cell lines express more IL-6 than wildtype.
Scenario 20 - Do RAS high cell lines expression more IL-6 than low RAS lines.
Scenario 21 - Are cMET amplified cell line more sensitive to METi?
Scenario 22 - Show HER+ lines have ERBB2 copy number gains.
Scenario 23 - Label cell lines for PTEN, PIK3CA mutations.
Scenario 24 - What is the frequency of co-mutation of two genes in a data set?
Scenario 25 - The steps for Initial data processing to identify sequence variants should be fully automated by writing scripts to interact with public database. The variants could then be presented in a graphic interface that summarized their characteristics.
Scenario 26 - Map the variant coordinates to gene, exon, amino acid, chromosomal location, region of protein; and obtain CCDS (Consensus Coding Sequence) and RefSeq IDs. Example: PDEGFR D842V, exon 18 out of 23 total exons, 4q11-q13, protein tyrosine kinase domain; CCDS3495.1, NP_006197.
Scenario 27 - Determine whether the variant is a known SNP or a previously characterized somatic mutation in cancer. Also determine the total number of reported somatic variants in the entire gene, and whether any are near the variant in question.
Scenario 28 - Show me all patients whose cancer cells had a deletion in gene X.
Use Case - Find patients that are poor responders for drug A and have a mutation in the promoter region of gene A.
Areas
Variant
Gene Annotation
Test and Results
Drug Info
Output queries tables from - ODB+CDM
Query -
insert into query_1SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ vrt.RESULT_SPEC_WID, vi.start_position, vi.end_position, vi.replace_tagFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.ROW_WID, v.start_position, v.end_position, v.replace_tag FROM W_EHA_VARIANT v, ( SELECT gcsg.SOURCE_WID, gcsg.START_POSITION, gcsg.END_POSITION FROM W_EHA_GENE g, w_eha_gene_structure gst, w_eha_gene_component gc, w_eha_gene_comp_segment gcsg WHERE g.HUGO_NAME IN ('BRCA2') AND gc.COMPONENT_TYPE IN ('CDS') AND gc.STRUCTURE_WID = gst.ROW_WID AND gst.GENE_WID = g.ROW_WID AND gcsg.GENE_COMPONENT_WID = gc.ROW_WID )sg WHERE v.SOURCE_WID = sg.SOURCE_WID AND v.START_POSITION <= sg.END_POSITION AND sg.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE vrt.VARIANT_WID = vi.ROW_WID
AND w_eha_rslt_study.result_study_name = 'study_3'
AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID
AND vrt.RESULT_SPEC_WID = 250
Use Case - Expression level of TP53 mutant by cancer tissue.
Areas
Variant
Gene Annotation
Gene Expression
Biospecimen Data
Output queries tables from - ODB+CDM
Query 2 -
insert into query_2select r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensity from W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp,query_2where p.primary_hugo_name in ('TP53'))AND r_exp.probe_wid = p.row_widAND r_exp.specimen_wid = query_2_a.specimen_wid ;
Query 2a -
insert into query_2_1SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ distinct vrt.RESULT_SPEC_WID, vi.start_position, vi.end_position, vi.replace_TagFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('TP53') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE w_eha_rslt_study.result_study_name = 'study_1'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND vrt.VARIANT_WID = vi.ROW_WID;
Use Case - Show me how many patients with Moderate Alzheimer's, who have a smoking habit were treated with Doxorubicin? Which is there most recent Cholesterol and Hb counts? Of those patients, which have samples and what is the differential expression of genes in those samples compared to my controls?
Areas
Diagnosis
Biospecimen Data
Drug Info
Risk Factor
Test and Results
Expression
Gene Annotation
Output queries tables from - ODB+CDM
Query -
insert into query_7select r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('COX15'))) AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_10' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 950union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('CHAT') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_10' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 950union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('WNT8B') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_10' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 950union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('TRBV1') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_10' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 950
Use Case - MAGEA3 differential expression between basal cell carcinoma and metastatic carcinoma samples.
Areas
Gene Annotation
Expression
Biospecimen Data
Output queries tables from - ODB: REFERENCE + RESULT
Query -
insert into query_9select r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('MAGEA3') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_10' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 950
Use Case - Compare CFL2 gene expresion between two cohorts. Cohort1: subjects with primary melanoma or basal cell carcinoma or squamous cell carcinoma; Cohort2: subjects with metastatic melanoma
Areas
Expression
Biospecimen Data
Cancer Diagnosis
Output queries tables from - ODB: REFERENCE + RESULT
Query -
insert into query_10select r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('ZNF211') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_10' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 950
Use Case - Ability to query subjects for established molecular tests, for example the presence of known myeloma mutations such as the t(4;14) translocation or mutations in oncogenes such as RAS.
Areas
Variant
Chromosomal Rearrangement
Gene Annotation
Output queries tables from - ODB+CDM
Query -
create global temporary table query_11(result_spec_wid NUMBER, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;INSERT INTO Query_11SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ vrt.RESULT_SPEC_WID, vi.start_position, vi.end_position, vi.replace_tagFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('KRAS') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE w_eha_rslt_study.result_study_name = 'study_1'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND vrt.VARIANT_WID = vi.ROW_WIDAND vrt.RESULT_SPEC_WID = 50;
Use Case - Ability to research a gene in the sample set.
Areas
Gene Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query -
SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ count(distinct w_eha_rslt_study.result_study_name)FROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.*, gs.hugo_name FROM W_EHA_VARIANT v, ( SELECT g.hugo_name, gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('BRCA2') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND vrt.VARIANT_WID = vi.ROW_WID;union all/* 6 sec */select distinct w_eha_rslt_study2.result_study_namefrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_study w_eha_rslt_study2where p.primary_hugo_name in ('BRCA2')AND r_exp.probe_wid = p.row_widAND r_exp.RESULT_STUDY_WID = w_eha_rslt_study2.ROW_WIDunion all/* 1082sec */SELECT /*+ index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M1),index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M2)*/distinct w_eha_rslt_study3.result_study_nameFROM W_EHA_RSLT_COPY_NBR_VAR r_cnv, w_eha_rslt_study w_eha_rslt_study3, (SELECT gsg.START_POSITION gsg_START_POSITION, gsg.END_POSITION gsg_END_POSITION, s.START_POSITION s_START_POSITION, s.END_POSITION s_END_POSITION FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g, W_EHA_DNA_SOURCE s WHERE g.HUGO_NAME IN ('BRCA2') AND gsg.GENE_WID = g.ROW_WID AND gsg.SOURCE_WID= s.ROW_WID ) sgWHERE r_cnv.START_POSITION <= (sg.gsg_end_position+sg.s_start_position)AND (sg.gsg_start_POSITION+sg.s_start_position) <= r_cnv.end_positionAND r_cnv.RESULT_STUDY_WID = w_eha_rslt_study3.ROW_WID
Use Case - Dr. Smith is evaluating gene expression data set for a group of tumors in his research project. He views pattern of gene expression data.
Areas
Expression
Biospecimen Data
Output queries tables from - ODB: REFERENCE + RESULT
Query -
insert into query_13select r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('TP53')AND r_exp.probe_wid = p.row_widAND w_eha_rslt_study.result_study_name = 'study_11'AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND r_exp.result_spec_wid = 1050union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('BRCA1')AND r_exp.probe_wid = p.row_widAND w_eha_rslt_study.result_study_name = 'study_11'AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND r_exp.result_spec_wid = 1050union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('BRCA2')AND r_exp.probe_wid = p.row_widAND w_eha_rslt_study.result_study_name = 'study_11'AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND r_exp.result_spec_wid = 1050union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('BYSL')AND r_exp.probe_wid = p.row_widAND w_eha_rslt_study.result_study_name = 'study_11'AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND r_exp.result_spec_wid = 1050
Use Case - Dr. Smith started a trial for patients with metastatic melanoma. His hypothesis is that certain mutations make a patient a better candidate for a new treatment protocol compared to patients without the mutation. Dr. Storm searches the dat awarehouse using the HRI solution tools and identifies a group of patients clinically eligible for his study when identifies a subgroup of patients with certain gene mutaitons who also fit the clinical eligibility criteria for study. He contacts this group through the TCC protocol coordinator and commences his study.
Areas
Cancer Diagnosis
TCC Consent
Variant
Gene Annotation
Treatment Protocol
Output queries tables from - ODB: REFERENCE + RESULT
Query -
create global temporary table query_14(result_spec_wid NUMBER, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;/* 32.1 sec */INSERT INTO Query_14SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ vrt.RESULT_SPEC_WID, vi.start_position, vi.end_position, vi.replace_tagFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('CHAT') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE w_eha_rslt_study.result_study_name = 'study_4'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND vrt.VARIANT_WID = vi.ROW_WIDAND vrt.RESULT_SPEC_WID = 350;
Use Case - Dr. Smith is attempting to identify clusters of genes that are activated with maglinant melanoma (skin cancer), in order to find successful interventions for prevention or treatment. Dr. Merdock is utilizing a Hidden Markov model to identify temporal patterns in the gene expression. He will visualize these patterns with a hierarchical clustering chart in order to more easily identify contributing factors for the illness. Dr. Merdock processes a large number of sequence results and uses the BI tools to identify data he wants to pull into his analysis. He runs statistical and advanced visualization of this data, archives the data utilized in the analysis and is able to find patterns that would be extremely difficult to find without these tools. These analyses would assist in determining patient prognosis, best therapies and potentially new druagable targets, especially in patients who are non-responsive to therapy.
Areas
Cancer Diagnosis
Test and Results
Output queries tables from - ODB: REFERENCE + RESULT
Query -
insert into query_15select r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('TP53') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_14' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1350union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('BRCA1') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_14' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1350union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('GPR4') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_14' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1350union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('PABPC1') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_14' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1350union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('SOBP') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_14' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1350
Use Case - CohortID: the researcher should be able to select a set of chips based on the following (and potential other) criteria's:
demographic (age, sex, and so on);
clinical data;
tumor type;
tumor properties;
chip quality.
The researcher will then divide the cohort into different groups (this can also be done at the cohort level where the researchers select several different cohorts and saves them) based on some criteria. The researcher will then see what genes (probes) are most differently regulated between these classes (cohorts). The research should also be able to select a sepecific gene and see its expression values across the samples. The results should also be shown in scatter plots or bar graphs.
Areas
Cancer Diagnosis
Demographic
Biospecimen Data
QC Data
Gene Annotation
Expression
Output queries tables from - ODB+CDM
Query -
insert into query_16select r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('LEPR') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_20' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1950union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('MSSE') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_20' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1950union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('PCCA') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_20' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1950union ALLselect r_exp.result_spec_wid, p.primary_hugo_name, r_exp.intensityfrom W_EHA_PROBE p, W_EHA_RSLT_GENE_EXP r_exp, w_eha_rslt_studywhere p.primary_hugo_name in ('SLC1A1') AND r_exp.probe_wid = p.row_wid AND w_eha_rslt_study.result_study_name = 'study_20' AND r_exp.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID AND r_exp.result_spec_wid = 1950
Use Case - The researcher should be able to select a patient cohort based on the expression level for a set of genes. The same selection as the example above is needed but with the addition filters for gene express levels.
Areas
Cancer Diagnosis
Demographic
Biospecimen Data
QC Data
Gene Annotation
Expression
Output queries tables from - ODB+CDM
Query -
select count(distinct r_exp2.result_spec_wid)from W_EHA_PROBE p2, W_EHA_RSLT_GENE_EXP r_exp2, w_eha_rslt_study w_eha_rslt_study2,(select avg(r_exp1.intensity) exp_avgfrom W_EHA_PROBE p1, W_EHA_RSLT_GENE_EXP r_exp1, w_eha_rslt_study w_eha_rslt_study1where p1.primary_hugo_name in ('TP53','BRCA1','GPR4','PABPC1','SOBP') AND r_exp1.probe_wid = p1.row_wid AND w_eha_rslt_study1.result_study_name = 'study_15' AND r_exp1.RESULT_STUDY_WID = w_eha_rslt_study1.ROW_WID) intensitywhere p2.primary_hugo_name in ('TP53','BRCA1','GPR4','PABPC1','SOBP') AND r_exp2.probe_wid = p2.row_wid AND r_exp2.intensity > intensity.exp_avg AND w_eha_rslt_study2.result_study_name = 'study_15' AND r_exp2.RESULT_STUDY_WID = w_eha_rslt_study2.ROW_WID
Use Case - I have a set of mutations meeting a set of criteria (for example, frequency, and so on), I need to annotate them with gene and function information.
Areas
Variant
Variant Population Info
Demographic
Gene Annotation
Functional Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query -
select DISTINCT hu.*, gi.reference_id from W_EHA_HUGO_INFO hu, ( select DISTINCT g.hugo_name, vref.reference_id from w_eha_gene g, w_eha_variant v, w_eha_variant_xref vref, W_EHA_DNA_SOURCE s, W_EHA_GENE_SEGMENT gs, W_EHA_SPECIES sp where vref.reference_id = 'rs11169' AND vref.variant_wid = v.row_wid AND gs.SOURCE_WID= v.SOURCE_WID AND s.row_wid = gs.source_wid AND v.END_POSITION >= gs.START_POSITION AND gs.END_POSITION >= v.START_POSITION AND gs.GENE_WID=g.ROW_WID AND sp.common_name = 'human' AND s.SPECIES_WID=sp.ROW_WID )gi WHERE hu.APPROVED_SYMBOL = gi.HUGO_NAME;
Use Case - Select mutation with deep functional annotation (for example, polyphen, sift)
Areas
Variant
Gene Annotation
Output queries tables from - ODB: REFERENCE
Query -
SELECT vg.REFERENCE_ID, vg.absolute_position, vg.chromosome, vg.replace_tag, pi.ACCESSION, p.AMINO_ACID_SEQUENCEFROM W_EHA_GENE_COMPONENT gc, W_EHA_PROTEIN p, W_EHA_PROT_INFO pi, ( SELECT gcs.GENE_COMPONENT_WID, vx.reference_id, v.absolute_position, v.chromosome, v.replace_tag FROM W_EHA_GENE_COMP_SEGMENT gcs, W_EHA_VARIANT v, W_EHA_VARIANT_XREF vx WHERE vx.reference_id = 'rs111690037' AND vx.VARIANT_WID = v.ROW_WID AND v.SOURCE_WID = gcs.SOURCE_WID AND v.START_POSITION <= gcs.END_POSITION AND gcs.START_POSITION <= v.END_POSITION )vgWHERE gc.ROW_WID = vg.GENE_COMPONENT_WID AND gc.component_type = 'CDS' AND gc.PROTEIN_WID = p.ROW_WID AND pi.protein_wid = p.row_wid
Use Case - I have a pathway. What mutations are present in the pathway and which study were they identified in (for example, what tumor types)?
Areas
Variant
Gene Annotation
Pathway Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query -
Firstly, identify the genes from the selected pathway then
truncate table query_25;insert into query_25select /*+ index(vr W_EHA_RSLT_SEQUENCING_F2) */ vr.result_spec_wid, vg.hugo_name, vg.absolute_position, vg.chromosome, vg.replace_tag from ( SELECT g.hugo_name, v.row_wid, v.absolute_position, v.chromosome, v.replace_tag from w_eha_variant v, w_eha_gene g, w_eha_gene_segment gs, w_eha_dna_source s, w_eha_species sp where g.HUGO_NAME IN ( select pthp.hugo_symbol from w_eha_pathway pth, w_eha_pathway_protein pthp where pth.pathway_name = 'Apoptosis' AND pthp.pathway_wid = pth.row_wid ) AND g.ROW_WID = gs.GENE_WID AND gs.SOURCE_WID = v.SOURCE_WID AND v.END_POSITION >= gs.START_POSITION AND gs.END_POSITION >= v.START_POSITION AND gs.source_wid = s.row_wid AND s.SPECIES_WID = sp.ROW_WID AND sp.SPECIES_NAME = 'Homo sapiens' ) vg, w_eha_rslt_sequencing vr, w_eha_rslt_study rstwhere vr.VARIANT_WID = vg.row_widAND rst.result_study_name = 'study_1'AND vr.result_study_wid = rst.row_widAND vr.result_spec_wid = 2;
Use Case - I have a mutation I have deemed interesting, I want to know if it perturbs gene expression (in the same project or in other projects).
Areas
Variant
Expression
Gene Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query -
truncate table query_29;insert into query_29SELECT /*+ index(vr W_EHA_RSLT_SEQUENCING_F2) */ exr.result_spec_wid, exr.intensity, exr.p_value, exr.call, exr.probe_wid from W_EHA_RSLT_GENE_EXP exr, W_EHA_VARIANT v, w_eha_rslt_sequencing vr, W_EHA_VARIANT_XREF vx, W_EHA_GENE_SEGMENT gs, W_EHA_GENE g, W_EHA_PROBE prb, w_eha_rslt_study rst where vx.REFERENCE_ID = 'rs111690037' AND vx.VARIANT_WID = v.ROW_WID AND v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND v.END_POSITION >= gs.START_POSITION AND gs.GENE_WID = g.ROW_WID AND g.HUGO_NAME = prb.PRIMARY_HUGO_NAME and exr.probe_wid = prb.row_wid AND v.ROW_WID = vr.VARIANT_WID AND rst.result_study_name = 'study_1' AND vr.result_study_wid = rst.row_wid AND vr.result_spec_wid = exr.result_spec_wid AND vr.result_spec_wid > 0 AND vr.result_spec_wid <= 100;
Use Case - Examine gene expression across cell lines within each panel.
Areas
Expression
Gene Annotation
Biospecimen Data
Panel
Output queries tables from - ODB: REFERENCE + RESULT
Query -
truncate table query_30;insert into query_30 SELECT rge.result_spec_wid, p.primary_hugo_name, rge.hybridization_name, rge.intensity, rge.p_value, rge.callFROM w_eha_rslt_gene_exp rge, w_eha_probe p, w_eha_rslt_study rstWHERE rge.PROBE_WID = p.row_wid AND rst.result_study_name = 'study_1' AND rge.result_study_wid = rst.row_wid AND rge.result_spec_wid = 6;
Use Case - Examine copy number across cell lines within each panel.
Areas
Copy Number
Biospecimen Data
Panel
Output queries tables from - ODB+CDM
Query -
/*[FOR SEQUENCING]*/truncate table query_31a;insert into query_31aSELECT cnv.result_spec_wid, chr.chromosome, cnv.start_position, cnv.end_position, cnv.avg_normalized_cvg from w_eha_rslt_copy_nbr_var cnv, w_eha_rslt_study rst, w_eha_chromosome chrwhere rst.result_study_name = 'study_1' AND cnv.result_study_wid = rst.row_wid AND chr.row_wid = cnv.chromosome_wid AND cnv.result_spec_wid = 6; -----------------------------------------------/*[FOR EXPRESSION DATA]*/truncate table query_31b;insert into query_31bSELECT exr.result_spec_wid, exr.hybridization_name, exr.intensity, exr.p_value, exr.call from W_EHA_RSLT_GENE_EXP exr, w_eha_rslt_study rstwhere rst.result_study_name = 'study_1' AND exr.result_study_wid = rst.row_wid AND exr.result_spec_wid = 6;
Use Case - Do KRAS mutant cell lines express more IL-6 than wildtype.
Areas
Variant
Protein
Gene Annotation
Output queries tables from - ODB: REFERENCE + RESULT
Query19 -
truncate table query_32;insert into query_32SELECT rge.result_spec_wid, rge.intensity, rge.p_value, rge.callFROM W_EHA_RSLT_GENE_EXP rge, W_EHA_PROBE prb, ( SELECT DISTINCT rs.result_study_wid --, rs.result_spec_wid FROM w_eha_prot_info pi, W_EHA_GENE_COMPONENT gc, W_EHA_GENE_COMP_SEGMENT gcs, W_EHA_VARIANT v, W_EHA_RSLT_SEQUENCING rs, w_eha_rslt_study rst WHERE pi.PROTEIN_NAME = 'RASK_HUMAN' AND gc.PROTEIN_WID = pi.protein_wid AND gcs.GENE_COMPONENT_WID = gc.ROW_WID AND v.SOURCE_WID = gcs.SOURCE_WID AND v.START_POSITION <= gcs.END_POSITION AND gcs.START_POSITION <= v.END_POSITION AND rs.VARIANT_WID = v.ROW_WID and rst.result_study_name = 'study_1' and rs.result_study_wid = rst.row_wid AND rs.result_spec_wid = 2) aiWHERE prb.PRIMARY_HUGO_NAME = 'IL6' AND rge.PROBE_WID = prb.ROW_WID AND rge.result_study_wid = ai.result_study_wid AND rge.result_spec_wid = 6
Use Case - Do RAS high cell lines expression more IL-6 than low RAS lines.
Areas
Gene Annotation
Protein
Output queries tables from - ODB: REFERENCE + RESULT
Query -
truncate table query_33;insert into query_33SELECT er.result_spec_wid, er.intensity, er.p_value, er.call from W_EHA_RSLT_GENE_EXP er, W_EHA_PROBE prb, w_eha_rslt_study rst where prb.PRIMARY_HUGO_NAME = 'IL6' AND er.PROBE_WID = prb.ROW_WID AND rst.result_study_name = 'study_1' AND er.result_study_wid = rst.row_wid AND er.result_spec_wid = 6;
Use Case - Are cMET amplified cell line more sensitive to METi?
Areas
Gene Annotation
Copy Number
Output queries tables from - ODB: REFERENCE + RESULT
Query -
SELECT /*+ index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M1),index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M2)*/count(distinct r_cnv.result_spec_wid)FROM (select r_cnv1.* from W_EHA_RSLT_COPY_NBR_VAR r_cnv1, w_eha_rslt_study rst where rst.result_study_name = 'study_1' AND r_cnv1.result_study_wid = rst.row_wid AND r_cnv1.called_CNV_Type = 'gain') r_cnv, (SELECT gsg.START_POSITION gsg_START_POSITION, gsg.END_POSITION gsg_END_POSITION, s.START_POSITION s_START_POSITION, s.END_POSITION s_END_POSITIONFROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g, W_EHA_DNA_SOURCE sWHERE g.HUGO_NAME IN ('MET') AND gsg.GENE_WID = g.ROW_WID AND gsg.SOURCE_WID= s.ROW_WID ) sgWHERE r_cnv.START_POSITION <= (sg.gsg_end_position+sg.s_start_position)AND (sg.gsg_start_POSITION+sg.s_start_position) <= r_cnv.end_position;
Use Case - Show HER+ lines have ERBB2 copy number gains.
Areas
Protein
Biospecimen Data
Gene Annotation
Copy Number
Output queries tables from - ODB+CDM
Query -
SELECT /*+ index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M1),index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M2)*/distinct r_cnv.called_CNV_TypeFROM W_EHA_RSLT_COPY_NBR_VAR r_cnv, w_eha_rslt_study, (SELECT gsg.START_POSITION gsg_START_POSITION, gsg.END_POSITION gsg_END_POSITION, s.START_POSITION s_START_POSITION, s.END_POSITION s_END_POSITION FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g, W_EHA_DNA_SOURCE s WHERE g.HUGO_NAME IN ('ERBB2') AND gsg.GENE_WID = g.ROW_WID AND gsg.SOURCE_WID= s.ROW_WID ) sgWHERE r_cnv.START_POSITION <= (sg.gsg_end_position+sg.s_start_position)AND (sg.gsg_start_POSITION+sg.s_start_position) <= r_cnv.end_positionAND r_cnv.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND w_eha_rslt_study.result_study_name = 'study_6'AND r_cnv.result_spec_wid = 550
Use Case - Label cell lines for PTEN, PIK3CA mutations.
Areas
Biospecimen Data
Variant
Gene Annotation
Output queries tables from - ODB+CDM
Query -
SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ count(distinct vrt.RESULT_SPEC_WID)FROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('PTEN','PIK3CA') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE w_eha_rslt_study.result_study_name = 'study_4'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND vrt.VARIANT_WID = vi.ROW_WID;
Use Case - What is the frequency of co-mutation of two genes in a data set?
Areas
Variant
Output queries tables from - ODB: REFERENCE + RESULT
Query -
/* concurrent count */ select count(a_and_b.result_SPEC_WID) concurrent_cntfrom (SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ distinct vrt.result_SPEC_WIDFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('KRAS') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi, w_eha_rslt_studyWHERE vrt.VARIANT_WID = vi.ROW_WIDAND w_eha_rslt_study.result_study_name = 'study_4'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDintersectSELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ distinct vrt.result_SPEC_WIDFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('BRCA2') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi, w_eha_rslt_studyWHERE vrt.VARIANT_WID = vi.ROW_WIDAND w_eha_rslt_study.result_study_name = 'study_4'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID) a_and_b/* in A but not in B *//* 38.5 sec*/select count(a_not_b.result_SPEC_WID) a_not_b_cntfrom (SELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ distinct vrt.result_SPEC_WIDFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('KRAS') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE vrt.VARIANT_WID = vi.ROW_WIDAND w_eha_rslt_study.result_study_name = 'study_3'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDminusSELECT /*+ index(vrt W_EHA_RSLT_SEQUENCING_F2) */ distinct vrt.result_SPEC_WIDFROM W_EHA_RSLT_SEQUENCING vrt, ( SELECT v.* FROM W_EHA_VARIANT v, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('BCRA2') AND gsg.GENE_WID = g.ROW_WID )gs WHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION) vi,w_eha_rslt_study WHERE vrt.VARIANT_WID = vi.ROW_WIDAND w_eha_rslt_study.result_study_name = 'study_3'AND vrt.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WID) a_not_b
Use Case - The steps for Initial data processing to identify sequence variants should be fully automated by writing scripts to interact with public database. The variants could then be presented in a graphic interface that summarize their characteristics.
Output queries tables from - ODB: REFERENCE
Query -
truncate table query_53drop table query_53create global temporary table query_53(reference_id varchar2(200), reference_data varchar2(100)) ON COMMIT DELETE ROWS;insert into query_53SELECT distinct vx.reference_ID, vx.database from W_EHA_VARIANT v, w_eha_variant_xref vx, W_EHA_GENE g, W_EHA_GENE_SEGMENT gs, W_EHA_HUGO_INFO hgwhere hg.ENTREZ_GENE_ID = 56624AND hg.approved_symbol = g.hugo_nameAND g.ROW_WID = gs.GENE_WIDAND gs.source_wid = v.source_widAND gs.END_POSITION >= v.START_POSITIONAND v.END_POSITION >= gs.START_POSITIONAND v.row_wid = vx.variant_wid;
Use Case - Map the variant coordinates to gene, exon, amino acid, chromosomal location, region of protein; and obtain CCDS (Consensus Coding Sequence) and RefSeq IDs. Example: PDEGFR D842V, exon 18 out of 23 total exons, 4q11-q13, protein tyrosine kinase domain; CCDS3495.1, NP_006197.
Output queries tables from - ODB: REFERENCE
Query 26 A -
/*a. Map the variant coordinates to gene and obtain CCDS (Consensus Coding Sequence) and RefSeq IDs. Example: PDEGFR; CCDS3495.1, NP_006197*//* 24.3 sec */truncate table query_58adrop table query_58acreate global temporary table query_58a(ROW_WID NUMBER, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;insert into query_58aSELECT v.ROW_WID, v.start_position, v.end_position, v.replace_tag, sg.gcx_reference_id, sg.px_reference_idFROM W_EHA_VARIANT v, ( SELECT gcsg.SOURCE_WID, gcsg.START_POSITION, gcsg.END_POSITION, gcx.reference_id gcx_reference_id, px.reference_id px_reference_id FROM W_EHA_GENE g, w_eha_gene_structure gst, w_eha_gene_component gc, w_eha_gene_comp_segment gcsg, w_eha_gene_comp_xref gcx, w_EHA_PROT_XREF px WHERE g.HUGO_NAME IN ('PTEN') AND gc.COMPONENT_TYPE IN ('CDS') AND gc.STRUCTURE_WID = gst.ROW_WID AND gst.GENE_WID = g.ROW_WID AND gcsg.GENE_COMPONENT_WID = gc.ROW_WID AND gcx.gene_component_wid = gc.row_wid AND gcx.database in ('CCDS') and px.database in ('RefSeq') and px.protein_wid = gc.protein_wid)sgWHERE v.SOURCE_WID = sg.SOURCE_WID AND v.START_POSITION <= sg.END_POSITION AND sg.START_POSITION <= v.END_POSITION
Query 26 B -
/*b. Map to exon and obtain CCDS (Consensus Coding Sequence) and RefSeq IDs.Example: PDGFRA, exon 18 out of 23 total exons ; CCDS3495.1, NP_006197*//* 6.4 sec */truncate table query_58bdrop table query_58bcreate global temporary table query_58b(ROW_WID NUMBER, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;insert into query_58bSELECT v.ROW_WID, v.start_position, v.end_position, v.replace_tag, sg.gcx_reference_id, sg.px_reference_idFROM W_EHA_VARIANT v, ( SELECT gcsg.SOURCE_WID, gcsg.START_POSITION, gcsg.END_POSITION, gcx.reference_id gcx_reference_id, px.reference_id px_reference_id FROM W_EHA_GENE g, w_eha_gene_structure gst, w_eha_gene_component gc, w_eha_gene_comp_segment gcsg, w_eha_gene_comp_xref gcx, w_EHA_PROT_XREF px WHERE g.HUGO_NAME IN ('KRAS') AND gc.COMPONENT_TYPE IN ('CDS') AND gc.STRUCTURE_WID = gst.ROW_WID AND gst.GENE_WID = g.ROW_WID AND gcsg.GENE_COMPONENT_WID = gc.ROW_WID AND gcsg.number_in_sequence = 2 AND gcx.gene_component_wid = gc.row_wid AND gcx.database in ('CCDS') and px.database in ('RefSeq') and px.protein_wid = gc.protein_wid)sgWHERE v.SOURCE_WID = sg.SOURCE_WID AND v.START_POSITION <= sg.END_POSITION AND sg.START_POSITION <= v.END_POSITION
Query 26 C -
/*c. Map to chromosomal location and obtain CCDS (Consensus Coding Sequence) and RefSeq IDs. 4q11-q13(4:74301933..74321492); *//* 6.6sec */truncate table query_58cdrop table query_58ccreate global temporary table query_58c(hugo_name varchar2(200), start_position NUMBER, end_position NUMBER, chromosome varchar2(50),replace_tag varchar2(1000), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;insert into query_58cSELECT sg.hugo_name, v.start_position, v.end_position, v.chromosome, v.replace_tag, sg.gcx_refid, sg.px_refidfrom W_EHA_VARIANT v, ( SELECT distinct g.hugo_name, gcs.source_wid, gcs.start_position, gcs.end_position, gcx.reference_id as gcx_refid, px.REFERENCE_ID as px_refid from W_EHA_GENE g, w_eha_gene_comp_xref gcx, W_EHA_GENE_COMPONENT gc, w_eha_gene_comp_segment gcs, w_eha_prot_xref px, W_EHA_GENE_STRUCTURE gs, ( select ds.row_wid, ds.start_position, ds.end_position from w_eha_dna_source ds where ds.chromosome in ('4') and ds.start_position <= 74321492 and ds.end_position >= 74301933 ) ref_chr where gcs.source_wid = ref_chr.row_wid and (ref_chr.start_position + gcs.start_position - 1) <= 74321492 and (ref_chr.start_position + gcs.end_position - 1) >= 74301933 and gcs.gene_component_wid = gc.row_wid and gc.component_type = 'CDS'-- can also be 'exons' and gs.row_wid = gc.structure_wid and g.row_wid = gs.gene_wid and px.protein_wid = gc.protein_wid and px.database = 'RefSeq' and gcx.gene_component_wid = gc.row_wid and gcx.database = 'CCDS')sgwhere v.SOURCE_WID = sg.source_widAND v.START_POSITION <= sg.END_POSITIONAND sg.START_POSITION <= v.END_POSITION
Query 26 D -
/*d. Map to region of protein and obtain CCDS (Consensus Coding Sequence) and RefSeq IDs. protein tyrosine kinase domain; result: CCDS3495.1, NP_006197 *//* 44.6 sec*/truncate table query_58ddrop table query_58dcreate global temporary table query_58d(hugo_name varchar2(200), start_position NUMBER, end_position NUMBER, chromosome varchar2(50),replace_tag varchar2(1000), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;insert into query_58dSELECT distinct sg.hugo_name, v.start_position, v.end_position, v.chromosome, v.replace_tag, sg.gcx_refid, sg.px_refidfrom W_EHA_VARIANT v, ( select distinct g.hugo_name, gcs.source_wid, gcs.start_position, gcs.end_position, gcx.reference_id as gcx_refid, px.REFERENCE_ID as px_refid from W_EHA_GENE g, w_eha_gene_comp_xref gcx, W_EHA_GENE_COMPONENT gc, w_eha_gene_comp_segment gcs, w_eha_prot_xref px, w_eha_gene_structure gs where px.protein_wid in ( select px2.protein_wid from w_eha_prot_xref px2 where px2.reference_id IN ('PROTEIN_KINASE_TYR') and px2.database = 'PROSITE') and px.protein_wid = gc.protein_wid and px.database = 'RefSeq' and gcx.gene_component_wid = gc.row_wid and gcx.database = 'CCDS' and gc.component_type = 'CDS'-- can also be 'exons' and gcs.gene_component_wid = gc.row_wid and gs.row_wid = gc.structure_wid and g.row_wid = gs.gene_wid) sgwhere v.SOURCE_WID = sg.source_wid AND v.START_POSITION <= sg.END_POSITION AND sg.START_POSITION <= v.END_POSITION
Use Case - Determine whether the variant is a known SNP or a previously characterized somatic mutation in cancer. Also determine the total number of reported somatic variants in the entire gene, and whether any are near the variant in question.
Output queries tables from - ODB: REFERENCE
Query -
/*a) Check if the variant is a known COSMIC variant*//* 0.1sec */SELECT v.row_wid, vx.database from W_EHA_VARIANT v, W_EHA_VARIANT_XREF vxwhere v.ROW_WID = vx.VARIANT_WID AND vx.DATABASE IN ('COSMIC', 'dbSNP_132') AND vx.REFERENCE_ID = 'rs11169';/*b) Determine total no. of reported somatic variants in entire gene*//* 0.48sec */SELECT v.*FROM W_EHA_VARIANT v, w_eha_variant_xref vx, ( SELECT gsg.START_POSITION, gsg.END_POSITION, gsg.SOURCE_WID FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g WHERE g.HUGO_NAME IN ('PTEN') AND gsg.GENE_WID = g.ROW_WID)gsWHERE v.SOURCE_WID = gs.SOURCE_WID AND v.START_POSITION <= gs.END_POSITION AND gs.START_POSITION <= v.END_POSITION AND vx.VARIANT_WID = v.ROW_WID AND vx.DATABASE in ('COSMIC', 'dbSNP_132')
Use Case - Show me all patients whose cancer cells had a deletion in gene X.
Output queries tables from - ODB+CDM
Query -
SELECT /*+ index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M1),index (r_cnv W_EHA_RSLT_COPY_NBR_VAR_M2)*/distinct r_cnv.START_POSITION, r_cnv.END_POSITION, r_cnv.called_CNV_TypeFROM W_EHA_RSLT_COPY_NBR_VAR r_cnv, w_eha_rslt_study, (SELECT gsg.START_POSITION gsg_START_POSITION, gsg.END_POSITION gsg_END_POSITION, s.START_POSITION s_START_POSITION, s.END_POSITION s_END_POSITION FROM W_EHA_GENE_SEGMENT gsg, W_EHA_GENE g, W_EHA_DNA_SOURCE s WHERE g.HUGO_NAME IN ('PTEN') AND gsg.GENE_WID = g.ROW_WID AND gsg.SOURCE_WID= s.ROW_WID ) sgWHERE r_cnv.START_POSITION <= (sg.gsg_end_position+sg.s_start_position)AND (sg.gsg_start_POSITION+sg.s_start_position) <= r_cnv.end_positionAND r_cnv.RESULT_STUDY_WID = w_eha_rslt_study.ROW_WIDAND w_eha_rslt_study.result_study_name = 'study_5'AND r_cnv.RESULT_SPEC_WID = 450;
Following are the code snippets for creating global temporary tables:
create global temporary table query_1(SPECIMEN_WID NUMBER, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;
create global temporary table query_2(specimen_wid NUMBER, primary_hugo_name varchar2(200), intensity number, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;
create global temporary table query_2_a(replace_tag varchar2(1000) , start_position NUMBER, end_position NUMBER, specimen_wid number) ON COMMIT DELETE ROWS;
create global temporary table query_4(SPECIMEN_WID NUMBER, primary_hugo_name varchar2(200), intensity NUMBER) ON COMMIT DELETE ROWS;
create global temporary table query_5(SPECIMEN_WID NUMBER, primary_hugo_name varchar2(200), intensity NUMBER) ON COMMIT DELETE ROWS;
create global temporary table query_6(SPECIMEN_WID NUMBER, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;
create global temporary table query_7(SPECIMEN_WID NUMBER, hugo_name varchar2(200), start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;
create global temporary table query_8(SPECIMEN_WID NUMBER, primary_hugo_name varchar2(200), intensity NUMBER) ON COMMIT DELETE ROWS;
create global temporary table query_9(SPECIMEN_WID NUMBER, start_position NUMBER, end_position NUMBER, replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;
create global temporary table query_10(SPECIMEN_WID NUMBER, primary_hugo_name varchar2(200), intensity NUMBER) ON COMMIT DELETE ROWS;
create global temporary table query_11(SPECIMEN_WID NUMBER, primary_hugo_name varchar2(200), intensity NUMBER) ON COMMIT DELETE ROWS;
create global temporary table query_12(SPECIMEN_WID NUMBER, primary_hugo_name varchar2(200), intensity NUMBER) ON COMMIT DELETE ROWS;
ccreate global temporary table query_15( specimen_wid number, absolute_position number, chromosome varchar2(50), replace_tag varchar2(1000)) ON COMMIT DELETE ROWS;
create global temporary table query_17(SPECIMEN_WID NUMBER, primary_hugo_name varchar2(200), hybridization_name varchar2(200), intensity NUMBER, p_vall NUMBER, exp_call varchar2(100)) ON COMMIT DELETE ROWS;
create global temporary table query_18b(SPECIMEN_WID NUMBER, hybridization_name varchar2(200), intensity NUMBER, p_vall NUMBER, exp_call varchar2(100)) ON COMMIT DELETE ROWS;
create global temporary table query_19_a(specimen_wid number) ON COMMIT DELETE ROWS;
create global temporary table query_26a(row_wid number, START_POSITION number, END_POSITION number, replace_tag varchar2(200), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;
create global temporary table query_26b(row_wid number, START_POSITION number, END_POSITION number, replace_tag varchar2(200), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;
create global temporary table query_26c(HUGO_NAME varchar2(200), START_POSITION number, END_POSITION number, chromosome varchar2(200), replace_tag varchar2(200), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;
create global temporary table query_26d(HUGO_NAME varchar2(200), START_POSITION number, END_POSITION number, chromosome varchar2(200) ,replace_tag varchar2(200), gcx_reference_id varchar2(200), px_reference_id varchar2(200)) ON COMMIT DELETE ROWS;