3.3 Using Semantic Operators to Query Relational Data

You can use semantic operators to query relational data in an ontology-assisted manner, based on the semantic relationship between the data in a table column and terms in an ontology.

The SEM_RELATED semantic operator retrieves rows based on semantic relatedness. The SEM_DISTANCE semantic operator returns distance measures for the semantic relatedness, so that rows returned by the SEM_RELATED operator can be ordered or restricted using the distance measure. The index type MDSYS.SEM_INDEXTYPE allows efficient execution of such queries, enabling scalable performance over large data sets.

Note:

SEM_RELATED and SEM_DISTANCE are not supported on schema-private RDF networks.

3.3.1 Using the SEM_RELATED Operator

Referring to the ontology example in Example: Disease Ontology, consider the following query that requires semantic matching: Find all patients whose diagnosis is of the type 'Immune_System_Disorder'. A typical database query of the PATIENTS table (described in Example: Disease Ontology) involving syntactic match will not return any rows, because no rows have a DIAGNOSIS column containing the exact value Immune_System_Disorder. For example the following query will not return any rows:

SELECT diagnosis FROM patients WHERE diagnosis = 'Immune_System_Disorder';

Example 3-9 SEM_RELATED Operator

However, many rows in the patient data table are relevant, because their diagnoses fall under this class. Example 3-9 uses the SEM_RELATED operator (instead of lexical equality) to retrieve all the relevant rows from the patient data table. (In this example, the term Immune_System_Disorder is prefixed with a namespace, and the default assumption is that the values in the table column also have a namespace prefix. However, that might not always be the case, as explained in Using URIPREFIX When Values Are Not Stored as URIs.)

SELECT diagnosis FROM patients 
  WHERE SEM_RELATED (diagnosis, 
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;

The SEM_RELATED operator has the following attributes:

SEM_RELATED(
  sub  VARCHAR2,
  predExpr  VARCHAR2,
  obj  VARCHAR2,
  ontologyName  SEM_MODELS,
  ruleBases  SEM_RULEBASES,
  index_status  VARCHAR2,
  lower_bound INTEGER,
  upper_bound INTEGER
 ) RETURN INTEGER;

The sub attribute is the name of table column that is being searched. The terms in the table column are typically the subject in a <subject, predicate, object> triple pattern.

The predExpr attribute represents the predicate that can appear as a label of the edge on the path from the subject node to the object node.

The obj attribute represents the term in the ontology for which related terms (related by the predExpr attribute) have to be found in the table (in the column specified by the sub attribute). This term is typically the object in a <subject, predicate, object> triple pattern. (In a query with the equality operator, this would be the query term.)

The ontologyName attribute is the name of the ontology that contains the relationships between terms.

The rulebases attribute identifies one or more rulebases whose rules have been applied to the ontology to infer new relationships. The query will be answered based both on relationships from the ontology and the inferred new relationships when this attribute is specified.

The index_status optional attribute lets you query the data even when the relevant inferred graph (created when the specified rulebase was applied to the ontology) does not have a valid status. If this attribute is null, the query returns an error if the inferred graph does not have a valid status. If this attribute is not null, it must be the string VALID, INCOMPLETE, or INVALID, to specify the minimum status of the inferred graph for the query to succeed. Because OWL does not guarantee monotonicity, the value INCOMPLETE should not be used when an OWL Rulebase is specified.

The lower_bound and upper_bound optional attributes let you specify a bound on the distance measure of the relationship between terms that are related. See Using the SEM_DISTANCE Ancillary Operator for the description of the distance measure.

The SEM_RELATED operator returns 1 if the two input terms are related with respect to the specified predExpr relationship within the ontology, and it returns 0 if the two input terms are not related. If the lower and upper bounds are specified, it returns 1 if the two input terms are related with a distance measure that is greater than or equal to lower_bound and less than or equal to upper_bound.

3.3.2 Using the SEM_DISTANCE Ancillary Operator

The SEM_DISTANCE ancillary operator computes the distance measure for the rows filtered using the SEM_RELATED operator. The SEM_DISTANCE operator has the following format:

SEM_DISTANCE (number) RETURN NUMBER;

The number attribute can be any number, as long as it matches the number that is the last attribute specified in the call to the SEM_RELATED operator (see Example 3-10). The number is used to match the invocation of the ancillary operator SEM_DISTANCE with a specific SEM_RELATED (primary operator) invocation, because a query can have multiple invocations of primary and ancillary operators.

Example 3-10 SEM_DISTANCE Ancillary Operator

Example 3-10 expands Example 3-9 to show several statements that include the SEM_DISTANCE ancillary operator, which gives a measure of how closely the two terms (here, a patient's diagnosis and the term Immune_System_Disorder) are related by measuring the distance between the terms. Using the ontology described in Example: Disease Ontology, the distance between AIDS and Immune_System_Disorder is 3.

SELECT diagnosis, SEM_DISTANCE(123) FROM patients 
  WHERE SEM_RELATED (diagnosis, 
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1;
 
SELECT diagnosis FROM patients 
  WHERE SEM_RELATED (diagnosis,
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1
  ORDER BY SEM_DISTANCE(123);
 
SELECT diagnosis, SEM_DISTANCE(123) FROM patients 
  WHERE SEM_RELATED (diagnosis,
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1 
  AND SEM_DISTANCE(123) <= 3;

Example 3-11 Using SEM_DISTANCE to Restrict the Number of Rows Returned

Example 3-11 uses distance information to restrict the number of rows returned by the primary operator. All rows with a term related to the object attribute specified in the SEM_RELATED invocation, but with a distance of greater than or equal to 2 and less than or equal to 4, are retrieved.

SELECT diagnosis FROM patients 
  WHERE SEM_RELATED (diagnosis,
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 2, 4) = 1;

In Example 3-11, the lower and upper bounds are specified using the lower_bound and upper_bound parameters in the SEM_RELATED operator instead of using the SEM_DISTANCE operator. The SEM_DISTANCE operator can be also be used for restricting the rows returned, as shown in the last SELECT statement in Example 3-10.

3.3.2.1 Computation of Distance Information

Distances are generated for the following properties during inference (inferred graph): OWL properties defined as transitive properties, and RDFS subClassOf and RDFS subPropertyOf properties. The distance between two terms linked through these properties is computed as the shortest distance between them in a hierarchical class structure. Distances of two terms linked through other properties are undefined and therefore set to null.

Each transitive property link in the original model (viewed as a hierarchical class structure) has a distance of 1, and the distance of an inferred triple is generated according to the number of links between the two terms. Consider the following hypothetical sample scenarios:

  • If the original graph contains C1 rdfs:subClassOf C2 and C2 rdfs:subClassOf C3, then C1 rdfs:subClassof of C3 will be derived. In this case:

    • C1 rdfs:subClassOf C2: distance = 1, because it exists in the model.

    • C2 rdfs:subClassOf C3: distance = 1, because it exists in the model.

    • C1 rdfs:subClassOf C3: distance = 2, because it is generated during inference.

  • If the original graph contains P1 rdfs:subPropertyOf P2 and P2 rdfs:subPropertyOf P3, then P1 rdfs:subPropertyOf P3 will be derived. In this case:

    • P1 rdfs:subPropertyOf P2: distance = 1, because it exists in the model.

    • P2 rdfs:subPropertyOf P3: distance = 1, because it exists in the model.

    • P1 rdfs:subPropertyOf P3: distance = 2, because it is generated during inference.

  • If the original graph contains C1 owl:equivalentClass C2 and C2 owl:equivalentClass C3, then C1 owl:equivalentClass C3 will be derived. In this case:

    • C1 owl:equivalentClass C2: distance = 1, because it exists in the model.

    • C2 owl:equivalentClass C3: distance = 1, because it exists in the model.

    • C1 owl:equivalentClass C3: distance = 2, because it is generated during inference.

The SEM_RELATED operator works with user-defined rulebases. However, using the SEM_DISTANCE operator with a user-defined rulebase is not yet supported, and will raise an error.

3.3.3 Creating a Semantic Index of Type MDSYS.SEM_INDEXTYPE

When using the SEM_RELATED operator, you can create a semantic index of type MDSYS.SEM_INDEXTYPE on the column that contains the ontology terms. Creating such an index will result in more efficient execution of the queries. The CREATE INDEX statement must contain the INDEXTYPE IS MDSYS.SEM_INDEXTYPE clause, to specify the type of index being created.

Example 3-12 Creating a Semantic Index

Example 3-12 creates a semantic index named DIAGNOSIS_SEM_IDX on the DIAGNOSIS column of the PATIENTS table using the ontology in Example: Disease Ontology.

CREATE INDEX diagnosis_sem_idx
  ON patients (diagnosis) 
  INDEXTYPE IS MDSYS.SEM_INDEXTYPE;

The column on which the index is built (DIAGNOSIS in Example 3-12) must be the first parameter to the SEM_RELATED operator, in order for the index to be used. If it not the first parameter, the index is not used during the execution of the query.

Example 3-13 Creating a Semantic Index Specifying an RDF and Rulebase

To improve the performance of certain RDF queries, you can cause statistical information to be generated for the semantic index by specifying one or more RDF graphs and rulebases when you create the index. Example 3-13 creates an index that will also generate statistics information for the specified RDF graph and rulebase. The index can be used with other RDF graphs and rulebases during query, but the statistical information will be used only if the RDF graph and rulebase specified during the creation of the index are the same RDF graph and rulebase specified in the query.

CREATE INDEX diagnosis_sem_idx
  ON patients (diagnosis) 
  INDEXTYPE IS MDSYS.SEM_INDEXTYPE('ONTOLOGY_MODEL(medical_ontology), 
    RULEBASE(OWLPrime)');

Example 3-14 Query Benefitting from Generation of Statistical Information

The statistical information is useful for queries that return top-k results sorted by semantic distance. Example 3-14 shows such a query.

SELECT /*+ FIRST_ROWS */ diagnosis FROM patients 
  WHERE SEM_RELATED (diagnosis,
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1
  ORDER BY SEM_DISTANCE(123);

3.3.4 Using SEM_RELATED and SEM_DISTANCE When the Indexed Column Is Not the First Parameter

If an index of type MDSYS.SEM_INDEXTYPE has been created on a table column that is the first parameter to the SEM_RELATED operator, the index will be used. For example, the following query retrieves all rows that have a value in the DIAGNOSIS column that is a subclass of (rdfs:subClassOf) Immune_System_Disorder.

SELECT diagnosis FROM patients 
  WHERE SEM_RELATED (diagnosis, 
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;

Assume, however, that this query instead needs to retrieve all rows that have a value in the DIAGNOSIS column for which Immune_System_Disorder is a subclass. You could rewrite the query as follows:

SELECT diagnosis FROM patients 
  WHERE SEM_RELATED
    ('<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>', 
    diagnosis, 
    sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;

However, in this case a semantic index on the DIAGNOSIS column will not be used, because it is not the first parameter to the SEM_RELATED operator. To cause the index to be used, you can change the preceding query to use the inverseOf keyword, as follows:

SELECT diagnosis FROM patients 
  WHERE SEM_RELATED (diagnosis,
    'inverseOf(http://www.w3.org/2000/01/rdf-schema#subClassOf)',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>', 
    sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;

This form causes the table column (on which the index is built) to be the first parameter to the SEM_RELATED operator, and it retrieves all rows that have a value in the DIAGNOSIS column for which Immune_System_Disorder is a subclass.

3.3.5 Using URIPREFIX When Values Are Not Stored as URIs

By default, the semantic operator support assumes that the values stored in the table are URIs. These URIs can be from different namespaces. However, if the values in the table do not have URIs, you can use the URIPREFIX keyword to specify a URI when you create the semantic index. In this case, the specified URI is prefixed to the value in the table and stored in the index structure. (One implication is that multiple URIs cannot be used).

Example 3-15 creates a semantic index that uses a URI prefix.

Example 3-15 Specifying a URI Prefix During Semantic Index Creation

CREATE INDEX diagnosis_sem_idx
  ON patients (diagnosis) 
  INDEXTYPE IS MDSYS.SEM_INDEXTYPE
  PARAMETERS('URIPREFIX(<http://www.example.org/medical/>)');

The slash (/) character at the end of the URI is important, because the URI is prefixed to the table value (in the index structure) without any parsing.