1.7 Using the SEM_MATCH Table Function to Query RDF Data

To query RDF data, use the SEM_MATCH table function.

This function has the following attributes:

SEM_MATCH(
  query         VARCHAR2,
  models        SEM_MODELS,
  rulebases     SEM_RULEBASES,
  aliases       SEM_ALIASES,
  filter        VARCHAR2,
  index_status  VARCHAR2    DEFAULT NULL,
  options       VARCHAR2    DEFAULT NULL,
  graphs        SEM_GRAPHS  DEFAULT NULL,
  named_graphs  SEM_GRAPHS  DEFAULT NULL,
  network_owner VARCHAR2    DEFAULT NULL,
  network_name  VARCHAR2    DEFAULT NULL
 ) RETURN ANYDATASET;

The query and models attributes are required. The other attributes are optional (that is, each can be a null value).

The query attribute is a string literal (or concatenation of string literals) with one or more triple patterns, usually containing variables. (The query attribute cannot be a bind variable or an expression involving a bind variable.) A triple pattern is a triple of atoms followed by a period. Each atom can be a variable (for example, ?x), a qualified name (for example, rdf:type) that is expanded based on the default namespaces and the value of the aliases attribute, or a full URI (for example, <http://www.example.org/family/Male>). In addition, the third atom can be a numeric literal (for example, 3.14), a plain literal (for example, "Herman"), a language-tagged plain literal (for example, "Herman"@en), or a typed literal (for example, "123"^^xsd:int).

For example, the following query attribute specifies three triple patterns to find grandfathers (that is, grandparents who are also male) and the height of each of their grandchildren:

'SELECT * WHERE { ?x :grandParentOf ?y . ?x rdf:type :Male . ?y :height ?h }'

The models attribute identifies the RDF graphs to use. Its data type is SEM_MODELS, which has the following definition: TABLE OF VARCHAR2(25). If you are querying an RDF graph collection, specify only the name of the RDF graph collection and no other RDF graphs. (RDF graph collections are explained in RDF Graph Collections.)

The rulebases attribute identifies one or more rulebases whose rules are to be applied to the query. Its data type is SEM_RULEBASES, which has the following definition: TABLE OF VARCHAR2(25). If you are querying an RDF graph collection, this attribute must be null.

The aliases attribute identifies one or more namespaces, in addition to the default namespaces, to be used for expansion of qualified names in the query pattern. Its data type is SEM_ALIASES, which has the following definition: TABLE OF SEM_ALIAS, where each SEM_ALIAS element identifies a namespace ID and namespace value. The SEM_ALIAS data type has the following definition: (namespace_id VARCHAR2(30), namespace_val VARCHAR2(4000))

The following default namespaces (namespace_id and namespace_val attributes) are used by the SEM_MATCH table function and the SEM_CONTAINS and SEM_RELATED operators:

('ogc', 'http://www.opengis.net/ont/geosparql#')
('ogcf', 'http://www.opengis.net/def/function/geosparql/')
('ogcgml', 'http://www.opengis.net/ont/gml#')
('ogcsf', 'http://www.opengis.net/ont/sf#')
('orardf', 'http://xmlns.oracle.com/rdf/')
('orageo', 'http://xmlns.oracle.com/rdf/geo/')
('owl',    'http://www.w3.org/2002/07/owl#')
('rdf',    'http://www.w3.org/1999/02/22-rdf-syntax-ns#')
('rdfs',   'http://www.w3.org/2000/01/rdf-schema#')
('xsd',    'http://www.w3.org/2001/XMLSchema#')

You can override any of these defaults by specifying the namespace_id value and a different namespace_val value in the aliases attribute.

The filter attribute identifies any additional selection criteria. If this attribute is not null, it should be a string in the form of a WHERE clause without the WHERE keyword. For example: '(h >= ''6'')' to limit the result to cases where the height of the grandfather's grandchild is 6 or greater (using the example of triple patterns earlier in this section).

Note:

Instead of using the filter attribute, you are encouraged to use the FILTER keyword inside your query pattern whenever possible (as explained in Graph Patterns: Support for Curly Brace Syntax_ and OPTIONAL_ FILTER_ UNION_ and GRAPH Keywords). Using the FILTER keyword is likely to give better performance because of internal optimizations. The filter argument, however, can be useful if you require SQL constructs that cannot be expressed with the FILTER keyword.

The index_status attribute lets you query RDF data even when the relevant inferred graph does not have a valid status. (If you are querying an RDF graph collection, this attribute refers to the inferred graph associated with the RDF graph collection.) If this attribute is null, the query returns an error if the inferred graph does not have a valid status. If this attribute is not null, it must be the string INCOMPLETE or INVALID. For an explanation of query behavior with different index_status values, see Performing Queries with Incomplete or Invalid Entailments.

The options attribute identifies options that can affect the results of queries. Options are expressed as keyword-value pairs. The following options are supported:

  • ALL_AJ_HASH, ALL_AJ_MERGE, and ALL_BGP_NL are global query optimizer hints that specify that all anti joins for NOT EXISTS and MINUS operations should use the specified join type.

  • ALL_BGP_HASH and ALL_BGP_NL are global query optimizer hints that specify that all inter-BGP joins (for example. the join between the root BGP and an OPTIONAL BGP) should use the specified join type. (BGP stands for basic graph pattern. From the W3C SPARQL Query Language for RDF Recommendation: "SPARQL graph pattern matching is defined in terms of combining the results from matching basic graph patterns. A sequence of triple patterns interrupted by a filter comprises a single basic graph pattern. Any graph pattern terminates a basic graph pattern."

    The BGP_JOIN(USE_NL) and BGP_JOIN(USE_HASH) HINT0 query optimizer hints can be used to control the join type with finer granularity.

    Example 1-17 shows the ALL_BGP_HASH option used in a SEM_MATCH query.

  • AUTO_HINTS=T automatically detects and generates USE_HASH hints for unselective SPARQL queries.

  • ALL_LINK_HASH and ALL_LINK_NL are global query optimizer hints that specify the join type for all RDF_LINK$ joins (that is, all joins between triple patterns within a BGP). ALL_LINK_HASH and ALL_LINK_NL can also be used within a HINT0 query optimizer hint for finer granularity.

  • ALL_MAX_PP_DEPTH(n) is a global query optimizer hint that sets the maximum depth to use when evaluating * and + property path operators. The default value is 10. The MAX_PP_DEPTH(n) HINT0 hint can be used to specify maximum depth with finer granularity.

  • ALL_NO_MERGE is a global query optimizer hint that adds NO_MERGE to each subquery in the generated SQL for a SPARQL query. This hint is used to ensure that a selective subquery in a SPARQL query is not merged with the other parts of the SPARQL query.

  • ALL_ORDERED is a global query optimizer hint that specifies that the triple patterns in each BGP in the query should be evaluated in order.

    Example 1-17 shows the ALL_ORDERED option used in a SEM_MATCH query.

  • ALL_USE_PP_HASH and ALL_USE_PP_NL are global query optimizer hints that specify the join type to use when evaluating property path expressions. The USE_PP_HASH and USE_PP_NL HINT0 hints can be used for specifying join type with finer granularity.

  • ALLOW_DUP=T generates an underlying SQL statement that performs a "union all" instead of a union of the RDF graphs and inferred data (if applicable). This option may introduce more rows (duplicate triples) in the result set, and you may need to adjust the application logic accordingly. If you do not specify this option, duplicate triples are automatically removed across all the RDF graphs and inferred data to maintain the set semantics of merged RDF graphs; however, removing duplicate triples increases query processing time. In general, specifying 'ALLOW_DUP=T' improves performance significantly when multiple RDF graphs are involved in a SEM_MATCH query.

    If you are querying an RDF graph collection, specifying ALLOW_DUP=T causes the SEMV_vm_name view to be queried; otherwise, the SEMU_vm_name view is queried.

  • ALLOW_PP_DUP=T allows duplicate results for + and * property path queries. Allowing duplicate results may return the first result rows faster.

  • AS_OF [SCN, <SCN_VALUE>] , where <SCN_VALUE> is a valid system change number, indicates that Flashback Query should be used to query the state of the RDF network as of the specified SCN.

  • AS_OF [TIMESTAMP, <TIMESTAMP_VALUE>] , where <TIMESTAMP_VALUE> is a valid timestamp string with format 'YYYY/MM/DD HH24:MI:SS.FF', indicates that Flashback Query should be used to query the state of the RDF network as of the specified timestamp.

  • CLOB_AGG_SUPPORT=T enables support for CLOB values for the following aggregates: MIN, MAX, GROUP_CONCAT, SAMPLE. Note that enabling CLOB support incurs a significant performance penalty.

  • CLOB_EXP_SUPPORT=T enables support for CLOB values for some built-in SPARQL functions. Note that enabling CLOB support incurs a significant performance penalty.

  • CONSTRUCT_STRICT=T eliminates invalid RDF triples from the result of SPARQL CONSTRUCT or SPARQL DESCRIBE syntax queries. RDF triples with literals in the subject position or literals or blank nodes in the predicate position are considered invalid.

  • CONSTRUCT_UNIQUE=T eliminates duplicate RDF triples from the result of SPARQL CONSTRUCT or SPARQL DESCRIBE syntax queries.

  • DISABLE_IM_VIRTUAL_COL specifies that the query compiler should not use in-memory virtual columns.

  • DISABLE_MVIEW specifies that the query compiler should not use materialized views.

  • DISABLE_NULL_EXPR_JOIN specifies that the query compiler should assume that all SELECT expressions produce non-null output.

  • DISABLE_SAMEAS_BLOOM specifies that the query compiler should not use a Bloom filter when owl:sameAs triples are joined. (For detailed information, see the explanation of Bloom filters in Oracle Database SQL Tuning Guide.)

  • DO_UNESCAPE=T causes characters in the following return columns to be unescaped according to the W3C N-Triples specification (http://www.w3.org/TR/rdf-testcases/#ntriples): var, var$_PREFIX, var$_SUFFIX, var$RDFCLOB, var$RDFLTYP, var$RDFLANG, and var$RDFTERM.

    See also the reference information for SEM_APIS.ESCAPE_CLOB_TERM, SEM_APIS.ESCAPE_CLOB_VALUE, SEM_APIS.ESCAPE_RDF_TERM, SEM_APIS.ESCAPE_RDF_VALUE, SEM_APIS.UNESCAPE_CLOB_TERM, SEM_APIS.UNESCAPE_CLOB_VALUE, SEM_APIS.UNESCAPE_RDF_TERM, and SEM_APIS.UNESCAPE_RDF_VALUE.

  • FINAL_VALUE_HASH and FINAL_VALUE_NL are global query optimizer hints that specify the join method that should be used to obtain the lexical values for any query variables that are not used in a FILTER clause.

  • GRAPH_MATCH_UNNAMED=T allows unnamed triples (null G_ID) to be matched inside GRAPH clauses. That is, two triples will satisfy the graph join condition if their graphs are equal or if one or both of the graphs are null. This option may be useful when your dataset includes unnamed TBOX triples or unnamed entailed triples.

  • HINT0={<hint-string>} (pronounced and written "hint" and the number zero) specifies one or more keywords with hints to influence the execution plan and results of queries. Conceptually, a graph pattern with n triple patterns and referring to m distinct variables results in an (n+m)-way join: n-way self-join of the target RDFgraphs and optionally the corresponding inferred graph, and then m joins with RDF_VALUE$ for looking up the values for the m variables. A hint specification affects the join order and join type used for the query execution.

    The hint specification, <hint-string>, uses keywords, some of which have parameters consisting of a sequence or set of aliases, or references, for individual triple patterns and variables used in the query. Aliases for triple patterns are of the form ti where i refers to the 0-based ordinal numbers of triple patterns in the query. For example, the alias for the first triple pattern in a query is t0, the alias for the second one is t1, and so on. Aliases for the variables used in a query are simply the names of those variables. Thus, ?x will be used in the hint specification as the alias for a variable ?x used in the graph pattern.

    Hints used for influencing query execution plans include LEADING(<sequence of aliases>), USE_NL(<set of aliases>), USE_HASH(<set of aliases>), and INDEX(<alias> <index_name>). These hints have the same format and basic meaning as hints in SQL statements, which are explained in Oracle Database SQL Language Reference.

    Example 1-12 shows the HINT0 option used in a SEM_MATCH query.

  • HTTP_METHOD=POST_PAR indicates that the HTTP POST method with URL-encoded parameters pass should be used for the SERVICE request. The default option for requests is the HTTP GET method. For more information about SPARQL protocol, see http://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#protocol.

  • INF_ONLY=T queries only the entailed graph for the specified RDF graphs and rulebases.

  • OVERLOADED_NL=T specifies that a procedural nested loop execution should be used to join with an overloaded SERVICE clause.

  • PLUS_RDFT=T can be used with SPARQL SELECT syntax (see Expressions in the SELECT Clause) to additionally return a var$RDFTERM CLOB column for each projected query variable. The value for this column is equivalent to the result of SEM_APIS.COMPOSE_RDF_TERM(var, var$RDFVTYP, var$RDFLTYP, var$RDFLANG, var$RDFCLOB). When using this option, the return columns for each variable var will be var, var$RDFVID, var$_PREFIX, var$_SUFFIX, var$RDFVTYP, var$RDFCLOB, var$RDFLTYP, var$RDFLANG, and var$RDFTERM.

  • PLUS_RDFT=VC can be used with SPARQL SELECT syntax (see Expressions in the SELECT Clause) to additionally return a var$RDFTERM VARCHAR2(NETWORK_MAX_STRING_SIZE) column for each projected query variable. The value for this column is equivalent to the result of SEM_APIS.COMPOSE_RDF_TERM(var, var$RDFVTYP, var$RDFLTYP, var$RDFLANG). When using this option, the return columns for each variable var will be var, var$RDFVID, var$_PREFIX, var$_SUFFIX, var$RDFVTYP, var$RDFCLOB, var$RDFLTYP, var$RDFLANG, and var$RDFTERM. Note that when your RDF network is using NETWORK_STORAGE_FORM=UNESC, special characters in var$RDFTERM are automatically escaped to form syntactically valid RDF values. This may cause the size of var$RDFTERM to exceed NETWORK_MAX_STRING_SIZE and hence an error will be raised in such cases. To avoid the error, you can use PLUS_RDFT=T to return a CLOB instead.

  • PROJ_EXACT_VALUES=T disables canonicalization of values returned from functions and of constant values used in value assignment statements. Such values are canonicalized by default.

  • SERVICE_CLOB=F sets the column values of var$RDFCLOB to null instead of saving values when calling the service. If CLOB data is not needed in your application, performance can be improved by using this option to skip CLOB processing.

  • SERVICE_ESCAPE=F disables character escaping for RDF literal values returned by SPARQL SERVICE calls. RDF literal values are escaped by default. If character escaping is not relevant for your application, performance can be improved by disabling character escaping.

  • SERVICE_JPDWN=T is a query optimizer hint for using nested loop join in SPARQL SERVICE. Example 1-73 shows the SERVICE_JPDWN=T option used in a SEM_MATCH query.

  • SERVICE_PROXY=<proxy-string> sets a proxy address to be used when performing http connections. The given proxy-string will be used in SERVICE queries. Example 1-76 shows a SEM_MATCH query including a proxy address.

  • STRICT_AGG_CARD=T uses SPARQL semantics (one null row) instead of SQL semantics (zero rows) for aggregate queries with graph patterns that fail to match. This option incurs a slight performance penalty.

  • STRICT_DEFAULT=T restricts the default graph to unnamed triples when no dataset information is specified.

The graphs attribute specifies the set of named graphs from which to construct the default graph for a SEM_MATCH query. Its data type is SEM_GRAPHS, which has the following definition: TABLE OF VARCHAR2(4000). The default value for this attribute is NULL. When graphs is NULL, the "union all" of all default graphs in the set of RDF graphs specified in the models attribute is used as the default graph.

The named_graphs attribute specifies the set of named graphs that can be matched by a GRAPH clause. Its data type is SEM_GRAPHS, which has the following definition: TABLE OF VARCHAR2(4000). The default value for this attribute is NULL. When named_graphs is NULL, all named graphs in the set of RDF graphs specified in the models attribute can be matched by a GRAPH clause.

The network_owner attribute specifies the schema that owns the RDF network that contains the RDF graph or RDF graph collection specified in the models attribute. This attribute should be non-null to query a schema-private RDF network.

The network_name attribute specifies the name of the RDF network that contains the RDF graph or graph collection specified in the models attribute. This attribute should be non-null to query a schema-private RDF network.

The SEM_MATCH table function returns an object of type ANYDATASET, with elements that depend on the input variables. In the following explanations, var represents the name of a variable used in the query. For each variable var, the result elements have the following attributes: var, var$RDFVID, var$_PREFIX, var$_SUFFIX, var$RDFVTYP, var$RDFCLOB, var$RDFLTYP, and var$RDFLANG.

In such cases, var has the lexical value bound to the variable, var$RDFVID has the VALUE_ID of the value bound to the variable, var$_PREFIX and var$_SUFFIX are the prefix and suffix of the value bound to the variable, var$RDFVTYP indicates the type of value bound to the variable (URI, LIT [literal], or BLN [blank node]), var$RDFCLOB has the lexical value bound to the variable if the value is a long literal, var$RDFLTYP indicates the type of literal bound if a literal is bound, and var$RDFLANG has the language tag of the bound literal if a literal with language tag is bound. var$RDFCLOB is of type CLOB, while all other attributes are of type VARCHAR2.

For a literal value or a blank node, its prefix is the value itself and its suffix is null. For a URI value, its prefix is the left portion of the value up to and including the rightmost occurrence of any of the three characters / (slash), # (pound), or : (colon), and its suffix is the remaining portion of the value to the right. For example, the prefix and suffix for the URI value http://www.example.org/family/grandParentOf are http://www.example.org/family/ and grandParentOf, respectively.

Along with columns for variable values, a SEM_MATCH query that uses SPARQL SELECT syntax returns one additional NUMBER column, SEM$ROWNUM, which can be used to ensure the correct result ordering for queries that involve a SPARQL ORDER BY clause.

A SEM_MATCH query that uses SPARQL ASK syntax returns the columns ASK, ASK$RDFVID, ASK$_PREFIX, ASK$_SUFFIX, ASK$RDFVTYP, ASK$RDFCLOB, ASK$RDFLTYP, ASK$RDFLANG, and SEM$ROWNUM. This is equivalent to a SPARQL SELECT syntax query that projects a single ?ask variable.

A SEM_MATCH query that uses SPARQL CONSTRUCT or SPARQL DESCRIBE syntax returns columns that contain RDF triple data rather than query result bindings. Such queries return values for subject, predicate and object components. See Graph Patterns: Support for SPARQL CONSTRUCT Syntaxfor details.

To use the SEM_RELATED operator to query an OWL ontology, see Using Semantic Operators to Query Relational Data.

When you are querying multiple RDF graphs, or querying one or more RDF graphs and the corresponding inferred graph, consider using RDF graph collections (explained in RDF Graph Collections) because of the potential performance benefits.

Example 1-11 SEM_MATCH Table Function

Example 1-11 selects all grandfathers (grandparents who are male) and their grandchildren from the family RDF graph, using inferencing from both the RDFS and family_rb rulebases. (This example is an excerpt from Example 1-130 in Example: Family Information.)

SELECT x$rdfterm grandfather, y$rdfterm grandchild
  FROM TABLE(SEM_MATCH(
    'PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     PREFIX     : <http://www.example.org/family/>
     SELECT ?x ?y
     WHERE {?x :grandParentOf ?y . ?x rdf:type :Male}',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null,  null, null,
    ' PLUS_RDFT=VC ',
    null, null,
    'RDFUSER', 'NET1'));

Example 1-12 HINT0 Option with SEM_MATCH Table Function

Example 1-12 is functionally the same as Example 1-11, but it adds the HINT0 option.

SELECT x$rdfterm grandfather, y$rdfterm grandchild
  FROM TABLE(SEM_MATCH(
    'PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     PREFIX     : <http://www.example.org/family/>
     SELECT ?x ?y
     WHERE {?x :grandParentOf ?y . ?x rdf:type :Male}',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null,  null, null,
    ' PLUS_RDFT=VC HINT0={LEADING(t0 t1) USE_NL(?x ?y)}',
    null, null,
    'RDFUSER', 'NET1'));

Example 1-13 DISABLE_SAMEAS_BLOOM Option with SEM_MATCH Table Function

Example 1-12 specifies that the query compiler should not use a Bloom filter when owl:sameAs triples are joined.

SELECT select s, o
FROM table(sem_match('{ # HINT0={LEADING(t1 t0) USE_HASH(t0 t1)}
 ?s owl:sameAs ?o. ?o owl:sameAs ?s}', sem_models('M1'), null,null,null,null,
  ' DISABLE_SAMEAS_BLOOM ')) order by 1,2;

Example 1-14 SEM_MATCH Table Function

Example 1-14 uses the Pathway/Genome BioPax ontology to get all chemical compound types that belong to both Proteins and Complexes:

SELECT t.r 
  FROM TABLE (SEM_MATCH ( 
      'PREFIX : <http://www.biopax.org/release1/biopax-release1.owl>
       SELECT ?r
       WHERE {
        ?r rdfs:subClassOf :Proteins .  
        ?r rdfs:subClassOf :Complexes}', 
      SEM_Models ('BioPax'), 
      SEM_Rulebases ('rdfs'), 
      NULL, NULL, NULL, '', NULL, NULL,
      'RDFUER','NET1')) t;

As shown in Example 1-14, the search pattern for the SEM_MATCH table function is specified using SPARQL syntax where the variable starts with the question-mark character (?). In this example, the variable ?r must match to the same term, and thus it must be a subclass of both Proteins and Complexes.

1.7.1 Performing Queries with Incomplete or Invalid Inferred Graphs

You can query RDF data even when the relevant inferred graph does not have a valid status if you specify the string value INCOMPLETE or INVALID for the index_status attribute of the SEM_MATCH table function. (The inferred graph status is stored in the STATUS column of the SEM_RULES_INDEX_INFO view, which is described in Inferred Graphs. The SEM_MATCH table function is described in Using the SEM_MATCH Table Function to Query RDF Data.)

The index_status attribute value affects the query behavior as follows:

  • If the inferred graph has a valid status, the query behavior is not affected by the value of the index_status attribute.

  • If you provide no value or specify a null value for index_status, the query returns an error if the inferred graph does not have a valid status.

  • If you specify the string INCOMPLETE for the index_status attribute, the query is performed if the status of the inferred graph is incomplete or valid.

  • If you specify the string INVALID for the index_status attribute, the query is performed regardless of the actual status of the inferred graph (invalid, incomplete, or valid).

However, the following considerations apply if the status of the inferred graph is incomplete or invalid:

  • If the status is incomplete, the content of an inferred graph may be approximate, because some triples that are inferable (due to the recent insertions into the underlying RDF graphs) may not actually be present in the inferred graph, and therefore results returned by the query may be inaccurate.

  • If the status is invalid, the content of the inferred graph may be approximate, because some triples that are no longer inferable (due to recent modifications to the underlying RDF graphs or rulebases, or both) may still be present in the inferred graph, and this may affect the accuracy of the result returned by the query. In addition to possible presence of triples that are no longer inferable, some inferable rows may not actually be present in the inferred graph.

1.7.2 Graph Patterns: Support for Curly Brace Syntax, and OPTIONAL, FILTER, UNION, and GRAPH Keywords

The SEM_MATCH table function accepts the syntax for the graph pattern in which a sequence of triple patterns is enclosed within curly braces. The period is usually required as a separator unless followed by the OPTIONAL, FILTER, UNION, or GRAPH keyword. With this syntax, you can do any combination of the following:

  • Use the OPTIONAL construct to retrieve results even in the case of a partial match

  • Use the FILTER construct to specify a filter expression in the graph pattern to restrict the solutions to a query

  • Use the UNION construct to match one of multiple alternative graph patterns

  • Use the GRAPH construct (explained in GRAPH Keyword Support) to scope graph pattern matching to a set of named graphs

In addition to arithmetic operators (+, -, *, /), Boolean operators and logical connectives (||, &&, !), and comparison operators (<, >, <=, >=, =, !=), several built-in functions are available for use in FILTER clauses. Table 1-14 lists built-in functions that you can use in the FILTER clause. In the Description column of Table 1-14, x, y, and z are arguments of the appropriate types.

Table 1-14 Built-in Functions Available for FILTER Clause

Function Description

ABS(RDF term)

Returns the absolute value of term. If term is a non-numerical value, returns null.

BNODE(literal) or BNODE()

Constructs a blank node that is distinct from all blank nodes in the dataset of the query, and those created by this function in other queries. The form with no arguments results in a distinct blank node in every call. The form with a simple literal results in distinct blank nodes for different simple literals, and the same blank node for calls with the same simple literal.

BOUND(variable)

BOUND(x) returns true if x is bound (that is, non-null) in the result, false otherwise.

CEIL(RDF term)

Returns the closest number with no fractional part which is not less than term. If term is a non-numerical value, returns null.

COALESCE(term list)

Returns the first element on the argument list that is evaluated without raising an error. Unbound variables raise an error if evaluated. Returns null if there are no valid elements in the term list.

CONCAT(term list)

Returns an xsd:String value resulting of the concatenation of the string values in the term list.

CONTAINS(literal, match)

Returns true if the string match is found anywhere in literal. It returns false otherwise.

DATATYPE(literal)

DATATYPE(x) returns a URI representing the datatype of x.

DAY(argument)

Returns an integer corresponding to the day part of argument. If the argument is not a dateTime or date data type, it returns a null value.

ENCODE_FOR_URI(literal)

Returns a string where the reserved characters in literal are escaped and converted to its percent-encode form.

EXISTS(pattern)

Returns true if the pattern matches the query data set, using the current bindings in the containing group graph pattern and the current active graph. If there are no matches, it returns false.

FLOOR(RDF term)

Returns the closest number with no fractional part which is less than term. If term is a non-numerical value, returns null.

HOURS(argument)

Returns an integer corresponding to the hours part of argument. If the argument is not a dateTime or date data type, it returns a null value.

IF(condition , expression1, expression2)

Evaluates the condition and obtains the effective Boolean value. If true, the first expression is evaluated and its value returned. If false, the second expression is used. If the condition raises an error, the error is passed as the result of the IF statement.

IRI(RDF term)

Returns an IRI resolving the string representation of argument term. If there is a base IRI defined in the query, the IR is resolve against it, and the result must result in an absolute IRI.

isBLANK(RDF term)

isBLANK(x) returns true if x is a blank node, false otherwise.

isIRI(RDF term)

isIRI(x) returns true if x is an IRI, false otherwise.

isLITERAL(RDF term)

isLiteral(x) returns true if x is a literal, false otherwise.

IsNUMERIC(RDF term)

Returns true if term is a numeric value, false otherwise.

isURI(RDF term)

isURI(x) returns true if x is a URI, false otherwise.

LANG(literal)

LANG(x) returns a plain literal serializing the language tag of x.

LANGMATCHES(literal, literal)

LANGMATCHES(x, y) returns true if language tag x matches language range y, false otherwise.

LCASE(literal)

Returns a string where each character in literal is converted to its lowercase correspondent.

MD5(literal)

Note:

Starting from Oracle Database 21c Release, the use of MD5 algorithm is deprecated. As this function will be desupported in a future release, it is recommended to replace MD5 with one of the SHA hash functions.

Returns the checksum for literal, corresponding to the MD5 hash function.

MINUTES(argument)

Returns an integer corresponding to the minutes part of argument. If the argument is not a dateTime or date data type, it returns a null value.

MONTH(argument)

Returns an integer corresponding to the month part of argument. If the argument is not a dateTime or date data type, it returns a null value.

NOT_EXISTS(pattern)

Returns true if the pattern does not match the query data set, using the current bindings in the containing group graph pattern and the current active graph. It returns false otherwise.

NOW()

Returns an xsd:dateTime value corresponding to the current time at the moment of the query execution.

RAND()

Generates a numeric value in the range of [0,1).

REGEX(string, pattern)

REGEX(x,y) returns true if x matches the regular expression y, false otherwise. For more information about the regular expressions supported, see the Oracle Regular Expression Support appendix in Oracle Database SQL Language Reference.

REGEX(string, pattern, flags)

REGEX(x,y,z) returns true if x matches the regular expression y using the options given in z, false otherwise. Available options: 's' – dot all mode ('.' matches any character including the newline character); 'm' – multiline mode ('^' matches the beginning of any line and '$' matches the end of any line); 'i' – case insensitive mode; 'x' – remove whitespace characters from the regular expression before matching.

REPLACE(string, pattern, replacement)

Returns a string where each match of the regular expression pattern in string is replaced by replacement. For more information about the regular expressions supported, see the Oracle Regular Expression Support appendix in Oracle Database SQL Language Reference.

REPLACE(string, pattern, replacement, flags)

Returns a string where each match of the regular expression pattern in string is replaced by replacement. Available options: 's' – dot all mode ('.' matches any character including the newline character); 'm' – multiline mode ('^' matches the beginning of any line and '$' matches the end of any line); 'i' – case insensitive mode; 'x' – remove whitespace characters from the regular expression before matching.

For more information about the regular expressions supported, see the Oracle Regular Expression Support appendix in Oracle Database SQL Language Reference.

ROUND(RDF term)

Returns the closest number with no fractional part to term. If two values exist, the value closer to positive infinite is returned. If term is a non-numerical value, returns null.

sameTerm(RDF term, RDF term)

sameTerm(x, y) returns true if x and y are the same RDF term, false otherwise.

SECONDS(argument)

Returns an integer corresponding to the seconds part of argument. If the argument is not a dateTime or date data type, it returns a null value.

SHA1(literal)

Returns the checksum for literal, corresponding to the SHA1 hash function.

SHA256(literal)

Returns the checksum for literal, corresponding to the SHA256 hash function.

SHA384(literal)

Returns the checksum for literal, corresponding to the SHA384 hash function.

SHA512(literal)

Returns the checksum for literal, corresponding to the SHA512 hash function.

STR(RDF term)

STR(x) returns a plain literal of the string representation of x (that is, what would be stored in the VALUE_NAME column of RDF_VALUE$ enclosed within double quotes).

STRAFTER(literal, literal)

StrAfter (x,y) returns the portion of the string corresponding to substring that precedes in x the first match of y, and the end of x. If y cannot be matched inside x, the empty string is returned.

STRBEFORE(literal, literal)

StrBefore (x,y) returns the portion of the string corresponding to the start of x and the first match of y. If y cannot be matched inside x, the empty string is returned.

STRDT(string, datatype)

Construct a literal term composed by the string lexical form and the datatype passed as arguments. datatype must be a URI; otherwise, the function returns a null value.

STRENDS(literal, match)

Returns true if the string literal ends with the string match. It returns false otherwise.

STRLANG (string, languageTag)

Constructs a string composed by the string lexical form and language tag passed as arguments.

STRLEN(literal)

Returns the length of the lexical form of the literal.

STRSTARTS(literal, match)

Returns true if the string literal starts with the string match. It returns false otherwise.

STRUUID()

Returns a string containing the scheme section of a new UUID.

SUBSTR(term, startPos)

Returns the string corresponding to the portion of term that starts at startPos and continues until term ends. The index of the first character is 1.

SUBSTR(term, startPos, length)

Returns the string corresponding to the portion of term that starts at startPos and continues for length characters. The index of the first character is 1.

term IN (term list)

The expression x IN(term list) returns true if x can be found in any of the values in termlist. Returns false if not found. Zero-length lists are legal. An error is raised if any of the values in termlist raises an error and x is not found.

term NOT IN (term list)

The expression x NOT IN(term list) returns false if x can be found in any of the values in term list. Returns true if not found. Zero-length lists are legal. An error is raised if any of the values in term list raises an error and x is not found.

TIMEZONE(argument)

Returns the time zones section of argument as an xsd:dayTimeDuration value. If the argument is not a dateTime or date data type, it returns a null value.

TZ(argument)

Returns an integer corresponding to the time zone part of argument. If the argument is not a dateTime or date data type, it returns a null value.

UCASE(literal)

Returns a string where each character in literal is converted to its uppercase correspondent.

URI(RDF term)

(Synonym for IRI(RDF term)

UUID()

Returns a URI with a new Universal Unique Identifier. The value and the version correspond to the PL/SQL function sys_guid ().

YEAR(argument)

Returns an integer corresponding to the year part of argument.

See also the descriptions of the built-in functions defined in the SPARQL query language specification (http://www.w3.org/TR/sparql11-query/), to better understand the built-in functions available in SEM_MATCH.

In addition, Oracle provides some proprietary query functions that take advantage of Oracle Database features and help improve query performance. The following table lists these Oracle-specific query functions. Note that the built-in namespace prefix orardf expands to <http://xmlns.oracle.com/rdf/>.

Table 1-15 Oracle-Specific Query Functions

Function Description
orardf:concat(RDF term, RDF term, …)

Returns true if the given term matches with the given like pattern. Otherwise, the function returns false.

orardf:contains(RDF term, RDF term)

Returns true if the string representation of the first term contains the string representation of the second term as a substring. Otherwise, the function returns false.

orardf:instr(RDF term, RDF term)

Searches the string representation of the first term for the string representation of the second term. Returns an integer indicating the position of the first character of the occurrence in the first term (the first character in a string is position 1).

If the search is unsuccessful, then the returned value is 0.

orardf:instr(RDF term, RDF term, position)

Searches the string representation of the first term for the string representation of the second term. Returns an integer indicating the position of the first character of the occurrence in first term (the first character is position 1).

If the search is unsuccessful, then the returned value is 0.

Position is a nonzero interger indicating the character of the string representation of the first term at which to begin the search (the first character is position 1). If position is negative, then orardf:instr counts and searches backwards from the end of the first term.

orardf:instr(RDF term, RDF term, position, occurrence)

Searches the string representation of the first term for the string representation of the second term. Returns an integer indicating the position of the first character of the occurrence in first term (the first character is position 1).

If the search is unsuccessful, then the returned value is 0.

Position is a nonzero interger indicating the character of the string representation of the first term at which to begin the search (the first character is position 1). If position is negative, then orardf:instr counts and searches backwards from the end of the first term.

Occurrence is a positive integer indicating which occurrence of the first term orardf:instr should search for.

orardf:lcase(RDF term)

Returns a string literal whose lexical form is the lower case of the string representation of the input term.

orardf:like(RDF term, pattern)

Returns true if the given term matches with the given like pattern. Otherwise, the function returns false. See Full-Text Search for more information.

orardf:like(RDF term, pattern, flags)

Returns true if the given term matches with the given like pattern using the specified flags. Otherwise, the function returns false. Available flags: 'i' – case insensitive mode. See Full-Text Search for more information.

orardf:ltrim(RDF term)

Returns the string representation of the input term with all blank characters removed from the left end.

orardf:ltrim(RDF term, set)

Returns the string representation of the input term with all of the characters contained in set removed from the left end.

orardf:rtrim(RDF term)

Returns the string representation of the input term with all blank characters removed from the right end.

orardf:rtrim(RDF term, set)

Returns the string representation of the input term with all of the characters contained in set removed from the right end.

orardf:sameCanonTerm(RDF term, RDF term)

Returns true if two terms represent the same canonical RDF term. Otherwise, the function returns false. Allows a VALUE_ID-based comparison, which is more efficient than sameTerm(?x, ?y) or (?x = ?y).

orardf:strafter(RDF term, RDF term)

Returns the part of the string representation of the first term that follows the first occurrence of the string representation of the second term.

If there is no such occurrence, then an empty string literal is returned.

orardf:strbefore(RDF term, RDF term)

Returns the part of the string representation of the first term that precedes the first occurrence of the string representation of the second term.

If there is no such occurrence, then an empty string literal is returned.

orardf:strends(RDF term, RDF term)

Returns true if the string representation of the first term contains the string representation of the second term as a trailing substring. Otherwise, the function returns false.

orardf:strlen(RDF term)

Returns the number of characters in the string representation of the input term.

orardf:strstarts(RDF term, RDF term) Returns true if the string representation of the first term contains the string representation of the second term as a leading substring. Otherwise, the function returns false.
orardf:substr(RDF term, start)

Returns a portion of the string representation of the input term beginning at the position indicated by the start value and continuing to the end of the input term (the first character is position 1).

orardf:substr(RDF term, start, length)

Returns a portion of the string representation of the input term beginning at the position indicated by the start value and continuing for the number of characters indicated by the length value (the first character is position 1).

orardf:textContains(RDF term, pattern)

Returns true if the given term matches with the given Oracle Text search pattern. Otherwise, the function returns false. See Full-Text Search for more information.

orardf:textScore(invocation id)

Returns the score of an orardf:textContains match. See Full-Text Search for more information.

orardf:ucase(RDF term)

Returns a string literal whose lexical form is the upper case of the lexical form of the input term.

(Spatial built-in functions)

(See Spatial Support.)

The following XML Schema casting functions are available for use in FILTER clauses. These functions take an RDF term as input and return a new RDF term of the desired type or raise an error if the term cannot be cast to the desired type. Details of type casting can be found in Section 17.1 of the XPath query specification: http://www.w3.org/TR/xpath-functions/#casting-from-primitive-to-primitive. These functions use the XML Namespace xsd : http://www.w3.org/2001/XMLSchema#.

  • xsd:string (RDF term)

  • xsd:dateTime (RDF term)

  • xsd:boolean (RDF term)

  • xsd:integer (RDF term)

  • xsd:float (RDF term)

  • xsd:double (RDF term)

  • xsd:decimal (RDF term)

If you use the syntax with curly braces to express a graph pattern:

  • The query always returns canonical lexical forms for the matching values for the variables.

  • Any hints specified in the options argument using HINT0={<hint-string>} (explained in Using the SEM_MATCH Table Function to Query RDF Data), should be constructed only on the basis of the portion of the graph pattern inside the root BGP. For example, the only valid aliases for use in a hint specification for the query in Example 1-16 are t0, t1, ?x, and ?y. Inline query optimizer hints can be used to influence other parts of the graph pattern (see Inline Query Optimizer Hints).

  • The FILTER construct is not supported for variables bound to long literals.

Example 1-15 Curly Brace Syntax

Example 1-15 uses the syntax with curly braces and a period to express a graph pattern in the SEM_MATCH table function.

SELECT x, y
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male}',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, '', null, null,
    'RDFUSER', 'NET1'));

Example 1-16 Curly Brace Syntax and OPTIONAL Construct

Example 1-16 uses the OPTIONAL construct to modify Example 1-15, so that it also returns, for each grandfather, the names of the games played or null if no games are played.

SELECT x, y, game
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male . 
      OPTIONAL{?x :plays ?game} 
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null,
    null,
    'HINT0={LEADING(t0 t1) USE_NL(?x ?y)}', 
    null, 
    null,
    'RDFUSER', 'NET1'));

Example 1-17 Curly Brace Syntax and Multi-Pattern OPTIONAL Construct

When multiple triple patterns are present in an OPTIONAL graph pattern, values for optional variables are returned only if a match is found for each triple pattern in the OPTIONAL graph pattern. Example 1-17 modifies Example 1-16 so that it returns, for each grandfather, the names of the games played with the grandchildren, or null if they have no such games in common. It also uses global query optimizer hints to specify that triple patterns should be evaluated in order within each BGP and that a hash join should be used to join the root BGP with the OPTIONAL BGP.

SELECT x, y, game
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male . 
      OPTIONAL{?x :plays ?game . ?y :plays ?game} 
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null,
    'ALL_ORDERED ALL_BGP_HASH', 
    null, null,
    'RDFUSER', 'NET1'));

Example 1-18 Curly Brace Syntax and Nested OPTIONAL Construct

A single query can contain multiple OPTIONAL graph patterns, which can be nested or parallel. Example 1-18 modifies Example 1-17 with a nested OPTIONAL graph pattern. This example returns (1) the games each grandfather plays or null if they play no games and (2) if the grandfather plays games, the ages of the grandchildren that play the same game, or null if they has no games in common. Note that in Example 1-18 a value is returned for ?game even if the nested OPTIONAL graph pattern ?y :plays ?game . ?y :age ?age is not matched.

SELECT x, y, game, age
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male . 
      OPTIONAL{?x :plays ?game 
                          OPTIONAL {?y :plays ?game . ?y :age ?age} } 
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-19 Curly Brace Syntax and Parallel OPTIONAL Construct

Example 1-19 modifies Example 1-17 with a parallel OPTIONAL graph pattern. This example returns (1) the games the each grandfather plays or null if they play no games and (2) the email address of each grandfather or null if they have no email address. Note that, unlike nested OPTIONAL graph patterns, parallel OPTIONAL graph patterns are treated independently. That is, if an email address is found, it will be returned regardless of whether or not a game was found; and if a game was found, it will be returned regardless of whether an email address was found.

SELECT x, y, game, email
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male . 
      OPTIONAL{?x :plays ?game}
      OPTIONAL{?x :email ?email} 
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-20 Curly Brace Syntax and FILTER Construct

Example 1-20 uses the FILTER construct to modify Example 1-15, so that it returns grandchildren information for only those grandfathers who are residents of either NY or CA.

SELECT x, y
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male . ?x :residentOf ?z
       FILTER (?z = "NY"  || ?z = "CA")}',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-21 Curly Brace Syntax and FILTER with REGEX and STR Built-In Constructs

Example 1-21 uses the REGEX built-in function to select all grandfathers who have an Oracle email address. Note that backslash (\) characters in the regular expression pattern must be escaped in the query string; for example, \\. produces the following pattern: \.

SELECT x, y, z
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male . ?x :email ?z
       FILTER (REGEX(STR(?z), "@oracle\\.com$"))}',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-22 Curly Brace Syntax and UNION and FILTER Constructs

Example 1-22 uses the UNION construct to modify Example 1-20, so that grandfathers are returned only if they are residents of NY or CA or own property in NY or CA, or if both conditions are true (they reside in and own property in NY or CA).

SELECT x, y
  FROM TABLE(SEM_MATCH(
    '{?x :grandParentOf ?y . ?x rdf:type :Male 
       {{?x :residentOf ?z} UNION {?x :ownsPropertyIn ?z}}
       FILTER (?z = "NY"  || ?z = "CA")}',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.2.1 GRAPH Keyword Support

A SEM_MATCH query is executed against an RDF Dataset. An RDF Dataset is a collection of graphs that includes one unnamed graph, known as the default graph, and one or more named graphs, which are identified by a URI. Graph patterns that appear inside a GRAPH clause are matched against the set of named graphs, and graph patterns that do not appear inside a graph clause are matched against the default graph. The graphs and named_graphs SEM_MATCH parameters are used to construct the default graph and set of named graphs for a given SEM_MATCH query. A summary of possible dataset configurations is shown in Table 1-16.

Table 1-16 SEM_MATCH graphs and named_graphs Values, and Resulting Dataset Configurations

Parameter Values Default Graph Set of Named Graphs

graphs: NULL

named_graphs: NULL

Union All of all unnamed triples and all named graph triples. (But if the options parameter contains STRICT_DEFAULT=T, only unnamed triples are included in the default graph.)

All named graphs

graphs: NULL

named_graphs: {g1,…, gn}

Empty set

{g1,…, gn}

graphs: {g1,…, gm}

named_graphs: NULL

Union All of {g1,…, gm}

Empty set

graphs: {g1,…, gm}

named_graphs: {gn,…, gz}

Union All of {g1,…, gm}

{gn,…, gz}

See also the W3C SPARQL specification for more information on RDF data sets and the GRAPH construct, specifically: http://www.w3.org/TR/rdf-sparql-query/#rdfDataset

Example 1-23 Named Graph Construct

Example 1-23 uses the GRAPH construct to scope graph pattern matching to a specific named graph. This example finds the names and email addresses of all people in the <http://www.example.org/family/Smith> named graph.

SELECT name, email
  FROM TABLE(SEM_MATCH(
    '{GRAPH :Smith {
       ?x :name ?name . ?x :email ?email } }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-24 Using the named_graphs Parameter

In addition to URIs, variables can appear after the GRAPH keyword. Example 1-24 uses a variable, ?g, with the GRAPH keyword, and uses the named_graphs parameter to restrict the possible values of ?g to the <http://www.example.org/family/Smith> and <http://www.example.org/family/Jones> named graphs. Aliases specified in SEM_ALIASES argument can be used in the graphs and named_graphs parameters.

SELECT name, email
  FROM TABLE(SEM_MATCH(
    '{GRAPH ?g {
       ?x :name ?name . ?x :email ?email } }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null,null,null,null,
    SEM_GRAPHS('<http://www.example.org/family/Smith>',
               ':Jones'),
    'RDFUSER', 'NET1'));

Example 1-25 Using the graphs Parameter

Example 1-25 uses the default graph to query the union of the <http://www.example.org/family/Smith> and <http://www.example.org/family/Jones> named graphs.

FROM TABLE(SEM_MATCH(
    '{?x :name ?name . ?x :email ?email }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null,null,null,
    SEM_GRAPHS('<http://www.example.org/family/Smith>', 
               ':Jones'),
    null,
    'RDFUSER', 'NET1'));

1.7.3 Graph Patterns: Support for SPARQL ASK Syntax

SEM_MATCH allows fully-specified SPARQL ASK queries in the query parameter.

ASK queries are used to test whether or not a solution exists for a given query pattern. In contrast to other forms of SPARQL queries, ASK queries do not return any information about solutions to the query pattern. Instead, such queries return "true"^^xsd:boolean if a solution exists and "false"^^xsd:boolean if no solution exists.

All SPARQL ASK queries return the same columns: ASK, ASK$RDFVID, ASK$_PREFIX, ASK$_SUFFIX, ASK$RDFVTYP, ASK$RDFCLOB, ASK$RDFLTYP, ASK$RDFLANG, SEM$ROWNUM. Note that these columns are the same as a SPARQL SELECT syntax query that projects a single ?ask variable.

SPARQL ASK queries will generally give better performance than an equivalent SPARQL SELECT syntax query because the ASK query does not have to retrieve lexical values for query variables, and query execution can stop after a single result has been found.

SPARQL ASK queries use the same syntax as SPARQL SELECT queries, but the topmost SELECT clause must be replaced with the keyword ASK.

Example 1-26 SPARQL ASK

Example 1-26 shows a SPARQL ASK query that determines whether or not any cameras are for sale with more than 10 megapixels that cost less than 50 dollars.

SELECT ask
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     ASK
     WHERE
      {?x :price ?p .
       ?x :megapixels ?m .
       FILTER (?p < 50 && ?m > 10)
      }',
    SEM_Models('electronics'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

See also the W3C SPARQL specification for more information on SPARQL ASK queries, specifically: http://www.w3.org/TR/sparql11-query/#ask

1.7.4 Graph Patterns: Support for SPARQL CONSTRUCT Syntax

SEM_MATCH allows fully-specified SPARQL CONSTRUCT queries in the query parameter.

CONSTRUCT queries are used to build RDF graphs from stored RDF data. In contrast to SPARQL SELECT queries, CONSTRUCT queries return a set of RDF triples rather than a set of query solutions (variable bindings).

All SPARQL CONSTRUCT queries return the same columns from SEM_MATCH. These columns correspond to the subject, predicate and object of an RDF triple, and there are 10 columns for each triple component. In addition, a SEM$ROWNUM column is returned. More specifically, the following columns are returned:

SUBJ
SUBJ$RDFVID
SUBJ$_PREFIX
SUBJ$_SUFFIX
SUBJ$RDFVTYP
SUBJ$RDFCLOB
SUBJ$RDFLTYP
SUBJ$RDFLANG
SUBJ$RDFTERM
SUBJ$RDFCLBT
PRED
PRED$RDFVID
PRED$_PREFIX
PRED$_SUFFIX
PRED$RDFVTYP
PRED$RDFCLOB
PRED$RDFLTYP
PRED$RDFLANG
PRED$RDFTERM
PRED$RDFCLBT
OBJ
OBJ$RDFVID
OBJ$_PREFIX
OBJ$_SUFFIX
OBJ$RDFVTYP
OBJ$RDFCLOB
OBJ$RDFLTYP
OBJ$RDFLANG
OBJ$RDFTERM
OBJ$RDFCLBT
SEM$ROWNUM

For each component, COMP, COMP$RDFVID, COMP$_PREFIX, COMP$_SUFFIX, COMP$RDFVTYP, COMP$RDFCLOB, COMP$RDFLTYP, and COMP$RDFLANG correspond to the same values as those from SPARQL SELECT queries. COMP$RDFTERM holds a VARCHAR2(NETWORK_MAX_STRING_SIZE) RDF term in N-Triple syntax, and COMP$RDFCLBT holds a CLOB RDF term in N-Triple syntax.

SPARQL CONSTRUCT queries use the same syntax as SPARQL SELECT queries, except the topmost SELECT clause is replaced with a CONSTRUCT template. The CONSTRUCT template determines how to construct the result RDF graph using the results of the query pattern defined in the WHERE clause. A CONSTRUCT template consists of the keyword CONSTRUCT followed by sequence of SPARQL triple patterns that are enclosed within curly braces. The keywords OPTIONAL, UNION, FILTER, MINUS, BIND, VALUES, and GRAPH are not allowed within CONSTRUCT templates, and property path expressions are not allowed within CONSTRUCT templates. These keywords, however, are allowed within the query pattern inside the WHERE clause.

SPARQL CONSTRUCT queries build result RDF graphs in the following manner. For each result row returned by the WHERE clause, variable values are substituted into the CONSTRUCT template to create one or more RDF triples. Suppose the graph pattern in the WHERE clause of Example 1-27 returns the following result rows.

E$RDFTERM FNAME$RDFTERM LNAME$RDFTERM

ent:employee1

"Fred"

"Smith"

ent:employee2

"Jane"

"Brown"

ent:employee3

"Bill"

"Jones"

The overall SEM_MATCH CONSTRUCT query in Example 1-27 would then return the following rows, which correspond to six RDF triples (two for each result row of the query pattern).

SUBJ$RDFTERM PRED$RDFTERM OBJ$RDFTERM

ent:employee1

foaf:givenName

"Fred"

ent:employee1

foaf:familyName

"Smith"

ent:employee2

foaf:givenName

"Jane"

ent:employee2

foaf:familyName

"Brown"

ent:employee3

foaf:givenName

"Bill"

ent:employee3

foaf:familyName

"Jones"

There are two SEM_MATCH query options that influence the behavior of SPARQL CONSTRUCT: CONSTRUCT_UNIQUE=T and CONSTRUCT_STRICT=T. Using the CONSTRUCT_UNIQUE=T query option ensures that only unique RDF triples are returned from the CONSTRUCT query. Using the CONSTRUCT_STRICT=T query option ensures that only valid RDF triples are returned from the CONSTRUCT query. Valid RDF triples are those that have (1) a URI or blank node in the subject position, (2) a URI in the predicate position, and (3) a URI, blank node or RDF literal in the object position. Both of these query options are turned off by default for improved query performance.

Example 1-27 SPARQL CONSTRUCT

Example 1-27 shows a SPARQL CONSTRUCT query that builds an RDF graph of employee names using the foaf vocabulary.

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
  FROM TABLE(SEM_MATCH(
    'PREFIX  ent: <http://www.example.org/enterprise/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT
      {?e foaf:givenName  ?fname .
       ?e foaf:familyName ?lname 
      }
     WHERE
      {?e ent:fname ?fname .
       ?e ent:lname ?lname 
      }',
    SEM_Models('enterprise'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-28 CONSTRUCT with Solution Modifiers

SPARQL SOLUTION modifiers can be used with CONSTRUCT queries. Example 1-28 shows the use of ORDER BY and LIMIT to build a graph about the top two highest-paid employees. Note that the LIMIT 2 clause applies to the query pattern not to the overall CONSTRUCT query. That is, the query pattern will return two result rows, but the overall CONSTRUCT query will return 6 RDF triples (three for each of the two employees bound to ?e).

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
  FROM TABLE(SEM_MATCH(
    'PREFIX  ent: <http://www.example.org/enterprise/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT
      { ?e ent:fname       ?fname .
        ?e ent:lname       ?lname .
        ?e ent:dateOfBirth ?dob }
     WHERE
      { ?e ent:fname  ?fname .
        ?e ent:lname  ?lname .
        ?e ent:salary ?sal
      }
     ORDER BY DESC(?sal)
     LIMIT 2',
    SEM_Models('enterprise'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-29 SPARQL 1.1 Features with CONSTRUCT

SPARQL 1.1 features are supported within CONSTRUCT query patterns. Example 1-29 shows the use of subqueries and SELECT expressions within a CONSTRUCT query.

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
  FROM TABLE(SEM_MATCH(
    'PREFIX  ent: <http://www.example.org/enterprise/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT
      { ?e foaf:name  ?name }
     WHERE
      { SELECT ?e (CONCAT(?fname," ",?lname) AS ?name)
        WHERE { ?e ent:fname ?fname .
                ?e ent:lname ?lname }
      }',
    SEM_Models('enterprise'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-30 SPARQL CONSTRUCT with Named Graphs

Named graph data cannot be returned from SPARQL CONSTRUCT queries because, in accordance with the W3C SPARQL specification, only RDF triples are returned, not RDF quads. The FROM, FROM NAMED and GRAPH keywords, however, can be used when matching the query pattern defined in the WHERE clause.

Example 1-30 constructs an RDF graph with ent:name triples from the UNION of named graphs ent:g1 and ent:g2, ent:dateOfBirth triples from named graph ent:g3, and ent:ssn triples from named graph ent:g4.

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
  FROM TABLE(SEM_MATCH(
    'PREFIX  ent: <http://www.example.org/enterprise/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT
      { ?e ent:name ?name .
        ?e ent:dateOfBirth ?dob .
        ?e ent:ssn ?ssn
      }
     FROM ent:g1
     FROM ent:g2
     FROM NAMED ent:g3
     FROM NAMED ent:g4
     WHERE
      { ?e foaf:name ?name .
        GRAPH ent:g3 { ?e ent:dateOfBirth ?dob }
        GRAPH ent:g4 { ?e ent:ssn ?ssn } 
      }',
    SEM_Models('enterprise'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-31 SPARQL CONSTRUCT Normal Form

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
  FROM TABLE(SEM_MATCH(
    'PREFIX  ent: <http://www.example.org/enterprise/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT
      {?e foaf:givenName  ?fname .
       ?e foaf:familyName ?lname 
      }
     WHERE
      {?e ent:fname ?fname .
       ?e ent:lname ?lname 
      }',
    SEM_Models('enterprise'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-32 SPARQL CONSTRUCT Short Form

A short form of CONSTRUCT is supported when the CONSTRUCT template is exactly the same as the WHERE clause. In this case, only the keyword CONSTRUCT is needed, and the graph pattern in the WHERE clause will also be used as a CONSTRUCT template. Example 1-32 shows the short form of Example 1-31.

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
  FROM TABLE(SEM_MATCH(
    'PREFIX  ent: <http://www.example.org/enterprise/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT
     WHERE
      {?e ent:fname ?fname .
       ?e ent:lname ?lname 
      }',
    SEM_Models('enterprise'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.4.1 Typical SPARQL CONSTRUCT Workflow

A typical workflow for SPARQL CONSTRUCT would be to execute a CONSTRUCT query to extract and/or transform RDF triple data from an existing RDF graph and then load this data into an existing or new RDF graph. The data loading can be accomplished through simple INSERT statements or executing the SEM_APIS.BULK_LOAD_RDF_GRAPH procedure.

Example 1-33 SPARQL CONSTRUCT Workflow

Example 1-33 constructs foaf:name triples from existing ent:fname and ent:lname triples and then bulk loads these new triples back into the original RDF graph. Afterward, you can query the original graph for foaf:name values.

-- Use create table as select to build a staging table 
CREATE TABLE STAB(RDF$STC_sub, RDF$STC_pred, RDF$STC_obj) AS
SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
FROM TABLE(SEM_MATCH(
 'PREFIX  ent: <http://www.example.org/enterprise/> 
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  CONSTRUCT
   { ?e foaf:name  ?name }
  WHERE
   { SELECT ?e (CONCAT(?fname," ",?lname) AS ?name)
     WHERE { ?e ent:fname ?fname .
             ?e ent:lname ?lname }
   }',
 SEM_Models('enterprise'),
 null, null, null, null, ' ', null, null,
 'RDFUSER', 'NET1')); 
 

 
-- Bulk load data back into the enterprise model
BEGIN
  SEM_APIS.BULK_LOAD_RDF_GRAPH(
    rdf_graph_name=>'enterprise',
    table_owner=>'rdfuser',
    table_name=>'stab',
    flags=>' parallel_create_index parallel=4 ',
    network_owner=>'RDFUSER',
    network_name=>'NET1');
END;
/
 
-- Query for foaf:name data
SELECT e$rdfterm, name$rdfterm
FROM TABLE(SEM_MATCH(
 'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?e ?name
  WHERE { ?e foaf:name ?name }',
 SEM_Models('enterprise'),
 null, null, null, null, ' ', null, null,
 'RDFUSER', 'NET1'));

See also the W3C SPARQL specification for more information on SPARQL CONSTRUCT queries, specifically: http://www.w3.org/TR/sparql11-query/#construct

1.7.5 Graph Patterns: Support for SPARQL DESCRIBE Syntax

SEM_MATCH allows fully-specified SPARQL DESCRIBE queries in the query parameter.

SPARQL DESCRIBE queries are useful for exploring RDF data sets. You can easily find information about a given resource or set of resources without knowing information about the exact RDF properties used in the data set. A DESCRIBE query returns a "description" of a resource r, where a "description" is the set of RDF triples in the query data set that contain r in either the subject or object position.

Like CONSTRUCT queries, DESCRIBE queries return an RDF graph instead of result bindings. Each DESCRIBE query, therefore, returns the same columns as a CONSTRUCT query (see Graph Patterns: Support for SPARQL CONSTRUCT Syntax for a listing of return columns).

SPARQL DESCRIBE queries use the same syntax as SPARQL SELECT queries, except the topmost SELECT clause is replaced with a DESCRIBE clause. A DESCRIBE clause consists of the DESCRIBE keyword followed by a sequence of URIs and/or variables separated by whitespace or the DESCRIBE keyword followed by a single * (asterisk).

Two SEM_MATCH query options affect SPARQL DESCRIBE queries: CONSTRUCT_UNIQUE=T and CONSTRUCT_STRICT=T. CONSTRUCT_UNIQUE=T ensures that duplicate triples are eliminated from the result, and CONSTRUCT_STRICT=T ensures that invalid triples are eliminated from the result. Both of these options are turned off by default. These options are described in more detail in Graph Patterns: Support for SPARQL CONSTRUCT Syntax.

See also the W3C SPARQL specification for more information on SPARQL DESCRIBE queries, specifically: http://www.w3.org/TR/sparql11-query/#describe

Example 1-34 SPARQL DESCRIBE Short Form

A short form of SPARQL DESCRIBE is provided to describe a single constant URI. In the short form, only a DESCRIBE clause is needed. Example 1-34 shows a short form SPARQL DESCRIBE query.

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
FROM TABLE(SEM_MATCH(
 'DESCRIBE <http://www.example.org/enterprise/emp_1>',
 SEM_Models('enterprise'),
 null, null, null, null, ' ', null, null,
 'RDFUSER', 'NET1'));

Example 1-35 SPARQL DESCRIBE Normal Form

The normal form of SPARQL DESCRIBE specifies a DESCRIBE clause and a SPARQL query pattern, possibly including solution modifiers. Example 1-35 shows a SPARQL DESCRIBE query that describes all employees whose departments are located in New Hampshire.

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
FROM TABLE(SEM_MATCH(
 'PREFIX  ent: <http://www.example.org/enterprise/>
  DESCRIBE ?e
  WHERE 
   { ?e ent:department ?dept .
     ?dept ent:locatedIn "New Hampshire" }',
 SEM_Models('enterprise'),
 null, null, null, null, ' ', null, null,
 'RDFUSER', 'NET1'));

Example 1-36 DESCRIBE *

With the normal form of DESCRIBE, as shown in Example 1-35, all resources bound to variables listed in the DESCRIBE clause are described. In Example 1-35, all employees returned from the query pattern and bound to ?e will be described. When DESCRIBE * is used, all visible variables in the query are described.

Example 1-36 shows a modified version of Example 1-35 that describes both employees (bound to ?e) and departments (bound to ?dept).

SELECT subj$rdfterm, pred$rdfterm, obj$rdfterm
FROM TABLE(SEM_MATCH(
 'PREFIX  ent: <http://www.example.org/enterprise/>
  DESCRIBE *
  WHERE 
   { ?e ent:department ?dept .
     ?dept ent:locatedIn "New Hampshire" }',
 SEM_Models('enterprise'),
 null, null, null, null, ' ', null, null,
 'RDFUSER', 'NET1'));

1.7.6 Graph Patterns: Support for SPARQL SELECT Syntax

In addition to curly-brace graph patterns, SEM_MATCH allows fully-specified SPARQL SELECT queries in the query parameter. When using the SPARQL SELECT syntax option, SEM_MATCH supports the following query constructs: BASE, PREFIX, SELECT, SELECT DISTINCT, FROM, FROM NAMED, WHERE, ORDER BY, LIMIT, and OFFSET. Each SPARQL SELECT syntax query must include a SELECT clause and a graph pattern.

A key difference between curly-brace and SPARQL SELECT syntax when using SEM_MATCH is that only variables appearing in the SPARQL SELECT clause are returned from SEM_MATCH when using SPARQL SELECT syntax.

One additional column, SEM$ROWNUM, is returned from SEM_MATCH when using SPARQL SELECT syntax. This NUMBER column can be used to order the results of a SEM_MATCH query so that the result order matches the ordering specified by a SPARQL ORDER BY clause.

The SPARQL ORDER BY clause can be used to order the results of SEM_MATCH queries. This clause specifies a sequence of comparators used to order the results of a given query. A comparator consists of an expression composed of variables, RDF terms, arithmetic operators (+, -, *, /), Boolean operators and logical connectives (||, &&, !), comparison operators (<, >, <=, >=, =, !=), and any functions available for use in FILTER expressions.

The following order of operations is used when evaluating SPARQL SELECT queries:

  1. Graph pattern matching

  2. Grouping (see Grouping and Aggregation.)

  3. Aggregates (see Grouping and Aggregation)

  4. Having (see Grouping and Aggregation)

  5. Values (see Value Assignment)

  6. Select expressions

  7. Order by

  8. Projection

  9. Distinct

  10. Offset

  11. Limit

See also the W3C SPARQL specification for more information on SPARQL BASE, PREFIX, SELECT, SELECT DISTINCT, FROM, FROM NAMED, WHERE, ORDER BY, LIMIT, and OFFSET constructs, specifically: http://www.w3.org/TR/sparql11-query/

Example 1-37 SPARQL PREFIX, SELECT, and WHERE Clauses

Example 1-37 uses the following SPARQL constructs:

  • SPARQL PREFIX clause to specify an abbreviation for the http://www.example.org/family/ and http://xmlns.com/foaf/0.1/ namespaces

  • SPARQL SELECT clause to specify the set of variables to project out of the query

  • SPARQL WHERE clause to specify the query graph pattern

SELECT y, name
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?y ?name
     WHERE
     {?x :grandParentOf ?y . 
      ?x foaf:name ?name }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-37 returns the following columns: y, y$RDFVID, y$_PREFIX, y$_SUFFIX, y$RDFVTYP, y$RDFCLOB, y$RDFLTYP, y$RDFLANG, name, name$RDFVID, name$_PREFIX, name$_SUFFIX, name$RDFVTYP, name$RDFCLOB, name$RDFLTYP, name$RDFLANG, and SEM$ROWNUM.

Example 1-38 SPARQL SELECT * (All Variables in Triple Pattern)

The SPARQL SELECT clause specifies either (A) a sequence of variables and/or expressions (see Expressions in the SELECT Clause), or (B) * (asterisk), which projects all variables that appear in a specified triple pattern. Example 1-38 uses the SPARQL SELECT clause to select all variables that appear in a specified triple pattern.

SELECT x, y, name
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT *
     WHERE
     {?x :grandParentOf ?y . 
      ?x foaf:name ?name }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-39 SPARQL SELECT DISTINCT

The DISTINCT keyword can be used after SELECT to remove duplicate result rows. Example 1-39 uses SELECT DISTINCT to select only the distinct names.

SELECT name
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT DISTINCT ?name
     WHERE
     {?x :grandParentOf ?y . 
      ?x foaf:name ?name }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-40 RDF Dataset Specification Using FROM and FROM NAMED

SPARQL FROM and FROM NAMED are used to specify the RDF dataset for a query. FROM clauses are used to specify the set of graphs that make up the default graph, and FROM NAMED clauses are used to specify the set of graphs that make up the set of named graphs. Example 1-40 uses FROM and FROM NAMED to select email addresses and friend of relationships from the union of the <http://www.friends.com/friends> and <http://www.contacts.com/contacts> graphs and grandparent information from the <http://www.example.org/family/Smith> and <http://www.example.org/family/Jones> graphs.

SELECT x, y, z, email
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/> 
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     PREFIX friends: <http://www.friends.com/>
     PREFIX contacts: <http://www.contacts.com/>
     SELECT *
     FROM friends:friends
     FROM contacts:contacts
     FROM NAMED :Smith
     FROM NAMED :Jones
     WHERE
     {?x foaf:frendOf ?y .
      ?x :email ?email .
      GRAPH ?g {
        ?x :grandParentOf ?z }
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-41 SPARQL ORDER BY

In a SPARQL ORDER BY clause:

  • Single variable ordering conditions do not require enclosing parenthesis, but parentheses are required for more complex ordering conditions.

  • An optional ASC() or DESC() order modifier can be used to indicate the desired order (ascending or descending, respectively). Ascending is the default order.

  • When using SPARQL ORDER BY in SEM_MATCH, the containing SQL query should be ordered by SEM$ROWNUM to ensure that the desired ordering is maintained through any enclosing SQL blocks.

Example 1-41 uses a SPARQL ORDER BY clause to select all cameras, and it specifies ordering by descending type and ascending total price (price * (1 - discount) * (1 + tax)).

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT * 
     WHERE
      {?x :price ?p .
       ?x :discount ?d .
       ?x :tax ?t .
       ?x :cameraType ?cType .
      }
     ORDER BY DESC(?cType) ASC(?p * (1-?d) * (1+?t))',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'))
ORDER BY SEM$ROWNUM;

Example 1-42 SPARQL LIMIT

SPARQL LIMIT and SPARQL OFFSET can be used to select different subsets of the query solutions. Example 1-42 uses SPARQL LIMIT to select the five cheapest cameras, and Example 1-43 uses SPARQL LIMIT and OFFSET to select the fifth through tenth cheapest cameras.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?p
     WHERE
      {?x :price ?p .
       ?x :cameraType ?cType .
      }
     ORDER BY ASC(?p)
     LIMIT 5',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'))
ORDER BY SEM$ROWNUM;

Example 1-43 SPARQL OFFSET

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?p
     WHERE
      {?x :price ?p .
       ?x :cameraType ?cType .
      }
     ORDER BY ASC(?p)
     LIMIT 5
     OFFSET 5',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'))
ORDER BY SEM$ROWNUM;

Example 1-44 Query Using Full URIs

The SPARQL BASE keyword is used to set a global prefix. All relative IRIs will be resolved with the BASE IRI using the basic algorithm described in Section 5.2 of the Uniform Resource Identifier (URI): Generic Syntax (RFC3986) (http://www.ietf.org/rfc/rfc3986.txt). Example 1-44 is a simple query using full URIs, and Example 1-45 is an equivalent query using a base IRI.

SELECT *
  FROM TABLE(SEM_MATCH(
    'SELECT ?employee ?position
     WHERE
      {?x <http://www.example.org/employee> ?p .
       ?p <http://www.example.org/employee/name> ?employee .
       ?p <http://www.example.org/employee/position> ?pos .
       ?pos <http://www.example.org/positions/name> ?position
      }',
    SEM_Models('enterprise'),
    null, 
     null, null, null, ' ', null, null,
   'RDFUSER', 'NET1'))
ORDER BY 1,2;

Example 1-45 Query Using a Base IRI

SELECT *
  FROM TABLE(SEM_MATCH(
    'BASE <http://www.example.org/>
     SELECT ?employee ?position
     WHERE
      {?x <employee> ?p .
       ?p <employee/name> ?employee .
       ?p <employee/position> ?pos .
       ?pos <positions/name> ?position
      }',
    SEM_Models('enterprise'),
    null, 
    null, null, null, ' ', null, null,
   'RDFUSER', 'NET1'))
ORDER BY 1,2;

1.7.7 Graph Patterns: Support for SPARQL 1.1 Constructs

SEM_MATCH supports the following SPARQL 1.1 constructs:

1.7.7.1 Expressions in the SELECT Clause

Expressions can be used in the SELECT clause to project the value of an expression from a query. A SELECT expression is composed of variables, RDF terms, arithmetic operators (+, -, *, /), Boolean operators and logical connectives (||, &&, !), comparison operators (<, >, <=, >=, =, !=), and any functions available for use in FILTER expressions. The expression must be aliased to a single variable using the AS keyword, and the overall <expression> AS <alias> fragment must be enclosed in parentheses. The alias variable cannot already be defined in the query. A SELECT expression may reference the result of a previous SELECT expression (that is, an expression that appears earlier in the SELECT clause).

Example 1-46 SPARQL SELECT Expression

Example 1-46 uses a SELECT expression to project the total price for each camera.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ((?p * (1-?d) * (1+?t)) AS ?totalPrice) 
     WHERE
      {?x :price ?p .
       ?x :discount ?d .
       ?x :tax ?t .
       ?x :cameraType ?cType .
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-47 SPARQL SELECT Expressions (2)

Example 1-47 uses two SELECT expressions to project the discount price with and without sales tax.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ((?p * (1-?d)) AS ?preTaxPrice) ((?preTaxPrice * (1+?t)) AS ?finalPrice)
     WHERE
      {?x :price ?p .
       ?x :discount ?d .
       ?x :tax ?t .
       ?x :cameraType ?cType .
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.7.2 Subqueries

Subqueries are allowed with SPARQL SELECT syntax. That is, fully-specified SPARQL SELECT queries may be embedded within other SPARQL SELECT queries. Subqueries have many uses, for example, limiting the number of results from a subcomponent of a query.

Example 1-48 SPARQL SELECT Subquery

Example 1-48 uses a subquery to find the manufacturer that makes the cheapest camera and then finds all other cameras made by this manufacturer.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?c1
     WHERE {?c1 rdf:type :Camera .
            ?c1 :manufacturer ?m .
            {
             SELECT ?m
             WHERE {?c2 rdf:Type :Camera .
                    ?c2 :price ?p .
                    ?c2 :manufacturer ?m .
             }
             ORDER BY ASC(?p)
             LIMIT 1
            }
     }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Subqueries are logically evaluated first, and the results are projected up to the outer query. Note that only variables projected in the subquery's SELECT clause are visible to the outer query.

1.7.7.3 Grouping and Aggregation

The GROUP BY keyword used to perform grouping. Syntactically, the GROUP BY keyword must appear after the WHERE clause and before any solution modifiers such as ORDER BY or LIMIT.

Aggregates are used to compute values across results within a group. An aggregate operates over a collection of values and produces a single value as a result. SEM_MATCH supports the following built-in Aggregates: COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT and SAMPLE. These aggregates are described in Table 1-17.

Table 1-17 Built-in Aggregates

Aggregate Description

AVG(expression)

Returns the numeric average of expression over the values within a group.

COUNT(* | expression)

Counts the number of times expression has a bound, non-error value within a group; asterisk (*) counts the number of results within a group.

GROUP_CONCAT(expression [; SEPARATOR = "STRING"])

Performs string concatenation of expression over the values within a group. If provided, an optional separator string will be placed between each value.

MAX(expression)

Returns the maximum value of expression within a group based on the ordering defined by SPARQL ORDER BY.

MIN(expression)

Returns the minimum value of expression within a group based on the ordering defined by SPARQL ORDER BY.

SAMPLE(expression)

Returns expression evaluated for a single arbitrary value from a group.

SUM(expression)

Calculates the numeric sum of expression over the values within a group.

Certain restrictions on variable references apply when using grouping and aggregation. Only group-by variables (single variables in the GROUP BY clause) and alias variables from GROUP BY value assignments can be used in non-aggregate expressions in the SELECT or HAVING clauses.

Example 1-49 Simple Grouping Query

Example 1-49 shows a query that uses the GROUP BY keyword to find all the different types of cameras.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?cType
     WHERE
      {?x rdf:type :Camera .
       ?x :cameraType ?cType .
      }
     GROUP BY ?cType',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
     null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

A grouping query partitions the query results into a collection of groups based on a grouping expression (?cType in Example 1-49) such that each result within a group has the same values for the grouping expression. The final result of the grouping operation will include one row for each group.

Example 1-50 Complex Grouping Expression

A grouping expression consists of a sequence of one or more of the following: a variable, an expression, or a value assignment of the form (<expression> as <alias>). Example 1-50 shows a grouping query that uses one of each type of component in the grouping expression.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?cType ?totalPrice
     WHERE
      {?x rdf:type :Camera .
       ?x :cameraType ?cType .
       ?x :manufacturer ?m .
       ?x :price ?p .
       ?x :tax ?t .
      }
     GROUP BY ?cType (STR(?m)) ((?p*(1+?t)) AS ?totalPrice)',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-51 Aggregation

Example 1-51 uses aggregates to select the maximum, minimum, and average price for each type of camera.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?cType 
            (MAX(?p) AS ?maxPrice) 
            (MIN(?p) AS ?minPrice) 
            (AVG(?p) AS ?avgPrice)
     WHERE
      {?x rdf:type :Camera .
       ?x :cameraType ?cType .
       ?x :manufacturer ?m .
       ?x :price ?p .
      }
     GROUP BY ?cType',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-52 Aggregation Without Grouping

If an aggregate is used without a grouping expression, then the entire result set is treated as a single group. Example 1-52 computes the total number of cameras for the whole data set.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT (COUNT(?x) as ?cameraCnt)
     WHERE
      { ?x rdf:type :Camera 
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-53 Aggregation with DISTINCT

The DISTINCT keyword can optionally be used as a modifier for each aggregate. When DISTINCT is used, duplicate values are removed from each group before computing the aggregate. Syntactically, DISTINCT must appear as the first argument to the aggregate. Example 1-53 uses DISTINCT to find the number of distinct camera manufacturers. In this case, duplicate values of STR(?m) are removed before counting.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT (COUNT(DISTINCT STR(?m)) as ?mCnt)
     WHERE
      { ?x rdf:type :Camera .
        ?x :manufacturer ?m
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
     null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-54 HAVING Clause

The HAVING keyword can be used to filter groups based on constraints. HAVING expressions can be composed of variables, RDF terms, arithmetic operators (+, -, *, /), Boolean operators and logical connectives (||, &&, !), comparison operators (<, >, <=, >=, =, !=), aggregates, and any functions available for use in FILTER expressions. Syntactically, the HAVING keyword appears after the GROUP BY clause and before any other solution modifiers such as ORDER BY or LIMIT.

Example 1-54 uses a HAVING expression to find all manufacturers that sell cameras for less than $200.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?m
     WHERE
      { ?x rdf:type :Camera .
        ?x :manufacturer ?m .
        ?x :price ?p
      }
     GROUP BY ?m
     HAVING (MIN(?p) < 200)
     ORDER BY ASC(?m)',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.7.4 Negation

SEM_MATCH supports two forms of negation in SPARQL query patterns: NOT EXISTS and MINUS. NOT EXISTS can be used to filter results based on whether or not a graph pattern matches, and MINUS can be used to remove solutions based on their relation to another graph pattern.

Example 1-55 Negation with NOT EXISTS

Example 1-55 uses a NOT EXISTS FILTER to select those cameras that do not have any user reviews.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?p
     WHERE
      {?x :price ?p .
       ?x :cameraType ?cType .
       FILTER( NOT EXISTS({?x :userReview ?r}) )
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-56 EXISTS

Conversely, the EXISTS operator can be used to ensure that a pattern matches. Example 1-56 uses an EXISTS FILTER to select only those cameras that have a user review.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?p
     WHERE
      {?x :price ?p .
       ?x :cameraType ?cType .
       FILTER( EXISTS({?x :userReview ?r}) )
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    RDFUSER', 'NET1'));

Example 1-57 Negation with MINUS

Example 1-57 uses MINUS to arrive at the same result as Example 1-55. Only those solutions that are not compatible with solutions from the MINUS pattern are included in the result. That is, if a solution has the same values for all shared variables as a solution from the MINUS pattern, it is removed from the result.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?p
     WHERE
      {?x :price ?p .
       ?x :cameraType ?cType .
       MINUS {?x :userReview ?r}
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'), 
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-58 Negation with NOT EXISTS (2)

NOT EXISTS and MINUS represent two different styles of negation and have different results in certain cases. One such case occurs when no variables are shared between the negation pattern and the rest of the query. For example, the NOT EXISTS query in Example 1-58 removes all solutions because {?subj ?prop ?obj} matches any triple, but the MINUS query in Example 1-59 removes no solutions because there are no shared variables.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?p
     WHERE
      {?x :price ?p .
       ?x :cameraType ?cType .
       FILTER( NOT EXISTS({?subj ?prop ?obj}) )
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-59 Negation with MINUS (2)

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?p
     WHERE
      {?x :price ?p .
       ?x :cameraType ?cType .
       MINUS {?subj ?prop ?obj}
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.7.5 Value Assignment

SEM_MATCH provides a variety of ways to assign values to variables in a SPARQL query.

The value of an expression can be assigned to a new variable in three ways: (1) expressions in the SELECT clause, (2) expressions in the GROUP BY clause, and (3) the BIND keyword. In each case, the new variable must not already be defined in the query. After assignment, the new variable can be used in the query and returned in results. As discussed in Expressions in the SELECT Clause, the syntax for value assignment is (<expression> AS <alias>) where alias is the new variable, for example, ((?price * (1+?tax)) AS ?totalPrice).

Example 1-60 Nested SELECT Expression

Example 1-60 uses a nested SELECT expression to compute the total price of a camera and assign the value to a variable (?totalPrice). This variable is then used in a FILTER in the outer query to find cameras costing less than $200.

SELECT *
 FROM TABLE(SEM_MATCH(
   'PREFIX : <http://www.example.org/electronics/> 
    SELECT ?x ?cType ?totalPrice
    WHERE
     {?x :cameraType ?cType .
       { SELECT ?x ( ((?price*(1+?tax)) AS ?totalPrice )
         WHERE { ?x :price ?price .
                 ?x :tax ?tax }
       }
      FILTER (?totalPrice < 200)
     }',
   SEM_Models('electronics'),
   SEM_Rulebases('RDFS'),
   null, null, null, ' ', null, null,
   'RDFUSER', 'NET1'));

Example 1-61 BIND

The BIND keyword can be used inside a basic graph pattern to assign a value and is syntactically more compact than an equivalent nested SELECT expression. Example 1-61 uses the BIND keyword to expresses a query that is logically equivalent to Example 1-60.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?totalPrice
     WHERE
      {?x :cameraType ?cType .
       ?x :price ?price .
       ?x :tax ?tax .
       BIND ( ((?price*(1+?tax)) AS ?totalPrice )
       FILTER (?totalPrice < 200)
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
   'RDFUSER', 'NET1'));

Example 1-62 GROUP BY Expression

Value assignments in the GROUP BY clause can subsequently be used in the SELECT clause, the HAVING clause, and the outer query (in the case of a nested grouping query). Example 1-62 uses a GROUP BY expression to find the maximum number of megapixels for cameras at each price point less than $1000.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?totalPrice (MAX(?mp) as ?maxMP)
     WHERE
      {?x rdf:type :Camera .
       ?x :price ?price .
       ?x :tax ?tax .
       GROUP BY ( ((?price*(1+?tax)) AS ?totalPrice )
       HAVING (?totalPrice < 1000)
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null));

Example 1-63 VALUES

In addition to the preceding three ways to assign the value of an expression to a new variable, the VALUES keyword can be used to introduce an unordered solution sequence that is combined with the query results through a join operation. A VALUES block can appear inside a query pattern or at the end of a SPARQL SELECT query block after any solution modifiers. The VALUES construct can be used in subqueries.

Example 1-63 uses the VALUES keyword to constrain the query results to DSLR cameras made by :Company1 or any type of camera made by :Company2. The keyword UNDEF is used to represent an unbound variable in the solution sequence.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?m
     WHERE 
      { ?x rdf:type :Camera .
        ?x :cameraType ?cType .
        ?x :manufacturer ?m
      }
     VALUES (?cType ?m)
     { ("DSLR" :Company1)
       (UNDEF  :Company2) 
     }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-64 Simplified VALUES Syntax

A simplified syntax can be used for the common case of a single variable. Specifically, the parentheses around the variable and each solution can be omitted. Example 1-64 uses the simplified syntax to constrain the query results to cameras made by :Company1 or :Company2.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?m
     WHERE 
      { ?x rdf:type :Camera .
        ?x :cameraType ?cType .
        ?x :manufacturer ?m
      }
     VALUES ?m
     { :Company1
       :Company2 
     }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-65 Inline VALUES Block

Example 1-65 also constrains the query results to any camera made by :Company1 or :Company2, but specifies the VALUES block inside the query pattern.

SELECT *
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?x ?cType ?m
     WHERE 
      { VALUES ?m { :Company1 :Company2 }
        ?x rdf:type :Camera .
        ?x :cameraType ?cType .
        ?x :manufacturer ?m
      }',
    SEM_Models('electronics'),
    SEM_Rulebases('RDFS'),
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.7.6 Property Paths

A SPARQL Property Path describes a possible path between two RDF resources (nodes) in an RDF graph. A property path appears in the predicate position of a triple pattern and uses a regular expression-like syntax to place constraints on the properties (edges) making up a path from the subject of the triple pattern to the object of a triple pattern. Property paths allow SPARQL queries to match arbitrary length paths in the RDF graph and also provide a more concise way to express other graph patterns.

Table 1-18 describes the syntax constructs available for constructing SPARQL Property Paths. Note that iri is either an IRI or a prefixed name, and elt is a property path element, which may itself be composed of other property path elements.

Table 1-18 Property Path Syntax Constructs

Syntax Construct Matches

iri

An IRI or a prefixed name. A path of length 1 (one).

^elt

Inverse path (object to subject).

!iri or !(iri1 | … | irin)

Negated property set. An IRI that is not one of irii.

!^iri or !(iri1 | … | irij | ^irij+1 | … | ^irin)

Negated property set with some inverse properties. An IRI that is not one of irii, nor one of irij+1...irin as reverse paths. !^iri is short for !(^iri). The order of properties and inverse properties is not important. They can occur in mixed order.

(elt)

A group path elt; brackets control precedence.

elt1 / elt2

A sequence path of elt1, followed by elt2.

elt1 | elt2

An alternative path of elt1, or elt2 (all possibilities are tried).

elt*

A path of zero or more occurrences of elt.

elt+

A path of one or more occurrences of elt.

elt?

A path of zero or one occurrence of elt.

The precedence of the syntax constructs is as follows (from highest to lowest):

  • IRI, prefixed names

  • Negated property sets

  • Groups

  • Unary operators *, ?, +

  • Unary ^ inverse links

  • Binary operator /

  • Binary operator |

Precedence is left-to-right within groups.

Special Considerations for Property Path Operators + and *

In general, truly unbounded graph traversals using the + (plus sign) and * (asterisk) operator can be very expensive. For this reason, a depth-limited version of the + and * operator is used by default, and the default depth limit is 10. In addition, the depth-limited implementation can be run in parallel. The ALL_MAX_PP_DEPTH(n) SEM_MATCH query option or the MAX_PP_DEPTH(n) inline HINT0 query optimizer hint can be used to change the depth-limit setting. To achieve a truly unbounded traversal, you can set a depth limit of less than 1 to fall back to a CONNECT BY-based implementation.

Query Hints for Property Paths

Other query hints are available to influence the performance of property path queries. The ALLOW_PP_DUP=T query option can be used with * and + queries to allow duplicate results. Allowing duplicate results may return the first rows from a query faster. In addition, ALL_USE_PP_HASH and ALL_USE_PP_NL query options are available to influence the join types used when evaluating property path expressions. Analogous USE_PP_HASH and USE_PP_NL inline HINT0 query optimizer hints can also be used.

Example 1-66 SPARQL Property Path (Using rdfs:subClassOf Relations)

Example 1-66 uses a property path to find all Males based on transitivity of the rdfs:subClassOf relationship. A property path allows matching an arbitrary number of consecutive rdfs:subClassOf relations.

SELECT x, name
  FROM TABLE(SEM_MATCH(
    '{ ?x foaf:name ?name .
       ?x rdf:type ?t .
       ?t rdfs:subClassOf* :Male }',
    SEM_Models('family'),
    null, 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/') 
                SEM_ALIAS('foaf',' http://xmlns.com/foaf/0.1/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-67 SPARQL Property Path (Using foaf:friendOf or foaf:knows Relationships)

Example 1-67 uses a property path to find all of Scott's close friends (those people reachable within two hops using foaf:friendOf or foaf:knows relationships).

SELECT name
  FROM TABLE(SEM_MATCH(
    '{ { :Scott (foaf:friendOf | foaf:knows) ?f }
       UNION
       { :Scott (foaf:friendOf | foaf:knows)/(foaf:friendOf | foaf:knows) ?f }
       ?f foaf:name ?name .
       FILTER (!sameTerm(?f, :Scott)) }',
    SEM_Models('family'),
    null, 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/'),
                SEM_ALIAS('foaf',' http://xmlns.com/foaf/0.1/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-68 Specifying Property Path Maximum Depth Value

Example 1-68 specifies a maximum depth of 12 for all property path expressions with the ALL_MAX_PP_DEPTH(n) query option value.

SELECT x, name
  FROM TABLE(SEM_MATCH(
    '{ ?x foaf:name ?name .
       ?x rdf:type ?t .
       ?t rdfs:subClassOf* :Male }',
    SEM_Models('family'),
    null, 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/') 
                SEM_ALIAS('foaf',' http://xmlns.com/foaf/0.1/')),
    null,
    null,
    ' ALL_MAX_PP_DEPTH(12) ', 
    null, null,
    'RDFUSER', 'NET1'));

Example 1-69 Specifying Property Path Join Hint

Example 1-69 shows an inline HINT0 query optimizer hint that requests a nested loop join for evaluating the property path expression.

SELECT x, name
  FROM TABLE(SEM_MATCH(
    '{ # HINT0={ USE_PP_NL }
       ?x foaf:name ?name .
       ?x rdf:type ?t .
       ?t rdfs:subClassOf* :Male }',
    SEM_Models('family'),
    null, 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/') 
                SEM_ALIAS('foaf',' http://xmlns.com/foaf/0.1/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.8 Graph Patterns: Support for SPARQL 1.1 Federated Query

SEM_MATCH supports SPARQL 1.1 Federated Query (see http://www.w3.org/TR/sparql11-federated-query/#SPROT). The SERVICE construct can be used to retrieve results from a specified SPARQL endpoint URL. With this capability, you can combine local RDF data (native RDF data or RDF views of relational data) with other, possibly remote, RDF data served by a W3C standards-compliant SPARQL endpoint.

Example 1-70 SPARQL SERVICE Clause to Retrieve All Triples

Example 1-70 shows a query that uses a SERVICE clause to retrieve all triples from the SPARQL endpoint available at http://www.example1.org/sparql.

SELECT s, p, o
  FROM TABLE(SEM_MATCH(
    'SELECT ?s ?p ?o
     WHERE {
       SERVICE <http://www.example1.org/sparql>{ ?s ?p ?o }
     }',
    SEM_Models('electronics'),
    null, null, null, null, ' ', 
    null, null,
    'RDFUSER', 'NET1'));

Example 1-71 SPARQL SERVICE Clause to Join Remote and Local RDF Data

Example 1-71 joins remote RDF data with local RDF data. This example joins camera types ?cType from local RDF graph electronics with the camera names ?name from the SPARQL endpoint at http://www.example1.org/sparql.

SELECT cType, name
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?cType ?name
     WHERE {
       ?s :cameraType ?cType
       SERVICE <http://www.example1.org/sparql>{ ?s :name ?name }
      }',
    SEM_Models('electronics'),
    null, null, null, null, ' ', 
    null, null,
    'RDFUSER', 'NET1'));

1.7.8.1 Privileges Required to Execute Federated SPARQL Queries

You need certain database privileges to use the SERVICE construct within SEM_MATCH queries. You should be granted EXECUTE privilege on the SPARQL_SERVICE function by a user with DBA privileges. The following example grants this access to a user named RDFUSER:

grant execute on sparql_service to rdfuser;

Also, an Access Control List (ACL) should be used to grant the CONNECT privilege to the user attempting a federated query. Example 1-72 creates a new ACL to grant the user RDFUSER the CONNECT privilege and assigns the domain * to the ACL. For more information about ACLs, see Oracle Database PL/SQL Packages and Types Reference.

Example 1-72 Access Control List and Host Assignment

dbms_network_acl_admin.create_acl (
  acl       => 'rdfuser.xml',
  description => 'Allow rdfuser to query SPARQL endpoints',
  principal => 'RDFUSER',
  is_grant  => true,
  privilege => 'connect'
);
 
dbms_network_acl_admin.assign_acl (
  acl  => 'rdfuser.xml',
  host => '*'
);      

After the necessary privileges are granted, you are ready to execute federated queries from SEM_MATCH

1.7.8.2 SPARQL SERVICE Join Push Down

The SPARQL SERVICE Join Push Down (SERVICE_JPDWN=T) feature can be used to improve the performance of certain SPARQL SERVICE queries. By default, the query pattern within the SERVICE clause is executed first on the remote SPARQL endpoint. The full result of this remote execution is then joined with the local portion of the query. This strategy can result in poor performance if the local portion of the query is very selective and the remote portion of the query is very unselective.

The SPARQL SERVICE Join Push Down feature cannot be used in a query that contains more than one SERVICE clause.

Example 1-73 SPARQL SERVICE Join Push Down

Example 1-73 shows the SPARQL SERVICE Join Push Down feature.

SELECT s, prop, obj
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?s ?prop ?obj
     WHERE {
       ?s rdf:type   :Camera .
       ?s :modelName "Camera 12345"
       SERVICE <http://www.example1.org/sparql> { ?s ?prop ?obj }
      }',
    SEM_Models('electronics'),
    null, null, null, null, ' SERVICE_JPDWN=T ', 
    null, null,
    'RDFUSER', 'NET1'));

In Example 1-73, the local portion of the query will return a very small number of rows, but the remote portion of the query is completely unbound and will return the entire remote dataset. When the SERVICE_JPDWN=T option is specified, SEM_MATCH performs a nested-loop style evaluation by first executing the local portion of the query and then executing a modified version of the remote query once for each row returned by the local portion. The remote query is modified with a FILTER clause that effectively performs a substitution for the join variable ?s. For example, if <urn:camera1> and <urn:camera2> are returned from the local portion of Example 1-73 as bindings for ?s, then the following two queries are sent to the remote endpoint: { ?s ?prop ?obj FILTER (?s = <urn:camera1>) } and { s ?prop ?obj FILTER (?s = <urn:camera2>) }.

1.7.8.3 SPARQL SERVICE SILENT

When the SILENT keyword is used in federated queries, errors while accessing the specified remote SPARQL endpoint will be ignored. If the SERVICE SILENT request fails, a single solution with no bindings will be returned.

Example 1-74 uses SERVICE with the SILENT keyword inside an OPTIONAL clause, so that, when connection errors accessing http://www.example1.org/sparql appear, such errors will be ignored and all the rows retrieved from triple ?s :cameratype ?k will be combined with a null value for ?n.

Example 1-74 SPARQL SERVICE with SILENT Keyword

SELECT s, n
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/electronics/> 
     SELECT ?s ?n
     WHERE {
       ?s :cameraType ?k
       OPTIONAL { SERVICE SILENT <http://www.example1.org/sparql>{ ?k :name ?n } }
      }',
    SEM_Models('electronics'),
    null, null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.8.4 Using a Proxy Server with SPARQL SERVICE

The following methods are available for sending SPARQL SERVICE requests through an HTTP proxy:

  • Specifying the HTTP proxy that should be used for requests in the current session. This can be done through the SET_PROXY function of UTL_HTTP package. Example 1-75 sets the proxy proxy.example.com to be used for HTTP requests, excluding those to hosts in the domain example2.com. (For more information about the SET_PROXY procedure, see Oracle Database PL/SQL Packages and Types Reference.)

  • Using the SERVICE_PROXY SEM_MATCH option, which allows setting the proxy address for SPARQL SERVICE request. However, in this case no exceptions can be specified, and all requests are sent to the given proxy server. Example 1-76 shows a SEM_MATCH query where the proxy address proxy.example.com at port 80 is specified.

Example 1-75 Setting Proxy Server with UTL_HTTP.SET_PROXY

BEGIN
  UTL_HTTP.SET_PROXY('proxy.example.com:80', 'example2.com');
END;
/

Example 1-76 Setting Proxy Server in SPARQL SERVICE

SELECT *
  FROM TABLE(SEM_MATCH(
    'SELECT *
     WHERE {
       SERVICE <http://www.example1.org/sparql>{ ?s ?p ?o }
      }',
    SEM_Models('electronics'),
    null, null, null, null, ' SERVICE_PROXY=proxy.example.com:80 ', 
    null, null,
    'RDFUSER', 'NET1'));

1.7.8.5 Accessing SPARQL Endpoints with HTTP Basic Authentication

To allow accessing of SPARQL endpoints with HTTP Basic Authentication, user credentials should be saved in Session Context SDO_SEM_HTTP_CTX. A user with DBA privileges must grant EXECUTE on this context to the user that wishes to use basic authentication. The following example grants this access to a user named RDFUSER:

grant execute on mdsys.sdo_sem_http_ctx to rdfuser;

After the privilege is granted, the user should save the user name and password for each SPARQL Endpoint with HTTP Authentication through functions mdsys.sdo_sem_http_ctx.set_usr and mdsys.sdo_sem_http_ctx.set_pwd. The following example sets a user name and password for the SPARQL endpoint at http://www.example1.org/sparql:

BEGIN
  mdsys.sdo_sem_http_ctx.set_usr('http://www.example1.org/sparql','user');
  mdsys.sdo_sem_http_ctx.set_pwd('http://www.example1.org/sparql','pwrd');
END;
/

1.7.9 Inline Query Optimizer Hints

In SEM_MATCH, the SPARQL comment construct has been overloaded to allow inline HINT0 query optimizer hints. In SPARQL, the hash (#) character indicates that the remainder of the line is a comment. To associate an inline hint with a particular BGP, place a HINT0 hint string inside a SPARQL comment and insert the comment between the opening curly bracket ({) and the first triple pattern in the BGP. Inline hints enable you to influence the execution plan for each BGP in a query.

Inline optimizer hints override any hints passed to SEM_MATCH through the options argument. For example, a global ALL_ORDERED hint applies to each BGP that does not specify an inline optimizer hint, but those BGPs with an inline hint use the inline hint instead of the ALL_ORDERED hint.

Example 1-77 Inline Query Optimizer Hints (BGP_JOIN)

The following example shows a query with inline query optimizer hints.

SELECT x, y, hp, cp
  FROM TABLE(SEM_MATCH(
    '{ # HINT0={ LEADING(t0) USE_NL(?x ?y ?bd) }
      ?x :grandParentOf ?y . ?x rdf:type :Male . ?x :birthDate ?bd
      OPTIONAL { # HINT0={ LEADING(t0 t1) BGP_JOIN(USE_HASH) }
                 ?x :homepage ?hp . ?x :cellPhoneNum ?cp }
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

The BGP_JOIN hint influences inter-BGP joins and has the following syntax: BGP_JOIN(<join_type>), where <join_type> is USE_HASH or USE_NL. Example 1-77 uses the BGP_JOIN(USE_HASH) hint to specify that a hash join should be used when joining the OPTIONAL BGP with its parent BGP.

Inline optimizer hints override any hints passed to SEM_MATCH through the options argument. For example, a global ALL_ORDERED hint applies to each BGP that does not specify an inline optimizer hint, but those BGPs with an inline hint use the inline hint instead of the ALL_ORDERED hint.

Example 1-78 Inline Query Optimizer Hints (ANTI_JOIN)

The ANTI_JOIN hint influences the evaluation of NOT EXISTS and MINUS clauses. This hint has the syntax ANTI_JOIN(<join_type>), where <join_type> is HASH_AJ, NL_AJ, or MERGE_AJ. The following example uses a hint to indicate that a hash anti join should be used. Global ALL_AJ_HASH, ALL_AJ_NL, ALL_AJ_MERGE can be used in the options argument of SEM_MATCH to influence the join type of all NOT EXISTS and MINUS clauses in the entire query.

SELECT x, y
  FROM TABLE(SEM_MATCH(
    'SELECT ?x ?y
     WHERE {
       ?x :grandParentOf ?y . ?x rdf:type :Male . ?x :birthDate ?bd
       FILTER ( 
         NOT EXISTS {# HINT0={ ANTI_JOIN(HASH_AJ) }
                     ?x :homepage ?hp . ?x :cellPhoneNum ?cp })
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

Example 1-79 Inline Query Optimizer Hints (NON_NULL)

HINT0={ NON_NULL} is supported in SPARQL SELECT clauses to signify that a particular variable is always bound (that is, has a non-null value in each result row). This hint allows the query compiler to optimize joins for values produced by SELECT expressions. These optimizations cannot be applied by default because it cannot be guaranteed that expressions will produce non-null values for all possible input. If you know that a SELECT expression will not produce any null values for a particular query, using this NON_NULL hint can significantly increase performance. This hint should be specified in the comment in a line before the 'AS' keyword of a SELECT expression.

The following example shows the NON_NULL hint option used in a SEM_MATCH query, specifying that variable ?full_name is definitely bound.

SELECT s, t
  FROM TABLE(SEM_MATCH(
    'SELECT * WHERE {
       ?s :name ?full_name
       { SELECT (CONCAT(?fname, " ", ?lname) # HINT0={ NON_NULL }
                 AS ?full_name)
         WHERE { 
           ?t :fname ?fname .
           ?t :lname ?lname } 
         } 
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')),
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.10 Full-Text Search

The Oracle-specific orardf:textContains SPARQL FILTER function uses full-text indexes on the RDF_VALUE$ table. This function has the following syntax (where orardf is a built-in prefix that expands to <http://xmlns.oracle.com/rdf/>):

orardf:textContains(variable, pattern)

The first argument to orardf:textContains must be a local variable (that is, a variable present in the BGP that contains the orardf:textContains filter), and the second argument must be a constant plain literal.

For example, orardf:textContains(x, y) returns true if x matches the expression y, where y is a valid expression for the Oracle Text SQL operator CONTAINS. For more information about such expressions, see Oracle Text Reference.

Before using orardf:textContains, you must create an Oracle Text index for the RDF network. To create such an index, invoke the SEM_APIS.ADD_DATATYPE_INDEX procedure as follows:

EXECUTE SEM_APIS.ADD_DATATYPE_INDEX('http://xmlns.oracle.com/rdf/text', network_owner=>'RDFUSER', network_name=>'NET1');

Performance for wildcard searches like orardf:textContains(?x, "%abc%") can be improved by using prefix and substring indexes. You can include any of the following options to the SEM_APIS.ADD_DATATYPE_INDEX procedure:

  • PREFIX_INDEX=TRUE – for adding prefix index

  • PREFIX_MIN_LENGTH=<number> – minimum length for prefix index tokens

  • PREFIX_MAX_LENGTH=<number> – maximum length for prefix index tokens

  • SUBSTRING_INDEX=TRUE – for adding substring index

  • LOGGING=T – to enable logging for text index

For more information about Oracle Text indexing elements, see Oracle Text Reference.

When performing large bulk loads into a RDF network with a text index, the overall load time may be faster if you drop the text index, perform the bulk load, and then re-create the text index. See Using Data Type Indexes for more information about data type indexing.

After creating a text index, you can use the orardf:textContains FILTER function in SEM_MATCH queries. Example 1-80 uses orardf:textContains to find all grandfathers whose names start with the letter A or B.

Example 1-80 Full-Text Search

SELECT x, y, n
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/>
     SELECT * 
     WHERE { 
       ?x :grandParentOf ?y . ?x rdf:type :Male . ?x :name ?n 
       FILTER (orardf:textContains(?n, " A% | B% ")) }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null, null, null, ' ', null, null,
   'RDFUSER', 'NET1'));

Example 1-81 orardf:textScore

The ancillary operator orardf:textScore can be used in combination with orardf:textContains to rank results by the goodness of their text match. There are, however, limitations when using orardf:textScore. The orardf:textScore invocation must appear as a SELECT expression in the SELECT clause immediately surrounding the basic graph pattern that contains the corresponding orardf:textContains FILTER. The alias for this SELECT expression can then be used in other parts of the query. In addition, a REWRITE=F' query hint must be used in the options argument of SEM_MATCH.

The following example finds text matches with score greater than 0.5. Notice that an additional invocation id argument is required for orardf:textContains, so that it can be linked to the orardf:textScore invocation with the same invocation id. The invocation ID is an arbitrary integer constant used to match a primary operator with its ancillary operator.

SELECT x, y, n, scr
  FROM TABLE(SEM_MATCH(
    'PREFIX <http://www.example.org/family/>
     SELECT *
     WHERE {
       { SELECT ?x ?y ?n (orardf:textScore(123) AS ?scr)
         WHERE { 
           ?x :grandParentOf ?y . ?x rdf:type :Male . ?x :name ?n 
           FILTER (orardf:textContains(?n, " A% | B% ", 123)) }
       }
       FILTER (?scr > 0.5)
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null,           
    null,
    null,
    ' REWRITE=F ', 
    null, null,
    'RDFUSER', 'NET1'));

Example 1-82 orardf:like

For a lightweight text search, you can use the orardf:like function, which performs simple test for pattern matching using the Oracle SQL operator LIKE. The orardf:like function has the following syntax:

  • orardf:like(string, pattern)
  • orardf:like(string, pattern, flags)

The first argument of orardf:like can be any variable or RDF term, as opposed to orardf:Contains, which requires the first argument to be a local variable. When the first argument to orardf:like is a URI, the match is performed against the URI suffix only. The second argument must be a pattern expression, which can contain the following special pattern-matching characters:

  • The percent sign (%) can match zero or more characters.
  • The underscore (_) matches exactly one character.

The flags argument must be a constant string. The flag "i" is supported to allow a case-insensitive search.

The following example shows a percent sign (%) wildcard search to find all grandparents whose URIs start with Ja.

SELECT x, y, n
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/>
     SELECT * 
     WHERE { 
        ?x :grandParentOf ?y . ?y :name ?n 
        FILTER (orardf:like(?x, "Ja%")) }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

The following example shows an underscore (_) wildcard search to find all the grandchildren whose names start with J followed by two characters and end with k. The case-insensitive flag "i" is used to make the search case-insensitive.

SELECT x, y, n
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/>
     SELECT * 
     WHERE { 
        ?x :grandParentOf ?y . ?y :name ?n 
        FILTER (orardf:like(?n, "j__k", "i")) 
    }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    
    null, null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

For efficient execution of orardf:like, you can create an index using the SEM_APIS.ADD_DATATYPE_INDEX procedure with http://xmlns.oracle.com/rdf/like as the data type URI. This index can speed up queries when the first argument is a local variable and the leading character of the search pattern is not a wildcard. The underlying index is a simple function-based B-Tree index on a varchar function, which has lower maintenance and storage costs than a full Oracle Text index. The index for orardf:like is created as follows:

EXECUTE SEM_APIS.ADD_DATATYPE_INDEX('http://xmlns.oracle.com/rdf/like');

1.7.11 Spatial Support

RDF Graph supports storage and querying of spatial geometry data through the OGC GeoSPARQL standard and through Oracle-specific SPARQL extensions. Geometry data can be stored as orageo:WKTLiteral, ogc:wktLiteral, ogc:gmlLiteral, ogc:geoJSONLiteral, or ogc:kmlLiteral typed literals, and geometry data can be queried using several query functions for spatial operations. Spatial indexing for increased performance is also supported.

orageo is a built-in prefix that expands to <http://xmlns.oracle.com/rdf/geo/>, ogc is a built-in prefix that expands to <http://www.opengis.net/ont/geosparql#>, and ogcf is a built-in prefix that expands to <http://www.opengis.net/def/function/geosparql>.

1.7.11.1 OGC GeoSPARQL Support

RDF Graph supports the following conformance classes for the OGC GeoSPARQL standard (http://www.opengeospatial.org/standards/geosparql) using well-known text (WKT) serialization and the Simple Features relation family.

  • Core
  • Topology Vocabulary Extension (Simple Features)
  • Geometry Extension (WKT, 1.2.0)
  • Geometry Topology Extension (Simple Features, WKT, 1.2.0)
  • RDFS Entailment Extension (Simple Features, WKT, 1.2.0)

RDF Graph supports the following conformance classes for OGC GeoSPARQL using Geography Markup Language (GML) serialization and the Simple Features relation family.

  • Core
  • Topology Vocabulary Extension (Simple Features)
  • Geometry Extension (GML, 3.1.1)
  • Geometry Topology Extension (Simple Features, GML, 3.1.1)
  • RDFS Entailment Extension (Simple Features, GML, 3.1.1)

RDF Graph supports the following conformance classes for OGC GeoSPARQL using Geographic JavaScript Object Notation (GeoJSON) serialization and the Simple Features relation family.

  • Core
  • Topology Vocabulary Extension (Simple Features)
  • Geometry Extension (GeoJSON, 1.0)
  • Geometry Topology Extension (Simple Features, GeoJSON, 1.0)
  • RDFS Entailment Extension (Simple Features, GeoJSON, 1.0)

RDF Graph supports the following conformance classes for OGC GeoSPARQL using Keyhole Markup Language (KML) serialization and the Simple Features relation family.

  • Core
  • Topology Vocabulary Extension (Simple Features)
  • Geometry Extension (KML, 2.1)
  • Geometry Topology Extension (Simple Features, KML, 2.1)
  • RDFS Entailment Extension (Simple Features, KML, 2.1)

Specifics for representing and querying spatial data using GeoSPARQL are covered in sections that follow this one.

1.7.11.2 Representing Spatial Data in RDF

Spatial geometries can be represented in RDF as orageo:WKTLiteral, ogc:wktLiteral, ogc:gmlLiteral, ogc:geoJSONLiteral, or ogc:kmlLiteral typed literals. In this document, the term geometry literal is used to refer to an RDF literal that is any one of these five literal types.

Example 1-83 Spatial Point Geometry Represented as orageo:WKTLiteral

The following example shows the orageo:WKTLiteral encoding for a simple point geometry.

"Point(-83.4 34.3)"^^<http://xmlns.oracle.com/rdf/geo/WKTLiteral>

Example 1-84 Spatial Point Geometry Represented as ogc:wktLiteral

The following example shows the ogc:wktLiteral encoding for the same point as in the preceding example.

"Point(-83.4 34.3)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

Both orageo:WKTLiteral and ogc:wktLiteral encodings consist of an optional spatial reference system URI, followed by a Well-Known Text (WKT) string that encodes a geometry value. The spatial reference system URI and the WKT string should be separated by a whitespace character.

Supported spatial reference system URIs have the following form <http://www.opengis.net/def/crs/EPSG/0/{srid}>, where {srid} is a valid spatial reference system ID defined by the European Petroleum Survey Group (EPSG). For URIs that are not in the EPSG Geodetic Parameter Dataset, the spatial reference system URIs used have the form <http://xmlns.oracle.com/rdf/geo/srid/{srid}>, where {srid} is a valid spatial reference system ID from Oracle Spatial. If a geometry literal value does not include a spatial reference system URI, then the default spatial reference system, WGS84 Longitude-Latitude (URI <http://www.opengis.net/def/crs/OGC/1.3/CRS84>), is used. The same default spatial reference system is used when geometry literal values are encountered in a query string.

Example 1-85 Spatial Point Geometry Represented as ogc:gmlLiteral

The following example shows the ogc:gmlLiteral encoding for a point geometry.

"<gml:Point srsName=\"urn:ogc:def:crs:EPSG::8307\" xmlns:gml=\"http://www.opengis.net/gml\"><gml:posList srsDimension=\"2\">-83.4 34.3</gml:posList></gml:Point>"^^<http://www.opengis.net/ont/geosparql#gmlLiteral>

ogc:gmlLiteral encodings consist of a valid element from the GML schema that implements a subtype of GM_Object. In contrast to WKT literals, A GML encoding explicitly includes spatial reference system information, so a spatial reference system URI prefix is not needed.

Example 1-86 Spatial Polygon Geometry Represented as ogc:geoJSONLiteral

The following example shows a valid ogc:geoJSONLiteral encoding for a polygon geometry.

"{ \"type\": \"Polygon\", \"coordinates\": [ [ [-75, 44], [-75, 42], [-72, 42], 
[-72, 45], [-74, 45], [-75, 44] ] ] }"^^<http://www.opengis.net/ont/geosparql#geoJSONLiteral>

ogc:geoJSONLiteral encodings consist of a valid GeoJSON serialization of a geometry object. ogc:geoJSONLiterals are always interpreted using WGS84 geodetic longitude-latitude spatial reference system.

Example 1-87 Spatial Polygon Geometry Represented as ogc:kmlLiteral

The following example shows the ogc:kmlLiteral encoding for a polygon geometry.

"<Polygon><extrude>0</extrude><tessellate>0</tessellate><altitudeMode>relativeToGround</altitudeMode>
<outerBoundaryIs><LinearRing><coordinates>-73.0,44.0 -71.0,44.0 -71.0,47.0 -73.0,47.0 -73.0,44.0 </coordinates>
</LinearRing></outerBoundaryIs></Polygon>"^^<http://www.opengis.net/ont/geosparql#kmlLiteral>

ogc:kmlLiteral encodings consist of a valid KML geometry serialization. ogc:kmlLiterals are always interpreted using WGS84 geodetic longitude-latitude spatial reference system.

Several geometry types can be represented as geometry literal values, including point, linestring, polygon, polyhedral surface, triangle, TIN, multipoint, multi-linestring, multipolygon, and geometry collection. Up to 500,000 vertices per geometry are supported for two-dimensional geometries.

Example 1-88 Spatial Data Encoded Using ogc:wktLiteral Values

The following example shows some RDF spatial data (in N-triple format) encoded using ogc:wktLiteral values. In this example, the first two geometries (in lot1) use the default WGS84 coordinate system (SRID 4326), but the other two geometries (in lot2) specify SRID 4269.

# spatial data for lot1 using the default WGS84 Longitude-Latitude spatial reference system
<urn:lot1> <urn:hasExactGeometry> "Polygon((-83.6 34.1, -83.6 34.5, -83.2 34.5, -83.2 34.1, -83.6 34.1))"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<urn:lot1> <urn:hasPointGeometry> "Point(-83.4 34.3)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
# spatial data for lot2 using the NAD83 Longitude-Latitude spatial reference system
<urn:lot2> <urn:hasExactGeometry> "<http://www.opengis.net/def/crs/EPSG/0/4269> Polygon((-83.6  34.1, -83.6 34.3, -83.4 34.3, -83.4 34.1, -83.6 34.1))"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<urn:lot2> <urn:hasPointGeometry> "<http://www.opengis.net/def/crs/EPSG/0/4269> Point(-83.5 34.2)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .

For more information, see the chapter about coordinate systems (spatial reference systems) in Oracle Spatial Developer's Guide. See also the material about the WKT geometry representation in the Open Geospatial Consortium (OGC) Simple Features document, available at: http://www.opengeospatial.org/standards/sfa

1.7.11.3 Validating Geometries

Before manipulating spatial data, you should check that there are no invalid geometry literals stored in your RDF graph. The procedure SEM_APIS.VALIDATE_GEOMETRIES allows verifying geometries in an RDF graph. The geometries are validated using an input SRID and tolerance value. (SRID and tolerance are explained in Indexing Spatial Data.)

If there are invalid geometries, a table with name {graph_name}_IVG$, is created in the user schema, where {graph_name} is the name of the RDF graph specified. Such table contains, for each invalid geometry literal, the value_id of the geometry literal in the RDF_VALUE$ table, the error message explaining the reason the geometry is not valid and a corrected geometry literal if the geometry can be rectified. For more information about geometry validation, see the reference information for the Oracle Spatial subprograms SDO_GEOM.VALIDATE_GEOMETRY_WITH_CONTEXT and SDO_GEOM.VALIDATE_LAYER_WITH_CONTEXT.

Example 1-89 Validating Geometries in an RDF Graph

The following example validates an RDF graph m, using SRID=8307 and tolerance=0.1.

-- Validate
EXECUTE sem_apis.validate_geometries(RDF graph_name=>'m',SRID=>8307,tolerance=>0.1, network_owner=>'RDFUSER', network_name=>'NET1');-- Check for invalid geometries
SELECT original_vid, error_msg, corrected_wkt_literal FROM M_IVG$;

1.7.11.4 Indexing Spatial Data

Before you can use any of the SPARQL extension functions (introduced in Querying Spatial Data) to query spatial data, you must create a spatial index on the RDF network by calling the SEM_APIS.ADD_DATATYPE_INDEX procedure.

When you create the spatial index, you must specify the following information:

  • SRID - The ID for the spatial reference system in which to create the spatial index. Any valid spatial reference system ID from Oracle Spatial and Graph can be used as an SRID value.

    Note:

    If you plan to use geospatial RDF data in web-based mapping applications like Oracle Spatial Studio, it is recommended to pre-transform your data to WGS84 longitude-latitude (SRID 4326 or 8307) and also use SRID 4326 or 8307 for your spatial index. This will improve performance by avoiding repeated coordinate transformations to WGS84 longitude-latitude for display on a map.
  • TOLERANCE – The tolerance value for the spatial index. Tolerance is a positive number indicating how close together two points must be to be considered the same point. The units for this value are determined by the default units for the SRID used (for example, meters for WGS84 Long-Lat). Tolerance is explained in detail in Oracle Spatial Developer's Guide.

  • DIMENSIONS - A text string encoding dimension information for the spatial index. Each dimension is represented by a sequence of three comma-separated values: name, minimum value, and maximum value. Each dimension is enclosed in parentheses, and the set of dimensions is enclosed by an outer parenthesis.

Example 1-90 Adding a Spatial Data Type Index on RDF Data

Example 1-90 adds a spatial data type index on the RDF network, specifying the WGS84 Longitude-Latitude spatial reference system, a tolerance value of 0.1, and the recommended dimensions for the indexing of spatial data that uses this coordinate system. The TOLERANCE, SRID, and DIMENSIONS keywords are case sensitive, and creating a data type index for any supported geometry literal type ( <http://xmlns.oracle.com/rdf/geo/WKTLiteral>, <http://www.opengis.net/ont/geosparql#wktLiteral>, <http://www.opengis.net/ont/geosparql#gmlLiteral>, <http://www.opengis.net/ont/geosparql#geoJSONLiteral>, or <http://www.opengis.net/ont/geosparql#kmlLiteral>) will create an index for all the supported geometry literal types. For example, if you create an index for ogc:wktLiteral, any orageo:WKTLiteral, ogc:gmlLiteral, ogc:geoJSONLiteral, and ogc:kmlLiteral geometry literals will also be indexed.

EXECUTE sem_apis.add_datatype_index('http://www.opengis.net/ont/geosparql#wktLiteral',  
                    options=>'TOLERANCE=0.1 SRID=8307 DIMENSIONS=((LONGITUDE,-180,180) (LATITUDE,-90,90))', 
                    network_owner=>'RDFUSER', network_name=>'NET1');

No more than one spatial data type index is supported for an RDF network. Geometry literal values stored in the RDF network are automatically normalized to the spatial reference system used for the index, so a single spatial index can simultaneously support geometry literal values from different spatial reference systems. This coordinate transformation is done transparently for indexing and spatial computations. When geometry literal values are returned from a SEM_MATCH query, the original, untransformed geometry is returned.

For more information about spatial indexing, see the chapter about indexing and querying spatial data in Oracle Spatial Developer's Guide.

Example 1-91 Adding a Spatial Data Type Materialized Index on RDF Data

When you manipulate spatial data, conversions from geometry literals to geometry objects may be needed, but several conversions may lead to poor performance. To avoid this situation, all the stored geometry literals can be transformed into SDO_GEOMETRY objects and materialized at index creation time.

This can be achieved using the MATERIALIZE=T option when adding a spatial data type index. If the amount of geometry literals to be indexed is very large, using the option INS_AS_SEL=T may help to speed up the materialized index creation.

The following example shows the creation of a materialized spatial index.

EXECUTE sem_apis.add_datatype_index('http://www.opengis.net/ont/geosparql#wktLiteral',  
             options=>'TOLERANCE=0.1 SRID=8307 DIMENSIONS=((LONGITUDE,-180,180) (LATITUDE,-90,90)) MATERIALIZE=T ');

Example 1-92 Adding a 3D Spatial Data Type Index on RDF Data

Spatial indexes with three coordinates can be created in Oracle Spatial. To create a 3D index, you must specify SDO_INDX_DIMS=3 option in the options argument of the SEM_APIS.ADD_DATATYPE_INDEX procedure.

The following example shows creation and indexing of 3D data. Note that coordinates are specified in (X, Y, Z) order, and linear rings for outer polygon boundaries are given in counter-clockwise order.

Note: For information about support for geometry operations with 3D data, including any restrictions, see Three Dimensional Spatial Objects.

conn rdfuser/<password>;

create table geo3d_tab(tri sdo_rdf_triple_s);

exec sem_apis.create_sem_model('geo3d','geo3d_tab','tri');

-- 3D Polygon
insert into geo3d_tab(tri) values(sdo_rdf_triple_s('geo3d','<http://example.org/ApplicationSchema#A>', 
                                  '<http://example.org/ApplicationSchema#hasExactGeometry>', 
                                  '<http://example.org/ApplicationSchema#AExactGeom>'));
insert into geo3d_tab(tri) values(sdo_rdf_triple_s('geo3d','<http://example.org/ApplicationSchema#AExactGeom>', 
                                  '<http://www.opengis.net/ont/geosparql#asWKT>', 
                                  '"<http://xmlns.oracle.com/rdf/geo/srid/31468> Polygon ((4467504.578 5333958.396 513.9, 
                                                                  4467508.939 5333956.379 513.9, 
                                                                  4467509.736 5333958.101 513.9, 
                                                                  4467505.374 5333960.118 513.9, 
                                                                  4467504.578 5333958.396 513.9))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>'));

-- 3D Point at same elevation as Polygon
insert into geo3d_tab(tri) values(sdo_rdf_triple_s('geo3d','<http://example.org/ApplicationSchema#B>', 
                                                   '<http://example.org/ApplicationSchema#hasExactGeometry>', 
                                                   '<http://example.org/ApplicationSchema#BExactGeom>'));
insert into geo3d_tab(tri) values(sdo_rdf_triple_s('geo3d','<http://example.org/ApplicationSchema#BExactGeom>', 
                                                   '<http://www.opengis.net/ont/geosparql#asWKT>', 
                                                   '"<http://xmlns.oracle.com/rdf/geo/srid/31468> Point (4467505.000 5333959.000 513.9)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>'));

-- 3D Point at different elevation from Polygon
insert into geo3d_tab(tri) values(sdo_rdf_triple_s('geo3d','<http://example.org/ApplicationSchema#C>', 
                                                   '<http://example.org/ApplicationSchema#hasExactGeometry>', 
                                                   '<http://example.org/ApplicationSchema#CExactGeom>'));
insert into geo3d_tab(tri) values(sdo_rdf_triple_s('geo3d','<http://example.org/ApplicationSchema#CExactGeom>', 
                                                   '<http://www.opengis.net/ont/geosparql#asWKT>', 
                                                   '"<http://xmlns.oracle.com/rdf/geo/srid/31468> Point (4467505.000 5333959.000 13.9)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>'));
commit;

-- Create 3D index
conn system/manager;
exec sem_apis.add_datatype_index('http://www.opengis.net/ont/geosparql#wktLiteral' ,
                                  options=>'TOLERANCE=0.1 SRID=3148 
                                  DIMENSIONS=((x,4386596.4101,4613610.5843) (y,5237914.5325,6104496.9694) (z,0,10000)) 
                                  SDO_INDX_DIMS=3 ');

conn rdfuser/rdfuser;
-- Find geometries within 200 M of my:A
-- Returns only one point because of 3D index
SELECT aGeom, f, fGeom, aWKT, fWKT
FROM TABLE(SEM_MATCH(
 '{ my:A my:hasExactGeometry ?aGeom .
     ?aGeom ogc:asWKT ?aWKT .
     ?f my:hasExactGeometry ?fGeom .
     ?fGeom ogc:asWKT ?fWKT .
     FILTER (orageo:withinDistance(?aWKT, ?fWKT,200,"M") &&
             !sameTerm(?aGeom,?fGeom))
   }',
 SEM_Models('geo3d'),
 null,
 SEM_ALIASES(
  SEM_ALIAS('my','http://example.org/ApplicationSchema#')),
 null));

1.7.11.5 Querying Spatial Data

Several SPARQL extension functions are available for performing spatial queries in SEM_MATCH. For example, for spatial RDF data, you can find the area and perimeter (length) of a geometry, the distance between two geometries, and the centroid and the minimum bounding rectangle (MBR) of a geometry, and you can check various topological relationships between geometries.

SEM_MATCH Support for Spatial Queries contains reference and usage information about the available functions, including:

  • GeoSPARQL functions

  • Oracle-specific functions

1.7.11.6 Using Long Literals with GeoSPARQL Queries

Geometry literals can become very long, which make the use of CLOBs necessary to represent them when using a SQL interface. CLOB constants cannot be used directly in a SEM_MATCH query. However, a user-defined SPARQL function can be used to bind CLOB constants into SEM_MATCH queries. Note that long geometry literals can be used directly in SPARQL query strings when using Java or REST interfaces for SPARQL execution.

The following example uses a user-defined SPARQL function in combination with a temporary table to allow CLOB geometries in a SEM_MATCH query.

Example 1-93 Binding a CLOB Constant into a SPARQL Query

conn rdfuser/<password>;

-- Create temporary table
create global temporary table local_value$(
 VALUE_TYPE 	 VARCHAR2(10),
 VALUE_NAME     VARCHAR2(4000),
 LITERAL_TYPE	 VARCHAR2(1000),
 LANGUAGE_TYPE	 VARCHAR2(80),
 LONG_VALUE 	 CLOB)
on commit preserve rows;

-- Create user-defined function to transform a CLOB into an RDF term
CREATE OR REPLACE FUNCTION myGetClobTerm
RETURN SDO_RDF_TERM
AS
  term SDO_RDF_TERM;
BEGIN
  select sdo_rdf_term(
      value_type,
      value_name,
      literal_type,
      language_type,
      long_value)
  into term
  from local_value$
  where rownum < 2;

  RETURN term;
END;
/

-- Insert a row with CLOB geometry
insert into local_value$(value_type,value_name,literal_type,language_type,long_value)
values ('LIT','','http://www.opengis.net/ont/geosparql#wktLiteral','','Some_CLOB_WKT');

-- Use the CLOB constant in a SEM_MATCH query
SELECT cdist
FROM table(sem_match(
'{ ?cdist ogc:asWKT ?cgeom 
   FILTER (
     orageo:withinDistance(?cgeom, oraextf:myGetClobTerm(), 200, "M")) }'
,sem_models('gov_all_vm')
,null, null, null, null, ' ALLOW_DUP=T ', null, null
,'RDFUSER', 'NET1'));

1.7.12 Flashback Query Support

You can perform SEM_MATCH queries that return past data using Flashback Query. A TIMESTAMP or a System Change Number (SCN) value is passed to SEM_MATCH through the AS_OF hint. The AS_OF hint can have one of the following forms:

  • AS_OF[TIMESTAMP,<TIMESTAMP_VALUE>], where <TIMESTAMP_VALUE> is a valid timestamp string with format 'YYYY/MM/DD HH24:MI:SS.FF'.

  • AS_OF[SCN,<SCN_VALUE>], where <SCN_VALUE> is a valid SCN.

The AS_OF hint is internally transformed to perform a Flashback Query (SELECT AS OF) against the queried table or view containing triples of the specified RDF graph. This allows you to query the graph as it existed in a prior time. For this feature to work, the invoker needs a flashback privilege on the queried metadata table or view (RDFM_rdf-graph-name view for native RDF graphs, SEMU_rdf-collection--name and SEMV_rdf-collection-name for RDF graph collections, and underlying relational tables for RDF view graphs). For example: grant flashback on RDFUSER.NET1#RDFM_FAMILY to scott

Restrictions on Using Flashback Query with RDF Data

Adding or removing a partition from a partitioned table disables Flashback Query for previous versions of the partitioned table. As a consequence, creating or dropping a native RDF graph or creating or dropping an inferred graph will disable Flashback Query for previous versions of all native RDF graphs in an RDF network. Therefore, be sure to control such operations when using Flashback Query in an RDF network.

Example 1-94 Flashback Query Using TIMESTAMP

The following example shows the use of the AS_OF clause defining a TIMESTAMP.

SELECT x, name
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/>
     SELECT * 
     WHERE { ?x :name ?name }',
    SEM_Models('family'),
    null, null,
    null,null,' AS_OF=[TIMESTAMP,2016/05/02 13:06:03.979546]', 
    null, null,
    'RDFUSER', 'NET1'));

Example 1-95 Flashback Query Using SCN

The following example shows the use of the AS_OF clause specifying an SCN.

SELECT x, name
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/>
     SELECT * 
     WHERE { ?x :name ?name }',
    SEM_Models('family'),
    null, null,
    null,null,' AS_OF=[SCN,1429849]', 
    null, null,
    'RDFUSER', 'NET1'));

1.7.13 Best Practices for Query Performance

This section describes some recommended practices for using the SEM_MATCH table function to query RDF data. It includes the following subsections:

1.7.13.1 FILTER Constructs Involving xsd:dateTime, xsd:date, and xsd:time

By default, SEM_MATCH complies with the XML Schema standard for comparison of xsd:date, xsd:time, and xsd:dateTime values. According to this standard, when comparing two calendar values c1 and c2 where c1 has an explicitly specified time zone and c2 does not have a specified time zone, c2 is converted into the interval [c2-14:00, c2+14:00]. If c2-14:00 <= c1 <= c2+14:00, then the comparison is undefined and will always evaluate to false. If c1 is outside this interval, then the comparison is defined.

However, the extra logic required to evaluate such comparisons (value with a time zone and value without a time zone) can significantly slow down queries with FILTER constructs that involve calendar values. For improved query performance, you can disable this extra logic by specifying FAST_DATE_FILTER=T in the options parameter of the SEM_MATCH table function. When FAST_DATE_FILTER=T is specified, all calendar values without time zones are assumed to be in Greenwich Mean Time (GMT).

Note that using FAST_DATE_FILTER=T does not affect query correctness when either (1) all calendar values in the data set have a time zone or (2) all calendar values in the data set do not have a time zone.

1.7.13.2 Indexes for FILTER Constructs Involving Typed Literals

The evaluation of SEM_MATCH queries involving the FILTER construct often uses order columns on the RDF_VALUE$ table. For example, the filter (?x < "1929-11-16Z"^^xsd:date) uses the ORDER_DATE column.

Indexes can be used to improve the performance of queries that contain a filter condition involving a typed literal. For example, an xsd:date index may speed up evaluation of the filter (?x < "1929-11-16Z"^^xsd:date).

Convenient interfaces are provided for creating, altering, and dropping these indexes for order columns. For more information, see Using Data Type Indexes.

Note, however, that the existence of these indexes on the RDF_VALUE$ table can significantly slow down bulk load operations. In many cases it may be faster to drop the indexes, perform the bulk load, and then re-create the indexes, as opposed to doing the bulk load with the indexes in place.

1.7.13.3 FILTER Constructs Involving Relational Expressions

The following recommendations apply to FILTER constructs involving relational expressions:

  • The orardf:sameCanonTerm extension function is the most efficient way to compare two RDF terms for equality because it allows an id-based comparison in all cases.

  • When using standard SPARQL features, the sameTerm built-in function is more efficient than using = or != when comparing two variables in a FILTER clause, so (for example) use sameTerm(?a, ?b) instead of (?a = ?b) and use (!sameTerm(?a, ?b)) instead of (?a != ?b) whenever possible.

  • When comparing values in FILTER expressions, you may get better performance by reducing the use of negation. For example, it is more efficient to evaluate (?x <= "10"^^xsd:int) than it is to evaluate the expression (!(?x > "10"^^xsd:int)).

1.7.13.4 Optimizer Statistics and Dynamic Sampling

Having sufficient statistics for the query optimizer is critical for good query performance. In general, you should ensure that you have gathered basic statistics for the RDF network using the SEM_PERF.GATHER_STATS procedure (described in SEM_PERF Package Subprograms).

Due to the inherent flexibility of the RDF graph, static information may not produce optimal execution plans for SEM_MATCH queries. Dynamic sampling can often produce much better query execution plans. Dynamic sampling levels can be set at the session or system level using the optimizer_dynamic_sampling parameter, and at the individual query level using the dynamic_sampling(level) SQL query hint. In general, it is good to experiment with dynamic sampling levels between 3 and 6. For information about estimating statistics with dynamic sampling, see Oracle Database SQL Tuning Guide.

Example 1-96 uses a SQL hint for a dynamic sampling level of 6.

Example 1-96 SQL Hint for Dynamic Sampling

SELECT /*+ DYNAMIC_SAMPLING(6) */ x, y
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/>
     SELECT * 
     WHERE {
      ?x :grandParentOf ?y . 
      ?x rdf:type :Male . 
      ?x :birthDate ?bd }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null, null, null, '', null, null,
    'RDFUSER', 'NET1'));

1.7.13.5 Multi-Partition Queries

The following recommendations apply to the use of multiple RDF graphs, RDF graphs plus inferred graphs, and RDF graph collections:

  • If you execute SEM_MATCH queries against multiple RDF graphs or against RDF graphs plus inferred graphs, you can probably improve query performance if you create a RDF graph collection (see RDF Graph Collections) that contains all the RDF graphs and inferred graphs you are querying and then query this single RDF graph collection.

  • Use the ALLOW_DUP=T query option. If you do not use this option, then an expensive (in terms of processing) duplicate-elimination step is required during query processing, in order to maintain set semantics for RDF data. However, if you use this option, the duplicate-elimination step is not performed, and this results in significant performance gains.

1.7.13.6 Compression on Systems with OLTP Index Compression

On systems where OLTP index compression is supported (such as Exadata). you can take advantage of the feature to improve the compression ratio for some of the B-tree indexes used by the RDF network.

For example, a DBA or the owner of a schema-private network can use the following command to change the compression scheme on the RDF_VAL_NAMETYLITLNG_IDX index from prefix compression to OLTP index compression:

SQL> alter index rdfuser.net1#RDF_VAL_NAMETYLITLNG_IDX rebuild compress for oltp high;

1.7.13.7 Unbounded Property Path Expressions

A depth-limited search should be used for + and * property path operators whenever possible. The depth-limited implementation for * and + is likely to significantly outperform the CONNECT BY-based implementation in large and/or highly connected graphs. A depth limit of 10 is used by default. For a given graph, depth limits larger than the graph's diameter are not useful. See Property Paths for more information on setting depth limits.

A backward chaining style inference using rdfs:subClassOf+ for ontologies with very deep class hierarchies may be an exception to this rule. In such cases, unbounded CONNECT BY-based evaluations may perform better than depth-limited evaluations with very high depth limits (for example, 50).

1.7.13.8 Nested Loop Pushdown for Property Paths

If an unbounded CONNECT BY evaluation is performed for a property path, and if the subject of the property path triple pattern is a variable, a CONNECT BY WITHOUT FILTERING operation will most likely be used. If this subject variable is only bound to a small number of values during query execution, a nested loop strategy (see Nested Loop Pushdown with Overloaded Service) could be a good option to run the query. In this case, the property path can be pushed down into an overloaded SERVICE clause and the OVERLOADED_NL=T hint can be used.

For example, consider the following query where there is an unbounded property path search { ?s :hasManager+ ?x }, but the triple { ?s :ename "ADAMS" } only has a small number of possible values for ?s.

select s, x
from table(sem_match(
'PREFIX : <http://scott-hr.org#>
 SELECT *
 WHERE {
   ?s :ename "ADAMS" .
   ?s :hasManager+ ?x .
 }',
sem_models('scott_hr_data'),
null,null,null,null,'  ALL_MAX_PP_DEPTH(0) ', null, null,
'RDFUSER', 'NET1'));

The query can be transformed to force the nested-loop strategy. Notice that the RDF graph specified in the SERVICE graph is the same as the RDF graph specified in the SEM_MATCH call.

select s, x
from table(sem_match(
'PREFIX : <http://scott-hr.org#>
 SELECT *
 WHERE {
   ?s :ename "ADAMS" .
   service oram:scott_hr_data { ?s :hasManager+ ?x . }
 }',
sem_models('scott_hr_data'),
null,null,null,null,'  ALL_MAX_PP_DEPTH(0) OVERLOADED_NL=T ', null, null,
'RDFUSER', 'NET1'));

With this nested-loop strategy, { ?s :hasManager_ ?x } is evaluated once for each value of ?s, and in each evaluation, a constant value is substituted for ?s. This constant in the subject position allows a CONNECT BY WITH FILTERING operation, which usually provides a substantial performance improvement.

1.7.13.9 Grouping and Aggregation

MIN, MAX and GROUP_CONCAT aggregates require special logic to fully capture SPARQL semantics for input of non-uniform type (for example, MAX(?x)). For certain cases where a uniform input type can be determined at compile time (for example, MAX(STR(?x)) – plain literal input), optimizations for built-in SQL aggregates can be used. Such optimizations generally give an order of magnitude increase in performance. The following cases are optimized:

  • MIN/MAX(<plain literal>)

  • MIN/MAX(<numeric>)

  • MIN/MAX(<dateTime>)

  • GROUP_CONCAT(<plain literal>)

Example 1-97 uses MIN/MAX(<numeric>) optimizations.

Example 1-97 Aggregate Optimizations

SELECT dept, minSal, maxSal
  FROM TABLE(SEM_MATCH(
    'SELECT ?dept (MIN(xsd:decimal(?sal)) AS ?minSal) (MAX(xsd:decimal(?sal)) AS ?maxSal)
     WHERE
       {?x :salary ?y . 
        ?x :department ?dept }
     GROUP BY ?dept',
    SEM_Models('hr_data'),
    null, null, null, null, '', null, null,
    'RDFUSER', 'NET1'));

1.7.13.10 Use of Bind Variables to Reduce Compilation Time

For some queries, query compilation can be more expensive than query execution, which can limit throughput on workloads of small queries. If the queries in your workload differ only in the constants used, then session context-based bind variables can be used to skip the compilation step for SEM_MATCH queries. See also Using Bind Variables with SEM_APIS.SPARQL_TO_SQL for a description of how to use JDBC bind variables and PL/SQL bind variables with SPARQL queries.

The following example shows how to use a session context in combination with a user-defined SPARQL function to compile a SEM_MATCH query once and then run it with different constants. The basic idea is to create a user-defined function that reads an RDF term value from the session context and returns it. A SEM_MATCH query with this function will read the RDF term value at run time; so when the session context variable changes, the same exact SEM_MATCH query will see a different value.

conn / as sysdba;
grant create any context to testuser;

conn testuser/testuser;

create or replace package MY_CTXT_PKG as  
  procedure set_attribute(name varchar2, value varchar2);  
  function get_attribute(name varchar2) return varchar2;  
end MY_CTXT_PKG;  
/

create or replace package body MY_CTXT_PKG as  
  procedure set_attribute(  
    name varchar2,  
    value varchar2  
  ) as  
  begin  
    dbms_session.set_context(namespace => 'MY_CTXT',  
                             attribute => name,  
                             value     => value );
  end;  

  function get_attribute(  
    name varchar2  
  ) return varchar2 as  
  begin  
    return sys_context('MY_CTXT', name);  
  end;  
end MY_CTXT_PKG;  
/

create or replace function myCtxFunc(  
  params in SDO_RDF_TERM_LIST  
) return SDO_RDF_TERM  
as  
  name varchar2(4000);  
  arg  SDO_RDF_TERM;  
begin  
  arg := params(1);  
  name := arg.value_name;  
  return SDO_RDF_TERM(my_ctxt_pkg.get_attribute(name));  
end;  
/  

CREATE OR REPLACE CONTEXT MY_CTXT using TESTUSER.MY_CTXT_PKG;

-- Set a value
exec MY_CTXT_PKG.set_attribute('value','<http://www.example.org/family/Martha>');

-- Query using the function
-- Note the use of HINT0={ NON_NULL } to allow the most efficient join
SELECT s, p, o
  FROM TABLE(SEM_MATCH(
    'SELECT ?s ?p ?o
     WHERE {
       BIND (oraextf:myCtxFunc("value") # HINT0={ NON_NULL } 
             AS ?s)
       ?s ?p ?o }',
    SEM_Models('family'),
    null,
    null,
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

-- Set another value
exec MY_CTXT_PKG.set_attribute('value','<http://www.example.org/family/Sammy>');

-- Now the same query runs for Sammy without recompiling
SELECT s, p, o
  FROM TABLE(SEM_MATCH(
    'SELECT ?s ?p ?o
     WHERE {
       BIND (oraextf:myCtxFunc("value") # HINT0={ NON_NULL } 
             AS ?s)
       ?s ?p ?o }',
    SEM_Models('family'),
    null,
    null,
    null, null, ' ', null, null,
    'RDFUSER', 'NET1'));

1.7.13.11 Non-Null Expression Hints

When performing a join of several graph patterns with common variables that can be unbound, a more complex join condition is needed to handle null values to avoid performance degradation. Unbound values can be introduced through SELECT expressions, binds, OPTIONAL clauses, and unions. In many cases, SELECT expressions are not expected to produce NULL values. In such cases, query performance can be substantially improved through use of an inline HINT0={ NON_NULL } hint to mark a specific SELECT expression as definitely non-null or through use of a DISABLE_NULL_EXPR_JOIN query option to signify that all SELECT expressions produce only non-null values.

The following example includes the global DISABLE_NULL_EXPR_JOIN hint to signify that variable ?fulltitle is always bound on both sides of the join. (See also Inline Query Optimizer Hints.)

SELECT s, t
  FROM TABLE(SEM_MATCH(
    'PREFIX : <http://www.example.org/family/>
     SELECT * WHERE {
       { SELECT ?s (CONCAT(?title, ". ", ?fullname) AS ?fulltitle)
         WHERE { ?s :fullname ?fullname .
                 ?s :title ?title }
       }
       { SELECT ?t (CONCAT(?title, ". ", ?fname, " ", ?lname) AS ?fulltitle)
         WHERE { 
         ?t :fname ?fname .
         ?t :lname ?lname . 
         ?t :title ?title } 
       } 
     }',
    SEM_Models('family'),
    SEM_Rulebases('RDFS','family_rb'), 
    null,
    null,
    null,
    ' DISABLE_NULL_EXPR_JOIN ', null, null,
    'RDFUSER', 'NET1'));

1.7.13.12 Automatic JOIN Hints

SEM_MATCH queries that are very unselective usually execute faster if the SQL engine uses HASH joins to evaluate joins between triple patterns. The SPARQL-to-SQL query translator used by SEM_MATCH will attempt to auto detect such queries and automatically add appropriate USE_HASH hints if the string AUTO_HINTS=T appears in the options argument string.

The following SEM_MATCH query uses AUTO_HINTS=T to automatically generate USE_HASH hints.

SELECT f, l, n, e
  FROM table(sem_match(
        'PREFIX : <http://www.example.com#> 
         SELECT ?f ?l ?n ?e
          WHERE { ?s :fname ?f . ?s :lname ?l . ?s :nickName ?n . ?s :email ?e }',
                sem_models('m1'),
                null,null,null,null,
                ' AUTO_HINTS=T ')
          );

1.7.13.13 RDF Network Indexes

RDF Network Indexes (described in Using RDF Network Indexes) are nonunique B-tree indexes on the RDF_LINK$ table. Network owners and DBAs can manage these indexes with various SEM_APIS procedures. Columns to index in RDF_LINK$ are identified by an index code, which is a sequence of the following letters (without repetition): P, C, S, G, M, H. These letters used in the index_code correspond to the following columns in RDF_LINK$: P_VALUE_ID (predicate), CANON_END_NODE_ID (object), START_NODE_ID (subject), G_ID (graph), MODEL_ID, and H - a function-based index on (MODEL_ID, GID).

It is important to have the proper set of RDF Network Indexes for your query workload. In versions 19c and earlier, the default index setup is PCSGM, PSCGM. In versions 21c and later the default index setup is PCSGM, SPCGM, CM, H.

The following are a few general recommendations for RDF Network Indexes:

  • Most SPARQL queries have triple patterns with bound predicates, so it is a good idea to have P, PC, and PS combinations covered as leading columns in your overall index set. Such a combination is captured by the default index setup (PCSGM, PSCGM in 19c, and PCSGM, SPCGM in 21c).
  • If you have queries with unbound predicates (for example, { ?s :ssn 1234 . ?s ?p ?o }), then a network index with a leading column other than P may be needed. An SPCGM index would be more suitable for this example because of the join on subject variable ?s.
  • If you are running DESCRIBE queries or DESCRIBE-style patterns such as { { <urn:abc> ?p1 ?o1 } UNION { ?s2 ?p2 <urn:abc> } }, then a network index with a leading C column (for example, CM) in addition to an index with a leading S column may be needed.
  • If you have named graph queries with selective FROM, FROM NAMED, or GRAPH clauses, then a network index with a leading G column may be needed (for example, GPCSM).
  • An H index is needed for efficient SPARQL Update GRAPH operations (for example, DROP GRAPH) on schema-private networks.
  • A PSCGM index is usually smaller than an SPCGM index due to better prefix compression, so if your workload does not include queries with unbound predicates, replacing an SPCGM index with a PSCGM index may give better performance.

1.7.13.14 Using RDF with Oracle Database In-Memory

RDF data stored in the RDF_LINK$ and RDF_VALUE$ tables can be loaded into memory using Oracle Database In-Memory. See RDF Support for Oracle Database In-Memory for details on how to load RDF data into memory using SEM_APIS procedures.

In general, for the best and most consistent performance with Oracle Database In-Memory, it is recommended to make indexes on the RDF_LINK$ (RDF network indexes) and RDF_VALUE$ tables invisible, with the exception of <NETWORK_NAME>#C_PK_VID and <NETWORK_NAME>#RDF_VAL_NAMETYLITLNG_IDX indexes on RDF_VALUE$. These index settings can be achieved with the following SQL commands (assuming a RDF network named NET1 owned by RDFUSER).

exec sem_apis.alter_rdf_indexes('VISIBILITY','N', network_owner=>'RDFUSER', network_name=>'NET1');

alter index NET1#C_PK_VID visible;

alter index NET1#RDF_VAL_NAMETYLITLNG_IDX visible;

Note that the performance of very selective queries may suffer with RDF_LINK$ indexes invisible, so you may need to experiment with index visibility depending on your query workload.

In addition to these index settings, it is recommended to use parallel query execution with Oracle Database In-Memory, as the speedup from parallelization can be significant in many cases.

For larger datasets (100 M triples or more), it is also recommended to use a hash-subpartitioned RDF network with Oracle Database In-Memory. Hash subpartitioning is described in RDF Networks.

1.7.13.15 Using Language Tags in FILTER Expressions

When filtering query results based on language tags, it is more efficient to use LANG instead of LANGMATCHES whenever possible. For example, the simple filter langMatches(lang(?x), "en") could be replaced with lang(?x) = "en" for a more efficient evaluation. Language tags in stored RDF literals are canonicalized to lower case, so a lower case language tag constant should be used in such filters.

1.7.13.16 Type Casting for More Efficient FILTER Evaluation

SPARQL FILTERs that compare two variables using operators other than equality, for example ?x < ?y, can have poor performance in some cases because of weak typing in SPARQL. Because datatypes for ?x and ?y cannot be determined at query compilation time, complex logic for comparisons of multiple datatypes must be used at run time.

If you know the datatypes of the values to which ?x and ?y will be bound, then it is best to cast ?x and ?y to those datatypes in your FILTER expression, so that the types will be known at query compilation time. For example, the following query casts salary values to xsd:decimal in the FILTER clause for a more efficient single-datatype comparison.

SELECT ?y
WHERE { 
  :emp1 :salary ?s1 . 
  ?y :salary ?s2 .
  FILTER (xsd:decimal(?s2) < xsd:decimal(?s1))
}

1.7.13.17 Spatial Indexing for GeoSPARQL Queries

Options used during spatial index creation can have significant effects on the performance of GeoSPARQL queries.

The two most important options are:

  • Type of index: function-based or materialized
  • Spatial reference system: SRID used for the index

SEM_APIS.ADD_DATATYPE_INDEX creates a function-based spatial index by default. A function-based index is adequate for simple point geometries, but you should use a materialized spatial index if your dataset contains polygon or line geometries. You can create a materialized spatial index by specifying MATERIALIZE=T in the options argument of SEM_APIS.ADD_DATATYPE_INDEX.

The SRID used for a spatial index is also important for performance. Oracle's GeoSPARQL implementation is very flexible in that it allows you to load geometry literals that have been encoded in different spatial reference systems. These geometries must be canonicalized to a single SRID for indexing and query evaluation. You can specify this canonical SRID at index creation time. For best performance, you must choose the SRID that is most common among your geometry literals to minimize required coordinate transformations.

See Indexing Spatial Data for more information on spatial index creation.

1.7.14 Special Considerations When Using SEM_MATCH

The following considerations apply to SPARQL queries executed using SEM_MATCH:

  • Value assignment

    • A compile-time error is raised when undefined variables are referenced in the source of a value assignment.

  • Grouping and aggregation

    • Non-grouping variables (query variables not used for grouping and therefore not valid for projection) cannot be reused as a target for value assignment.

    • Non-numeric values are ignored by the AVG and SUM aggregates.

    • By default, SEM_MATCH returns no rows for an aggregate query with a graph pattern that fails to match. The W3C specification requires a single, null row for this case. W3C-compliant behavior can be obtained with the STRICT_AGG_CARD=T query option for a small performance penalty.

  • ORDER BY

    • When using SPARQL ORDER BY in SEM_MATCH, the containing SQL query should be ordered by SEM$ROWNUM to ensure that the desired ordering is maintained through any enclosing SQL blocks.

  • Numeric computations

    • The native Oracle NUMBER type is used internally for all arithmetic operations, and the results of all arithmetic operations are serialized as xsd:decimal. Note that the native Oracle NUMBER type is more precise than both BINARY_FLOAT and BINARY_DOUBLE. See Oracle Database SQL Language Reference for more information on the NUMBER built-in data type.

    • Division by zero causes a runtime error instead of producing an unbound value.

  • Negation

    • EXISTS and NOT EXISTS filters that reference potentially unbound variables are not supported in the following contexts:

      • Non-aliased expressions in GROUP BY

      • Input to aggregates

      • Expressions in ORDER BY

      • FILTER expressions within OPTIONAL graph patterns that also reference variables that do not appear inside of the OPTIONAL graph pattern

      The first three cases can be realized by first assigning the result of the EXISTS or NOT EXISTS filter to a variable using a BIND clause or SELECT expression.

      These restrictions do not apply to EXISTS and NOT EXISTS filters that only reference definitely bound variables.

  • Blank nodes

    • Blank nodes are not supported within graph patterns.

    • The BNODE(literal) function returns the same blank node value every time it is called with the same literal argument.

  • Property paths

    • Unbounded operators + and * use a 10-hop depth limit by default for performance reasons. This behavior can be changed to a truly unbounded search by setting a depth limit of 0. See Property Paths for details.

  • Long literals (CLOBs)

    • SPARQL functions and aggregates do not support long literals by default.

    • Specifying the CLOB_EXP_SUPPORT=T query option enables long literal support for the following SPARQL functions: IF, COALESCE, STRLANG, STRDT, SUBSTR, STRBEFORE, STRAFTER, CONTAINS, STRLEN, STRSTARTS, STRENDS.

    • Specifying the CLOB_AGG_SUPPORT=T query option enables long literal support for the following aggregates: MIN, MAX, SAMPLE, GROUP_CONCAT.

  • Canonicalization of RDF literals

    • By default, RDF literals returned from SPARQL functions and constant RDF literals used in value assignment statements (BIND, SELECT expressions, GROUP BY expressions) are canonicalized. This behavior is consistent with the SPARQL 1.1 D-Entailment Regime.

    • Canonicalization can be disabled with the PROJ_EXACT_VALUES=T query option.