5.5 Searching for Documents Using SPARQL Query Patterns
Documents that are semantically indexed (that is, indexed using the mdsys.SemContext index type) can be searched using the SEM_CONTAINS operator within a standard SQL query.
In the query, the SEM_CONTAINS operator must have at least two parameters, the first specifying the column in which the documents are stored and the second specifying the document search criteria expressed as a SPARQL query pattern, as in the following example:
SELECT docId FROM Newsfeed
WHERE SEM_CONTAINS (article,
   '{ ?org rdf:type <http://www.example.com/classes/Organization> .
      ?org <http://example.com/pred/hasCategory>
           <http://www.example.com/category/BusinessFinance> }'
) = 1;
The SPARQL query pattern specified with the SEM_CONTAINS operator is matched against the individual graphs corresponding to each document, and a document is considered to match a search criterion if the triples from the corresponding graph satisfy the query pattern. In the preceding example, the SPARQL query pattern identifies the individual graphs (and thus the documents) that refer to an Organization that belongs to the BusinessFinance category. The SQL query returns the rows corresponding to the matching documents in its result set. The preceding example assumes that the URIs used in the query are generated by the underlying extractor, and that you (the user searching for documents) are aware of the properties and terms that are generated by the extractor in use.
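The per-document matching described above can be pictured with a small conceptual model. The following Python sketch is not how SEM_CONTAINS is implemented; it only assumes that each document's extracted triples can be viewed as a set, and that a document matches when every triple pattern is satisfied within that one set under consistent variable bindings. The function names, document ids, and `ex:` terms are hypothetical.

```python
# Conceptual model only (not the database implementation): each document has
# its own graph, and the pattern must be satisfied inside a single graph.

def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")

def match_pattern(patterns, graph, bindings=None):
    """Return True if all triple patterns match `graph` under one consistent binding."""
    bindings = bindings or {}
    if not patterns:
        return True
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(bindings)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok and match_pattern(rest, graph, new):
            return True
    return False

def matching_docs(doc_graphs, patterns):
    """Return the ids of documents whose own graph satisfies the pattern."""
    return sorted(d for d, g in doc_graphs.items() if match_pattern(patterns, g))

# Hypothetical triples, standing in for what an extractor might produce.
doc_graphs = {
    1: {("ex:Acme", "rdf:type", "ex:Organization"),
        ("ex:Acme", "ex:hasCategory", "ex:BusinessFinance")},
    2: {("ex:Zoo", "rdf:type", "ex:Organization"),
        ("ex:Zoo", "ex:hasCategory", "ex:Travel")},
    3: {("ex:Bob", "rdf:type", "ex:Person"),
        ("ex:Bob", "ex:hasCategory", "ex:BusinessFinance")},
}
pattern = [("?org", "rdf:type", "ex:Organization"),
           ("?org", "ex:hasCategory", "ex:BusinessFinance")]

print(matching_docs(doc_graphs, pattern))  # [1]
```

Only document 1 matches, because both patterns must bind `?org` to the same resource within the same document graph.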
When you create an index using a dependent extractor policy that includes one or
more user-defined RDF graphs, the triples asserted in the user RDF graphs are considered to be
common to all the documents. Document searches involving such policies test the search
criteria against the triples in individual graphs corresponding to the documents, combined
with the triples in the user RDF graphs. For example, the following query identifies all
articles referring to organizations in the state of New Hampshire, using the geographical
ontology (the geo_ontology RDF graph from a preceding example) that maps cities to states:
SELECT docId FROM Newsfeed
WHERE SEM_CONTAINS (article,
   '{ ?org  rdf:type          class:Organization .
      ?org  pred:hasLocation  ?city .
      ?city geo:hasState      state:NewHampshire }',
   'SEM_EXTR_PLUS_GEOONT',
   sem_aliases(
      sem_alias('class', 'http://www.myorg.com/classes/'),
      sem_alias('pred',  'http://www.myorg.com/pred/'),
      sem_alias('geo',   'http://geoont.org/rel/'),
      sem_alias('state', 'http://geoont.org/state/'))) = 1;
The preceding query, with a reference to the extractor policy SEM_EXTR_PLUS_GEOONT (created in an example in Extractor Policies), combines the triples extracted from the indexed documents and the triples in the user RDF graph to find matching documents. In this example, the name of the extractor policy is optional if the corresponding index is created with just this policy or if this is the default extractor policy for the index. When the query pattern uses qualified names, an optional parameter to the SEM_CONTAINS operator (the sem_aliases list in this example) can specify the namespaces to be used for expanding the qualified names.
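The two ideas in this paragraph, treating the user RDF graph as common to every document and expanding qualified names through aliases, can be sketched in a few lines of Python. This is a conceptual model under stated assumptions, not the operator's implementation: the namespaces for `class`, `pred`, `geo`, and `state` are taken from the query above, while the `rdf` entry, the city and organization namespaces, the document ids, and all function names are hypothetical.

```python
# Conceptual model only: each document is tested against
# (its own triples) UNION (triples of the shared user graph).

ALIASES = {
    "rdf":   "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # assumed built-in
    "class": "http://www.myorg.com/classes/",
    "pred":  "http://www.myorg.com/pred/",
    "geo":   "http://geoont.org/rel/",
    "state": "http://geoont.org/state/",
}
CITY = "http://geoont.org/city/"   # hypothetical namespace
ORG = "http://www.myorg.com/org/"  # hypothetical namespace

def expand(term, aliases):
    """Expand a qualified name such as 'geo:hasState' to a full URI."""
    if term.startswith("?") or ":" not in term:
        return term
    prefix, local = term.split(":", 1)
    return aliases[prefix] + local if prefix in aliases else term

def solve(patterns, graph, bindings):
    """Backtracking match of triple patterns with consistent bindings."""
    if not patterns:
        return True
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(bindings)
        ok = True
        for q, t in zip(first, triple):
            if q.startswith("?"):
                if new.setdefault(q, t) != t:
                    ok = False
                    break
            elif q != t:
                ok = False
                break
        if ok and solve(rest, graph, new):
            return True
    return False

def matching_docs(doc_graphs, user_graph, patterns, aliases):
    """Return ids of documents matching the pattern, given the shared user graph."""
    full = [tuple(expand(t, aliases) for t in p) for p in patterns]
    return sorted(d for d, g in doc_graphs.items() if solve(full, g | user_graph, {}))

# Hypothetical ontology triples mapping cities to states.
user_graph = {
    (CITY + "Nashua", ALIASES["geo"] + "hasState", ALIASES["state"] + "NewHampshire"),
    (CITY + "Boston", ALIASES["geo"] + "hasState", ALIASES["state"] + "Massachusetts"),
}
# Hypothetical extracted triples for two documents.
doc_graphs = {
    10: {(ORG + "Acme", ALIASES["rdf"] + "type", ALIASES["class"] + "Organization"),
         (ORG + "Acme", ALIASES["pred"] + "hasLocation", CITY + "Nashua")},
    11: {(ORG + "Bolt", ALIASES["rdf"] + "type", ALIASES["class"] + "Organization"),
         (ORG + "Bolt", ALIASES["pred"] + "hasLocation", CITY + "Boston")},
}
pattern = [("?org", "rdf:type", "class:Organization"),
           ("?org", "pred:hasLocation", "?city"),
           ("?city", "geo:hasState", "state:NewHampshire")]

print(matching_docs(doc_graphs, user_graph, pattern, ALIASES))  # [10]
print(matching_docs(doc_graphs, set(), pattern, ALIASES))       # []
```

Without the shared ontology triples, neither document can satisfy the `geo:hasState` pattern, which is why the second call returns no matches.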
SPARQL-based document searches can make use of the SPARQL syntax that is supported through SEM_MATCH queries.