User-Defined Inferencing

7.1 User-Defined Inferencing

The RDF Graph inference extension architecture enables you to add user-defined inferencing to the presupplied inferencing support.

Parent topic: User-Defined Inferencing and Querying

7.1.1 Problem Solved and Benefit Provided by User-Defined Inferencing

Before Oracle Database 12c Release 1 (12.1), the database inference engine provided native support for OWL 2 RL,RDFS, SKOS, SNOMED (core EL), and user-defined rules, which covered a wide range of applications and requirements. However, there was the limitation that no new RDF resources could be created as part of the rules deduction process.

As an example of the capabilities and the limitation before Oracle Database 12c Release 1 (12.1), consider the following straightforward inference rule:

?C   rdfs:subClassOf  ?D .
?x    rdf:type  ?C  .  ==>  ?x   rdf:type  ?D

The preceding rule says that any instance x of a subclass C will be an instance of C's superclass, D. The consequent part of the rule mentions two variables ?x and ?D. However, these variables must already exist in the antecedents of the rule, which further implies that these RDF resources must already exist in the knowledge base. In other words, for example, you can derive that John is a Student only if you know that John exists as a GraduateStudent and if an axiom specifies that the GraduateStudent class is a subclass of the Student class.

Another example of a limitation is that before Oracle Database 12c Release 1 (12.1), the inference functions did not support combining a person's first name and last name to produce a full name as a new RDF resource in the inference process. Specifically, this requirement can be captured as a rule like the following:

?x   :firstName  ?fn
?x   :lastName   ?ln  ==>  ?x  :fullName  concatenate(?fn ?ln)

Effective with Oracle Database 12c Release 1 (12.1), the RDF Graph inference extension architecture opens the inference process so that users can implement their own inference extension functions and integrate them into the native inference process. This architecture:

Supports rules that require the generation of new RDF resources.

Examples might include concatenation of strings or other string operations, mathematical calculations, and web service callouts.
Allows implementation of certain existing rules using customized optimizations.

Although the native OWL inference engine has optimizations for many rules and these rules work efficiently for a variety of large-scale ontologies, for some new untested ontologies a customized optimization of a particular inference component may work even better. In such a case, you can disable a particular inference component in the SEM_APIS.CREATE_INFERRED_GRAPH call and specify a customized inference extension function (using the inf_ext_user_func_name parameter) that implements the new optimization.
Allows the inference engine to be extended with sophisticated inference capabilities.

Examples might include integrating geospatial reasoning, time interval reasoning, and text analytical functions into the native database inference process.

Parent topic: User-Defined Inferencing

7.1.2 API Support for User-Defined Inferencing

The primary application programming interface (API) for user-defined inferencing is the SEM_APIS.CREATE_INFERRED_GRAPH procedure, specifically the last parameter:

inf_ext_user_func_name  IN VARCHAR2 DEFAULT NULL

The inf_ext_user_func_name parameter, if specified, identifies one or more user-defined inference functions that implement the specialized logic that you want to use.

User-Defined Inference Function Requirements

Parent topic: User-Defined Inferencing

7.1.2.1 User-Defined Inference Function Requirements

Each user-defined inference function that is specified in the inf_ext_user_func_name parameter in the call to the SEM_APIS.CREATE_INFERRED_GRAPH procedure must:

Have a name that starts with the following string: SEM_INF_
Be created with definer's rights, not invoker's rights. (For an explanation of definer's rights and invoker's rights, see Oracle AI Database Security Guide.)

The format of the user-defined inference function must be that shown in the following example for a hypothetical function named SEM_INF_EXAMPLE:

create or replace function sem_inf_example(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  pragma autonomous_transaction;
begin
  if (action = SDO_SEM_INFERENCE.INF_EXT_ACTION_START) then
    <... preparation work ...>  
  end if;
  if (action = SDO_SEM_INFERENCE.INF_EXT_ACTION_RUN) then
    <... actual inference logic ...>
    commit;
  end if;
  if (action = SDO_SEM_INFERENCE.INF_EXT_ACTION_END) then
    <... clean up ...> 
  end if;
return true;  -- succeed
end;
/
grant execute on sem_inf_example to <network_owner>;

In the user-defined function format, the optimization_flag output parameter can specify one or more Oracle-defined names that are associated with numeric values. You can specify one or more of the following:

SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NONE indicates that the inference engine should not enable any optimizations for the extension function. (This is the default behavior of the inference engine when the optimization_flag parameter is not set.)
SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_ALL_IDS indicates that all triples/quads inferred by the extension function use only resource IDs. In other words, the output_tab table only contains resource IDs (columns gid, sid, pid, and oid) and does not contain any lexical values (columns g, s, p, and o are all null). Enabling this optimization flag allows the inference engine to skip resource ID lookups.
SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NEWDATA_ONLY indicates that all the triples/quads inferred by the extension function are new and do not already exist in src_tab_view. Enabling this optimization flag allows the inference engine to skip checking for duplicates between the output_tab table and src_tab_view. Note that the src_tab_view contains triples/quads from previous rounds of reasoning, including triples/quads inferred from extension functions.
SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_UNIQDATA_ONLY indicates that all the triples/quads inferred by the extension function are unique and do not already exist in the output_tab table. Enabling this optimization flag allows the inference engine to skip checking for duplicates within the output_tab table (for example, no need to check for the same triple inferred twice by an extension function). Note that the output_tab table is empty at the beginning of each round of reasoning for an extension function, so uniqueness of the data must only hold for the current round of reasoning.
SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_IGNORE_NULL indicates that the inference engine should ignore an inferred triple or quad if the subject, predicate, or object resource is null. The inference engine considers a resource null if both of its columns in the output_tab table are null (for example, subject is null if the s and sid columns are both null). Enabling this optimization flag allows the inference engine to skip invalid triples/quads in the output_tab table. Note that the inference engine interprets null graph columns (g and gid) as the default graph.

To specify more than one value for the optimization_flag output parameter, use the plus sign (+) to concatenate the values. For example:

optimization_flag := SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_ALL_IDS +
                     SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NEWDATA_ONLY +
                     SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_UNIQDATA_ONLY;

For more information about using the optimization_flag output parameter, see Example 3: Optimizing Performance.

Parent topic: API Support for User-Defined Inferencing

7.1.3 User-Defined Inference Extension Function Examples

The following examples demonstrate how to use user-defined inference extension functions to create inferred graphs.

Example 1: Adding Static Triples, Example 2: Adding Dynamic Triples, and Example 3: Optimizing Performance cover the basics of user-defined inference extensions.

Example 1: Adding Static Triples and Example 2: Adding Dynamic Triples focus on adding new, inferred triples.

Example 3: Optimizing Performance focuses on optimizing performance.
Example 4: Temporal Reasoning (Several Related Examples) and Example 5: Spatial Reasoning demonstrate how to handle special data types efficiently by leveraging native Oracle types and operators.

Example 4: Temporal Reasoning (Several Related Examples) focuses on the xsd:dateTime data type.

Example 5: Spatial Reasoning focuses on geospatial data types.
Example 6: Calling a Web Service makes a web service call to the Oracle Geocoder service.

The first three examples assume that the RDF graph EMPLOYEES exists and contains the following RDF data, displayed in Turtle format:

:John   :firstName  "John"  ;
        :lastName   "Smith" .
 
:Mary   :firstName  "Mary"  ;
        :lastName   "Smith" ;
        :name       "Mary Smith" .
 
:Alice  :firstName  "Alice" .
 
:Bob    :firstName  "Bob" ;
        :lastName   "Billow" .

For requirements and guidelines for creating user-defined inference extension functions, see API Support for User-Defined Inferencing.

Parent topic: User-Defined Inferencing

7.1.3.1 Example 1: Adding Static Triples

The most basic method to infer new data in a user-defined inference extension function is adding static data. Static data does not depend on any existing data in an RDF graph. This is not a common case for a user-defined inference extension function, but it demonstrates the basics of adding triples to an inferred graph. Inserting static data is more commonly done during the preparation phase (that is, action='START') to expand on the existing ontology.

The following user-defined inference extension function (sem_inf_static) adds three static triples to an inferred graph:

-- this user-defined rule adds static triples
create or replace function sem_inf_static(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  query varchar2(4000);
  pragma autonomous_transaction;
begin
  if (action = 'RUN') then
    -- generic query we use to insert triples
    query := 
      'insert /*+ parallel append */ into ' || output_tab || 
      ' ( s,  p,  o) VALUES ' ||
      ' (:1, :2, :3) ';
 
    -- execute the query with different values
    execute immediate query using 
      '<http://example.org/S1>', '<http://example.org/P2>', '"O1"';
 
    execute immediate query using
      '<http://example.org/S2>', '<http://example.org/P2>', '"2"^^xsd:int';
 
    -- duplicate quad
    execute immediate query using
      '<http://example.org/S2>', '<http://example.org/P2>', '"2"^^xsd:int';
 
    execute immediate query using
      '<http://example.org/S3>', '<http://example.org/P3>', '"3.0"^^xsd:double';
 
    -- commit our changes
    commit;
  end if;
 
  -- return true to indicate success
  return true;
end sem_inf_static;
/
show errors;

The sem_inf_static function inserts new data by executing a SQL insert query, with output_tab as the target table for insertion. The output_tab table will only contain triples added by the sem_inf_static function during the current call (see the num_calls parameter). The inference engine will always call a user-defined inference extension function at least three times, once for each possible value of the action parameter ('START', 'RUN', and 'END'). Because sem_inf_static does not need to perform any preparation or cleanup, the function only adds data during the RUN phase. The extension function can be called more than once during the RUN phase, depending on the data inferred during the current round of reasoning.

Although the sem_inf_static function makes no checks for existing triples (to prevent duplicate triples), the inference engine will not generate duplicate triples in the resulting inferred graph. The inference engine will filter out duplicates from the output_tab table (the data inserted by the extension function) and from the final inferred graph (the RDF graph or RDF graphs and other inferred data). Setting the appropriate optimization flags (using the optimization_flag parameter) will disable this convenience feature and improve performance. (See Example 3: Optimizing Performance for more information about optimization flags.)

Although the table definition for output_tab shows a column for graph names, the inference engine will ignore and override all graph names on triples added by extension functions when performing Global Inference (default behavior of SEM_APIS.CREATE_INFERRED_GRAPH) and Named Graph Global Inference (NGGI). To add triples to specific named graphs in a user-defined extension function, use NGLI (Named Graph Local Inference). During NGLI, all triples must belong to a named graph (that is, the gid and g columns of output_tab cannot both be null).

The network owner must have execute privileges on the sem_inf_static function to use the function for reasoning. The following example shows how to grant the appropriate privileges on the sem_inf_static function and create an inferred graph using the function (along with OWLPRIME inference logic):

-- grant appropriate privileges
grant execute on sem_inf_static to RDFUSER;
 
-- create the inferred graph
begin
  sem_apis.create_inferred_graph( 
    'EMPLOYEES_INF'
  , sem_models('EMPLOYEES')
  , sem_rulebases('OWLPRIME')
  , passes => SEM_APIS.REACH_CLOSURE
  , inf_ext_user_func_name => 'sem_inf_static' 
  , network_owner=>'RDFUSER'
  , network_name=>'NET1' 
);
end;
/

The following example displays the newly entailed data:

-- formatting
column s format a23;
column p format a23;
column o format a23;
set linesize 100;
 
-- show results
select s, p, o from table(SEM_MATCH(
    'select ?s ?p ?o where { ?s ?p ?o } order by ?s ?p ?o'
  , sem_models('EMPLOYEES')
  , sem_rulebases('OWLPRIME')
  , null, null, null
  , 'INF_ONLY=T'
  , network_owner=>'RDFUSER'
  , network_name=>'NET1'));

The preceding query returns the three unique static triples added by sem_inf_static, with no duplicates:

S                      P                      O
---------------------- ---------------------- -----------------------
http://example.org/S1  http://example.org/P2  O1
http://example.org/S2  http://example.org/P2  2
http://example.org/S3  http://example.org/P3  3E0

Parent topic: User-Defined Inference Extension Function Examples

7.1.3.2 Example 2: Adding Dynamic Triples

Adding static data is useful, but it is usually done during the preparation (that is, action='START') phase. Adding dynamic data involves looking at existing data in the RDF graph and generating new data based on the existing data. This is the most common case for a user-defined inference extension function.

The following user-defined inference extension function (sem_inf_dynamic) concatenates the first and last names of employees to create a new triple that represents the full name.

-- this user-defined rule adds static triples
create or replace function sem_inf_dynamic(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  firstNamePropertyId number;
  lastNamePropertyId  number;
  fullNamePropertyId  number;
 
  sqlStmt    varchar2(4000);
  insertStmt varchar2(4000);
  pragma autonomous_transaction;
begin
  if (action = 'RUN') then
    -- retrieve ID of resource that already exists in the data (will
    -- throw exception if resource does not exist). These will improve
    -- performance of our SQL queries. 
    firstNamePropertyId := sdo_sem_inference.oracle_orardf_res2vid('http://example.org/firstName');
    lastNamePropertyId  := sdo_sem_inference.oracle_orardf_res2vid('http://example.org/lastName');
    fullNamePropertyId  := sdo_sem_inference.oracle_orardf_res2vid('http://example.org/name');
 
    -- SQL query to find all employees and their first and last names
    sqlStmt :=
      'select ids1.sid employeeId,
              values1.value_name firstName,
              values2.value_name lastName
       from   ' || resource_id_map_view || ' values1,
              ' || resource_id_map_view || ' values2,
              ' || src_tab_view || '         ids1,
              ' || src_tab_view || '         ids2
       where  ids1.sid = ids2.sid 
         AND  ids1.pid = ' || to_char(firstNamePropertyId,'TM9') || ' 
         AND  ids2.pid = ' || to_char(lastNamePropertyId,'TM9')  || ' 
         AND  ids1.oid = values1.value_id 
         AND  ids2.oid = values2.value_id 
       /* below ensures we have NEWDATA (a no duplicate optimization flag) */
         AND  not exists
               (select 1 
                from   ' || src_tab_view || ' 
                where  sid = ids1.sid AND 
                       pid = ' || to_char(fullNamePropertyId,'TM9') || ')';
 
    -- create the insert statement that concatenates the first and
    -- last names from our sqlStmt into a new triple.
    insertStmt :=
      'insert /*+ parallel append */ 
       into ' || output_tab || ' (sid, pid, o) 
       select employeeId, ' || to_char(fullNamePropertyId,'TM9') || ', ''"'' || firstName || '' '' || lastName ||  ''"''
       from   (' || sqlStmt || ')';
 
    -- execute the insert statement
    execute immediate insertStmt;
 
    -- commit our changes
    commit;
 
    -- set our optimization flags indicating we already checked for
    -- duplicates in the RDF graph (src_tab_view)
    optimization_flag := SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NEWDATA_ONLY;
  end if;
 
  -- return true to indicate success
  return true;
end sem_inf_dynamic;
/
show errors;

The sem_inf_dynamic function inserts new data using two main steps. First, the function builds a SQL query that collects all first and last names from the existing data. The sqlStmt variable stores this SQL query. Next, the function inserts new triples based on the first and last names it collects, to form a full name for each employee. The insertStmt variable stores this SQL query. Note that the insertStmt query includes the sqlStmt query because it is performing an INSERT with a subquery.

The sqlStmt query performs a join across two main views: the resource view (resource_id_map_view) and the existing data view (src_tab_view). The existing data view contains all existing triples but stores the values of those triples using numeric IDs instead of lexical values. Because the sqlStmt query must extract the lexical values of the first and last names of an employee, it joins with the resource view twice (once for the first name and once for the last name).

The sqlStmt query contains the PARALLEL SQL hint to help improve performance. Parallel execution on a balanced hardware configuration can significantly improve performance. (See Example 3: Optimizing Performance for more information.)

The insertStmt query also performs a duplicate check to avoid adding a triple if it already exists in the existing data view (src_tab_view). The function indicates it has performed this check by enabling the INF_EXT_OPT_FLAG_NEWDATA_ONLY optimization flag. Doing the check inside the extension function improves overall performance of the reasoning. Note that the existing data view does not contain the new triples currently being added by the sem_inf_dynamic function, so duplicates may still exist within the output_tab table. If the sem_inf_dynamic function additionally checked for duplicates within the output_tab table, then it could also enable the INF_EXT_OPT_FLAG_UNIQUEDATA_ONLY optimization flag.

Both SQL queries use numeric IDs of RDF resources to perform their joins and inserts. Using IDs instead of lexical values improves the performance of the queries. The sem_inf_dynamic function takes advantage of this performance benefit by looking up the IDs of the lexical values it plans to use. In this case, the function looks up three URIs representing the first name, last name, and full name properties. If the sem_inf_dynamic function inserted all new triples purely as IDs, then it could enable the INF_EXT_OPT_FLAG_ALL_IDS optimization flag. For this example, however, the new triples each contain a single, new, lexical value: the full name of the employee.

To create an inferred graph with the sem_inf_dynamic function, grant execution privileges to the network owner, then pass the function name to the SEM_APIS.CREATE_INFERRED_GRAPH procedure, as follows:

-- grant appropriate privileges
grant execute on sem_inf_dynamic to RDFUSER;
 
-- create the inferred graph
begin
  sem_apis.create_inferred_graph( 
    'EMPLOYEES_INF'
  , sem_models('EMPLOYEES')
  , sem_rulebases('OWLPRIME')
  , passes => SEM_APIS.REACH_CLOSURE
  , inf_ext_user_func_name => 'sem_inf_dynamic' 
  , network_owner=>'RDFUSER'
  , network_name=>'NET1'
);
end;
/

The inferred graph should contain the following two new triples added by sem_inf_dynamic:

S                        P                        O
------------------------ ------------------------ -----------------------
http://example.org/Bob   http://example.org/name  Bob Billow
http://example.org/John  http://example.org/name  John Smith

Note that the sem_inf_dynamic function in the preceding example did not infer a full name for Mary Smith, because Mary Smith already had their full name specified in the existing data.

Parent topic: User-Defined Inference Extension Function Examples

7.1.3.3 Example 3: Optimizing Performance

Several techniques can improve the performance of an inference extension function. One such technique is to use the numeric IDs of resources rather than their lexical values in queries. By only using resource IDs, the extension function avoids having to join with the resource view (resource_id_map_view), and this can greatly improve query performance. Inference extension functions can obtain additional performance benefits by also using resource IDs when adding new triples to the output_tab table (that is, using only using the gid, sid, pid, and oid columns of the output_tab table).

The following user-defined inference extension function (sem_inf_related) infers a new property, :possibleRelative, for employees who share the same last name. The SQL queries for finding such employees use only resource IDs (no lexical values, no joins with the resource view). Additionally, the inference extension function in this example inserts the new triples using only resource IDs, allowing the function to enable the INF_EXT_OPT_FLAG_ALL_IDS optimization flag.

-- this user-defined rule adds static triples
create or replace function sem_inf_related(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  lastNamePropertyId  number;
  relatedPropertyId   number;
 
  sqlStmt    varchar2(4000);
  insertStmt varchar2(4000);
  pragma autonomous_transaction;
begin
  if (action = 'RUN') then
    -- retrieve ID of resource that already exists in the data (will
    -- throw exception if resource does not exist).
    lastNamePropertyId := sdo_sem_inference.oracle_orardf_res2vid('http://example.org/lastName');
 
    -- retreive ID of resource or generate a new ID if resource does
    -- not already exist
    relatedPropertyId := sdo_sem_inference.oracle_orardf_add_res('http://example.org/possibleRelative');
 
    -- SQL query to find all employees that share a last name
    sqlStmt :=
      'select ids1.sid employeeId,
              ids2.sid relativeId
       from   ' || src_tab_view || '         ids1,
              ' || src_tab_view || '         ids2
       where  ids1.pid = ' || to_char(lastNamePropertyId,'TM9') || ' 
         AND  ids2.pid = ' || to_char(lastNamePropertyId,'TM9') || ' 
         AND  ids1.oid  = ids2.oid 
       /* avoid employees related to themselves */
         AND  ids1.sid != ids2.sid 
       /* below ensures we have NEWDATA (a no duplicate optimization flag) */
         AND  not exists
               (select 1 
                from   ' || src_tab_view || ' 
                where  sid = ids1.sid 
                  AND  pid = ' || to_char(relatedPropertyId,'TM9') || ' 
                  AND  oid = ids2.sid) 
       /* below ensures we have UNIQDATA (a no duplicate optimization flag) */
         AND  not exists
               (select 1 
                from   ' || output_tab || ' 
                where  sid = ids1.sid 
                  AND  pid = ' || to_char(relatedPropertyId,'TM9') || ' 
                  AND  oid = ids2.sid)';
 
    -- create the insert statement that only uses resource IDs
    insertStmt :=
      'insert /*+ parallel append */ 
       into ' || output_tab || ' (sid, pid, oid) 
       select employeeId, ' || to_char(relatedPropertyId,'TM9') || ', relativeId
       from   (' || sqlStmt || ')';
 
    -- execute the insert statement
    execute immediate insertStmt;
 
    -- commit our changes
    commit;
 
    -- set flag indicating our new triples
    --   1) are specified using only IDs
    --   2) produce no duplicates with the RDF graph (src_tab_view)
    --   3) produce no duplicates in the output (output_tab)
    optimization_flag := SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_ALL_IDS +
                         SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NEWDATA_ONLY +
                         SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_UNIQDATA_ONLY;
  end if;
 
  -- return true to indicate success
  return true;
end sem_inf_related;
/
show errors;

The sem_inf_related function has a few key differences from previous examples. First, the sem_inf_related function queries purely with resource IDs and inserts new triples using only resource IDs. Because all the added triples in the output_tab table only use resource IDs, the function can enable the INF_EXT_OPT_FLAG_ALL_IDS optimization flag. For optimal performance, functions should try to use resource IDs over lexical values. However, sometimes this is not possible, as in Example 2: Adding Dynamic Triples, which concatenates lexical values to form a new lexical value. Note that in cases like Example 2: Adding Dynamic Triples, it is usually better to join with the resource view (resource_id_map_view) than to embed calls to oracle_orardf_res2vid within the SQL query. This is due to the overhead of calling the function for each possible match as opposed to joining with another table.

Another key difference in the sem_inf_related function is the use of the oracle_orardf_add_res function (compared to oracle_orardf_res2vid). Unlike the res2vid function, the add_res function will add a resource to the resource view (resource_id_map_view) if the resource does not already exist. Inference extensions functions should use the add_res function if adding the resource to the resource view is not a concern. Calling the function multiple times will not generate duplicate entries in the resource view.

The last main difference is the additional NOT EXISTS clause in the SQL query. The first NOT EXISTS clause avoids adding any triples that may be duplicates of triples already in the RDF graph or triples inferred by other rules (src_tab_view). Checking for these duplicates allows sem_inf_related to enable the INF_EXT_OPT_FLAG_NEWDATA_ONLY optimization flag. The second NOT EXISTS clause avoids adding triples that may be duplicates of triples already added by the sem_inf_related function to the output_tab table during the current round of reasoning (see the num_calls parameter). Checking for these duplicates allows sem_inf_related to enable the INF_EXT_OPT_FLAG_UNIQDATA_ONLY optimization flag.

Like the sem_inf_dynamic example, sem_inf_related example uses a PARALLEL SQL query hint in its insert statement. Parallel execution on a balanced hardware configuration can significantly improve performance. For a data-intensive application, a good I/O subsystem is usually a critical component to the performance of the whole system.

-- grant appropriate privileges
grant execute on sem_inf_related to RDFUSER;

-- create the inferred graph
begin
  sem_apis.create_inferred_graph( 
    'EMPLOYEES_INF'
  , sem_models('EMPLOYEES')
  , sem_rulebases('OWLPRIME')
  , passes => SEM_APIS.REACH_CLOSURE
  , inf_ext_user_func_name => 'sem_inf_related'
  , network_owner=>'RDFUSER'
  , network_name=>'NET1'
);
end;
/

The inferred graph should contain the following two new triples added by sem_inf_related:

S                        P                                    O
------------------------ ------------------------------------ ------------------------
http://example.org/John  http://example.org/possibleRelative  http://example.org/Mary
http://example.org/Mary  http://example.org/possibleRelative  http://example.org/John

Parent topic: User-Defined Inference Extension Function Examples

7.1.3.4 Example 4: Temporal Reasoning (Several Related Examples)

User-defined extension functions enable you to better leverage certain data types (like xsd:dateTime) in the triples. For example, with user-defined extension functions, it is possible to infer relationships between triples based on the difference between two xsd:dateTime values. The three examples in this section explore two different temporal reasoning rules and how to combine them into one inferred graph. The examples assume the models EVENT and EVENT_ONT exist and contain the following RDF data:

EVENT_ONT

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org/event/> .
 
# we model two types of events
:Meeting      rdfs:subClassOf :Event .
:Presentation rdfs:subClassOf :Event .
 
# events have topics
:topic        rdfs:domain     :Event .
 
# events have start and end times 
:startTime    rdfs:domain     :Event ;
              rdfs:range      xsd:dateTime .
:endTime      rdfs:domain     :Event ;
              rdfs:range      xsd:dateTime .
 
# duration (in minutes) of an event
:lengthInMins rdfs:domain      :Event ;
              rdfs:range       xsd:integer .
 
# overlaps property identifies conflicting events
:overlaps     rdfs:domain      :Event ;
              rdf:type         owl:SymmetricProperty .
:noOverlap    rdfs:domain      :Event ;
              rdf:type         owl:SymmetricProperty .

EVENT_TBOX

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <http://example.org/event/> .
 
:m1 rdf:type   :Meeting ;
    :topic     "Beta1 launch" ;
    :startTime "2012-04-01T09:30:00-05:00"^^xsd:dateTime ;
    :endTime   "2012-04-01T11:00:00-05:00"^^xsd:dateTime .
 
:m2 rdf:type   :Meeting ;
    :topic     "Standards compliance" ;
    :startTime "2012-04-01T12:30:00-05:00"^^xsd:dateTime ;
    :endTime   "2012-04-01T13:30:00-05:00"^^xsd:dateTime .
 
:p1 rdf:type   :Presentation ;
    :topic     "OWL Reasoners" ;
    :startTime "2012-04-01T11:00:00-05:00"^^xsd:dateTime ;
    :endTime   "2012-04-01T13:00:00-05:00"^^xsd:dateTime .

The examples are as follow.

Parent topic: User-Defined Inference Extension Function Examples

7.1.3.4.1 Example 4a: Duration Rule

The following user-defined inference extension function (sem_inf_durations) infers the duration in minutes of events, given the start and end times of an event. For example, an event starting at 9:30 AM and ending at 11:00 AM has duration of 90 minutes. The following extension function extracts the start and end times for each event, converts the xsd:dateTime values into Oracle timestamps, then computes the difference between the timestamps. Notice that this extension function can handle time zones.

create or replace function sem_inf_durations(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  eventClassId        number;
  rdfTypePropertyId   number;
  startTimePropertyId number;
  endTimePropertyId   number;
  durationPropertyId  number;
 
  xsdTimeFormat       varchar2(100);
  sqlStmt             varchar2(4000);
  insertStmt          varchar2(4000);
 
  pragma autonomous_transaction;
begin
  if (action = 'RUN') then
    -- retrieve ID of resource that already exists in the data (will
    -- throw exception if resource does not exist).
    eventClassId        := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/Event', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    startTimePropertyId := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/startTime', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    endTimePropertyId   := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/endTime', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    durationPropertyId  := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/lengthInMins', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    rdfTypePropertyId   := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
 
    -- set the TIMESTAMP format we will use to parse XSD times
    xsdTimeFormat := 'YYYY-MM-DD"T"HH24:MI:SSTZH:TZM';
 
    -- query we use to extract the event ID and start/end times.  
    sqlStmt := 
      'select ids1.sid eventId,
              TO_TIMESTAMP_TZ(values1.value_name,''YYYY-MM-DD"T"HH24:MI:SSTZH:TZM'') startTime,
              TO_TIMESTAMP_TZ(values2.value_name,''YYYY-MM-DD"T"HH24:MI:SSTZH:TZM'') endTime
       from   ' || resource_id_map_view || ' values1,
              ' || resource_id_map_view || ' values2,
              ' || src_tab_view || '         ids1,
              ' || src_tab_view || '         ids2,
              ' || src_tab_view || '         ids3
       where  ids1.sid = ids3.sid 
         AND  ids3.pid = ' || to_char(rdfTypePropertyId,'TM9') || ' 
         AND  ids3.oid = ' || to_char(eventClassId,'TM9')      || ' 
         AND  ids1.sid = ids2.sid 
         AND  ids1.pid = ' || to_char(startTimePropertyId,'TM9') || ' 
         AND  ids2.pid = ' || to_char(endTimePropertyId,'TM9')   || ' 
         AND  ids1.oid = values1.value_id 
         AND  ids2.oid = values2.value_id 
       /* ensures we have NEWDATA */
         AND  not exists
               (select 1 
                from   ' || src_tab_view || ' 
                where  sid = ids3.sid 
                  AND  pid = ' || to_char(durationPropertyId,'TM9') || ') 
       /* ensures we have UNIQDATA */
         AND  not exists
               (select 1 
                from   ' || output_tab || ' 
                where  sid = ids3.sid 
                  AND  pid = ' || to_char(durationPropertyId,'TM9') || ')';
 
    -- compute the difference (in minutes) between the two Oracle
    -- timestamps from our sqlStmt query.  Store the minutes as
    -- xsd:integer.
    insertStmt :=
      'insert /*+ parallel append */ into ' || output_tab || ' (sid, pid, o) 
       select eventId, 
              ' || to_char(durationPropertyId,'TM9') || ', 
              ''"'' || minutes || ''"^^xsd:integer''
       from   (
         select eventId,
                (extract(day    from (endTime - startTime))*24*60 +
                 extract(hour   from (endTime - startTime))*60 +
                 extract(minute from (endTime - startTime))) minutes
         from   (' || sqlStmt || '))';
 
    -- execute the query
    execute immediate insertStmt;
 
    -- commit our changes
    commit;
  end if;
 
  -- we already checked for duplicates in src_tab_view (NEWDATA) and
  -- in output_tab (UNIQDATA)
  optimization_flag := SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NEWDATA_ONLY +
                       SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_UNIQDATA_ONLY;
 
  -- return true to indicate success
  return true;
 
  -- handle any exceptions
  exception 
    when others then
      diag_message := 'error occurred: ' || SQLERRM;
      return false;
end sem_inf_durations;
/
show errors;

The sem_inf_durations function leverages built-in Oracle temporal functions to compute the event durations. First, the function converts the xsd:dateTime literal value to an Oracle TIMESTAMP object using the TO_TIMESTAMP_TZ function. Taking the difference between two Oracle TIMESTAMP objects produces an INTERVAL object that represents a time interval. Using the EXTRACT operator, the sem_inf_durations function computes the duration of each event in minutes by extracting the days, hours, and minutes out of the duration intervals.

Because the sem_inf_durations function checks for duplicates against both data in the existing model (src_tab_view) and data in the output_tab table, it can enable the INF_EXT_OPT_FLAG_NEWDATA_ONLY and INF_EXT_OPT_FLAG_UNIQDATA_ONLY optimization flags. (See Example 3: Optimizing Performance for more information about optimization flags.)

Notice that unlike previous examples, sem_inf_durations contains an exception handler. Exception handlers are useful for debugging issues in user-defined inference extension functions. To produce useful debugging messages, catch exceptions in the extension function, set the diag_message parameter to reflect the error, and return FALSE to indicate that an error occurred during execution of the extension function. The sem_inf_durations function catches all exceptions and sets the diag_message value to the exception message.

To create an inferred graph with the sem_inf_durations function, grant execution privileges to RDFUSER, then pass the function name to the SEM_APIS.CREATE_INFERRED_GRAPH procedure, as follows:

-- grant appropriate privileges
grant execute on sem_inf_durations to RDFUSER;
 
-- create the inferred graph
begin
  sem_apis.create_inferred_graph( 
    'EVENT_INF'
  , sem_models('EVENT', 'EVENT_ONT')
  , sem_rulebases('OWLPRIME')
  , passes => SEM_APIS.REACH_CLOSURE
  , inf_ext_user_func_name => 'sem_inf_durations'
  , network_owner=>'RDFUSER'
  , network_name=>'NET1'
);
end;
/

In addition to the triples inferred by OWLPRIME, the inferred graph should contain the following three new triples added by sem_inf_durations:

S                            P                                      O
---------------------------- -------------------------------------- ---------
http://example.org/event/m1  http://example.org/event/lengthInMins  90
http://example.org/event/m2  http://example.org/event/lengthInMins  60
http://example.org/event/p1  http://example.org/event/lengthInMins  120

Parent topic: Example 4: Temporal Reasoning (Several Related Examples)

7.1.3.4.2 Example 4b: Overlap Rule

The following user-defined inference extension function (sem_inf_overlap) infers whether two events overlap. Two events overlap if one event starts while the other event is in progress. The function extracts the start and end times for every pair of events, converts the xsd:dateTime values into Oracle timestamps, then computes whether one event starts within the other.

create or replace function sem_inf_overlap(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  eventClassId        number;
  rdfTypePropertyId   number;
  startTimePropertyId number;
  endTimePropertyId   number;
  overlapsPropertyId  number;
  noOverlapPropertyId number;
 
  xsdTimeFormat       varchar2(100);
  sqlStmt             varchar2(4000);
  insertStmt          varchar2(4000);
 
  pragma autonomous_transaction;
begin
  if (action = 'RUN') then
    -- retrieve ID of resource that already exists in the data (will
    -- throw exception if resource does not exist).
    eventClassId        := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/Event', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    startTimePropertyId := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/startTime', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    endTimePropertyId   := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/endTime', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    overlapsPropertyId  := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/overlaps', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    noOverlapPropertyId := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://example.org/event/noOverlap', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
    rdfTypePropertyId   := sdo_sem_inference.oracle_orardf_res2vid(
                             'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 
                             p_network_owner=>'RDFUSER', 
                             p_network_name=>'NET1');
 
    -- set the TIMESTAMP format we will use to parse XSD times
    xsdTimeFormat := 'YYYY-MM-DD"T"HH24:MI:SSTZH:TZM';
 
    -- query we use to extract the event ID and start/end times.  
    sqlStmt := 
      'select idsA1.sid eventAId,
              idsB1.sid eventBId,
              TO_TIMESTAMP_TZ(valuesA1.value_name,''YYYY-MM-DD"T"HH24:MI:SSTZH:TZM'') startTimeA,
              TO_TIMESTAMP_TZ(valuesA2.value_name,''YYYY-MM-DD"T"HH24:MI:SSTZH:TZM'') endTimeA,
              TO_TIMESTAMP_TZ(valuesB1.value_name,''YYYY-MM-DD"T"HH24:MI:SSTZH:TZM'') startTimeB,
              TO_TIMESTAMP_TZ(valuesB2.value_name,''YYYY-MM-DD"T"HH24:MI:SSTZH:TZM'') endTimeB
       from   ' || resource_id_map_view || ' valuesA1,
              ' || resource_id_map_view || ' valuesA2,
              ' || resource_id_map_view || ' valuesB1,
              ' || resource_id_map_view || ' valuesB2,
              ' || src_tab_view || '         idsA1,
              ' || src_tab_view || '         idsA2,
              ' || src_tab_view || '         idsA3,
              ' || src_tab_view || '         idsB1,
              ' || src_tab_view || '         idsB2,
              ' || src_tab_view || '         idsB3
       where  idsA1.sid = idsA3.sid 
         AND  idsA3.pid = ' || to_char(rdfTypePropertyId,'TM9') || ' 
         AND  idsA3.oid = ' || to_char(eventClassId,'TM9')      || ' 
         AND  idsB1.sid = idsB3.sid 
         AND  idsB3.pid = ' || to_char(rdfTypePropertyId,'TM9') || ' 
         AND  idsB3.oid = ' || to_char(eventClassId,'TM9')      || ' 
       /* only do half the checks, our TBOX ontology will handle symmetries */
         AND  idsA1.sid < idsB1.sid                   
       /* grab values of startTime and endTime for event A */
         AND  idsA1.sid = idsA2.sid 
         AND  idsA1.pid = ' || to_char(startTimePropertyId,'TM9') || ' 
         AND  idsA2.pid = ' || to_char(endTimePropertyId,'TM9')   || ' 
         AND  idsA1.oid = valuesA1.value_id 
         AND  idsA2.oid = valuesA2.value_id 
       /* grab values of startTime and endTime for event B */
         AND  idsB1.sid = idsB2.sid 
         AND  idsB1.pid = ' || to_char(startTimePropertyId,'TM9') || ' 
         AND  idsB2.pid = ' || to_char(endTimePropertyId,'TM9')   || ' 
         AND  idsB1.oid = valuesB1.value_id 
         AND  idsB2.oid = valuesB2.value_id 
       /* ensures we have NEWDATA */
         AND  not exists
               (select 1 
                from   ' || src_tab_view || ' 
                where  sid = idsA1.sid  
                  AND  oid = idsB1.sid 
                  AND  pid in (' || to_char(overlapsPropertyId,'TM9')  || ',' || 
                                    to_char(noOverlapPropertyId,'TM9') || ')) 
       /* ensures we have UNIQDATA */
         AND  not exists
               (select 1
                from   ' || output_tab   || '
                where  sid = idsA1.sid 
                  AND  oid = idsB1.sid 
                  AND  pid in (' || to_char(overlapsPropertyId,'TM9')  || ',' || 
                                    to_char(noOverlapPropertyId,'TM9') || '))';
 
    -- compare the two event times
    insertStmt :=
      'insert /*+ parallel append */ into ' || output_tab || ' (sid, pid, oid) 
       select eventAId, overlapStatusId, eventBId
       from   (
         select eventAId,
                (case 
                 when (startTimeA < endTimeB and 
                       startTimeA > startTimeB) then
                   ' || to_char(overlapsPropertyId,'TM9') || '
                 when (startTimeB < endTimeA and
                       startTimeB > startTimeA) then
                   ' || to_char(overlapsPropertyId,'TM9') || '
                 else
                   ' || to_char(noOverlapPropertyId,'TM9') || '
                 end) overlapStatusId,
                 eventBId
         from   (' || sqlStmt || '))';
 
    -- execute the query
    execute immediate insertStmt;
 
    -- commit our changes
    commit;
  end if;
 
  -- we only use ID values in the output_tab and we check for
  -- duplicates with our NOT EXISTS clause.
  optimization_flag := SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_ALL_IDS +
                       SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NEWDATA_ONLY +
                       SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_UNIQDATA_ONLY;
 
  -- return true to indicate success
  return true;
 
  -- handle any exceptions
  exception 
    when others then
      diag_message := 'error occurred: ' || SQLERRM;
      return false;
end sem_inf_overlap;
/
show errors;

The sem_inf_overlap function is similar to the sem_inf_durations function in Example 4b: Overlap Rule. The main difference between the two is that the query in sem_inf_overlap contains more joins and enables the INF_EXT_OPT_FLAG_ALL_IDS optimization flag because it does not need to generate new lexical values. (See Example 3: Optimizing Performance for more information about optimization flags.)

To create an inferred graph with the sem_inf_overlap function, grant execution privileges to RDFUSER, then pass the function name to the SEM_APIS.CREATE_INFERRED_GRAPH procedure, as follows:

-- grant appropriate privileges
grant execute on sem_inf_overlap to RDFUSER;
 
-- create the inferred graph
begin
  sem_apis.create_inferred_graph( 
    'EVENT_INF'
  , sem_models('EVENT', 'EVENT_ONT')
  , sem_rulebases('OWLPRIME')
  , passes => SEM_APIS.REACH_CLOSURE
  , inf_ext_user_func_name => 'sem_inf_overlap'
  , network_owner=>'RDFUSER'
  , network_name=>'NET1'
);
end;
/

In addition to the triples inferred by OWLPRIME, the inferred graph should contain the following six new triples added by sem_inf_overlap:

S                            P                                   O
---------------------------- ----------------------------------- ----------------------------
http://example.org/event/m1  http://example.org/event/noOverlap  http://example.org/event/m2
http://example.org/event/m1  http://example.org/event/noOverlap  http://example.org/event/p1
http://example.org/event/m2  http://example.org/event/noOverlap  http://example.org/event/m1
http://example.org/event/m2  http://example.org/event/overlaps   http://example.org/event/p1
http://example.org/event/p1  http://example.org/event/noOverlap  http://example.org/event/m1
http://example.org/event/p1  http://example.org/event/overlaps   http://example.org/event/m2

Parent topic: Example 4: Temporal Reasoning (Several Related Examples)

7.1.3.4.3 Example 4c: Duration and Overlap Rules

The example in this section uses the extension functions from Example 4a: Duration Rule (sem_inf_durations) and Example 4b: Overlap Rule (sem_inf_overlap) together to produce a single inferred graph. The extension functions are left unmodified for this example.

To create an inferred graph using multiple extension functions, use a comma to separate each extension function passed to the inf_ext_user_func_name parameter of SEM_APIS.CREATE_INFERRED_GRAPH. The following example assumes that the RDFUSER has already been granted the appropriate privileges on the extension functions.

-- use multiple user-defined inference functions
begin
  sem_apis.create_inferred_graph( 
    'EVENT_INF'
  , sem_models('EVENT', 'EVENT_ONT')
  , sem_rulebases('OWLPRIME')
  , passes => SEM_APIS.REACH_CLOSURE
  , inf_ext_user_func_name => 'sem_inf_durations,sem_inf_overlap'
  , network_owner=>'RDFUSER'
  , network_name=>'NET1'
);
end;
/

In addition to the triples inferred by OWLPRIME, the inferred graph should contain the following nine new triples added by sem_inf_durations and sem_inf_overlap:

S                            P                                      O
---------------------------- -------------------------------------- ----------------------------
http://example.org/event/m1  http://example.org/event/lengthInMins  90
http://example.org/event/m1  http://example.org/event/noOverlap     http://example.org/event/m2
http://example.org/event/m1  http://example.org/event/noOverlap     http://example.org/event/p1
http://example.org/event/m2  http://example.org/event/lengthInMins  60
http://example.org/event/m2  http://example.org/event/noOverlap     http://example.org/event/m1
http://example.org/event/m2  http://example.org/event/overlaps      http://example.org/event/p1
http://example.org/event/p1  http://example.org/event/lengthInMins  120
http://example.org/event/p1  http://example.org/event/noOverlap     http://example.org/event/m1
http://example.org/event/p1  http://example.org/event/overlaps      http://example.org/event/m2

Notice that the extension functions, sem_inf_durations and sem_inf_overlap, did not need to use the same optimization flags. It is possible to use extension functions with contradictory optimization flags (for example, one function using INF_EXT_OPT_FLAG_ALL_IDS and another function inserting all new triples as lexical values).

Parent topic: Example 4: Temporal Reasoning (Several Related Examples)

7.1.3.5 Example 5: Spatial Reasoning

User-defined inference extension functions can also leverage geospatial data types, like WKT (well-known text), to perform spatial reasoning. For example, with user-defined extension functions, it is possible to infer a "contains" relationship between geometric entities, such as states and cities.

The example in this section demonstrates how to infer whether a geometry (a US state) contains a point (a US city). This example assumes the RDF network already has a spatial index (described in section 1.6.6.2). This example also assumes the RDF graph STATES exists and contains the following RDF data:

@prefix orageo: <http://xmlns.oracle.com/rdf/geo/> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :       <http://example.org/geo/> .
 
:Colorado rdf:type  :State ;
          :boundary "Polygon((-109.0448 37.0004, -102.0424 36.9949, -102.0534 41.0006, -109.0489 40.9996, -109.0448 37.0004))"^^orageo:WKTLiteral .
:Utah     rdf:type  :State ;
          :boundary "Polygon((-114.0491 36.9982, -109.0462 37.0026, -109.0503 40.9986, -111.0471 41.0006, -111.0498 41.9993, -114.0395 41.9901, -114.0491 36.9982))"^^orageo:WKTLiteral .
:Wyoming  rdf:type  :State ;
          :boundary "Polygon((-104.0556 41.0037, -104.0584 44.9949, -111.0539 44.9998, -111.0457 40.9986, -104.0556 41.0037))"^^orageo:WKTLiteral
 
:StateCapital rdfs:subClassOf :City ;
 
:Denver   rdf:type  :StateCapital ;
          :location "Point(-104.984722 39.739167)"^^orageo:WKTLiteral .
:SaltLake rdf:type  :StateCaptial ;
          :location "Point(-111.883333 40.75)"^^orageo:WKTLiteral .
:Cheyenne rdf:type  :StateCapital ;
          :location "Point(-104.801944 41.145556)"^^orageo:WKTLiteral .

The following user-defined inference extension function (sem_inf_capitals) searches for capital cities within each state using the WKT geometries. If the function finds a capital city, it infers the city is the capital of the state containing it.

create or replace function sem_inf_capitals(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  stateClassId        number;
  capitalClassId      number;
  
  boundaryPropertyId  number;
  locationPropertyId  number;
  rdfTypePropertyId   number;
  capitalPropertyId   number;
 
  defaultSRID         number := 8307;
 
  xsdTimeFormat       varchar2(100);
  sqlStmt             varchar2(4000);
  insertStmt          varchar2(4000);
 
  pragma autonomous_transaction;
begin
  if (action = 'RUN') then
    -- retrieve ID of resource that already exists in the data (will
    -- throw exception if resource does not exist).
    stateClassId       := sdo_sem_inference.oracle_orardf_res2vid(
                            'http://example.org/geo/State',
                            p_network_owner=>'RDFUSER',
                            p_network_name=>'NET1');
    capitalClassId     := sdo_sem_inference.oracle_orardf_res2vid(
                            'http://example.org/geo/StateCapital', 
                            p_network_owner=>'RDFUSER', 
                            p_network_name=>'NET1');
    boundaryPropertyId := sdo_sem_inference.oracle_orardf_res2vid(
                            'http://example.org/geo/boundary', 
                            p_network_owner=>'RDFUSER', 
                            p_network_name=>'NET1');
    locationPropertyId := sdo_sem_inference.oracle_orardf_res2vid(
                            'http://example.org/geo/location', 
                            p_network_owner=>'RDFUSER',
                            p_network_name=>'NET1');
    rdfTypePropertyId  := sdo_sem_inference.oracle_orardf_res2vid(
                            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
                            p_network_owner=>'RDFUSER',
                            p_network_name=>'NET1');
 
    -- retreive ID of resource or generate a new ID if resource does
    -- not already exist
    capitalPropertyId := sdo_sem_inference.oracle_orardf_add_res(
                           'http://example.org/geo/capital',
                           p_network_owner=>'RDFUSER',
                           p_network_name=>'NET1');
 
    -- query we use to extract the capital cities contained within state boundaries
    sqlStmt := 
      'select idsA1.sid stateId,
              idsB1.sid cityId
       from   ' || resource_id_map_view || ' valuesA,
              ' || resource_id_map_view || ' valuesB,
              ' || src_tab_view || '         idsA1,
              ' || src_tab_view || '         idsA2,
              ' || src_tab_view || '         idsB1,
              ' || src_tab_view || '         idsB2
       where  idsA1.pid = ' || to_char(rdfTypePropertyId,'TM9') || ' 
         AND  idsA1.oid = ' || to_char(stateClassId,'TM9')      || ' 
         AND  idsB1.pid = ' || to_char(rdfTypePropertyId,'TM9') || ' 
         AND  idsB1.oid = ' || to_char(capitalClassId,'TM9')    || ' 
       /* grab geometric lexical values */
         AND  idsA2.sid = idsA1.sid                                  
         AND  idsA2.pid = ' || to_char(boundaryPropertyId,'TM9')|| ' 
         AND  idsA2.oid = valuesA.value_id                           
         AND  idsB2.sid = idsB1.sid                                  
         AND  idsB2.pid = ' || to_char(locationPropertyId,'TM9')|| ' 
         AND  idsB2.oid = valuesB.value_id                           
       /* compare geometries to see if city is contained by state */              
         AND  SDO_RELATE( 
                SDO_RDF.getV$GeometryVal( 
                  valuesA.value_type, 
                  valuesA.vname_prefix, 
                  valuesA.vname_suffix, 
                  valuesA.literal_type, 
                  valuesA.language_type, 
                  valuesA.long_value, 
                  ' || to_char(defaultSRID,'TM9') || '),
                SDO_RDF.getV$GeometryVal(
                  valuesB.value_type, 
                  valuesB.vname_prefix, 
                  valuesB.vname_suffix, 
                  valuesB.literal_type, 
                  valuesB.language_type, 
                  valuesB.long_value, 
                  ' || to_char(defaultSRID,'TM9') || '),
                ''mask=CONTAINS'') = ''TRUE'' 
       /* ensures we have NEWDATA and only check capitals not assigned to a state */
         AND  not exists
               (select 1 
                from   ' || src_tab_view || ' 
                where  pid = ' || to_char(capitalPropertyId,'TM9') || ' 
                  AND  (sid = idsA1.sid OR oid = idsB1.sid)) 
       /* ensures we have UNIQDATA and only check capitals not assigned to a state */
         AND  not exists
               (select 1
                from   ' || output_tab   || '
                where  pid = ' || to_char(capitalPropertyId,'TM9') || ' 
                  AND  (sid = idsA1.sid OR oid = idsB1.sid))';
 
    -- insert new triples using only IDs
    insertStmt :=
      'insert /*+ parallel append */ into ' || output_tab || ' (sid, pid, oid) 
       select stateId, ' || to_char(capitalPropertyId,'TM9') || ', cityId
       from   (' || sqlStmt || ')';
 
    -- execute the query
    execute immediate insertStmt;
 
    -- commit our changes
    commit;
  end if;
 
  -- we only use ID values in the output_tab and we check for
  -- duplicates with our NOT EXISTS clauses.
  optimization_flag := SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_ALL_IDS +
                       SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_NEWDATA_ONLY +
                       SDO_SEM_INFERENCE.INF_EXT_OPT_FLAG_UNIQDATA_ONLY;
 
  -- return true to indicate success
  return true;
 
  -- handle any exceptions
  exception 
    when others then
      diag_message := 'error occurred: ' || SQLERRM;
      return false;
end sem_inf_capitals;
/
show errors;

The sem_inf_capitals function is similar to the sem_inf_durations function in Example 4a: Duration Rule, in that both functions must convert the lexical values of some triples into Oracle types to leverage native Oracle operators. In the case of sem_inf_capitals, the function converts the WKT lexical values encoding polygons and points into the Oracle Spatial SDO_GEOMETRY type, using the SDO_RDF.getV$GeometryVal function. The getV$GeometryVal function requires arguments mostly provided by the resource view (resource_id_map_view) and an additional argument, an ID to a spatial reference system (SRID). The getV$GeometryVal function will convert the geometry into the spatial reference system specified by SRID. The sem_inf_capitals function uses the default Oracle Spatial reference system, WGS84 Longitude-Latitude, specified by SRID value 8307. (For more information about support in RDF Graph for spatial references systems, see Spatial Support.)

After converting the WKT values into SDO_GEOMETRY types using the getV$GeometryVal function, the sem_inf_capitals function compares the state geometry with the city geometry to see if the state contains the city. The SDO_RELATE operator performs this comparison and returns the literal value 'TRUE' when the state contains the city. The SDO_RELATE operator can perform various different types of comparisons. (See Oracle Spatial Developer's Guide for more information about SDO_RELATE and other spatial operators.)

To create an inferred graph with the sem_inf_capitals function, grant execution privileges to the RDFUSER, then pass the function name to the SEM_APIS.CREATE_INFERRED_GRAPH procedure, as follows:

-- grant appropriate privileges
grant execute on sem_inf_capitals to RDFUSER;
 
-- create the inferred graph
begin
  sem_apis.create_inferred_graph( 
    'STATES_INF'
  , sem_models('STATES')
  , sem_rulebases('OWLPRIME')
  , passes => SEM_APIS.REACH_CLOSURE
  , inf_ext_user_func_name => 'sem_inf_capitals' 
  , network_owner=>'RDFUSER'
  , network_name=>'NET1'
);
end;
/

In addition to the triples inferred by OWLPRIME, the inferred graph should contain the following three new triples added by sem_inf_capitals:

S                                P                               O
-------------------------------- ------------------------------- --------------------------------
http://example.org/geo/Colorado  http://example.org/geo/capital  http://example.org/geo/Denver
http://example.org/geo/Utah      http://example.org/geo/capital  http://example.org/geo/SaltLake
http://example.org/geo/Wyoming   http://example.org/geo/capital  http://example.org/geo/Cheyenne

Parent topic: User-Defined Inference Extension Function Examples

7.1.3.6 Example 6: Calling a Web Service

This section contains a user-defined inference extension function (sem_inf_geocoding) and a related helper procedure (geocoding), which enable you to make a web service call to the Oracle Geocoder service. The user-defined inference extension function looks for the object values of triples using predicate <urn:streetAddress>, makes callouts to the Oracle public Geocoder service endpoint at http://maps.oracle.com/geocoder/gcserver, and inserts the longitude and latitude information as two separate triples.

For example, assume that the RDF graph contains the following assertion:

<urn:NEDC>  <urn:streetAddress>  "1 Oracle Dr., Nashua, NH"

In this case, an inference call using sem_inf_geocoding will produce the following new assertions:

<urn:NEDC> <http://www.w3.org/2003/01/geo/wgs84_pos#long>  "-71.46421"
<urn:NEDC> <http://www.w3.org/2003/01/geo/wgs84_pos#lat>   "42.75836"
<urn:NEDC> <http://www.opengis.net/geosparql#asWKT>  "POINT(-71.46421 42.75836)"^^<http://www.opengis.net/geosparql#wktLiteral>
<urn:NEDC> <http://xmlns.oracle.com/rdf/geo/asWKT>   "POINT(-71.46421 42.75836)"^^<http://xmlns.oracle.com/rdf/geo/WKTLiteral>

The sem_inf_geocoding function is defined as follows:

create or replace function sem_inf_geocoding(
    src_tab_view         in  varchar2,
    resource_id_map_view in  varchar2,
    output_tab           in  varchar2,
    action               in  varchar2,
    num_calls            in  number,
    tplInferredLastRound in  number,
    options              in  varchar2 default null,
    optimization_flag    out number,
    diag_message         out varchar2
    )
return boolean
as
  pragma autonomous_transaction;
  iCount integer;
  
  nLong number;
  nLat  number;
  nWKT  number;
  nOWKT number;
  nStreetAddr number;
  
  sidTab    dbms_sql.number_table;
  oidTab    dbms_sql.number_table;
  
  vcRequestBody varchar2(32767);
  vcStmt        varchar2(32767);
  vcStreeAddr   varchar2(3000);
  
  type cur_type is ref cursor;
  cursorFind    cur_type; 
  vcLong varchar2(100);
  vcLat  varchar2(100);
begin
  if (action = 'START') then
    nLat := sdo_sem_inference.oracle_orardf_add_res(
              'http://www.w3.org/2003/01/geo/wgs84_pos#lat', 
              p_network_owner=>'RDFUSER', 
              p_network_name=>'NET1');
    nLong := sdo_sem_inference.oracle_orardf_add_res(
               'http://www.w3.org/2003/01/geo/wgs84_pos#long', 
               p_network_owner=>'RDFUSER', 
               p_network_name=>'NET1');
    nWKT  := sdo_sem_inference.oracle_orardf_add_res(
               'http://www.opengis.net/geosparql#asWKT', 
               p_network_owner=>'RDFUSER', 
               p_network_name=>'NET1');
    nOWKT := sdo_sem_inference.oracle_orardf_add_res(
               'http://xmlns.oracle.com/rdf/geo/asWKT', 
               p_network_owner=>'RDFUSER', 
               p_network_name=>'NET1');
  end if; 
  
  if (action = 'RUN') then
    nStreetAddr := sdo_sem_inference.oracle_orardf_res2vid(
                     '<urn:streetAddress>', 
                     p_network_owner=>'RDFUSER', 
                     p_network_name=>'NET1');
    nLat := sdo_sem_inference.oracle_orardf_res2vid(
              'http://www.w3.org/2003/01/geo/wgs84_pos#lat', 
              p_network_owner=>'RDFUSER', 
              p_network_name=>'NET1');
    nLong := sdo_sem_inference.oracle_orardf_res2vid(
               'http://www.w3.org/2003/01/geo/wgs84_pos#long', 
               p_network_owner=>'RDFUSER', 
               p_network_name=>'NET1');
    nWKT  := sdo_sem_inference.oracle_orardf_res2vid(
               'http://www.opengis.net/geosparql#asWKT', 
               p_network_owner=>'RDFUSER', 
               p_network_name=>'NET1');
    nOWKT := sdo_sem_inference.oracle_orardf_res2vid(
               'http://xmlns.oracle.com/rdf/geo/asWKT', 
               p_network_owner=>'RDFUSER', 
               p_network_name=>'NET1');
 
    vcStmt := '
      select /*+ parallel */ distinct s1.sid as s_id, s1.oid as o_id
        from ' || src_tab_view || ' s1
       where s1.pid = :1
         and not exists ( select 1
                            from   ' || src_tab_view || ' x
                           where  x.sid = s1.sid
                             and  x.pid = :2
                        ) ';
    open cursorFind for vcStmt using nStreetAddr, nLong;
    
    loop
      fetch cursorFind bulk collect into sidTab, oidTab limit 10000;
      for i in 1..sidTab.count loop 
        vcStreeAddr := sdo_sem_inference.oracle_orardf_vid2lit(
                         oidTab(i), 
                         p_network_owner=>'RDFUSER', 
                         p_network_name=>'NET1');
        -- dbms_output.put_line('Now processing street addr ' || vcStreeAddr);
        geocoding(vcStreeAddr, vcLong, vcLat);
        execute immediate 'insert into ' || output_tab || '(sid,pid,oid,gid,s,p,o,g)
            values(:1, :2, null, null, null, null, :3, null) '
            using sidTab(i), nLong, '"'||vcLong||'"';
        execute immediate 'insert into ' || output_tab || '(sid,pid,oid,gid,s,p,o,g)
            values(:1, :2, null, null, null, null, :3, null) '
            using sidTab(i), nLat, '"'||vcLat||'"';
        execute immediate 'insert into ' || output_tab || '(sid,pid,oid,gid,s,p,o,g) 
            values(:1, :2, null, null, null, null, :3, null) '
            using sidTab(i), nWKT, '"POINT('|| vcLong || ' ' ||vcLat ||')"^^<http://www.opengis.net/geosparql#wktLiteral>';
        execute immediate 'insert into ' || output_tab || '(sid,pid,oid,gid,s,p,o,g) 
            values(:1, :2, null, null, null, null, :3, null) '
            using sidTab(i), nOWKT, '"POINT('|| vcLong || ' ' ||vcLat ||')"^^<http://xmlns.oracle.com/rdf/geo/WKTLiteral>';
      end loop;
      exit when cursorFind%notfound;
    end loop;   
    commit;
  end if;
  return true;
end;
/
grant execute on sem_inf_geocoding to RDFUSER;

The sem_inf_geocoding function makes use of the following helper procedure named geocoding, which does the actual HTTP communication with the Geocoder web service endpoint. Note that proper privileges are required to connect to the web server.

create or replace procedure geocoding(addr varchar2,
                                     vcLong out varchar2,
                                     vcLat  out varchar2
                                    )
as
  httpReq  utl_http.req;
  httpResp utl_http.resp;
  
  vcRequestBody varchar2(32767);
  
  vcBuffer  varchar2(32767);
  idxLat integer;
  idxLatEnd integer;
begin
  vcRequestBody := utl_url.escape('xml_request=<?xml version="1.0" standalone="yes"?>
    <geocode_request vendor="elocation">
      <address_list> 
      <input_location id="27010">
      <input_address match_mode="relax_street_type">
        <unformatted country="US">
           <address_line value="'|| addr ||'"/>
        </unformatted>
       </input_address>
      </input_location>
    </address_list>
    </geocode_request>
  ');
  dbms_output.put_line('request ' || vcRequestBody);
  
  -- utl_http.set_proxy('<your_proxy_here_if_necessary>', null);
  httpReq := utl_http.begin_request (
    'http://maps.oracle.com/geocoder/gcserver', 'POST');
    
  utl_http.set_header(httpReq, 'Content-Type', 'application/x-www-form-urlencoded');
  utl_http.set_header(httpReq, 'Content-Length', lengthb(vcRequestBody));
  
  utl_http.write_text(httpReq, vcRequestBody);
  
  httpResp := utl_http.get_response(httpReq);
  
  utl_http.read_text(httpResp, vcBuffer, 32767);
  utl_http.end_response(httpResp);
  
  -- dbms_output.put_line('response ' || vcBuffer);
  -- Here we are doing some simple string parsing out of an XML.
  -- It is more robust to use XML functions instead.
  idxLat := instr(vcBuffer, 'longitude="'); 
  idxLatEnd := instr(vcBuffer, '"', idxLat + 12);
  vcLong := substr(vcBuffer, idxLat + 11, idxLatEnd - idxLat - 11);
  dbms_output.put_line('long = ' || vcLong);
  
  idxLat := instr(vcBuffer, 'latitude="');
  idxLatEnd := instr(vcBuffer, '"', idxLat + 11);
  vcLat := substr(vcBuffer, idxLat + 10, idxLatEnd - idxLat - 10);
  dbms_output.put_line('lat = ' || vcLat);
exception
  when others then
    dbms_output.put_line('geocoding: error ' || dbms_utility.format_error_backtrace || ' '
                                             || dbms_utility.format_error_stack);
end;
/

Parent topic: User-Defined Inference Extension Function Examples