9.2 User-Defined Functions and Aggregates

The RDF Graph query extension architecture enables you to add user-defined functions and aggregates to be used in SPARQL queries, both through the SEM_MATCH table function and through the support for Apache Jena.

The SPARQL 1.1 Standard provides several functions used mainly for filtering and categorizing data obtained by a query. However, you may need specialized functions not supported by the standard.

Some simple examples include finding values that belong to a specific type, or obtaining values with a square sum value that is greater than a certain threshold. Although this can be done by means of combining functions, it may be useful to have a single function that handles the calculations, which also allows for a simpler and shorter query.

The RDF Graph query extension allows you to include your own query functions and aggregates. This architecture allows:

9.2.1 Data Types for User-Defined Functions and Aggregates

The SDO_RDF_TERM object type is used to represent an RDF term when creating user-defined functions and aggregates.

SDO_RDF_TERM has the following attributes, which correspond to columns in the RDF_VALUE$ table (see Table 1-5 in Statements for a description of these attributes). The CTX1 and FLAGS attributes are reserved for future use and do not have corresponding columns in RDF_VALUE$.

SDO_RDF_TERM(
  VALUE_TYPE    VARCHAR2(10),
  VALUE_NAME    CLOB, 
  VNAME_PREFIX  CLOB,
  VNAME_SUFFIX  VARCHAR2(512), 
  LITERAL_TYPE  VARCHAR2(1000), 
  LANGUAGE_TYPE VARCHAR2(80), 
  LONG_VALUE    CLOB, 
  CTX1          VARCHAR2(4000),
  FLAGS         INTEGER )

The following constructors are available for creating SDO_RDF_TERM objects. The first constructor populates each attribute from a single, lexical RDF term string. The second, third, and fourth constructors receive individual attribute values as input. Only the first RDF term string constructor sets values for VNAME_PREFIX and VNAME_SUFFIX. These values are initialized to null by the other constructors.

SDO_RDF_TERM (
  rdf_term_str  VARCHAR2) 
  RETURN SELF;

SDO_RDF_TERM (
  value_type    VARCHAR2, 
  value_name    VARCHAR2, 
  literal_type  VARCHAR2, 
  language_type VARCHAR2, 
  long_value    CLOB) 
  RETURN SELF;

SDO_RDF_TERM (
  value_type    VARCHAR2, 
  value_name    VARCHAR2, 
  literal_type  VARCHAR2, 
  language_type VARCHAR2, 
  long_value    CLOB, 
  ctx1          VARCHAR2) 
  RETURN SELF;

SDO_RDF_TERM (
  value_type    VARCHAR2, 
  value_name    VARCHAR2, 
  literal_type  VARCHAR2, 
  language_type VARCHAR2, 
  long_value    CLOB, 
  ctx1          VARCHAR2,
  flags         INTEGER) 
  RETURN SELF;

The SDO_RDF_TERM_LIST type is used to hold a list of SDO_RDF_TERM objects and is defined as VARRAY(32767) of SDO_RDF_TERM.

9.2.2 API Support for User-Defined Functions

A user-defined function is created by implementing a PL/SQL function with a specific signature, and a specific URI is used to invoke the function in a SPARQL query pattern.

After each successful inference extension function call, a commit is executed to persist changes made in the inference extension function call. If an inference extension function is defined as autonomous by specifying pragma autonomous_transaction, then it should either commit or roll back at the end of its implementation logic. Note that the inference engine may call an extension function multiple times when creating an inferred graph (once per round). Commits and rollbacks from one call will not affect other calls.

9.2.2.1 PL/SQL Function Implementation

Each user-defined function must be implemented by a PL/SQL function with a signature in the following format:

FUNCTION user_function_name (params IN SDO_RDF_TERM_LIST)
  RETURN SDO_RDF_TERM

This signature supports an arbitrary number of RDF term arguments, which are passed in using a single SDO_RDF_TERM_LIST object, and returns a single RDF term as output, which is represented as a single SDO_RDF_TERM object. Type checking or other verifications for these parameters are not performed. You should take steps to validate the data according to the function goals.

Note that PL/SQL supports callouts to functions written in other programming languages, such as C and Java, so the PL/SQL function that implements a user-defined query function can serve only as a wrapper for functions written in other programming languages.

9.2.2.2 Invoking User-Defined Functions from a SPARQL Query Pattern

After a user-defined function is implemented in PL/SQL, it can be invoked from a SPARQL query pattern using a function URI constructed from the prefix <http://xmlns.oracle.com/rdf/extensions/> followed by schema.package_name.function_name if the corresponding PL/SQL function is part of a PL/SQL package, or schema.function_name if the function is not part of a PL/SQL package. The following are two example function URIs:

<http://xmlns.oracle.com/rdf/extensions/my_schema.my_package.my_function>(arg_1, …, arg_n)

<http://xmlns.oracle.com/rdf/extensions/my_schema.my_function>(arg_1, …, arg_n)

9.2.2.3 User-Defined Function Examples

This section presents examples of the implementation of a user-defined function and the use of that function in a FILTER clause, in a SELECT expression, and in a BIND operation.

For the examples, assume that the following data, presented here in N-triple format, exists inside a model called MYMODEL:

<a>  <p>  "1.0"^^xsd:double .
<b>  <p>  "1.5"^^xsd:float .
<c>  <p>  "3"^^xsd:decimal .
<d>  <p>  "4"^^xsd:string .

Example 9-1 User-Defined Function to Calculate Sum of Two Squares

Example 9-1 shows the implementation of a simple function that receives two values and calculates the sum of the squares of each value.

CREATE OR REPLACE FUNCTION sum_squares (params IN SDO_RDF_TERM_LIST) 
   RETURN SDO_RDF_TERM
   AS 
     retTerm    SDO_RDF_TERM;
     sqr1       NUMBER;
     sqr2       NUMBER;
     addVal     NUMBER;
     val1       SDO_RDF_TERM;
     val2       SDO_RDF_TERM;
   BEGIN 
     –- Set the return value to null.
     retTerm := SDO_RDF_TERM(NULL,NULL,NULL,NULL,NULL);
     –- Obtain the data from the first two parameters.
     val1 := params(1); 
     val2 := params(2);
     –- Convert the value stored in the sdo_rdf_term to number.
     –- If any exception occurs, return the null value.
     BEGIN
       sqr1 := TO_NUMBER(val1.value_name);
       sqr2 := TO_NUMBER(val2.value_name);
       EXCEPTION WHEN OTHERS THEN RETURN retTerm;
     END;
     –- Compute the square sum of both values.
       addVal := (sqr1 * sqr1) + (sqr2 * sqr2);
     –- Set the return value to the desired rdf term type.
     retTerm := SDO_RDF_TERM('LIT',to_char(addVal),
                'http://www.w3.org/2001/XMLSchema#integer','',NULL);
     – Return the new value.
     RETURN retTerm;
END;
/
SHOW ERRORS;

Note that the sum_squares function in Example 9-1 does not verify the data type of the value received. It is intended as a demonstration only, and relies on TO_NUMBER to obtain the numeric value stored in the VALUE_NAME field of SDO_RDF_TERM.

Example 9-2 User-Defined Function Used in a FILTER Clause

Example 9-2 shows the sum_squares function (from Example 9-1) used in a FILTER clause.

SELECT s, o
FROM table(sem_match(
'SELECT  ?s ?o
 WHERE { ?s ?p ?o 
 FILTER (<http://xmlns.oracle.com/rdf/extensions/schema.sum_squares>(?o,?o) > 2)}',
sem_models('MYMODEL'),null,null,null,null,'',null,null,'RDFUSER','NET1'));

The query in Example 9-2 returns the following result:

s                    o                    
-------------------- -------------------- 
b                    1.5
c                    3                  
d                    4

Example 9-3 User-Defined Function Used in a SELECT Expression

Example 9-3 shows the sum_squares function (from Example 9-1) used in an expression in the SELECT clause.

SELECT s, o, sqr_sum
FROM table(sem_match(
'SELECT  ?s ?o 
       (<http://xmlns.oracle.com/rdf/extensions/schema.sum_squares>(?o,?o) AS    
        ?sqr_sum)
 WHERE { ?s ?p ?o }',
sem_models('MYMODEL'),null,null,null,null,'',null,null,'RDFUSER','NET1'));

The query in Example 9-3 returns the following result:

s                    o                    sqr_sum
-------------------- -------------------- -------------------- 
a                    1                    2
b                    1.5                  4.5
c                    3                    18
d                    4                    32

Example 9-4 User-Defined Function Used in a BIND Operation

Example 9-4 shows the sum_squares function (from Example 9-1) used in a BIND operation.

SELECT s, o, sqr_sum
FROM table(sem_match(
'SELECT  ?s ?o ?sqr_sum
 WHERE { ?s ?p ?o .
 BIND (<http://xmlns.oracle.com/rdf/extensions/schema.sum_squares>(?o,?o) AS
       ?sqr_sum)}',
sem_models('MYMODEL'),null,null,null,null,'',null,null,'RDFUSER','NET1'));

The query in Example 9-4 returns the following result:

s                    o                    sqr_sum
-------------------- -------------------- -------------------- 
a                    1                    2
b                    1.5                  4.5
c                    3                    18
d                    4                    32

9.2.3 API Support for User-Defined Aggregates

User-defined aggregates are implemented by defining a PL/SQL object type that implements a set of interface methods. After the user-defined aggregate is created, a specific URI is used to invoke it.

9.2.3.1 ODCIAggregate Interface

User-defined aggregates use the ODCIAggregate PL/SQL interface. For more detailed information about this interface, see the chapter about user-defined aggregate functions in Oracle Database Data Cartridge Developer's Guide.

The ODCIAggregate interface is implemented by a PL/SQL object type that implements four main functions:

  • ODCIAggregateInitialize

  • ODCIAggregateIterate

  • ODCIAggregateMerge

  • ODCIAggregateTerminate

As with user-defined functions (described in API Support for User-Defined Functions), user-defined aggregates receive an arbitrary number of RDF term arguments, which are passed in as an SDO_RDF_TERM_LIST object, and return a single RDF term value, which is represented as an SDO_RDF_TERM object.

This scheme results in the following signatures for the PL/SQL ODCIAggregate interface functions (with my_aggregate_obj_type representing the actual object type name):

STATIC FUNCTION ODCIAggregateInitialize(
        sctx IN OUT my_aggregate_obj_type)
RETURN NUMBER
 
MEMBER FUNCTION ODCIAggregateIterate(
        self       IN OUT my_aggregate_obj_type
       ,value      IN     SDO_RDF_TERM_LIST)
RETURN NUMBER
 
MEMBER FUNCTION ODCIAggregateMerge(
        self IN OUT my_aggregate_obj_type
       ,ctx2 IN     my_aggregate_obj_type)
RETURN NUMBER
 
MEMBER FUNCTION ODCIAggregateTerminate (
        self IN my_aggregate_obj_type
       ,return_value OUT SDO_RDF_TERM
       ,flags IN NUMBER)              
RETURN NUMBER

9.2.3.2 Invoking User-Defined Aggregates

After a user-defined aggregate is implemented in PL/SQL, it can be invoked from a SPARQL query by referring to an aggregate URI constructed from the prefix <http://xmlns.oracle.com/rdf/aggExtensions/> followed by schema_name.aggregate_name. The following is an example aggregate URI:

<http://xmlns.oracle.com/rdf/aggExtensions/schema.my_aggregate>(arg_1, …, arg_n)

The DISTINCT modifier can be used with user-defined aggregates, as in the following example:

<http://xmlns.oracle.com/rdf/aggExtensions/schema.my_aggregate>(DISTINCT arg_1)

In this case, only distinct argument values are passed to the aggregate. Note, however, that the DISTINCT modifier can only be used with aggregates that have exactly one argument.

9.2.3.3 User-Defined Aggregate Examples

This section presents examples of implementing and using a user-defined aggregate. For the examples, assume that the following data, presented here in N-triple format, exists inside a model called MYMODEL:

<a>  <p>  "1.0"^^xsd:double .
<b>  <p>  "1.5"^^xsd:float .
<c>  <p>  "3"^^xsd:decimal .
<c>  <p>  "4"^^xsd:decimal .
<d>  <p>  "4"^^xsd:string .

Example 9-5 User-Defined Aggregate Implementation

Example 9-5 shows the implementation of a simple user-defined aggregate (countSameType). This aggregate has two arguments: the first is any RDF term, and the second is a constant data type URI. The aggregate counts how many RDF terms from the first argument position have a data type equal to the second argument.

-- Aggregate type creation
CREATE OR REPLACE TYPE countSameType authid current_user AS OBJECT(
 
count NUMBER, –- Variable to store the number of same-type terms.
 
–- Mandatory Functions for aggregates 
STATIC FUNCTION ODCIAggregateInitialize(
        sctx IN OUT countSameType)
RETURN NUMBER,
 
MEMBER FUNCTION ODCIAggregateIterate(
         self       IN OUT countSameType
       , value      IN     SDO_RDF_TERM_LIST)
RETURN NUMBER,
 
MEMBER FUNCTION ODCIAggregateMerge(
        self IN OUT countSameType
       ,ctx2 IN     countSameType)
RETURN NUMBER,
 
MEMBER FUNCTION ODCIAggregateTerminate (
        self IN          countSameType
       ,return_value OUT SDO_RDF_TERM
       ,flags IN NUMBER)              
RETURN NUMBER
);
/
SHOW ERRORS;
 
–- Interface function for the user-defined aggregate
CREATE OR REPLACE FUNCTION countSameAs (input SDO_RDF_TERM_LIST) RETURN SDO_RDF_TERM
PARALLEL_ENABLE AGGREGATE USING countSameType;
/
show errors;
 
–- User-defined aggregate body
CREATE OR REPLACE TYPE BODY countSameType IS
 
STATIC FUNCTION ODCIAggregateInitialize(
         sctx            IN OUT countSameType)
RETURN NUMBER IS
BEGIN
  sctx := countSameType (0); –- Aggregate initialization
  RETURN ODCIConst.Success;
END;
 
MEMBER FUNCTION ODCIAggregateIterate(
         self           IN OUT countSameType
       , value          IN     SDO_RDF_TERM_LIST )
RETURN NUMBER IS
BEGIN
  -- Increment count if the first argument has a literal type
  -- URI equal to the value of the second argument
  IF (value(1).literal_type = value(2).value_name) THEN
    self.count := self.count + 1;
  END IF;                                    
  RETURN ODCIConst.Success;
END;   
 
MEMBER FUNCTION ODCIAggregateMerge(
         self            IN OUT countSameType
        ,ctx2            IN countSameType)
RETURN NUMBER IS
BEGIN
  –- Sum count to merge parallel threads.
  self.count := self.count + ctx2.count;   
  RETURN ODCIConst.Success;
END;
 
MEMBER FUNCTION ODCIAggregateTerminate(
         self            IN  countSameType
        ,return_value    OUT SDO_RDF_TERM
        ,flags           IN  NUMBER)              
RETURN NUMBER IS
BEGIN
   -- Set the return value
   return_value := SDO_RDF_TERM('LIT',to_char(self.count),
     'http://www.w3.org/2001/XMLSchema#decimal',NULL,NULL); RETURN  ODCIConst.Success;
END;
 
END;
/
SHOW ERRORS;

Example 9-6 User-Defined Aggregate Used Without a GROUP BY Clause

Example 9-6 shows the countSameType aggregate (from Example 9-5) used over an entire query result group.

FROM o
from table(sem_match(
'SELECT 
 (<http://xmlns.oracle.com/rdf/aggExtensions/schema.countSameType>(?o,xsd:decimal) 
  AS ?o)
 WHERE { ?s ?p ?o }',
sem_models('MYMODEL'),null,null,null,null,'',null,null,'RDFUSER','NET1'));

The query in Example 9-6 returns the following result:

o                    
-------------------- 
2                    

Example 9-7 User-Defined Aggregate Used With a GROUP BY Clause

Example 9-7 shows the countSameType aggregate (from Example 9-5) used over a set of groups formed from a GROUP BY clause.

select s, o
from table(sem_match(
'SELECT ?s
 (<http://xmlns.oracle.com/rdf/aggExtensions/schema.countSameType>(?o,xsd:decimal) 
  AS ?o)
 WHERE { ?s ?p ?o } GROUP BY ?s',
sem_models('MYMODEL'),null,null,null,null,'',null,null,'RDFUSER','NET1'));

The query in Example 9-7 returns the following result:

s                    o                    
-------------------- -------------------- 
a                    0
b                    0
c                    2                    
d                    0