SQL-Based Property Graph Query and Analytics

4 SQL-Based Property Graph Query and Analytics

You can use SQL to query property graph data in Oracle Spatial and Graph.

For the property graph support in Oracle Spatial and Graph, all the vertices and edges data are persisted in relational form in Oracle Database. For detailed information about the Oracle Spatial and Graph property graph schema objects, see Property Graph Schema Objects for Oracle Database.

This chapter provides examples of typical graph queries implemented using SQL. The audience includes DBAs as well as application developers who understand SQL syntax and property graph schema objects.

The benefits of querying directly property graph using SQL include:

There is no need to bring data outside Oracle Database.

You can leverage the industry-proven SQL engine provided by Oracle Database.

You can easily join or integrate property graph data with other data types (relational, JSON, XML, and so on).

You can take advantage of existing Oracle SQL tuning and database management tools and user interface.

The examples assume that there is a property graph named connections in the current schema. The SQL queries and example output are for illustration purpose only, and your output may be different depending on the data in your connections graph. In some examples, the output is reformatted for readability.

Simple Property Graph Queries
The examples in this topic query vertices, edges, and properties of the graph.
Text Queries on Property Graphs
If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column.
Navigation and Graph Pattern Matching
A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends.
Navigation Options: CONNECT BY and Parallel Recursion
The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying.
Pivot
The PIVOT clause lets you dynamically add columns to a table to create a new table.
SQL-Based Property Graph Analytics
In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.

4.1 Simple Property Graph Queries

The examples in this topic query vertices, edges, and properties of the graph.

Example 4-1 Find a Vertex with a Specified Vertex ID

This example find the vertex with vertex ID 1 in the connections graph.

SQL> select vid, k, v, vn, vt 
      from connectionsVT$ 
      where vid=1;

The output might be as follows:

     1 country     United States
     1 name        Robert Smith
     1 occupation  CEO of Example Corporation
     ...

Example 4-2 Find an Edge with a Specified Edge ID

This example find the edge with edge ID 100 in the connections graph.

SQL> select eid,svid,dvid,k,t,v,vn,vt 
      from connectionsGE$ 
      where eid=1000;

The output might be as follows:

    1000  1 2 weight  3  1  1

In the preceding output, the K of the edge property is "weight" and the type ID of the value is 3, indicating a float value.

Example 4-3 Perform Simple Counting

This example performs simple counting in the connections graph.

SQL> -- Get the total number of K/V pairs of all the vertices
SQL> select /*+ parallel */ count(1) 
       from connectionsVT$;

    299

SQL> -- Get the total number of K/V pairs of all the edges
SQL> select /*+ parallel(8) */ count(1) 
       from connectionsGE$;
    164

SQL> -- Get the total number of vertices
SQL> select /*+ parallel */ count(distinct vid) 
       from connectionsVT$;

    78

SQL> -- Get the total number of edges
SQL> select /*+ parallel */ count(distinct eid) 
       from connectionsGE$;

    164

Example 4-4 Get the Set of Property Keys Used

This example gets the set of property keys used for the vertices n the connections graph.

SQL> select /*+ parallel */ distinct k 
      from connectionsVT$;

company
show
occupation
type
team
religion
criminal charge
music genre
genre
name
role
political party
country

13 rows selected.

SQL> -- get the set of property keys used for edges
SQL> select /*+ parallel */ distinct k 
       from connectionsGE$;

weight

Example 4-5 Find Vertices with a Value

This example finds vertices with a value (of any property) that is of String type, and where and the value contains two adjacent occurrences of a, e, i, o, or u, regardless of case.n the connections graph.

SQL> select vid, t, k, v 
       from connectionsVT$ 
      where t=1 
        and regexp_like(v, '([aeiou])\1', 'i');

     6        1  name  Jordan Peele 
     6        1  show  Key and Peele
    54        1  name  John Green
        ...

It is usually hard to leverage a B-Tree index for the preceding kind of query because it is difficult to know beforehand what kind of regular expression is going to be used. For the above query, you might get the following execution plan. Note that full table scan is chosen by the optimizer.

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |    15 |   795 |    28    (0)| 00:00:01 |       |       |        |      |           |
|   1 |  PX COORDINATOR      |                |       |       |             |          |       |       |        |      |           |
|   2 |   PX SEND QC (RANDOM)| :TQ10000       |    15 |   795 |    28    (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND) |
|   3 |    PX BLOCK ITERATOR |                |    15 |   795 |    28    (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |           |
|*  4 |     TABLE ACCESS FULL| CONNECTIONSVT$ |    15 |   795 |    28    (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |           |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(INTERNAL_FUNCTION("V") AND  REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
   - Degree of Parallelism is 2 because of table property

If the Oracle Database In-Memory option is available and memory is sufficient, it can help performance to place the table (full table or a set of relevant columns) in memory. One way to achieve that is as follows:

SQL> alter table connectionsVT$ inmemory;
Table altered.

Now, entering the same SQL containing the regular expression shows a plan that performs a "TABLE ACCESS INMEMORY FULL".

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)  | Time     | Pstart| Pstop |    TQ    |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |    15 |   795 |    28     (0)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR               |                |       |       |              |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)         | :TQ10000       |    15 |   795 |    28     (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR          |                |    15 |   795 |    28     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS INMEMORY FULL| CONNECTIONSVT$ |    15 |   795 |    28     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(INTERNAL_FUNCTION("V") AND  REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
   - Degree of Parallelism is 2 because of table property

Parent topic: SQL-Based Property Graph Query and Analytics

4.2 Text Queries on Property Graphs

If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column.

Oracle Text can process text that is directly stored in the database. The text can be short strings (such as names or addresses), or it can be full-length documents. These documents can be in a variety of textual format.

The text can also be in many different languages. Oracle Text can handle any space-separated languages (including character sets such as Greek or Cyrillic). In addition, Oracle Text is able to handle the Chinese, Japanese and Korean pictographic languages)

Because the property graph feature uses NVARCHAR typed column for better support of Unicode, it is highly recommended that UTF8 (AL32UTF8) be used as the database character set.

To create an Oracle Text index on the vertices table (or edges table), the ALTER SESSION privilege is required. For example:

SQL> grant alter session to <YOUR_USER_SCHEMA_HERE>;

If customization is required, also grant the EXECUTE privilege on CTX_DDL:

SQL> grant execute on ctx_ddl to <YOUR_USER_SCHEMA_HERE>;

The following shows some example statements for granting these privileges to SCOTT.

SQL> conn / as sysdba
Connected.
SQL> -- This is a PDB setup -- 
SQL> alter session set container=orcl;
Session altered.

SQL> grant execute on ctx_ddl to scott;
Grant succeeded.

SQL> grant alter session to scott;
Grant succeeded.

Example 4-6 Create a Text Index

This example creates an Oracle Text index on the vertices table (V column) of the connections graph in the SCOTT schema. Note that the Oracle Text index created here is for all property keys, not just one or a subset of property keys. In addition, if a new property is added to the graph and the property value is of String data type, then it will automatically be included in the same text index.

The example uses the OPG_AUTO_LEXER lexer owned by MDSYS.

SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'MDSYS', lexer=>'OPG_AUTO_LEXER', dop=>2);

If customization is desired, you can use the ctx_ddl.create_preference API. For example:

SQL> -- The following requires access privilege to CTX_DDL
SQL> exec ctx_ddl.create_preference('SCOTT.OPG_AUTO_LEXER', 'AUTO_LEXER');

PL/SQL procedure successfully completed.

SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'scott', lexer=>'OPG_AUTO_LEXER', dop=>2);

PL/SQL procedure successfully completed.

You can now use a rich set of functions provided by Oracle Text to perform queries against graph elements.

Note:

If you no longer need an Oracle Text index, you can use the drop_vertices_text_idx or opg_apis.drop_edges_text_idx API to drop it. The following statements drop the text indexes on the vertices and edges of a graph named connections owned by SCOTT:

SQL> exec opg_apis.drop_vertices_text_Idx('scott', 'connections');
SQL> exec opg_apis.drop_edges_text_Idx('scott', 'connections');

Example 4-7 Find a Vertex that Has a Property Value

The following example find a vertex that has a property value (of string type) containing the keyword "Smith".

SQL> select vid, k, t, v 
       from connectionsVT$ 
      where t=1 
        and contains(v, 'Smith', 1) > 0 
      order by score(1) desc
      ;

The output and SQL execution plan from the preceding statement may appear as follows. Note that DOMAIN INDEX appears as an operation in the execution plan.

     1  name    1  Robert Smith

Execution Plan
----------------------------------------------------------
Plan hash value: 1619508090

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |     1 |    56 |     5  (20) | 00:00:01 |       |       |
|   1 |  SORT ORDER BY                      |                 |     1 |    56 |     5  (20) | 00:00:01 |       |       |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$  |     1 |    56 |     4    (0)| 00:00:01 | ROWID | ROWID |
|*  3 |    DOMAIN INDEX                     | CONNECTIONSXTV$ |       |       |     4    (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T"=1 AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
   3 - access("CTXSYS"."CONTAINS"("V",'Smith',1)>0)

Example 4-8 Fuzzy Match

The following example finds a vertex that has a property value (of string type) containing variants of "ameriian" (a deliberate misspelling for this example) Fuzzy match is used.

SQL> select vid, k, t, v 
       from connectionsVT$ 
      where contains(v, 'fuzzy(ameriian,,,weight)', 1) > 0 
      order by score(1) desc;

The output and SQL execution plan from the preceding statement may appear as follows.

     8 role      1  american business man
     9 role      1  american business man
     4 role      1  american economist
     6 role      1  american comedian actor
     7 role      1  american comedian actor
     1 occupation 1 44th president of United States of America

6 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1619508090

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |     1 |    56 |     5  (20)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                      |                 |     1 |    56 |     5  (20)| 00:00:01 |       |       |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$  |     1 |    56 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  3 |    DOMAIN INDEX                     | CONNECTIONSXTV$ |       |       |     4   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))

Example 4-9 Query Relaxation

The following example is a sophisticated Oracle Text query that implements query relaxation, which enables you to execute the most restrictive version of a query first, progressively relaxing the query until the required number of matches is obtained. Using query relaxation with queries that contain multiple strings, you can provide guidance for determining the “best” matches, so that these appear earlier in the results than other potential matches.

This example searchs for "american actor" with a query relaxation sequence.

SQL> select vid, k, t, v  
       from connectionsVT$ 
      where CONTAINS (v,
 '<query>
   <textquery lang="ENGLISH" grammar="CONTEXT">
     <progression>
       <seq>{american} {actor}</seq>
       <seq>{american} NEAR {actor}</seq>
       <seq>{american} AND {actor}</seq>
       <seq>{american} ACCUM {actor}</seq>
     </progression>
   </textquery>
   <score datatype="INTEGER" algorithm="COUNT"/>
  </query>') > 0;

The output and SQL execution plan from the preceding statement may appear as follows.

     7 role       1 american comedian actor
     6 role       1 american comedian actor
    44 occupation 1 actor
     8 role       1 american business man
    53 occupation 1 actor film producer
    52 occupation 1 actor
     4 role       1 american economist
    47 occupation 1 actor
     9 role       1 american business man

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2158361449

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |       1 |      56 |       4   (0)| 00:00:01 |         |         |
|*  1 |  TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$  |       1 |      56 |       4   (0)| 00:00:01 | ROWID | ROWID |
|*  2 |   DOMAIN INDEX                     | CONNECTIONSXTV$ |         |         |       4   (0)| 00:00:01 |         |         |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
   2 - access("CTXSYS"."CONTAINS"("V",'<query>      <textquery lang="ENGLISH" grammar="CONTEXT">
          <progression>       <seq>{american} {actor}</seq>    <seq>{american} NEAR {actor}</seq>
          <seq>{american} AND {actor}</seq>        <seq>{american} ACCUM {actor}</seq>    </progression>
          </textquery>    <score datatype="INTEGER" algorithm="COUNT"/> </query>')>0)

Example 4-10 Find an Edge

Just as with vertices, you can create an Oracle Text index on the V column of the edges table (GE$) of a property graph. The following example uses the OPG_AUTO_LEXER lexer owned by MDSYS.

SQL> exec opg_apis.create_edges_text_idx('scott', 'connections', pref_owner=>'mdsys', lexer=>'OPG_AUTO_LEXER', dop=>4);

If customization is required, use the ctx_ddl.create_preference API.

Parent topic: SQL-Based Property Graph Query and Analytics

4.3 Navigation and Graph Pattern Matching

A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends.

This topic provides graph navigation and pattern matching examples using the example property graph named connections. Most of the SQL statements are relatively simple, but they can be used as building blocks to implement requirements that are more sophisticated. It is generally best to start from something simple, and progressively add complexity.

Example 4-11 Who Are a Person's Collaborators?

The following SQL ststement finds all entities that a vertex with ID 1 collaborates with. For simplicity, it considers only outgoing relationships.

SQL> select dvid, el, k, vn, v 
       from connectionsGE$ 
      where svid=1 
        and el='collaborates';

Note:

To find the specific vertex ID of interest, you can perform a text query on the property graph using keywords or fuzzy matching. (For details and examples, see Text Queries on Property Graphs.)

The preceding example's output and execution plan may be as follows.

     2 collaborates weight 1 1
    21 collaborates weight 1 1
    22 collaborates weight 1 1
      ....
    26 collaborates weight 1 1


10 rows selected.


-------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name            | Rows  | Bytes | Cost (%CPU)| Time      | Pstart| Pstop |    TQ    |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |    10 |   460 |     2     (0)| 00:00:01 |       |       |        |      |        |
|   1 |  PX COORDINATOR                              |                 |       |       |              |          |       |       |        |      |        |
|   2 |   PX SEND QC (RANDOM)                        | :TQ10000        |    10 |   460 |     2     (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL                     |                 |    10 |   460 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |        |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$  |    10 |   460 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
|*  5 |      INDEX RANGE SCAN                        | CONNECTIONSXSE$ |    20 |       |     1     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates' AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
   5 - access("SVID"=1)

Example 4-12 Who Are a Person's Collaborators and What are Their Occupations?

The following SQL statement finds collaborators of the vertex with ID 1, and the occupation of each collaborator. A join with the vertices table (VT$) is required.

SQL> select dvid, vertices.v 
       from connectionsGE$, connectionsVT$ vertices 
      where svid=1 
        and el='collaborates' 
        and dvid=vertices.vid 
        and vertices.k='occupation';

The preceding example's output and execution plan may be as follows.

    21  67th United States Secretary of State
    22  68th United States Secretary of State
    23  chancellor
    28  7th president of Iran
    19  junior United States Senator from New York
...


--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |                 |     7 |   525 |     7   (0)| 00:00:01 |       |       |     |    |         |
|   1 |  PX COORDINATOR                               |                 |       |       |            |          |       |       |     |    |         |
|   2 |   PX SEND QC (RANDOM)                         | :TQ10000        |     7 |   525 |     7   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    NESTED LOOPS                               |                 |     7 |   525 |     7   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |         |
|   4 |     PX PARTITION HASH ALL                     |                 |    10 |   250 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |         |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$  |    10 |   250 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |         |
|*  6 |       INDEX RANGE SCAN                        | CONNECTIONSXSE$ |    20 |       |     1   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |         |
|   7 |     PARTITION HASH ITERATOR                   |                 |     1 |       |     0   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |         |
|*  8 |      TABLE ACCESS BY LOCAL INDEX ROWID        | CONNECTIONSVT$  |       |       |            |          |   KEY |   KEY |  Q1,00 | PCWP |         |
|*  9 |       INDEX UNIQUE SCAN                       | CONNECTIONSXQV$ |     1 |       |     0   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates')
   6 - access("SVID"=1)
   8 - filter(INTERNAL_FUNCTION("VERTICES"."V"))
   9 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'occupation')
       filter(INTERNAL_FUNCTION("VERTICES"."K"))

Example 4-13 Find a Person's Enemies and Aggregate Them by Their Country

The following SQL statement finds enemies (that is, those with the feuds relationship) of the vertex with ID 1, and aggregates them by their countries. A join with the vertices table (VT$) is required.

SQL>   select vertices.v, count(1) 
         from connectionsGE$, connectionsVT$ vertices 
        where svid=1 
          and el='feuds'  
          and dvid=vertices.vid 
          and vertices.k='country' 
     group by vertices.v;

The example's output and execution plan may be as follows. In this case, the vertex with ID 1 has 3 enemies in the United States and 1 in Russia.

United States      3
Russia             1


------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name            | Rows   | Bytes  | Cost (%CPU)| Time      | Pstart| Pstop |     TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |        |      |         |
|   1 |  PX COORDINATOR                               |                 |        |        |             |          |        |        |        |      |         |
|   2 |   PX SEND QC (RANDOM)                         | :TQ10001        |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                              |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,01 | PCWP |         |
|   4 |     PX RECEIVE                                |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,01 | PCWP |         |
|   5 |      PX SEND HASH                             | :TQ10000        |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,00 | P->P | HASH     |
|   6 |       HASH GROUP BY                           |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,00 | PCWP |         |
|   7 |        NESTED LOOPS                           |                 |      5 |    375 |      4   (0)| 00:00:01 |        |        |  Q1,00 | PCWP |         |
|   8 |     PX PARTITION HASH ALL                     |                 |      5 |    125 |      2   (0)| 00:00:01 |      1 |      8 |  Q1,00 | PCWC |         |
|*  9 |      TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$  |      5 |    125 |      2   (0)| 00:00:01 |      1 |      8 |  Q1,00 | PCWP |         |
|* 10 |       INDEX RANGE SCAN                        | CONNECTIONSXSE$ |     20 |        |      1   (0)| 00:00:01 |      1 |      8 |  Q1,00 | PCWP |         |
|  11 |     PARTITION HASH ITERATOR                   |                 |      1 |        |      0   (0)| 00:00:01 |    KEY |    KEY |  Q1,00 | PCWP |         |
|* 12 |      TABLE ACCESS BY LOCAL INDEX ROWID        | CONNECTIONSVT$  |        |        |             |          |    KEY |    KEY |  Q1,00 | PCWP |         |
|* 13 |       INDEX UNIQUE SCAN                       | CONNECTIONSXQV$ |      1 |        |      0   (0)| 00:00:01 |    KEY |    KEY |  Q1,00 | PCWP |         |
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   9 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'feuds')
  10 - access("SVID"=1)
  12 - filter(INTERNAL_FUNCTION("VERTICES"."V"))
  13 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country')
       filter(INTERNAL_FUNCTION("VERTICES"."K"))

Example 4-14 Find a Person's Collaborators, and aggregate and sort them

The following SQL statement finds the collaborators of the vertex with ID 1, aggregates them by their country, and sorts them in ascending order.

SQL> select vertices.v, count(1) 
      from connectionsGE$, connectionsVT$ vertices 
     where svid=1 
       and el='collaborates' 
       and dvid=vertices.vid 
       and vertices.k='country' 
  group by vertices.v 
  order by count(1) asc;

The example output and execution plan may be as follows. In this case, the vertex with ID 1 has the most collaborators in the United States.

Germany        1
Japan          1
Iran           1
United States  7


---------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                         | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ    |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                  |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |        |      |        |
|   1 |  PX COORDINATOR                                   |                 |       |       |              |          |       |       |        |      |        |
|   2 |   PX SEND QC (ORDER)                              | :TQ10002        |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,02 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY                                  |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,02 | PCWP |        |
|   4 |     PX RECEIVE                                    |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,02 | PCWP |        |
|   5 |      PX SEND RANGE                                | :TQ10001        |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,01 | P->P | RANGE  |
|   6 |       HASH GROUP BY                               |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,01 | PCWP |        |
|   7 |        PX RECEIVE                                 |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,01 | PCWP |        |
|   8 |     PX SEND HASH                                  | :TQ10000        |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,00 | P->P | HASH   |
|   9 |      HASH GROUP BY                                |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,00 | PCWP |        |
|  10 |       NESTED LOOPS                                |                 |    10 |   750 |     7     (0)| 00:00:01 |       |       |  Q1,00 | PCWP |        |
|  11 |        PX PARTITION HASH ALL                      |                 |    10 |   250 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |        |
|* 12 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | CONNECTIONSGE$  |    10 |   250 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
|* 13 |          INDEX RANGE SCAN                         | CONNECTIONSXSE$ |    20 |       |     1     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
|  14 |        PARTITION HASH ITERATOR                    |                 |     1 |       |     0     (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |        |
|* 15 |         TABLE ACCESS BY LOCAL INDEX ROWID         | CONNECTIONSVT$  |       |       |        |                |   KEY |   KEY |  Q1,00 | PCWP |        |
|* 16 |          INDEX UNIQUE SCAN                        | CONNECTIONSXQV$ |     1 |       |     0     (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |        |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  12 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates')
  13 - access("SVID"=1)
  15 - filter(INTERNAL_FUNCTION("VERTICES"."V"))
  16 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country')
       filter(INTERNAL_FUNCTION("VERTICES"."K"))

Parent topic: SQL-Based Property Graph Query and Analytics

4.4 Navigation Options: CONNECT BY and Parallel Recursion

The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying.

CONNECT BY lets you navigate and find matches in a hierarchical order. To follow outgoing edges, you can use prior dvid = svid to guide the navigation.
Parallel recursion lets you perform navigation up to a specified number of hops away.

The examples use a property graph named connections.

Example 4-15 CONNECT WITH

The following SQL statement follows the outgoing edges by 1 hop.

SQL> select G.dvid
       from connectionsGE$ G
      start with svid = 1
    connect by nocycle prior dvid = svid and level <= 1;

The preceding example's output and execution plan may be as follows.

         2
         3
         4
         5
         6
         7
         8
         9
        10
        ...
------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |     7 |   273 |     3  (67)| 00:00:01 |       |       |        |      |            |
|*  1 |  CONNECT BY WITH FILTERING|                 |       |       |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR          |                 |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000        |     2 |    12 |     0   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX PARTITION HASH ALL |                 |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|*  5 |      INDEX RANGE SCAN     | CONNECTIONSXSE$ |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
|*  6 |   FILTER                  |                 |       |       |            |          |       |       |        |      |            |
|   7 |    NESTED LOOPS           |                 |     5 |    95 |     1   (0)| 00:00:01 |       |       |        |      |            |
|   8 |     CONNECT BY PUMP       |                 |       |       |            |          |       |       |        |      |            |
|   9 |     PARTITION HASH ALL    |                 |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |        |      |            |
|* 10 |      INDEX RANGE SCAN     | CONNECTIONSXSE$ |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |        |      |            |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("SVID"=PRIOR "DVID")
       filter(LEVEL<=2)
   5 - access("SVID"=1)
   6 - filter(LEVEL<=2)
  10 - access("connect$_by$_pump$_002"."prior dvid "="SVID")

To extend from 1 hop to multiple hops, change 1 in the preceding example to another integer. For example, to change it to 2 hops, specify: level <= 2

Example 4-16 Parallel Recursion

The following SQL statement uses recursion within the WITH clause to perform navigation up to 4 hops away, a using recursively defined graph expansion: g_exp references g_exp in the query, and that defines the recursion. The example also uses the PARALLEL optimizer hint for parallel execution.

SQL> WITH g_exp(svid, dvid, depth) as
  ( 
    select svid as svid, dvid as dvid, 0 as depth
      from connectionsGE$
     where svid=1
   union all
     select g2.svid,  g1.dvid, g2.depth + 1
       from g_exp g2, connectionsGE$ g1
      where g2.dvid=g1.svid
        and g2.depth <= 3
  )   
select  /*+ parallel(4) */ dvid, depth
  from  g_exp
 where svid=1
;

The example's output and execution plan may be as follows. Note that CURSOR DURATION MEMORY is chosen in the execution, which indicates the graph expansion stores the intermediate data in memory.

        22          4
        25          4
        24          4
         1          4

        23          4
        33          4
        22          4
        22          4
       ...         ...


Execution Plan
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                            |   801 | 31239 |   147   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  TEMP TABLE TRANSFORMATION                   |                            |       |       |            |          |       |       |        |      |            |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)    | SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |        |      |            |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                            |       |       |            |          |       |       |        |      |            |
|   4 |     PX COORDINATOR                           |                            |       |       |            |          |       |       |        |      |            |
|   5 |      PX SEND QC (RANDOM)                     | :TQ20000                   |     2 |    12 |     0   (0)| 00:00:01 |       |       |  Q2,00 | P->S | QC (RAND)  |
|   6 |       LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |  Q2,00 | PCWP |            |
|   7 |        PX PARTITION HASH ALL                 |                            |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q2,00 | PCWC |            |
|*  8 |         INDEX RANGE SCAN                     | CONNECTIONSXSE$            |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q2,00 | PCWP |            |
|   9 |     PX COORDINATOR                           |                            |       |       |            |          |       |       |        |      |            |
|  10 |      PX SEND QC (RANDOM)                     | :TQ10000                   |   799 |    12M|    12   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|  11 |       LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|* 12 |        HASH JOIN                             |                            |   799 |    12M|    12   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|  13 |         BUFFER SORT (REUSE)                  |                            |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|  14 |          PARTITION HASH ALL                  |                            |   164 |   984 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|  15 |           INDEX FAST FULL SCAN               | CONNECTIONSXDE$            |   164 |   984 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
|  16 |         PX BLOCK ITERATOR                    |                            |       |       |            |          |       |       |  Q1,00 | PCWC |            |
|* 17 |          TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|  18 |   PX COORDINATOR                             |                            |       |       |            |          |       |       |        |      |            |
|  19 |    PX SEND QC (RANDOM)                       | :TQ30000                   |   801 | 31239 |   135   (0)| 00:00:01 |       |       |  Q3,00 | P->S | QC (RAND)  |
|* 20 |     VIEW                                     |                            |   801 | 31239 |   135   (0)| 00:00:01 |       |       |  Q3,00 | PCWP |            |
|  21 |      PX BLOCK ITERATOR                       |                            |   801 |    12M|   135   (0)| 00:00:01 |       |       |  Q3,00 | PCWC |            |
|  22 |       TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6614_11CB2D2 |   801 |    12M|   135   (0)| 00:00:01 |       |       |  Q3,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("SVID"=1)
  12 - access("G2"."DVID"="G1"."SVID")
  17 - filter("G2"."INTERNAL_ITERS$"=LEVEL AND "G2"."DEPTH"<=3)
  20 - filter("SVID"=1)

Parent topic: SQL-Based Property Graph Query and Analytics

4.5 Pivot

The PIVOT clause lets you dynamically add columns to a table to create a new table.

The schema design (VT$ and GE$) of the property graph is narrow ("skinny") rather than wide ("fat"). This means that if a vertex or edge has multiple properties, those property keys, values, data types, and so on will be stored using multiple rows instead of multiple columns. Such a design is very flexible in the sense that you can add properties dynamically without having to worry about adding too many columns or even reaching the physical maximum limit of number of columns a table may have. However, for some applications you may prefer to have a wide table if the properties are somewhat homogeneous.

Example 4-17 Pivot

The following CREATE TABLE ... AS SELECT statement uses PIVOT to add four columns: ‘company’,’ occupation’,’ name’, and ‘religion’.

SQL> CREATE TABLE table pg_wide
as
 with G AS (select vid, k, t, v
              from connectionsVT$
           )
 select *
   from G
  pivot (
    min(v) for k in ('company', 'occupation', 'name', 'religion')
  );

Table created.

The following DESCRIBE statement shows the definition of the new table, including the four added columns. (The output is reformatted for readability.)

SQL> DESCRIBE pg_wide;
 Name                                                  Null?    Type
--------------------------------------------------- -------- --------------------
 VID                                                   NOT NULL NUMBER
 T                                                              NUMBER(38)
 'company'                                                      NVARCHAR2(15000)
 'occupation'                                                   NVARCHAR2(15000)
 'name'                                                         NVARCHAR2(15000)
 'religion'                                                     NVARCHAR2(15000)

Parent topic: SQL-Based Property Graph Query and Analytics

4.6 SQL-Based Property Graph Analytics

In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.

The benefits of SQL-based analytics are:

Easier analysis of larger graphs that do not fit in physical memory
Cheaper analysis since no graph data is transferred outside the database
Better analysis using the current state of a property graph database
Simpler analysis by eliminating the step of synchronizing an in-memory graph with the latest updates from the graph database

However, when a graph (or a subgraph) fits in memory, then running analytics provided by the in-memory analyst usually provides better performance than using SQL-based analytics.

Because many of the analytics implementation require using intermediate data structures, most SQL- (and PL/SQL-) based analytics APIs have parameters for working tables (wt). A typical flow has the following steps:

Prepare the working table or tables.
Perform analytics (one or multiple calls).
Perform cleanup

The following subtopics provide SQL-based examples of some popular types of property graph analytics.

Parent topic: SQL-Based Property Graph Query and Analytics

4.6.1 Shortest Path Examples

The following examples demonstrate SQL-based shortest path analytics.

Example 4-18 Shortest Path Setup and Computation

Consider shortest path, for example. Internally, Oracle Database uses the bidirectional Dijkstra algorithm. The following code snippet shows an entire prepare, perform, and cleanup workflow.

set serveroutput on
  
DECLARE   
  wt1 varchar2(100);  -- intermediate working tables
  n number;
  path    varchar2(1000);
  weights varchar2(1000);
BEGIN
  -- prepare
  opg_apis.find_sp_prep('connectionsGE$', wt1);
  dbms_output.put_line('working table name    ' || wt1);

  -- compute
  opg_apis.find_sp(
     'connectionsGE$',
      1,                          -- start vertex ID
      53,                         -- destination vertex ID
      wt1,                        -- working table (for Dijkstra expansion)
      dop => 1,                   -- degree of parallelism
      stats_freq=>1000,           -- frequency to collect statistics
      path_output => path,        -- shortest path (a sequence of vertices)
      weights_output => weights,  -- edge weights
      options => null
      );
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- cleanup (commented out here; see text after the example)
  -- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/

This example may produce the following output. Note that if no working table name is provided, the preparation step will automatically generate a temporary table name and create it. Because the temporary working table name uses the session ID, your output will probably be different.

working table name    "CONNECTIONSGE$$TWFS12"
path    1 3    52 53
weights 4 3 1    1 1

PL/SQL procedure successfully completed.

If you want to know the definition of the working table or tables, then skip the cleanup phase (as shown in the preceding example that comments out the call to find_sp_cleanup). After the computation is done, you can describe the working table or tables.

SQL> describe "CONNECTIONSGE$$TWFS12"
 Name                Null?    Type
 --------- -------- ----------------------------
 NID                            NUMBER
 D2S                            NUMBER
 P2S                            NUMBER
 D2T                            NUMBER
 P2T                            NUMBER
 F                            NUMBER(38)
 B                            NUMBER(38)

For advanced users who want to try different table creation options, such as using in-memory or advanced compression, you can pre-create the preceding working table and pass the name in.

Example 4-19 Shortest Path: Create Working Table and Perform Analytics

The following statements show some advanced options, first creating a working table with the same column structure and basic compression enabled, then passing it to the SQL-based computation. The code optimizes the intermediate table for computations with CREATE TABLE compression and in-memory options.

create table connections$MY_EXP(
 NID                            NUMBER,
 D2S                            NUMBER,
 P2S                            NUMBER,
 D2T                            NUMBER,
 P2T                            NUMBER,
 F                            NUMBER(38),
 B                            NUMBER(38)
) compress nologging;


DECLARE
  wt1 varchar2(100) := 'connections$MY_EXP';
  n number;
  path    varchar2(1000);
  weights varchar2(1000);
BEGIN
  dbms_output.put_line('working table name    ' || wt1);

  -- compute
  opg_apis.find_sp(
     'connectionsGE$',
      1,
      53,
      wt1,
      dop => 1,
      stats_freq=>1000,
      path_output => path,
      weights_output => weights,
      options => null
      );
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- cleanup
  -- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/

At the end of the computation, if the working table has not been dropped or truncated, you can check the content of the working table, as follows. Note that the working table structure may vary between releases.

SQL> select * from connections$MY_EXP;
       NID        D2S        P2S        D2T        P2T          F          B
---------- ---------- ---------- ---------- ---------- ---------- ----------
         1          0            1.000E+100                     1         -1
        53 1.000E+100                     0                    -1          1
        54 1.000E+100                     1         53         -1          1
        52 1.000E+100                     1         53         -1          1
         5          1          1 1.000E+100                     0         -1
        26          1          1 1.000E+100                     0         -1
         8       1000          1 1.000E+100                     0         -1
         3          1          1          2         52          0          0
        15          1          1 1.000E+100                     0         -1
        21          1          1 1.000E+100                     0         -1
        19          1          1 1.000E+100                     0         -1
       ...

Example 4-20 Shortest Path: Perform Multiple Calls to Same Graph

To perform multiple calls to the same graph, only a single call to the preparation step is needed. The following shows an example of computing shortest path for multiple pairs of vertices in the same graph.

DECLARE
  wt1 varchar2(100);  -- intermediate working tables
  n number;
  path    varchar2(1000);
  weights varchar2(1000);
BEGIN
  -- prepare
  opg_apis.find_sp_prep('connectionsGE$', wt1);
  dbms_output.put_line('working table name    ' || wt1);

  -- find shortest path from vertex 1 to vertex 53
  opg_apis.find_sp( 'connectionsGE$', 1, 53,
      wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- find shortest path from vertex 2 to vertex 36
  opg_apis.find_sp( 'connectionsGE$', 2, 36,
      wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- find shortest path from vertex 30 to vertex 4
  opg_apis.find_sp( 'connectionsGE$', 30, 4,
      wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- cleanup
  opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/

The example's output may be as follows: three shortest paths have been found for the multiple pairs of vertices provided.

working table name    "CONNECTIONSGE$$TWFS12"
path    1 3    52 53
weights 4 3 1   1 1
path    2    36
weights 2 1   1
path    30 21    1 4
weights 4 3 1   1 1

PL/SQL procedure successfully completed.

Parent topic: SQL-Based Property Graph Analytics

4.6.2 Collaborative Filtering Overview and Examples

Collaborative filtering, also referred to as social filtering, filters information by using the recommendations of other people. Collaborative filtering is widely used in systems that recommend purchases based on purchases by others with similar preferences.

The following examples demonstrate SQL-based collaborative filtering analytics.

Example 4-21 Collaborative Filtering Setup and Computation

This example shows how to use SQL-based collaborative filtering, specifically using matrix factorization to recommend telephone brands to customers. This example assumes there exists a graph called "PHONES" in the database. This example graph contains customer and item vertices, and edges with a 'rating' label linking some customer vertices to other some item vertices. The rating labels have a numeric value corresponding to the rating that a specific customer (edge OUT vertex) assigned to the specified product (edge IN vertex).

The following figure shows this graph.

Figure 4-1 Phones Graph for Collaborative Filtering

Description of "Figure 4-1 Phones Graph for Collaborative Filtering"

The following code shows an end-to-end flow to run the SQL-based collaborative filtering algorithm, which internally uses the matrix factorization algorithm.

set serveroutput on

DECLARE
  wt_l varchar2(32);  -- working tables
  wt_r varchar2(32);
  wt_l1 varchar2(32);
  wt_r1 varchar2(32);
  wt_i varchar2(32);
  wt_ld varchar2(32);
  wt_rd varchar2(32);
  edge_tab_name    varchar2(32) := 'phonesge$';
  edge_label       varchar2(32) := 'rating';
  rating_property  varchar2(32) := '';
  iterations       integer      := 100;
  min_error        number       := 0.001;
  k                integer      := 5;
  learning_rate    number       := 0.001;
  decrease_rate    number       := 0.95;
  regularization   number       := 0.02;
  dop              number       := 2;
  tablespace       varchar2(32) := null;
  options          varchar2(32) := null; 
BEGIN

  -- prepare
  opg_apis.cf_prep(edge_tab_name,wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd);
  dbms_output.put_line('working table wt_l ' || wt_l);
  dbms_output.put_line('working table wt_r ' || wt_r);
  dbms_output.put_line('working table wt_l1 ' || wt_l1);
  dbms_output.put_line('working table wt_r1 ' || wt_r1);
  dbms_output.put_line('working table wt_i ' || wt_i);
  dbms_output.put_line('working table wt_ld ' || wt_ld);
  dbms_output.put_line('working table wt_rd ' || wt_rd);

  -- compute
  opg_apis.cf(edge_tab_name,edge_label,rating_property,iterations,
              min_error,k,learning_rate,decrease_rate,regularization,dop,
              wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd,tablespace,options);
END;
/

The flow starts by creating the temporary working tables that are later on passed to the computation. At the end of computation, the example may produce the following output. Note that if no working table name is provided, the preparation step will automatically generate a temporary table name and create it. Because the temporary working table name uses the session ID, your output will probably be different.

working table wt_l    "PHONESGE$$CFL57"
working table wt_r    "PHONESGE$$CFR57"
working table wt_l1    "PHONESGE$$CFL157"
working table wt_r1    "PHONESGE$$CFR157"
working table wt_i    "PHONESGE$$CFI57"
working table wt_ld    "PHONESGE$$CFLD57"
working table wt_rd    "PHONESGE$$CFRD57"

PL/SQL procedure successfully completed.

Example 4-22 Collaborative Filtering: Validating the Intermediate Error

At the end of every computation, you can check the current error of the algorithm with the following query as long as the data in the working tables has not been already deleted. The following SQL query illustrates how to get the intermediate error of a current run of the collaborative filtering algorithm.

SELECT /*+ parallel(48) */ SQRT(SUM((w1-w2)*(w1-w2) + 
              <regularization>/2 * (err_reg_l+err_reg_r))) AS err 
  FROM <wt_i>;

Note that the regularization parameter and the working table name (parameter wt_i) should be replaced according to the values used when running the OPG_APIS.CF algorithm. In the preceding previous example, replace <regularization> with 0.02 and <wt_i> with "PHONESGE$$CFI149" as follows:

SELECT /*+ parallel(48) */ SQRT(SUM((w1-w2)*(w1-w2) + 0.02/2 * (err_reg_l+err_reg_r))) AS err 
  FROM "PHONESGE$$CFI149";

This query may produce the following output.

       ERR
----------
4.82163662

f the value of the current error is too high or if the predictions obtained from the matrix factorization results of the collaborative filtering are not yet useful, you can run more iterations of the algorithm, by reusing the working tables and the progress made so far. The following example shows how to make predictions using the SQL-based collaborative filtering.

Example 4-23 Collaborative Filtering: Making Predictions

The result of the collaborative filtering algorithm is stored in the tables wt_l and wt_r, which are the two factors of a matrix product. These matrix factors should be used when making the predictions of the collaborative filtering.

In a typical flow of the algorithm, the two matrix factors can be used to make the predictions before calling the OPG_APIS.CF_CLEANUP procedure, or they can be copied and persisted into other tables for later use. The following example demonstrates the latter case:

DECLARE
  wt_l varchar2(32);  -- working tables
  wt_r varchar2(32);
  wt_l1 varchar2(32);
  wt_r1 varchar2(32);
  wt_i varchar2(32);
  wt_ld varchar2(32);
  wt_rd varchar2(32);
  edge_tab_name    varchar2(32) := 'phonesge$';
  edge_label       varchar2(32) := 'rating';
  rating_property  varchar2(32) := '';
  iterations       integer      := 100;
  min_error        number       := 0.001;
  k                integer      := 5;
  learning_rate    number       := 0.001;
  decrease_rate    number       := 0.95;
  regularization   number       := 0.02;
  dop              number       := 2;
  tablespace       varchar2(32) := null;
  options          varchar2(32) := null; 
BEGIN

  -- prepare
  opg_apis.cf_prep(edge_tab_name,wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd);

  -- compute
  opg_apis.cf(edge_tab_name,edge_label,rating_property,iterations,
              min_error,k,learning_rate,decrease_rate,regularization,dop,
              wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd,tablespace,options);
  
  -- save only these two tables for later predictions
  EXECUTE IMMEDIATE 'CREATE TABLE customer_mat AS SELECT * FROM ' || wt_l;
  EXECUTE IMMEDIATE 'CREATE TABLE item_mat AS SELECT * FROM ' || wt_r;

  -- cleanup
  opg_apis.cf_cleanup('phonesge$',wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd);
END;
/

This example will produce the only the following output.

PL/SQL procedure successfully completed.

Now that the matrix factors are saved in the tables customer_mat and item_mat, you can use the following query to check the "error" (difference) between the real values (those values that previously existed in the graph as 'ratings') and the estimated predictions (the result of the matrix multiplication in a certain customer row and item column).

Note that the following query is customized with a join on the vertex table in order return an NVARCHAR property of the vertices (for example, the name property) instead of a numeric ID. This query will return all the predictions for every single customer vertex to every item vertex in the graph.

SELECT /*+ parallel(48) */ MIN(vertex1.v) AS customer, 
                           MIN(vertex2.v) AS item, 
                           MIN(edges.vn) AS real, 
                           SUM(l.v * r.v) AS predicted
FROM PHONESGE$ edges, 
      CUSTOMER_MAT l, 
      ITEM_MAT r, 
      PHONESVT$ vertex1,   
      PHONESVT$ vertex2
WHERE l.k = r.k
  AND l.c = edges.svid(+)
  AND r.p = edges.dvid(+)
  AND l.c = vertex1.vid
  AND r.p = vertex2.vid
GROUP BY l.c, r.p
ORDER BY l.c, r.p  -- This order by clause is optional
;

This query may produce an output similar to the following (some rows are omitted for brevity).

CUSTOMER	ITEM		REAL	PREDICTED
------------------------------------------------
Adam		Apple 	 	5	3.67375703
Adam		Blackberry		3.66079652
Adam		Danger			2.77049596
Adam		Ericsson		4.21764858
Adam		Figo			3.10631337
Adam		Google		4 	4.42429022
Adam		Huawei		3	3.4289115
Ben		Apple	  	   	2.82127589
Ben		Blackberry	2	2.81132282
Ben		Danger		3	2.12761307
Ben		Ericsson	3   	3.2389595
Ben		Figo	  	    	2.38550534
Ben		Google		   	3.39765075
Ben		Huawei		    	2.63324582
...
Don		Apple		    	1.3777496
Don		Blackberry	1 	1.37288909
Don		Danger		1 	1.03900439
Don		Ericsson	   	1.58172236
Don		Figo		1	1.16494421
Don		Google			1.65921807
Don		Huawei		1	1.28592648
Erik		Apple		3	2.80809351
Erik		Blackberry	3	2.79818695
Erik		Danger			2.11767182
Erik		Ericsson	3 	3.2238255
Erik		Figo			2.3743591
Erik		Google		3	3.38177526
Erik		Huawei		3	2.62094201

If you want to check only some rows to decide whether the prediction results are ready or more iterations of the algorithm should be run, the previous query can be wrapped in an outer query. The following example will select only the first 11 results.

SELECT /*+ parallel(48) */ * FROM (
SELECT /*+ parallel(48) */ MIN(vertex1.v) AS customer, 
                           MIN(vertex2.v) AS item, 
                           MIN(edges.vn) AS real, 
                           SUM(l.v * r.v) AS predicted
FROM PHONESGE$ edges, 
     CUSTOMER_MAT l, 
     ITEM_MAT r, 
     PHONESVT$ vertex1,   
     PHONESVT$ vertex2
WHERE l.k = r.k
  AND l.c = edges.svid(+)
  AND r.p = edges.dvid(+)
  AND l.c = vertex1.vid
  AND r.p = vertex2.vid
GROUP BY l.c, r.p
ORDER BY l.c, r.p
) WHERE rownum <= 11;

This query may produce an output similar to the following.

CUSTOMER	ITEM		REAL	PREDICTED
------------------------------------------------
Adam		Apple 	 	5	3.67375703
Adam		Blackberry		3.66079652
Adam		Danger			2.77049596
Adam		Ericsson		4.21764858
Adam		Figo			3.10631337
Adam		Google		4 	4.42429022
Adam		Huawei		3	3.4289115
Ben		Apple	  	   	2.82127589
Ben		Blackberry	2	2.81132282
Ben		Danger		3	2.12761307
Ben		Ericsson	3   	3.2389595

To get a prediction for a specific vertex (customer, item, or both) the query can be restricted with the desired ID values. For example, to get the predicted value of vertex 1 (customer) and vertex 105 (item), you can use the following query.

SELECT /*+ parallel(48) */ MIN(vertex1.v) AS customer, 
                           MIN(vertex2.v) AS item, 
                           MIN(edges.vn) AS real, 
                           SUM(l.v * r.v) AS predicted
FROM PHONESGE$ edges, 
     CUSTOMER_MAT l, 
     ITEM_MAT r, 
     PHONESVT$ vertex1,   
     PHONESVT$ vertex2
WHERE l.k = r.k
  AND l.c = edges.svid(+)
  AND r.p = edges.dvid(+)
  AND l.c = vertex1.vid 
  AND vertex1.vid = 1 /* Remove to get all predictions for item 105 */
  AND r.p = vertex2.vid 
  AND vertex2.vid = 105 /* Remove to get all predictions for customer 1 */
                        /* Remove both lines to get all predictions */
GROUP BY l.c, r.p
ORDER BY l.c, r.p;

This query may produce an output similar to the following.

CUSTOMER	ITEM		REAL	PREDICTED
------------------------------------------------
Adam		Ericsson		4.21764858

Parent topic: SQL-Based Property Graph Analytics