4 SQL-Based Property Graph Query and Analytics
You can use SQL to query property graph data in Oracle Spatial and Graph.
For the property graph support in Oracle Spatial and Graph, all the vertices and edges data are persisted in relational form in Oracle Database. For detailed information about the Oracle Spatial and Graph property graph schema objects, see Property Graph Schema Objects for Oracle Database.
This chapter provides examples of typical graph queries implemented using SQL. The audience includes DBAs as well as application developers who understand SQL syntax and property graph schema objects.
The benefits of querying directly property graph using SQL include:
-
There is no need to bring data outside Oracle Database.
-
You can leverage the industry-proven SQL engine provided by Oracle Database.
-
You can easily join or integrate property graph data with other data types (relational, JSON, XML, and so on).
-
You can take advantage of existing Oracle SQL tuning and database management tools and user interface.
The examples assume that there is a property graph named connections
in the current schema. The SQL queries and example output are for illustration purpose only, and your output may be different depending on the data in your connections
graph. In some examples, the output is reformatted for readability.
- Simple Property Graph Queries
The examples in this topic query vertices, edges, and properties of the graph. - Text Queries on Property Graphs
If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column. - Navigation and Graph Pattern Matching
A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends. - Navigation Options: CONNECT BY and Parallel Recursion
The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying. - Pivot
The PIVOT clause lets you dynamically add columns to a table to create a new table. - SQL-Based Property Graph Analytics
In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.
4.1 Simple Property Graph Queries
The examples in this topic query vertices, edges, and properties of the graph.
Example 4-1 Find a Vertex with a Specified Vertex ID
This example find the vertex with vertex ID 1 in the connections
graph.
SQL> select vid, k, v, vn, vt
from connectionsVT$
where vid=1;
The output might be as follows:
1 country United States 1 name Robert Smith 1 occupation CEO of Example Corporation ...
Example 4-2 Find an Edge with a Specified Edge ID
This example find the edge with edge ID 100 in the connections
graph.
SQL> select eid,svid,dvid,k,t,v,vn,vt
from connectionsGE$
where eid=1000;
The output might be as follows:
1000 1 2 weight 3 1 1
In the preceding output, the K of the edge property is "weight" and the type ID of the value is 3, indicating a float value.
Example 4-3 Perform Simple Counting
This example performs simple counting in the connections
graph.
SQL> -- Get the total number of K/V pairs of all the vertices
SQL> select /*+ parallel */ count(1)
from connectionsVT$;
299
SQL> -- Get the total number of K/V pairs of all the edges
SQL> select /*+ parallel(8) */ count(1)
from connectionsGE$;
164
SQL> -- Get the total number of vertices
SQL> select /*+ parallel */ count(distinct vid)
from connectionsVT$;
78
SQL> -- Get the total number of edges
SQL> select /*+ parallel */ count(distinct eid)
from connectionsGE$;
164
Example 4-4 Get the Set of Property Keys Used
This example gets the set of property keys used for the vertices n the connections
graph.
SQL> select /*+ parallel */ distinct k
from connectionsVT$;
company
show
occupation
type
team
religion
criminal charge
music genre
genre
name
role
political party
country
13 rows selected.
SQL> -- get the set of property keys used for edges
SQL> select /*+ parallel */ distinct k
from connectionsGE$;
weight
Example 4-5 Find Vertices with a Value
This example finds vertices with a value (of any property) that is of String type, and where and the value contains two adjacent occurrences of a, e, i, o, or u, regardless of case.n the connections
graph.
SQL> select vid, t, k, v
from connectionsVT$
where t=1
and regexp_like(v, '([aeiou])\1', 'i');
6 1 name Jordan Peele
6 1 show Key and Peele
54 1 name John Green
...
It is usually hard to leverage a B-Tree index for the preceding kind of query because it is difficult to know beforehand what kind of regular expression is going to be used. For the above query, you might get the following execution plan. Note that full table scan is chosen by the optimizer.
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 795 | 28 (0)| 00:00:01 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 15 | 795 | 28 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
|* 4 | TABLE ACCESS FULL| CONNECTIONSVT$ | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(INTERNAL_FUNCTION("V") AND REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
- Degree of Parallelism is 2 because of table property
If the Oracle Database In-Memory option is available and memory is sufficient, it can help performance to place the table (full table or a set of relevant columns) in memory. One way to achieve that is as follows:
SQL> alter table connectionsVT$ inmemory;
Table altered.
Now, entering the same SQL containing the regular expression shows a plan that performs a "TABLE ACCESS INMEMORY FULL".
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 795 | 28 (0)| 00:00:01 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 15 | 795 | 28 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
|* 4 | TABLE ACCESS INMEMORY FULL| CONNECTIONSVT$ | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(INTERNAL_FUNCTION("V") AND REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
- Degree of Parallelism is 2 because of table property
Parent topic: SQL-Based Property Graph Query and Analytics
4.2 Text Queries on Property Graphs
If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column.
Oracle Text can process text that is directly stored in the database. The text can be short strings (such as names or addresses), or it can be full-length documents. These documents can be in a variety of textual format.
The text can also be in many different languages. Oracle Text can handle any space-separated languages (including character sets such as Greek or Cyrillic). In addition, Oracle Text is able to handle the Chinese, Japanese and Korean pictographic languages)
Because the property graph feature uses NVARCHAR typed column for better support of Unicode, it is highly recommended that UTF8 (AL32UTF8) be used as the database character set.
To create an Oracle Text index on the vertices table (or edges table), the ALTER SESSION privilege is required. For example:
SQL> grant alter session to <YOUR_USER_SCHEMA_HERE>;
If customization is required, also grant the EXECUTE privilege on CTX_DDL:
SQL> grant execute on ctx_ddl to <YOUR_USER_SCHEMA_HERE>;
The following shows some example statements for granting these privileges to SCOTT.
SQL> conn / as sysdba
Connected.
SQL> -- This is a PDB setup --
SQL> alter session set container=orcl;
Session altered.
SQL> grant execute on ctx_ddl to scott;
Grant succeeded.
SQL> grant alter session to scott;
Grant succeeded.
Example 4-6 Create a Text Index
This example creates an Oracle Text index on the vertices table (V column) of the connections graph in the SCOTT schema. Note that the Oracle Text index created here is for all property keys, not just one or a subset of property keys. In addition, if a new property is added to the graph and the property value is of String data type, then it will automatically be included in the same text index.
The example uses the OPG_AUTO_LEXER lexer owned by MDSYS.
SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'MDSYS', lexer=>'OPG_AUTO_LEXER', dop=>2);
If customization is desired, you can use the ctx_ddl.create_preference API. For example:
SQL> -- The following requires access privilege to CTX_DDL
SQL> exec ctx_ddl.create_preference('SCOTT.OPG_AUTO_LEXER', 'AUTO_LEXER');
PL/SQL procedure successfully completed.
SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'scott', lexer=>'OPG_AUTO_LEXER', dop=>2);
PL/SQL procedure successfully completed.
You can now use a rich set of functions provided by Oracle Text to perform queries against graph elements.
Note:
If you no longer need an Oracle Text index, you can use the drop_vertices_text_idx or opg_apis.drop_edges_text_idx API to drop it. The following statements drop the text indexes on the vertices and edges of a graph named connections
owned by SCOTT:
SQL> exec opg_apis.drop_vertices_text_Idx('scott', 'connections');
SQL> exec opg_apis.drop_edges_text_Idx('scott', 'connections');
Example 4-7 Find a Vertex that Has a Property Value
The following example find a vertex that has a property value (of string type) containing the keyword "Smith".
SQL> select vid, k, t, v
from connectionsVT$
where t=1
and contains(v, 'Smith', 1) > 0
order by score(1) desc
;
The output and SQL execution plan from the preceding statement may appear as follows. Note that DOMAIN INDEX appears as an operation in the execution plan.
1 name 1 Robert Smith
Execution Plan
----------------------------------------------------------
Plan hash value: 1619508090
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 56 | 5 (20) | 00:00:01 | | |
| 1 | SORT ORDER BY | | 1 | 56 | 5 (20) | 00:00:01 | | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID |
|* 3 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"=1 AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
3 - access("CTXSYS"."CONTAINS"("V",'Smith',1)>0)
Example 4-8 Fuzzy Match
The following example finds a vertex that has a property value (of string type) containing variants of "ameriian" (a deliberate misspelling for this example) Fuzzy match is used.
SQL> select vid, k, t, v
from connectionsVT$
where contains(v, 'fuzzy(ameriian,,,weight)', 1) > 0
order by score(1) desc;
The output and SQL execution plan from the preceding statement may appear as follows.
8 role 1 american business man 9 role 1 american business man 4 role 1 american economist 6 role 1 american comedian actor 7 role 1 american comedian actor 1 occupation 1 44th president of United States of America 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1619508090 ----------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 56 | 5 (20)| 00:00:01 | | | | 1 | SORT ORDER BY | | 1 | 56 | 5 (20)| 00:00:01 | | | |* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 3 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | | ----------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
Example 4-9 Query Relaxation
The following example is a sophisticated Oracle Text query that implements query relaxation, which enables you to execute the most restrictive version of a query first, progressively relaxing the query until the required number of matches is obtained. Using query relaxation with queries that contain multiple strings, you can provide guidance for determining the “best” matches, so that these appear earlier in the results than other potential matches.
This example searchs for "american actor" with a query relaxation sequence.
SQL> select vid, k, t, v
from connectionsVT$
where CONTAINS (v,
'<query>
<textquery lang="ENGLISH" grammar="CONTEXT">
<progression>
<seq>{american} {actor}</seq>
<seq>{american} NEAR {actor}</seq>
<seq>{american} AND {actor}</seq>
<seq>{american} ACCUM {actor}</seq>
</progression>
</textquery>
<score datatype="INTEGER" algorithm="COUNT"/>
</query>') > 0;
The output and SQL execution plan from the preceding statement may appear as follows.
7 role 1 american comedian actor 6 role 1 american comedian actor 44 occupation 1 actor 8 role 1 american business man 53 occupation 1 actor film producer 52 occupation 1 actor 4 role 1 american economist 47 occupation 1 actor 9 role 1 american business man 9 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2158361449 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 56 | 4 (0)| 00:00:01 | | | |* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 2 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V")) 2 - access("CTXSYS"."CONTAINS"("V",'<query> <textquery lang="ENGLISH" grammar="CONTEXT"> <progression> <seq>{american} {actor}</seq> <seq>{american} NEAR {actor}</seq> <seq>{american} AND {actor}</seq> <seq>{american} ACCUM {actor}</seq> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>')>0)
Example 4-10 Find an Edge
Just as with vertices, you can create an Oracle Text index on the V column of the edges table (GE$) of a property graph. The following example uses the OPG_AUTO_LEXER lexer owned by MDSYS.
SQL> exec opg_apis.create_edges_text_idx('scott', 'connections', pref_owner=>'mdsys', lexer=>'OPG_AUTO_LEXER', dop=>4);
If customization is required, use the ctx_ddl.create_preference API.
Parent topic: SQL-Based Property Graph Query and Analytics
4.3 Navigation and Graph Pattern Matching
A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends.
This topic provides graph navigation and pattern matching examples using the example property graph named connections. Most of the SQL statements are relatively simple, but they can be used as building blocks to implement requirements that are more sophisticated. It is generally best to start from something simple, and progressively add complexity.
Example 4-11 Who Are a Person's Collaborators?
The following SQL ststement finds all entities that a vertex with ID 1 collaborates with. For simplicity, it considers only outgoing relationships.
SQL> select dvid, el, k, vn, v
from connectionsGE$
where svid=1
and el='collaborates';
Note:
To find the specific vertex ID of interest, you can perform a text query on the property graph using keywords or fuzzy matching. (For details and examples, see Text Queries on Property Graphs.)
The preceding example's output and execution plan may be as follows.
2 collaborates weight 1 1 21 collaborates weight 1 1 22 collaborates weight 1 1 .... 26 collaborates weight 1 1 10 rows selected. ------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 460 | 2 (0)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 10 | 460 | 2 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 3 | PX PARTITION HASH ALL | | 10 | 460 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 10 | 460 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 5 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates' AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V")) 5 - access("SVID"=1)
Example 4-12 Who Are a Person's Collaborators and What are Their Occupations?
The following SQL statement finds collaborators of the vertex with ID 1, and the occupation of each collaborator. A join with the vertices table (VT$) is required.
SQL> select dvid, vertices.v
from connectionsGE$, connectionsVT$ vertices
where svid=1
and el='collaborates'
and dvid=vertices.vid
and vertices.k='occupation';
The preceding example's output and execution plan may be as follows.
21 67th United States Secretary of State 22 68th United States Secretary of State 23 chancellor 28 7th president of Iran 19 junior United States Senator from New York ... -------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 525 | 7 (0)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 7 | 525 | 7 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 3 | NESTED LOOPS | | 7 | 525 | 7 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 4 | PX PARTITION HASH ALL | | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 5 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 6 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 7 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 8 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 9 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates') 6 - access("SVID"=1) 8 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 9 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'occupation') filter(INTERNAL_FUNCTION("VERTICES"."K"))
Example 4-13 Find a Person's Enemies and Aggregate Them by Their Country
The following SQL statement finds enemies (that is, those with the feuds
relationship) of the vertex with ID 1, and aggregates them by their countries. A join with the vertices table (VT$) is required.
SQL> select vertices.v, count(1)
from connectionsGE$, connectionsVT$ vertices
where svid=1
and el='feuds'
and dvid=vertices.vid
and vertices.k='country'
group by vertices.v;
The example's output and execution plan may be as follows. In this case, the vertex with ID 1 has 3 enemies in the United States and 1 in Russia.
United States 3 Russia 1 ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 5 | 375 | 5 (20)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | P->S | QC (RAND) | | 3 | HASH GROUP BY | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,00 | P->P | HASH | | 6 | HASH GROUP BY | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,00 | PCWP | | | 7 | NESTED LOOPS | | 5 | 375 | 4 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 8 | PX PARTITION HASH ALL | | 5 | 125 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 9 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 5 | 125 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 10 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 11 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 12 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 13 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 9 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'feuds') 10 - access("SVID"=1) 12 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 13 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country') filter(INTERNAL_FUNCTION("VERTICES"."K"))
Example 4-14 Find a Person's Collaborators, and aggregate and sort them
The following SQL statement finds the collaborators of the vertex with ID 1, aggregates them by their country, and sorts them in ascending order.
SQL> select vertices.v, count(1)
from connectionsGE$, connectionsVT$ vertices
where svid=1
and el='collaborates'
and dvid=vertices.vid
and vertices.k='country'
group by vertices.v
order by count(1) asc;
The example output and execution plan may be as follows. In this case, the vertex with ID 1 has the most collaborators in the United States.
Germany 1 Japan 1 Iran 1 United States 7 --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 750 | 9 (23)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (ORDER) | :TQ10002 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | P->S | QC (ORDER) | | 3 | SORT ORDER BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | PCWP | | | 5 | PX SEND RANGE | :TQ10001 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | P->P | RANGE | | 6 | HASH GROUP BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | PCWP | | | 7 | PX RECEIVE | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | PCWP | | | 8 | PX SEND HASH | :TQ10000 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,00 | P->P | HASH | | 9 | HASH GROUP BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,00 | PCWP | | | 10 | NESTED LOOPS | | 10 | 750 | 7 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 11 | PX PARTITION HASH ALL | | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 12 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | CONNECTIONSGE$ | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 13 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 14 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 15 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 16 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 12 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates') 13 - access("SVID"=1) 15 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 16 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country') filter(INTERNAL_FUNCTION("VERTICES"."K"))
Parent topic: SQL-Based Property Graph Query and Analytics
4.4 Navigation Options: CONNECT BY and Parallel Recursion
The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying.
-
CONNECT BY lets you navigate and find matches in a hierarchical order. To follow outgoing edges, you can use prior dvid = svid to guide the navigation.
-
Parallel recursion lets you perform navigation up to a specified number of hops away.
The examples use a property graph named connections.
Example 4-15 CONNECT WITH
The following SQL statement follows the outgoing edges by 1 hop.
SQL> select G.dvid
from connectionsGE$ G
start with svid = 1
connect by nocycle prior dvid = svid and level <= 1;
The preceding example's output and execution plan may be as follows.
2 3 4 5 6 7 8 9 10 ... ------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 7 | 273 | 3 (67)| 00:00:01 | | | | | | |* 1 | CONNECT BY WITH FILTERING| | | | | | | | | | | | 2 | PX COORDINATOR | | | | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 2 | 12 | 0 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 4 | PX PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 5 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 6 | FILTER | | | | | | | | | | | | 7 | NESTED LOOPS | | 5 | 95 | 1 (0)| 00:00:01 | | | | | | | 8 | CONNECT BY PUMP | | | | | | | | | | | | 9 | PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | | | | |* 10 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | | | | ------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("SVID"=PRIOR "DVID") filter(LEVEL<=2) 5 - access("SVID"=1) 6 - filter(LEVEL<=2) 10 - access("connect$_by$_pump$_002"."prior dvid "="SVID")
To extend from 1 hop to multiple hops, change 1 in the preceding example to another integer. For example, to change it to 2 hops, specify: level <= 2
Example 4-16 Parallel Recursion
The following SQL statement uses recursion within the WITH clause to perform navigation up to 4 hops away, a using recursively defined graph expansion: g_exp
references g_exp
in the query, and that defines the recursion. The example also uses the PARALLEL optimizer hint for parallel execution.
SQL> WITH g_exp(svid, dvid, depth) as
(
select svid as svid, dvid as dvid, 0 as depth
from connectionsGE$
where svid=1
union all
select g2.svid, g1.dvid, g2.depth + 1
from g_exp g2, connectionsGE$ g1
where g2.dvid=g1.svid
and g2.depth <= 3
)
select /*+ parallel(4) */ dvid, depth
from g_exp
where svid=1
;
The example's output and execution plan may be as follows. Note that CURSOR DURATION MEMORY
is chosen in the execution, which indicates the graph expansion stores the intermediate data in memory.
22 4
25 4
24 4
1 4
23 4
33 4
22 4
22 4
... ...
Execution Plan
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 801 | 31239 | 147 (0)| 00:00:01 | | | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | | | |
| 3 | UNION ALL (RECURSIVE WITH) BREADTH FIRST | | | | | | | | | | |
| 4 | PX COORDINATOR | | | | | | | | | | |
| 5 | PX SEND QC (RANDOM) | :TQ20000 | 2 | 12 | 0 (0)| 00:00:01 | | | Q2,00 | P->S | QC (RAND) |
| 6 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q2,00 | PCWP | |
| 7 | PX PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q2,00 | PCWC | |
|* 8 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q2,00 | PCWP | |
| 9 | PX COORDINATOR | | | | | | | | | | |
| 10 | PX SEND QC (RANDOM) | :TQ10000 | 799 | 12M| 12 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 11 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q1,00 | PCWP | |
|* 12 | HASH JOIN | | 799 | 12M| 12 (0)| 00:00:01 | | | Q1,00 | PCWP | |
| 13 | BUFFER SORT (REUSE) | | | | | | | | Q1,00 | PCWP | |
| 14 | PARTITION HASH ALL | | 164 | 984 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
| 15 | INDEX FAST FULL SCAN | CONNECTIONSXDE$ | 164 | 984 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
| 16 | PX BLOCK ITERATOR | | | | | | | | Q1,00 | PCWC | |
|* 17 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q1,00 | PCWP | |
| 18 | PX COORDINATOR | | | | | | | | | | |
| 19 | PX SEND QC (RANDOM) | :TQ30000 | 801 | 31239 | 135 (0)| 00:00:01 | | | Q3,00 | P->S | QC (RAND) |
|* 20 | VIEW | | 801 | 31239 | 135 (0)| 00:00:01 | | | Q3,00 | PCWP | |
| 21 | PX BLOCK ITERATOR | | 801 | 12M| 135 (0)| 00:00:01 | | | Q3,00 | PCWC | |
| 22 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6614_11CB2D2 | 801 | 12M| 135 (0)| 00:00:01 | | | Q3,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("SVID"=1)
12 - access("G2"."DVID"="G1"."SVID")
17 - filter("G2"."INTERNAL_ITERS$"=LEVEL AND "G2"."DEPTH"<=3)
20 - filter("SVID"=1)
Parent topic: SQL-Based Property Graph Query and Analytics
4.5 Pivot
The PIVOT clause lets you dynamically add columns to a table to create a new table.
The schema design (VT$ and GE$) of the property graph is narrow ("skinny") rather than wide ("fat"). This means that if a vertex or edge has multiple properties, those property keys, values, data types, and so on will be stored using multiple rows instead of multiple columns. Such a design is very flexible in the sense that you can add properties dynamically without having to worry about adding too many columns or even reaching the physical maximum limit of number of columns a table may have. However, for some applications you may prefer to have a wide table if the properties are somewhat homogeneous.
Example 4-17 Pivot
The following CREATE TABLE ... AS SELECT statement uses PIVOT to add four columns: ‘company’,’ occupation’,’ name’, and ‘religion’.
SQL> CREATE TABLE table pg_wide
as
with G AS (select vid, k, t, v
from connectionsVT$
)
select *
from G
pivot (
min(v) for k in ('company', 'occupation', 'name', 'religion')
);
Table created.
The following DESCRIBE statement shows the definition of the new table, including the four added columns. (The output is reformatted for readability.)
SQL> DESCRIBE pg_wide;
Name Null? Type
--------------------------------------------------- -------- --------------------
VID NOT NULL NUMBER
T NUMBER(38)
'company' NVARCHAR2(15000)
'occupation' NVARCHAR2(15000)
'name' NVARCHAR2(15000)
'religion' NVARCHAR2(15000)
Parent topic: SQL-Based Property Graph Query and Analytics
4.6 SQL-Based Property Graph Analytics
In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.
The benefits of SQL-based analytics are:
-
Easier analysis of larger graphs that do not fit in physical memory
-
Cheaper analysis since no graph data is transferred outside the database
-
Better analysis using the current state of a property graph database
-
Simpler analysis by eliminating the step of synchronizing an in-memory graph with the latest updates from the graph database
However, when a graph (or a subgraph) fits in memory, then running analytics provided by the in-memory analyst usually provides better performance than using SQL-based analytics.
Because many of the analytics implementation require using intermediate data structures, most SQL- (and PL/SQL-) based analytics APIs have parameters for working tables (wt). A typical flow has the following steps:
-
Prepare the working table or tables.
-
Perform analytics (one or multiple calls).
-
Perform cleanup
The following subtopics provide SQL-based examples of some popular types of property graph analytics.
Parent topic: SQL-Based Property Graph Query and Analytics
4.6.1 Shortest Path Examples
The following examples demonstrate SQL-based shortest path analytics.
Example 4-18 Shortest Path Setup and Computation
Consider shortest path, for example. Internally, Oracle Database uses the bidirectional Dijkstra algorithm. The following code snippet shows an entire prepare, perform, and cleanup workflow.
set serveroutput on
DECLARE
wt1 varchar2(100); -- intermediate working tables
n number;
path varchar2(1000);
weights varchar2(1000);
BEGIN
-- prepare
opg_apis.find_sp_prep('connectionsGE$', wt1);
dbms_output.put_line('working table name ' || wt1);
-- compute
opg_apis.find_sp(
'connectionsGE$',
1, -- start vertex ID
53, -- destination vertex ID
wt1, -- working table (for Dijkstra expansion)
dop => 1, -- degree of parallelism
stats_freq=>1000, -- frequency to collect statistics
path_output => path, -- shortest path (a sequence of vertices)
weights_output => weights, -- edge weights
options => null
);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- cleanup (commented out here; see text after the example)
-- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/
This example may produce the following output. Note that if no working table name is provided, the preparation step will automatically generate a temporary table name and create it. Because the temporary working table name uses the session ID, your output will probably be different.
working table name "CONNECTIONSGE$$TWFS12" path 1 3 52 53 weights 4 3 1 1 1 PL/SQL procedure successfully completed.
If you want to know the definition of the working table or tables, then skip the cleanup phase (as shown in the preceding example that comments out the call to find_sp_cleanup
). After the computation is done, you can describe the working table or tables.
SQL> describe "CONNECTIONSGE$$TWFS12"
Name Null? Type
--------- -------- ----------------------------
NID NUMBER
D2S NUMBER
P2S NUMBER
D2T NUMBER
P2T NUMBER
F NUMBER(38)
B NUMBER(38)
For advanced users who want to try different table creation options, such as using in-memory or advanced compression, you can pre-create the preceding working table and pass the name in.
Example 4-19 Shortest Path: Create Working Table and Perform Analytics
The following statements show some advanced options, first creating a working table with the same column structure and basic compression enabled, then passing it to the SQL-based computation. The code optimizes the intermediate table for computations with CREATE TABLE compression and in-memory options.
create table connections$MY_EXP(
NID NUMBER,
D2S NUMBER,
P2S NUMBER,
D2T NUMBER,
P2T NUMBER,
F NUMBER(38),
B NUMBER(38)
) compress nologging;
DECLARE
wt1 varchar2(100) := 'connections$MY_EXP';
n number;
path varchar2(1000);
weights varchar2(1000);
BEGIN
dbms_output.put_line('working table name ' || wt1);
-- compute
opg_apis.find_sp(
'connectionsGE$',
1,
53,
wt1,
dop => 1,
stats_freq=>1000,
path_output => path,
weights_output => weights,
options => null
);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- cleanup
-- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/
At the end of the computation, if the working table has not been dropped or truncated, you can check the content of the working table, as follows. Note that the working table structure may vary between releases.
SQL> select * from connections$MY_EXP;
NID D2S P2S D2T P2T F B
---------- ---------- ---------- ---------- ---------- ---------- ----------
1 0 1.000E+100 1 -1
53 1.000E+100 0 -1 1
54 1.000E+100 1 53 -1 1
52 1.000E+100 1 53 -1 1
5 1 1 1.000E+100 0 -1
26 1 1 1.000E+100 0 -1
8 1000 1 1.000E+100 0 -1
3 1 1 2 52 0 0
15 1 1 1.000E+100 0 -1
21 1 1 1.000E+100 0 -1
19 1 1 1.000E+100 0 -1
...
Example 4-20 Shortest Path: Perform Multiple Calls to Same Graph
To perform multiple calls to the same graph, only a single call to the preparation step is needed. The following shows an example of computing shortest path for multiple pairs of vertices in the same graph.
DECLARE
wt1 varchar2(100); -- intermediate working tables
n number;
path varchar2(1000);
weights varchar2(1000);
BEGIN
-- prepare
opg_apis.find_sp_prep('connectionsGE$', wt1);
dbms_output.put_line('working table name ' || wt1);
-- find shortest path from vertex 1 to vertex 53
opg_apis.find_sp( 'connectionsGE$', 1, 53,
wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- find shortest path from vertex 2 to vertex 36
opg_apis.find_sp( 'connectionsGE$', 2, 36,
wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- find shortest path from vertex 30 to vertex 4
opg_apis.find_sp( 'connectionsGE$', 30, 4,
wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- cleanup
opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/
The example's output may be as follows: three shortest paths have been found for the multiple pairs of vertices provided.
working table name "CONNECTIONSGE$$TWFS12" path 1 3 52 53 weights 4 3 1 1 1 path 2 36 weights 2 1 1 path 30 21 1 4 weights 4 3 1 1 1 PL/SQL procedure successfully completed.
Parent topic: SQL-Based Property Graph Analytics
4.6.2 Collaborative Filtering Overview and Examples
Collaborative filtering, also referred to as social filtering, filters information by using the recommendations of other people. Collaborative filtering is widely used in systems that recommend purchases based on purchases by others with similar preferences.
The following examples demonstrate SQL-based collaborative filtering analytics.
Example 4-21 Collaborative Filtering Setup and Computation
This example shows how to use SQL-based collaborative filtering, specifically using matrix factorization to recommend telephone brands to customers. This example assumes there exists a graph called "PHONES" in the database. This example graph contains customer and item vertices, and edges with a 'rating' label linking some customer vertices to other some item vertices. The rating labels have a numeric value corresponding to the rating that a specific customer (edge OUT vertex) assigned to the specified product (edge IN vertex).
The following figure shows this graph.
Figure 4-1 Phones Graph for Collaborative Filtering
Description of "Figure 4-1 Phones Graph for Collaborative Filtering"
set serveroutput on
DECLARE
wt_l varchar2(32); -- working tables
wt_r varchar2(32);
wt_l1 varchar2(32);
wt_r1 varchar2(32);
wt_i varchar2(32);
wt_ld varchar2(32);
wt_rd varchar2(32);
edge_tab_name varchar2(32) := 'phonesge$';
edge_label varchar2(32) := 'rating';
rating_property varchar2(32) := '';
iterations integer := 100;
min_error number := 0.001;
k integer := 5;
learning_rate number := 0.001;
decrease_rate number := 0.95;
regularization number := 0.02;
dop number := 2;
tablespace varchar2(32) := null;
options varchar2(32) := null;
BEGIN
-- prepare
opg_apis.cf_prep(edge_tab_name,wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd);
dbms_output.put_line('working table wt_l ' || wt_l);
dbms_output.put_line('working table wt_r ' || wt_r);
dbms_output.put_line('working table wt_l1 ' || wt_l1);
dbms_output.put_line('working table wt_r1 ' || wt_r1);
dbms_output.put_line('working table wt_i ' || wt_i);
dbms_output.put_line('working table wt_ld ' || wt_ld);
dbms_output.put_line('working table wt_rd ' || wt_rd);
-- compute
opg_apis.cf(edge_tab_name,edge_label,rating_property,iterations,
min_error,k,learning_rate,decrease_rate,regularization,dop,
wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd,tablespace,options);
END;
/
The flow starts by creating the temporary working tables that are later on passed to the computation. At the end of computation, the example may produce the following output. Note that if no working table name is provided, the preparation step will automatically generate a temporary table name and create it. Because the temporary working table name uses the session ID, your output will probably be different.working table wt_l "PHONESGE$$CFL57" working table wt_r "PHONESGE$$CFR57" working table wt_l1 "PHONESGE$$CFL157" working table wt_r1 "PHONESGE$$CFR157" working table wt_i "PHONESGE$$CFI57" working table wt_ld "PHONESGE$$CFLD57" working table wt_rd "PHONESGE$$CFRD57" PL/SQL procedure successfully completed.
Example 4-22 Collaborative Filtering: Validating the Intermediate Error
At the end of every computation, you can check the current error of the algorithm with the following query as long as the data in the working tables has not been already deleted. The following SQL query illustrates how to get the intermediate error of a current run of the collaborative filtering algorithm.
SELECT /*+ parallel(48) */ SQRT(SUM((w1-w2)*(w1-w2) +
<regularization>/2 * (err_reg_l+err_reg_r))) AS err
FROM <wt_i>;
Note that the regularization parameter and the working table name (parameter wt_i
) should be replaced according to the values used when running the OPG_APIS.CF algorithm. In the preceding previous example, replace <regularization>
with 0.02 and <wt_i>
with "PHONESGE$$CFI149" as follows:
SELECT /*+ parallel(48) */ SQRT(SUM((w1-w2)*(w1-w2) + 0.02/2 * (err_reg_l+err_reg_r))) AS err
FROM "PHONESGE$$CFI149";
This query may produce the following output.
ERR ---------- 4.82163662
f the value of the current error is too high or if the predictions obtained from the matrix factorization results of the collaborative filtering are not yet useful, you can run more iterations of the algorithm, by reusing the working tables and the progress made so far. The following example shows how to make predictions using the SQL-based collaborative filtering.
Example 4-23 Collaborative Filtering: Making Predictions
The result of the collaborative filtering algorithm is stored in the tables wt_l
and wt_r
, which are the two factors of a matrix product. These matrix factors should be used when making the predictions of the collaborative filtering.
In a typical flow of the algorithm, the two matrix factors can be used to make the predictions before calling the OPG_APIS.CF_CLEANUP procedure, or they can be copied and persisted into other tables for later use. The following example demonstrates the latter case:
DECLARE
wt_l varchar2(32); -- working tables
wt_r varchar2(32);
wt_l1 varchar2(32);
wt_r1 varchar2(32);
wt_i varchar2(32);
wt_ld varchar2(32);
wt_rd varchar2(32);
edge_tab_name varchar2(32) := 'phonesge$';
edge_label varchar2(32) := 'rating';
rating_property varchar2(32) := '';
iterations integer := 100;
min_error number := 0.001;
k integer := 5;
learning_rate number := 0.001;
decrease_rate number := 0.95;
regularization number := 0.02;
dop number := 2;
tablespace varchar2(32) := null;
options varchar2(32) := null;
BEGIN
-- prepare
opg_apis.cf_prep(edge_tab_name,wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd);
-- compute
opg_apis.cf(edge_tab_name,edge_label,rating_property,iterations,
min_error,k,learning_rate,decrease_rate,regularization,dop,
wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd,tablespace,options);
-- save only these two tables for later predictions
EXECUTE IMMEDIATE 'CREATE TABLE customer_mat AS SELECT * FROM ' || wt_l;
EXECUTE IMMEDIATE 'CREATE TABLE item_mat AS SELECT * FROM ' || wt_r;
-- cleanup
opg_apis.cf_cleanup('phonesge$',wt_l,wt_r,wt_l1,wt_r1,wt_i,wt_ld,wt_rd);
END;
/
This example will produce the only the following output.
PL/SQL procedure successfully completed.
Now that the matrix factors are saved in the tables customer_mat and item_mat, you can use the following query to check the "error" (difference) between the real values (those values that previously existed in the graph as 'ratings') and the estimated predictions (the result of the matrix multiplication in a certain customer row and item column).
Note that the following query is customized with a join on the vertex table in order return an NVARCHAR property of the vertices (for example, the name property) instead of a numeric ID. This query will return all the predictions for every single customer vertex to every item vertex in the graph.
SELECT /*+ parallel(48) */ MIN(vertex1.v) AS customer,
MIN(vertex2.v) AS item,
MIN(edges.vn) AS real,
SUM(l.v * r.v) AS predicted
FROM PHONESGE$ edges,
CUSTOMER_MAT l,
ITEM_MAT r,
PHONESVT$ vertex1,
PHONESVT$ vertex2
WHERE l.k = r.k
AND l.c = edges.svid(+)
AND r.p = edges.dvid(+)
AND l.c = vertex1.vid
AND r.p = vertex2.vid
GROUP BY l.c, r.p
ORDER BY l.c, r.p -- This order by clause is optional
;
This query may produce an output similar to the following (some rows are omitted for brevity).
CUSTOMER ITEM REAL PREDICTED ------------------------------------------------ Adam Apple 5 3.67375703 Adam Blackberry 3.66079652 Adam Danger 2.77049596 Adam Ericsson 4.21764858 Adam Figo 3.10631337 Adam Google 4 4.42429022 Adam Huawei 3 3.4289115 Ben Apple 2.82127589 Ben Blackberry 2 2.81132282 Ben Danger 3 2.12761307 Ben Ericsson 3 3.2389595 Ben Figo 2.38550534 Ben Google 3.39765075 Ben Huawei 2.63324582 ... Don Apple 1.3777496 Don Blackberry 1 1.37288909 Don Danger 1 1.03900439 Don Ericsson 1.58172236 Don Figo 1 1.16494421 Don Google 1.65921807 Don Huawei 1 1.28592648 Erik Apple 3 2.80809351 Erik Blackberry 3 2.79818695 Erik Danger 2.11767182 Erik Ericsson 3 3.2238255 Erik Figo 2.3743591 Erik Google 3 3.38177526 Erik Huawei 3 2.62094201
If you want to check only some rows to decide whether the prediction results are ready or more iterations of the algorithm should be run, the previous query can be wrapped in an outer query. The following example will select only the first 11 results.
SELECT /*+ parallel(48) */ * FROM (
SELECT /*+ parallel(48) */ MIN(vertex1.v) AS customer,
MIN(vertex2.v) AS item,
MIN(edges.vn) AS real,
SUM(l.v * r.v) AS predicted
FROM PHONESGE$ edges,
CUSTOMER_MAT l,
ITEM_MAT r,
PHONESVT$ vertex1,
PHONESVT$ vertex2
WHERE l.k = r.k
AND l.c = edges.svid(+)
AND r.p = edges.dvid(+)
AND l.c = vertex1.vid
AND r.p = vertex2.vid
GROUP BY l.c, r.p
ORDER BY l.c, r.p
) WHERE rownum <= 11;
This query may produce an output similar to the following.
CUSTOMER ITEM REAL PREDICTED ------------------------------------------------ Adam Apple 5 3.67375703 Adam Blackberry 3.66079652 Adam Danger 2.77049596 Adam Ericsson 4.21764858 Adam Figo 3.10631337 Adam Google 4 4.42429022 Adam Huawei 3 3.4289115 Ben Apple 2.82127589 Ben Blackberry 2 2.81132282 Ben Danger 3 2.12761307 Ben Ericsson 3 3.2389595
To get a prediction for a specific vertex (customer, item, or both) the query can be restricted with the desired ID values. For example, to get the predicted value of vertex 1 (customer) and vertex 105 (item), you can use the following query.
SELECT /*+ parallel(48) */ MIN(vertex1.v) AS customer,
MIN(vertex2.v) AS item,
MIN(edges.vn) AS real,
SUM(l.v * r.v) AS predicted
FROM PHONESGE$ edges,
CUSTOMER_MAT l,
ITEM_MAT r,
PHONESVT$ vertex1,
PHONESVT$ vertex2
WHERE l.k = r.k
AND l.c = edges.svid(+)
AND r.p = edges.dvid(+)
AND l.c = vertex1.vid
AND vertex1.vid = 1 /* Remove to get all predictions for item 105 */
AND r.p = vertex2.vid
AND vertex2.vid = 105 /* Remove to get all predictions for customer 1 */
/* Remove both lines to get all predictions */
GROUP BY l.c, r.p
ORDER BY l.c, r.p;
This query may produce an output similar to the following.
CUSTOMER ITEM REAL PREDICTED ------------------------------------------------ Adam Ericsson 4.21764858
Parent topic: SQL-Based Property Graph Analytics