PGX 19.1.0
Documentation
Note: This feature is only available in the following software packages: PGX on the Oracle Tech Network, Oracle Big Data Spatial and Graph

Matching Patterns in Graphs (PGQL)

This tutorial explains how to issue a pattern-matching query against a graph, and work with the results of that query.

The Datasets

In this tutorial, you will use two different datasets. The first dataset models relationships between politicians, athletes, celebrities, and companies. The dataset is located in <pgx root>/examples/graphs/.

The second dataset represents an electrical network with devices, connections and switches. The model is described here and the dataset can be loaded into PGX like explained here.

Read the Graphs

First, create a session by launching PGX in local mode:

cd $PGX_HOME
./bin/pgx
// starting the shell will create an implicit session
import oracle.pgx.api.*;

...

PgxSession session = Pgx.createSession("my-session");

Next, load the graphs into memory. For example, the first dataset can be loaded as follows:

pgx> connectionsGraph = session.readGraphWithProperties("examples/graphs/connections.edge_list.json")
import oracle.pgx.api.*;

...

PgxGraph connectionsGraph = session.readGraphWithProperties("examples/graphs/connections.edge_list.json");

Submit Queries

You can submit a graph pattern matching query in PGQL, an SQL-like declarative language that allows you to express a pattern that consists of vertices and edges and constraints on the properties of the vertices and edges.

To submit a query to PGX, you can use the queryPgql() method of PgxGraph (which is the type of object you get when you load a graph via the session).

PgqlResultSet queryPgql(String queryString)

Enemy of My Enemy is My Friend

Here, you will find a graph pattern inspired by the famous ancient proverb The enemy of my enemy is my friend. Specifically, you will find two entities which are connected by two edges of the feuds edge label. Vertices represent people or clans or countries. A pair of vertices which are feuding with each other will have an edge with the feuds edge label.

Such a query is written in PGQL as follows:

  SELECT x.name, z.name
   MATCH (x) -[e1:feuds]-> (y)
       , (y) -[e2:feuds]-> (z)
   WHERE x <> z
ORDER BY x.name, z.name

Note that in the query, we order the results by x.name and then z.name.

Submit the query to PGX:

pgx> resultSet = connectionsGraph.queryPgql("""
  SELECT x.name, z.name
   MATCH (x) -[e1:feuds]-> (y)
       , (y) -[e2:feuds]-> (z)
   WHERE x <> z
ORDER BY x.name, z.name""")
import oracle.pgx.api.*;

...

PgqlResultSet resultSet = connectionsGraph.queryPgql("SELECT x.name, z.name MATCH (x) -[e1:feuds]-> (y), (y) -[e2:feuds]-> (z) WHERE x <> z ORDER BY x.name, z.name");

PgqlResultSet manages a result set of a query. A result set contains multiple results (such a query may match many sub-graphs). Each result consists of a list of result elements. The order of result elements follows the order of variables in the SELECT clause of a query.

Iterating over a query results means iterating over a set of PgqlResultElement instances. A PgqlResultElement maintains the type and variable name of a result element.

You can get the list of PgqlResultElement instances as follows:

pgx> resultElements = resultSet.getPgqlResultElements()
import oracle.pgx.api.*;
import java.util.List;

...

List<PgqlResultElement> resultElements = resultSet.getPgqlResultElements();

Get the type and variable name of the first result element:

pgx> resultElement = resultElements.get(0)
pgx> type = resultElement.getElementType() // STRING
pgx> varName = resultElement.getVarName() // x.name
import oracle.pgx.api.*;

...

PgqlResultElement resultElement = resultElements.get(0);
PqglResultElement.Type = resultElement.getElementType(); // STRING
String varName = resultElement.getVarName(); // x.name

Iterate over a result set using the for-each style for loop. In the loop, you get a PgxResult instance which contains a query result.

pgx> resultSet.each { \
       // the variable 'it' is implicitly declared to reference each PgxResult instance
     }
import oracle.pgx.api.*;

...

for (PgxResult result : resultSet) {
  ...
}

You can print out the result set in textual format using print method of PqglResultSet.

pgx> resultSet.print(10) // print the first 10 results
import oracle.pgx.api.*;

...

resultSet.print(10); // print the first 10 results

You will see the following results.

+------------------------------+
| x.name        | z.name       |
+------------------------------+
| ABC           | CBS          |
| ABC           | NBC          |
| Alibaba       | Beyonce      |
| Alibaba       | Google       |
| Alibaba       | eBay         |
| Amazon        | Carl Icahn   |
| Amazon        | Facebook     |
| Amazon        | Tencent      |
| Angela Merkel | Barack Obama |
| Angela Merkel | John Kerry   |
+------------------------------+

You can also get a handle of individual PgxResult instances or their elements.

By the index of the result element:

pgx> nameX = it.getString(1)
pgx> nameZ = it.getString(2)
import oracle.pgx.api.*;

...

String nameX = result.getString(1);
String nameZ = result.getString(2);

By the variable name of the result element:

pgx> nameX = it.getString("x.name")
pgx> nameZ = it.getString("z.name")
import oracle.pgx.api.*;

...

String nameX = result.getString("x.name");
String nameZ = result.getString("z.name");

You can also get a result element without knowing its type:

pgx> nameX = it.getObject(1)
// or
pgx> nameX = it.getObject("x.name")
import oracle.pgx.api.*;

...

Object nameX = result.getObject(1);
// or
Object nameX = result.getObject("x.name");

Top 10 Most Collaborative People

Another interesting query is finding the top 10 most collaborative people in the graph in a decreasing order of the number of collaborators. Such a query exploits various features of PGQL which include grouping, aggregating, ordering, and limiting the graph patterns found in the MATCH clause. The following query string expresses a user's inquiry in PGQL.

pgx> resultSet = connectionsGraph.queryPgql("""
  SELECT x.name, COUNT(*) AS num_collaborators
   MATCH (x) -[:collaborates]-> ()
GROUP BY x
ORDER BY num_collaborators DESC, x.name
   LIMIT 10""")
import oracle.pgx.api.*;

...

PgqlResultSet resultSet = connectionsGraph.queryPgql("SELECT x.name, COUNT(*) AS num_collaborators MATCH (x) -[:collaborates]-> () GROUP BY x ORDER BY num_collaborators DESC, x.name LIMIT 10");

The above query does the following:

  1. Find all collaboration relationship patterns from the graph by matching the `collaborates' edge label.
  2. Group the found patterns by its source vertex.
  3. Apply the count aggregation to each group to find the number of collaborators.
  4. Order the groups by the number of collaborators in a decreasing order.
  5. Take only the first 10 results.

print() method shows the name and the number of collaborators of the top 10 collaborative people in the graph.

pgx> resultSet.print()
import oracle.pgx.api.*;

...

resultSet.print()

You can see the following in the console.

+-------------------------------------------------+
| x.name                      | num_collaborators |
+-------------------------------------------------+
| Barack Obama                | 10                |
| Charlie Rose                | 4                 |
| Dieudonne Nzapalainga       | 3                 |
| NBC                         | 3                 |
| Nicolas Guerekoyame Gbangou | 3                 |
| Omar Kobine Layama          | 3                 |
| Pope Francis                | 3                 |
| Angela Merkel               | 2                 |
| Beyonce                     | 2                 |
| Eric Holder                 | 2                 |
+-------------------------------------------------+

For the complete PGQL specification, please refer to PGQL Specification. For the complete API set, please refer to API reference for graph pattern matching.

Transitive Connectivity between Electrical Devices

Another interesting query is one that tests for reachability between vertices. We will use the electrical network graph in this example:

electric_graph_model_reachable

What we are interested in is whether every Device in the graph is transitively connected to every other Device. Note that devices are connected via Connection vertices and Switch vertices.

First, we find out how many devices there are in the graph by submitting the following PGQL query:

SELECT COUNT(*) AS numDevices
 MATCH (n:Device)

The result is as follows:

+------------+
| numDevices |
+------------+
| 6031       |
+------------+

For each device, we count the number of devices that can be reached by following zero or more Connection or Switch vertices (and necessary edges). This query can be expressed in PGQL as follows:

    PATH connects_to AS () <- (:Connection|Switch) -> ()
  SELECT n.nickname AS device, COUNT(*) AS reachabilityCount, COUNT(*) = 6031 AS reachesAllDevices
   MATCH (n:Device) -/:connects_to*/-> (m:Device)
GROUP BY n
ORDER BY COUNT(*), n.nickname

In the above query, we express connectivity between two neighboring devices/connections using a path pattern connects_to. We use a Kleene star (*) to express that the path pattern may repeatedly match zero or more times as we want to determine transitive connectivity.

The query uses GROUP BY to make a group for each of the source devices n and then counts the number of reachable destination devices m. The first 20 results are as follows:

+------------------------------------------------------------------+
| device                   | reachabilityCount | reachesAllDevices |
+------------------------------------------------------------------+
| 190-7361-M1089120        | 6031              | true              |
| 190-8581-D5587291-3_INT  | 6031              | true              |
| 190-8593-D5860423-3_INT  | 6031              | true              |
| 196-29518-L3122816       | 6031              | true              |
| 196-29519-L3066815       | 6031              | true              |
| 196-29520-L3160109       | 6031              | true              |
| 196-29521-N1136355       | 6031              | true              |
| 196-31070-D5861005-2_INT | 6031              | true              |
| 196-35541-M1108317       | 6031              | true              |
| 196-35813-N1140519       | 6031              | true              |
| 196-36167-L3011298       | 6031              | true              |
| 198-5320-221-311359      | 6031              | true              |
| 221-240988-L3141411      | 6031              | true              |
| 221-240991-L3066817      | 6031              | true              |
| 221-242079-L3011293      | 6031              | true              |
| 221-282818-N1230123      | 6031              | true              |
| 221-282819-N1230122      | 6031              | true              |
| 221-306686-L2970258      | 6031              | true              |
| 221-306687-L2916625      | 6031              | true              |
| 221-308718-L2803199      | 6031              | true              |
+------------------------------------------------------------------+

Since we sorted by increasing reachabilityCount and since even the first device in the result transitively connects to every device in the graph (reachesAllDevices = true), we now know that all the devices in the graph are fully reachable from each other.