PGX 2.7.0
Documentation

Running Graph Pattern Matching (PGQL)

What is Graph Pattern Matching?

Along with graph analytics, graph pattern matching is an important task on graph datasets, which aims to find patterns of users' interest from a large data graph as subgraphs. It has a wide range of applicability, such as semantic web search, link prediction, and fraud detection. For example, given a data graph below where a vertex has a string property name and an edge has a string property relationship, a user wants to find friends of John's friends. The data graph and such an inquiry can be expressed as the following graphs.

Data Graph

Figure: Friendship Graph and a query friends of John's friends

Similar to SQL for relational data, to specify an arbitrary query graph, PGX supports a query language, PGQL, which is designed for graph pattern matching on property graphs. The above example graph query can be written in PGQL as follows:

SELECT friendOfFriend.name
MATCH
  (me:Person) -[:friendOf]-> (friend:Person),
  (friend) -[:friendOf]-> (friendOfFriend)
WHERE
  me.name = 'John'

PGX provides APIs to submit a query in PGQL and fetching query results, which is detailed in the tutorial and API guide. Along with basic pattern matching (or topology matching), PGQL provides additional features including constraints on node, edge, and their properties, ordering results, grouping results, and aggregations. For the full PGQL specification, please refer to PGQL 1.1 Specification.

Graph Pattern Matching Semantic

Even with the same query, the query result varies depending on the desired semantics of structural graph equivalence - the definition of what it means for two graphs to match. PGX supports two representative graph pattern matching semantics -- graph homomorphism and graph isomorphism.

Under graph homomorphism, two graphs are considered structurally equivalent (their patterns match) if, when there is an edge between vertices in the query, there is an equivalent edge between two matched vertices in the graph. In this example, there are two subgraphs - the read and blue ones in the figure - which are solutions to the query. Thus, the answer for friends of John's friends are David and John himself.

Under graph isomorphism, two graphs must not only satisfy the conditions of graph homomorphism, but also a vertex in the data graph cannot match multiple query vertices - a vertex in the data being queried may not satisfy the role of more than one vertex in the query. This eliminates the possibility of John being one of the "friends of John's friends" - the red sub-graph in the figure (solution 1) is the only matched sub-graph, and the blue sub-graph (David) is the only match.

Solution

Figure: Solutions of friends of John's friends