Along with graph analytics, graph pattern matching is an
important task on graph datasets, which aims to find
patterns of users' interest from a large data graph as subgraphs. It has a wide
range of applicability, such as semantic web search, link prediction, and fraud
detection. For example, given a data graph below where a vertex has
a string property
name and an edge has a string property
relationship, a user wants to find
friends of John's friends. The
data graph and such an inquiry can be expressed as the following
Figure: Friendship Graph and a query
friends of John's friends
Similar to SQL for relational data, to specify an arbitrary query graph, PGX supports a query language, PGQL, which is designed for graph pattern matching on property graphs. The above example graph query can be written in PGQL as follows:
SELECT friendOfFriend.name MATCH (me:Person) -[:friendOf]-> (friend:Person), (friend) -[:friendOf]-> (friendOfFriend) WHERE me.name = 'John'
PGX provides APIs to submit a query in PGQL and fetching query results, which is detailed in the tutorial and API guide. Along with basic pattern matching (or topology matching), PGQL provides additional features including constraints on node, edge, and their properties, ordering results, grouping results, and aggregations. For the full PGQL specification, please refer to PGQL 1.1 Specification.
Even with the same query, the query result varies depending on the desired semantics of structural graph equivalence - the definition of what it means for two graphs to match. PGX supports two representative graph pattern matching semantics -- graph homomorphism and graph isomorphism.
Under graph homomorphism, two graphs are considered structurally equivalent
(their patterns match) if, when there is an edge between vertices in the
query, there is an equivalent edge between two matched vertices in the graph.
In this example, there are two subgraphs - the read and blue ones in the
figure - which are solutions to the query. Thus, the answer for
friends of John's friends are
Under graph isomorphism, two graphs must not only satisfy the conditions
of graph homomorphism, but also a vertex in the data graph cannot match
multiple query vertices - a vertex in the data being queried may not satisfy
the role of more than one vertex in the query. This eliminates the possibility
John being one of the "friends of John's friends" - the red sub-graph
in the figure (solution 1) is the only matched sub-graph, and the blue
David) is the only match.
Figure: Solutions of
friends of John's friends