*******************************
Graph Pattern Matching Examples
*******************************

This guide explains how to issue a pattern-matching query against a graph and how to work with the results of that query.

The Datasets
------------

This guide uses a dataset that models relationships between politicians, athletes, celebrities, and companies.

Read the Graphs
---------------

First, create a session by launching PGX in local mode:

.. code-block:: python
   :linenos:

   import pypgx

   session = pypgx.get_session(session_name="my-session")

Next, load the graphs into memory. For example, the first dataset can be loaded as follows:

.. code-block:: python
   :linenos:

   connections_graph = session.read_graph_with_properties(
       "examples/graphs/connections.edge_list.json"
   )

Submit Queries
--------------

You can submit a graph pattern matching query in PGQL, an SQL-like declarative language that lets you express a pattern consisting of vertices and edges, together with constraints on the properties of those vertices and edges.

To submit a query to PGX, use the :meth:`query_pgql()` method of :class:`PgxGraph` (the type of object you get when you load a graph via the ``session``).

.. code-block:: python
   :linenos:

   # `query` is a string that holds a PGQL query, such as the ones shown below
   connections_graph.query_pgql(query)

Enemy of My Enemy is My Friend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This example uses a graph pattern inspired by the ancient proverb *The enemy of my enemy is my friend*. Specifically, it finds pairs of entities that are connected through a chain of two edges with the ``feuds`` edge label. Vertices represent people, clans, or countries. Two vertices that are feuding with each other are connected by an edge with the ``feuds`` edge label.

Such a query is written in PGQL as follows:

.. code-block:: sql
   :linenos:

   SELECT x.name, z.name
   FROM MATCH (x) -[e1:feuds]-> (y),
        MATCH (y) -[e2:feuds]-> (z)
   WHERE x <> z
   ORDER BY x.name, z.name

Note that the query orders the results by ``x.name`` and then by ``z.name``.

Submit the query to PGX:

.. code-block:: python
   :linenos:

   result_set = connections_graph.query_pgql(
       """
       SELECT x.name, z.name
       FROM MATCH (x) -[e1:feuds]-> (y),
            MATCH (y) -[e2:feuds]-> (z)
       WHERE x <> z
       ORDER BY x.name, z.name
       """
   )

:class:`PgqlResultSet` manages the result set of a query. A result set contains multiple results (such a query may match many sub-graphs). Each result consists of a list of result elements. The order of the result elements follows the order of the variables in the ``SELECT`` clause of the query.

Iterating over the results of a query means iterating over the :attr:`pgql_result_elements` dictionary. You can get the :attr:`pgql_result_elements` dictionary as follows:

.. code-block:: python
   :linenos:

   result_elements = result_set.pgql_result_elements

Top 10 Most Collaborative People
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another interesting query finds the top 10 most collaborative people in the graph, in decreasing order of the number of collaborators. This query uses several PGQL features: grouping, aggregating, ordering, and limiting the graph patterns found in the ``MATCH`` clause. The following query string expresses the request in PGQL.

.. code-block:: python
   :linenos:

   result_set = connections_graph.query_pgql(
       """
       SELECT x.name, COUNT(*) AS num_collaborators
       FROM MATCH (x) -[:collaborates]-> ()
       GROUP BY x
       ORDER BY num_collaborators DESC, x.name
       LIMIT 10
       """
   )

The above query does the following:

1. Finds all collaboration relationships in the graph by matching the ``collaborates`` edge label.
2. Groups the matched patterns by their source vertex.
3. Applies the count aggregation to each group to find the number of collaborators.
4. Orders the groups by the number of collaborators in decreasing order.
5. Takes only the first 10 results.

The :meth:`print()` method shows the name and the number of collaborators of the top 10 collaborative people in the graph.

.. code-block:: python
   :linenos:

   result_set.print()

You can see the following in the console.


+-----------------------------+-------------------+
| x.name                      | num_collaborators |
+=============================+===================+
| Barack Obama                | 10                |
+-----------------------------+-------------------+
| Charlie Rose                | 4                 |
+-----------------------------+-------------------+
| Dieudonne Nzapalainga       | 3                 |
+-----------------------------+-------------------+
| NBC                         | 3                 |
+-----------------------------+-------------------+
| Nicolas Guerekoyame Gbangou | 3                 |
+-----------------------------+-------------------+
| Omar Kobine Layama          | 3                 |
+-----------------------------+-------------------+
| Pope Francis                | 3                 |
+-----------------------------+-------------------+
| Angela Merkel               | 2                 |
+-----------------------------+-------------------+
| Beyonce                     | 2                 |
+-----------------------------+-------------------+
| Eric Holder                 | 2                 |
+-----------------------------+-------------------+

Transitive Connectivity
~~~~~~~~~~~~~~~~~~~~~~~

Another interesting query tests for reachability between vertices. Here, the question is whether every person in the graph is transitively connected to every other person.

First, we find out how many persons there are in the graph by submitting the following PGQL query:

.. code-block:: sql
   :linenos:

   SELECT COUNT(*) AS numPersons
   FROM MATCH (n:person)

The result is as follows:

+------------+
| numPersons |
+============+
| 62         |
+------------+

For each person, we count the number of persons that can be reached by following zero or more edges. This query can be expressed in PGQL as follows:

.. code-block:: sql
   :linenos:

   PATH connects_to AS () <- ()
   SELECT n.name, COUNT(*) AS reachabilityCount, COUNT(*) = 62 AS reachesAllPersons
   FROM MATCH (n:person) -/:connects_to*/-> (m:person)
   GROUP BY n
   ORDER BY COUNT(*) DESC, n.name
   LIMIT 20

In the above query, we express connectivity between two neighboring persons using a path pattern ``connects_to``.
We use a Kleene star (``*``) to express that the path pattern may match repeatedly, zero or more times, because we want to determine *transitive* connectivity. The query uses ``GROUP BY`` to make a group for each source person ``n`` and then counts the number of reachable destination persons ``m``.

The first 20 results are as follows:

+-----------------------------+-------------------+-------------------+
| n.name                      | reachabilityCount | reachesAllPersons |
+=============================+===================+===================+
| Michelle Bachelet           | 45                | false             |
+-----------------------------+-------------------+-------------------+
| Nicolas Maduro              | 45                | false             |
+-----------------------------+-------------------+-------------------+
| Dieudonne Nzapalainga       | 43                | false             |
+-----------------------------+-------------------+-------------------+
| Nicolas Guerekoyame Gbangou | 43                | false             |
+-----------------------------+-------------------+-------------------+
| Omar Kobine Layama          | 43                | false             |
+-----------------------------+-------------------+-------------------+
| Pope Francis                | 43                | false             |
+-----------------------------+-------------------+-------------------+
| Abdel Fattah eL-Sisi        | 39                | false             |
+-----------------------------+-------------------+-------------------+
| Abdullah Gul                | 39                | false             |
+-----------------------------+-------------------+-------------------+
| Jenji Kohan                 | 39                | false             |
+-----------------------------+-------------------+-------------------+
| Robin Wright                | 39                | false             |
+-----------------------------+-------------------+-------------------+
| Janet Yellen                | 38                | false             |
+-----------------------------+-------------------+-------------------+
| Jason Collins               | 38                | false             |
+-----------------------------+-------------------+-------------------+
| Mary Barra                  | 38                | false             |
+-----------------------------+-------------------+-------------------+
| Serena Williams             | 38                | false             |
+-----------------------------+-------------------+-------------------+
| Alfonso Cuaron              | 37                | false             |
+-----------------------------+-------------------+-------------------+
| Angela Merkel               | 37                | false             |
+-----------------------------+-------------------+-------------------+
| Barack Obama                | 37                | false             |
+-----------------------------+-------------------+-------------------+
| Benedict Cumberbatch        | 37                | false             |
+-----------------------------+-------------------+-------------------+
| Beyonce                     | 37                | false             |
+-----------------------------+-------------------+-------------------+
| Carl Icahn                  | 37                | false             |
+-----------------------------+-------------------+-------------------+

Since we sorted by decreasing ``reachabilityCount`` and even the first person in the result does not transitively connect to every person in the graph (``reachesAllPersons`` = ``false``), we now know that no person can reach all 62 persons, so the persons in the graph are not fully reachable from each other.
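
To make the reachability reasoning above more concrete, the following is a minimal sketch in plain Python, independent of PGX, of what a zero-or-more-hops reachability count computes. The toy vertex and edge lists and the ``reachability_counts`` helper are hypothetical and are not part of the connections dataset or of the PGX API. The sketch assumes that each reachable vertex is counted once per source and that a vertex always reaches itself (the zero-hop case).

```python
from collections import deque


def reachability_counts(vertices, edges):
    """For each vertex, count the vertices reachable via zero or more edges."""
    adjacency = {v: [] for v in vertices}
    for src, dst in edges:
        adjacency[src].append(dst)

    counts = {}
    for start in vertices:
        seen = {start}  # zero hops: every vertex reaches itself
        queue = deque([start])
        while queue:  # breadth-first traversal along directed edges
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        counts[start] = len(seen)
    return counts


# Hypothetical toy graph, used only for illustration.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]

counts = reachability_counts(vertices, edges)
# The graph is fully reachable only if every vertex reaches all vertices.
fully_connected = all(count == len(vertices) for count in counts.values())

# Mirror the ordering of the query: decreasing count, then name.
for name in sorted(counts, key=lambda v: (-counts[v], v)):
    print(name, counts[name], counts[name] == len(vertices))
print("fully connected:", fully_connected)
```

In this toy graph only ``d`` reaches all four vertices, so the check reports that the graph is not fully reachable. The PGQL result differs in one respect: there, not even the top-ranked person reaches all 62 persons.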