PGX 20.1.1
Documentation

What is New in PGX 20.0.0

Support for Storing Graphs Containing Temporal Types in Distributed Execution Mode

In the distributed execution mode, users can store graphs containing the following types: local_date, local_time, local_timestamp, time_with_timezone and timestamp_with_timezone. This new feature is available for all the formats supported by the distributed execution mode: ADJ_LIST, EDGE_LIST and PGB. For more information on graph storing in distributed execution mode, please go to the corresponding page.

Support for Field Extraction from Temporal Data Types in Distributed Execution Mode

In the distributed execution mode, users can now use the EXTRACT keyword in PGQL queries. This feature is used to get the different fields (e.g. hour, year or timezone) from a temporal value. For instance, the following query will extract the month of the birthdate of all vertices in the graph:

SELECT EXTRACT(MONTH FROM n.birthdate) MATCH (n)
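As a rough illustration of what field extraction returns, the sketch below mirrors a few EXTRACT fields with java.time accessors. It is not PGX code: the class ExtractSketch and its method names are hypothetical, and the mapping to java.time (including the TIMEZONE_HOUR field name) is an assumption about the semantics, not a description of the engine.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;

// Hypothetical helper, not part of the PGX API: mirrors a few EXTRACT fields
// with java.time accessors to show which component each field denotes.
class ExtractSketch {
    // Analogous to EXTRACT(MONTH FROM <date>); the input is an ISO-8601 date.
    static int month(String isoDate) {
        return LocalDate.parse(isoDate).getMonthValue();
    }

    // Analogous to EXTRACT(YEAR FROM <date>).
    static int year(String isoDate) {
        return LocalDate.parse(isoDate).getYear();
    }

    // Analogous to EXTRACT(HOUR FROM <timestamp with time zone>).
    static int hour(String isoOffsetDateTime) {
        return OffsetDateTime.parse(isoOffsetDateTime).getHour();
    }

    // Analogous to EXTRACT(TIMEZONE_HOUR FROM <timestamp with time zone>)
    // (field name assumed): the whole-hour part of the UTC offset.
    static int timezoneHour(String isoOffsetDateTime) {
        int offsetSeconds = OffsetDateTime.parse(isoOffsetDateTime).getOffset().getTotalSeconds();
        return offsetSeconds / 3600;
    }

    public static void main(String[] args) {
        System.out.println(month("1985-01-14"));
        System.out.println(timezoneHour("2020-02-03T09:15:00+01:00"));
    }
}
```

The helpers take ISO-8601 strings only to keep the example self-contained; in a query, the operand would be a temporal property such as n.birthdate.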

Support for Numeric Functions in Distributed Execution Mode

In the distributed execution mode, users can now call the following numeric functions: ABS, CEIL or CEILING, FLOOR and ROUND. For more information, please refer to the PGQL Specification.

Loading from Oracle NoSQL and Apache HBase Databases in Distributed Execution Mode

In the distributed execution mode, users can now load property graphs from Oracle NoSQL and Apache HBase databases. For more information, please go to the corresponding page.

PGQL 1.3 for Shared-memory Execution Mode

PGX in shared-memory execution mode now supports the new PGQL 1.3, see http://pgql-lang.org/spec/1.3/.

Note that full backwards compatibility is maintained and that existing PGQL 1.0, PGQL 1.1 and PGQL 1.2 queries continue to work.

CHEAPEST Path Finding in PGQL 1.3

CHEAPEST path finding, which was previously already part of PGX, is now also part of the PGQL specification, see http://pgql-lang.org/spec/1.3/#cheapest-path.

MATCH Inside FROM and Optional ON Clauses in PGQL 1.3

In PGQL 1.3, MATCH clauses are part of the FROM clause such that each FROM clause has one or more comma-separated MATCH clauses. Inside each MATCH clause there is a single linear path pattern with optional ON clause for specifying the graph to match the pattern on. The ON clause is only needed if the name of the graph is not already provided through other means.

For example, a query without ON clauses:

SELECT *
FROM MATCH (n) -[e]-> (m)
   , MATCH (n) -[e2]-> (o)

Or, a query with ON clauses:

SELECT *
FROM MATCH (n) -[e]-> (m) ON myGraph
   , MATCH (n) -[e2]-> (o) ON myGraph

INSERT, UPDATE and DELETE in PGQL 1.3

INSERT, UPDATE and DELETE, which were previously in "beta state", are now part of the PGQL specification and no longer in beta state, see http://pgql-lang.org/spec/1.3/#graph-modification. Example queries are (note: the beta syntax is no longer supported):

UPDATE n SET ( n.dob = DATE '2000-04-15' )
FROM MATCH (n:Person)
WHERE n.name = 'Jacqueline'

INSERT VERTEX m LABELS ( Person ) PROPERTIES ( m.name = 'Alice', m.dob = DATE '1985-01-14' )
     , EDGE e BETWEEN n AND m LABELS ( knows ) PROPERTIES ( e.since = DATE '2020-02-03' )
FROM MATCH (n:Person)
WHERE n.name = 'Jacqueline'

DELETE n
FROM MATCH (n:Person)
WHERE n.name = 'Alice'

See Running Graph Pattern Matching Queries (PGQL) for how to execute INSERT, UPDATE and DELETE queries through the PGX API.

Case Insensitivity in PGQL 1.3

Finally, all unquoted identifiers including labels, property names, graph names, vertex and edge variable names, PATH macro names and aliases in SELECT and GROUP BY are now case insensitive when using the new PGQL 1.3 syntax.

For example, in the following query, Person, name, p (same as P) and personName (same as PeRsOnName) are case insensitive:

SELECT p.name AS personName
FROM MATCH (P:Person)
ORDER BY PeRsOnName

This means that, for example, the pattern (P:Person) will match vertices no matter if they have the label Person, person or PeRsOn.

If the query uses double-quoted identifier syntax, then it matches exactly and not in a case-insensitive manner:

SELECT "p"."name" AS "personName"
FROM MATCH ("p":"Person")
ORDER BY "personName"

For example, above, the pattern ("p":"Person") will only match vertices labeled Person but not vertices labeled e.g. person or PeRsOn.
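The matching rule described above can be sketched in a few lines of Java. This is only an illustrative model, not the PGQL parser: the class IdentifierMatchSketch and its method are hypothetical, and escaping of quote characters inside quoted identifiers is deliberately ignored.

```java
// Hypothetical helper, not PGX code: models how an identifier written in a
// query is compared against a label stored in the graph.
class IdentifierMatchSketch {
    // A double-quoted identifier must match exactly; an unquoted identifier
    // matches regardless of case.
    static boolean matches(String identifierInQuery, String storedLabel) {
        boolean quoted = identifierInQuery.length() >= 2
                && identifierInQuery.startsWith("\"")
                && identifierInQuery.endsWith("\"");
        if (quoted) {
            // Strip the surrounding quotes and compare case-sensitively.
            String exact = identifierInQuery.substring(1, identifierInQuery.length() - 1);
            return exact.equals(storedLabel);
        }
        return identifierInQuery.equalsIgnoreCase(storedLabel);
    }

    public static void main(String[] args) {
        System.out.println(matches("Person", "PeRsOn"));     // unquoted identifier
        System.out.println(matches("\"Person\"", "person")); // quoted identifier
    }
}
```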

Quoted Identifiers in PGQL 1.3

Quoted identifiers, which allow for encoding of any string of Unicode characters, were previously already supported for labels, property names and graph names. However, as part of PGQL 1.3, quoted identifiers are now supported also for:

  • Aliases in SELECT and GROUP BY
  • Vertex and edge variable names
  • Path macro names

An example that shows all the newly supported cases is:

PATH "my path macro" AS (src) -["my edge"]-> (dst) WHERE "my edge".weight > 100
SELECT "source vertex".prop1, "destination vertex".prop1 AS "my alias"
FROM MATCH ("source vertex") -/:"my path macro"*/-> ("destination vertex")

Subqueries After GROUP BY in Shared-memory Execution Mode

Previously, subqueries were supported in WHERE, SELECT, ORDER BY, INSERT and UPDATE clauses. However, subqueries in SELECT, ORDER BY, INSERT and UPDATE clauses were only supported in the case the outer query did not have a GROUP BY. Also, subqueries in HAVING clauses were not supported.

These limitations have been lifted such that it is now possible to have a query with GROUP BY that has one or more subqueries in SELECT, ORDER BY, HAVING, INSERT and UPDATE.

For example:

SELECT continent.name
     , year
     , COUNT(DISTINCT message) AS messageCountForContinentForYear
     , ( SELECT COUNT(*)
         FROM MATCH (message2:Post|Comment)
         WHERE EXTRACT(YEAR FROM message2.creationDate) = year
       ) AS messageCountForYear
FROM MATCH (message:Post|Comment) -[:isLocatedIn]-> (country:Country) -[:isPartOf]-> (continent:Continent)
GROUP BY continent, EXTRACT(YEAR FROM message.creationDate) AS year

Above, the fourth and last expression in the SELECT clause is a subquery. The subquery references an expression from the outer GROUP BY, namely, EXTRACT(YEAR FROM message.creationDate) AS year.

Prepared Statement Support for INSERT, UPDATE and DELETE Queries

Previously, only SELECT queries were supported in prepared statements. Now, INSERT, UPDATE and DELETE queries can be prepared as well. To execute a prepared statement, use PreparedStatement#execute(Async). If this method is used to execute a SELECT query, use PreparedStatement#getResultSet(Async) to retrieve the ResultSet.

Loading from HDFS in Distributed Execution Mode

In the distributed execution mode, users can now load property graphs from the Hadoop Distributed File System (HDFS). For more information, please see the corresponding graph loading guide.

Support for CASE Statements in Distributed Execution Mode

In the distributed execution mode, users can now use CASE statements available in two forms: simple CASE statements and searched CASE statements.

Simple CASE statement example:

CASE n.age
 WHEN 1 THEN "One"
 WHEN 2 THEN "Two"
 WHEN 3 THEN "Three"
 ELSE "Older than three"
END

Searched CASE statement example:

CASE
 WHEN n.level = 'user' THEN 0
 WHEN n.authorized THEN 1
 ELSE -1
END

For more information, please refer to the PGQL Specification. Note that NULL values are not supported in the distributed runtime yet. If the query does not have an ELSE clause, a default value is returned instead of NULL.
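For readers more familiar with Java than with PGQL, the two CASE forms above correspond roughly to a switch and to an if chain. The sketch below is an analogy under that assumption. The class CaseSketch is hypothetical, and it does not model the default value that the distributed runtime returns when ELSE is missing.

```java
// Hypothetical analogy, not PGX code: the two CASE examples rewritten in Java.
class CaseSketch {
    // Simple CASE: one operand is compared for equality with each WHEN value.
    static String simpleCase(int age) {
        switch (age) {
            case 1: return "One";
            case 2: return "Two";
            case 3: return "Three";
            default: return "Older than three"; // ELSE branch
        }
    }

    // Searched CASE: each WHEN carries its own condition; the first condition
    // that holds determines the result.
    static int searchedCase(String level, boolean authorized) {
        if ("user".equals(level)) {
            return 0;
        }
        if (authorized) {
            return 1;
        }
        return -1; // ELSE branch
    }

    public static void main(String[] args) {
        System.out.println(simpleCase(2));
        System.out.println(searchedCase("admin", true));
    }
}
```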

Support for Functions Labels() on Edges and Label() on Vertices in PGQL in Distributed Execution Mode

In the distributed execution mode, users can now use the labels() function on edges as well as the label() function on vertices. These functions will behave in the same way as in the shared memory version: calling labels() on edges will return a set of labels, and calling label() on vertices will either return the single label of every vertex or output an error if they have zero or more than one label.
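The single-label rule for label() can be pictured as follows. This is a minimal sketch of the described behavior only. The class LabelSketch, its signature and the exception type are assumptions made for illustration and say nothing about how PGX reports the error.

```java
// Hypothetical helper, not PGX code: returns the single label of an element
// and fails when the element has zero labels or more than one label.
class LabelSketch {
    static String label(String... labels) {
        if (labels.length != 1) {
            throw new IllegalStateException(
                    "label() requires exactly one label, found " + labels.length);
        }
        return labels[0];
    }

    public static void main(String[] args) {
        System.out.println(label("Person"));
    }
}
```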

What is New in PGX 19.4.0

Improved GraphChangeSet Performance in Single-machine Execution Mode

PGX 19.4.0 brings improved performance for GraphChangeSet, particularly for the case where there are relatively few changes compared to the size of the graph.

The performance improvements come from better parallelization and are therefore most significant on machines with high memory bandwidth and a large number of cores. For example, we measured an improvement of 23x when adding a single edge in the Twitter graph (41M vertices, 1.4B edges) on an Oracle Server x5 with two Intel Xeon E5-2699 v3 CPUs and 378GB of RAM (24 DDR4 sticks of 16GB each, running at 1600 MHz).

Ability to Publish Graph Snapshots

PGX 19.4.0 introduces a new publishWithSnapshots() API call that makes snapshots of a published graph visible to other sessions. The updated programming guide about publishing graphs illustrates this new functionality.

Refactored Graph Sharing Model

PGX 19.4.0 introduces support for namespaces for named entities (e.g., graphs and properties). Thanks to namespaces, any session can create graphs and properties with any name, even if a graph or property with the same name has already been published.

The updated programming guide about management illustrates this new feature with examples.

Extended PGQL Support in Distributed Execution Mode

PGX 19.4.0 provides extended support for PGQL in distributed execution mode.

Added Support for Expressions in Projections and GROUP BY / ORDER BY

In the distributed execution mode, users now have complete expression support in projections, GROUP BY, and ORDER BY.

Examples include:

simple expressions in projections:

SELECT n.age >= 21 OR m.age >= 21
MATCH (n)->(m)

expressions inside a single aggregation:

SELECT COUNT(*), AVG(n.salary * n.tax_rate) > 50000 AS threshold
MATCH (n)
GROUP BY threshold
ORDER BY threshold DESC

or expressions with aggregations in projections and GROUP BY / ORDER BY:

SELECT 42 + AVG(m.salary)
MATCH (n)->(m)
GROUP BY m
ORDER BY AVG(m.salary) % 10

Added Support for Prepared Statements in Projections

In the distributed execution mode, users can now use prepared statements in projections, e.g.,

ps = session.preparePgql("SELECT actor.name, actor.age > ? FROM movies MATCH (actor:Person) ORDER BY id(actor)");
ps.setLong(1, 40)

Added Support for Explicit Casting (CAST)

In the distributed execution mode, users can now cast between data types: STRING, BOOLEAN, INT, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP, TIME WITH TIME ZONE, TIMESTAMP WITH TIME ZONE.

From \ To                          | string | exact numeric (int/long) | approximate numeric (float/double) | boolean | date | time | timestamp | time with tz | timestamp with tz
string                             | Y      | M                        | M                                  | Y       | Y    | Y    | Y         | Y            | Y
exact numeric (int/long)           | Y      | M                        | Y                                  | N       | N    | N    | N         | N            | N
approximate numeric (float/double) | Y      | M                        | Y                                  | N       | N    | N    | N         | N            | N
boolean                            | Y      | N                        | N                                  | Y       | N    | N    | N         | N            | N
date                               | N      | N                        | N                                  | N       | Y    | N    | Y         | N            | Y
time                               | N      | N                        | N                                  | N       | N    | Y    | Y         | Y            | Y
timestamp                          | N      | N                        | N                                  | N       | Y    | Y    | Y         | Y            | Y
time with timezone                 | N      | N                        | N                                  | N       | N    | Y    | Y         | Y            | Y
timestamp with timezone            | N      | N                        | N                                  | N       | Y    | Y    | Y         | Y            | Y

In the table, Y indicates that casting is supported, N indicates that casting is not supported, and M indicates that casting is supported only if the numeric value is within the precision bounds of the specified target type (meaning that the value should be between the minimum and maximum values of the target type). Note that this table differs from the one in the cast specification because overflows are possible when casting from strings to numbers (when the number in the string is out of range) and it is always possible to cast from exact numbers to approximate numbers, since the ranges of approximate numbers include the ranges of exact numbers.

Example:

SELECT CAST(n.age AS FLOAT), CAST(n.tax_rate AS DOUBLE), CAST(n.salary AS INT), CAST('True' AS BOOLEAN), CAST(TRUE AS STRING), CAST('09:15:00+01:00' AS TIME WITH TIME ZONE)
MATCH (n)
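The difference between the Y and M entries of the cast table can be illustrated with plain Java conversions. The sketch below is an analogy, not the PGX cast implementation. The class CastSketch and its method names are hypothetical, and Java's exceptions merely stand in for whatever error the engine reports when a value is out of range.

```java
// Hypothetical analogy, not PGX code: range-checked ("M") and always-valid
// ("Y") numeric casts expressed with standard Java conversions.
class CastSketch {
    // True when the value lies between the minimum and maximum int values.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    // long -> int is an "M" cast: Math.toIntExact throws ArithmeticException
    // when the value is outside the int range.
    static int longToInt(long value) {
        return Math.toIntExact(value);
    }

    // string -> int is an "M" cast: Integer.parseInt throws
    // NumberFormatException when the number in the string is out of range.
    static int stringToInt(String text) {
        return Integer.parseInt(text.trim());
    }

    // int -> double is a "Y" cast: the range of double includes the int range.
    static double intToDouble(int value) {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(longToInt(42L));
        System.out.println(fitsInInt(1L << 40));
    }
}
```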

Added Support for Temporal Types in Distributed Execution Mode

In the distributed execution mode, users can now use the following types: local_date, local_time, local_timestamp, time_with_timezone and timestamp_with_timezone. All PGQL features (except EXTRACT) support the temporal types. They are also supported in both scalars and maps. Temporal types can be loaded from all graph formats. However, there are still other limitations, as described in the pages related to the distributed execution mode.

Improved Concurrency and Cancellation Support in Distributed Execution Mode

Control, graph, property, and scalar commands are now processed concurrently.

Most concurrently processed commands in PGX distributed mode can be cancelled. Exceptions include fixed-length / short commands (e.g., Rename Graph) and deletion commands, such as Delete Property. Once these commands have started, they are not cancellable. Still, all concurrent commands can be cancelled after being submitted but not yet scheduled by the distributed runtime.

What is New in PGX 19.3.0

PGX 19.3.0 comes with several improvements and bug fixes, as listed on our changelog; here we summarize some highlights.

Additionally, PGX 19.3.0 comes with a release plan update: we are aligning the release of the tech preview versions of PGX with the Oracle Critical Patch Update (CPU) dates. Aligning to that schedule, we are releasing PGX 19.3.0 on October 15, 2019.

PGX Shell Based on JShell

PGX now supports a PGX shell that runs on top of JShell; this shell is scheduled to replace the deprecated Groovy shell, which will be removed.

PGX Algorithm Taken out of Beta

The PGX Algorithm feature, which allows writing graph algorithms in Java, is taken out of beta. As part of this change, the package name has been renamed from oracle.pgx.api.beta to oracle.pgx.algorithm.

Improvements and New Features in Distributed Mode

The distributed server configuration now offers an enable_secure_handshake option that enables TLS-PSK based secured handshaking among PGX machines in a cluster. Moreover, the distributed execution mode gains extended PGQL support, support for graph changesets and for loading CSV files, and the ability to interrupt graph loading.

PGQL Enhancements in Distributed Mode

In the distributed execution mode, users can now use PGQL prepared statements. For example:

ps = session.preparePgql("SELECT * FROM movies MATCH (a:?)-[e:?]->() ORDER BY id(a) LIMIT ?");
ps.setString(1, "person")
ps.setString(2, "directed")
ps.setInt(3, 10)

Moreover, users can now use regular expressions within the WHERE clause of PGQL queries. For example:

SELECT ... MATCH (a:person)-[e:likes]->(b:post) WHERE java_regexp_like(a.name, 'Jo[A-Za-z]*')

Finally, users can now alias GROUP BY variables, as illustrated in the following query.

SELECT person, COUNT(*) MATCH (a:person)->(b) GROUP BY a AS person

Support for Graph Mutation via Change Sets

PGX 19.3.0 adds support to mutate graphs via the Change Sets API in distributed execution mode. For example, users can add vertices and edges to an existing graph and generate a new one:

changeSet = existingGraph.createChangeSet(IdGenerationStrategy.USER_IDS, IdGenerationStrategy.USER_IDS)
changeSet.addVertex(12)
changeSet.addEdge(1, 3, 2)
newGraph = changeSet.build()

More information can be found on the Change Set page.

CSV File Loader

Loading graphs from CSV files will now work in distributed execution mode as well. Refer to the loading from CSV section for more information.

Support for Graph Loading Cancellation

The distributed execution mode now enables users to cancel graph loading and other operations (i.e., graph cloning, graph modification, graph building, and subgraph filtering). When it is cancelled, graph loading is stopped with a low latency and the system state is rolled back as if the command was never executed.

New Features in Shared-memory Execution Mode

PGX 19.3.0 also brings several improvements in shared-memory execution mode.

Support for Querying the TOP K CHEAPEST Path

Users can now find TOP k CHEAPEST paths between any pair of matched source and destination according to a custom cost function, and compute aggregations over their vertices/edges.

All path aggregation constructs that are supported for TOP k SHORTEST queries are also supported by TOP k CHEAPEST queries.

For example, the following query returns the sum of the cost property along each of the three lowest-cost (i.e., cheapest) paths that connect any two vertices (src and dst), such that the age of src is lower than the age of dst.

SELECT src, SUM(e.cost), dst
MATCH TOP 3 CHEAPEST ((src) (-[e]-> COST e.cost)* (dst))
WHERE src.age < dst.age

More information can be found on the PGQL specification page.

Support for Partitioned Graphs

Heterogeneous graphs have been renamed to partitioned graphs, for better clarity. Partitioned graphs are now considered a "regular" feature of PGX and not a beta feature any more.

Moreover, PGX 19.3.0 ships several improvements for partitioned graphs:

  • It is now possible to use the PgxGraph.getVertex() and PgxGraph.getEdge() methods and the Property API.
  • The following analyst algorithms are now supported on partitioned graphs: DeepWalk, Pg2Vec, approximate vertex betweenness centrality and Prim

Performance Improvements for the PGQL Query Planner

  • The query planner is now much faster and scales better with the number of variables. This is achieved by pruning sub-optimal plans that involve Cartesian product operations early.

  • In distributed execution mode, the query planner now attempts to minimize the number of cross-machine messages that the plan requires. For example, a pattern such as (a)->(b), (a)->(c) can be executed as:

    1. match a, match b, match c
    2. match b, match a, match c
    3. more ...

If the first query plan is chosen, due to the distributed nature of the graph, the engine needs to match a, move the intermediate result to b and match b (which might be remote), and then move back to a (which, again, might be remote) to access the edge list of a and match c. Instead, query plan 2 forms a chain b<-a->c and thus does not include the step of moving back to a that could incur extra messaging. The query planner can now detect such patterns and choose the more favorable plan.
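The difference between the two plans can be made concrete with a toy cost model that counts how often the intermediate result has to move from one pattern vertex to another. The sketch below is an assumption-laden illustration, not the PGX planner: the class PlanCostSketch, the hop-counting rule and the hard-coded pattern (a)->(b), (a)->(c) are all hypothetical, and every hop is treated as a potential cross-machine message.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy cost model, not the PGX planner: counts the hops needed to match the
// pattern (a)->(b), (a)->(c) in a given vertex order.
class PlanCostSketch {
    // Pattern adjacency, ignoring edge direction: a-b and a-c.
    private static final Map<String, List<String>> NEIGHBORS = new HashMap<>();
    static {
        NEIGHBORS.put("a", Arrays.asList("b", "c"));
        NEIGHBORS.put("b", Arrays.asList("a"));
        NEIGHBORS.put("c", Arrays.asList("a"));
    }

    static int hops(String... matchOrder) {
        if (matchOrder.length == 0) {
            return 0;
        }
        Set<String> matched = new HashSet<>();
        String position = matchOrder[0]; // vertex where the intermediate result currently sits
        matched.add(position);
        int hops = 0;
        for (int i = 1; i < matchOrder.length; i++) {
            String next = matchOrder[i];
            List<String> candidates = NEIGHBORS.get(next);
            if (candidates == null) {
                throw new IllegalArgumentException("Unknown pattern vertex: " + next);
            }
            if (!candidates.contains(position)) {
                // The next vertex is not adjacent to the current position, so the
                // result must first move back to an already matched neighbor.
                boolean reachable = false;
                for (String candidate : candidates) {
                    if (matched.contains(candidate)) {
                        reachable = true;
                    }
                }
                if (!reachable) {
                    throw new IllegalArgumentException("Cartesian products are not modeled: " + next);
                }
                hops++; // move back
            }
            hops++; // move forward to the next vertex
            matched.add(next);
            position = next;
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println(hops("a", "b", "c")); // plan 1
        System.out.println(hops("b", "a", "c")); // plan 2
    }
}
```

Under this model, plan 1 needs one more hop than plan 2, which corresponds to the extra step of moving back to a.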

Support for Directed and Partitioned Graphs in PgxML

PGX 19.3.0 enables the algorithms in the PgxML library, namely Deepwalk and Pg2vec, to be executed on directed and partitioned graphs as well.

What is New in PGX 19.2.0

Extended PGQL Support in Distributed Execution Mode

Added Support for IN Filter Expressions

In distributed execution mode, users can now use IN filter expressions within the WHERE clause of PGQL queries:

SELECT ... MATCH (a:person)-[e:likes]->(b:post) WHERE a.name IN ('Emily', 'Paul', 'Juan')

Added Support for FROM Clause

In distributed execution mode, users can now run PGQL queries on the session, specifying the graph in the FROM clause.

This means that after loading a graph:

graph = session.readGraphWithProperties("graph.elist.json", "my_graph")

one can now execute:

session.queryPgql("SELECT a FROM my_graph MATCH (a)")

in addition to being able to execute PGQL directly on the graph:

graph.queryPgql("SELECT a MATCH (a)")

Added Support for Distinct Aggregations without Group-by

In distributed execution mode, users can now perform distinct aggregations without group-by in PGQL queries:

SELECT SUM(DISTINCT a.age) MATCH (a:person)
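The effect of DISTINCT inside an aggregation can be shown with a small Java stream: duplicate values are dropped before the sum is taken. This is only an analogy. The class DistinctAggregationSketch is hypothetical and does not reflect how the distributed engine evaluates the aggregation.

```java
import java.util.stream.IntStream;

// Hypothetical analogy, not PGX code: SUM(DISTINCT x) versus SUM(x).
class DistinctAggregationSketch {
    // Each distinct value contributes to the sum exactly once.
    static long sumDistinct(int... values) {
        return IntStream.of(values).distinct().asLongStream().sum();
    }

    // Every value contributes, including duplicates.
    static long sumAll(int... values) {
        return IntStream.of(values).asLongStream().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumDistinct(30, 30, 40));
        System.out.println(sumAll(30, 30, 40));
    }
}
```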

Added Support for Constant Literals in Projections

In distributed execution mode, users can now project literals:

graph.queryPgql("SELECT a, 1, 0.1 AS my_double, 'string', true MATCH (a)")

Beta Features in PGX Shared-memory Execution Mode

With 19.2.0, we also introduce some new features in beta mode, for which the syntax and semantics might change in future versions.

Support for MODIFY Queries Through PGQL

Users can now make the following modifications to graphs through PGQL:

  • update the values of existing session-local properties
  • insert new vertices and edges into session-local graphs
  • delete vertices and edges from session-local graphs

The syntax is modified such that a MODIFY query can contain at most one of each kind of modification (INSERT/UPDATE/DELETE). A MODIFY query is executed on the matched results with snapshot isolation semantics, so the different DML operations under the same MODIFY clause do not see each other's results. A MODIFY query returns null.

Updating Properties (change to the Beta API in 19.1.0)

The example MODIFY query for updating the value of property age of every person named John changes to the following

MODIFY/*beta*/ g
    ( UPDATE v SET PROPERTIES ( v.age = 42 ) )
FROM g
    MATCH (v:Person)
    WHERE v.name = 'John'

Multiple updates can be executed following the same semantics as in 19.1.0 using the following syntax

MODIFY/*beta*/ g
    ( UPDATE 
        v SET PROPERTIES ( v.age = 42 ),
        u SET PROPERTIES ( u.weight = 3500 )
    )
FROM g
    MATCH (v:Person) <-[:belongs_to]- (u:Car)
    WHERE v.name = 'John'

Inserting New Entities into the Graph

New vertices can be inserted using the INSERT VERTEX clause.

MODIFY/*beta*/ g
    ( INSERT VERTEX u LABELS ('Male') PROPERTIES ( u.age = 42 ) )
FROM g
    MATCH (v:Person)
    WHERE v.name = 'John'

New edges can be inserted using the INSERT EDGE construct.

MODIFY/*beta*/ g
    ( INSERT EDGE e BETWEEN v AND u LABELS ('has') PROPERTIES ( e.since = DATE '2017-09-21' ) )
FROM g
    MATCH (v:Person), (u: Car)
    WHERE v.name = 'John' AND u.id = 'ABC-123'

Deleting Entities from the Graph

Entities can be deleted by enumerating them under the DELETE construct. For example

MODIFY/*beta*/ g
    ( DELETE e, v )
FROM g
    MATCH (v:Person)-[e]->(u: Car)
    WHERE v.name = 'John' AND u.id = 'ABC-123'

More details on MODIFY queries can be found here

What is New in PGX 19.1.0


In 19.1.0, PGX now ships again in two variants: distributed as well as shared-memory execution mode.

Distributed Execution Mode

PGX 19.1.0 reintroduces distributed execution mode functionality and adds several new features compared to PGX 2.7.0 (the last version including distributed execution mode support).

PGQL 1.1 Support

Users can now query graphs using a subset of PGQL 1.1 features in PGX distributed execution mode. In brief, users can perform:

  • Pattern matching -- directed or any-directed edge patterns -- with filters (and labels), e.g., SELECT ... MATCH (a:person)-[e:likes]->(b:post) WHERE a.age > 18 AND a.group = b.group (directed); or SELECT ... MATCH (a:person)-[e:follows]-(b:person) (any-directed)
  • Aggregations without GROUP BY, e.g., SELECT COUNT(*), MAX(a.age), AVG(a.age) MATCH (a:person)-[e:likes]->(b:post) WHERE a.age > 18
  • GROUP BY, e.g., SELECT a.age, AVG(a.income) ... GROUP BY a.age
  • ORDER BY (with LIMIT and OFFSET), e.g., SELECT a.age, ... ORDER BY a.age

For more information on what PGQL 1.1 features are supported in distributed execution mode, refer to PGQL in distributed execution mode overview page.

Server-side Data Structure Support

Users can now create, update and query data structures (sets, sequences and maps) in distributed execution mode. Sets and sequences can only store vertices or edges, and maps allow integers, longs, floats, doubles, boolean, vertices or edges as keys and values.

These structures are used in the same manner as in the shared memory version:

set = g.createVertexSet()
set.add(vertex)
set.remove(vertex)
size = set.size()

Furthermore, a new algorithm is available in the distributed version: inDegreeDistribution, which computes the distribution of number of incoming edges for all vertices.
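To make the notion of an in-degree distribution concrete, the sketch below computes it for a small edge list in plain Java. It is a sequential illustration only. The class InDegreeDistributionSketch, its input encoding and its Map result are assumptions and do not describe the signature or output format of the PGX inDegreeDistribution algorithm.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not PGX code: maps each in-degree value to the
// number of vertices that have that many incoming edges.
class InDegreeDistributionSketch {
    // Vertices are numbered 0..numVertices-1; each edge is {source, destination}.
    static Map<Integer, Long> inDegreeDistribution(int numVertices, int[][] edges) {
        int[] inDegree = new int[numVertices];
        for (int[] edge : edges) {
            inDegree[edge[1]]++; // count one incoming edge at the destination
        }
        Map<Integer, Long> distribution = new TreeMap<>();
        for (int degree : inDegree) {
            distribution.merge(degree, 1L, Long::sum);
        }
        return distribution;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {2, 1}};
        System.out.println(inDegreeDistribution(3, edges));
    }
}
```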

Two Tables RDBMS Support

Users can now load graphs in Two Tables RDBMS format in PGX distributed execution mode, by creating one database table for vertices, and one for edges, with the desired fields. For more information on this format, see Oracle Two Tables RDBMS.

Graph Builder API Support

In distributed execution mode, PGX now supports a subset of the graph builder API. This includes support for adding vertices and edges via the existing PgxSession.createGraphBuilder() API.

Miscellaneous

Distributed execution mode now supports graph cloning as well as sharing/publishing of graphs, see details here.

Shared Memory Execution Mode

PGX 19.1.0 introduces several new features also for the shared-memory execution mode.

New API to Execute Multiple Tasks from the Same Session Concurrently

Multiple tasks can now be executed concurrently from the same session using PgxSession.runConcurrently when ExecutionEnvironment.allowsConcurrentTasks is enabled in the execution environment.

PGQL 1.2 Support

In shared-memory execution mode, PGX now implements PGQL 1.2.

BETA Features

With 19.1.0, we also introduce some new features in beta mode, for which the syntax and semantics might change in future versions.

Support for Heterogeneous Graphs

In shared-memory execution mode, PGX now supports loading and storing of heterogeneous graphs for a limited set of graph formats (e.g., CSV and PGB).

All PGQL operators are supported, as well as updating the graph using GraphChangeSet. In addition, a limited set of Green-Marl algorithms is supported on heterogeneous graphs, including reachability queries and Bellman-Ford.

Support for Updating Existing Vertex and Edge Properties Through PGQL

Users can now update values of existing, session-local properties through PGQL. Please be aware that this is a beta feature; the exact syntax and semantics of the queries might change in the future. The beta features can be turned on with an additional /*beta*/ comment in the query.

The syntax is extended with update clauses that can contain one or more assignments to properties. An update query returns null.

For example, the following query updates the value of property age of every person named John.

MODIFY/*beta*/ g
    UPDATE v SET PROPERTY v.age = 42
FROM g
    MATCH (v:Person)
    WHERE v.name = 'John'

Properties can be updated via any variable matched by the pattern under the MATCH clause.

MODIFY/*beta*/ g
    UPDATE v, u SET PROPERTIES v.age = 42, u.weight = 3500
FROM g
    MATCH (v:Person) <-[:belongs_to]- (u:Car)
    WHERE v.name = 'John'

More information can be found on the PGQL specification page.

New API to Execute PGQL Queries That Modify the Graph

None of the already existing API methods support the update queries defined above. In order to run them, the following beta API methods (and their asynchronous versions) were defined:

PgqlResultSet executePgql(String query)
PgxGraph cloneAndExecutePgql(String query)
PgxGraph cloneAndExecutePgql(String query, String graphName)

Further details on the behavior, parameters and return value of the new methods can be found here.

External Stores

In shared-memory execution mode, PGX introduces External Stores as a beta functionality for offloading graph properties to an external store such as Elasticsearch and enabling the use of an external store's specialized operations, e.g., text search.

What is New in PGX 3.2


PgxML (Beta) Functionality

Introducing the PgxML library for Graph Learning with two algorithms:

  • Deepwalk to compute vector representations of vertices based on the topology of a graph.
  • Pg2vec to compute vector representations of graphlets based on the topology and properties of those graphlets.
  • Trained models from these algorithms can be stored and reloaded for inference.
  • The PgxML library employs PgxFrames to communicate the output of some operations.

PGX Algorithm (Beta) Functionality

Introducing PGX Algorithm as a beta functionality to write graph algorithms in Java and have it automatically compiled to an efficient parallel implementation.

  • This functionality is enabled by setting the graph_algorithm_language configuration option to JAVA.
  • After starting the PGX shell, you can call myAlgorithm = session.compileProgram('/path/to/MyAlgorithm.java') to compile a PGX Algorithm.

PgxFrame (Beta) Functionality

Introducing PgxFrames as a beta functionality to load/store and manipulate tabular data:

  • Tabular data can be loaded/stored from/to CSV data (comma separated or any other separator) or from/to PGB.
  • PgxFrames can be manipulated with operations such as select(), renameColumns(), print(), head(), tail() and flatten().
  • PgqlResultSets can be converted to PgxFrames with PgqlResultSet.toFrame() for postprocessing or storing PGQL query output.

CSV Loader

Added support for a CSV loader to load two-table graphs from CSV files.

  • Vertices and edges are stored in different files.
  • Depending on requirements, CSV files can have either pre-defined headers or no headers.

Session Map APIs

Added support for session maps:

  • PGX now has two different types of maps:
    • Graph Maps depend on the graph.
    • Session Maps are mapped directly to the session and can be created using the oracle.pgx.api.PgxSession API.
  • See also graph and session maps tutorial in PGX data types.

Support Subgraph Creation out of a PGQL Result Set

Users can now create a subgraph from a PGQL result set. In the following example, the PGQL query selects all the vertices with values for the property age greater than 24 and then the filter() method creates a subgraph containing the matched vertices and their corresponding edges.

resultSet = g.queryPgql("SELECT x MATCH (x) WHERE x.age > 24")
resultSetVertexFilter = new ResultSetVertexFilter(resultSet, "x")
newGraph = g.filter(resultSetVertexFilter)

This operation is executed on the PGX server when running in client-server mode and removes the need to fetch the result set to the client side in order to create the subgraph. This improvement brings performance benefits for result sets of any size and enables handling very large result sets that may be troublesome to handle on a client machine.

More information can be found in the Filter Expressions tutorial.

Support TOP K SHORTEST Path and Path Returning Queries

Users can now find TOP k SHORTEST paths between any pair of matched source and destination and compute aggregations over their vertices/edges. The distance metric is represented by the number of hops.

For example the following query will output the sum of the edge weights along each of the top 3 shortest paths between each of the matched source and destination pairs:

SELECT src, SUM(e.weight), dst
MATCH TOP 3 SHORTEST ((src) (-[e]->)* (dst))
WHERE src.age < dst.age

The ARRAY_AGG construct allows users to output properties of edges/vertices along the path. For example in the following query:

SELECT src, ARRAY_AGG(e.weight), dst
MATCH TOP 3 SHORTEST ((src) (-[e]->)* (dst))
WHERE src.age < dst.age

the ARRAY_AGG(e.weight) outputs a list containing the weight property of all the edges along the path.

More information can be found on the PGQL specification page.

What is New in PGX 3.1 (not Released to OTN)


PGQL 1.1 Extensions

Added Support for Scalar Sub-queries

Users can now have scalar sub-queries. For example the following query selects all the persons whose age is greater than the average age of their friends:

SELECT a.name MATCH (a) WHERE a.age > (SELECT AVG(b.age) MATCH (a)-[:friendOf]->(b))
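The evaluation order of this query, in which the scalar subquery is recomputed for every candidate a, can be pictured with an in-memory Java loop. The sketch below is an illustration under stated assumptions. The class ScalarSubquerySketch and its array-based graph encoding are hypothetical, and persons without friends are skipped on the assumption that the average over an empty set is NULL and the comparison therefore fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not PGX code: selects the persons who are older
// than the average age of their friends.
class ScalarSubquerySketch {
    // Person i has names[i] and ages[i]; each friendOf entry is {from, to}.
    static List<String> olderThanFriendsAverage(String[] names, int[] ages, int[][] friendOf) {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < names.length; a++) {
            // Inner "subquery": average age over the friends of a.
            long sum = 0;
            int count = 0;
            for (int[] edge : friendOf) {
                if (edge[0] == a) {
                    sum += ages[edge[1]];
                    count++;
                }
            }
            if (count == 0) {
                continue; // assumption: no friends means a NULL average, so a is filtered out
            }
            double average = (double) sum / count;
            if (ages[a] > average) {
                result.add(names[a]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"Ann", "Bob", "Cid"};
        int[] ages = {40, 20, 30};
        int[][] friendOf = {{0, 1}, {0, 2}, {1, 0}};
        System.out.println(olderThanFriendsAverage(names, ages, friendOf));
    }
}
```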

Added Support for HAVING Clause

The HAVING clause is now supported. One can add conditions on the groups generated by a GROUP BY clause. For example:

SELECT n.department, AVG(n.salary) MATCH (n) GROUP BY n.department HAVING AVG(n.age) > 30

What is New in PGX 3.0 (not Released to OTN)


New REST API

The new standardized REST API is versioned and easier to use/maintain. Users can refer to the published PGX REST documentation to learn about the PGX specific REST resources and then use REST clients to query the PGX server either in a synchronous mode or an asynchronous one through async-polling. The new API is expected to be more consistent with a better use of HTTP methods and response codes and more self-descriptive with the introduction of hypermedia in responses.

What is New in PGX 2.7


PGX API

PGX 2.7.0 extends the PGX API with support for scalar collections. Scalar collections are server-side data structures that can hold primitive data types like Integer, Long, Float and Double. Users can create and handle scalar collections using the oracle.pgx.api.PgxSession API; e.g., session.createSet(INTEGER) will return a reference to a ScalarSet<INTEGER>, which represents a set of integers. Scalar collections are subclasses of the new oracle.pgx.api.ScalarCollection class and are session-bound, i.e., they do not refer to a specific graph, but to a session. Refer to the new PGX data types tutorial for more information.

New Analytics Algorithms

PGX.SM, the single-machine, shared-memory execution engine included in PGX, adds two algorithms to the list of supported built-ins: Topological Sort and Topological Schedule.

PGX.D, the scale-out, distributed execution engine included in PGX, extends its built-in analytics algorithm support with thirteen new algorithms from the supported algorithms list: Weighted Pagerank, Personalized Weighted Pagerank, Vertex Betweenness Centrality, Closeness Centrality Unit Length, Hits, Out-Degree Centrality, In-Degree Centrality, Degree Centrality, Conductance, Partition Modularity, Partition Conductance, Diameter, and Periphery.

Extended PGQL 1.1 Support

PGX 2.7.0 further extends support for PGQL 1.1.

Support for EXISTS and NOT EXISTS Subqueries

Users can now express existential subqueries. For example:

SELECT a.name, b.name MATCH (a)-[:friendOf]->(b) WHERE
  EXISTS (SELECT * MATCH (a)-[:friendOf]->(c)-[:friendOf]->(b))

Support for Subqueries Inside the PATH Clause

Users can now express a sub-query in the WHERE clause of the PATH definition. For illustration, the following example defines a path ending in a vertex which is not the oldest in the graph:

PATH p AS (a) -> (b) WHERE EXISTS (SELECT * MATCH (x) WHERE x.age > b.age) SELECT ...

One limitation of the current implementation is that only simple constraints (constraints accessing at most one of the pattern variables) are allowed under the WHERE clause. Queries like:

PATH p AS (a) -> (b) WHERE EXISTS (SELECT * MATCH (a) -> (c) -> (b)) SELECT ...

are not yet supported.

Support for Creating Vertex/edge Sets out of a PGQL Result Set

Users can now create vertex or edge sets from a given PGQL result set. In the following example, the PGQL query selects all the vertices whose age property is greater than 24, and getVertices() then creates a vertex set containing those vertices.

resultSet = g.queryPgql("SELECT x MATCH (x) WHERE x.age > 24")
vertexSet = g.getVertices(new ResultSetVertexFilter(resultSet, "x"))

This operation is executed on the PGX server when running in client-server mode and removes the need to fetch the result set to the client side in order to create a collection. This improvement brings performance benefits for result sets of any size and enables handling very large result sets that may be troublesome to handle on a client machine.

Ability to Share Graphs by Name

PGX now supports graph sharing by name, which allows marking graphs as shared and retrieving them from any session by the graph name. Users can mark graphs as shared with other sessions by calling graph.publish(), and other sessions can get a reference to the published graphs via session.getGraph("<name>"). Users can see which graphs have been published by calling session.getGraphs(). The owner of a shared graph can modify its name by calling graph.rename("<name>").

Besides graphs, individual properties of a graph can be shared, too, by calling property.publish().

One implication of this feature is that all graph names must now be globally unique across sessions.

More information on graph sharing can be found in the "Publish a graph" tutorial.

Changelog

All the changes and fixes shipping with PGX 2.7 are listed in the changelog.

What is New in PGX 2.6


PGQL 1.1

PGX 2.6 introduces support for PGQL 1.1 (see the specification at this page). PGQL 1.1 introduces several novelties:

  • Introduced a FROM clause for specifying an input graph.
  • The FROM clause is optional in PgxGraph#queryPgql(String), PgxGraph#preparePgql(String) and PgxGraph#explainPgql(String)
  • The FROM clause is mandatory in PgxSession#queryPgql(String), PgxSession#preparePgql(String) and PgxSession#explainPgql(String)
  • The WHERE clause from PGQL 1.0 has been split up into a MATCH clause and a WHERE clause.
  • The MATCH clause contains the graph pattern (vertices, edges and paths) while the WHERE clause in PGQL 1.1 contains only the filter predicates.
  • Common path expressions now have an optional WHERE clause for specifying filter expressions (e.g. PATH connects_to AS () -[e]-> () WHERE e.status = 'OPEN' SELECT * MATCH ...).
  • Inlined filter predicates through WITH ... have been removed.
  • OO-style function-call syntax (e.g. x.label() or x.id()) has been replaced by functional-style syntax (e.g. label(x) or id(x)).

Additionally, PGX 2.6 ships various improvements to PGQL 1.0, which is still supported (although deprecated); you can find more details in the changelog.

Other Notable Changes

All the changes and fixes shipping with PGX 2.6 are listed in the changelog; here are some highlights:

  • It is now possible to export a graph into multiple files, see documentation for more details.
  • PGX.D (PGX's distributed, scale-out runtime) now supports loading graphs with edge labels.
  • A new community-detection algorithm is available as a built-in: Infomap.

What is New in PGX 2.5


New Green-Marl Compiler

PGX 2.5 introduces a new implementation of the Green-Marl compiler, now based on the Spoofax framework. The new compiler (as usual, available with the OTN package) introduces support for new Green-Marl features:

  • Vertex and Edge filters.
  • Ordered iteration.
  • Iteration over keys and values in maps.

Moreover, the new compiler will work on any system supported by the JDK and it is now possible to compile Green-Marl in a Scala environment.

Notice that foreign syntax is no longer supported by the compiler. You can still get an instance of the old compiler to deal with legacy code; see the changelog for more information.

New Built-in Algorithms

PGX 2.5 introduces new built-in algorithms for cycle detection; see the reference documentation for more details.

Temporal Data Types

With PGX 2.5 you can have more precise control on time-related properties, with support for five temporal data types that map directly to the five temporal types in SQL as well as to the new Java 8 date-time types. The date property type is now deprecated and replaced by local_date, time, timestamp, time_with_timezone and timestamp_with_timezone; the new types are supported both in the PGX API and in PGQL, as the following table summarizes.

Type | PGX property type | Example plain text | Example PGQL literal | PGQL ResultSet API
TIMESTAMP WITH TIMEZONE | timestamp_with_timezone | "2017-08-18 20:15:00+08" | TIMESTAMP '2017-08-18 20:15:00+08' | java.time.OffsetDateTime getTimestampWithTimezone(..)
TIMESTAMP | timestamp | "2017-08-18 20:15:00" | TIMESTAMP '2017-08-18 20:15:00' | java.time.LocalDateTime getTimestamp(..)
TIME WITH TIMEZONE | time_with_timezone | "20:15:00+08" | TIME '20:15:00+08' | java.time.OffsetTime getTimeWithTimezone(..)
TIME | time | "20:15:00" | TIME '20:15:00' | java.time.LocalTime getTime(..)
DATE | local_date | "2017-08-18" | DATE '2017-08-18' | java.time.LocalDate getDate(..)
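
As a rough illustration of how the example plain-text values in the table line up with the Java 8 date-time types, the sketch below parses each of them with java.time. The class name TemporalExamples and the formatter patterns are assumptions made for this example; they are not taken from the PGX loader, which may accept other formats.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.OffsetTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of PGX: parses the example strings from the table.
final class TemporalExamples {
    // Assumed patterns; the pattern letter "x" accepts a short offset such as +08.
    static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final DateTimeFormatter TIMESTAMP_TZ =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssx");
    static final DateTimeFormatter TIME_TZ =
            DateTimeFormatter.ofPattern("HH:mm:ssx");

    static OffsetDateTime timestampWithTimezone(String s) {
        return OffsetDateTime.parse(s, TIMESTAMP_TZ);
    }

    static LocalDateTime timestamp(String s) {
        return LocalDateTime.parse(s, TIMESTAMP);
    }

    static OffsetTime timeWithTimezone(String s) {
        return OffsetTime.parse(s, TIME_TZ);
    }

    static LocalTime time(String s) {
        return LocalTime.parse(s); // ISO-8601 local time
    }

    static LocalDate date(String s) {
        return LocalDate.parse(s); // ISO-8601 local date
    }

    public static void main(String[] args) {
        System.out.println(timestampWithTimezone("2017-08-18 20:15:00+08"));
        System.out.println(timestamp("2017-08-18 20:15:00"));
        System.out.println(timeWithTimezone("20:15:00+08"));
        System.out.println(time("20:15:00"));
        System.out.println(date("2017-08-18"));
    }
}
```

The returned objects are of the same java.time classes that the table lists for the PGQL ResultSet getters.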

Spatial Data Type Point2D

PGX 2.5 introduces a new data type to store 2-dimensional coordinates (e.g., latitude and longitude), as the following table illustrates.

Type | PGX property type | Example plain text | Example PGQL literal | PGQL ResultSet API
POINT2D | point2d | "-64.211157 114.257813" | ST_X(vertex1.point2dProp) >= ST_X(vertex2.point2dProp) | oracle.pgql.lang.spatial.Point2D getPoint2D(..)

You can refer to the PropertyType class javadoc, the GraphConfig documentation, the PGQL specification, and the pattern matching API reference for more information on Point2D and all the supported data types.

API Improvements

PGX 2.5 introduces several additions and improvements in the Java API:

New Features in PGQL

PGX 2.5 introduces several new features in PGQL; we highlight the most important here.

Prepared Statements

Prepared statements provide a way to safeguard your application from query injection. The use of prepared statements can also speed up query execution as queries do not need to get recompiled every time their bind values change. PGQL uses the question mark symbol (?) to indicate a bind variable. Values for the bind variables are then assigned through the PreparedStatement API, as explained in the reference documentation.

Undirected Edge Queries

PGQL now supports undirected edge queries, which can be used to query undirected graphs or to ignore edge direction in directed graphs. These two use cases are illustrated in the following two queries:

SELECT d1.name WHERE (d1:Device) -[:connects_to]- (d2:Device), d1.name = 'LoadTransformer 2533'
SELECT m.name WHERE (n:Person) -[:follows]- (m:Person) , n.name = 'Bono'

The first query matches undirected edges labeled connects_to; the second query matches all people that follow or are followed by a person named 'Bono'.

Other Additions and Improvements in PGQL

  • PGQL now has an all_different(a, b, c, ...) function, which allows specifying that a set of values (typically vertices or edges) are all different from each other.
  • Support for greater than, greater than or equal, less than, and less than or equal for comparing String values (also works for filter expressions in the Java API).
  • Support the new temporal data types in PGQL queries (see above).
  • Support the new Point2D spatial data type in PGQL queries (see above).
  • Added support for constraints on vertices in PATH patterns, e.g., PATH connects_to_high_volt_dev := (:Device) -> (:Device WITH voltage > 35000) SELECT .... Previously, only constraints on edges in PATH patterns were supported.

Loader Improvements

The PGX graph loader has extended capabilities with PGX 2.5:

  • The Apache Spark loader now supports Spark 2.X.
  • Column names are now configurable when loading from the Oracle RDBMS in two tables format.
  • The two tables format now supports string, integer and long as vertex ID types.
  • Added support for directly loading compressed (gzip) graph data without the need to unpack the archives first.

Distributed Engine Improvements

PGX.D, the scale-out, distributed execution engine included in PGX, gets several improvements that reduce the functionality gap with the single-machine execution engine:

  • PGX.D now supports top-k and bottom-k for string properties.
  • Fixed a bug concerning NULL values (Oracle bug #25491165).
  • Added support for edge properties of vector type.
  • Extended the supported endpoints in the client-server API: added support for rename(), getNeighbours(), getEdges(), getRandomVertex(), getRandomEdge(), getSource(), getDestination().

R Client Improvements

The PGX R clients (part of OAAgraph) got some new functionality:

  • The oaa.graph function now supports vertexIdType argument for string, integer and long.
  • The oaa.graph function now supports loadEdgeIds argument.
  • The oaa.create function now supports storeEdgeLabel and storeNodeLabels arguments.

Other Bug Fixes and Improvements

Numerous other improvements and bug fixes aimed at continuously improving the user experience with PGX also made it into this release; an incomplete list follows:

  • When :timing ON is set, the PGX shell now always outputs time in seconds in the fixed format X+.XXXs, where X is [0-9], instead of switching between seconds and milliseconds. This change should make it easier to parse the logs for timing information.
  • Improved log messages from property loading error handling, by adding the corresponding property name.
  • Improved log messages when parsing values for temporal typed properties fails.
  • Added a client configuration field max_client_http_connections for controlling the maximum number of connections between a PGX Client and PGX Server.
  • Reduced the latency of requests to refresh a graph.
  • Improved the performance of BFS-based algorithms on large data-sets.
  • Improved the performance of GROUP BY and ORDER BY in PGQL

Refer to the changelog for a complete list.
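
To illustrate why the fixed timing format mentioned in the list above is easier to process, here is a minimal sketch that converts such a token into milliseconds. It assumes that X+.XXXs means one or more digits, a dot, exactly three digits and a trailing s; the class TimingParser is hypothetical and not part of PGX.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper, not part of PGX.
final class TimingParser {
    // One or more digits, a dot, exactly three digits, then a literal "s".
    private static final Pattern TIMING = Pattern.compile("(\\d+)\\.(\\d{3})s");

    // Returns the duration in milliseconds, or -1 if the token is not in the assumed format.
    static long toMillis(String token) {
        Matcher m = TIMING.matcher(token.trim());
        if (!m.matches()) {
            return -1L;
        }
        return Long.parseLong(m.group(1)) * 1000L + Long.parseLong(m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(toMillis("12.345s")); // prints 12345
        System.out.println(toMillis("250ms"));   // prints -1
    }
}
```

Because the unit no longer switches between seconds and milliseconds, a single pattern is enough and no unit detection is needed.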

What is New in PGX 2.4


Type Casting in PGQL

PGX provides an extension to PGQL 1.0 that allows data values to be cast from one data type to another via a CAST function.

The syntax is like SQL's CAST function: CAST(expression AS datatype). For example:

SELECT CAST(n.age AS STRING), CAST('123' AS INT), CAST('true' AS BOOLEAN)
WHERE (n:Person)

For more details and supported data types, please see the PGQL Notice.

Transpose Mutation of Graphs

The transpose of a directed graph is the same graph but with all of the edges reversed - simply use the newly added method PgxGraph.transpose() to (non-destructively) return a new graph which is a transpose of the original.

Undirected graphs are unaffected by transposing.
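
The idea of a non-destructive transpose can be sketched on a plain edge list. This is only a conceptual illustration: the class TransposeSketch and the {source, destination} array representation are assumptions for this example and say nothing about how PgxGraph.transpose() is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only; it does not use or reproduce the PGX implementation.
final class TransposeSketch {
    // Each edge is {source, destination}; the input list is left untouched.
    static List<int[]> transpose(List<int[]> edges) {
        List<int[]> reversed = new ArrayList<>(edges.size());
        for (int[] e : edges) {
            reversed.add(new int[] {e[1], e[0]});
        }
        return reversed;
    }

    public static void main(String[] args) {
        List<int[]> edges = Arrays.asList(new int[] {0, 1}, new int[] {1, 2});
        for (int[] e : transpose(edges)) {
            System.out.println(e[0] + " -> " + e[1]); // prints 1 -> 0, then 2 -> 1
        }
    }
}
```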

Performance Improvements

  • Up to 10x performance improvement of two-table loader
  • Certain read-only graph APIs are now 40x - 500x faster
  • Graph refresh from Oracle Database will use delta-updates when possible

Algorithm Updates

  • PRIM algorithm added.
  • More personalization options for SALSA algorithm

Spark Support Improvements

  • Vertex IDs may now be strings
  • Added vertex and edge label support

API Improvements

  • The PGX Node.js Javascript client now supports edge labels
  • Added Java API to construct a PgxConfig object
  • Added API to start a PGX instance with a PgxConfig object
  • Added API to control whether or not graph builder / changeset APIs should keep vertex or edge IDs

Distributed PGX Improvements

The PGX distributed runtime implementation now supports properties of value type string.

R Client

PGX 2.4.0 introduces a new client implemented in the programming language R.

The R client will be distributed together with the Oracle R Enterprise product suite. Please send an email to oracle-r-enterprise@oracle.com if you are interested in trying it out.

What is New in PGX 2.3


Query Language (PGQL) Enhancements

  • Arithmetic expressions in SELECT, ORDER BY, GROUP BY: previously, arithmetic expressions were only allowed in the WHERE clause, while the SELECT, GROUP BY and ORDER BY clauses only allowed basic expressions. Now, all the clauses have full support for arithmetic expressions.
  • MIN, MAX and COUNT aggregates in PGQL now work with String values.

See our graph pattern matching tutorial

Zeppelin Interpreter Improvements

  • Output pretty-printing
  • Compatibility with Zeppelin 1.6.1
  • Cancellation and timeout support for shell commands
  • Pretty printing of PGQL error messages with underlining of problematic text
  • Collapsed-by-default, pretty-printed stack traces

Spark Integration Improvements

PGX can now write data back into Spark data structures. See our Spark tutorial

PGX Shell Syntax Improvements

Arguments to Java methods (such as on the analyst built-in object) can now be passed in the traditional way, someMethod(param1, param2); in parameterized form, someMethod(graph : var1, vertex : var2); or, as of PGX 2.3, in abbreviated parameterized form, someMethod(g : var1, v : var2). For example:

analyst.whomToFollow(g: g, v: g.pickRandomVertex(), maxIt: 100, sDF: 0.85, sMD: 0.01)

Find more PGX shell features in the PGX Shell reference.

Introducing the Enterprise Scheduler

  • Added an enterprise scheduler, used by default, to manage:
    • Concurrent execution of tasks from multiple sessions
    • Detailed configuration for thread pools with pool weight, priority and maximum number of threads
    • Dynamically sizing the I/O thread pool
    • Detailed, per task settings for each thread pool

Manipulation API Improvements

Improvements to loading, undirecting and simplifying graphs include:

  • Filtering graphs during loading
  • Simplifying and undirecting graphs now can
    • Use a MutationStrategy to choose or merge properties
    • Use edge reduction strategies

See the Mutation API reference.

Loading from Files in Parallel

PGX can now read from multiple files in Adjacency List or Edge List format in parallel. See the graph configuration reference and the plain text formats reference.

Other Changes

Other changes include:

  • Performance improvements that further parallelize graph loading, improve memory allocation, partition creation and iteration
  • Pagerank algorithm now supports normalization
  • Distributed runtime now also supports Weighted Pagerank and Personalized SALSA algorithms

Find the full list in the changelog.



What is New in PGX 2.2


Load Graph from Apache Spark

PGX can load data from Apache Spark and create a graph instance out of it. The user can simply define two tables from Spark RDD or DataFrame and point it to a PGX instance. The whole PGX-Spark interaction can be seamlessly done from the Spark shell or a Spark application. Note that compared to native graph libraries in Spark, PGX achieves orders of magnitude better performance in graph processing. See this tutorial for more information.

var sqlc = new SQLContext(sc)

// dataFrame for vertex
var vRowRdd = ...
var vDataframeScheme = new StructType().add("ID", LongType).add("VProp1", IntegerType)
var vDataframe = sqlc.createDataFrame(vRowRdd, vDataframeScheme)

// dataFrame for edge
var eRowRdd = ...
var eDataframeScheme = new StructType().add("SRCID", LongType).add("DSTID", LongType).add("EProp1", DoubleType)
var eDataframe = sqlc.createDataFrame(eRowRdd, eDataframeScheme)

// connect to PGX and create a PGX graph from two tables (dataframes)
var pgxSession = Pgx.getInstance("http://...").createSession("my-spark-session");
var pgxContext = new PgxSparkContext(sc, pgxSession); // sc: the SparkContext already used to create sqlc above
var myGraph = pgxContext.read(vDataframe, eDataframe, "spark-test-graph");

Path Query in PGQL

PGX supports regular path queries as in PGQL 1.0, where the user can test for the existence of arbitrary-length paths between sets of vertices. Moreover, the user can define the pattern of an arbitrary-length path as a repetition of finite-length segments. For instance, such a query can find all common ancestors of the Mario and Luigi vertices, where a vertex is an ancestor of another if it is reachable in any number of hops but only through has_mother or has_father edges.

See our tutorial for more examples of path queries.


Zeppelin Notebook Interpreter

PGX provides a Zeppelin Notebook interpreter so that you can add PGX graph analysis features to your own Zeppelin Notebook.


Exporting Compiled Algorithm

After compiling a custom Green-Marl algorithm, the user can export the compiled algorithm into a jar. By loading up this jar at subsequent PGX initialization, the user can re-use this algorithm without compiling the Green-Marl program again. See this tutorial for more examples.

gmComp = Compilers.findCompiler(Language.GM)
my_algorithm = gmComp.compile("../src/my_algorithm.gm", "my_algorithm")
my_algorithm.writeAsJar(new FileOutputStream("../lib/my_algorithm.jar"));
export CLASSPATH=$CLASSPATH:$(some_path)/lib/my_algorithm.jar

Node.js Client

PGX now supports yet another client with different language binding -- namely JavaScript, Node.js to be precise. The PGX Node.js client connects to PGX server remotely and invokes PGX methods via the REST API. The PGX Node.js client comes with a convenient API layer for this.

'use strict'

const pgx = require('oracle-pgx-client');
let p = pgx.connect("https://...", ...);
...
p.then(function(session) {
  return session.readGraphWithProperties(...);
}).then(function(graph) {
  return graph.session.analyst.pagerank(graph);
}).then(function(property) {
  return property.iterateValues( ... )
});

Named Arguments in PGX Shell

Invoking built-in algorithms that have many arguments is now easier, as the PGX shell supports named arguments with default values. That is, when invoking PGX built-in algorithms from the PGX shell, the user does not need to set all the method arguments; only the arguments whose default values should be overridden need to be named and set. See this document for more details.

analyst.pagerank(graph:g, max:100, variant:'APPROXIMATE')


What is New in PGX 2.1


Delta Update and Auto Refresh

When graph data is loaded from Oracle Property Graph, PGX is able to track changes to the data in the database. PGX can then quickly load only those changes and create a new in-memory snapshot, rather than loading the whole graph again. PGX also provides an automatic refresh mode so that such delta loading happens periodically. Details can be found in this tutorial.

Note: this feature is only available with Oracle Spatial and Graph Property Graph in Oracle 12.2c.


Subgraph Loading with Pushdown Filter

When dealing with a very large graph instance, the user can load only an interesting subgraph of the original graph into PGX and analyze it. PGX allows the user to define such a subgraph with simple filter expressions. Moreover, those filters can be executed on either the PGX side or the data source side. Check out the related tutorial and reference.

{
  "format": "pg",
  "jdbc_url": "jdbc:oracle:thin:...",
  ...
  "loading": {
    "filter": {
      "type": "edge",
      "expression": "src.foo < 42 and edge.color == 'RED'"
    },
    "filter_strategy": "auto"
  }
}

Loading Graph from Oracle RDF

PGX provides a mechanism to load an RDF data set into PGX by converting the RDF data into a Property Graph. More specifically, a user can customize the rules for the data model conversion. See this guide for more information.



What is New in PGX 2.0


PGX Distributed Execution

PGX now provides an option for distributed (i.e. multi-machine) execution mode, which makes it possible to analyze very large graph instances that would not fit in a single address space. For details, please refer to the architectural overview and our tutorial for setting up PGX distributed execution.


Vertex and Edge Label

Labels are special strings that can be associated with vertices and edges. Their main purpose is to specify different categories of vertices and edges and to provide a convenient mechanism for accessing those categories. For instance, you can efficiently retrieve vertices that are labeled as "student" or "professor" in your graph data set. Note that PGQL provides a special syntax for label access.

Check out the APIs for accessing labels as well as guides for loading up label information from various graph data sources.


Graph Builder and ChangeSet API

PGX provides a new API that allows creating an in-memory graph instance programmatically. That is, the user can first create an empty graph instance and then add vertices and edges to it. Please see the related API guide and tutorial for details.

GraphBuilder<Integer> builder = session.newGraphBuilder();
builder.addVertex(0); // vertex id
builder.addVertex(1);
builder.addEdge(0, 1, 1024); // src vertex, dst vertex, edge id
PgxGraph graph = builder.build();

In addition, PGX provides a similar API for adding/removing vertices and edges to/from an existing graph instance. This tutorial contains more information about the API.



What is New in PGX 1.2


Graph Pattern Matching with a Query Language

PGX 1.2 implements a graph query engine which allows you to find every subgraph instance that matches with the given query pattern in your graph data. For example, you can ask questions like: "Find all (vertices) whom both Mario (vertex) and Luigi (vertex) like (edge) but is older (property) than Mario, by more than two years, and print their names and ages (property)."

Please check out our tutorial and use cases for details of our query language and its usages.


Implementing a Recommendation Engine with PGX

The PGX 1.2 built-in package includes a simple recommendation engine that is built upon a graph algorithm. Given a dataset of ratings from the users about their purchased items, the engine predicts future rating values for the unrated items from a user. Note that the ratings data set, usually recognized as a sparse matrix, is also viewed as a bipartite graph, which is how one can solve this problem with PGX.

Our use case documentation provides more information about the recommendation engine.


Very Fast Centrality Computation

PGX 1.2 brings you new fast implementations of Betweenness Centrality and Closeness Centrality algorithms. The following table compares the performance measurement of these implementations against a public open-source library, namely iGraph.

Algorithm | PGX | iGraph
Closeness Centrality | 1.8 hours | not finished after 5 days
Betweenness Centrality | 24 hours | not finished after 5 days

The performance was measured with a 16.5 million edge graph using a 36-core 2.6 GHz Intel Xeon machine.