Best Practices for Tuning PGQL Queries

18.11 Best Practices for Tuning PGQL Queries

This section describes best practices regarding memory allocation, parallelism, and query planning.

Memory Allocation
The graph server (PGX) has on-heap and off-heap memory, the earlier being the standard JVM heap while the latter being a separate heap that is managed by PGX. Just like graph data, intermediate and final results of PGQL queries are partially stored on-heap and partially off-heap. Therefore, both heaps are needed.
Parallelism
By default, all available processor threads are used to process PGQL queries. However, if needed, the number of threads can be limited by setting the parallelism option of the graph server (PGX).
Query Plan Explaining
The PgxGraph.explainPgql(String query) method is used to get insight into the query plan of the query. The method returns an instance of Operation (package oracle.pgx.api) which has the following methods:

Parent topic: Executing PGQL Queries Against the Graph Server (PGX)

18.11.1 Memory Allocation

The graph server (PGX) has on-heap and off-heap memory, the earlier being the standard JVM heap while the latter being a separate heap that is managed by PGX. Just like graph data, intermediate and final results of PGQL queries are partially stored on-heap and partially off-heap. Therefore, both heaps are needed.

In case of the on-heap memory, the default maximum is chosen upon startup of the JVM, but it can be overwritten through the -Xmx option.

In case of the off-heap, there is no maximum set by default and the off-heap memory usage, therefore, keeps increasing automatically until it depletes the system resources, in which case the operation is canceled, it's memory is released, and an appropriate exception is passed to the user. If needed, a maximum off-heap size can be configured through the max_off_heap_size option in the graph server (PGX).

A ratio of 1:1 for on-heap versus off-heap is recommended as a good starting point to allow for the largest possible graphs to be loaded and queried. See Configuring On-Heap Limits for the steps to configure the on-heap memory size.

Parent topic: Best Practices for Tuning PGQL Queries

18.11.2 Parallelism

By default, all available processor threads are used to process PGQL queries. However, if needed, the number of threads can be limited by setting the parallelism option of the graph server (PGX).

See Configuration Parameters for the Graph Server (PGX) Engine for more information on the graph server configuration parameters.

Parent topic: Best Practices for Tuning PGQL Queries

18.11.3 Query Plan Explaining

The PgxGraph.explainPgql(String query) method is used to get insight into the query plan of the query. The method returns an instance of Operation (package oracle.pgx.api) which has the following methods:

print(): for printing the operation and its child operations
getOperationType(): for getting the type of the operation
getPatternInfo(): for getting a string representation of the operation
getCostEstimate(): for getting the cost of the operation
getTotalCostEstimate(): for getting the cost of the operations and its child operations
getCardinatlityEstimate(): for getting the expected number of result rows
getChildren(): for accessing the child operations

Consider the following example:

g.explainPgql("SELECT COUNT(*) FROM MATCH (n) -[e1]-> (m) -[e2]-> (o)").print()
\--- GROUP BY  GroupBy {"cardinality":"42", "cost":"42", "accumulatedCost":"58.1"}
     \--- (m) -[e2]-> (o) NeighborMatch {"cardinality":"3.12", "cost":"3.12", "accumulatedCost":"16.1"}
          \--- (n) -[e1]-> (m) NeighborMatch {"cardinality":"5", "cost":"5", "accumulatedCost":"13"}
               \--- (n) RootVertexMatch {"cardinality":"8", "cost":"8", "accumulatedCost":"8"}

In the above example, the print() method is used to print the query plan.

If a query plan is not optimal, it is often possible to rewrite the query to improve its performance. For example, a SELECT query may be split into an UPDATE and a SELECT query as a way to improve the total runtime.

Note that the graph server (PGX) does not provide a hint mechanism.

Also, printing the query plan shows the filters used in the query. For example:

g.explainPgql("SELECT id(n) FROM MATCH (n)-[e]->(m) WHERE " +
...> "id(n) > 500 " +
...> "AND id(n) < 510 " +
...> "AND id(n) <> 509 " +
...> "AND id(n) <> 507 ").print()
\--- Projection {"cardinality":"146", "cost":"0", "accumulatedCost":"175"}
     \--- (n) -[e]-> (m) NeighborMatch {"cardinality":"146", "cost":"146", "accumulatedCost":"175"}
          \--- (n) RootVertexMatch {"cardinality":"29.2", "cost":"29.2", "accumulatedCost":"29.2"}
                WHERE $filter1
filter1: (id(n) <> 509) AND
         (id(n) <> 507) AND
         (id(n) > 500) AND
         (id(n) < 510)

In the preceding example, since the query has filters that spans more than three lines, the filters are shown displayed below the query plan. If the filters are less than three lines, then the filters are shown directly within the query plan tree as shown:

g.explainPgql("SELECT id(n) FROM MATCH (n)-[e]->(m) WHERE " +
...> "id(n) > 500 " +
...> "AND id(n) < 510 ").print()
\--- Projection {"cardinality":"162", "cost":"0", "accumulatedCost":"194"}
     \--- (n) -[e]-> (m) NeighborMatch {"cardinality":"162", "cost":"162", "accumulatedCost":"194"}
          \--- (n) RootVertexMatch {"cardinality":"32.4", "cost":"32.4", "accumulatedCost":"32.4"}
                WHERE (id(n) > 500) AND
                      (id(n) < 510)

Parent topic: Best Practices for Tuning PGQL Queries