18.11 Best Practices for Tuning PGQL Queries
This section describes best practices regarding memory allocation, parallelism, and query planning.
- Memory Allocation
The graph server (PGX) hason-heap
andoff-heap
memory, the earlier being the standard JVM heap while the latter being a separate heap that is managed by PGX. Just like graph data, intermediate and final results of PGQL queries are partially stored on-heap and partially off-heap. Therefore, both heaps are needed. - Parallelism
By default, all available processor threads are used to process PGQL queries. However, if needed, the number of threads can be limited by setting theparallelism
option of the graph server (PGX). - Query Plan Explaining
ThePgxGraph.explainPgql(String query)
method is used to get insight into the query plan of the query. The method returns an instance ofOperation (package oracle.pgx.api)
which has the following methods:
Parent topic: Executing PGQL Queries Against the Graph Server (PGX)
18.11.1 Memory Allocation
The graph server (PGX) has on-heap
and
off-heap
memory, the earlier being the standard JVM heap while the
latter being a separate heap that is managed by PGX. Just like graph data, intermediate and
final results of PGQL queries are partially stored on-heap and partially off-heap.
Therefore, both heaps are needed.
In case of the on-heap memory, the default maximum is chosen upon startup of
the JVM, but it can be overwritten through the -Xmx
option.
In case of the off-heap, there is no maximum set by default and the off-heap
memory usage, therefore, keeps increasing automatically until it depletes the system
resources, in which case the operation is canceled, it's memory is released, and an
appropriate exception is passed to the user. If needed, a maximum off-heap size can be
configured through the max_off_heap_size
option in the graph server
(PGX).
A ratio of 1:1 for on-heap versus off-heap is recommended as a good starting point to allow for the largest possible graphs to be loaded and queried. See Configuring On-Heap Limits for the steps to configure the on-heap memory size.
Parent topic: Best Practices for Tuning PGQL Queries
18.11.2 Parallelism
By default, all available processor threads are used to process PGQL
queries. However, if needed, the number of threads can be limited by
setting the parallelism
option of the graph server
(PGX).
See Configuration Parameters for the Graph Server (PGX) Engine for more information on the graph server configuration parameters.
Parent topic: Best Practices for Tuning PGQL Queries
18.11.3 Query Plan Explaining
The PgxGraph.explainPgql(String query)
method is used to
get insight into the query plan of the query. The method returns an instance of
Operation (package oracle.pgx.api)
which has the following
methods:
print()
: for printing the operation and its child operationsgetOperationType()
: for getting the type of the operationgetPatternInfo()
: for getting a string representation of the operationgetCostEstimate()
: for getting the cost of the operationgetTotalCostEstimate()
: for getting the cost of the operations and its child operationsgetCardinatlityEstimate()
: for getting the expected number of result rowsgetChildren()
: for accessing the child operations
Consider the following example:
g.explainPgql("SELECT COUNT(*) FROM MATCH (n) -[e1]-> (m) -[e2]-> (o)").print()
\--- GROUP BY GroupBy {"cardinality":"42", "cost":"42", "accumulatedCost":"58.1"}
\--- (m) -[e2]-> (o) NeighborMatch {"cardinality":"3.12", "cost":"3.12", "accumulatedCost":"16.1"}
\--- (n) -[e1]-> (m) NeighborMatch {"cardinality":"5", "cost":"5", "accumulatedCost":"13"}
\--- (n) RootVertexMatch {"cardinality":"8", "cost":"8", "accumulatedCost":"8"}
In the above example, the print()
method is used to print
the query plan.
If a query plan is not optimal, it is often possible to rewrite the query to
improve its performance. For example, a SELECT
query may be split into
an UPDATE
and a SELECT
query as a way to improve the
total runtime.
Note that the graph server (PGX) does not provide a hint mechanism.
Also, printing the query plan shows the filters used in the query. For example:
g.explainPgql("SELECT id(n) FROM MATCH (n)-[e]->(m) WHERE " +
...> "id(n) > 500 " +
...> "AND id(n) < 510 " +
...> "AND id(n) <> 509 " +
...> "AND id(n) <> 507 ").print()
\--- Projection {"cardinality":"146", "cost":"0", "accumulatedCost":"175"}
\--- (n) -[e]-> (m) NeighborMatch {"cardinality":"146", "cost":"146", "accumulatedCost":"175"}
\--- (n) RootVertexMatch {"cardinality":"29.2", "cost":"29.2", "accumulatedCost":"29.2"}
WHERE $filter1
filter1: (id(n) <> 509) AND
(id(n) <> 507) AND
(id(n) > 500) AND
(id(n) < 510)
In the preceding example, since the query has filters that spans more than three lines, the filters are shown displayed below the query plan. If the filters are less than three lines, then the filters are shown directly within the query plan tree as shown:
g.explainPgql("SELECT id(n) FROM MATCH (n)-[e]->(m) WHERE " +
...> "id(n) > 500 " +
...> "AND id(n) < 510 ").print()
\--- Projection {"cardinality":"162", "cost":"0", "accumulatedCost":"194"}
\--- (n) -[e]-> (m) NeighborMatch {"cardinality":"162", "cost":"162", "accumulatedCost":"194"}
\--- (n) RootVertexMatch {"cardinality":"32.4", "cost":"32.4", "accumulatedCost":"32.4"}
WHERE (id(n) > 500) AND
(id(n) < 510)
Parent topic: Best Practices for Tuning PGQL Queries