Performance Considerations for PGQL Queries

Recursive Queries

The following indexes are recommended in order to speed up execution of recursive queries:

For underlying VERTEX tables of the recursive pattern, an index on the key column
For underlying EDGE tables of the recursive pattern, an index on the source key column

Note:
You can also create index on (source key, destination key).

For example, consider the following CREATE PROPERTY GRAPH statement:

CREATE PROPERTY GRAPH people
  VERTEX TABLES(
    person
      KEY ( id )
      LABEL person
      PROPERTIES( name, age )
  )
  EDGE TABLES(
    knows
      key (person1, person2)
      SOURCE KEY ( person1 ) REFERENCES person (id)
      DESTINATION KEY ( person2 ) REFERENCES person (id)
      NO PROPERTIES
  )
  OPTIONS ( PG_VIEW )

And also consider the following query:

SELECT COUNT(*)
FROM MATCH ANY SHORTEST ( (n:Person) -[e:knows]->* (m:Person) )
WHERE n.id = 1234

In order to improve performance of the recursive part of the preceding query, the following indexes must exist:

CREATE INDEX <INDEX_NAME> ON PERSON(ID)
CREATE INDEX <INDEX_NAME> ON KNOWS(PERSON1) or
CREATE INDEX <INDEX_NAME> ON KNOWS(PERSON1, PERSON2)

Composite Vertex Keys

For composite vertex keys, query execution can be optimized with the creation of function-base indexes on the key columns:

For underlying VERTEX tables of the recursive pattern, a function-based index on the comma-separated concatenation of key columns
For underlying EDGE tables of the recursive pattern, a function-based index on the comma-separated concatenation of source key columns

Note:
You can also create index on (source key columns, destination key columns).

For example, consider the following CREATE PROPERTY GRAPH statement:

CREATE PROPERTY GRAPH people
  VERTEX TABLES(
    person
      KEY ( id1, id2 )
      LABEL person
      PROPERTIES( name, age )
  )
  EDGE TABLES(
    knows
      key (id)
      SOURCE KEY ( id1person1, id2person1 ) REFERENCES person (id1,id2)
      DESTINATION KEY ( id1person2, id2person2 ) REFERENCES person (id1,id2)
      NO PROPERTIES
  )
  OPTIONS ( PG_VIEW )

And also consider the following query:

SELECT COUNT(*)
FROM MATCH ANY SHORTEST ( (n:Person) -[e:knows]->* (m:Person) )
WHERE n.id = 1234

In order to improve performance of the recursive part of the preceding query, the following indexes must exist:

CREATE INDEX <INDEX_NAME> ON PERSON (ID1 || ',' || ID2)
CREATE INDEX <INDEX_NAME> ON KNOWS (ID1PERSON1 || ',' || ID2PERSON1) or
CREATE INDEX <INDEX_NAME> ON KNOWS (ID1PERSON1 || ',' || ID2PERSON1, ID1PERSON2 || ',' || ID2PERSON2)

If some of the columns in a composite vertex key is a string column, the column needs to be comma-escaped in the function-base index creation.

For example, if column ID1 in table PERSON of the preceding example is of typeVARCHAR2(10), you need to escape the comma for the column as follows:

replace(ID1, ',', '\,')

So, the indexes to improve performance will result as shown:

CREATE INDEX <INDEX_NAME> ON PERSON (replace(ID1, ',', '\,') || ',' || ID2)
CREATE INDEX <INDEX_NAME> ON KNOWS (replace(ID1PERSON1, ',', '\,') || ',' || ID2PERSON1)

Using Query Optimizer Hints

The following hints can be used to influence translation of PGQL variable-length path patterns to SQL:

REVERSE_PATH: Switches on or off the reverse path optimization (ON by default). If ON, it automatically determines if the pattern can best be evaluated from source to destination or from destination to source, based on specified filter predicates.
PUSH_SRC_HOPS: Switches on or off pushing source filter optimization (ON by default). If ON, then filter predicates are used to limit the number of source vertices (or destination vertices if path evaluation is reversed) and thereby the search space of variable-length path pattern evaluations.
PUSH_DST_HOPS: Switches on or off pushing destination filter optimization (OFF by default). If ON, then filter predicates are used to limit the number of destination vertices (or source vertices if path evaluation is reversed) and thereby the search space of variable-length path pattern evaluations.

The preceding hints can be configured as options parameter in the following Java API methods:

executeQuery(String pgql, String options)
translateQuery(String pgql, String options)
execute(String pgql, String matchOptions, String options)

For example, consider the following PGQL query:

SELECT v1.name AS v1, v2.name AS v2, v3.name As v3 
FROM MATCH (v1:Person)-[e1:friendOf]->(v2:Person), 
MATCH ANY (v2:Person)-[e2:friendOf]->*(v3:Person) 
WHERE v1.name= 'Bob'

When the preceding query is executed using the default option for PUSH_SRC_HOPS, the output for start_nodes_translation displays the filter expression as shown:

System.out.println(pgqlStatement.translateQuery(pgql).getSqlTranslation())
...
...
start_nodes_translation => (to_clob('SELECT ''PERSONS'' AS "src_table", e1.person_b AS "src_key"
FROM "GRAPHUSER"."PERSONS" "V1", "GRAPHUSER"."FRIENDSHIPS" "E1"
WHERE (((e1.person_a = v1.person_id) AND NOT(e1.person_b IS NULL)) AND (v1.name = ''Bob''))')),
     end_nodes_translation => (to_clob('SELECT ''PERSONS'' AS "dst_table", v3.person_id AS "dst_key"
FROM "GRAPHUSER"."PERSONS" "V3"')),
...
...

If the preceding query is executed with the hint PUSH_SRC_HOPS=F, then the query is translated into SQL as shown:

System.out.println(pgqlStatement.translateQuery(pgql,"PUSH_SRC_HOPS=F").getSqlTranslation())

...
...start_nodes_translation => (to_clob('SELECT ''PERSONS'' AS "src_table", v2.person_id AS "src_key"
FROM "GRAPHUSER"."PERSONS" "V2"')),
     end_nodes_translation => (to_clob('SELECT ''PERSONS'' AS "dst_table", v3.person_id AS "dst_key"
FROM "GRAPHUSER"."PERSONS" "V3"')),
...
...

Speed Up Query Translation Using Graph Metadata Cache and Translation Cache

The following global caches help to speed up PGQL query translation:

Graph Metadata Cache: Stores graph metadata such as tables, labels, properties, and so on.
Translation Cache: Stores PGQL to SQL translation.

You can configure the caches using the following Java APIs:

clearTranslationCache()
disableTranslationCache()
enableTranslationCache()
setTranslationCacheMaxCapacity(int maxCapacity)
clearGraphMetadataCache()
disableGraphMetadataCache()
enableGraphMetadataCache()
setGraphMetadataCacheMaxCapacity(int maxCapacity)

These preceding methods are part of the PgqlConnection class. Separate caches are maintained for each database user such that cached objects are shared between different PgqlConnection objects if they have the same connection URL and user underneath.