23.1 Configuration Parameters for the Graph Server (PGX) Engine

You can configure the graph server (PGX) engine parameters in the /etc/oracle/graph/pgx.conf JSON file.

By default, the graph server (PGX) reads its settings from the /etc/oracle/graph/pgx.conf file during startup.

The following tables describe the different graph server (PGX) runtime configuration options.

Graph Server (PGX) Engine Parameters

The graph server (PGX) engine parameters are described in the following table:

Table 23-1 Runtime Parameters for the Graph Server (PGX) Engine

Parameter Type Description Default

admin_request_cache_timeout

integer

After how many seconds admin request results get removed from the cache. Requests which are not done or not yet consumed are excluded from this timeout. Note: This is only relevant if PGX is deployed as a webapp.

60

allow_idle_timeout_overwrite

boolean

If true, sessions can overwrite the default idle timeout.

true

allow_override_scheduling_information

boolean

If true, all users can override scheduling information such as task weight, task priority, and number of threads.

true

allow_task_timeout_overwrite

boolean

If true, sessions can overwrite the default task timeout.

true

allow_user_auto_refresh

boolean

If true, users may enable auto refresh for graphs they load. If false, only graphs mentioned in preload_graphs can have auto refresh enabled.

false
allowed_remote_loading_locations array of string Allow loading graphs into the PGX engine from remote locations (http, https, ftp, ftps, s3). If empty, as by default, no remote location is allowed. If "*" is specified in the array, all remote locations are allowed. Only the value "*" is currently supported. Note that pre-loaded graphs are loaded from any location, regardless of the value of this setting. Because allowing remote locations reduces security, set this parameter only when needed. []
authorization array of object Mapping of users and roles to resources and permissions for authorization. []
authorization_session_create_allow_all boolean If true, allow all users to create a PGX session regardless of permissions granted to them. false
basic_scheduler_config object Configuration parameters for the fork join pool backend. null

bfs_iterate_que_task_size

integer

Task size for BFS iterate QUE phase.

128

bfs_threshold_parent_read_based

number

Threshold of BFS traversal level items to switch to parent-read-based visiting strategy.

0.05

bfs_threshold_read_based

integer

Threshold of BFS traversal level items to switch to read-based visiting strategy.

1024

bfs_threshold_single_threaded

integer

Number of BFS traversal level items up to which vertices are visited single-threaded.

128

character_set

string

Standard character set to use throughout PGX. UTF-8 is the default. Note: Some formats may not be compatible.

utf-8

cni_diff_factor_default

integer

Default diff factor value used in the common neighbor iterator implementations.

8

cni_small_default

integer

Default value used in the common neighbor iterator implementations, to indicate below which threshold a subarray is considered small.

128

cni_stop_recursion_default

integer

Default value used in the common neighbor iterator implementations, to indicate the minimum size where the binary search approach is applied.

96
data_memory_limits object Memory limits configuration parameters. null

dfs_threshold_large

integer

Value that determines at which number of visited vertices the DFS implementation will switch to data structures that are optimized for larger numbers of vertices.

4096

enable_csrf_token_checks

boolean

If true, the PGX webapp verifies that the Cross-Site Request Forgery (CSRF) token cookie and request parameters sent by the client exist and match. This is to prevent CSRF attacks.

true

enable_gm_compiler

boolean

If true, enable dynamic compilation of PGX Algorithm API (or Green-Marl code) during runtime.

true

enable_graph_loading_cache

boolean

If true, activate the graph loading cache that will accelerate loading of graphs that were previously loaded (can only be disabled in embedded mode).

true

enable_graph_sharing

boolean

Indicates if a user is allowed to grant read permission on its published graphs to other users. This flag is only relevant for a remote server.

true

enable_memory_limits_checks

boolean

If true, the graph server will enforce the configured memory limits.

true
enable_ml_accelerators boolean If true, the graph server will utilize the available ML accelerators to run faster machine learning trainings. true

enable_shutdown_cleanup_hook

boolean

If true, PGX will add a JVM shutdown hook that will automatically shut down PGX at JVM shutdown. Note: Having the shutdown hook deactivated and not explicitly shutting down PGX may result in pollution of your temp directory.

true
enable_snapshot_properties_publish_state_propagation boolean If true, properties in a new snapshot will inherit the publishing state of properties in the parent snapshot. true

enterprise_scheduler_config

object

Configuration parameters for the enterprise scheduler. See Table 23-3 and Table 23-4 for more information.

null

enterprise_scheduler_flags

object

[relevant for enterprise_scheduler] Enterprise scheduler-specific settings.

null

explicit_spin_locks

boolean

true means spin explicitly in a loop until lock becomes available. false means using JDK locks which rely on the JVM to decide whether to context switch or spin. Setting this value to true usually results in better performance.

true
file_locations array of object The file locations that can be used in the authorization-config. []
graph_algorithm_language enum[GM, JAVA] Front-end compiler to use. JAVA
graph_sharing_option enum[allow_data_sharing, disallow_data_sharing, allow_traceable_data_sharing_for_same_user] This is to manage if a graph can be published and shared with other users. allow_data_sharing
graph_validation_level enum[low, high] Level of validation performed on newly loaded or created graphs. low
ignore_incompatible_backend_operations boolean If true, only log when encountering incompatible operations and configuration values in RTS or FJ pool. If false, throw exceptions. false
in_place_update_consistency_model enum[ALLOW_INCONSISTENCIES, CANCEL_TASKS] Consistency model used when in-place updates occur. Only relevant if in-place updates are enabled. Currently, updates are applied in place only if they are not structural (that is, they modify only properties). Two models are currently implemented: one delays only new tasks when an update occurs; the other also delays running tasks. ALLOW_INCONSISTENCIES
init_pgql_on_startup boolean If true, PGQL is directly initialized on start-up of PGX. Otherwise, it is initialized during the first use of PGQL. true
interval_to_poll_max integer Upper bound (in ms) of the exponential backoff. Once this bound is reached, the job status polling interval stays fixed. 1000
java_home_dir string The path to Java's home directory. If set to <system-java-home-dir>, use the java.home system property. <system-java-home-dir>
large_array_threshold integer Threshold when the size of an array is too big to use a normal Java array. This depends on the used JVM. (Defaults to Integer.MAX_VALUE - 3) 2147483644
max_active_sessions integer

Maximum number of sessions allowed to be active at a time.

1024
max_distinct_strings_per_pool integer [only relevant if string_pooling_strategy is indexed] Number of distinct strings per property after which to stop pooling. If the limit is reached, an exception is thrown. 65536
max_http_client_request_size long Maximum size in bytes of any http request sent to the PGX server over the REST API. Setting it to -1 allows requests of any size. 10485760
max_off_heap_size integer

Maximum amount of off-heap memory (in megabytes) that PGX is allowed to allocate before an OutOfMemoryError will be thrown.

Note that this limit is not guaranteed to never be exceeded, because of rounding and synchronization trade-offs. It only serves as threshold when PGX starts to reject new memory allocation requests.

<available-physical-memory>
max_on_heap_memory_usage_ratio number Maximum ratio of on-heap memory that PGX is allowed to use, between 0 and 1. 0.9
max_queue_size_per_session integer

The maximum number of pending tasks allowed to be in the queue, per session. If a session reaches the maximum, new incoming requests of that session get rejected. A negative value means infinity or unlimited.

-1
max_snapshot_count integer

Number of snapshots that may be loaded in the engine at the same time. New snapshots can be created via auto or forced update. If the number of snapshots of a graph reaches this threshold, no more auto-updates will be performed, and a forced update will result in an exception until one or more snapshots are removed from memory. A value of zero means that an unlimited number of snapshots is supported.

0
memory_allocator enum[basic_allocator, enterprise_allocator] The memory allocator to use. basic_allocator
memory_cleanup_interval integer

Memory cleanup interval in seconds.

5
min_array_compaction_threshold number Minimum value (only relevant for graphs optimized for updates) that can be used for the array_compaction_threshold value in graph configuration. If a graph configuration attempts to use a value lower than the one specified by min_array_compaction_threshold, it will use min_array_compaction_threshold instead. 0.2
min_fetch_interval_sec integer For delta-refresh (only relevant if the graph format supports delta updates), the lowest interval at which a graph source is queried for changes. You can tune this value to prevent PGX from hanging due to too frequent graph delta-refreshing. 2
min_update_interval_sec integer For auto-refresh, the lowest interval after which a new snapshot is created, either by reloading the entire graph or if the format supports delta-updates, out of the cached changes (only relevant if the format supports delta updates). You can tune this value to prevent PGX from hanging due to too frequent graph auto-refreshing. 2
ms_bfs_frontier_type_strategy enum[auto_grow, short, int]

The type strategy to use for MS-BFS frontiers.

auto_grow
num_spin_locks integer

Number of spin locks each generated app will create at instantiation. Trade-off: a small number implies less memory consumption; a large number implies faster execution (if algorithm uses spin locks).

1024
parallelism integer Number of worker threads to be used in thread pool. Note: If the caller thread is part of another thread-pool, this value is ignored and the parallelism of the parent pool is used. <number-of-cpus>
pattern_matching_supernode_cache_threshold integer Minimum number of neighbors a node must have to be considered a supernode. This is used by the pattern matching engine. 1000
permission_checks_interval integer Interval in seconds to perform permission checks on source graphs. 60
pgx_realm object Configuration parameters for the realm.

See Table 23-2.

null
pgx_server_base_url string This is used when deploying the graph server behind a load balancer to make clients before 21.3 backward compatible. The value should be set to the load balancer address. null
pooling_factor number [only relevant if string_pooling_strategy is on_heap] This value prevents the string pool from growing as big as the property size, which could render the pooling ineffective. 0.25
preload_graphs array of object

List of graph configs to be registered at start-up. Each item includes path to a graph config, the name of the graph and whether it should be published.

[]
random_generator_strategy enum[non_deterministic, deterministic] Method of generating random numbers in PGX. non_deterministic
random_seed long

[relevant for deterministic random number generator only] Seed for the deterministic random number generator used in PGX. The default is -24466691093057031.

-24466691093057031
readiness_memory_usage_ratio number

Memory limit ratio that should be considered to detect if PGX server is ready. This is used by the isReady API. The default value is 1.0.

1.0
release_memory_threshold number

Threshold percentage (decimal fraction) of used memory after which the engine starts freeing unused graphs. Examples: A value of 0.0 means graphs get freed as soon as their reference count becomes zero. That is, all sessions which loaded that graph were destroyed/timed out. A value of 1.0 means graphs never get freed, and the engine will throw OutOfMemoryErrors as soon as a graph is needed which does not fit in memory anymore. A value of 0.7 means the engine keeps all graphs in memory as long as total memory consumption is below 70% of total available memory, even if there is currently no session using them. When consumption exceeds 70% and another graph needs to get loaded, unused graphs get freed until memory consumption is below 70% again.

0.0
revisit_threshold integer Maximum number of matched results from a node to be cached. 4096
running_memory_usage_ratio number

Memory limit ratio that should be considered to detect if PGX server is running. This is used by the isRunning API. The default value is 1.0.

1.0
scheduler enum[basic_scheduler, enterprise_scheduler, low_latency_scheduler] The scheduler to use.
  • basic_scheduler: uses a scheduler with basic features
  • enterprise_scheduler: uses a scheduler with advanced enterprise features for running multiple tasks concurrently and providing better performance
  • low_latency_scheduler: uses a scheduler that privileges latency of tasks over throughput or fairness across multiple sessions. The low_latency_scheduler is only available in embedded mode.
enterprise_scheduler
session_idle_timeout_secs integer

Timeout of idling sessions in seconds. Zero (0) means infinity or no timeout.

14400
session_task_timeout_secs integer

Timeout in seconds to interrupt long-running tasks submitted by sessions (algorithms, I/O tasks). Zero (0) means infinity or no timeout.

0
small_task_length integer

Task length if the total amount of work is smaller than default task length (only relevant for task-stealing strategies).

128

strict_mode boolean

If true, exceptions are thrown and logged with ERROR level whenever the engine encounters configuration problems, such as invalid keys, mismatches, and other potential errors. If false, the engine logs problems with ERROR/WARN level (depending on severity) and makes best guesses and uses sensible defaults instead of throwing exceptions.

true
string_pooling_strategy enum[indexed, on_heap, none] The string pooling strategy to use. on_heap
task_length integer

Default task length (only relevant for task-stealing strategies). Should be between 100 and 10000. Trade-off: a small number implies more fine-grained tasks are generated, higher stealing throughput; a large number implies less memory consumption and GC activity.

4096
tmp_dir string

Temporary directory to store compilation artifacts and other temporary data. If set to <system-tmp-dir>, uses the standard tmp directory of the underlying system (/tmp on Linux).

"/tmp"
udf_config_directory string

Directory path containing UDF config files.

null
use_index_for_reachability_queries enum[auto, off] Create index for reachability queries. auto
use_memory_mapper_for_reading_pgb boolean If true, use memory mapped files for reading graphs in PGB format if possible; if false, always use a stream-based implementation. true
use_memory_mapper_for_storing_pgb boolean If true, use memory mapped files for storing graphs in PGB format if possible; if false, always use a stream-based implementation. true

The default values of the runtime configuration fields are optimized to deliver the best performance across a wide set of algorithms. Depending on your workload you may be able to improve performance further by experimenting with different strategies, sizes, and thresholds.
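As one illustration of such tuning, the following fragment is a minimal sketch that overrides a few of the parameters from Table 23-1. The values are hypothetical and only show the syntax. They are not recommendations, and suitable values depend on your workload.

```json
{
  "session_idle_timeout_secs": 3600,
  "session_task_timeout_secs": 1800,
  "release_memory_threshold": 0.7,
  "max_snapshot_count": 4
}
```

With these values, idle sessions would time out after one hour and long-running tasks would be interrupted after 30 minutes. Because allow_idle_timeout_overwrite and allow_task_timeout_overwrite default to true, sessions can still overwrite both timeouts. Unused graphs would stay in memory until consumption exceeds 70% of available memory, and auto-updates for a graph would stop once it has four snapshots in memory.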

Advanced Access Configuration

The following table lists the fields in the pgx_realm object that can be used to customize login behavior.

Table 23-2 Advanced Access Configuration Options

Parameters Type Description Default
token_expiration_seconds integer After how many seconds the generated bearer token will expire. 3600 (1 hour)
refresh_time_before_token_expiry_seconds integer How many seconds before its expiry a token is automatically refreshed. Note that this value must always be less than the token_expiration_seconds value. 1800
connect_timeout_milliseconds integer After how many milliseconds a connection attempt to the specified JDBC URL will time out, resulting in the login attempt being rejected. 10000
max_pool_size integer Maximum number of JDBC connections allowed per user. If the number is reached, attempts to read from the database will fail for the current user.

Starting from 23.4 onwards, a new dedicated pool with one connection is provided for token refresh. This new dedicated pool does not affect the max_pool_size value.

64
max_num_users integer Maximum number of active, signed in users to allow. If this number is reached, the graph server will reject login attempts. 512
max_num_token_refresh integer Maximum number of times a token can be automatically refreshed before requiring a login again. 24
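The fragment below is a sketch of how the fields in Table 23-2 might be set. It assumes, beyond what this section states, that they are nested under an options key inside pgx_realm. It shows only fields from this table; a working realm configuration needs further settings that are not covered here. Verify the exact layout against the pgx.conf file shipped with your installation. The values are hypothetical.

```json
{
  "pgx_realm": {
    "options": {
      "token_expiration_seconds": 1800,
      "refresh_time_before_token_expiry_seconds": 600,
      "connect_timeout_milliseconds": 5000,
      "max_pool_size": 32,
      "max_num_users": 256,
      "max_num_token_refresh": 12
    }
  }
}
```

Note that refresh_time_before_token_expiry_seconds (600) is kept below token_expiration_seconds (1800), as the table requires.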

Enterprise Scheduler Parameters

The following parameters are relevant only if the enterprise scheduler is used. (They are ignored if the basic scheduler is used.)

Table 23-3 Enterprise Scheduler Parameters

Parameter Type Description Default
analysis_task_config object Configuration for analysis tasks. Defaults:
  • weight: <no-of-CPUs>
  • priority: MEDIUM
  • max_threads: <no-of-CPUs>
fast_analysis_task_config object Configuration for fast analysis tasks. Defaults:
  • weight: 1
  • priority: HIGH
  • max_threads: <no-of-CPUs>
max_num_concurrent_io_tasks integer Maximum number of concurrent I/O tasks at a time 3
num_io_threads_per_task integer Number of I/O threads to use per task <no-of-cpus>
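The following sketch combines the parameters from Table 23-3 in one configuration. It assumes that all of them are nested under enterprise_scheduler_config, in the same way that analysis_task_config is nested in the examples in this section. The thread counts and I/O limits are hypothetical values.

```json
{
  "scheduler": "enterprise_scheduler",
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "priority": "MEDIUM",
      "max_threads": 16
    },
    "fast_analysis_task_config": {
      "priority": "HIGH",
      "max_threads": 4
    },
    "max_num_concurrent_io_tasks": 2,
    "num_io_threads_per_task": 8
  }
}
```

Fields that are omitted, such as weight, would keep the defaults listed in the table.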

Basic Scheduler Parameters

The following parameters are relevant only if the basic scheduler is used. (They are ignored if the enterprise scheduler is used.)

Table 23-4 Basic Scheduler Parameters

Field Type Description Default
num_workers_analysis integer This specifies how many worker threads to use for analysis tasks. <no-of-cpus>
num_workers_fast_track_analysis integer This specifies how many worker threads to use for fast-track analysis tasks. 1
num_workers_io integer This specifies how many worker threads to use for I/O tasks (load/refresh/write from/to disk). This value does not impact file-based loaders, as they are always single-threaded. Database loaders will open a new connection for each I/O worker. <no-of-cpus>
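As a sketch, the basic scheduler could be selected and sized as shown below. This assumes that the fields in Table 23-4 belong to the basic_scheduler_config object listed in Table 23-1. The worker counts are hypothetical.

```json
{
  "scheduler": "basic_scheduler",
  "basic_scheduler_config": {
    "num_workers_analysis": 8,
    "num_workers_fast_track_analysis": 1,
    "num_workers_io": 4
  }
}
```

With num_workers_io set to 4, a database loader would open up to four connections, one for each I/O worker, while file-based loaders would remain single-threaded.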

Example 23-1 Minimal Graph Server (PGX) Configuration

The following example causes the graph server (PGX) to initialize its analysis thread pool with 32 workers. (Default values are used for all other parameters.)

{
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    }
  }
}

Example 23-2 Two Pre-loaded Graphs

This example sets more fields and specifies two fixed graphs for loading into memory during the graph server (PGX) startup.

{ 
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    },
    "fast_analysis_task_config": {
      "max_threads": 32
    }
  }, 
  "memory_cleanup_interval": 600,
  "max_active_sessions": 1, 
  "release_memory_threshold": 0.2, 
  "preload_graphs": [
    {
      "path": "graph-configs/my-graph.bin.json",
      "name": "my-graph"
    },
    {
      "path": "graph-configs/my-other-graph.adj.json",
      "name": "my-other-graph",
      "publish": false
    }
  ],
  "authorization": [{
    "pgx_role": "GRAPH_DEVELOPER",
    "pgx_permissions": [{
      "preloaded_graph": "my-graph",
      "grant": "read"
    },
    {
      "preloaded_graph": "my-other-graph",
      "grant": "read"
    }]
  },	
	....
  ]
}

Relative paths in parameter values are always resolved relative to the parent directory of the configuration file in which they are specified. For example, if the preceding JSON is in /pgx/conf/pgx.conf, then the file path graph-configs/my-graph.bin.json inside that file would be resolved to /pgx/conf/graph-configs/my-graph.bin.json.
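The same rule can be illustrated with another path-valued parameter. In the following sketch, the directory names are hypothetical, and the fragment is assumed to be stored in /etc/oracle/graph/pgx.conf.

```json
{
  "udf_config_directory": "udfs",
  "preload_graphs": [
    {
      "path": "/data/graphs/my-graph.bin.json",
      "name": "my-graph"
    }
  ]
}
```

Under that assumption, the relative value udfs would be resolved to /etc/oracle/graph/udfs. The preload path is absolute, so it would presumably be used as written.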