11.1 Configuration Parameters for the Graph Server (PGX) Engine
You can configure the graph server (PGX) engine and the PGX run-time library by passing a single JSON file to the graph server (PGX) at startup.
To pass the PGX engine configuration file to the graph server (PGX), see Passing the Configuration File to the Graph Server (PGX).
The PGX engine parameters are shown in the following table:
Table 11-1 Configuration Parameters for the Graph Server (PGX) Engine
| Parameter | Type | Description | Default |
|---|---|---|---|
|  | integer | Number of seconds after which admin request results are removed from the cache. Requests that are not done or not yet consumed are excluded from this timeout. Note: This is only relevant if PGX is deployed as a webapp. | 60 |
|  | boolean | If true, sessions can overwrite the default idle timeout. | true |
|  | boolean | If true, allow all users to override scheduling information like task weight, task priority, and number of threads. | true |
|  | boolean | If true, sessions can overwrite the default task timeout. | true |
|  | boolean | If true, users may enable auto refresh for graphs they load. If false, only graphs mentioned in | false |
| allowed_remote_loading_locations | array of string | Allow loading graphs into the PGX engine from remote locations (http, https, ftp, ftps, s3, hdfs). If empty, as by default, no remote location is allowed. If "*" is specified in the array, all remote locations are allowed. Only the value "*" is currently supported. Note that pre-loaded graphs are loaded from any location, regardless of the value of this setting. WARNING: This parameter reduces security; use it only when needed. | [] |
| basic_scheduler_config | object | Configuration parameters for the fork join pool backend. | null |
|  | integer | Task size for BFS iterate QUE phase. | 128 |
|  | number | Threshold of BFS traversal level items to switch to parent-read-based visiting strategy. | 0.05 |
|  | integer | Threshold of BFS traversal level items to switch to read-based visiting strategy. | 1024 |
|  | integer | Number of BFS traversal level items up to which vertices are visited single-threaded. | 128 |
|  | string | Standard character set to use throughout PGX. UTF-8 is the default. Note: Some formats may not be compatible. | utf-8 |
|  | integer | Default diff factor value used in the common neighbor iterator implementations. | 8 |
|  | integer | Default value used in the common neighbor iterator implementations to indicate the threshold below which a subarray is considered small. | 128 |
|  | integer | Default value used in the common neighbor iterator implementations to indicate the minimum size at which the binary search approach is applied. | 96 |
|  | integer | Number of visited vertices at which the DFS implementation switches to data structures that are optimized for larger numbers of vertices. | 4096 |
|  | boolean | If true, the PGX webapp verifies that the Cross-Site Request Forgery (CSRF) token cookie and request parameters sent by the client exist and match. This is to prevent CSRF attacks. | true |
|  | boolean | If | true |
|  | boolean | If true, PGX adds a JVM shutdown hook that automatically shuts down PGX at JVM shutdown. Note: Having the shutdown hook deactivated and not explicitly shutting down PGX may result in pollution of your temp directory. | true |
|  | object | Configuration parameters for the enterprise scheduler. | null |
|  | object | [relevant for enterprise_scheduler] Enterprise scheduler-specific settings. | null |
|  | boolean |  | true |
| file_locations | array of object | The file locations that can be used in the authorization-config. | [] |
| graph_algorithm_language | enum[GM_LEGACY, GM, JAVA] | Front-end compiler to use. | gm |
| graph_validation_level | enum[low, high] | Level of validation performed on newly loaded or created graphs. | low |
| ignore_incompatible_backend_operations | boolean | If true, only log when encountering incompatible operations and configuration values in RTS or FJ pool. If false, throw exceptions. | false |
| in_place_update_consistency_model | enum[ALLOW_INCONSISTENCIES, CANCEL_TASKS] | Consistency model used when in-place updates occur. Only relevant if in-place updates are enabled. Currently, updates are applied in place only if they are not structural (that is, they only modify properties). Two models are currently implemented: one only delays new tasks when an update occurs; the other also delays running tasks. | allow_inconsistencies |
| init_pgql_on_startup | boolean | If true, PGQL is directly initialized on start-up of PGX. Otherwise, it is initialized during the first use of PGQL. | true |
| interval_to_poll_max | integer | Upper bound (in ms) of the exponential backoff; once this bound is reached, the job status polling interval stays fixed. | 1000 |
| java_home_dir | string | The path to Java's home directory. If set to <system-java-home-dir>, use the java.home system property. | null |
| large_array_threshold | integer | Threshold above which the size of an array is too big to use a normal Java array. This depends on the JVM used. (Defaults to Integer.MAX_VALUE - 3) | 2147483644 |
| max_active_sessions | integer | Maximum number of sessions allowed to be active at a time. | 1024 |
| max_distinct_strings_per_pool | integer | [only relevant if string_pooling_strategy is indexed] Number of distinct strings per property after which to stop pooling. If the limit is reached, an exception is thrown. | 65536 |
| max_http_client_request_size | long | Maximum size in bytes of any HTTP request sent to the PGX server over the REST API. Setting it to -1 allows requests of any size. | 10485760 |
| max_off_heap_size | integer | Maximum amount of off-heap memory (in megabytes) that PGX is allowed to allocate before an OutOfMemoryError is thrown. Note: This limit is not guaranteed to never be exceeded, because of rounding and synchronization trade-offs. It only serves as the threshold at which PGX starts to reject new memory allocation requests. | <available-physical-memory> |
| max_queue_size_per_session | integer | The maximum number of pending tasks allowed to be in the queue, per session. If a session reaches the maximum, new incoming requests of that session get rejected. A negative value means infinity or unlimited. | -1 |
| max_snapshot_count | integer | Number of snapshots that may be loaded in the engine at the same time. New snapshots can be created via auto or forced update. If the number of snapshots of a graph reaches this threshold, no more auto-updates will be performed, and a forced update will result in an exception until one or more snapshots are removed from memory. A value of zero means that an unlimited number of snapshots is supported. | 0 |
| memory_allocator | enum[basic_allocator, enterprise_allocator] | The memory allocator to use. | basic_allocator |
| memory_cleanup_interval | integer | Memory cleanup interval in seconds. | 600 |
| min_array_compaction_threshold | number | Minimum value (only relevant for graphs optimized for updates) that can be used for the array_compaction_threshold value in graph configuration. If a graph configuration attempts to use a value lower than the one specified by min_array_compaction_threshold, it will use min_array_compaction_threshold instead. | 0.2 |
| min_fetch_interval_sec | integer | For delta-refresh (only relevant if the graph format supports delta updates), the lowest interval at which a graph source is queried for changes. You can tune this value to prevent PGX from hanging due to too frequent graph delta-refreshing. | 2 |
| min_update_interval_sec | integer | For auto-refresh, the lowest interval after which a new snapshot is created, either by reloading the entire graph or, if the format supports delta updates, out of the cached changes. You can tune this value to prevent PGX from hanging due to too frequent graph auto-refreshing. | 2 |
| ms_bfs_frontier_type_strategy | enum[auto_grow, short, int] | The type strategy to use for MS-BFS frontiers. | auto_grow |
| num_spin_locks | integer | Number of spin locks each generated app will create at instantiation. Trade-off: a small number implies less memory consumption; a large number implies faster execution (if algorithm uses spin locks). | 1024 |
| parallelism | integer | Number of worker threads to be used in thread pool. Note: If the caller thread is part of another thread-pool, this value is ignored and the parallelism of the parent pool is used. | <number-of-cpus> |
| pattern_matching_supernode_cache_threshold | integer | Minimum number of neighbors for a node to be considered a supernode. This is for the pattern matching engine. | 1000 |
| pgx_realm | object | Configuration parameters for the realm. | null |
| pgx_server_base_url | string | This is used when deploying the graph server behind a load balancer to make clients before 21.3 backward compatible. The value should be set to the load balancer address. | null |
| pooling_factor | number | [only relevant if string_pooling_strategy is on_heap] This value prevents the string pool from growing as big as the property size, which could render the pooling ineffective. | 0.25 |
| preload_graphs | array of object | List of graph configs to be registered at start-up. Each item includes the path to a graph config, the name of the graph, and whether it should be published. | [] |
| random_generator_strategy | enum[non_deterministic, deterministic] | Method of generating random numbers in PGX. | non_deterministic |
| random_seed | long | [relevant for deterministic random number generator only] Seed for the deterministic random number generator used in PGX. The default is -24466691093057031. | -24466691093057031 |
| readiness_memory_usage_ratio | number | Memory limit ratio that should be considered to detect if PGX server is ready. This is used by | 1.0 |
| release_memory_threshold | number | Threshold percentage (decimal fraction) of used memory after which the engine starts freeing unused graphs. Examples: A value of 0.0 means graphs get freed as soon as their reference count becomes zero. That is, all sessions which loaded that graph were destroyed/timed out. A value of 1.0 means graphs never get freed, and the engine will throw OutOfMemoryErrors as soon as a graph is needed which does not fit in memory anymore. A value of 0.7 means the engine keeps all graphs in memory as long as total memory consumption is below 70% of total available memory, even if there is currently no session using them. When consumption exceeds 70% and another graph needs to get loaded, unused graphs get freed until memory consumption is below 70% again. | 0.85 |
| revisit_threshold | integer | Maximum number of matched results from a node to be cached. | 4096 |
| running_memory_usage_ratio | number | Memory limit ratio that should be considered to detect if PGX server is running. This is used by | 1.0 |
| scheduler | enum[basic_scheduler, enterprise_scheduler, low_latency_scheduler] | The scheduler to use. | enterprise_scheduler |
| session_idle_timeout_secs | integer | Timeout of idling sessions in seconds. Zero (0) means infinity or no timeout. | 0 |
| session_task_timeout_secs | integer | Timeout in seconds to interrupt long-running tasks submitted by sessions (algorithms, I/O tasks). Zero (0) means infinity or no timeout. | 0 |
| small_task_length | integer | Task length if the total amount of work is smaller than default task length (only relevant for task-stealing strategies). | 128 |
| strict_mode | boolean | If true, exceptions are thrown and logged with ERROR level whenever the engine encounters configuration problems, such as invalid keys, mismatches, and other potential errors. If false, the engine logs problems with ERROR/WARN level (depending on severity) and makes best guesses and uses sensible defaults instead of throwing exceptions. | true |
| string_pooling_strategy | enum[indexed, on_heap, none] | The string pooling strategy to use. | on_heap |
| task_length | integer | Default task length (only relevant for task-stealing strategies). Should be between 100 and 10000. Trade-off: a small number implies more fine-grained tasks are generated, higher stealing throughput; a large number implies less memory consumption and GC activity. | 4096 |
| tmp_dir | string | Temporary directory to store compilation artifacts and other temporary data. If set to <system-tmp-dir>, uses the standard tmp directory of the underlying system (/tmp on Linux). | null |
| udf_config_directory | string | Directory path containing UDF config files. | null |
| use_index_for_reachability_queries | enum[auto, off] | Create index for reachability queries. | auto |
| use_memory_mapper_for_reading_pgb | boolean | If true, use memory mapped files for reading graphs in PGB format if possible; if false, always use a stream-based implementation. | true |
| use_memory_mapper_for_storing_pgb | boolean | If true, use memory mapped files for storing graphs in PGB format if possible; if false, always use a stream-based implementation. | true |
The default values of the runtime configuration fields are optimized to deliver the best performance across a wide set of algorithms. Depending on your workload, you may be able to improve performance further by experimenting with different strategies, sizes, and thresholds.
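As an illustration of overriding a few of these defaults, the following sketch limits sessions and adjusts memory release. The parameter names are taken from Table 11-1; the values are arbitrary assumptions chosen for illustration, not recommendations.

```json
{
  "max_active_sessions": 64,
  "session_idle_timeout_secs": 3600,
  "session_task_timeout_secs": 1800,
  "max_queue_size_per_session": 100,
  "release_memory_threshold": 0.7,
  "memory_cleanup_interval": 300,
  "strict_mode": true
}
```

With these values, idle sessions time out after one hour, session tasks are interrupted after 30 minutes, and unused graphs stay in memory until total memory consumption exceeds 70%. All parameters that are not listed keep their defaults.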
Enterprise Scheduler Parameters
The following parameters are relevant only if the enterprise scheduler is used. (They are ignored if the basic scheduler is used.)
- analysis_task_config: Configuration for analysis tasks. Type: object. Default: priority medium, max_threads <no-of-CPUs>, weight <no-of-CPUs>
- fast_analysis_task_config: Configuration for fast analysis tasks. Type: object. Default: priority high, max_threads <no-of-CPUs>, weight 1
- maxnum_concurrent_io_tasks: Maximum number of concurrent I/O tasks. Type: integer. Default: 3
- num_io_threads_per_task: Number of I/O threads to use per task. Type: integer. Default: <no-of-cpus>
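A minimal sketch of how these settings might be combined is shown below. It assumes that the parameters are nested under enterprise_scheduler_config, as in Example 11-1, that num_io_threads_per_task sits directly under that object, and that weight and num_io_threads_per_task take plain integers (matching their <no-of-CPUs> style defaults). The thread counts are illustrative only.

```json
{
  "scheduler": "enterprise_scheduler",
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 16,
      "weight": 16
    },
    "fast_analysis_task_config": {
      "max_threads": 16,
      "weight": 1
    },
    "num_io_threads_per_task": 8
  }
}
```

Fields that are omitted, such as priority, are expected to fall back to the defaults listed above.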
Basic Scheduler Parameters
The following parameters are relevant only if the basic scheduler is used. (They are ignored if the enterprise scheduler is used.)
- num_workers_analysis: Number of worker threads to use for analysis tasks. Type: integer. Default: <no-of-CPUs>
- num_workers_fast_track_analysis: Number of worker threads to use for fast-track analysis tasks. Type: integer. Default: 1
- num_workers_io: Number of worker threads to use for I/O tasks (load/refresh/write from/to disk). This value does not affect file-based loaders, because they are always single-threaded. Database loaders open a new connection for each I/O worker. Type: integer. Default: <no-of-CPUs>
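The basic scheduler settings can be sketched in the same way. The example below assumes that these parameters are nested under the basic_scheduler_config object from Table 11-1 and that the scheduler parameter selects the basic scheduler; the worker counts are illustrative assumptions.

```json
{
  "scheduler": "basic_scheduler",
  "basic_scheduler_config": {
    "num_workers_analysis": 8,
    "num_workers_fast_track_analysis": 1,
    "num_workers_io": 4
  }
}
```

Going by the description of num_workers_io, a database loader would open up to four connections with this configuration, one per I/O worker.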
Example 11-1 Minimal Graph Server (PGX) Configuration
The following example causes the graph server (PGX) to initialize its analysis thread pool with 32 workers. (Default values are used for all other parameters.)
{
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    }
  }
}
Example 11-2 Two Pre-loaded Graphs
This example sets more fields and specifies two fixed graphs for loading into memory during the graph server (PGX) startup.
{
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    },
    "fast_analysis_task_config": {
      "max_threads": 32
    }
  },
  "memory_cleanup_interval": 600,
  "max_active_sessions": 1,
  "release_memory_threshold": 0.2,
  "preload_graphs": [
    {
      "path": "graph-configs/my-graph.bin.json",
      "name": "my-graph"
    },
    {
      "path": "graph-configs/my-other-graph.adj.json",
      "name": "my-other-graph",
      "publish": false
    }
  ],
  "authorization": [{
    "pgx_role": "GRAPH_DEVELOPER",
    "pgx_permissions": [{
      "preloaded_graph": "my-graph",
      "grant": "read"
    },
    {
      "preloaded_graph": "my-other-graph",
      "grant": "read"
    }]
  },
  ....
  ]
}
Relative paths in parameter values are always resolved relative to the parent directory of the configuration file in which they are specified. For example, if the preceding JSON is in /pgx/conf/pgx.conf, then the file path graph-configs/my-graph.bin.json inside that file would be resolved to /pgx/conf/graph-configs/my-graph.bin.json.
- Configuration of the Graph Server (PGX) Run-Time Parameters
- Passing the Configuration File to the Graph Server (PGX)
- Memory Consumption by the Graph Server (PGX): The graph server (PGX) loads the graph into main memory in order to carry out analysis on the graph and its properties.