The server in distributed mode is configured by editing the `$PGX_HOME/conf/server.conf` and `$PGX_HOME/conf/pgxd.conf` files in the distribution. `server.conf` configures the web server, e.g., the port number or authorization options. `pgxd.conf` configures the runtime options of PGX distributed mode. Both configuration files are automatically detected by the provided `start-server` script and passed to the server.
The server of the distributed mode supports the same configuration as in shared memory mode.
Two-way SSL/TLS is enabled by default. See the guide on configuring TLS/SSL certificates for details.
The distributed execution mode is configured by setting parameters in `pgxd.conf`.
Here, we document the parameters common to the system. Some applications may have application-specific
configuration options in addition to those listed below.
Field | Type | Description | Default |
---|---|---|---|
admin_request_cache_timeout | integer | after how many seconds admin request results get removed from the cache. Requests which are not done or not yet consumed are excluded from this timeout. Note: this is only relevant if PGX is deployed as a webapp. | 60 |
allow_local_filesystem (deprecated) | boolean | (This flag reduces security, enable only if you know what you are doing!) Allow loading from local filesystem, if in client/server mode. The list of directories that are allowed to be read should be listed in datasource_dir_whitelist. WARNING: This should only be enabled if you want to explicitly allow users of the PGX remote interface to access files on the local filesystem. Deprecated: since 20.1.1, define file-locations and use permissions instead | false |
allowed_remote_loading_locations | array of string | (This option may reduce security, use it only if you know what you are doing!) Allow loading graphs into the PGX engine from remote locations. If empty, as by default, no remote location is allowed. Any of the following locations can be listed: "https", "ftps", "s3", "hdfs" (all without colon ":"). Alternatively, "*" can be used to enable all locations at once; no other value is allowed with "*". Note that pre-loaded graphs are loaded from any location, regardless of the value of this setting | [] |
authorization | array of object | mapping of users and roles to resources and permissions for authorization | [] |
authorization_session_create_allow_all | boolean | if true allow all users to create a PGX session regardless of permissions granted to them | false |
builtins_path | string | Path to the builtin algorithms directory. | null |
common_log_configure | string | Path to a log configuration in Log4j2 (version 2) syntax. The contents of the file only affect PGX.D backend components shared with PGX.SM via JNI. Note: This parameter is set automatically by `start-server.sh`; do not change it. | null |
data_memory_limits | object | memory limits configuration parameters | null |
datasource_dir_whitelist (deprecated) | array of string | if allow_local_filesystem is set, the list of directories from which it is allowed to read files. Deprecated: since 20.1.1, define file-locations and use permissions instead | [] |
enable_csrf_token_checks | boolean | if true, the PGX webapp will verify CSRF token cookie and request parameters sent by the client exist and match. This is to prevent CSRF attacks. | true |
enable_memory_limits_checks | boolean | if true, PGX will enforce the configured memory limits | true |
enable_secure_handshake | boolean | if true PGX will use TLS-PSK to establish RPC channels between remote backend processes in the cluster | true |
enable_shutdown_cleanup_hook | boolean | if true PGX will add a JVM shutdown hook that will automatically shutdown PGX at JVM shutdown. Notice: Having the shutdown hook deactivated and not shutting down PGX explicitly may result in pollution of your temp directory. | true |
executable_path | string | Path to the PGX.D executable. | null |
file_locations | array of object | the file-locations that can be used in the authorization-config | [] |
ghost_max_node_counts | integer | The maximum number of ghost vertices for each graph. | 40000 |
ghost_min_neighbors | integer | The minimum number of neighbors a vertex must have in order to be made a ghost (which is a vertex replicated on every machine). | 5000 |
handshake_port | integer | a TCP port which will be used for handshaking of distributed backend processes. | 7777 |
hostnames | array of string | A list of names or IP addresses of hosts which should be involved in a PGX.D cluster. The first host specified in the list will be the leader (master) host initially. | [] |
if_ethernet | string | IP network interface, used to initialize the network transport layer when using IP (Internet Protocol). Typically, this corresponds to Ethernet interface. | null |
if_infiniband | string | InfiniBand network interface, used to initialize the network transport layer when using InfiniBand. | null |
init_pgql_on_startup | boolean | if true PGQL is directly initialized on start-up of PGX. Otherwise, it is initialized during the first use of PGQL. | true |
internal_communication_port (deprecated) | integer | a TCP port which will be used for communication between PGX.D server and backend. Deprecated: since 19.3. This will be removed from 20.2 | 38003 |
java_class_path | string | Path to required java libraries. | null |
large_buf_count | integer | Number of large buffers populated in the pool. | 65536 |
large_buf_size_kb | integer | Size in kB of the large buffers. | 256 |
log_configure | string | Path to a log configuration in Log4j (version 1) syntax. The contents of the file only affect the PGX.D backend. Note: This parameter is set automatically by `start-server.sh`; do not change it. | null |
log_std_redirect | string | Path to a log file into which the standard output streams (stdout, stderr) of PGX.D backend should be redirected. No redirection happens when the path is null. | null |
max_http_client_request_size | long | maximum size in bytes of any HTTP request sent to the PGX server over the REST API. Setting it to -1 allows requests of any size. | 10485760 |
memory_cleanup_interval | integer | memory cleanup tick in seconds | 600 |
num_worker_threads | integer | Number of threads used for performing the main computation and for performing auxiliary functions related to remote data (e.g. communication). | 28 |
partitioning_ignore_ghostnodes | boolean | If set to true, the partitioning strategy will ignore the ghost nodes. | false |
partitioning_shuffle_vertices | boolean | If set to true, the vertices of the graph will be randomly shuffled among machines before partitioning. If the data source does not contain enough randomness, this could be beneficial for query and analytics performance on graphs with many properties, labels or data providers. Shuffling can however increase the time necessary to load the graph. | false |
partitioning_strategy | string | Partitioning strategy of the vertices of the graph. Valid values are 'out' (the vertices will be attributed to each machine so that every machine has the same total number of outgoing edges), 'in' (similar, but for incoming edges) and 'out_in' (same number of outgoing plus incoming edges). | out_in |
pgx_realm | object | configuration parameters for the realm | null |
preload_graphs | array of object | list of graph configs to be registered at start-up. Each item includes path to a graph config, the name of the graph and whether it should be published. | [] |
release_memory_threshold | number | threshold percentage of used memory after which the engine starts freeing unused graphs. Examples: A value of 0.0 means graphs get freed as soon as their reference count becomes zero, that is, once all sessions which loaded that graph were destroyed or timed out. A value of 1.0 means graphs never get freed; the engine will throw OutOfMemoryErrors as soon as a graph is needed which doesn't fit in memory anymore. A value of 0.7 means the engine keeps all graphs in memory as long as total memory consumption is below 70% of total available memory, even if there is currently no session using them. Once the 70% is surpassed and another graph needs to get loaded, unused graphs get freed until memory consumption is below 70% again. | 0.85 |
secure_handshake_secret_file | string | the file path of the secret in pkcs12 format. This is only used when enable_secure_handshake is true. | null |
strict_mode | boolean | if true, exceptions are thrown and logged with ERROR level whenever the engine encounters configuration problems, such as invalid keys, mismatches and other potential errors. If false, the engine logs problems with ERROR/WARN level (depending on severity) and makes best guesses / uses sensible defaults instead of throwing exceptions. | true |
tmp_dir | string | Use this path as temporary directory to store compilation artifacts and other temporary data. If set to <system-tmp-dir>, use standard tmp directory of underlying system (/tmp on Linux) | null |
use_infiniband | boolean | If set to true, InfiniBand will be used. Must be set to false on systems that do not support InfiniBand. | true |
zookeeper_timeout | integer | If connecting to zookeeper service (for YARN deployment), the timeout in ms the runtime waits for connecting to zookeeper. | 10000 |
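
To illustrate how entries from the table translate into `pgxd.conf`, the following fragment is a minimal sketch that overrides two partitioning defaults and restates the ghost-vertex defaults explicitly. The field names, types, and valid values are taken from the table; the choice of `"out"` and of enabling shuffling is only an illustration, not a recommendation, and the fragment assumes these fields are placed at the top level of the file like the other parameters.

```json
{
  "partitioning_strategy": "out",
  "partitioning_shuffle_vertices": true,
  "ghost_min_neighbors": 5000,
  "ghost_max_node_counts": 40000
}
```

Fields that are omitted keep the defaults listed in the table. With `strict_mode` left at its default of true, a misspelled key would be expected to raise an exception rather than be silently ignored.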
Configuring the system to run over InfiniBand or Ethernet is straightforward. Set the `use_infiniband` parameter to `true` to use InfiniBand and to `false` to use Ethernet. In addition, set the name of the network interface accordingly: the parameters `if_infiniband` and `if_ethernet` specify the name of the InfiniBand and Ethernet network interface, respectively. If you only use InfiniBand, you do not need to provide a value for `if_ethernet`, and vice versa.
For example, in `$PGX_HOME/conf/pgxd.conf`:

    {
      "if_infiniband": "ib0",
      "if_ethernet": "eno1",
      "use_infiniband": true,
      "executable_path": "distributed/bin/pgxd",
      "hostnames": [ "hostname0,hostname1" ],
      "builtins_path": "distributed/lib",
      "java_class_path": "distributed/jlib/*:distributed/jlib/common/*:distributed/jlib/third-party/*:shared-lib/common/*:shared-lib/third-party/*:shared-lib/embedded/*:shared-lib/server/*",
      "log_configure": "conf/dist_log4j.xml",
      "common_log_configure": "conf/log4j2.xml",
      "tmp_dir": "tmp_data"
    }

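
For a cluster without InfiniBand, the network-related part of the configuration might look like the sketch below. It shows only the two fields that change relative to the example above; the interface name `eno1` is reused from that example and is an assumption about the host, so substitute the name of the actual Ethernet interface. All other fields would stay as they are.

```json
{
  "use_infiniband": false,
  "if_ethernet": "eno1"
}
```

Because `use_infiniband` defaults to true, it has to be set to false explicitly on systems that do not support InfiniBand; `if_infiniband` can then be left out.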
See the example `server.auth.json` for how to map client certificates to roles.