PGX 20.2.2
Documentation

Server Configuration Guide

In distributed mode, the server is configured by editing the $PGX_HOME/conf/server.conf and $PGX_HOME/conf/pgxd.conf files in the distribution. server.conf configures the web server, e.g., the port number or authorization options. pgxd.conf configures the runtime options of PGX distributed mode. The provided start-server script detects both files automatically and passes them to the server.

Web Server Parameters

The distributed-mode server supports the same web server configuration as shared memory mode.

Configure SSL/TLS Security Certificates

Two-way SSL/TLS is enabled by default. See the guide on configuring TLS/SSL certificates for details.

Runtime Parameters

The distributed execution mode is configured by setting parameters in pgxd.conf. This section documents the parameters common to the system. Some applications may have application-specific configuration options in addition to those listed below.

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| admin_request_cache_timeout | integer | after how many seconds admin request results get removed from the cache. Requests which are not done or not yet consumed are excluded from this timeout. | 60 |
| allow_local_filesystem (deprecated) | boolean | (This flag reduces security, enable only if you know what you are doing!) Allow loading from local filesystem, if in client/server mode. The list of directories that are allowed to be read should be listed in datasource_dir_whitelist. WARNING: This should only be enabled if you want to explicitly allow users of the PGX remote interface to access files on the local filesystem. Deprecated: since 20.1.1, define file-locations and use permissions instead. | false |
| allowed_remote_loading_locations | array of string | (This option may reduce security, use it only if you know what you are doing!) Allow loading graphs into the PGX engine from remote locations. If empty, as by default, no remote location is allowed. Any of the following locations can be listed: "https", "ftps", "s3", "hdfs" (all without colon ":"). Alternatively, "*" can be used to enable all locations at once; no other value is allowed with "*". Note that pre-loaded graphs are loaded from any location, regardless of the value of this setting. | [] |
| authorization | array of object | mapping of users and roles to resources and permissions for authorization | [] |
| authorization_session_create_allow_all | boolean | if true, allow all users to create a PGX session regardless of permissions granted to them | false |
| builtins_path | string | Path to the builtin algorithms directory. | null |
| common_log_configure | string | Path to a log configuration in Log4j2 (version 2) syntax. The contents of the file only affect PGX.D backend components shared with PGX.SM via JNI. Note: This parameter is set automatically by `start-server.sh`, do not change it. | null |
| data_memory_limits | object | memory limits configuration parameters | null |
| datasource_dir_whitelist (deprecated) | array of string | if allow_local_filesystem is set, the list of directories from which it is allowed to read files. Deprecated: since 20.1.1, define file-locations and use permissions instead. | [] |
| enable_csrf_token_checks | boolean | if true, the PGX webapp will verify that the CSRF token cookie and request parameters sent by the client exist and match. This is to prevent CSRF attacks. | true |
| enable_memory_limits_checks | boolean | if true, PGX will enforce the configured memory limits | true |
| enable_secure_handshake | boolean | if true, PGX will use TLS-PSK to establish RPC channels between remote backend processes in the cluster | true |
| enable_shutdown_cleanup_hook | boolean | if true, PGX will add a JVM shutdown hook that will automatically shut down PGX at JVM shutdown. Notice: Having the shutdown hook deactivated and not shutting down PGX explicitly may result in pollution of your temp directory. | true |
| executable_path | string | Path to the PGX.D executable. | null |
| file_locations | array of object | the file-locations that can be used in the authorization-config | [] |
| ghost_max_node_counts | integer | The maximum number of ghost vertices for each graph. | 40000 |
| ghost_min_neighbors | integer | The minimum number of neighbors a vertex must have in order to be made a ghost (which is a vertex replicated on every machine). | 5000 |
| handshake_port | integer | a TCP port which will be used for handshaking of distributed backend processes. | 7777 |
| hostnames | array of string | A list of names or IP addresses of hosts which should be involved in a PGX.D cluster. The first host specified in the list will be the leader (master) host initially. | [] |
| if_ethernet | string | IP network interface, used to initialize the network transport layer when using IP (Internet Protocol). Typically, this corresponds to the Ethernet interface. | null |
| if_infiniband | string | InfiniBand network interface, used to initialize the network transport layer when using InfiniBand. | null |
| init_pgql_on_startup | boolean | if true, PGQL is directly initialized on start-up of PGX. Otherwise, it is initialized during the first use of PGQL. | true |
| internal_communication_port (deprecated) | integer | a TCP port which will be used for communication between PGX.D server and backend. Deprecated: since 19.3. This will be removed from 20.2. | 38003 |
| java_class_path | string | Path to required java libraries. | null |
| large_buf_count | integer | Number of large buffers populated in the pool. | 65536 |
| large_buf_size_kb | integer | Size in kB of the large buffers. | 256 |
| log_configure | string | Path to a log configuration in Log4j (version 1) syntax. The contents of the file only affect a PGX.D backend. Note: This parameter is set automatically by `start-server.sh`, do not change it. | null |
| log_std_redirect | string | Path to a log file into which the standard output streams (stdout, stderr) of PGX.D backend should be redirected. No redirection happens when the path is null. | null |
| max_http_client_request_size | long | maximum size in bytes of any http request sent to the PGX server over the REST API. Setting it to -1 allows requests of any size. | 10485760 |
| num_worker_threads | integer | Number of threads used for performing the main computation and for performing auxiliary functions related to remote data (e.g. communication). | 28 |
| partitioning_ignore_ghostnodes | boolean | If set to true, the partitioning strategy will ignore the ghost nodes. | false |
| partitioning_shuffle_vertices | boolean | If set to true, the vertices of the graph will be randomly shuffled among machines before partitioning. If the data source does not contain enough randomness, this could be beneficial for query and analytics performance on graphs with many properties, labels or data providers. Shuffling can however increase the time necessary to load the graph. | false |
| partitioning_strategy | string | Partitioning strategy of the vertices of the graph. Valid values are 'out' (the vertices will be attributed to each machine so that every machine has the same total number of outgoing edges), 'in' (similar, but for incoming edges) and 'out_in' (same number of outgoing plus incoming edges). | out_in |
| pgx_realm | object | configuration parameters for the realm | null |
| preload_graphs | array of object | List of graph configs to be registered at start-up. Each item includes path to a graph config, the name of the graph and whether it should be published. | [] |
| run_experimental_algorithms | boolean | [GM-13080] Only for internal use. | false |
| secure_handshake_secret_file | string | the file path of the secret in pkcs12 format. This is only used when enable_secure_handshake is true. | null |
| strict_mode | boolean | if true, exceptions are thrown and logged with ERROR level whenever the engine encounters configuration problems, such as invalid keys, mismatches and other potential errors. If false, the engine logs problems with ERROR/WARN level (depending on severity) and makes best guesses / uses sensible defaults instead of throwing exceptions. | true |
| tmp_dir | string | Use this path as temporary directory to store compilation artifacts and other temporary data. If set to <system-tmp-dir>, use standard tmp directory of underlying system (/tmp on Linux). | null |
| use_infiniband | boolean | If set to true, InfiniBand will be used. Must be set to false on systems that do not support InfiniBand. | true |
| zookeeper_timeout | integer | If connecting to zookeeper service (for YARN deployment), the timeout in ms the runtime waits for connecting to zookeeper. | 10000 |
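
As an illustration of how a few of these parameters might be combined in pgxd.conf, the fragment below is a minimal sketch rather than a recommended setup. The parameter names and the allowed string values come from the table above; the numeric values (16 worker threads, a ghost threshold of 10000 neighbors) and the choice of remote locations are arbitrary assumptions made for the example.

```json
{
  "allowed_remote_loading_locations": ["https", "s3"],
  "partitioning_strategy": "out",
  "partitioning_shuffle_vertices": true,
  "ghost_min_neighbors": 10000,
  "num_worker_threads": 16
}
```

Any parameter left out of the file keeps the default listed in the table.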

Switching Between Ethernet and InfiniBand

To run the system over InfiniBand, set use_infiniband to true; to run it over Ethernet, set use_infiniband to false. Also set the name of the network interface accordingly: if_infiniband names the InfiniBand interface and if_ethernet names the Ethernet interface. If you only use InfiniBand, you do not need to provide a value for if_ethernet, and vice versa.
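
For instance, an Ethernet-only setup could be sketched with the following pgxd.conf fragment. The interface name "eno1" is only a placeholder borrowed from the example configuration in this guide; the actual name depends on the host.

```json
{
  "use_infiniband": false,
  "if_ethernet": "eno1"
}
```

Because use_infiniband defaults to true, it has to be set to false explicitly on systems that do not support InfiniBand.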

Example Configuration

For $PGX_HOME/conf/pgxd.conf:

{
  "if_infiniband": "ib0",
  "if_ethernet": "eno1",
  "use_infiniband": true,
  "executable_path": "distributed/bin/pgxd",
  "hostnames": [
    "hostname0",
    "hostname1"
  ],
  "builtins_path": "distributed/lib",
  "java_class_path": "distributed/jlib/*:distributed/jlib/common/*:distributed/jlib/third-party/*:shared-lib/common/*:shared-lib/third-party/*:shared-lib/embedded/*:shared-lib/server/*",
  "log_configure": "conf/dist_log4j.xml",
  "common_log_configure": "conf/log4j2.xml",
  "tmp_dir" : "tmp_data"
}

See the example server.auth.json for how to map client certificates to roles.