This page illustrates how to start PGX with Hadoop on an arbitrary number of hosts. You are also free to use your favorite cluster management framework (e.g., YARN, Mesos, or Kubernetes).
Install PGX on every machine of the target cluster, and perform all of the following steps identically on every machine.
The PGX distributed mode server uses TCP for the initial handshake. By default, it uses TCP port 7777. With Linux iptables, you can open TCP port 7777 with the following command:
sudo iptables -I INPUT -p tcp --dport 7777 -j ACCEPT
You can change the port by setting the handshake_port option in the configuration file. See the Configuration section below for details.
[If your cluster does not have InfiniBand] When configured to run on Ethernet, the PGX distributed mode server uses UDP as the transport protocol. The machines in your cluster must therefore be able to receive UDP packets from the other machines. Use the following command to accept UDP packets from a range of IP addresses, replacing the two $IP placeholders with the first and last address of the range:
sudo iptables -I INPUT -p udp -m iprange --src-range $IP-$IP -j ACCEPT
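As a rough sketch, the two firewall rules above can be generated together from a few variables and reviewed before they are applied. The variable names, the function name, and the example address range (taken from the 192.0.2.0/24 documentation block) are assumptions for illustration; only port 7777 comes from the text above. The script prints the commands and does not run iptables itself.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the firewall rules described above.
# HANDSHAKE_PORT defaults to 7777, the default handshake port named in the text.
# RANGE_START and RANGE_END are illustrative placeholders, not PGX settings.
HANDSHAKE_PORT="${HANDSHAKE_PORT:-7777}"
RANGE_START="${RANGE_START:-192.0.2.10}"
RANGE_END="${RANGE_END:-192.0.2.20}"

# Hypothetical helper: emits one iptables command per line.
print_firewall_rules() {
  echo "sudo iptables -I INPUT -p tcp --dport ${HANDSHAKE_PORT} -j ACCEPT"
  echo "sudo iptables -I INPUT -p udp -m iprange --src-range ${RANGE_START}-${RANGE_END} -j ACCEPT"
}

print_firewall_rules
```

Review the printed lines, then run them by hand on each machine of the cluster.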
Two-way SSL/TLS is enabled by default. See the tutorial on configuring TLS/SSL certificates for details.
Disabling SSL/TLS
You must configure SSL/TLS properly to start the server (see the web server configuration page for SSL/TLS options). Although it is possible to turn SSL/TLS off, we strongly recommend leaving it turned on for any production deployment.
Configure $PGX_HOME/conf/server.conf and $PGX_HOME/conf/pgxd.conf.
hostnames: The hostnames parameter in pgxd.conf has no default value. You must specify the list of host names of your cluster, either in the configuration file or as arguments (see the next section).
secure_handshake_secret_file: By default, the PGX distributed mode server uses a TLS-PSK based secure handshake among the distributed processes of the server. Create a keystore that stores a secret, then set the secure_handshake_secret_file parameter to the file path of the keystore. See the tutorial on creating a keystore for how to create the keystore. You can disable the secure handshake by setting the enable_secure_handshake parameter to false in pgxd.conf. If the option is disabled, the PGX distributed server uses ordinary unsecured TCP channels for communication. We highly recommend enabling the option for any production deployment.
Multiple machines
If you are running the distributed server on multiple machines, the first three steps below should be executed on every machine.
Make sure the JAVA_HOME environment variable points to the JDK 8 home directory, e.g., export JAVA_HOME=/usr/lib/jvm/java-8-oracle.
The PGX distributed mode server requires libjvm.so to be reachable through the LD_LIBRARY_PATH environment variable. Add the directory that contains libjvm.so, for example:
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH
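To confirm that the export took effect, a small check along the following lines can be run on each machine. This is a minimal sketch: find_libjvm is a hypothetical helper, not part of PGX, and it only looks for a file named libjvm.so in each directory of a colon-separated path list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory in a colon-separated
# path list that contains libjvm.so; return non-zero if none does.
find_libjvm() {
  local path_list="$1" dir
  local IFS=':'
  for dir in $path_list; do
    if [ -n "$dir" ] && [ -f "$dir/libjvm.so" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}

# Check the current environment and report the result.
if dir="$(find_libjvm "${LD_LIBRARY_PATH:-}")"; then
  echo "libjvm.so found in: $dir"
else
  echo "libjvm.so not found on LD_LIBRARY_PATH" >&2
fi
```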
The distributed runtime requires Java 8u102 or greater
Some functionality requires at least this version.
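The version requirement can be checked with a short script before starting the server. The sketch below assumes Java 8 version strings of the form 1.8.0_N (as printed by java -version on JDK 8) and treats anything else as unsupported; java8_update_ok is a hypothetical helper name, and the sample version strings are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a version string such as "1.8.0_181"
# denotes Java 8 update 102 or later. Other formats are rejected.
java8_update_ok() {
  local version="$1" update
  case "$version" in
    1.8.0_*) update="${version#1.8.0_}" ;;
    *) return 1 ;;
  esac
  update="${update%%[!0-9]*}"   # drop suffixes such as "-b14"
  [ -n "$update" ] && [ "$update" -ge 102 ]
}

# Try the helper on a few sample version strings.
for v in 1.8.0_91 1.8.0_102 1.8.0_181; do
  if java8_update_ok "$v"; then
    echo "$v: meets the requirement"
  else
    echo "$v: too old"
  fi
done
```

On a real machine, the version string would be taken from the output of java -version rather than from a fixed list.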
Start the server using the built-in bash script start-server with the --dist argument. The script automatically detects the configuration files and launches the distributed PGX server.
cd $PGX_HOME
./bin/start-server --dist
If you did not set hostnames in pgxd.conf, you can instead start the server with the following commands:
cd $PGX_HOME
./bin/start-server --dist -Dpgx.hostnames=[hostname0,hostname1]
Adding -Dpgx.hostnames=[hostname0,hostname1] overrides the hostnames option in pgxd.conf with the given value.
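When the host list is long or comes from another script, the argument can be assembled rather than typed by hand. The following is a minimal sketch; hostnames_arg is a hypothetical helper, and the host names are the placeholders used above. It is worth quoting the resulting argument, because an unquoted [...] is a glob pattern in most shells and may be expanded or rejected depending on the shell and its options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the given host names with commas and wrap
# them in the -Dpgx.hostnames=[...] form shown above.
hostnames_arg() {
  local IFS=','
  echo "-Dpgx.hostnames=[$*]"
}

HOSTS_ARG="$(hostnames_arg hostname0 hostname1)"
echo "$HOSTS_ARG"
```

The quoted value can then be passed to the start script, for example ./bin/start-server --dist "$HOSTS_ARG".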
Do not run with admin permissions
We strongly recommend not running the PGX server with admin permissions (e.g., as the root user), as doing so might increase security risks on your system.
Check the access point of the PGX distributed mode server. After finishing the steps above on all machines in your cluster, you should see a log message like the following:
INFO: Starting ProtocolHandler ["http-nio-7007"]
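If the server output is captured in a file, the presence of this message can be tested from a script. The sketch below assumes the caller knows where the output was written; it makes no assumption about where PGX stores its logs, and server_started is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed once the given file contains the
# ProtocolHandler startup line shown above (matched as a fixed string).
server_started() {
  local log_file="$1"
  grep -qF 'Starting ProtocolHandler ["http-nio-7007"]' "$log_file"
}

# Optional usage: pass the path of the captured server output as the first argument.
LOG_FILE="${1:-}"
if [ -n "$LOG_FILE" ]; then
  if server_started "$LOG_FILE"; then
    echo "server reports that it has started"
  else
    echo "startup message not found yet"
  fi
fi
```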
Connect to the PGX cluster using the PGX Shell. For example,
cd $PGX_HOME
./bin/pgx-jshell --base_url=http://hostname0:7007
In this example, hostname0 is the first host name listed in the hostnames parameter of pgxd.conf. The first machine in the list acts as the leader of the distributed PGX cluster. Note that clients should connect only to the leader machine.
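Because the leader is always the first entry of the host list, the base URL for clients can be derived from the same comma-separated list that is passed to the server. This is a minimal sketch: leader_base_url is a hypothetical helper, and the http scheme and port 7007 are taken from the example above and may differ in your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the client base URL from a comma-separated
# host list. The first entry is the leader; the port defaults to 7007.
leader_base_url() {
  local hostnames="$1" port="${2:-7007}"
  local leader="${hostnames%%,*}"   # keep everything before the first comma
  echo "http://${leader}:${port}"
}

leader_base_url "hostname0,hostname1"
```

The result can be passed to the shell as the value of --base_url.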