This chapter provides information about optimizing communications between Oracle Exadata Database Machine and Oracle Big Data Appliance. It describes how you can configure Oracle Exadata Database Machine to use InfiniBand alone, or SDP over InfiniBand, to communicate with Oracle Big Data Appliance.
Oracle Exadata Database Machine and Oracle Big Data Appliance use Ethernet by default, although typically they are also connected by an InfiniBand network. Ethernet communications are much slower than InfiniBand. After you configure Oracle Exadata Database Machine to communicate using InfiniBand, it can obtain data from Oracle Big Data Appliance many times faster than before.
Moreover, client applications that run on Oracle Big Data Appliance and push the data to Oracle Database can use Sockets Direct Protocol (SDP) for an additional performance boost. SDP is a standard communication protocol for clustered server environments, providing an interface between the network interface card and the application. By using SDP, applications place most of the messaging burden upon the network interface card, which frees the CPU for other tasks. As a result, SDP decreases network latency and CPU utilization, and thereby improves performance.
Oracle SQL Connector for Hadoop Distributed File System (HDFS) is an example of an application that pulls data into Oracle Exadata Database Machine. The connector enables an Oracle external table to access data stored in either HDFS files or a Hive table.
The external table provides access to the HDFS data. You can use it to query HDFS data or to load the data into an Oracle database table.
Oracle SQL Connector for HDFS functions as a Hadoop client running on the database servers in Oracle Exadata Database Machine.
If you use Oracle SQL Connector for HDFS or another tool that pulls the data into Oracle Exadata Database Machine, then for the best performance, you should configure the system to use InfiniBand. See "Specifying the InfiniBand Connections to Oracle Big Data Appliance."
See Also:
Oracle Big Data Connectors User's Guide for information about Oracle SQL Connector for HDFS
Oracle Loader for Hadoop is an example of an application that pushes data into Oracle Exadata Database Machine. It is an efficient, high-performance loader for moving data from a Hadoop cluster into a table in an Oracle database. You can use it to load data from Oracle Big Data Appliance to Oracle Exadata Database Machine.
Oracle Loader for Hadoop functions as a database client running on Oracle Big Data Appliance. It must make database connections from Oracle Big Data Appliance to Oracle Exadata Database Machine over the InfiniBand network. Using Sockets Direct Protocol (SDP) for these database connections further improves performance.
If you use Oracle Loader for Hadoop or another tool that pushes the data into Oracle Exadata Database Machine, then for the best performance, you should configure the system to use SDP over InfiniBand as described in this chapter.
See Also:
Oracle Big Data Connectors User's Guide for information about Oracle Loader for Hadoop
Oracle Big Data Appliance and Oracle Exadata Database Machine racks must be cabled together using InfiniBand cables. The InfiniBand IP addresses must be unique across all racks, and all racks must use the same subnet for the InfiniBand network.
See Also:
Oracle Big Data Appliance Owner's Guide about multirack cabling
Oracle Big Data Appliance Owner's Guide about IP addresses and subnets
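Before cabling, you might check a planned address list against both requirements. The following Bash sketch is a hypothetical helper, not an Oracle-supplied tool, and its function names are illustrative. It assumes IPv4 addresses written without leading zeros and a subnet prefix length that you supply; the value 22 used below is only an assumption, so substitute the prefix length of your InfiniBand network.

```shell
#!/usr/bin/env bash
# Hypothetical pre-cabling check; not an Oracle-supplied tool.

# ip_to_int A.B.C.D: prints the IPv4 address as a 32-bit integer
# (octets must not have leading zeros, which Bash would read as octal).
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# check_ib_addresses PREFIX_LEN IP...
# Returns 1 (with a message on stderr) if any address is repeated or lies
# outside the subnet of the first address; returns 0 otherwise.
check_ib_addresses() {
  local prefix=$1
  shift
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local net ip dup
  net=$(( $(ip_to_int "$1") & mask ))
  dup=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dup" ]; then
    echo "duplicate address(es): $dup" >&2
    return 1
  fi
  for ip in "$@"; do
    if (( ($(ip_to_int "$ip") & mask) != net )); then
      echo "$ip is outside the /$prefix subnet of $1" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_ib_addresses 22 192.168.8.1 192.168.8.2 192.168.10.5` returns a zero status, while adding an address such as 192.168.12.1, or repeating 192.168.8.1, makes it return a nonzero status.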
You can configure Oracle Exadata Database Machine to use the InfiniBand IP addresses of the Oracle Big Data Appliance servers; otherwise, it uses the default Ethernet network. Using the InfiniBand network improves the performance of all data transfers between Oracle Big Data Appliance and Oracle Exadata Database Machine.
To identify the Oracle Big Data Appliance InfiniBand IP addresses:
If you have not done so already, install a CDH client on Oracle Exadata Database Machine. See "Providing Remote Client Access to CDH."
Obtain a list of private host names and InfiniBand IP addresses for all Oracle Big Data Appliance servers.
An Oracle Big Data Appliance rack can have 6, 12, or 18 servers.
Log in to Oracle Exadata Database Machine with root privileges.
Edit /etc/hosts on Oracle Exadata Database Machine and add the Oracle Big Data Appliance host names and InfiniBand IP addresses. The following example shows the sequential IP numbering:
192.168.8.1 bda1node01.example.com bda1node01
192.168.8.2 bda1node02.example.com bda1node02
192.168.8.3 bda1node03.example.com bda1node03
192.168.8.4 bda1node04.example.com bda1node04
192.168.8.5 bda1node05.example.com bda1node05
192.168.8.6 bda1node06.example.com bda1node06
Check /etc/nsswitch.conf for a line like the following:
hosts: files dns
Ensure that the line does not reverse the order (dns files); if it does, your additions to /etc/hosts will not be used. Edit the file if necessary.
Ping all Oracle Big Data Appliance servers. Ensure that ping completes and shows the InfiniBand IP addresses.
# ping bda1node01.example.com
PING bda1node01.example.com (192.168.8.1) 56(84) bytes of data.
64 bytes from bda1node01.example.com (192.168.8.1): icmp_seq=1 ttl=50 time=20.2 ms
...
Run CDH locally on Oracle Exadata Database Machine and test HDFS functionality by uploading a large file to an Oracle Big Data Appliance server. Check that your network monitoring tools (such as sar) show I/O activity on the InfiniBand devices.
To upload a file, use syntax like the following, which copies localfile.dat to the HDFS testdir directory on node05 of Oracle Big Data Appliance:
hadoop fs -put localfile.dat hdfs://bda1node05.example.com/testdir/
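Parts of this procedure can be scripted. The following Bash sketch is hypothetical: the function names bda_hosts_entries, hosts_files_first, and ib_lines are illustrative, and the host-entry generator assumes the sequential numbering used in the /etc/hosts example. Review its output before you add anything to /etc/hosts as root.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the procedure above; review all output before
# adding anything to /etc/hosts as root.

# bda_hosts_entries RACK DOMAIN NET FIRST COUNT
# Prints COUNT sequentially numbered /etc/hosts entries such as
# "192.168.8.1 bda1node01.example.com bda1node01".
bda_hosts_entries() {
  local rack=$1 domain=$2 net=$3 first=$4 count=$5 i name
  for (( i = 1; i <= count; i++ )); do
    printf -v name '%snode%02d' "$rack" "$i"
    printf '%s.%d %s.%s %s\n' "$net" $(( first + i - 1 )) "$name" "$domain" "$name"
  done
}

# hosts_files_first [NSSWITCH_FILE]
# Returns 0 if the first "hosts:" line lists "files" before "dns".
hosts_files_first() {
  local conf=${1:-/etc/nsswitch.conf} line word
  local -a words
  line=$(grep -m1 -E '^[[:space:]]*hosts:' "$conf") || return 1
  read -r -a words <<< "${line#*:}"
  for word in "${words[@]}"; do
    case $word in
      files) return 0 ;;
      dns) return 1 ;;
    esac
  done
  return 1
}

# ib_lines: reads text such as "sar -n DEV 1 5" output on stdin and keeps
# only lines that name an InfiniBand interface (ib0, ib1, ...).
ib_lines() {
  grep -E '(^|[[:space:]])ib[0-9]+([[:space:]]|$)'
}
```

For example, `bda_hosts_entries bda1 example.com 192.168.8 1 6` prints the six entries of the /etc/hosts example, `hosts_files_first /etc/nsswitch.conf` returns a zero status when files precedes dns, and `sar -n DEV 1 5 | ib_lines`, run while the hadoop fs -put command is copying, shows only the InfiniBand rows.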
You can configure Oracle Big Data Appliance to use the InfiniBand IP addresses of the Oracle Exadata Database Machine servers. This configuration supports applications on Oracle Big Data Appliance that must connect to Oracle Exadata Database Machine.
To identify the Oracle Exadata Database Machine InfiniBand IP addresses:
Obtain a list of private host names and InfiniBand IP addresses for all Oracle Exadata Database Machine servers.
Log in to Oracle Big Data Appliance with root privileges.
Edit /etc/hosts on Oracle Big Data Appliance and add the Oracle Exadata Database Machine host names and InfiniBand IP addresses.
Check /etc/nsswitch.conf for a line like the following:
hosts: files dns
Ensure that the line does not reverse the order (dns files); if it does, your additions to /etc/hosts will not be used. Edit the file if necessary.
Restart the dnsmasq service:
# service dnsmasq restart
Ping all Oracle Exadata Database Machine servers. Ensure that ping completes and shows the InfiniBand IP addresses.
Test the connection by copying a large file to an Oracle Exadata Database Machine server. Check that your network monitoring tools (such as sar) show I/O activity on the InfiniBand devices.
To copy a file, use syntax like the following, which copies a file named mydata.json to the dm01cel08 storage server as mybigdata.json:
$ scp mydata.json oracle@dm01cel08-priv.example.com:mybigdata.json
oracle@dm01cel08-priv.example.com's password: password
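When you repeat this procedure, for example after adding another Oracle Exadata Database Machine rack, appending entries by hand can leave duplicate lines in /etc/hosts. The following Bash sketch uses hypothetical function names. It appends an entry only if its fully qualified host name is not already listed, and it extracts the resolved address from the first line of ping output so that you can compare it with the expected InfiniBand IP address.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the function names are illustrative only.

# add_host_entry HOSTS_FILE IP FQDN [ALIAS...]
# Appends "IP FQDN ALIAS..." unless FQDN already appears as a host name
# field on a non-comment line of HOSTS_FILE.
add_host_entry() {
  local file=$1 ip=$2 fqdn=$3
  shift 3
  if awk -v h="$fqdn" '
       !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit !found }' "$file" 2>/dev/null; then
    echo "$fqdn is already in $file; not changed" >&2
    return 0
  fi
  printf '%s %s%s\n' "$ip" "$fqdn" "${*:+ $*}" >> "$file"
}

# ping_header_ip: reads ping output on stdin and prints the address from the
# header line "PING host (a.b.c.d) 56(84) bytes of data."
ping_header_ip() {
  sed -n '1s/^PING [^ ]* (\([0-9.]*\)).*/\1/p'
}
```

For example, `add_host_entry /etc/hosts 192.168.10.1 dm01db01-priv.example.com dm01db01-priv` adds one line (the address and the dm01db01-priv name are hypothetical), and `ping -c 1 dm01cel08-priv.example.com | ping_header_ip` prints the address that the name resolved to.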
SDP improves the performance of client applications that run on Oracle Big Data Appliance and push large data loads to Oracle Database on Oracle Exadata Database Machine.
The following procedure describes how to enable SDP on the database nodes in an Oracle Exadata Database Machine running Oracle Linux. You must also configure your application on a job-by-job basis to use SDP.
To enable SDP on Oracle Exadata Database Machine:
Open the /etc/infiniband/openib.conf file in a text editor, and add the following line:
SDP_LOAD=yes
Save these changes and close the file.
To enable both SDP and TCP, open /etc/ofed/libsdp.conf in a text editor, and add the use both rules:
use both server * *:*
use both client * *:*
Save these changes and close the file.
Open the /etc/modprobe.conf file in a text editor, and add this setting:
options ib_sdp sdp_zcopy_thresh=0 recv_poll=0
Save these changes and close the file.
Replicate these changes across all database nodes in the Oracle Exadata Database Machine rack.
Restart all database nodes for the changes to take effect.
If you have multiple Oracle Exadata Database Machine racks, then repeat these steps on all of them.
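Because the same three edits must be made on every database node, you might script them so that running the script a second time leaves the files unchanged. The following Bash sketch is hypothetical, and its function names are illustrative. It assumes GNU sed (sed -i), which Oracle Linux provides, and takes the file paths as arguments so that you can try it on copies before running it as root against the real files. It writes the libsdp.conf rules in the form use both server * *:* and use both client * *:*.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run as root on each database node after testing them
# on copies of the files. Assumes GNU sed (sed -i).

# set_sdp_load [OPENIB_CONF]: sets SDP_LOAD=yes, replacing an existing
# SDP_LOAD line if there is one and appending the line otherwise.
set_sdp_load() {
  local conf=${1:-/etc/infiniband/openib.conf}
  if grep -q '^SDP_LOAD=' "$conf" 2>/dev/null; then
    sed -i 's/^SDP_LOAD=.*/SDP_LOAD=yes/' "$conf"
  else
    printf '%s\n' 'SDP_LOAD=yes' >> "$conf"
  fi
}

# append_once FILE LINE: appends LINE unless FILE already has that exact line.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# apply_sdp_settings [OPENIB_CONF] [LIBSDP_CONF] [MODPROBE_CONF]
apply_sdp_settings() {
  local libsdp=${2:-/etc/ofed/libsdp.conf}
  local modprobe=${3:-/etc/modprobe.conf}
  set_sdp_load "${1:-/etc/infiniband/openib.conf}"
  append_once "$libsdp" 'use both server * *:*'
  append_once "$libsdp" 'use both client * *:*'
  append_once "$modprobe" 'options ib_sdp sdp_zcopy_thresh=0 recv_poll=0'
}
```

For example, after testing on copies, run apply_sdp_settings as root with no arguments on each database node, and then restart the nodes.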
To specify SDP protocol for a load job:
Add JVM options to the HADOOP_OPTS environment variable to enable JDBC SDP export:
HADOOP_OPTS="-Doracle.net.SDP=true -Djava.net.preferIPv4Stack=true"
In either the Hadoop command or the configuration file for the job, set the mapred.child.java.opts configuration property to enable the child task JVMs for SDP.
For example, use these options in the command line for a MapReduce job:
-D mapred.child.java.opts="-Doracle.net.SDP=true -Djava.net.preferIPv4Stack=true"
Configure standard Ethernet communications for the job.
For example, Oracle Loader for Hadoop reads the value of the oracle.hadoop.loader.connection.url property from a job configuration file. The value has this syntax:
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=hostName)(PORT=portNumber))) (CONNECT_DATA=(SERVICE_NAME=serviceName)))
Replace hostName, portNumber, and serviceName with the appropriate values to identify the standard (TCP) listener on your Oracle Exadata Database Machine.
Configure the Oracle listener on Oracle Exadata Database Machine to support the SDP protocol and bind it to a specific port (such as 1522).
For example, Oracle Loader for Hadoop reads the value of the oracle.hadoop.loader.connection.oci_url property from a job configuration file. The value has this syntax, in which hostName, portNumber, and serviceName identify the SDP listener:
(DESCRIPTION=(ADDRESS=(PROTOCOL=SDP) (HOST=hostName) (PORT=portNumber)) (CONNECT_DATA=(SERVICE_NAME=serviceName)))
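The following Bash sketch gathers the job-side settings from this procedure. It exports HADOOP_OPTS, builds the TCP connection URL and the SDP connect descriptor, and writes them, together with mapred.child.java.opts, to a minimal Hadoop configuration file. The function names are hypothetical, and so are the host names, the service name, and port 1521 in the usage example. The property names and JVM options are the ones given in this procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; host names, ports, and the service name passed to
# them are placeholders for your own values.

# JVM options from this procedure, used for both the client JVM and the
# child task JVMs.
SDP_JVM_OPTS="-Doracle.net.SDP=true -Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS="$SDP_JVM_OPTS"

# jdbc_tcp_url HOST PORT SERVICE: JDBC thin URL over TCP (standard network)
jdbc_tcp_url() {
  printf 'jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s)))(CONNECT_DATA=(SERVICE_NAME=%s)))' "$1" "$2" "$3"
}

# sdp_oci_url HOST PORT SERVICE: connect descriptor for the SDP listener
sdp_oci_url() {
  printf '(DESCRIPTION=(ADDRESS=(PROTOCOL=SDP)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SERVICE_NAME=%s)))' "$1" "$2" "$3"
}

# write_job_conf FILE TCP_URL OCI_URL: writes a minimal Hadoop configuration
# file with the three job properties named in this procedure.
write_job_conf() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>$2</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.oci_url</name>
    <value>$3</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>$SDP_JVM_OPTS</value>
  </property>
</configuration>
EOF
}
```

For example, `write_job_conf job_conf.xml "$(jdbc_tcp_url dm01-scan.example.com 1521 orcl)" "$(sdp_oci_url dm01db01-ibvip.example.com 1522 orcl)"` writes the three properties. In practice, you would merge them into your existing job configuration file rather than replace it.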
To add a listener for the Oracle Big Data Appliance connections coming in on the InfiniBand network, first add a network resource for the InfiniBand network with virtual IP addresses.
Note:
These instructions apply to Exadata V2, X2-2, and X3-2 nodes running Oracle Linux 5. Document 1580584.1 in My Oracle Support provides instructions for these same systems as well as for X4-2, X5-2, and X6-2 nodes running Oracle Linux 6.
The example below lists two nodes for an Oracle Exadata Database Machine quarter rack. If you have an Oracle Exadata Database Machine half or full rack, you must repeat the node-specific lines for each node in the cluster.