4 Configuring Oracle Exadata Database Machine for Use with Oracle Big Data Appliance

This chapter provides information about optimizing communications between Oracle Exadata Database Machine and Oracle Big Data Appliance. It describes how you can configure Oracle Exadata Database Machine to use InfiniBand alone, or SDP over InfiniBand, to communicate with Oracle Big Data Appliance.

4.1 About Optimizing Communications

Oracle Exadata Database Machine and Oracle Big Data Appliance use Ethernet by default, although typically they are also connected by an InfiniBand network. Ethernet communications are much slower than InfiniBand. After you configure Oracle Exadata Database Machine to communicate using InfiniBand, it can obtain data from Oracle Big Data Appliance many times faster than before.

Moreover, client applications that run on Oracle Big Data Appliance and push the data to Oracle Database can use Sockets Direct Protocol (SDP) for an additional performance boost. SDP is a standard communication protocol for clustered server environments, providing an interface between the network interface card and the application. By using SDP, applications place most of the messaging burden upon the network interface card, which frees the CPU for other tasks. As a result, SDP decreases network latency and CPU utilization, and thereby improves performance.

4.1.1 About Applications that Pull Data Into Oracle Exadata Database Machine

Oracle SQL Connector for Hadoop Distributed File System (HDFS) is an example of an application that pulls data into Oracle Exadata Database Machine. The connector enables an Oracle external table to access data stored in either HDFS files or a Hive table.

The external table provides access to the HDFS data. You can use the external table for querying HDFS data or for loading it into an Oracle database table.

Oracle SQL Connector for HDFS functions as a Hadoop client running on the database servers in Oracle Exadata Database Machine.

If you use Oracle SQL Connector for HDFS or another tool that pulls the data into Oracle Exadata Database Machine, then for the best performance, you should configure the system to use InfiniBand. See "Specifying the InfiniBand Connections to Oracle Big Data Appliance."

See Also:

Oracle Big Data Connectors User's Guide for information about Oracle SQL Connector for HDFS

4.1.2 About Applications that Push Data Into Oracle Exadata Database Machine

Oracle Loader for Hadoop is an example of an application that pushes data into Oracle Exadata Database Machine. The connector is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. You can use it to load data from Oracle Big Data Appliance to Oracle Exadata Database Machine.

Oracle Loader for Hadoop functions as a database client running on the Oracle Big Data Appliance. It must make database connections from Oracle Big Data Appliance to Oracle Exadata Database Machine over the InfiniBand network. Use of Sockets Direct Protocol (SDP) for these database connections further improves performance.

If you use Oracle Loader for Hadoop or another tool that pushes the data into Oracle Exadata Database Machine, then for the best performance, you should configure the system to use SDP over InfiniBand as described in this chapter.

See Also:

Oracle Big Data Connectors User's Guide for information about Oracle Loader for Hadoop

4.2 Prerequisites for Optimizing Communications

Oracle Big Data Appliance and Oracle Exadata Database Machine racks must be cabled together using InfiniBand cables. The IP addresses must be unique across all racks and use the same subnet for the InfiniBand network.

4.3 Specifying the InfiniBand Connections to Oracle Big Data Appliance

You can configure Oracle Exadata Database Machine to use the InfiniBand IP addresses of the Oracle Big Data Appliance servers. Otherwise, the default network is Ethernet. Use of the InfiniBand network improves the performance of all data transfers between Oracle Big Data Appliance and Oracle Exadata Database Machine.

To identify the Oracle Big Data Appliance InfiniBand IP addresses:

  1. If you have not done so already, install a CDH client on Oracle Exadata Database Machine. See "Providing Remote Client Access to CDH."

  2. Obtain a list of private host names and InfiniBand IP addresses for all Oracle Big Data Appliance servers.

    An Oracle Big Data Appliance rack can have 6, 12, or 18 servers.

  3. Log in to Oracle Exadata Database Machine with root privileges.

  4. Edit /etc/hosts on Oracle Exadata Database Machine and add the Oracle Big Data Appliance host names and InfiniBand IP addresses. The following example shows the sequential IP numbering:

    192.168.8.1       bda1node01.example.com    bda1node01
    192.168.8.2       bda1node02.example.com    bda1node02
    192.168.8.3       bda1node03.example.com    bda1node03
    192.168.8.4       bda1node04.example.com    bda1node04
    192.168.8.5       bda1node05.example.com    bda1node05
    192.168.8.6       bda1node06.example.com    bda1node06
    
  5. Check /etc/nsswitch.conf for a line like the following:

    hosts:      files dns 
    

    Ensure that the line does not reverse the order (dns files); if it does, your additions to /etc/hosts will not be used. Edit the file if necessary.

  6. Ping all Oracle Big Data Appliance servers. Ensure that ping completes and shows the InfiniBand IP addresses.

    # ping bda1node01.example.com
    PING bda1node01.example.com (192.168.8.1) 56(84) bytes of data.
    64 bytes from bda1node01.example.com (192.168.8.1): icmp_seq=1 ttl=50 time=20.2 ms
         .
         .
         .
    
  7. Run CDH locally on Oracle Exadata Database Machine and test HDFS functionality by uploading a large file to an Oracle Big Data Appliance server. Check that your network monitoring tools (such as sar) show I/O activity on the InfiniBand devices.

    To upload a file, use syntax like the following, which copies localfile.dat to the HDFS testdir directory on node05 of Oracle Big Data Appliance:

    hadoop fs -put localfile.dat hdfs://bda1node05.example.com/testdir/
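
The /etc/hosts entries in step 4 follow a regular pattern, so they can be generated rather than typed. The following is a minimal sketch, assuming the servers are numbered sequentially as in the example above. The function name bda_hosts_entries and its argument order are illustrative only and are not part of any Oracle tool. Review the output against the address list you obtained in step 2 before appending it to /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/hosts lines for sequentially numbered
# Oracle Big Data Appliance servers. Nothing is written to /etc/hosts.
# Arguments: subnet prefix, first host octet, rack name, domain, server count.
bda_hosts_entries() (
    prefix=$1; first=$2; rack=$3; domain=$4; count=$5
    i=0
    while [ "$i" -lt "$count" ]; do
        # Node names are assumed to follow the <rack>nodeNN pattern.
        node=$(printf '%snode%02d' "$rack" $((i + 1)))
        printf '%s.%d\t%s.%s\t%s\n' \
            "$prefix" $((first + i)) "$node" "$domain" "$node"
        i=$((i + 1))
    done
)

# Example: bda_hosts_entries 192.168.8 1 bda1 example.com 6
```

The function only prints to standard output, so you can inspect the lines first and then redirect them to a scratch file for review.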
    

4.4 Specifying the InfiniBand Connections to Oracle Exadata Database Machine

You can configure Oracle Big Data Appliance to use the InfiniBand IP addresses of the Oracle Exadata Database Machine servers. This configuration supports applications on Oracle Big Data Appliance that must connect to Oracle Exadata Database Machine.

To identify the Oracle Exadata Database Machine InfiniBand IP addresses:

  1. Obtain a list of private host names and InfiniBand IP addresses for all Oracle Exadata Database Machine servers.

  2. Log in to Oracle Big Data Appliance with root privileges.

  3. Edit /etc/hosts on Oracle Big Data Appliance and add the Oracle Exadata Database Machine host names and InfiniBand IP addresses.

  4. Check /etc/nsswitch.conf for a line like the following:

    hosts:      files dns 
    

    Ensure that the line does not reverse the order (dns files); if it does, your additions to /etc/hosts will not be used. Edit the file if necessary.

  5. Restart the dnsmasq service:

    # service dnsmasq restart
    
  6. Ping all Oracle Exadata Database Machine servers. Ensure that ping completes and shows the InfiniBand IP addresses.

  7. Test the connection by downloading a large file to an Oracle Exadata Database Machine server. Check that your network monitoring tools (such as sar) show I/O activity on the InfiniBand devices.

    To download a file, use syntax like the following, which copies a file named mydata.json to the dm01cel08 storage server:

    $ scp mydata.json oracle@dm01cel08-priv.example.com:mybigdata.json
    oracle@dm01cel08-priv.example.com's password: password
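
The /etc/nsswitch.conf check in step 4 can be scripted so that it is easy to repeat on every server. The following is a minimal sketch, not an Oracle-supplied utility. The function name hosts_order_ok is hypothetical, and the sketch assumes the usual layout in which the line begins with the token hosts: followed by whitespace-separated sources. It treats a line with files and no dns entry as acceptable, because /etc/hosts is still consulted.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only when the "hosts:" line of the given file
# (default /etc/nsswitch.conf) lists "files" before "dns".
hosts_order_ok() (
    file=${1:-/etc/nsswitch.conf}
    awk '
        $1 == "hosts:" {
            f = 0; d = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "files" && !f) f = i   # position of "files"
                if ($i == "dns" && !d) d = i     # position of "dns"
            }
            found = 1
            ok = (f > 0 && (d == 0 || f < d))
            exit                                  # jump to END
        }
        END { if (found && ok) exit 0; exit 1 }
    ' "$file"
)

# Example: hosts_order_ok || echo "check the hosts line in /etc/nsswitch.conf"
```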
    

4.5 Enabling SDP on Exadata Database Nodes

SDP improves the performance of client applications that run on Oracle Big Data Appliance and push large data loads to Oracle Database on Oracle Exadata Database Machine.

The following procedure describes how to enable SDP on the database nodes in an Oracle Exadata Database Machine running Oracle Linux. You must also configure your application on a job-by-job basis to use SDP.

To enable SDP on Oracle Exadata Database Machine:

  1. Open the /etc/infiniband/openib.conf file in a text editor, and add the following line:

    set: SDP_LOAD=yes
    
  2. Save these changes and close the file.

  3. To enable both SDP and TCP, open /etc/ofed/libsdp.conf in a text editor, and add the use both rule:

    use both server * *:*
    use both client * *:*
    
  4. Save these changes and close the file.

  5. Open the /etc/modprobe.conf file in a text editor, and add this setting:

    options ib_sdp sdp_zcopy_thresh=0 recv_poll=0
    
  6. Save these changes and close the file.

  7. Replicate these changes across all database nodes in the Oracle Exadata Database Machine rack.

  8. Restart all database nodes for the changes to take effect.

  9. If you have multiple Oracle Exadata Database Machine racks, then repeat these steps on all of them.
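
Because step 7 repeats the same file edits on every database node, it can help to make each edit idempotent. The following is a minimal sketch, assuming that appending a line at the end of the file is acceptable for the setting in question. The function name ensure_line is hypothetical. Run it as root, and back up each configuration file first.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if an identical line
# is not already present, so that repeated runs do not create duplicates.
ensure_line() (
    file=$1; line=$2
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
)

# Example, using the setting from step 5:
# ensure_line /etc/modprobe.conf 'options ib_sdp sdp_zcopy_thresh=0 recv_poll=0'
```

The sketch does not restart anything. The nodes must still be restarted as described in step 8.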

To specify SDP protocol for a load job:

  1. Add JVM options to the HADOOP_OPTS environment variable to enable JDBC SDP export:

    HADOOP_OPTS="-Doracle.net.SDP=true -Djava.net.preferIPv4Stack=true"
    
  2. In either the Hadoop command or the configuration file for the job, set the mapred.child.java.opts configuration property to enable the child task JVMs for SDP.

    For example, use these options in the command line for a MapReduce job:

    -D mapred.child.java.opts="-Doracle.net.SDP=true -Djava.net.preferIPv4Stack=true"
    
  3. Configure standard Ethernet communications for the job.

    For example, Oracle Loader for Hadoop reads the value of the oracle.hadoop.loader.connection.url property from a job configuration file. The value has this syntax:

    jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
         (ADDRESS=(PROTOCOL=TCP)(HOST=hostName)(PORT=portNumber)))
         (CONNECT_DATA=(SERVICE_NAME=serviceName)))
    

    Replace hostName, portNumber, and serviceName with the appropriate values to identify the SDP listener on your Oracle Exadata Database Machine.

  4. Configure the Oracle listener on Exadata to support the SDP protocol and bind it to a specific port address (such as 1522).

    For example, Oracle Loader for Hadoop reads the value of the oracle.hadoop.loader.connection.oci_url property from a job configuration file. The value has this syntax:

    (DESCRIPTION=(ADDRESS=(PROTOCOL=SDP)
        (HOST=hostName) (PORT=portNumber))
        (CONNECT_DATA=(SERVICE_NAME=serviceName)))
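
The descriptor above is easy to mistype because of the nested parentheses. The following is a minimal sketch that assembles it on one line from its three values. The function name sdp_connect_descriptor is hypothetical, and the host, port, and service name in the usage comment are placeholders taken from examples elsewhere in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: print an SDP connect descriptor in the syntax shown
# above. Arguments: host name, port number, service name.
sdp_connect_descriptor() {
    # Reject an empty or non-numeric port before building the string.
    case $2 in
        ''|*[!0-9]*) echo "port must be numeric" >&2; return 1 ;;
    esac
    printf '(DESCRIPTION=(ADDRESS=(PROTOCOL=SDP)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SERVICE_NAME=%s)))\n' \
        "$1" "$2" "$3"
}

# Example: sdp_connect_descriptor dm01db01-ibvip.example.com 1522 dbm
```

The output is whitespace-free, which makes it convenient to paste as a property value in a job configuration file.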
    

4.6 Creating an SDP Listener on the InfiniBand Network

To add a listener for the Oracle Big Data Appliance connections coming in on the InfiniBand network, first add a network resource for the InfiniBand network with virtual IP addresses.

Note:

These instructions apply to Exadata V2, X2-2, and X3-2 nodes running Oracle Linux 5. Document 1580584.1 in My Oracle Support provides instructions for these same systems as well as for X4-2, X5-2, and X6-2 nodes running Oracle Linux 6.

The example below lists two nodes for an Oracle Exadata Database Machine quarter rack. If you have an Oracle Exadata Database Machine half or full rack, you must repeat the node-specific lines for each node in the cluster.

  1. Edit /etc/hosts on each node in the Exadata rack to add the virtual IP addresses for the InfiniBand network. Make sure that these IP addresses are not in use. For example:
    # Added for Listener over IB
    192.168.10.21 dm01db01-ibvip.example.com dm01db01-ibvip
    192.168.10.22 dm01db02-ibvip.example.com dm01db02-ibvip 
    
  2. As the root user, create a network resource on one database node for the InfiniBand network. For example:
    # /u01/app/grid/product/12.1.0.1/bin/srvctl add network -k 2 -S 192.168.10.0/255.255.255.0/bondib0
    
  3. Verify that the network was added correctly with a command like one of the following examples:
    # /u01/app/grid/product/12.1.0.1/bin/crsctl stat res -t | grep net
    ora.net1.network
    ora.net2.network -- Output indicating new Network resource 
    

    or

    # /u01/app/grid/product/12.1.0.1/bin/srvctl config network -k 2
    Network exists: 2/192.168.10.0/255.255.255.0/bondib0, type static -- Output indicating Network resource on the 192.168.10.0 subnet 
    
  4. Add the virtual IP addresses on the network created in Step 2, for each node in the cluster. For example:
    # srvctl add vip -n dm01db01 -A dm01db01-ibvip/255.255.255.0/bondib0 -k 2
    #
    # srvctl add vip -n dm01db02 -A dm01db02-ibvip/255.255.255.0/bondib0 -k 2
    
  5. As the oracle user who owns Grid Infrastructure Home, add a listener for the virtual IP addresses created in Step 4.
    # srvctl add listener -l LISTENER_IB -k 2 -p TCP:1522,/SDP:1522
    
  6. For each database that will accept connections from the middle tier, modify the listener_networks init parameter to allow load balancing and failover across multiple networks (Ethernet and InfiniBand). You can either enter the full TNSNAMES syntax in the initialization parameter or create entries in tnsnames.ora in the $ORACLE_HOME/network/admin directory. The TNSNAMES.ORA entries must exist in GRID_HOME. The following example first updates tnsnames.ora.

    Complete this step on each node in the cluster with the correct IP addresses for that node. LISTENER_IBREMOTE should list all other nodes that are in the cluster. DBM_IB should list all nodes in the cluster.

    Note:

    The database instance reads the TNSNAMES entries only on startup. Thus, if you modify an entry that is referred to by any init.ora parameter (such as LISTENER_NETWORKS), then you must either restart the instance or issue an ALTER SYSTEM SET LISTENER_NETWORKS command for the modifications to take effect in the instance.

    DBM =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dm01-scan)(PORT = 1521))
    (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = dbm)
    ))
    DBM_IB =
    (DESCRIPTION =
    (LOAD_BALANCE=on)
    (ADDRESS = (PROTOCOL = TCP)(HOST = dm01db01-ibvip)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST = dm01db02-ibvip)(PORT = 1522))
    (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = dbm)
    ))
    LISTENER_IBREMOTE =
    (DESCRIPTION =
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dm01db02-ibvip.mycompany.com)(PORT = 1522))
    ))
    LISTENER_IBLOCAL =
    (DESCRIPTION =
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dm01db01-ibvip.mycompany.com)(PORT = 1522))
    (ADDRESS = (PROTOCOL = SDP)(HOST = dm01db01-ibvip.mycompany.com)(PORT = 1522))
    ))
    LISTENER_IPLOCAL =
    (DESCRIPTION =
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dm0101-vip.mycompany.com)(PORT = 1521))
    ))
    LISTENER_IPREMOTE =
    (DESCRIPTION =
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dm01-scan.mycompany.com)(PORT = 1521))
    ))
    
  7. Connect to the database instance as sysdba.
  8. Modify the listener_networks init parameter by using the SQL ALTER SYSTEM command:
    SQL> alter system set listener_networks=
         '((NAME=network2) (LOCAL_LISTENER=LISTENER_IBLOCAL)
            (REMOTE_LISTENER=LISTENER_IBREMOTE))',
         '((NAME=network1)(LOCAL_LISTENER=LISTENER_IPLOCAL)
            (REMOTE_LISTENER=LISTENER_IPREMOTE))' scope=both;
    
  9. On the Linux command line, use the srvctl command to restart LISTENER_IB to implement the modification in Step 8:
    # srvctl stop listener -l LISTENER_IB
    # srvctl start listener -l LISTENER_IB
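
On a half or full rack, the srvctl add vip command in step 4 must be repeated for every node. The following is a minimal sketch that prints one command per node so that you can review them before running any. The function name print_add_vip_commands is hypothetical, and the sketch assumes that each virtual IP host name is the node name followed by -ibvip, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one "srvctl add vip" command per
# database node. Arguments: network number, netmask, interface, node names.
print_add_vip_commands() (
    netnum=$1; netmask=$2; iface=$3
    shift 3
    for node in "$@"; do
        # The VIP name <node>-ibvip is an assumption based on the example.
        printf 'srvctl add vip -n %s -A %s-ibvip/%s/%s -k %s\n' \
            "$node" "$node" "$netmask" "$iface" "$netnum"
    done
)

# Example: print_add_vip_commands 2 255.255.255.0 bondib0 dm01db01 dm01db02
```

Printing the commands instead of executing them keeps the sketch safe to run on any system, including one without Grid Infrastructure installed.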