5 Cassandra Message Store Pre-Installation Tasks

This chapter provides information on the pre-installation tasks you must complete on Cassandra nodes before you can install Messaging Server software.

Summary of General Pre-Installation Tasks

The following list summarizes the general pre-installation tasks you must complete before installing any Messaging Server component.

  • Create a UNIX system user and group for Messaging Server, and set permissions for the directories and files owned by that user.

  • Check that DNS is running and configured properly for the Messaging Server host.

  • On Linux, check the maximum number of file descriptors. If the value is less than 16384, increase it.

  • Install Oracle Directory Server Enterprise Edition, if your site does not currently have Directory Server deployed.

See the chapter titled "Messaging Server Pre-Installation Tasks" in Messaging Server Installation and Configuration Guide for detailed information.
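The file descriptor check in the list above can be sketched as a small shell script. This is an illustration only, assuming a POSIX shell; the function name fd_limit_ok is hypothetical and is not part of Messaging Server. It compares the soft limit of the current shell with the 16384 minimum stated above.

```shell
#!/bin/sh
# Minimum file descriptor limit stated in this chapter.
REQUIRED_FDS=16384

# fd_limit_ok LIMIT  (hypothetical helper)
# Succeeds if LIMIT (a number, or the word "unlimited") meets the minimum.
fd_limit_ok() {
    if [ "$1" = "unlimited" ]; then
        return 0
    fi
    [ "$1" -ge "$REQUIRED_FDS" ]
}

# Report on the soft limit of the current shell.
current="$(ulimit -n)"
if fd_limit_ok "$current"; then
    echo "File descriptor limit is sufficient: $current"
else
    echo "File descriptor limit is too low: $current (need at least $REQUIRED_FDS)"
fi
```

Run the check as the user that will own the Messaging Server or Cassandra processes, because limits are applied per user. On many Linux distributions the persistent limit is set with nofile entries in /etc/security/limits.conf, but confirm the mechanism for your distribution.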

The remaining sections in this chapter describe the pre-installation tasks you must complete on Cassandra nodes.

Installing Java

To install Java, see "Prerequisites" on the Cassandra web site at:

http://cassandra.apache.org/doc/latest/getting_started/installing.html

Note:

The JAVA_HOME/bin directory must be in the PATH environment variable.
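One way to confirm the requirement in the note is sketched below, assuming a POSIX shell. The function name path_contains is hypothetical. It tests whether a directory appears as a complete entry in a colon-separated path list, and then applies that test to JAVA_HOME/bin.

```shell
#!/bin/sh
# path_contains DIR [PATH_VALUE]  (hypothetical helper)
# Succeeds if DIR is a complete entry in PATH_VALUE (default: $PATH).
path_contains() {
    dir="${1%/}"                      # ignore one trailing slash on DIR
    case ":${2:-$PATH}:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -n "${JAVA_HOME:-}" ] && path_contains "$JAVA_HOME/bin"; then
    echo "JAVA_HOME/bin is in PATH"
else
    echo "JAVA_HOME is unset, or JAVA_HOME/bin is not in PATH"
fi
```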

Installing Python

To install Python, see the Python documentation at:

https://docs.python.org/2/installing/

Be sure to use the version of Python that is supported by the version of Cassandra that you are installing.

Installing Apache Cassandra

The tasks to install Apache Cassandra are described in the following sections.

Downloading the Apache Cassandra Software

To download the Cassandra software:

  1. Download the Cassandra software from the Cassandra download site, located at:

    http://cassandra.apache.org/download/

  2. Copy the installer file to your Cassandra message store hosts.

Installing the Apache Cassandra Software

To install Cassandra software:

  1. On each Cassandra node, install the Cassandra software, and verify that Cassandra is running.

    For more information, see the Cassandra installation documentation at:

    http://cassandra.apache.org/doc/latest/getting_started/installing.html#

  2. Ensure that for Oracle Linux 6.x and later, the 32-bit versions of the glibc libraries are installed.

    For more information, see the Cassandra documentation at:

    http://cassandra.apache.org/doc/latest/getting_started/installing.html#

  3. Optionally, install msstatbot, a monitoring solution for Cassandra. For more information, see the Messaging Server System Administrator's Guide.
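To verify that Cassandra is running on each node (step 1), you can inspect the output of the nodetool status command, in which each node line begins with a two-letter status and state code such as UN (Up/Normal). The following sketch, assuming a POSIX shell with awk, counts nodes that are not in the UN state. The function name count_not_up_normal is hypothetical.

```shell
#!/bin/sh
# count_not_up_normal  (hypothetical helper)
# Reads "nodetool status" output on standard input and prints the number
# of node lines whose status/state code is anything other than UN.
count_not_up_normal() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { n++ } END { print n + 0 }'
}

# Run the check only where nodetool is installed.
if command -v nodetool >/dev/null 2>&1; then
    bad="$(nodetool status | count_not_up_normal)"
    echo "Nodes not in the UN (Up/Normal) state: $bad"
fi
```

A result of 0 on a node that has just been installed suggests that the node, and any other nodes it already knows about, are up. It does not replace the verification steps in the Cassandra installation documentation.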

Installing Elasticsearch Cluster

To install the Elasticsearch cluster, see the Elasticsearch documentation at: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html

Setting Up the Cassandra Cluster

To set up the Cassandra cluster, see the Cassandra documentation.

When setting up multiple data centers, Messaging Server recommends configuring four data centers in three clusters, with keyspaces arranged as shown in Table 5-1. This arrangement minimizes the overhead of replicating and repairing Cassandra keyspaces across all data centers.

Table 5-1 Recommended Multiple Data Centers and Clusters Configuration

Data Center Name and Node Types   Keyspaces           Cluster Configuration

DC_MSG, Cassandra nodes           ms_msg              Cluster Content

DC_META, Cassandra nodes          ms_mbox, ms_index   Combined with DC_INDEX into Cluster Metadata

DC_CACHE, Cassandra nodes         ms_cache            Cluster Cache


Cluster settings, such as the cluster name and seed nodes, are defined in the cassandra.yaml file. See the following section for more information.

To support more concurrent index updates, the ratio of DC_META nodes to DC_INDEX nodes should be at least 1 to 2.

Changing Initial Cassandra Settings

On each Cassandra node, optimize the Cassandra installation by following the recommendations in the Cassandra documentation.

Changing Initial Tuning Settings

On each Cassandra node, change the configuration files described in this section so that the node operates correctly in the Cassandra message store deployment.

cassandra.yaml File

Make the changes in this section to the /etc/cassandra/cassandra.yaml file.

For all nodes, to enable separate clusters for better performance, specify cluster_name.

Make the following changes to the num_tokens setting:

  • DC_MSG, DC_META, and DC_CACHE nodes:

    num_tokens: 256
    

To improve performance, locate data on SSD drives:

  • data_file_directories:

    /var/lib/cassandra/data
    
  • commitlog_directory:

    /var/lib/cassandra/commitlog
    
  • saved_caches_directory:

    /var/lib/cassandra/saved_caches
    
  • hints_directory:

    /var/lib/cassandra/hints
    

To support large mailboxes and messages, increase the commit log segment size:

commitlog_segment_size_in_mb: 256

When you specify seed nodes, use two nodes from each data center in the cluster, preferably located on different racks. Each cluster has its own set of seeds, for example:

  • DC_MSG cluster:

    seeds: "192.0.2.12,192.0.2.24"
    
  • DC_META cluster:

    seeds: "192.0.2.1,192.0.2.2,192.0.2.10,192.0.2.3"
    
  • DC_CACHE cluster:

    seeds: "192.0.2.14,192.0.2.7"
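In the cassandra.yaml file, the seeds value is not a top-level setting. It is nested under seed_provider, as a parameter of the seed provider class. The following sketch, assuming a POSIX shell, prints that block for a given seed list and counts the addresses in the list. The function names print_seed_provider and seed_count are hypothetical, and SimpleSeedProvider is the provider in the default Cassandra configuration, so confirm it against your own file.

```shell
#!/bin/sh
# print_seed_provider SEEDS  (hypothetical helper)
# Prints the seed_provider block as it is nested in cassandra.yaml.
print_seed_provider() {
    cat <<EOF
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "$1"
EOF
}

# seed_count SEEDS  (hypothetical helper)
# Prints the number of comma-separated addresses in SEEDS.
seed_count() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Example: the DC_MSG cluster seeds shown above.
print_seed_provider "192.0.2.12,192.0.2.24"
```

Counting the addresses is a quick way to confirm that a cluster made up of two data centers lists four seeds, and a cluster with one data center lists two.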
    

For all nodes, make the following change to improve performance:

memtable_flush_writers: 8

For all nodes, specify listen_address, rpc_address, native_transport_address, and so on, according to your deployment.
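After editing the file, you can confirm that the top-level settings described in this section have the intended values. The following sketch assumes a POSIX shell with sed and handles only simple top-level "key: value" lines, not nested settings such as seeds. The function names yaml_scalar and check_setting are hypothetical.

```shell
#!/bin/sh
# yaml_scalar FILE KEY  (hypothetical helper)
# Prints the value of the first uncommented top-level "KEY: value" line,
# with any trailing comment and trailing whitespace removed.
yaml_scalar() {
    sed -n "s/^$2:[[:space:]]*\([^#]*\).*/\1/p" "$1" \
        | sed 's/[[:space:]]*$//' | head -n 1
}

# check_setting FILE KEY EXPECTED  (hypothetical helper)
# Reports whether KEY has the EXPECTED value; fails if it does not.
check_setting() {
    actual="$(yaml_scalar "$1" "$2")"
    if [ "$actual" = "$3" ]; then
        echo "OK   $2 = $actual"
    else
        echo "DIFF $2 = ${actual:-<unset>} (expected $3)"
        return 1
    fi
}

# Check the values recommended in this section, if the file is readable.
conf=/etc/cassandra/cassandra.yaml
if [ -r "$conf" ]; then
    check_setting "$conf" num_tokens 256
    check_setting "$conf" commitlog_segment_size_in_mb 256
    check_setting "$conf" memtable_flush_writers 8
fi
```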

cassandra-env.sh File

For all nodes, to specify the location of the heap dump, make the following change to the /etc/cassandra/cassandra-env.sh file:

export CASSANDRA_HEAPDUMP_DIR=/scratch/heapdump
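A heap dump can be written only if the directory exists and the user that runs Cassandra can write to it. The following sketch, assuming a POSIX shell, checks both conditions. The function name heapdump_dir_ready is hypothetical; run the check as the user that runs Cassandra at your site.

```shell
#!/bin/sh
# heapdump_dir_ready DIR  (hypothetical helper)
# Succeeds if DIR exists, is a directory, and is writable by the current user.
heapdump_dir_ready() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Use the exported value if present, otherwise the path shown above.
dir="${CASSANDRA_HEAPDUMP_DIR:-/scratch/heapdump}"
if heapdump_dir_ready "$dir"; then
    echo "Heap dump directory is ready: $dir"
else
    echo "Heap dump directory is missing or not writable: $dir"
fi
```

Because heap dumps can be as large as the configured heap, it is also worth confirming that the file system holding this directory has enough free space.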

jvm.options File

For DC_MSG, DC_META, and DC_CACHE nodes, make the following heap size changes:

-Xms32G
-Xmx32G

For DC_META nodes, to improve performance, make the following heap size changes:

-Xms16G
-Xmx16G
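Both examples above set the initial heap size (-Xms) and the maximum heap size (-Xmx) to the same value. The following sketch, assuming a POSIX shell with sed, checks that an edited jvm.options file keeps the two values equal. The function names heap_flag and heap_fixed are hypothetical, and the sketch compares the values as text, so 32G and 32768M are treated as different.

```shell
#!/bin/sh
# heap_flag FILE FLAG  (hypothetical helper; FLAG is Xms or Xmx)
# Prints the size given by the last uncommented -FLAG line in FILE.
heap_flag() {
    sed -n "s/^-$2\([0-9][0-9]*[kKmMgG]\{0,1\}\)[[:space:]]*\$/\1/p" "$1" \
        | tail -n 1
}

# heap_fixed FILE  (hypothetical helper)
# Succeeds if -Xms is set and is written identically to -Xmx.
heap_fixed() {
    min="$(heap_flag "$1" Xms)"
    max="$(heap_flag "$1" Xmx)"
    [ -n "$min" ] && [ "$min" = "$max" ]
}

# Check the file used in this chapter, if it is readable.
opts=/etc/cassandra/jvm.options
if [ -r "$opts" ]; then
    if heap_fixed "$opts"; then
        echo "Heap size is fixed at $(heap_flag "$opts" Xmx)"
    else
        echo "-Xms and -Xmx are missing or differ in $opts"
    fi
fi
```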

For all nodes, make the following changes to improve performance:

-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=12
-XX:ConcGCThreads=12 

For all nodes, print garbage collection measurements, which are useful for monitoring system performance:

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
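The garbage collection logging options above use the Java 8 option syntax. Java 9 and later replaced these options with unified logging (-Xlog), and several of them are no longer accepted, so it is worth confirming the Java version before adding them. The following sketch, assuming a POSIX shell with awk, extracts the major version from a Java version string. The function name java_major is hypothetical.

```shell
#!/bin/sh
# java_major VERSION_STRING  (hypothetical helper)
# Prints the major version: 8 for "1.8.0_202", 11 for "11.0.2".
java_major() {
    case "$1" in
        1.*) v="${1#1.}"; echo "${v%%.*}" ;;      # legacy 1.x numbering
        *)   echo "${1%%[!0-9]*}" ;;              # 9 and later
    esac
}

# Report on the java command found in PATH, if any.
if command -v java >/dev/null 2>&1; then
    ver="$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')"
    major="$(java_major "$ver")"
    if [ -n "$major" ] && [ "$major" -le 8 ]; then
        echo "Java $major: the GC logging options in this section apply"
    else
        echo "Java ${major:-unknown}: check the GC logging options for this version"
    fi
fi
```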

cassandra-rackdc.properties File

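For all nodes, configure the snitch and then make the following changes to the cassandra-rackdc.properties file.

  1. In the cassandra.yaml file, configure the endpoint snitch, which reads the data center and rack names from the cassandra-rackdc.properties file: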

    endpoint_snitch: GossipingPropertyFileSnitch
    
  2. Set the data center and rack names as appropriate:

    dc=mydc
    rack=myrac
    

    For example, for a node in DC_CACHE in a physical rack in one location, set dc=DC_CACHE and rack=RAC1. For another node in DC_CACHE in a physical rack in another location, set dc=DC_CACHE and rack=RAC2.

    Note:

    Data center and rack names are case sensitive.
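Because the names are case sensitive, a node configured with dc=dc_cache would not belong to the same data center as a node configured with dc=DC_CACHE. The following sketch, assuming a POSIX shell with sed, reads the dc and rack values from a properties file so that they can be compared exactly across nodes. The function name rackdc_value is hypothetical, and the file location shown assumes the same /etc/cassandra directory used elsewhere in this chapter.

```shell
#!/bin/sh
# rackdc_value FILE KEY  (hypothetical helper)
# Prints the value of the last uncommented "KEY=value" line in FILE,
# with surrounding whitespace removed. The comparison of names that
# follows must be exact, because dc and rack names are case sensitive.
rackdc_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | sed 's/[[:space:]]*$//' | tail -n 1
}

# Print the names configured on this node, if the file is readable.
props=/etc/cassandra/cassandra-rackdc.properties
if [ -r "$props" ]; then
    echo "dc=$(rackdc_value "$props" dc) rack=$(rackdc_value "$props" rack)"
fi
```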