5 Cassandra Message Store Pre-Installation Tasks

This chapter provides information on the pre-installation tasks you must complete on Cassandra nodes before you can install Messaging Server software.

Summary of General Pre-Installation Tasks

The following list summarizes the general pre-installation tasks you must complete before installing any Messaging Server component.

  • Create a UNIX system user and group for Messaging Server, and set permissions for the directories and files owned by that user.

  • Check that DNS is running and configured properly for the Messaging Server host.

  • On Linux, check the maximum number of file descriptors. If it is less than 16384, increase the value.

  • Install Oracle Directory Server Enterprise Edition, if your site does not currently have Directory Server deployed.

See the chapter titled "Messaging Server Pre-Installation Tasks" in Messaging Server Installation and Configuration Guide for detailed information.
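
The file descriptor check in the list above can be scripted. The following is a minimal sketch, assuming a Bourne-compatible shell. The function name fd_limit_ok is hypothetical and is not part of Messaging Server.

```shell
#!/bin/sh
# Minimum number of file descriptors named in the task list above.
REQUIRED_FDS=16384

# fd_limit_ok LIMIT: succeed if LIMIT is "unlimited" or at least REQUIRED_FDS.
# (Hypothetical helper name, used here only for illustration.)
fd_limit_ok() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$REQUIRED_FDS" ]
}

# ulimit -n reports the soft limit for the current shell.
current=$(ulimit -n)
if fd_limit_ok "$current"; then
    echo "File descriptor limit is sufficient: $current"
else
    echo "File descriptor limit is $current; increase it to at least $REQUIRED_FDS"
fi
```

Run the check as the Messaging Server user, because limits can differ between users. On many Linux distributions the persistent limit is raised through nofile entries in /etc/security/limits.conf; see the referenced guide for the supported procedure.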

The remaining sections in this chapter describe the pre-installation tasks you must complete on Cassandra nodes.

Installing Java

To install Java, see "Installing Oracle JDK on RHEL-based Systems" on the DataStax web site at:

http://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/install/installJdkRHEL.html

Note:

The JAVA_HOME/bin directory must be in the PATH environment variable.
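
To confirm the PATH requirement in the note above, you can test whether JAVA_HOME/bin is one of the PATH entries. This is a minimal sketch; the function name path_contains is hypothetical.

```shell
#!/bin/sh
# path_contains DIR: succeed if DIR is one of the colon-separated PATH entries.
# (Hypothetical helper name, used here only for illustration.)
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
elif path_contains "$JAVA_HOME/bin"; then
    echo "$JAVA_HOME/bin is in PATH"
else
    echo "Add it, for example: export PATH=\"\$JAVA_HOME/bin:\$PATH\""
fi
```

The match is on whole PATH entries, so a directory such as /opt/jdk does not count as a match for /opt/jdk/bin.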

Installing Python

To install Python, see the Python documentation at:

https://docs.python.org/2/installing/

Be sure to use the version of Python that is supported by the version of DataStax Enterprise Max that you are installing.

Installing DataStax Enterprise

To install DataStax Enterprise, download the software and then install it, as described in the following sections.

Downloading the DataStax Enterprise Software

To download the DataStax Enterprise Max software:

  1. Register with DataStax and download the DSE software from the DataStax download site, located at:

    https://academy.datastax.com/downloads

  2. Copy the installer file to your Cassandra message store hosts.

Installing the DataStax Enterprise Software

To install DataStax Enterprise software:

  1. On each Cassandra/Solr node, configure a datastax.repo file, install the DataStax Enterprise packages, start the DSE software, and verify that DSE is running.

    For more information, see the DataStax Enterprise installation documentation at:

    http://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/install/installTOC.html

  2. On all Solr nodes, enable Solr by setting the following option in the /etc/default/dse file:

    SOLR_ENABLED=1
    
  3. On Oracle Linux 6.x and later, ensure that the 32-bit versions of the glibc libraries are installed.

    For more information, see the DataStax Enterprise documentation at:

    https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/install/installDseInstallGlibc.html

  4. Optionally, install OpsCenter, a visual management and monitoring solution for DataStax Enterprise. For more information, see the OpsCenter installation documentation at:

    https://docs.datastax.com/en/latest-opscenter/opsc/online_help/opscOverview_c.html

Setting Up the Cassandra Cluster

To set up the Cassandra cluster, see the DataStax Enterprise documentation.

When setting up multiple data centers, the Messaging Server recommendation, which minimizes the overhead in replicating and repairing DataStax keyspaces across all data centers, is to configure four data centers in three clusters with keyspaces arranged as shown in Table 5-1.

Table 5-1 Recommended Multiple Data Centers and Clusters Configuration

Data Center Name and Node Types   Keyspaces           Cluster Configuration
-------------------------------   -----------------   -------------------------------------------
DC_MSG, Cassandra nodes           ms_msg              Cluster Content
DC_META, Cassandra nodes          ms_mbox, ms_index   Combined with DC_INDEX into Cluster Metadata
DC_INDEX, Cassandra/Solr nodes    ms_index            Combined with DC_META into Cluster Metadata
DC_CACHE, Cassandra nodes         ms_cache            Cluster Cache


Cluster settings, such as the cluster name and seed nodes, are defined in the cassandra.yaml file. See the following section for more information.

To support more concurrent index updates, the ratio of DC_META nodes to DC_INDEX nodes should be at least 1 to 2.

Changing Initial Cassandra Settings

On each Cassandra node, optimize the DataStax Enterprise installation by following the recommendations in the DataStax documentation at:

https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html

Changing Initial Tuning Settings

On each Cassandra node, change the configuration files described in this section so that the node operates correctly in the Cassandra message store deployment.

Linux Tuning Settings

To optimize Cassandra on Linux, see the DataStax recommendations at:

https://docs.datastax.com/en/landing_page/doc/landing_page/recommendedSettingsLinux.html

dse File

For DC_INDEX nodes, which run Solr, make the following change to the /etc/default/dse file:

SOLR_ENABLED=1
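
The edit above can be applied from a script. The following is a minimal sketch that updates an existing SOLR_ENABLED line or appends one if the option is missing. The function name enable_solr is hypothetical, and the example runs against a scratch file so that it can be tried without root access.

```shell
#!/bin/sh
# enable_solr FILE: make sure FILE contains the line SOLR_ENABLED=1.
# (Hypothetical helper name, used here only for illustration.)
enable_solr() {
    if grep -q '^SOLR_ENABLED=' "$1"; then
        # Rewrite the existing line; cat back into place keeps the
        # original file's ownership and permissions.
        tmp=$(mktemp)
        sed 's/^SOLR_ENABLED=.*/SOLR_ENABLED=1/' "$1" > "$tmp"
        cat "$tmp" > "$1"
        rm -f "$tmp"
    else
        echo 'SOLR_ENABLED=1' >> "$1"
    fi
}

# Demonstrate on a scratch file. On a DC_INDEX node, pass /etc/default/dse
# instead and run as a user who can write to that file.
demo=$(mktemp)
printf 'SOLR_ENABLED=0\n' > "$demo"
enable_solr "$demo"
cat "$demo"
rm -f "$demo"
```

The change takes effect the next time the DSE software is started on the node.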

dse.yaml File

For DC_INDEX (Solr) nodes, make the following changes to the /etc/dse/cassandra/dse.yaml file to improve performance:

max_solr_concurrency_per_core: 6
back_pressure_threshold_per_core: 5000
cql_slow_log_option:
     enabled: false

cassandra.yaml File

Make the changes in this section to the /etc/dse/cassandra/cassandra.yaml file.

For all nodes, to enable separate clusters for better performance, specify cluster_name.

Make the following changes to the num_tokens setting:

  • DC_MSG, DC_META, and DC_CACHE nodes:

    num_tokens: 256
    
  • DC_INDEX (Solr) nodes:

    num_tokens: 16
    allocate_tokens_for_local_replication_factor: replication_factor
    

On DC_INDEX (Solr) nodes, make the following change to the allocate_tokens_for_local_replication_factor setting:

allocate_tokens_for_local_replication_factor: replication_factor

where replication_factor is derived from the store.cassolrrf configuration option, and by default has a value of 2.

Note:

This recommendation is for the DataStax Enterprise 5.1.0 release.

To improve performance, locate data on SSD drives:

  • data_file_directories:

    /var/lib/cassandra/data
    
  • commitlog_directory:

    /var/lib/cassandra/commitlog
    
  • saved_caches_directory:

    /var/lib/cassandra/saved_caches
    
  • hints_directory:

    /var/lib/cassandra/hints
    

To support large mailboxes and messages, increase the commit log segment size:

commitlog_segment_size_in_mb: 256

Specify seed nodes separately for each cluster, so that each cluster has different seeds. Use two nodes from each data center in the cluster, preferably located on different racks. For example:

  • DC_MSG cluster:

    seeds: "192.0.2.12,192.0.2.24"
    
  • DC_META/DC_INDEX cluster:

    seeds: "192.0.2.1,192.0.2.2,192.0.2.10,192.0.2.3"
    
  • DC_CACHE cluster:

    seeds: "192.0.2.14,192.0.2.7"
    

For DC_INDEX nodes, make the following change to improve performance:

memtable_heap_space_in_mb: 2048

For all nodes, make the following change to improve performance:

memtable_flush_writers: 8

For all nodes, set listen_address and rpc_address to the IP address of the node, for example:

listen_address: 10.128.128.12
rpc_address: 10.128.128.12
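
After you edit the cassandra.yaml file, it can help to print the settings discussed in this section and compare them across nodes. The following is a minimal sketch. The function name yaml_value is hypothetical, and the function is a plain text match rather than a YAML parser, so it reports only simple top-level settings.

```shell
#!/bin/sh
# yaml_value FILE KEY: print the value of a top-level "KEY: value" line.
# Nested settings, such as the seeds list, are not handled.
# (Hypothetical helper name, used here only for illustration.)
yaml_value() {
    sed -n "s/^$2:[[:space:]]*//p" "$1"
}

# Path used in this section; pass another file as the first argument if needed.
conf=${1:-/etc/dse/cassandra/cassandra.yaml}

if [ -r "$conf" ]; then
    for key in cluster_name num_tokens commitlog_segment_size_in_mb \
               memtable_flush_writers listen_address rpc_address; do
        echo "$key = $(yaml_value "$conf" "$key")"
    done
else
    echo "Cannot read $conf"
fi
```

A setting that is commented out or absent prints an empty value, which indicates that the node is using the default for that setting.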

cassandra-env.sh File

For all nodes, to specify the location of the heap dump, make the following change to the /etc/dse/cassandra/cassandra-env.sh file:

export CASSANDRA_HEAPDUMP_DIR=/scratch/heapdump

jvm.options File

For DC_MSG, DC_INDEX, and DC_CACHE nodes, make the following heap size changes:

-Xms32G
-Xmx32G

For DC_META nodes, make the following heap size changes to improve performance:

-Xms16G
-Xmx16G

For all nodes, make the following changes to improve performance:

-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=12
-XX:ConcGCThreads=12 

For all nodes, print garbage collection measurements, which are useful for monitoring system performance:

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
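
Both heap size examples above set -Xms and -Xmx to the same value. The following is a minimal sketch that checks a jvm.options file for that property. The function name heap_flags_match is hypothetical, and the default path /etc/dse/cassandra/jvm.options is an assumption that is not stated in this chapter; adjust it for your installation.

```shell
#!/bin/sh
# heap_flags_match FILE: succeed if FILE sets -Xms and -Xmx to the same size.
# Commented-out lines are ignored because they do not start with "-X".
# (Hypothetical helper name, used here only for illustration.)
heap_flags_match() {
    xms=$(sed -n 's/^-Xms//p' "$1" | tail -n 1)
    xmx=$(sed -n 's/^-Xmx//p' "$1" | tail -n 1)
    [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}

# Assumed location of the file; pass another path as the first argument.
opts=${1:-/etc/dse/cassandra/jvm.options}

if [ ! -r "$opts" ]; then
    echo "Cannot read $opts"
elif heap_flags_match "$opts"; then
    echo "Initial and maximum heap sizes match"
else
    echo "-Xms and -Xmx are missing or differ in $opts"
fi
```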

cassandra-rackdc.properties File

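For all nodes, make the following changes so that Cassandra reads the data center and rack names from the cassandra-rackdc.properties file.

  1. Configure the endpoint snitch. Note that endpoint_snitch is a setting in the cassandra.yaml file, not in the cassandra-rackdc.properties file: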

    endpoint_snitch: GossipingPropertyFileSnitch
    
  2. Set the data center and rack names as appropriate:

    dc=mydc
    rack=myrac
    

    For example, for a node in DC_CACHE in a physical rack in one location, set dc=DC_CACHE and rack=RAC1. For another node in DC_CACHE in a physical rack in a different location, set dc=DC_CACHE and rack=RAC2.

    Note:

    Data center and rack names are case sensitive.
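
Because the names are case sensitive, it is worth printing the values on each node and comparing them exactly. The following is a minimal sketch. The function name prop_value is hypothetical, and the default path /etc/dse/cassandra/cassandra-rackdc.properties is an assumption that is not stated in this chapter.

```shell
#!/bin/sh
# prop_value FILE KEY: print the value of a "KEY=value" line in FILE.
# (Hypothetical helper name, used here only for illustration.)
prop_value() {
    sed -n "s/^$2=//p" "$1"
}

# Assumed location of the file; pass another path as the first argument.
props=${1:-/etc/dse/cassandra/cassandra-rackdc.properties}

if [ -r "$props" ]; then
    echo "dc   = $(prop_value "$props" dc)"
    echo "rack = $(prop_value "$props" rack)"
else
    echo "Cannot read $props"
fi
```

After DSE is running, the nodetool status command lists the nodes under each data center together with their rack, which gives a second way to confirm the names.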

Changing Initial Solr Tunings

On each Cassandra/Solr node, you might need to make changes to the solrconfig.xml file, which is the configuration file with the most tuning parameters affecting Solr itself. For more information about Solr tuning parameters, see the DataStax documentation at:

http://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/search/performanceTuningTOC.html

Note:

DSE Search has a live indexing feature to increase indexing throughput, which is turned off by default. Enabling this feature causes sporadic search failures under load. This is a known DSE bug (DSP-12600) as of DSE 5.0.4.

For more information about the live indexing feature, see the DataStax documentation at:

http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/tuningIndexing.html

To make changes to the solrconfig.xml file:

  1. Use the dsetool read_resource keyspace.table name=resfilename command to read the XML file.

    where:

    • keyspace.table names the keyspace and table of the Solr core. The keyspace is ks-preindex, where ks-pre is the prefix configured by the store.caskeyspaceprefix option (the default is ms_), for example, ms_index.msgindex

    • resfilename is solrconfig.xml

  2. Edit the XML file.

  3. Use the dsetool write_resource keyspace.table name=resfilename file=path_to_file_to_upload command to write changes to the XML file.

    where:

    • file=path_to_file_to_upload is the name and path of the resource file to upload

  4. Use the dsetool reload_core keyspace.table command to reload the Solr core.

Example:

dsetool read_resource ms_index.msgindex name=solrconfig.xml > /tmp/solrconfig.xml
vi /tmp/solrconfig.xml
##### Make required edits to the file #####
dsetool write_resource ms_index.msgindex name=solrconfig.xml file=/tmp/solrconfig.xml
dsetool reload_core ms_index.msgindex
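
The example above can be wrapped in a script that keeps a copy of the original resource and uploads the file only if it was changed. This is a minimal sketch that uses only the dsetool commands shown above. The function name resource_changed and the .orig file name are hypothetical.

```shell
#!/bin/sh
# resource_changed ORIGINAL EDITED: succeed if the two files differ.
# (Hypothetical helper name, used here only for illustration.)
resource_changed() {
    ! cmp -s "$1" "$2"
}

core=ms_index.msgindex      # keyspace.table from the example above
work=/tmp/solrconfig.xml    # working copy to edit

if command -v dsetool >/dev/null 2>&1; then
    dsetool read_resource "$core" name=solrconfig.xml > "$work"
    cp "$work" "$work.orig"             # keep the original for comparison
    ${EDITOR:-vi} "$work"               # make the required edits
    if resource_changed "$work.orig" "$work"; then
        dsetool write_resource "$core" name=solrconfig.xml file="$work"
        dsetool reload_core "$core"
    else
        echo "No changes made; nothing to upload"
    fi
else
    echo "dsetool not found; run this script on a DSE node"
fi
```

Keeping the .orig copy also gives you a file to upload with write_resource if you need to return to the previous configuration.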