
Use OCI Email Delivery to Send Slurm Lifecycle Notifications

Introduction

Simple Linux Utility for Resource Management (Slurm) is an open-source cluster management and job scheduling system. Because jobs are typically submitted to run later and can run for a long time, getting lifecycle notifications for jobs is vital to maximizing your High Performance Computing (HPC) investments. To help with this, Oracle Cloud Infrastructure (OCI) offers the Email Delivery service, a fast, reliable, and cost-effective solution for sending application-generated emails.

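Objective

Configure Slurm to send job lifecycle email notifications through OCI Email Delivery by using Slurm-Mail.

Prerequisites

  1. A Slurm cluster on which you have root access.

  2. An OCI Email Delivery approved sender and SMTP credentials.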

Because Slurm's built-in email notifications are brief and contain limited information, I use the Slurm-Mail project from @neilmunday, which includes much more information about each job. Slurm-Mail also lets you customize emails by using its pre-built templates.

Task 1: Install Slurm-Mail

Refer to the Slurm-Mail repository for installation instructions for various platforms. I installed from source, running the following commands as root.

$ git clone https://github.com/neilmunday/slurm-mail
$ cd slurm-mail
$ pip install pathlib
$ python3 setup.py install
$ cp etc/logrotate.d/slurm-mail /etc/logrotate.d/
$ chown slurm:slurm /etc/logrotate.d/slurm-mail
$ install -d -m 700 -o slurm -g slurm /var/log/slurm-mail

Note: Make a note of where slurm-spool-mail and slurm-send-mail are installed. You need these paths in Tasks 4 and 5.
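If you did not note the paths, the following sketch reports them. It defines a hypothetical helper, find_mail_prog, that first searches your PATH with command -v and then falls back to /usr/local/bin and /usr/bin, the two install locations mentioned in this tutorial. Other locations are possible on your system.

```shell
# Hypothetical helper: print the full path of a Slurm-Mail program.
# Searches PATH first, then the two install locations mentioned in this
# tutorial (/usr/local/bin and /usr/bin).
find_mail_prog() {
    local name path dir
    name=$1
    # command -v prints the resolved path when the program is on PATH
    if path=$(command -v "$name" 2>/dev/null); then
        printf '%s\n' "$path"
        return 0
    fi
    # Fall back to the known install directories
    for dir in /usr/local/bin /usr/bin; do
        if [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    echo "$name: not found" >&2
    return 1
}
```

$ find_mail_prog slurm-spool-mail
$ find_mail_prog slurm-send-mail

The fallback directories matter because root's PATH does not always include /usr/local/bin.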

Task 2: Configure Slurm-Mail

Edit the following parameters in the configuration file at /etc/slurm-mail/slurm-mail.conf.

Parameter | Value | Description
--- | --- | ---
emailFromUserAddress | Your approved sender | The "From" address in the email
sacctExe | bin/sacct | Run which sacct to get the full path for your installation
scontrolExe | bin/scontrol | Run which scontrol to get the full path for your installation
smtpServer | smtp.email.us-ashburn-1.oci.oraclecloud.com | Your region's Email Delivery SMTP endpoint (this example is for US East (Ashburn)); refer to Configuring SMTP Connection
smtpPort | 587 | Email Delivery supports TLS on port 587 (recommended) or 25
smtpUseTls | yes | Must be yes; Email Delivery rejects connections that do not use TLS
smtpUserName | Your SMTP username | Generated when you create your SMTP credentials
smtpPassword | Your SMTP password | Generated when you create your SMTP credentials
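Rather than editing each line by hand, you can set the values with a small script. The following is a sketch that assumes slurm-mail.conf uses INI-style "key = value" lines, as the parameter names above suggest. set_conf_value is a hypothetical helper that rewrites an existing, uncommented key and fails if the key is missing. It passes values to awk through the environment so that characters such as backslashes in the SMTP password are written literally.

```shell
# Hypothetical helper: set an existing "key = value" entry in an INI-style
# file such as /etc/slurm-mail/slurm-mail.conf (format is an assumption).
# Usage: set_conf_value FILE KEY VALUE
set_conf_value() {
    local file key value tmp
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Pass key and value via the environment: unlike awk -v, ENVIRON does
    # not process backslash escapes, so passwords are written verbatim.
    if CONF_KEY=$key CONF_VALUE=$value awk '
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # ignore leading whitespace
            eq = index(line, "=")
            if (eq > 0) {
                name = substr(line, 1, eq - 1)
                sub(/[ \t]+$/, "", name)    # trim space before the equals sign
                if (name == ENVIRON["CONF_KEY"]) {
                    print name " = " ENVIRON["CONF_VALUE"]
                    found = 1
                    next
                }
            }
            print
        }
        END { exit (found ? 0 : 1) }' "$file" > "$tmp"; then
        # Copy back with cat so the file keeps its owner and permissions
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "$key not found in $file" >&2
        return 1
    fi
}
```

$ set_conf_value /etc/slurm-mail/slurm-mail.conf sacctExe "$(which sacct)"
$ set_conf_value /etc/slurm-mail/slurm-mail.conf smtpServer smtp.email.us-ashburn-1.oci.oraclecloud.com
$ set_conf_value /etc/slurm-mail/slurm-mail.conf smtpPort 587

Writing the result back with cat, rather than moving the temporary file into place, keeps the original ownership and permissions, which matters because this file holds your SMTP password.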

Task 3: Set Permissions

Slurm-Mail works in two stages: slurm-spool-mail writes each email request to the directory set by the spoolDir parameter, and a cron job running slurm-send-mail then processes the spool files. Therefore, spoolDir must exist and Slurm-Mail must be able to write to it. The default spoolDir is /var/spool/slurm-mail. Create this directory if it does not already exist.

Slurm-Mail runs as the slurm user. While more secure approaches exist, for simplicity I changed the owner and group of /var/spool/slurm-mail to slurm.

$ mkdir -p /var/spool/slurm-mail
$ chown slurm:slurm /var/spool/slurm-mail

Confirm that you can now write files to spoolDir as the slurm user.
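One way to run that check is sketched below. check_dir_writable is a hypothetical helper that uses mktemp to create a temporary file in a directory and then deletes it, succeeding only if both steps work.

```shell
# Hypothetical helper: succeed if the current user can create a file in DIR.
# Usage: check_dir_writable DIR
check_dir_writable() {
    local dir probe
    dir=$1
    # mktemp fails if DIR is missing or the user cannot write to it
    probe=$(mktemp "$dir/.write-test.XXXXXX" 2>/dev/null) || return 1
    # Remove the probe file so the check leaves nothing behind
    rm -f "$probe"
}
```

To run the check as the slurm user, pass the function definition to a bash started through sudo:

$ sudo -u slurm bash -c "$(declare -f check_dir_writable); check_dir_writable /var/spool/slurm-mail" && echo writable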

Task 4: Configure cron

The cron job that processes the spool files is defined in etc/cron.d/slurm-mail in the cloned slurm-mail repository. By default, it expects slurm-send-mail to be in /usr/bin. On my system, however, slurm-send-mail was installed in /usr/local/bin, so I modified the job accordingly. After making any needed changes, copy the file to /etc/cron.d.
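The path change can be scripted. The sketch below assumes that the cron file refers to the default /usr/bin/slurm-send-mail path, as described above. set_cron_send_mail_path is a hypothetical helper that replaces that path with the one you noted in Task 1, using GNU sed's -i option to edit the file in place.

```shell
# Hypothetical helper: point the Slurm-Mail cron entry at the installed
# slurm-send-mail. Assumes the file uses the default /usr/bin path.
# Usage: set_cron_send_mail_path CRON_FILE NEW_PATH
set_cron_send_mail_path() {
    local file new_path escaped
    file=$1
    new_path=$2
    # Escape characters that are special in a sed replacement (#, &, \)
    escaped=$(printf '%s' "$new_path" | sed 's/[#&\\]/\\&/g')
    # GNU sed -i edits the file in place; running it twice changes nothing
    sed -i "s#/usr/bin/slurm-send-mail#$escaped#g" "$file"
}
```

$ set_cron_send_mail_path etc/cron.d/slurm-mail /usr/local/bin/slurm-send-mail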

$ cp etc/cron.d/slurm-mail /etc/cron.d/

Task 5: Configure and Restart Slurm

Slurm has an optional configuration parameter, MailProg, which is the fully qualified path name of the program used to send email. Add this parameter to slurm.conf and set it to the location of slurm-spool-mail (/usr/local/bin/slurm-spool-mail in my case).

$ echo MailProg=/usr/local/bin/slurm-spool-mail >> /etc/slurm/slurm.conf
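Appending with echo adds a second MailProg line if you run the command again or if slurm.conf already sets MailProg. The sketch below is an idempotent alternative. set_mailprog is a hypothetical helper that removes any existing uncommented MailProg line, matched case-insensitively because slurm.conf parameter names are case insensitive, and then appends the new value.

```shell
# Hypothetical helper: set MailProg in slurm.conf exactly once.
# Usage: set_mailprog SLURM_CONF PROGRAM_PATH
set_mailprog() {
    local conf prog tmp
    conf=$1
    prog=$2
    tmp=$(mktemp) || return 1
    # Keep every line except existing MailProg settings; grep exits 1 when
    # it selects no lines, which is not an error here.
    grep -v -i '^[[:space:]]*MailProg[[:space:]]*=' "$conf" > "$tmp" ||
        [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }
    printf 'MailProg=%s\n' "$prog" >> "$tmp"
    # Write back with cat so slurm.conf keeps its owner and permissions
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

$ set_mailprog /etc/slurm/slurm.conf /usr/local/bin/slurm-spool-mail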

Restart the Slurm controller so that it reads the updated configuration.

$ systemctl restart slurmctld
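To confirm that slurmctld picked up the new setting, you can inspect the running configuration with scontrol show config, which prints one "Parameter = value" line per setting. The sketch below extracts the MailProg value from that output. mailprog_from_config is a hypothetical helper name.

```shell
# Hypothetical helper: read `scontrol show config` output on stdin and
# print the MailProg value. Returns 1 if no MailProg line is found.
mailprog_from_config() {
    awk '
        {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            gsub(/[ \t]+/, "", name)             # strip padding around the name
            if (name == "MailProg") {
                value = substr($0, eq + 1)
                sub(/^[ \t]+/, "", value)        # strip padding before the value
                print value
                found = 1
            }
        }
        END { exit (found ? 0 : 1) }'
}
```

$ scontrol show config | mailprog_from_config

The output should be the slurm-spool-mail path you set in slurm.conf.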

Task 6: Create a Test Job

Create a test job script that sets the email options, and submit it with sbatch.

#!/bin/bash
#SBATCH --mail-user=<destination email address>
#SBATCH --mail-type=ALL
echo "Hello World!"

Note: --mail-type=ALL requests notifications for the common job events (BEGIN, END, FAIL, REQUEUE, INVALID_DEPEND, and STAGE_OUT). If you want only certain notifications, such as END or FAIL, refer to the sbatch documentation and change this value.
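If you test several notification settings, a small generator keeps the job script consistent. The sketch below is hypothetical (write_mail_test_job is not part of Slurm or Slurm-Mail). It writes the test script shown above for a given address and comma-separated list of mail types, and rejects type names that are not in the list documented for sbatch --mail-type at the time of writing. Newer Slurm releases may accept additional types, so adjust the case patterns for your version.

```shell
# Hypothetical helper: write a batch script that requests job email.
# Usage: write_mail_test_job FILE EMAIL [MAIL_TYPES]   (default: ALL)
write_mail_test_job() {
    local file email types t
    file=$1
    email=$2
    types=${3:-ALL}
    # Validate each comma-separated type before writing anything
    for t in $(printf '%s' "$types" | tr ',' ' '); do
        case $t in
            NONE|BEGIN|END|FAIL|REQUEUE|ALL|INVALID_DEPEND|STAGE_OUT) ;;
            TIME_LIMIT|TIME_LIMIT_90|TIME_LIMIT_80|TIME_LIMIT_50|ARRAY_TASKS) ;;
            *) echo "unknown mail type: $t" >&2; return 1 ;;
        esac
    done
    cat > "$file" <<EOF
#!/bin/bash
#SBATCH --mail-user=$email
#SBATCH --mail-type=$types
echo "Hello World!"
EOF
}
```

$ write_mail_test_job mail-test.sh you@example.com END,FAIL
$ sbatch mail-test.sh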

Task 7: Troubleshooting

  1. Check /var/log/slurm/slurmctld.log for any error messages.

  2. Look at the log files /var/log/slurm-mail/slurm-send-mail.log and /var/log/slurm-mail/slurm-spool-mail.log.

  3. If emails are still not being sent, see the troubleshooting section of the Slurm-Mail GitHub repository.
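The log checks above can be bundled into one quick status report. The sketch below is hypothetical (slurm_mail_status is not a Slurm-Mail command). It counts the files waiting in the spool directory and prints the last few lines containing "error" from each log you pass. It assumes that slurm-send-mail removes spool files once it has processed them, so a growing count suggests that the cron job is not running or is failing.

```shell
# Hypothetical helper: quick Slurm-Mail health report.
# Usage: slurm_mail_status SPOOL_DIR LOG_FILE...
slurm_mail_status() {
    local spool pending log
    spool=$1
    shift
    # Regular files in the spool directory are requests not yet sent
    # (assumes slurm-send-mail deletes spool files after processing them)
    pending=$(find "$spool" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "pending spool files: $pending"
    for log in "$@"; do
        if [ ! -r "$log" ]; then
            echo "$log: cannot read"
            continue
        fi
        # Last five lines mentioning an error, prefixed with the log name
        grep -i 'error' "$log" | tail -n 5 | while IFS= read -r line; do
            printf '%s: %s\n' "$log" "$line"
        done
    done
}
```

$ slurm_mail_status /var/spool/slurm-mail /var/log/slurm/slurmctld.log /var/log/slurm-mail/slurm-send-mail.log /var/log/slurm-mail/slurm-spool-mail.log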

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.