Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Use OCI Email Delivery to Send Slurm Lifecycle Notifications
Introduction
Simple Linux Utility for Resource Management (Slurm) is an open-source cluster management and job scheduling system. Because jobs are typically submitted to run later and can be long-running, lifecycle notifications for jobs are vital to maximizing your High Performance Computing (HPC) investment. To help with this, Oracle Cloud Infrastructure (OCI) offers the Email Delivery service, a fast, reliable, and cost-effective solution for sending application-generated emails.
Objective
- Use OCI Email Delivery service to send Slurm lifecycle notifications.
Prerequisites
- Access to a Slurm-managed cluster.
- Set up OCI Email Delivery. For more information, see Step-by-step instructions to send email with OCI Email Delivery.
As regular email notifications from Slurm can be brief and limited, I will be leveraging the Slurm-Mail project from @neilmunday, which gives much more information about jobs. It also allows you to customize emails using Slurm-Mail’s pre-built templates.
Task 1: Install Slurm-Mail
Refer to the Slurm-Mail Repository for installation instructions for various platforms. I opted to install from the source (as root).
$ git clone https://github.com/neilmunday/slurm-mail
$ cd slurm-mail
$ sudo pip install pathlib
$ sudo python3 setup.py install
$ sudo cp etc/logrotate.d/slurm-mail /etc/logrotate.d/
$ sudo chown slurm:slurm /etc/logrotate.d/slurm-mail
$ sudo install -d -m 700 -o slurm -g slurm /var/log/slurm-mail
Note: Remember where slurm-spool-mail and slurm-send-mail get installed.
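One way to find those locations is to ask the shell where it resolves each script. The sketch below assumes both scripts ended up on your PATH; locate_tool is a hypothetical helper name, not part of Slurm-Mail. Note that root's PATH under sudo may differ from your own.

```shell
#!/bin/sh
# Print "name=path" for a command found on PATH, or "name=NOT FOUND" otherwise.
# locate_tool is a hypothetical helper used only for this illustration.
locate_tool() {
    tool_path="$(command -v "$1")" || { echo "$1=NOT FOUND"; return 1; }
    echo "$1=$tool_path"
}

# Check both Slurm-Mail scripts; keep going even if one is missing.
for tool in slurm-spool-mail slurm-send-mail; do
    locate_tool "$tool" || true
done
```

Write down the two paths that are printed, since later tasks refer to them.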
Task 2: Configure Slurm-Mail
Edit the following parameters in the configuration file at /etc/slurm-mail/slurm-mail.conf.
Parameter | Value | Description |
---|---|---|
emailFromUserAddress | Your approved sender | Equates to the “From” address in the email |
sacctExe | bin/sacct | Use which sacct to get the full path for your installation |
scontrolExe | bin/scontrol | Use which scontrol to get the full path for your installation |
smtpServer | smtp.email.us-ashburn-1.oci.oraclecloud.com | Your Email Delivery SMTP endpoint. Refer to Configuring SMTP Connection |
smtpPort | 587 | Email Delivery supports TLS on port 587 (recommended) or 25 |
smtpUseTls | yes | TLS must be enforced; otherwise, Email Delivery rejects the connection |
smtpUserName | Your SMTP username | Generated when creating your SMTP Credentials |
smtpPassword | Your SMTP password | Generated when creating your SMTP Credentials |
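After editing, it can help to confirm that none of the parameters in the table were left empty. The following is a minimal sketch that assumes the configuration file uses "key = value" lines; missing_params is a hypothetical helper name, and the check only looks for a non-empty value, not a correct one.

```shell
#!/bin/sh
# Print every parameter from the table that is absent or has an empty value.
# missing_params is a hypothetical helper; it assumes "key = value" (or "key: value") lines.
missing_params() {
    conf_file="$1"
    for key in emailFromUserAddress sacctExe scontrolExe smtpServer \
               smtpPort smtpUseTls smtpUserName smtpPassword; do
        # A parameter counts as set only if something non-blank follows the separator.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*[=:][[:space:]]*[^[:space:]]" "$conf_file"; then
            echo "$key"
        fi
    done
}

# Example usage (run as a user who can read the file):
#   missing_params /etc/slurm-mail/slurm-mail.conf
```

No output means every listed parameter has some value assigned.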
Task 3: Set Permissions
Slurm-Mail works by creating email requests, placing them in the directory specified by the spoolDir parameter, and then having a cron job process the spool files. Therefore, we need to make sure the spoolDir directory exists and that Slurm-Mail can write to it. By default, spoolDir is /var/spool/slurm-mail. Create this directory if it does not already exist.
Slurm-Mail runs as the slurm user. While more secure methods exist, for simplicity, I changed the owner and group of /var/spool/slurm-mail to slurm.
$ chown slurm /var/spool/slurm-mail
$ chgrp slurm /var/spool/slurm-mail
Confirm that you can now write files to spoolDir as the slurm user.
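A simple way to confirm write access is to create and remove a scratch file in the directory while running as slurm. The sketch below assumes the default /var/spool/slurm-mail location; can_write and the file name check-spool.sh are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Hypothetical usage: sudo -u slurm sh check-spool.sh /var/spool/slurm-mail
# can_write succeeds only if a scratch file can be created in the directory.
can_write() {
    dir="$1"
    probe="$(mktemp "$dir/.write-test.XXXXXX" 2>/dev/null)" || return 1
    rm -f "$probe"   # clean up so the spool directory stays empty
}

# Fall back to the default spoolDir when no argument is given.
spool_dir="${1:-/var/spool/slurm-mail}"
if can_write "$spool_dir"; then
    echo "OK: $(id -un) can write to $spool_dir"
else
    echo "FAIL: $(id -un) cannot write to $spool_dir"
fi
```

Running it through sudo -u slurm matters, because the check reports on whichever user executes the script.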
Task 4: Configure cron
The cron job that processes the spool files is provided in the etc/cron.d directory of the cloned slurm-mail repository. By default, it assumes slurm-send-mail lives in /usr/bin. However, for me, slurm-send-mail was installed in /usr/local/bin, so I needed to modify the job. After modifying it, copy it to /etc/cron.d.
$ cp etc/cron.d/slurm-mail /etc/cron.d/
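The path change can also be made with sed instead of a text editor. This is a sketch under the assumption that the shipped cron file contains the literal string /usr/bin/slurm-send-mail; rewrite_cron_path is a hypothetical helper name, so inspect the file before relying on it.

```shell
#!/bin/sh
# Print the cron file with the default slurm-send-mail path replaced by a new one.
# rewrite_cron_path is a hypothetical helper; it assumes the file contains the
# literal text /usr/bin/slurm-send-mail and that the new path has no "|" or "&".
rewrite_cron_path() {
    cron_file="$1"
    new_path="$2"
    sed "s|/usr/bin/slurm-send-mail|${new_path}|g" "$cron_file"
}

# Example usage from the cloned repository (review the output before installing it):
#   rewrite_cron_path etc/cron.d/slurm-mail /usr/local/bin/slurm-send-mail > /tmp/slurm-mail.cron
#   sudo cp /tmp/slurm-mail.cron /etc/cron.d/slurm-mail
```

Writing to a temporary file first leaves the copy in the repository untouched, which makes it easy to compare the two versions.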
Task 5: Configure and Restart Slurm
Slurm has an optional configuration parameter MailProg, which is the fully qualified pathname to the program used to send email. We need to add this parameter and set it to where slurm-spool-mail lives (which is /usr/local/bin for me).
$ echo MailProg=/usr/local/bin/slurm-spool-mail >> /etc/slurm/slurm.conf
Restart Slurm.
$ systemctl restart slurmctld
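Appending with echo adds a second MailProg line if the command is run twice. The sketch below sets the value idempotently instead; set_mailprog is a hypothetical helper name, and the match is case-sensitive, so it assumes the parameter is spelled MailProg in your file.

```shell
#!/bin/sh
# Set MailProg in a Slurm configuration file without creating duplicate lines.
# set_mailprog is a hypothetical helper; it assumes the key is spelled "MailProg".
set_mailprog() {
    slurm_conf="$1"
    prog="$2"
    if grep -Eq '^[[:space:]]*MailProg[[:space:]]*=' "$slurm_conf"; then
        # Replace the existing setting via a temporary copy (avoids sed -i differences).
        scratch="$(mktemp)"
        sed -E "s|^[[:space:]]*MailProg[[:space:]]*=.*|MailProg=${prog}|" "$slurm_conf" > "$scratch" \
            && cat "$scratch" > "$slurm_conf"
        rm -f "$scratch"
    else
        # No existing setting, so append one.
        printf 'MailProg=%s\n' "$prog" >> "$slurm_conf"
    fi
}

# Example usage (as root), followed by a check after the restart:
#   set_mailprog /etc/slurm/slurm.conf /usr/local/bin/slurm-spool-mail
#   scontrol show config | grep -i MailProg
```

After the restart, the scontrol output should list the slurm-spool-mail path if the controller picked up the change.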
Task 6: Create a Test Job
Create a test job with the proper email flags.
#!/bin/bash
#SBATCH --mail-user=<destination email address>
#SBATCH --mail-type=ALL
echo "Hello World!"
Note: --mail-type=ALL will send all event types. If you want only certain event notifications, refer to the sbatch documentation and change this value.
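As an illustration of narrowing the notifications, the sketch below generates a variant of the test job that only mails on END and FAIL events, then shows how it might be submitted. write_test_job, the file name test-job.sh, and the example address are hypothetical.

```shell
#!/bin/sh
# Write a small batch script that requests mail only when the job ends or fails.
# write_test_job is a hypothetical helper used only for this illustration.
write_test_job() {
    job_file="$1"
    mail_to="$2"
    cat > "$job_file" <<EOF
#!/bin/bash
#SBATCH --mail-user=${mail_to}
#SBATCH --mail-type=END,FAIL
echo "Hello World!"
EOF
}

# Example usage on a cluster login node:
#   write_test_job test-job.sh user@example.com
#   sbatch test-job.sh
#   squeue -u "$USER"
```

Once the job finishes, the message should be spooled and then sent on the next run of the cron job.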
Task 7: Troubleshooting
- Check /var/log/slurm/slurmctld.log for any error messages.
- Look at the log files /var/log/slurm-mail/slurm-send-mail.log and /var/log/slurm-mail/slurm-spool-mail.log.
- If emails are still not sending, check the troubleshooting section of the Slurm-Mail GitHub repository.
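If the logs point at a connection problem, it may be worth testing the SMTP endpoint directly. The sketch below reads smtpServer and smtpPort from the configuration file and prints them as host:port for use with openssl; smtp_endpoint is a hypothetical helper name, and it assumes "key = value" lines in the file.

```shell
#!/bin/sh
# Print "host:port" built from smtpServer and smtpPort in a Slurm-Mail config file.
# smtp_endpoint is a hypothetical helper; it assumes "key = value" (or "key: value") lines.
smtp_endpoint() {
    awk -F '[=:]' '
        { key = $1; gsub(/[[:space:]]/, "", key) }
        key == "smtpServer" { host = $2; gsub(/[[:space:]]/, "", host) }
        key == "smtpPort"   { port = $2; gsub(/[[:space:]]/, "", port) }
        END {
            if (host != "" && port != "") print host ":" port
            else exit 1   # signal that one of the two values is missing
        }
    ' "$1"
}

# Example usage: attempt a STARTTLS handshake against the configured endpoint,
# then review the most recent Slurm-Mail log entries.
#   openssl s_client -starttls smtp -crlf -connect "$(smtp_endpoint /etc/slurm-mail/slurm-mail.conf)"
#   tail -n 50 /var/log/slurm-mail/slurm-send-mail.log
```

A completed handshake suggests the network path and TLS are fine, which narrows the problem to credentials or the approved sender address.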
Acknowledgments
- Author - Cody Brinkman (Cloud Architect)
- Contributor - Arun Mahajan (HPC Specialist)
F91076-01
January 2024
Copyright © 2024, Oracle and/or its affiliates.