Sun Java System Messaging Server 6.3 Administration Guide

14.4 Using SpamAssassin


14.4.1 SpamAssassin Overview

Messaging Server supports the use of SpamAssassin, a free, open source mail filter used to identify spam. SpamAssassin consists of a library written in Perl and a set of applications and utilities that can be used to integrate SpamAssassin into messaging systems.

SpamAssassin calculates a score for every message by performing a series of tests on the message header and body information. Each test either matches or does not, and each matching test contributes its own score to the message total. Scores are real numbers that can be positive or negative. A message whose total score meets or exceeds a specified threshold, typically 5.0, is considered spam, and a verdict of true (spam) or false (not spam) is rendered accordingly. An example of a SpamAssassin result string is:

True ; 18.3 / 5.0

True indicates the message is spam. 18.3 is the SpamAssassin score. 5.0 is the threshold.
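
The result string format shown above can be taken apart mechanically. The following is a minimal Python sketch, not part of Messaging Server; the names SpamResult and parse_result are hypothetical and chosen only for illustration. It assumes the layout "verdict ; score / threshold" shown in this section and ignores anything that follows the threshold.

```python
import re
from typing import NamedTuple


class SpamResult(NamedTuple):
    """Hypothetical container for the three parts of a result string."""
    is_spam: bool
    score: float
    threshold: float


# Matches "True ; 18.3 / 5.0" with flexible spacing; scores may be negative.
_RESULT_RE = re.compile(
    r"^\s*(True|False)\s*;\s*(-?\d+(?:\.\d+)?)\s*/\s*(-?\d+(?:\.\d+)?)"
)


def parse_result(text: str) -> SpamResult:
    """Split a SpamAssassin result string into verdict, score, and threshold."""
    match = _RESULT_RE.match(text)
    if match is None:
        raise ValueError(f"not a SpamAssassin result string: {text!r}")
    verdict, score, threshold = match.groups()
    return SpamResult(verdict == "True", float(score), float(threshold))


print(parse_result("True ; 18.3 / 5.0"))
# prints: SpamResult(is_spam=True, score=18.3, threshold=5.0)
```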

SpamAssassin is highly configurable. Tests may be added or removed at any time, and the scores of existing tests can be adjusted. This is all done through various configuration files. Further information on SpamAssassin can be found on the SpamAssassin web site.

The same mechanism used for calling out to the Brightmail spam and virus scanning library can be used to connect to the SpamAssassin spamd server. The module provided in Messaging Server is called libspamass.so.

14.4.2 SpamAssassin/Messaging Server Theory of Operations

spamd is the daemon version of SpamAssassin and can be invoked from the MTA. spamd listens on a socket for requests and spawns a child process to test the message. The child process dies after processing the message and sending back a result. In theory, the forking should be an efficient process because the code itself is shared among the child processes.

The client portion, spamc from the SpamAssassin installation, is not used. Instead, its function is done by a shared library called libspamass.so, which is part of the Messaging Server. libspamass.so is loaded the same way that the Brightmail SDK is loaded.

From the MTA’s point of view, you can almost transparently switch between SpamAssassin and Brightmail for spam filtering. It’s not completely transparent, because they do not have the same functions. For example, Brightmail can also filter for viruses, but SpamAssassin is only used to filter for spam. The result, or verdict, returned by the two software packages is also different. SpamAssassin provides a score, while Brightmail provides just the verdict name, so the configuration would also have some differences.

When using SpamAssassin with the MTA, only a score and a verdict are returned from SpamAssassin. The message itself is not modified. That is, options such as adding headers and modifying subject lines must be done by Sieve scripts. In addition, the mode option allows you to specify the string that is returned to indicate the verdict. The string choices are null, default, SpamAssassin result string, or a verdict string. See 14.4.7 SpamAssassin Options for details.

14.4.3 SpamAssassin Requirements and Usage Considerations

SpamAssassin is free. Go to http://www.spamassassin.org for software and documentation.
SpamAssassin can be tuned and configured to provide very accurate detection of spam. The tuning is up to you and the SpamAssassin community. Messaging Server does not provide or enhance what SpamAssassin can do.
While no specific numbers are available, SpamAssassin seems to reduce throughput more than Brightmail.
SpamAssassin integrated with the MTA can be enabled for a user, a domain, or a channel.
SpamAssassin can be configured to use other online databases such as Vipul's Razor or Distributed Checksum Clearinghouse (DCC).
Messaging Server does not supply a Secure Sockets Layer (SSL) version of libspamass.so; however, it is possible to build SpamAssassin to use OpenSSL.
Perl 5.6 or later is required.

14.4.3.1 Where Should You Run SpamAssassin?

SpamAssassin can run on a separate system of its own, on the same system as the Messaging Server in a single system deployment, or on the same system as the MTA in a two-tier deployment. If Local Mail Transfer Protocol (LMTP) is used between the MTA and the message store, the filtering must be invoked from the MTA. It cannot be invoked from the message store. When SMTP is used between the MTA and the message store, it can be invoked from either one, and it can run on either system or a separate third system.

If you want to use a farm of servers running SpamAssassin, you would have to use a load balancer in front of them. The MTA is configured with only one address for the SpamAssassin server.

14.4.4 Deploying SpamAssassin

Perform the following steps to deploy SpamAssassin:

Install and configure SpamAssassin. Refer to the SpamAssassin software documentation for installation and configuration information. See also 14.4.7 SpamAssassin Options.
Load and configure the SpamAssassin client library. This involves specifying the client library, libspamass.so, and a configuration file to the MTA (you must create this file). See 14.2.1 Loading and Configuring the Spam Filtering Software Client Library.
Specify what messages to filter for spam. Messages can be filtered by user, domain, or channel. See 14.2.2 Specifying the Messages to Be Filtered.
Specify what actions to take on spam messages. Spam can be discarded, filed into a folder, tagged on the subject line, and so on. See 14.2.3 Specifying Actions to Perform on Spam Messages.
Set miscellaneous filter configuration parameters as desired. See Table 14–1.

14.4.5 SpamAssassin Configuration Examples

This section describes some common SpamAssassin configuration examples:

Note –

These examples use a number of options and keywords. Refer to 12.12.5 Spam Filter Keywords and Table 14–1.

To File Spam to a Separate Folder

This example tests messages arriving at the local message store and files spam into a folder called spam. The first three steps can be done in any order.

Create the SpamAssassin configuration file.

The name and location of this file is specified in Step 2. A good name is spamassassin.opt. This file contains the following lines:
host=127.0.0.1
port=2000
mode=0
verdict=spam
debug=1
host and port specify the name of the system where spamd is running and the port on which spamd listens for incoming requests. mode=0 specifies that a string, specified by verdict, is returned if the message is perceived as spam. debug=1 turns on debugging in the SpamAssassin library. See Table 14–3.

Add the following lines to the option.dat file:

! for Spamassassin
spamfilter1_config_file=/opt/SUNWmsgsr/config/spamassassin.opt
spamfilter1_library=/opt/SUNWmsgsr/lib/libspamass.so
spamfilter1_optional=1
spamfilter1_string_action=data:,require "fileinto"; fileinto "$U";

spamfilter1_config_file specifies the SpamAssassin configuration file.

spamfilter1_library specifies the SpamAssassin shared library.

spamfilter1_optional=1 specifies that the MTA continue operation if there is a failure by spamd.

spamfilter1_string_action specifies the Sieve action to take for spam messages.

In this example, spamfilter1_string_action is not necessary because the default value already is data:,require "fileinto"; fileinto "$U";. This line specifies that spam messages are sent to a folder. The name of the folder is the spam verdict value returned by SpamAssassin. The value returned by SpamAssassin is specified by the verdict option in spamassassin.opt. (See Step 1.) In this case, the folder name is spam.

Specify the messages to be filtered.

To filter all messages coming into the local message store, change the imta.cnf file by adding the destinationspamfilterXoptin spam keywords on the ims-ms channel:

!
! ims-ms
ims-ms defragment subdirs 20 notices 1 7 14 21 28 backoff "pt5m" "pt10m" 
"pt30m" "pt1h"  "pt2h" "pt4h" maxjobs 4 pool IMS_POOL fileinto
$U+$S@$D destinationspamfilter1optin spam
ims-ms-daemon

Recompile the configuration and restart the server. Only the MTA needs to be restarted. You do not need to execute stop-msg.
# imsimta cnbuild
# imsimta restart

Start the spamd daemon. This is normally done with a command of the form:

spamd -d

spamd defaults to only accepting connections from the local system. If SpamAssassin and Messaging Server are running on different systems, this syntax is required:

spamd -d -i listen_ip_address -A allowed_hosts

where listen_ip_address is the address on which to listen and allowed_hosts is a list of authorized hosts or networks (using IP addresses) which can connect to this spamd instance.

Note –
0.0.0.0 can be used with -i listen_ip_address to have spamd listen on all addresses. Listening on all addresses is preferable because it avoids having to change command scripts when changing a system’s IP address.

To Add a Header Containing SpamAssassin Score to Spam Messages

This example adds the header Spam-test: result string to messages determined to be spam by SpamAssassin. An example header might be:

Spam-test: True ; 7.3 / 5.0

where Spam-test: is a literal and everything after that is the result string. True means that it is spam (False would be not spam). 7.3 is the SpamAssassin score. 5.0 is the threshold. This result is useful for setting up a Sieve filter that can file or discard mail whose score is above a certain value or within a certain range.

In addition, setting USE_CHECK to 0 returns the list of SpamAssassin tests that matched along with the verdict string. See USE_CHECK in Table 14–3.

Specify the messages to be filtered. This is described in Step 3 in To File Spam to a Separate Folder.

Create the SpamAssassin configuration file.

The name and location of this file are specified with spamfilterX_config_file (see next step). It consists of the following lines:
host=127.0.0.1
port=2000
mode=1
field=
debug=1
host and port specify the name of the system where spamd is running and the port on which spamd listens for incoming requests. mode=1 specifies that the SpamAssassin result string is returned if the message is found to be spam. field= specifies a string prefix for the SpamAssassin result string. In this example, a prefix is not desired because we are specifying it in the Sieve script. debug=1 turns on debugging in the SpamAssassin library.

Add the following lines to the option.dat file:

!for Spamassassin
spamfilter1_config_file=/opt/SUNWmsgsr/config/spamassassin.opt
spamfilter1_library=/opt/SUNWmsgsr/lib/libspamass.so
spamfilter1_optional=1
spamfilter1_string_action=data:,require ["addheader"];addheader "Spam-test: $U";

As in previous examples, the first three options specify the SpamAssassin configuration file, the shared library, and that the MTA continues operation if the shared library fails. The following line:

spamfilter1_string_action=data:,require ["addheader"];addheader "Spam-test: $U";

specifies that a header is added to spam messages. The header has the literal prefix Spam-test: followed by the string returned by SpamAssassin. Because mode=1 was specified in the previous step, the SpamAssassin result string is returned. For example: True ; 7.3 / 5.0

Recompile the configuration, restart the server and start the spamd daemon.

See the final steps in To File Spam to a Separate Folder.

To Add the SpamAssassin Result String to the Subject Line

By adding the SpamAssassin result string to the Subject line, users can determine whether they wish to read a message with a SpamAssassin score. For example:

Subject: [SPAM True ; 99.3 / 5.0] Free Money At Home with Prescription Xanirex!

Note that setting USE_CHECK to 0 returns the list of SpamAssassin tests that matched along with the verdict string (see 14.4.7 SpamAssassin Options). This list can be very long, so it is best to set USE_CHECK to 1.

Specify the messages to be filtered.

See Step 3 in To File Spam to a Separate Folder.

Create the SpamAssassin configuration file.

This step is described in To File Spam to a Separate Folder. For this example, the file contains the following lines:
host=127.0.0.1
port=2000
mode=1
debug=1
host and port specify the name of the system where spamd is running and the port on which spamd listens for incoming requests. mode=1 specifies that the SpamAssassin result string is returned if the message is spam. debug=1 turns on debugging in the SpamAssassin library.

Add the following lines to the option.dat file:

!for Spamassassin
spamfilter1_config_file=/opt/SUNWmsgsr/config/spamassassin.opt
spamfilter1_library=/opt/SUNWmsgsr/lib/libspamass.so
spamfilter1_optional=1
spamfilter1_string_action=data:,addtag "[SPAM detected $U]";

The following line:

spamfilter1_string_action=data:,addtag "[SPAM detected $U]";

specifies that a tag be added to the Subject: line. The tag has the literal prefix SPAM detected, followed by the field string (default: Spam-Test) and the result string returned by SpamAssassin. Because mode=1 was specified in the previous step, the SpamAssassin result string is returned. Thus, a subject line looks something like this:

Subject: [SPAM detected Spam-Test: True ; 11.3 / 5.0] Make Money!

You can also use addheader and addtag together:

spamfilter1_string_action=data:,require ["addheader"];addtag "[SPAM detected $U]";addheader "Spamscore: $U";

to get a message like this:

Subject: [SPAM detected Spam-Test: True ; 12.3 / 5.0] Vigaro Now!
Spamscore: Spam-Test: True ; 12.3 / 5.0

Set field= in spamassassin.opt to remove the default value of Spam-Test. A cleaner message is returned:

Subject: [SPAM True ; 91.3 / 5.0] Vigaro Now!
Spamscore: True ; 91.3 / 5.0

Recompile the configuration, restart the server and start the spamd daemon.

See To File Spam to a Separate Folder.
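
The way the field option and the $U substitution combine in the subject tag can be modelled in a few lines. The following Python sketch only imitates the observable result described in this example; it is not how the MTA implements addtag, and the function names verdict_line and tag_subject are hypothetical.

```python
def verdict_line(result: str, field: str = "Spam-Test") -> str:
    """Prefix the result string with the field value; an empty field drops the ': '."""
    return f"{field}: {result}" if field else result


def tag_subject(subject: str, verdict: str) -> str:
    """Imitate the effect of addtag "[SPAM detected $U]" on a Subject: value."""
    return f"[SPAM detected {verdict}] {subject}"


# Default field (Spam-Test) versus field= (empty) in spamassassin.opt.
print(tag_subject("Make Money!", verdict_line("True ; 11.3 / 5.0")))
print(tag_subject("Make Money!", verdict_line("True ; 11.3 / 5.0", field="")))
```

The first call shows the tag with the default prefix; the second shows the shorter tag obtained when field= is set to an empty value.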

To Filter Messages Based on SpamAssassin Score

This example shows how to filter messages based on a SpamAssassin score. It uses the spamadjust and spamtest Sieve filter actions. In this example, a header containing the SpamAssassin score is added to all messages. This header can be used by the SpamAssassin software administrator to tune SpamAssassin for improved spam email detection. If the message has a SpamAssassin score of at least 5 but less than 10, the message is filed into a spam folder within the user's account. If the message has a SpamAssassin score of 10 or greater, the message is discarded. Note that by default SpamAssassin considers messages with a score of 5 or greater to be spam.

Specify the messages to be filtered.

This is described in Step 3 of To File Spam to a Separate Folder.

Create the SpamAssassin configuration file.

The name and location of this file are specified with spamfilterX_config_file (see next step). It consists of the following lines:
debug=1
host=127.0.0.1
port=783
mode=2
field=
host and port specify the name of the system where spamd is running and the port on which spamd listens for incoming requests. mode=2 specifies that the SpamAssassin result string is always returned regardless of the score. field= specifies a string prefix for the SpamAssassin result string. In this example, a prefix is not desired because we are specifying it in the Sieve script. debug=1 turns on debugging in the SpamAssassin library.

Add the following lines to the option.dat file:

! For SpamAssassin
spamfilter1_config_file=/opt/SUNWmsgsr/config/spamassassin.opt
spamfilter1_library=/opt/SUNWmsgsr/lib/libspamass.so
spamfilter1_optional=1
spamfilter1_string_action=data:, require ["addheader","spamtest"]; \
spamadjust "$U"; addheader "Spam-test: $U";

As in previous examples, the first three lines specify the SpamAssassin configuration file, the shared library, and that the MTA continues operation if the shared library fails. The last two lines specify that the SpamAssassin score should be extracted from the string returned by SpamAssassin ($U) for use in the spamtest operation, and that a spam score header should be added to all messages (for example, Spam-test: True ; 7.3 / 5.0).

Create a channel level filter to process the email based on the spam score.

Refer to To Create a Channel-level Filter. Add the following rule to that file:

require ["spamtest","relational","comparator-i;ascii-numeric","fileinto"];
if spamtest :value "ge" :comparator "i;ascii-numeric" "10" {discard;}
elsif spamtest :value "ge" :comparator "i;ascii-numeric" "5" {fileinto "spam";}
else {keep;}

The second line discards the spam email if the SpamAssassin score is greater than or equal to 10. The third line files the email to the user's "spam" folder if the score is greater than or equal to 5. The last line, else {keep;}, keeps all messages that received a score less than 5.

Recompile the configuration, restart the server, and start the spamd daemon.

See the final steps in To File Spam to a Separate Folder.
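
The decision made by the channel-level filter in this example can be restated as plain threshold logic. The Python sketch below is only an illustration of the rule order (discard is checked before fileinto); it does not model how the MTA's spamtest or the i;ascii-numeric comparator handle unusual values such as negative scores, and the function name channel_filter_action is hypothetical.

```python
def channel_filter_action(score: float) -> str:
    """Return the action the example Sieve rule would take for a given score."""
    if score >= 10:
        return "discard"        # first test: score of 10 or greater
    if score >= 5:
        return "fileinto spam"  # second test: at least 5 but less than 10
    return "keep"               # everything below 5 is delivered normally


for sample in (1.3, 7.3, 18.3):
    print(sample, "->", channel_filter_action(sample))
```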

14.4.6 Testing SpamAssassin

To test SpamAssassin, first set debug=1 in the spamassassin.opt file. You do not have to turn on the channel-specific master_debug or slave_debug in the imta.cnf. Then send a test message to a test user. The msg-svr-base/data/log/tcp_local_slave.log* file should have lines similar to these:

15:15:45.44: SpamAssassin callout debugging enabled; config /opt/SUNWmsgsr/config/spamassassin.opt
15:15:45.44: IP address 127.0.0.1 specified
15:15:45.44: Port 2000 selected
15:15:45.44: Mode 0 selected
15:15:45.44: Field "Spam-Test: " selected
15:15:45.44: Verdict "spam" selected
15:15:45.44: Using CHECK rather than SYMBOLS
15:15:45.44: Initializing SpamAssassin message context
...
15:15:51.42: Creating socket to connect to SpamAssassin
15:15:51.42: Binding SpamAssassin socket
15:15:51.42: Connecting to SpamAssassin
15:15:51.42: Sending SpamAssassin announcement
15:15:51.42: Sending SpamAssassin the message
15:15:51.42: Performing SpamAssassin half close
15:15:51.42: Reading SpamAssassin status
15:15:51.67: Status line: SPAMD/1.1 0 EX_OK
15:15:51.67: Reading SpamAssassin result
15:15:51.67: Result line: Spam: False ; 1.3 / 5.0
15:15:51.67: Verdict line: Spam-Test: False ; 1.3 / 5.0
15:15:51.67: Closing connection to SpamAssassin
15:15:51.73: Freeing SpamAssassin message context

If your log file does not contain lines similar to these, or if spamd is not running, the following error message is returned in your SMTP dialog after the last period (.) is sent to the SMTP server.

452 4.4.5 Error writing message temporaries - Temporary scan failure: End message status = -1

However, if spamfilter1_optional=1 (highly recommended) is set in option.dat, the message is accepted, but not filtered. It is as if spam filtering was not enabled, and the following lines appear in tcp_local_slave.log*:

15:35:15.69: Creating socket to connect to SpamAssassin
15:35:15.69: Binding SpamAssassin socket
15:35:15.69: Connecting to SpamAssassin
15:35:15.69: Error connecting socket: Connection refused
15:35:15.72: Freeing SpamAssassin message context

The call to SpamAssassin happens after the SMTP server received the entire message (that is, after the last “.” is sent to the SMTP server), but before the SMTP server acknowledges to the sender that it accepted the message.
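
When troubleshooting, it can help to talk to spamd directly and confirm that it answers independently of the MTA. The following Python sketch mirrors the sequence visible in the debug log above (connect, send the announcement and message, half close, read the status and result lines). It is an illustration based on the publicly documented spamd CHECK exchange, not the code in libspamass.so; the SPAMC/1.2 version string, the function names, and the default host and port are assumptions to adjust for your deployment.

```python
import socket
from typing import Optional, Tuple


def build_check_request(message: bytes) -> bytes:
    """Build a CHECK request: announcement line, Content-length header, message."""
    header = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_check_response(response: bytes) -> Tuple[int, str, Optional[str]]:
    """Return (status code, status text, result string) from a spamd reply."""
    lines = response.decode("ascii", errors="replace").split("\r\n")
    protocol, code, status = lines[0].split(" ", 2)
    if not protocol.startswith("SPAMD/"):
        raise ValueError(f"unexpected status line: {lines[0]!r}")
    result = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "spam":
            result = value.strip()
            break
    return int(code), status, result


def check_message(message: bytes, host: str = "127.0.0.1", port: int = 783,
                  timeout: float = 30.0) -> Tuple[int, str, Optional[str]]:
    """Send one message to spamd and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # half close: tell spamd the request is complete
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_check_response(b"".join(chunks))
```

Calling check_message with the raw bytes of a test message against a running spamd should return a status of 0 and a result string in the same format as the Result line in the log. A connection error here points at spamd or the network rather than at the MTA configuration.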

Another test is to send a sample spam message using sample-spam.txt from, for example, the Mail-SpamAssassin-2.60 directory. This message has the following special text string in it:

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
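
If sample-spam.txt is not at hand, a test message carrying the same string can be generated. The following Python sketch builds such a message with the standard email package; the function name and the subject text are hypothetical, and the sender and recipient addresses are placeholders to replace with a real test account.

```python
from email.message import EmailMessage

# The test string quoted above, copied verbatim.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender: str, recipient: str) -> EmailMessage:
    """Create a plain-text message whose body contains the test string."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Spam filter test"
    msg.set_content("This is a spam filter test.\n" + GTUBE + "\n")
    return msg


# Placeholder addresses; substitute a test user on your own system.
message = build_gtube_message("tester@siroe.com", "morchia@siroe.com")
print(message.as_string())
```

The resulting message can be submitted with any SMTP client, for example smtplib.SMTP(host).send_message(message), where host is your own MTA.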

The corresponding tcp_local_slave.log* contains something like this:

16:00:08.15: Creating socket to connect to SpamAssassin
16:00:08.15: Binding SpamAssassin socket
16:00:08.15: Connecting to SpamAssassin
16:00:08.15: Sending SpamAssassin announcement
16:00:08.15: Sending SpamAssassin the message
16:00:08.15: Performing SpamAssassin half close
16:00:08.15: Reading SpamAssassin status
16:00:08.43: Status line: SPAMD/1.1 0 EX_OK
16:00:08.43: Reading SpamAssassin result
16:00:08.43: Result line: Spam: True ; 1002.9 / 5.0
16:00:08.43: Verdict line: Spam-Test: True ; 1002.9 / 5.0
16:00:08.43: Closing connection to SpamAssassin
16:00:08.43: Mode 0 verdict of spam
16:00:08.43: Mode 0 verdict of spam
16:00:08.47: Freeing SpamAssassin message context

A corresponding entry in the mail.log_current file would look as follows. Note the +spam part of the destination address, which means the message is filed in the folder called spam:

15-Dec-2003 15:32:17.44 tcp_intranet ims-ms E 1 morchia@siroe.com rfc822;morchia morchia+spam@ims-ms-daemon
15-Dec-2003 15:32:18.53 ims-ms D 1 morchia@siroe.com rfc822;morchia morchia+spam@ims-ms-daemon
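
When checking many log entries, the folder can be read off the destination address mechanically. The following Python sketch extracts the part after the + in the local part of an address such as the one in the entries above; the function name subaddress_folder is hypothetical, and the sketch assumes the user+folder@host layout shown here.

```python
from typing import Optional


def subaddress_folder(address: str) -> Optional[str]:
    """Return the folder named after '+' in the local part, or None if absent."""
    local_part = address.split("@", 1)[0]
    _user, sep, folder = local_part.partition("+")
    return folder if sep and folder else None


print(subaddress_folder("morchia+spam@ims-ms-daemon"))  # prints: spam
print(subaddress_folder("morchia@ims-ms-daemon"))       # prints: None
```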

14.4.7 SpamAssassin Options

This section contains the SpamAssassin option table.

Table 14–3 SpamAssassin Options (spamassassin.opt )

Options

Description

Default

debug

Specifies whether to turn on debugging in libspamass.so. Debugging of spamd itself is controlled by the command line invoking spamd. Set to an integer value. 0 is off, 1 is on, and a setting of 2 or greater reports exactly what was received from spamd.

field

Specifies the string prefix for the SpamAssassin result. SpamAssassin results look like this:

Spam-Test: False ; 0.0 / 5.0
Spam-Test: True ; 27.7 / 5.0

The field option provides the means for changing the Spam-Test: part of the result. Note that the “: “ is removed if an empty field value is specified.

If USE_CHECK is set to 0, the result string will look similar to this:

Spam-test: False ; 0.3 / 4.5 ; HTML_MESSAGE,NO_REAL_NAME

Spam-test: True ; 8.8 / 4.5 ; NIGERIAN_BODY, NO_REAL_NAME,PLING_PLING,RCVD_IN_SBL,SUBJ_ALL_CAPS

Default: "Spam-Test"

host

The name of the system where spamd is running.

Default: localhost

mode

Controls the translation of SpamAssassin filter results to verdict information. That is, it specifies what verdict information is returned after a message is processed. Four modes are available. See 14.4.7.1 The SpamAssassin mode Option for further explanation.

0 - Returns a verdict string (specified by the verdict option) if the message is spam. The MTA option spamfilterX_string_action can be used to specify what to do if a verdict string is returned. If the verdict option (defined below) is empty or unspecified and the message is spam, a null verdict is returned. The MTA option spamfilterX_null_action can be used to specify what to do if a null verdict is returned.

Returns a default verdict if the message is not spam. (A default verdict always means to take no action and deliver as normal.)

1 - Returns the SpamAssassin result string if the message is found to be spam. Returns a default verdict if it is not spam. (Again, a default verdict always means to take no action and deliver as normal.) A SpamAssassin result string looks something like this: True ; 6.5 / 5.0

2 - Same as mode 1 except that the SpamAssassin result string is returned regardless of whether the message is spam or not spam. No default or null verdict is ever returned and the verdict option is never used.

3 - Return the SpamAssassin result string if the message is found to be spam; return the verdict string specified by the verdict option if it is not. You can control the action for the SpamAssassin result string by using the spamfilterX_verdict_n and spamfilterX_action_n matched pair. You can control the action for the verdict string by using spamfilterX_string_action.

port

Specifies the port number where spamd listens for incoming requests.

Default: 783

USE_CHECK

1 - The spamd CHECK command is used to return the SpamAssassin score.

0 - Enables use of the SYMBOLS command which returns a score and a list of the SpamAssassin tests that matched. The system may hang or have other problems with this option in pre-2.55 versions of SpamAssassin. See field above.

SOCKS_HOST

String. Specifies the name of an intermediate SOCKS server. If this option is specified, the connection to spamd is made through the specified SOCKS server and not directly.

SOCKS_PORT

Specifies the port that the intermediate SOCKS server is running on.

Default: 1080

USERNAME_MAPPING

Specify the name of a mapping to probe with address information as the plugin receives recipient addresses from the MTA. The probe format is:

current-username|current-recipient-address|current-optin-string

If the mapping sets the $Y flag the output string is taken to be the updated username to pass to spamd.

verdict

Specifies the verdict string used for mode 0.

Default: "" (empty string)
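
As a convenience when reviewing a spamassassin.opt file, the options can be loaded and merged with the defaults listed in this table. The following Python sketch is an illustration only: it assumes one name=value pair per line and treats lines starting with ! as comments (an assumption borrowed from option.dat), and the names DEFAULTS and load_options are hypothetical. Options with no default listed in the table are left unset.

```python
from typing import Dict

# Defaults as listed in Table 14-3; options without a listed default are omitted.
DEFAULTS: Dict[str, str] = {
    "field": "Spam-Test",
    "host": "localhost",
    "port": "783",
    "SOCKS_PORT": "1080",
    "verdict": "",
}


def load_options(text: str) -> Dict[str, str]:
    """Parse name=value lines and overlay them on the documented defaults."""
    options = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("!"):  # assumed comment convention
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {line!r}")
        options[name.strip()] = value.strip()
    if "mode" in options and options["mode"] not in {"0", "1", "2", "3"}:
        raise ValueError("mode must be 0, 1, 2, or 3")
    int(options["port"])  # raises ValueError if the port is not an integer
    return options


sample = "host=127.0.0.1\nport=2000\nmode=1\nfield=\ndebug=1\n"
print(load_options(sample))
```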

14.4.7.1 The SpamAssassin mode Option

After processing a message, SpamAssassin determines whether a message is spam or not. mode allows you to specify the string that is returned to indicate the verdict. The string choices are null, default, SpamAssassin result string, or a verdict string specified with the verdict option. (Note that default is neither null, the SpamAssassin result string, nor the string specified by verdict, but some other non-configurable result string.) The mode operations are outlined in the table below.

Table 14–4 Returned String for the SpamAssassin mode Option


`verdict` setting	Spam?	mode=0	mode=1	mode=2	mode=3
`verdict`="" (not set)	yes	null	SpamAssassin result	SpamAssassin result	SpamAssassin result
`verdict`="" (not set)	no	default	default	SpamAssassin result	default
`verdict`=string	yes	`verdict` string	SpamAssassin result	SpamAssassin result	SpamAssassin result
`verdict`=string	no	default	default	SpamAssassin result	`verdict` string

The first column indicates whether the verdict option is set or not set. The second column indicates whether the message is spam or not. The mode columns indicate the string returned for the various modes. For example, if verdict is not set, mode is set to 0, and a message is not spam, a default string is returned. If verdict is set to YO SPAM!, mode is set to 0, and a message is spam, the string YO SPAM! is returned.
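
The table can also be read as a small decision function. The following Python sketch encodes the rows of Table 14–4 as described above; it is a restatement of the table for checking a planned configuration, not the library's implementation. The function name returned_string and the NULL and DEFAULT placeholder values are hypothetical stand-ins for the null and default verdicts.

```python
NULL = "<null verdict>"        # placeholder for the null verdict
DEFAULT = "<default verdict>"  # placeholder for the default verdict


def returned_string(mode: int, is_spam: bool, result: str, verdict: str = "") -> str:
    """Return what Table 14-4 lists for the given mode, verdict setting, and outcome."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2, or 3")
    if mode == 2:
        return result                         # always the SpamAssassin result
    if is_spam:
        if mode == 0:
            return verdict if verdict else NULL
        return result                         # modes 1 and 3
    if mode == 3 and verdict:
        return verdict                        # not spam, verdict string set
    return DEFAULT


print(returned_string(0, True, "True ; 18.3 / 5.0", verdict="YO SPAM!"))
# prints: YO SPAM!
```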