Sun Java System Messaging Server 6 2005Q4 Administration Guide

Character Set Labeling

The MIME specification provides a mechanism to label the character set used in a plain text message. Specifically, a charset= parameter can be specified as part of the Content-type: header line. Various character set names are defined in MIME, including US-ASCII (the default), ISO-8859-1, ISO-8859-2, and many more that have been subsequently defined.
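The following Python sketch shows what a label looks like to a program that reads mail. It uses the standard library's email package to read the charset= parameter from a labeled message and from an unlabeled one. It is only an illustration: it is not part of Messaging Server, and the two sample messages are made up.

```python
from email import message_from_string

# A plain text message whose Content-type: header carries a charset= label.
labeled = message_from_string(
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "Hello\n"
)

# A plain text message with no Content-type: header at all.
unlabeled = message_from_string("Subject: test\n\nHello\n")

# get_content_charset() returns the charset parameter in lowercase,
# or None when the message carries no charset label.
labeled_charset = labeled.get_content_charset()
unlabeled_charset = unlabeled.get_content_charset()

print(labeled_charset, unlabeled_charset)
```

For the unlabeled message, get_content_charset() returns None instead of applying the MIME US-ASCII default. This makes a missing label visible to the program.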

Some existing systems and user agents cannot generate these character set labels, so some plain text messages may not be properly labeled. The charset7, charset8, and charsetesc channel keywords provide a per-channel mechanism for specifying character set names to insert into message headers that lack character set labeling. Each keyword requires a single argument giving the character set name. The names are not checked for validity. Note, however, that character set conversion can only be done on character sets specified in the character set definition file charsets.txt, found in the MTA table directory. Use the names defined in this file whenever possible.

The charset7 character set name is used if the message contains only seven bit characters. The charset8 name is used if eight bit data is found in the message. The charsetesc name is used if a message containing only seven bit data also contains escape characters. If the appropriate keyword is not specified, no character set name is inserted into Content-type: header lines.
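The selection rule above can be sketched in Python as follows. The function choose_charset is hypothetical and is not part of the MTA. Two points are assumptions based on a literal reading of the text:

* Eight bit data takes precedence over escape characters.
* When the appropriate keyword is unset, no label is produced. There is no fallback to another keyword.

The sketch also scans the raw body bytes, whereas the real MTA may examine individual text parts.

```python
from typing import Optional

ESC = 0x1B  # ASCII escape character


def choose_charset(
    body: bytes,
    charset7: Optional[str] = None,
    charset8: Optional[str] = None,
    charsetesc: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: return the charset name to insert, or None."""
    if any(byte >= 0x80 for byte in body):
        # Eight bit data is present, so charset8 applies.
        return charset8
    if ESC in body:
        # Only seven bit data, but escape characters are present: charsetesc applies.
        return charsetesc
    # Plain seven bit data: charset7 applies.
    return charset7


print(choose_charset(b"plain text", "US-ASCII", "ISO-8859-1"))  # US-ASCII
print(choose_charset(b"caf\xe9", "US-ASCII", "ISO-8859-1"))     # ISO-8859-1
print(choose_charset(b"\x1b$B", "US-ASCII", "ISO-8859-1"))      # None (no charsetesc)
```

The last call shows why charsetesc matters. The data is seven bit, but because it contains an escape character and charsetesc is unset, no label is chosen.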

Note that the charset8 keyword also controls the MIME encoding of 8-bit characters in message headers (where 8-bit data is unconditionally illegal). The MTA normally MIME-encodes any (illegal) 8-bit data encountered in message headers, labeling it as the UNKNOWN charset if no charset8 value has been specified.
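As an illustration of header encoding, the Python sketch below wraps 8-bit header data in an RFC 2047 encoded-word. The word is labeled with the charset8 value if one is given, and with UNKNOWN otherwise. The function encode_header_value is hypothetical. The exact encoding the MTA produces is an assumption here, including its choice of B or Q encoding, which words it encodes, and how it splits long values. This sketch B-encodes the whole value as a single word and does not enforce the 75-character encoded-word limit.

```python
import base64
from typing import Optional


def encode_header_value(raw: bytes, charset8: Optional[str] = None) -> str:
    """Hypothetical sketch: wrap 8-bit header data in an RFC 2047 encoded-word."""
    if all(byte < 0x80 for byte in raw):
        # Pure 7-bit header data is legal and is left untouched.
        return raw.decode("ascii")
    # Label with the charset8 value if configured, otherwise with UNKNOWN.
    charset = charset8 or "UNKNOWN"
    payload = base64.b64encode(raw).decode("ascii")
    return f"=?{charset}?B?{payload}?="


print(encode_header_value(b"Ren\xe9"))                # =?UNKNOWN?B?UmVu6Q==?=
print(encode_header_value(b"Ren\xe9", "ISO-8859-1"))  # =?ISO-8859-1?B?UmVu6Q==?=
```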

These character set specifications never override existing labels; that is, they have no effect if a message already has a character set label or is of a type other than text. It is usually appropriate to label MTA local channels as follows:


l ... charset7 US-ASCII charset8 ISO-8859-1 ...
hostname

If the message has no Content-type: header line, one is added. These keywords also add the MIME-version: header line if it is missing.
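The labeling rules can be sketched in Python with the standard email package:

* An existing label is never overridden.
* Non-text types are left alone.
* A missing Content-type: header is added.
* A missing MIME-version: header is added.

The function label_charset is hypothetical. The sketch assumes that MIME-version: is added whenever a label is applied; the MTA's exact behavior may differ. A message with no Content-type: header is treated as text/plain, as MIME specifies.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def label_charset(msg: Message, charset: Optional[str]) -> Message:
    """Hypothetical sketch of the labeling rules described above."""
    if charset is None:
        # No applicable keyword is configured: leave the message alone.
        return msg
    if msg.get_content_maintype() != "text":
        # Non-text content is never labeled.
        return msg
    if msg.get_param("charset") is not None:
        # An existing label is never overridden.
        return msg
    if msg["Content-Type"] is None:
        # No Content-type: header at all, so add one.
        msg["Content-Type"] = f"text/plain; charset={charset}"
    else:
        # The header is present but unlabeled: add the charset parameter.
        msg.set_param("charset", charset)
    if msg["MIME-Version"] is None:
        msg["MIME-Version"] = "1.0"
    return msg


msg = label_charset(message_from_string("Subject: hi\n\nhello\n"), "US-ASCII")
print(msg["Content-Type"], msg["MIME-Version"])
```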

The charsetesc keyword tends to be particularly useful on channels that receive unlabeled messages using Japanese or Korean character sets that contain the escape character.
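ISO-2022-JP is a Japanese encoding that is pure seven bit data yet relies on escape sequences to switch character sets. A charset7 test alone would mistake such text for ordinary ASCII. The short Python check below uses Python's built-in iso2022_jp codec to show both properties. It only illustrates the data and says nothing about how the MTA inspects it.

```python
# Encode a short Japanese string with Python's built-in ISO-2022-JP codec.
encoded = "日本語".encode("iso2022_jp")

# Every byte is below 0x80, so the data is pure seven bit...
seven_bit = all(byte < 0x80 for byte in encoded)
# ...yet it contains ESC (0x1B) bytes that switch character sets.
has_escape = 0x1B in encoded

print(seven_bit, has_escape)  # True True
```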