Sun Java System Messaging Server 6 2005Q4 Administration Guide

Character Set Conversion and Message Reformatting

This section describes character set, formatting, and labelling conversions performed internally by the MTA. Note that some of the examples in this section use old or obsolete technology like DEC VMS, or the d channels. Although these technologies are old or obsolete, this does not make the examples DEC- or d channel-specific. The examples are still valid in describing how the conversion technology works. We will update the examples in a later release.

One very basic mapping table in Messaging Server is the character set conversion table. The name of this table is CHARSET-CONVERSION. It is used to specify what sorts of channel-to-channel character set conversions and message reformatting should be done.

On many systems there is no need to do character set conversions or message reformatting and therefore this table is not needed. Situations arise, however, where character conversions must be done. For example, sites running Japanese OpenVMS may need to convert between DEC Kanji and the ISO-2022 Kanji currently used on the Internet. Another possible use of conversions arises when multinational characters are so heavily used that the slight discrepancies between the DEC Multinational Character Set (DEC-MCS) and the ISO-8859-1 character set specified for use in MIME may become an issue, and actual conversion between the two may therefore be needed.

The CHARSET-CONVERSION mapping table can also be used to alter the format of messages. Facilities are provided to convert a number of non-MIME formats into MIME. Changes to MIME encodings and structure are also possible. These options are used when messages are being relayed to systems that only support MIME or some subset of MIME. And finally, conversion from MIME into non-MIME formats is provided in a small number of cases.

The MTA will probe the CHARSET-CONVERSION mapping table in two different ways. The first probe is used to determine whether or not the MTA should reformat the message and if so, what formatting options should be used. (If no reformatting is specified, the MTA does not bother to check for specific character set conversions.) The input string for this first probe has the general form:

IN-CHAN=in-channel;OUT-CHAN=out-channel;CONVERT

Here in-channel is the name of the source channel (where the message comes from) and out-channel is the name of the destination channel (where the message is going). If a match occurs the resulting string should be a comma-separated list of keywords. Table 13–7 lists the keywords.

Table 13–7 CHARSET-CONVERSION Mapping Table Keywords

Keyword 

Description 

Always

Force conversion even the message is going to be passed through the conversion channel before going to out-channel.

Appledouble

Convert other MacMIME formats to Appledouble format. 

Applesingle

Convert other MacMIME formats to Applesingle format. 

BASE64

Switch MIME encodings to BASE64. This keyword only applies to message parts that are already encoded. Messages with Content-transfer-encoding: 7BIT or 8bit do not require any special encoding and therefore this BASE64 option will have no effect on them. 

Binhex

Convert other MacMIME formats, or parts including Macintosh type and Mac creator information, to Binhex format. 

Block

Extract just the data fork from MacMIME format parts. 

Bottom

“Flatten” any message/rfc822 body part (forwarded message) into a message content part and a header part. 

Delete

“Flatten” any message/rfc822 body part (forwarded message) into a message content part, deleting the forwarded headers. 

Level

Remove redundant multipart levels from message. 

Macbinary

Convert other MacMIME formats, or parts including Macintosh type and Macintosh creator information, to Macbinary format. 

No

Disable conversion. 

QUOTED-PRINTABLE

Switch MIME encodings to QUOTED-PRINTABLE. 

Record,Text

Line wrap text/plain parts at 80 characters. 

Record,Text= n

Line wrap text/plain parts at n characters. 

RFC1154

Convert message to RFC 1154 format. 

Top

“Flatten” any message/rfc822 body part (forwarded message) into a header part and a message content part. 

UUENCODE

Switch MIME encodings to X-UUENCODE. 

Yes

Enable conversion. 

Character Set Conversion

If the MTA probes and finds that the message is to be reformatted, it will proceed to check each part of the message. Any text parts are found and their character set parameters are used to generate the second probe. Only when the MTA has checked and found that conversions may be needed does it ever perform the second probe. The input string in this second case looks like this:

IN-CHAN=in-channel;OUT-CHAN=out-channel;IN-CHARSET=in-char-set

The in-channel and out-channel are the same as before, and the in-char-set is the name of the character set associated with the particular part in question. If no match occurs for this second probe, no character set conversion is performed (although message reformatting, for example, changes to MIME structure, may be performed in accordance with the keyword matched on the first probe). If a match does occur it should produce a string of the form:

OUT-CHARSET=out-char-set

Here out-char-set specifies the name of the character set to which the in-char-set should be converted. Note that both of these character sets must be defined in the character set definition table, charsets.txt, located in the MTA table directory. No conversion will be done if the character sets are not properly defined in this file. This is not usually a problem since this file defines several hundred character sets; most of the character sets in use today are defined in this file. See the description of the imsimta chbuild (UNIX and NT) utility for further information on the charsets.txt file.

If all the conditions are met, the MTA will proceed to build the character set mapping and do the conversion. The converted message part will be relabelled with the name of the character set to which it was converted.

The charset-conversion mapping has been extended to provide several additional capabilities:


Example 13–2 Converting ISO-8859-1 to UTF-8 and back

Suppose that ISO-8859-1 is used locally, but this needs to be converted to UTF-8 for use on the Internet. In particular, suppose the connection to the Internet is via the tcp_local and tcp_internal and ims-ms are where internal messages originate and are delivered. The CHARSET-CONVERSION table shown below brings such conversions about. Note that each IN-CHAN entries must be on a single line. The backslash (\) is used to signify this.


CHARSET-CONVERSION

 IN-CHAN=tcp_internal;OUT-CHAN=tcp_local;CONVERT               Yes
 IN-CHAN=tcp_local;OUT-CHAN=tcp_internal;CONVERT               Yes
 IN-CHAN=tcp_local;OUT-CHAN=ims-ms;CONVERT                     Yes
 IN-CHAN=*;OUT-CHAN=*;CONVERT                                  No
 IN-CHAN=tcp_internal;OUT-CHAN=tcp_local;IN-CHARSET=ISO-8859-1 OUT-CHARSET=UTF-8
 IN-CHAN=tcp_local;OUT-CHAN=tcp_internal;IN-CHARSET=UTF-8 OUT-CHARSET=ISO-8859-1
 IN-CHAN=tcp_local;OUT-CHAN=ims-ms;IN-CHARSET=UTF-8       OUT-CHARSET=ISO-8859-1


Example 13–3 Converting EUC-JP to ISO-2022-JP and Back

The CHARSET-CONVERSION table shown below specifies a conversion between local usage of EUC-JP and the ISO 2022 based JP code.


CHARSET-CONVERSION

  IN-CHAN=ims-ms;OUT-CHAN=ims-ms;CONVERT                  No
  IN-CHAN=tcp_internal;OUT-CHAN=ims-ms;CONVERT            No
  IN-CHAN=tcp_internal;OUT-CHAN=tcp_internal;CONVERT      No
  IN-CHAN=tcp_internal;OUT-CHAN=*;CONVERT                 Yes
  IN-CHAN=*;OUT-CHAN=ims-ms;CONVERT                       Yes
  IN-CHAN=*;OUT-CHAN=tcp_internal;CONVERT                 Yes
  IN-CHAN=tcp_internal;OUT-CHAN=*;IN-CHARSET=EUC-JP      OUT-CHARSET=ISO-2022-JP
  IN-CHAN=*;OUT-CHAN=ims-ms;IN-CHARSET=ISO-2022-JP        OUT-CHARSET=EUC-JP
  IN-CHAN=*;OUT-CHAN=tcp_internal;IN-CHARSET=ISO-2022-JP  OUT-CHARSET=EUC-JP

Message Reformatting

As described above, the CHARSET-CONVERSION mapping table is also used to effect the conversion of attachments between MIME and several proprietary mail formats.

The following sections give examples of some of the other sorts of message reformatting which can be affected with the CHARSET-CONVERSION mapping table.

Non-MIME Binary Attachment Conversion

Mail in certain non-standard (non-MIME) formats; for example, mail in certain proprietary formats or mail from the Microsoft Mail (MSMAIL) SMTP gateway is automatically converted into MIME format if CHARSET-CONVERSION is enabled for any of the channels involved in handling the message. If you have a tcp_local channel then it is normally the incoming channel for messages from a Microsoft Mail SMTP gateway, and the following will enable the conversion of messages delivered to your local users:

CHARSET-CONVERSION  

  IN-CHAN=tcp_local;OUT-CHAN=ims-ms;CONVERT         Yes

You may also wish to add entries for channels to other local mail systems. For instance, an entry for the tcp_internal channel:

CHARSET-CONVERSION

  IN-CHAN=tcp_local;OUT-CHAN=l;CONVERT              Yes
  IN-CHAN=tcp_local;OUT-CHAN=tcp_internal;CONVERT   Yes

Alternatively, to cover every channel you can simply specify OUT-CHAN=* instead of OUT-CHAN=ims-ms. However, this may bring about an increase in message processing overhead as all messages coming in the tcp_local channel will now be scrutinized instead of just those bound to specific channels.

More importantly, such indiscriminate conversions might place your system in the dubious and frowned upon position of converting messages—not necessarily your own site’s—which are merely passing through your system, a situation in which you should merely be acting as a transport and not necessarily altering anything beyond the message envelope and related transport information.

To convert MIME into the format Microsoft Mail SMTP gateway understands, use a separate channel in your MTA configuration for the Microsoft Mail SMTP gateway; for example, tcp_msmail, and put the following in the mappings. file:

CHARSET-CONVERSION  

  IN-CHAN=*;OUT-CHAN=tcp_msmail;CONVERT        RFC1154

Relabelling MIME Headers

Some user agents or gateways may emit messages with MIME headers that are less informative than they might be, but that nevertheless contain enough information to construct more precise MIME headers. Although the best solution is to properly configure such user agents or gateways, if they are not under your control, you can instead ask the MTA to try to reconstruct more useful MIME headers.

If the first probe of the CHARSET-CONVERSION mapping table yields a Yes or Always keyword, then the MTA will check for the presence of a conversions file. If a conversions file exists, then the MTA will look in it for an entry with RELABEL=1 and if it finds such an entry, the MTA will then perform any MIME relabelling specified in the entry. See To Control Conversion Processing for information on conversions file entries.

For example, the combination of a CHARSET-CONVERSION table such as:


CHARSET-CONVERSION  

  IN-CHAN=tcp_local;OUT-CHAN=tcp_internal;CONVERT            Yes

and MTA conversion file entries of


out-chan=ims-ms; in-type=application; in-subtype=octet-stream; 
  in-parameter-name-0=name; in-parameter-value-0=*.ps; 
  out-type=application; out-subtype=postscript;   
  parameter-copy-0=*; relabel=1 

out-chan=ims-ms; in-type=application; in-subtype=octet-stream; 
  in-parameter-name-0=name; in-parameter-value-0=*.msw; 
  out-type=application; out-subtype=msword; 
     parameter-copy-0=* relabel=1

will result in messages that arrive on the tcp_local channel and are routed to the ims-ms channel, and that arrive originally with MIME labelling of application/octet-stream but have a filename parameter with the extension ps or msw, being relabelled as application/postscript or application/msword, respectively. (Note that this more precise labelling is what the original user agent or gateway should have performed itself.) Such a relabelling can be particularly useful in conjunction with a MIME-CONTENT-TYPES-TO-MR mapping table, used to convert such resulting MIME types back into appropriate MRTYPE tags, which needs precise MIME labelling in order to function optimally; if all content types were left labelled only as application/octet-stream, the MIME-CONTENT-TYPES-TO-MR mapping table could only, at best, unconditionally convert all such to one sort of MRTYPE.

With the above example and MIME-CONTENT-TYPES-TO-MR mapping table entries including

APPLICATION/POSTSCRIPT        PS 
APPLICATION/MSWORD              MW

a labelling coming in as, for example,

Content-type: application/octet-stream; name=stuff.ps

would be relabelled as

Content-type: application/postscript

and then converted into an MRTYPE tag PS to let Message Router know to expect PostScript.

Sometimes it is useful to do relabelling in the opposite sort of direction, “downgrading” specific MIME attachment labelling to application/octet-stream, the label for generic binary data. In particular, “downgrading” specific MIME labelling is often used in conjunction with the convert_octet_stream channel keyword on the mime_to_x400 channel (PMDF-X400) or xapi_local channel (PMDF-MB400) to force all binary MIME attachments to be converted to X.400 bodypart 14 format.

For instance, the combination of a CHARSET-CONVERSION mapping table such as

CHARSET-CONVERSION

    IN-CHAN=*;OUT-CHAN=mime_to_x400*;CONVERT Yes

and PMDF conversions file entries of

out-chan=mime_to_x400*; in-type=application; in-subtype=*;
   out-type=application; out-subtype=octet-stream; relabel=1
 
out-chan=mime_to_x400*; in-type=audio; in-subtype=*; 
   out-type=application; out-subtype=octet-stream; relabel=1 

out-chan=mime_to_x400*; in-type=image; in-subtype=*; 
   out-type=application; out-subtype=octet-stream; relabel=1 

out-chan=mime_to_x400*; in-type=video; in-subtype=*; 
   out-type=application; out-subtype=octet-stream; relabel=1

will result in downgrading various specific MIME attachment labelling to the generic application/octet-stream labelling (so that convert_octet_stream will apply) for all messages going to mime_to_x400* channels.

MacMIME Format Conversions

Macintosh files have two parts, a resource fork that contains Macintosh specific information, and a data fork that contains data usable on other platforms. This introduces an additional complexity when transporting Macintosh files, as there are four different formats in common use for transporting the Macintosh file parts. Three of the formats, Applesingle, Binhex, and Macbinary, consist of the Macintosh resource fork and Macintosh data fork encoded together in one piece. The fourth format, Appledouble, is a multipart format with the resource fork and data fork in separate parts. Appledouble is hence the format most likely to be useful on non-Macintosh platforms, as in this case the resource fork part may be ignored and the data fork part is available for use by non-Macintosh applications. But the other formats may be useful when sending specifically to Macintoshes.

The MTA can convert between these various Macintosh formats. The CHARSET-CONVERSION keywords Appledouble, Applesingle, Binhex, or Macbinary tell the MTA to convert other MacMIME structured parts to a MIME structure of multipart/appledouble, application/applefile, application/mac-binhex40, or application/macbinary, respectively. Further, the Binhex or Macbinary keywords also request conversion to the specified format of non-MacMIME format parts that do nevertheless contain X-MAC-TYPE and X-MAC-CREATOR parameters on the MIME Content-type: header. The CHARSET-CONVERSION keyword Block tells the MTA to extract just the data fork from MacMIME format parts, discarding the resource fork; (since this loses information, use of Appledouble instead is generally preferable).

For example, the following CHARSET-CONVERSION table would tell the MTA to convert to Appledouble format when delivering to the VMS MAIL mailbox or a GroupWise postoffice, and to convert to Macbinary format when delivering to the Message Router channel:

CHARSET-CONVERSION
   IN-CHAN=*;OUT-CHAN=l;CONVERT              Appledouble 
   IN-CHAN=*;OUT-CHAN=wpo_local;CONVERT      Appledouble 
   IN-CHAN=*;OUT-CHAN=tcp_internal;CONVERT   Macbinary

The conversion to Appledouble format would only be applied to parts already in one of the MacMIME formats. The conversion to Macbinary format would only be applied to parts already in one of the MacMIME formats, or non-MacMIME parts which included X-MAC-TYPE and X-MAC-CREATOR parameters on the MIME Content-type: header.

When doing conversion to Appledouble or Block format, the MAC-TO-MIME-CONTENT-TYPES mapping table may be used to indicate what specific MIME label to put on the data fork of the Appledouble part, or the Block part, depending on what the Macintosh creator and Macintosh type information in the original Macintosh file were. Probes for this table have the form format|type|creator|filename where format is one of SINGLE, BINHEX or MACBINARY, where type and creator are the Macintosh type and Macintosh creator information in hex, respectively, and where filename is the filename.

For example, to convert to Appledouble when sending to the ims-ms channel and when doing so to use specific MIME labels for any MS Word or PostScript documents converted from MACBINARY or BINHEX parts, appropriate tables might be:


CHARSET-CONVERSION 

  IN-CHAN=*;OUT-CHAN=ims-ms;CONVERT     Appledouble


MAC-TO-MIME-CONTENT-TYPES 

! PostScript 
    MACBINARY|45505346|76677264|*     APPLICATION/POSTSCRIPT$Y 
    BINHEX|45505346|76677264|*        APPLICATION/POSTSCRIPT$Y 
! Microsoft Word 
    MACBINARY|5744424E|4D535744|*     APPLICATION/MSWORD$Y 
    BINHEX|5744424E|4D535744|*        APPLICATION/MSWORD$Y

Note that the template (right hand side) of the mapping entry must have the $Y flag set in order for the specified labelling to be performed. Sample entries for additional types of attachments may be found in the file mac_mappings.sample in the MTA table directory.

If you wish to convert non-MacMIME format parts to Binhex or Macbinary format, such parts need to have X-MAC-TYPE and X-MAC-CREATOR MIME Content-type: parameter values provided. Note that MIME relabelling can be used to force such parameters onto parts that would not otherwise have them.

Service Conversions

The MTA’s conversion service facility may be used to process with site-supplied procedures a message so as to produce a new form of the message. Unlike either the sorts of CHARSET-CONVERSION operations discussed above or the conversion channel, which operate on the content of individual MIME message parts, conversion services operate on entire MIME message parts (MIME headers and content) as well as entire MIME messages. Also, unlike other CHARSET-CONVERSION operations or conversion channel operations, conversion services are expected to do their own MIME disassembly, decoding, re-encoding, and reassembly.

Like other CHARSET-CONVERSION operations, conversion services are enabled through the CHARSET-CONVERSION mapping table. If the first probe of the CHARSET-CONVESION mapping table yields a Yes or Always keyword, then the MTA will check for the presence of an MTA conversions file. If a conversions file exists, then the MTA will look in it for an entry specifying a SERVICE-COMMAND, and if it finds such an entry, execute it. The conversions file entries should have the form:


in-chan=channel-pattern; 
  in-type=type-pattern; in-subtype=subtype-pattern; 
  service-command=command

Of key interest is the command string. This is the command that should be executed to perform a service conversion (for example, invoke a document converter). The command must process an input file containing the message text to be serviced and produce as output a file containing the new message text. On UNIX, the command must exit with a 0 if successful and a non-zero value otherwise.

For instance, the combination of a CHARSET-CONVERSION table such as

CHARSET-CONVERSION

IN-CHAN=bsout_*;OUT-CHAN=*;CONVERT Yes

and an MTA conversions file entry on UNIX of


in-chan=bsout_*; in-type=*; in-subtype=*; 
service-command="/pmdf/bin/compress.sh compress $INPUT_FILE $OUTPUT_FILE"

will result in all messages coming from a BSOUT channel being compressed.

Environment variables are used to pass the names of the input and output files as well as the name of a file containing the list of the message's envelope recipient addresses. The names of these environment variables are:

The values of these three environment variables may be substituted into the command line by using standard command line substitution: that is, preceding the variable's name with a dollar character on UNIX. For example, when INPUT_FILE and OUTPUT_FILE have the values a.in and a.out, then the following declaration on UNIX:


in-chan=bsout_*; in-type=*; in-subtype=*; 
 service-command="/pmdf/bin/convert.sh $INPUT_FILE $OUTPUT_FILE"

executes the command

/pmdf/bin/convert.sh a.in a.out