If the first probe indicates that the message is to be reformatted, the MTA checks each part of the message. For each text part it finds, it uses that part's character set parameter to generate the second probe. The MTA performs the second probe only after it has checked and found that conversions may be needed. The input string in this second case looks like this:
IN-CHAN=in-channel;OUT-CHAN=out-channel;IN-CHARSET=in-char-set
The in-channel and out-channel are the same as before, and the in-char-set is the name of the character set associated with the particular part in question. If no match occurs for this second probe, no character set conversion is performed, although message reformatting (for example, changes to MIME structure) may still be performed in accordance with the keyword matched on the first probe. If a match does occur, it should produce a string of the form:
OUT-CHARSET=out-char-set
Here out-char-set specifies the name of the character set to which the in-char-set should be converted. Note that both of these character sets must be defined in the character set definition table, charsets.txt, located in the MTA table directory. No conversion will be done if the character sets are not properly defined in this file. This is not usually a problem since this file defines several hundred character sets; most of the character sets in use today are defined in this file. See the description of the imsimta chbuild (UNIX and NT) utility for further information on the charsets.txt file.
If all the conditions are met, the MTA will proceed to build the character set mapping and do the conversion. The converted message part will be relabelled with the name of the character set to which it was converted.
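The two-probe sequence described above can be sketched in Python. This is only an illustration under stated assumptions: the names TABLE, lookup, and target_charset are hypothetical, the standard-library fnmatch module stands in for the MTA's own mapping-pattern matching (which has richer syntax), and only a plain Yes result on the first probe is treated as a reason to issue the second probe.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory copy of a CHARSET-CONVERSION table as
# (pattern, template) pairs. Entries are searched top to bottom and
# the first matching pattern wins.
TABLE = [
    ("IN-CHAN=tcp_internal;OUT-CHAN=tcp_local;CONVERT", "Yes"),
    ("IN-CHAN=*;OUT-CHAN=*;CONVERT", "No"),
    ("IN-CHAN=tcp_internal;OUT-CHAN=tcp_local;IN-CHARSET=ISO-8859-1",
     "OUT-CHARSET=UTF-8"),
]


def lookup(probe, table=TABLE):
    """Return the template of the first entry whose pattern matches the probe."""
    for pattern, template in table:
        if fnmatchcase(probe, pattern):
            return template
    return None


def target_charset(in_chan, out_chan, in_charset, table=TABLE):
    """Return the charset a text part would be converted to, or None."""
    # First probe: should this message be examined at all?
    first = lookup(f"IN-CHAN={in_chan};OUT-CHAN={out_chan};CONVERT", table)
    if first is None or first.strip().lower() != "yes":
        return None  # the second probe is never issued
    # Second probe: one per text part, keyed on that part's charset.
    second = lookup(
        f"IN-CHAN={in_chan};OUT-CHAN={out_chan};IN-CHARSET={in_charset}", table
    )
    prefix = "OUT-CHARSET="
    if second is None or not second.startswith(prefix):
        return None  # no match: the part is left in its original charset
    return second[len(prefix):]
```

For example, target_charset("tcp_internal", "tcp_local", "ISO-8859-1") yields "UTF-8", while a part labelled with a charset that has no second-probe entry, or a channel pair whose first probe returns No, yields None.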
The charset-conversion mapping has been extended to provide several additional capabilities:
An IN-CHARSET option can be specified in the output template of a mapping entry. If present, this overrides the charset specified in the encoded-word.
A RELABEL-ONLY option that accepts an integer value of 0 or 1 can be specified. If this option has a value of 1, the OUT-CHARSET simply replaces the IN-CHARSET label; no conversion is done.
If the IN-CHARSET option is used to set the input charset to *, the charset will be “sniffed” to determine an appropriate label.
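The difference between an ordinary conversion and a relabel-only entry can be illustrated with a short Python sketch. It assumes that a RELABEL-ONLY value of 1 leaves the bytes of the part untouched and changes only the charset label. The function apply_entry is hypothetical, and Python's built-in codecs stand in for the MTA's own character set tables.

```python
def apply_entry(body: bytes, in_charset: str, out_charset: str,
                relabel_only: bool = False):
    """Return (new_body, new_label) for one text part.

    Hypothetical helper: models a matched entry whose output template
    names out_charset, with or without relabel-only behaviour.
    """
    if relabel_only:
        # Only the label changes; the bytes are passed through as-is.
        return body, out_charset
    # Ordinary conversion: decode from the input charset and re-encode.
    return body.decode(in_charset).encode(out_charset), out_charset


body = "café".encode("iso-8859-1")            # b'caf\xe9'
converted = apply_entry(body, "iso-8859-1", "utf-8")
relabelled = apply_entry(body, "iso-8859-1", "utf-8", relabel_only=True)
```

Here converted carries the re-encoded bytes b'caf\xc3\xa9', whereas relabelled still carries b'caf\xe9' under the new label. Relabelling is therefore only appropriate when the existing label is known to be wrong.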
Suppose that ISO-8859-1 is used locally, but this needs to be converted to UTF-8 for use on the Internet. In particular, suppose the connection to the Internet is via tcp_local, and that tcp_internal and ims-ms are where internal messages originate and are delivered. The CHARSET-CONVERSION table shown below brings such conversions about. Note that each IN-CHAN entry must be on a single line; where an entry is wrapped for display, a backslash (\) signifies the continuation.
CHARSET-CONVERSION

  IN-CHAN=tcp_internal;OUT-CHAN=tcp_local;CONVERT                Yes
  IN-CHAN=tcp_local;OUT-CHAN=tcp_internal;CONVERT                Yes
  IN-CHAN=tcp_local;OUT-CHAN=ims-ms;CONVERT                      Yes
  IN-CHAN=*;OUT-CHAN=*;CONVERT                                   No
  IN-CHAN=tcp_internal;OUT-CHAN=tcp_local;IN-CHARSET=ISO-8859-1  OUT-CHARSET=UTF-8
  IN-CHAN=tcp_local;OUT-CHAN=tcp_internal;IN-CHARSET=UTF-8       OUT-CHARSET=ISO-8859-1
  IN-CHAN=tcp_local;OUT-CHAN=ims-ms;IN-CHARSET=UTF-8             OUT-CHARSET=ISO-8859-1
The CHARSET-CONVERSION table shown below specifies a conversion between local usage of EUC-JP and the ISO 2022 based JP code.
CHARSET-CONVERSION

  IN-CHAN=ims-ms;OUT-CHAN=ims-ms;CONVERT                  No
  IN-CHAN=tcp_internal;OUT-CHAN=ims-ms;CONVERT            No
  IN-CHAN=tcp_internal;OUT-CHAN=tcp_internal;CONVERT      No
  IN-CHAN=tcp_internal;OUT-CHAN=*;CONVERT                 Yes
  IN-CHAN=*;OUT-CHAN=ims-ms;CONVERT                       Yes
  IN-CHAN=*;OUT-CHAN=tcp_internal;CONVERT                 Yes
  IN-CHAN=tcp_internal;OUT-CHAN=*;IN-CHARSET=EUC-JP       OUT-CHARSET=ISO-2022-JP
  IN-CHAN=*;OUT-CHAN=ims-ms;IN-CHARSET=ISO-2022-JP        OUT-CHARSET=EUC-JP
  IN-CHAN=*;OUT-CHAN=tcp_internal;IN-CHARSET=ISO-2022-JP  OUT-CHARSET=EUC-JP
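To see what the EUC-JP and ISO-2022-JP entries do to a message part, the following Python sketch performs the same pair of conversions on a short string. It is an illustration only: it uses Python's built-in euc_jp and iso2022_jp codecs rather than the MTA's character set definitions, and the function name convert and the sample text are assumptions.

```python
def convert(body: bytes, in_charset: str, out_charset: str) -> bytes:
    """Re-encode the bytes of a text part from one charset to another."""
    return body.decode(in_charset).encode(out_charset)


sample = "日本語"                       # arbitrary sample text
local = sample.encode("euc_jp")         # form used on the internal channels
outbound = convert(local, "euc_jp", "iso2022_jp")     # leaving tcp_internal
inbound = convert(outbound, "iso2022_jp", "euc_jp")   # arriving at ims-ms
```

The outbound form consists solely of 7-bit bytes, while the local EUC-JP form uses bytes with the high bit set. Converting back on the inbound side restores the original EUC-JP bytes.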