The MTA has powerful facilities for parsing and decoding single and multipart messages formatted using the MIME Internet messaging format. Additionally, these facilities can convert messages with other formats to MIME, for example, text parts with BINHEX or UUENCODE data, the RFC 1154 format, and many other proprietary formats. The mtaDecodeMessage() routine provides access to these facilities, parsing either a queued message or a message from an arbitrary source such as a disk file or a data stream.
There are two usage modes for mtaDecodeMessage(). In the first mode, messages are simply parsed, any encoded content decoded, and each resulting, atomic message part presented to an inspection routine. This mode of usage is primarily of use to channels that interface the MTA to non-Internet mail systems such as SMS and X.400. The second mode of operation allows the message to be rewritten after inspection by an output routine. The output destination for this rewriting may be either the MTA channel queues, or an arbitrary destination via a caller-supplied output routine.
During the inspection process in this second usage mode, individual, atomic message parts may be discarded or replaced with text. This operational mode is primarily of use to intermediate processing channels that need to scan message content or perform content conversions, for example, virus scanners and encryption software.
Example 5–1 illustrates the first usage mode, while Example 5–2 illustrates the second.
Key to either usage mode for mtaDecodeMessage() is the inspection routine, pointed to with the inspect argument. The mtaDecodeMessage() routine presents each atomic message part to the inspection routine one line at a time. The presentation begins with the part’s header lines. Once all of the header lines have been presented, the lines of content are presented next. The following points should also be noted:
- Message parts need not have any content. A common example is a single part message with no content for which the sender used the Subject: header line to express their message.
- In the case of a non-multipart message, the message has a single part. The header for this sole part is the header for the message itself. As noted previously, there may or may not be any content to this single part.
- In the case of a multipart message, individual parts need not have a part header. In such cases, MIME defaults apply and imply that the content is text/plain using the US-ASCII character set.
- Regardless of the value of the Content-transfer-encoding: header line, the content presented will no longer be encoded.
- In the case of a multipart message, the outermost header is not presented. However, it may be inspected by means of an output routine. For a discussion of the output routine, see Output Routine that follows.
The following code fragment shows the required syntax of an inspection routine:
int inspection_routine(void *ctx, mta_decode_t *dctx, int data_type, const char *data, size_t data_len);
The following table lists each of the inspection routine’s arguments, and gives a description of each.
When an output routine is not used, the inspection routine can detect the transition from one message part to another by observing the part number on each call. The part number is obtained by calling mtaDecodeMessageInfoInt() with an item value of MTA_DECODE_PART_NUMBER.
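The part-number technique described above can be sketched as a small helper that remembers the last part number seen and reports when it changes. This is only an illustration: `part_tracker_t`, `part_tracker_init()`, and `part_tracker_update()` are hypothetical names that are not part of the SDK, and in a real inspection routine the `part_number` argument would be the value reported for MTA_DECODE_PART_NUMBER on that call.

```c
#include <stddef.h>

/* Hypothetical helper state (not part of the SDK): remembers the part
 * number seen on the previous call to the inspection routine. */
typedef struct {
    int last_part;   /* -1 until the first part has been seen */
    int parts_seen;  /* number of distinct parts observed so far */
} part_tracker_t;

void part_tracker_init(part_tracker_t *t)
{
    t->last_part = -1;
    t->parts_seen = 0;
}

/* Returns 1 when part_number differs from the one seen on the previous
 * call (a new part has started), and 0 otherwise. */
int part_tracker_update(part_tracker_t *t, int part_number)
{
    if (part_number == t->last_part)
        return 0;
    t->last_part = part_number;
    t->parts_seen++;
    return 1;
}
```

An inspection routine could keep one such tracker in its private context (the ctx argument) and call the update helper at the top of every invocation, for example to flush per-part buffers whenever it returns 1.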
When the optional output routine (pointed to by the output argument) is used, an additional data type, MTA_DATA_NONE, is presented to the inspection routine. It is presented after the part’s header and entire content have been presented. However, no data is actually presented for the MTA_DATA_NONE type. As such, this data type merely serves to (1) let the inspection routine know that the entire part has now been presented, and (2) allow the inspection routine a final chance to delete the part from the data being output to the output routine. For example, it allows a virus scanner to be activated and a judgment passed. Based upon the result of the virus scan, the part can then either be copied to the output or not.
If it is not copying the part to the output, the inspection routine must call mtaDecodeMessagePartDelete(). That call will either delete the part entirely, or optionally replace it with caller-supplied content. Calling mtaDecodeMessagePartCopy() makes the copy operation explicit; if neither routine is called, then the part will be implicitly copied to the message being output.
When using an output routine, the inspection routine may call mtaDecodeMessagePartDelete() or mtaDecodeMessagePartCopy() at any time. It is not necessary to wait until the inspection routine is called with a data type of MTA_DATA_NONE. For instance, a virus scanner may want to discard a part when it sees that the part’s content type indicates an executable program. However, once either of these routines is called, the inspection routine will not be called any further for that message part.
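The early-discard case mentioned above (dropping a part as soon as its content type looks like an executable) might be driven by a policy check along the following lines. This is a sketch under stated assumptions: `ieq()` and `should_discard_early()` are hypothetical helpers, and the list of blocked subtypes is an arbitrary example policy, not something defined by the SDK. The two strings passed in would be the major type and subtype reported for the part (the MTA_DECODE_CTYPE and MTA_DECODE_CSUBTYPE items).

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string equality; MIME type names are not
 * case-sensitive. Hypothetical helper, not part of the SDK. */
int ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Example policy (an assumption for illustration): returns 1 when the
 * part's content type suggests an executable program, 0 otherwise. */
int should_discard_early(const char *ctype, const char *csubtype)
{
    static const char *const blocked[] = {
        "x-msdownload", "x-msdos-program", "x-executable"
    };
    size_t i;

    if (!ieq(ctype, "application"))
        return 0;
    for (i = 0; i < sizeof blocked / sizeof blocked[0]; i++) {
        if (ieq(csubtype, blocked[i]))
            return 1;
    }
    return 0;
}
```

When such a check returns 1, the inspection routine would call mtaDecodeMessagePartDelete() right away rather than waiting for MTA_DATA_NONE; otherwise it would keep collecting the part and make its final decision once MTA_DATA_NONE is presented.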
The message to be decoded is supplied by either a dequeue context or a caller-supplied input routine. When using a dequeue context, observe the following points:
- Specify MTA_DECODE_DQ for the input_type call argument.
- Pass the dequeue context from mtaDequeueStart() as the input argument.
- The recipient list of the message being dequeued must have already been read by mtaDequeueRecipientNext() before calling mtaDecodeMessage().
- mtaDequeueMessageFinish() must not yet have been called for the dequeue context.
- After using a dequeue context with mtaDecodeMessage(), no further calls to mtaDequeueRecipientNext() can be made.
- Calls to mtaDequeueLineNext() can only be performed after a call to mtaDequeueRewind().
When using a caller-supplied input routine to supply the message to be decoded, specify MTA_DECODE_PROC for the input_type argument, and pass the address of the input routine as the input argument.
The following code fragment shows the syntax of a caller-supplied input routine:
int input_routine(void *ctx, const char **line, size_t *line_len);
The following table lists the arguments for a caller-supplied input routine, and gives a description of each.
| Arguments | Description |
|---|---|
| ctx | The caller-supplied private context. |
| line | A pointer to the start of the next line or section of data to return. The line or data does not need to be NULL terminated. |
| line_len | The length in bytes of the line or block of data being returned. A zero length signifies zero bytes of data. That is, a zero length does not cause mtaDecodeMessage() to automatically determine the length by searching for a NULL terminator. |
On each successful call, the input routine should return a status code of 0 (MTA_OK). When there is no more message data to provide, then the input routine should return MTA_EOF. The call that returns the last byte of data should return 0; it is the subsequent call that must return MTA_EOF. In the event of an error, the input routine should return a non-zero status code other than MTA_EOF, such as MTA_NO. This will terminate the message parsing process and mtaDecodeMessage() will return an error.
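The return-status rules above can be illustrated with a minimal in-memory input routine that hands back one line per call. This is a sketch, not SDK code: `line_source_t` and `line_input_routine()` are hypothetical names, and the `SKETCH_*` constants are local stand-ins so that the fragment compiles without the SDK header. Only the value 0 for success comes from the text; the values used for end-of-data and error are assumptions, and real code would return MTA_OK, MTA_EOF, and MTA_NO.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in status codes for this sketch only. */
#define SKETCH_OK  0   /* MTA_OK is 0 per the text above */
#define SKETCH_EOF 1   /* assumed value; use MTA_EOF in real code */
#define SKETCH_NO  2   /* assumed value; use MTA_NO in real code */

/* Hypothetical private context: message lines held in memory,
 * each stored without a line terminator. */
typedef struct {
    const char **lines;
    size_t count;
    size_t next;   /* index of the next line to hand out */
} line_source_t;

int line_input_routine(void *ctx, const char **line, size_t *line_len)
{
    line_source_t *src = (line_source_t *)ctx;

    if (src == NULL || line == NULL || line_len == NULL)
        return SKETCH_NO;            /* error: parsing would terminate */
    if (src->next >= src->count)
        return SKETCH_EOF;           /* only after the last line was returned */

    *line = src->lines[src->next];
    *line_len = strlen(*line);       /* 0 here means a genuinely empty line */
    src->next++;
    return SKETCH_OK;
}
```

Note that the call returning the final line still reports success; end-of-data is signalled only on the call after it, which matches the sequencing described above.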
By default, each block of data must be a single line of the message. This corresponds to the MTA_TERM_NONE item code. If the MTA_TERM_CR, MTA_TERM_CRLF, MTA_TERM_LF, or MTA_TERM_LFCR item code is specified, then the block of data need not correspond to a single, complete line of message data. It may be a portion of a line, multiple lines, or even the entire message. See Item Codes for information about mtaDecodeMessage() item codes.
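When one of the terminator item codes is in effect, the input routine no longer has to find line boundaries itself. A minimal sketch of such a block-oriented routine follows; it returns fixed-size slices of an in-memory message whose lines end in line feeds, so it would be paired with MTA_TERM_LF. The names `block_source_t` and `block_input_routine()` are hypothetical, and the `SKETCH_*` status values are local stand-ins (the end-of-data value is an assumption; real code would return MTA_EOF).

```c
#include <stddef.h>

/* Stand-in status codes for this sketch only. */
#define SKETCH_OK  0   /* MTA_OK is 0 per the text */
#define SKETCH_EOF 1   /* assumed value; use MTA_EOF in real code */

/* Hypothetical private context: the whole message in one buffer. */
typedef struct {
    const char *buf;   /* message data, lines separated by '\n' */
    size_t len;        /* total number of bytes in buf */
    size_t pos;        /* offset of the next unread byte */
    size_t chunk;      /* maximum bytes per call; 0 means "all remaining" */
} block_source_t;

int block_input_routine(void *ctx, const char **line, size_t *line_len)
{
    block_source_t *src = (block_source_t *)ctx;
    size_t remaining, n;

    if (src->pos >= src->len)
        return SKETCH_EOF;

    remaining = src->len - src->pos;
    n = (src->chunk == 0 || src->chunk > remaining) ? remaining : src->chunk;

    /* The slice may end mid-line or span several lines; splitting into
     * lines is left to the decoder when a terminator item code is set. */
    *line = src->buf + src->pos;
    *line_len = n;
    src->pos += n;
    return SKETCH_OK;
}
```

Setting `chunk` to 0 makes the routine return the entire message in a single call, which is the "entire message" case mentioned above.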
The parsed message may be output either as a message enqueue or written to an arbitrary destination via a caller-supplied output routine. When using a message enqueue context, observe the following points:
- Specify MTA_DECODE_NQ for the output_type call argument.
- Pass the enqueue context from mtaEnqueueStart() as the output argument.
- Specification of the message’s recipient list must have already been completed with mtaEnqueueTo() before calling mtaDecodeMessage().
- mtaEnqueueFinish() must not yet have been called for the enqueue context.
- After the call to mtaDecodeMessage() has completed successfully, complete the message enqueue with mtaEnqueueFinish().
- In the event of an error, the message submission should be cancelled with mtaEnqueueFinish().
- mtaDecodeMessage() will write the entire message header and content. There is no need for the caller to write anything to the message’s header or content.
To use a caller-supplied output routine, specify MTA_DECODE_PROC for the output_type call argument, and pass the address of the output routine as the output argument.
This code fragment shows the syntax of a caller-supplied output routine.
int output_routine(void *ctx, mta_decode_t *dctx, const char **line, size_t *line_len);
The following table lists the arguments for a caller-supplied output routine, and gives a description of each.
| Arguments | Description |
|---|---|
| ctx | The caller-supplied private context passed as ctx to mtaDecodeMessage(). |
| dctx | A decode context created by mtaDecodeMessage(). This decode context should be used with calls to the other decode routines requiring a decode context. This context is automatically disposed of by mtaDecodeMessage(). |
| line | Pointer to a line of the message to output. This line is not NULL terminated. The line will also lack any carriage return or line feed record terminators. |
| line_len | The length in bytes of the message line to output. A length of 0 indicates a blank line. The maximum line length presented will be BIGALFA_SIZE bytes (1024 bytes). |
Each line passed to the output routine represents a complete line of the message to be output. The output routine must add to the line any line terminators required by the output destination (for example, carriage return, line feed pairs if transmitting over the SMTP protocol, or line feed terminators if writing to a UNIX® text file). Supplying a value of zero for the output_type call argument causes the output argument to be ignored. In this case no output routine will be used.
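Because lines arrive without terminators and without a trailing NULL, an output routine typically copies exactly line_len bytes and then appends whatever terminator its destination needs. The following is a minimal sketch of that step for a destination that wants carriage return, line feed pairs. It deliberately shows only the line-writing helper an output routine might call, not the output routine itself; `sink_t` and `sink_put_line_crlf()` are hypothetical names, and the in-memory buffer stands in for a socket or file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory destination used in place of a socket or file. */
typedef struct {
    char *buf;     /* caller-provided storage */
    size_t cap;    /* size of buf in bytes */
    size_t used;   /* bytes written so far (always <= cap) */
} sink_t;

/* Appends one message line followed by CRLF.
 * Returns 0 on success, or -1 if the line plus terminator does not fit. */
int sink_put_line_crlf(sink_t *s, const char *line, size_t line_len)
{
    size_t room = s->cap - s->used;

    if (line_len > room || room - line_len < 2)
        return -1;

    /* The line is not NULL terminated, so copy by length, never strcpy. */
    if (line_len > 0)
        memcpy(s->buf + s->used, line, line_len);
    s->used += line_len;

    /* A zero-length line is a blank line: only the terminator is written. */
    s->buf[s->used++] = '\r';
    s->buf[s->used++] = '\n';
    return 0;
}
```

A destination that expects UNIX-style text would append a single line feed instead. On failure, an output routine built on this helper would return a non-zero status so that the caller can see that output did not complete.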
When mtaDecodeMessage() calls either a caller-supplied inspection or output routine, it passes to those routines a decode context. Through various SDK routine calls, this decode context may be queried to obtain information about the message part currently being processed.
The following table lists the informational message codes that can be obtained about a message part being processed, and gives a description of each, including the SDK routine used to obtain it.
| Message Code | Description |
|---|---|
| MTA_DECODE_CCHARSET | The character set specified with the CHARSET parameter of the part’s Content-type: header line. If the part lacks a CHARSET specification, then the value us-ascii will be returned. Obtain with mtaDecodeMessageInfoString(). |
| MTA_DECODE_CDISP | Value of the Content-disposition: header line, less any optional parameters. Will be a zero length string if the part lacks a Content-disposition: header line. Obtain with mtaDecodeMessageInfoString(). |
| MTA_DECODE_CDISP_PARAMS | Parameter list to the Content-disposition: header line, if any. The parsed list is returned as a pointer to an option context. For further information, see mtaDecodeMessageInfoParams(). |
| MTA_DECODE_CSUBTYPE | The content subtype specified with the part’s Content-type: header line (for example, plain for text/plain, gif for image/gif). Defaults to plain when the part lacks a Content-type: header line. Obtain with mtaDecodeMessageInfoString(). |
| MTA_DECODE_CTYPE | The major content type specified with the part’s Content-type: header line (for example, text for text/plain, image for image/gif). Defaults to text when the part lacks a Content-type: header line. Obtain with mtaDecodeMessageInfoString(). |
| MTA_DECODE_CTYPE_PARAMS | Parameter list to the Content-type: header line, if any. The parsed list is returned as a pointer to an option context. For further information, see mtaDecodeMessageInfoParams(). |
| MTA_DECODE_DTYPE | Data type associated with this part. Obtain with mtaDecodeMessageInfoInt(). |
| MTA_DECODE_PART_NUMBER | Sequential part number for the current part. The first message part is part 0, the second part is 1, the third part is 2, and so on. Obtain with mtaDecodeMessageInfoInt(). |
The table that follows lists the item codes for the item_code argument passed to mtaDecodeMessage(). The list of item codes must be terminated with an item code with a value of 0.
| Item Codes | Additional Arguments | Description |
|---|---|---|
| MTA_DECODE_LEVELS_MAX | max_levels | Place an upper limit on the depth of nested MIME multiparts that will be parsed. When this limit is reached no further parsing of deeper, nested multiparts is performed and the parts handed over for inspection include as text content these deeper, nested multiparts. By default, no limit is imposed. When dealing with looping notification messages, it is possible for the looping message to become deeply nested. This item code must be followed by one additional call argument whose value is the integer-valued upper limit to impose: max_levels. |
| MTA_DECODE_PARTS_MAX | max_parts | Place an upper limit on the total number of message parts that will be parsed. When this limit is reached, no further parsing of parts is performed. By default, no limit is imposed. This item code must be followed by one additional call argument whose value is the integer-valued part limit to impose: max_parts. |
| MTA_DECODE_THRURMAN | None | When specified, the MIME parser will first translate non-MIME formatted data to MIME. By default this translation is not performed. |
| MTA_ITEM_LIST | mta_item_list_t *item_list | Specify a pointer to an item list array. The item list array must be terminated with a final array entry with an item code value of 0. For further information on item lists, see Item Codes and Item Lists. |
| MTA_TERM_CR | None | Data supplied by the input routine, pointed to by the input argument, uses a single byte carriage return terminator to terminate each line of message data. This option is ignored when input_type has the value MTA_DECODE_DQ. |
| MTA_TERM_CRLF | None | Data supplied by the input routine, pointed to by the input argument, uses a two byte carriage-return line-feed terminator to terminate each line of message data. This option is ignored when input_type has the value MTA_DECODE_DQ. |
| MTA_TERM_LF | None | Data supplied by the input routine, pointed to by the input argument, uses a single byte line-feed terminator to terminate each line of message data. This option is ignored when input_type has the value MTA_DECODE_DQ. |
| MTA_TERM_LFCR | None | Data supplied by the input routine, pointed to by the input argument, uses a two byte line-feed carriage-return terminator to terminate each line of message data. This option is ignored when input_type has the value MTA_DECODE_DQ. |
| MTA_TERM_NONE | None | Data supplied by the input routine, pointed to by the input argument, uses no line terminators. Each call to the input routine returns a single, complete line of message data. This option is ignored when input_type has the value MTA_DECODE_DQ. |
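To make the calling convention for item codes concrete, the sketch below walks a zero-terminated variadic list in which some codes carry one extra integer argument and others carry none. It is purely illustrative and says nothing about how mtaDecodeMessage() is implemented: `collect_items()`, `decode_opts_t`, and the `ITEM_*` constants with their numeric values are all hypothetical stand-ins chosen for this example, mirroring the roles of MTA_DECODE_LEVELS_MAX, MTA_DECODE_PARTS_MAX, and MTA_TERM_LF.

```c
#include <stdarg.h>

/* Hypothetical item codes; the values are chosen for this sketch only. */
enum {
    ITEM_END        = 0,  /* list terminator, as required by the text */
    ITEM_LEVELS_MAX = 1,  /* followed by one int argument */
    ITEM_PARTS_MAX  = 2,  /* followed by one int argument */
    ITEM_TERM_LF    = 3   /* no additional argument */
};

typedef struct {
    int max_levels;  /* -1 means no limit */
    int max_parts;   /* -1 means no limit */
    int term_lf;     /* 1 if the line-feed terminator flag was given */
} decode_opts_t;

/* Walks the item code list until the 0 terminator is reached.
 * Returns 0 on success, or -1 if an unknown item code is found. */
int collect_items(decode_opts_t *o, int item_code, ...)
{
    va_list ap;
    int rc = 0;

    o->max_levels = -1;
    o->max_parts = -1;
    o->term_lf = 0;

    va_start(ap, item_code);
    while (item_code != ITEM_END && rc == 0) {
        switch (item_code) {
        case ITEM_LEVELS_MAX: o->max_levels = va_arg(ap, int); break;
        case ITEM_PARTS_MAX:  o->max_parts  = va_arg(ap, int); break;
        case ITEM_TERM_LF:    o->term_lf    = 1;               break;
        default:              rc = -1;                         break;
        }
        if (rc == 0)
            item_code = va_arg(ap, int);  /* fetch the next item code */
    }
    va_end(ap);
    return rc;
}
```

A call such as `collect_items(&opts, ITEM_LEVELS_MAX, 5, ITEM_TERM_LF, ITEM_END)` shows why the terminating 0 matters: a variadic callee has no other way to know where the argument list stops, so omitting it would cause reads past the supplied arguments.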