cconv - man pages section 3: Basic Library Functions


cconv (3C)

Name

cconv - per character sequence based code conversion function

Synopsis

#include <iconv.h>
size_t cconv(cconv_t cd, char *inbuf, size_t *inlen, char *outbuf,
        size_t *outlen);

Description

The cconv() function converts a character sequence from one codeset, in the array specified by inbuf, into a corresponding character sequence in another codeset, in the array specified by outbuf. The codesets are those specified in the cconv_open() call (optionally modified by its variant and flag arguments) that returned the conversion descriptor, cd. The inbuf argument points to the first byte of the input array, and the value pointed to by inlen is the number of bytes in that array to be converted. The outbuf argument points to the first available byte of the output array, and the value pointed to by outlen is the number of bytes available in that array.

Unlike iconv(3C), which performs buffer-based code conversion, cconv() performs per-character-sequence code conversion: each call converts only a single character sequence, the first one in inbuf, so every call ends at a character sequence boundary.
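The synopsis passes inbuf and outbuf by value, so cconv() cannot advance them itself. A caller that wants to convert a whole buffer can therefore call it repeatedly and advance both pointers by the change in the values pointed to by inlen and outlen. The following is a minimal sketch of that loop under this reading of the synopsis. Because cconv() is Solaris-specific, the sketch calls a hypothetical stand-in, toy_cconv(), with the same argument convention, which treats each input byte as a one-byte character sequence; toy_cconv() and convert_all() are illustrative names, not library interfaces.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * HYPOTHETICAL stand-in for cconv(3C) with the same argument convention.
 * Each call "converts" one character sequence, here a single byte copied
 * unchanged, and decrements *inlen and *outlen accordingly.
 */
static size_t
toy_cconv(void *cd, char *inbuf, size_t *inlen, char *outbuf, size_t *outlen)
{
    (void) cd;
    if (*inlen == 0)
        return (0);
    if (*outlen < 1) {
        errno = E2BIG;
        return ((size_t)-1);
    }
    *outbuf = *inbuf;
    (*inlen)--;
    (*outlen)--;
    return (0);
}

/*
 * Convert in[0..*inlen) into out[], one character sequence per call.
 * inbuf and outbuf are passed by value, so this loop advances them by
 * the change in *inlen and *outlen.  On return, *inlen and *outlen hold
 * the unconsumed input and unused output byte counts.
 * Returns 0, or -1 with errno set by the converter.
 */
static int
convert_all(void *cd, char *in, size_t *inlen, char *out, size_t *outlen)
{
    while (*inlen > 0) {
        size_t il_before = *inlen;
        size_t ol_before = *outlen;

        if (toy_cconv(cd, in, inlen, out, outlen) == (size_t)-1)
            return (-1);
        if (*inlen == il_before)
            break;                      /* no progress: stop, avoid looping */
        in += il_before - *inlen;       /* skip the consumed input bytes */
        out += ol_before - *outlen;     /* skip the bytes just written */
    }
    return (0);
}
```

On E2BIG the loop returns with *inlen still counting the unconverted bytes, so a caller can drain the output array and resume. A real caller would also finish with a reset call (inbuf set to a null pointer) to flush any saved state, as this section requires.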

A character sequence is one or more character codes that together form a character. For graphic characters, it may be a single character code or multiple character codes, such as the combining or conjoining character sequences defined by the current code conversion. It also includes designator sequences for state-dependent encodings, which may consist of one or more character codes. See the NOTES section below for definitions of these terms.

For state-dependent encodings, the conversion descriptor (cd) is placed into its initial shift state by a call for which inbuf is a null pointer. When cconv() is called in this way, and if outbuf is not a null pointer and the value pointed to by outlen is positive, cconv() will place into outbuf the byte sequence that returns the output to its initial shift state. If outbuf is not large enough to hold the entire reset sequence, cconv() will fail and set errno to E2BIG. Subsequent calls with inbuf other than a null pointer cause the conversion to take place from the current state of the conversion descriptor.
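The reset rules above can be modelled with a descriptor that remembers only whether it is in a shifted state. In the hypothetical sketch below, toy_reset() plays the role of a cconv() call with inbuf set to a null pointer: when the descriptor is shifted, it writes ESC ( B, the ISO-2022-JP sequence that returns to ASCII, and it fails with E2BIG when fewer than three bytes are available. The struct toy_cd type and toy_reset() are illustrative assumptions, not part of the cconv interface; leaving the state unchanged after E2BIG is this sketch's own choice, so that the caller can retry with a larger buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL conversion descriptor that tracks only the shift state. */
struct toy_cd {
    int shifted;    /* nonzero while double-byte (ESC $ B) mode is active */
};

/*
 * HYPOTHETICAL model of cconv(cd, NULL, ..., outbuf, outlen).
 * The reset sequence is written only when outbuf is not a null pointer
 * and *outlen is positive; otherwise the state is simply reset.
 */
static size_t
toy_reset(struct toy_cd *cd, char *outbuf, size_t *outlen)
{
    static const char to_ascii[] = { 0x1b, 0x28, 0x42 };   /* ESC ( B */

    if (cd->shifted && outbuf != NULL && outlen != NULL && *outlen > 0) {
        if (*outlen < sizeof (to_ascii)) {
            errno = E2BIG;      /* state left unchanged so the caller can retry */
            return ((size_t)-1);
        }
        memcpy(outbuf, to_ascii, sizeof (to_ascii));
        *outlen -= sizeof (to_ascii);
    }
    cd->shifted = 0;            /* back to the initial shift state */
    return (0);
}
```

A descriptor already in its initial shift state produces no bytes, so calling the reset unconditionally at the end of a conversion is harmless.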

If a sequence of input bytes does not form a valid character sequence in the specified codeset, the conversion will fail and set errno to EILSEQ. If the input array ends with an incomplete character or an incomplete shift sequence, the conversion will fail and set errno to EINVAL. If the output array is not large enough to hold the entire converted input character sequence, the conversion will fail and set errno to E2BIG. The value pointed to by inlen is decremented to reflect the number of bytes in inbuf not yet consumed by the code conversion, and the value pointed to by outlen is decremented to reflect the number of bytes still available in outbuf. For state-dependent encodings, the conversion descriptor is updated to reflect the shift state in effect at the end of the successfully converted character sequence. For code conversions that support combining or conjoining character sequences, if such a sequence is the last one in the input array and may be incomplete, the code conversion may consume some or all of its character codes and save them in the conversion descriptor as a new conversion state.
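Each failure leaves the caller something specific to do. The sketch below shows one possible policy, which is an assumption rather than part of the interface: treat EINVAL as "read more input", E2BIG as "drain the output array and retry", and everything else as fatal. When the input array is always passed up to its last filled byte, the bytes not yet consumed are the last *inlen bytes of the buffer, so the hypothetical carry_over() moves them to the front where the next read can complete the sequence. The enum and both function names are illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical recovery policy for a failed cconv() call. */
enum recovery {
    REC_FATAL,          /* EILSEQ, EBADF, ENOMEM, or anything unexpected */
    REC_READ_MORE,      /* EINVAL: incomplete sequence at end of input */
    REC_DRAIN_OUTPUT    /* E2BIG: output array is full */
};

static enum recovery
recovery_for(int err)
{
    switch (err) {
    case EINVAL:
        return (REC_READ_MORE);
    case E2BIG:
        return (REC_DRAIN_OUTPUT);
    default:
        return (REC_FATAL);
    }
}

/*
 * After an EINVAL failure, the bytes not consumed (the value left in
 * *inlen) are the last `remaining` bytes of buf[0..filled).  Move them
 * to the front so the next read can append the rest of the sequence.
 * Returns the new fill level of buf.
 */
static size_t
carry_over(char *buf, size_t filled, size_t remaining)
{
    memmove(buf, buf + (filled - remaining), remaining);
    return (remaining);
}
```

memmove() is used because the source and destination ranges can overlap when more than half of the buffer is still unconsumed.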

If cconv() encounters a character sequence in the input array that is legal but has no identical character in the target codeset, cconv() performs an implementation-defined conversion, that is, a non-identical conversion, on that character sequence.

cconv() may consume all the bytes in the input array without returning a failure and yet produce no converted bytes, or fewer characters than it consumed, in the output array. This can happen when the current code conversion supports combining or conjoining character sequences and the input array ends with a potentially incomplete sequence. For instance, suppose the current code conversion supports two combining character sequences: the base character 'A' followed by a combining acute accent, and the same base character 'A' followed by two combining marks, acute and dot below. If the input array contains only 'A' and a combining acute accent, the code conversion consumes both characters but may produce no converted bytes, because the next call could begin with dot below, which would yield different converted bytes. As with state-dependent encodings, an additional reset call is therefore required at the end of the conversion to force any bytes saved in the conversion descriptor into the output array. See the EXAMPLES section below for actual usage.
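The buffering behavior can be modelled with a descriptor that holds one pending sequence. The hypothetical toy_cconv() below converts UTF-32 (native byte order) to ISO8859-1 and supports only a base character optionally followed by COMBINING TILDE U+0303. It conservatively saves any sequence that reaches the end of the input array, as if the next call might extend it, and emits that sequence either on the next call or on the reset call (inbuf set to a null pointer). The names, the descriptor layout, and the set of composed characters are illustrative assumptions, not the Solaris implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COMBINING_TILDE 0x0303u

/* HYPOTHETICAL descriptor holding one saved, possibly incomplete, sequence. */
struct toy_cd {
    int has_pending;    /* nonzero if a sequence is saved */
    uint32_t base;      /* base character of the saved sequence */
    int tilde;          /* nonzero if U+0303 followed the base */
};

/* ISO8859-1 byte for base (plus tilde), or -1 if this sketch has none. */
static int
compose(uint32_t base, int tilde)
{
    if (!tilde)
        return (base <= 0xFF ? (int)base : -1);
    switch (base) {
    case 'a': return (0xE3);    /* LATIN SMALL LETTER A WITH TILDE */
    case 'n': return (0xF1);    /* LATIN SMALL LETTER N WITH TILDE */
    case 'o': return (0xF5);    /* LATIN SMALL LETTER O WITH TILDE */
    default: return (-1);
    }
}

/* Write the sequence held in *seq as one byte and clear it. */
static size_t
emit(struct toy_cd *seq, char *outbuf, size_t *outlen)
{
    int b = compose(seq->base, seq->tilde);

    if (b < 0 || *outlen < 1) {
        errno = (b < 0) ? EILSEQ : E2BIG;
        return ((size_t)-1);
    }
    *outbuf = (char)b;
    (*outlen)--;
    seq->has_pending = 0;
    return (0);
}

/* HYPOTHETICAL cconv()-style call: UTF-32 (native order) to ISO8859-1. */
static size_t
toy_cconv(struct toy_cd *cd, char *inbuf, size_t *inlen,
    char *outbuf, size_t *outlen)
{
    struct toy_cd seq = *cd;
    size_t n, used = 0;
    uint32_t u;

    if (inbuf == NULL) {        /* reset call: flush any saved sequence */
        if (cd->has_pending && outbuf != NULL && outlen != NULL)
            return (emit(cd, outbuf, outlen));
        cd->has_pending = 0;
        return (0);
    }
    n = *inlen / sizeof (u);    /* whole UTF-32 code units available */
    if (n == 0 && *inlen > 0) {
        errno = EINVAL;         /* only part of a code unit is left */
        return ((size_t)-1);
    }
    if (!seq.has_pending) {     /* start a new sequence */
        if (n == 0)
            return (0);
        memcpy(&u, inbuf, sizeof (u));
        if (u == COMBINING_TILDE) {
            errno = EILSEQ;     /* combining mark without a base */
            return ((size_t)-1);
        }
        seq.has_pending = 1;
        seq.base = u;
        seq.tilde = 0;
        used = 1;
    }
    if (!seq.tilde && used < n) {   /* extend the sequence with a tilde */
        memcpy(&u, inbuf + used * sizeof (u), sizeof (u));
        if (u == COMBINING_TILDE) {
            seq.tilde = 1;
            used++;
        }
    }
    if (used < n && emit(&seq, outbuf, outlen) == (size_t)-1)
        return ((size_t)-1);    /* *cd and *inlen are left unchanged */
    *cd = seq;                  /* still pending if it reached the input end */
    *inlen -= used * sizeof (u);
    return (0);
}
```

With the input 'a' U+0303, the first call consumes both code units and writes nothing; only the reset call writes 0xE3. This mirrors why the EXAMPLES section performs a reset after the conversion.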

The default conversion behavior mentioned above can be modified if one or more of the conversion behavior modification flag values are specified and such conversion behavior modifications are supported by the implementation of the corresponding cconv code conversion. For more detail, see cconv_open(3C) and cconvctl(3C).

Return Values

The cconv() function updates the values pointed to by the inlen and outlen arguments to reflect the extent of the conversion and returns the number of non-identical conversions performed. If all the bytes in the input array are converted, the value pointed to by inlen will be 0. If the conversion stops because of any of the error conditions described above, cconv() returns (size_t)-1 and sets errno to indicate the error.
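Because each successful call returns the number of non-identical conversions it performed, a caller that wants to know how lossy a whole conversion was can add up those return values. In the sketch below, the hypothetical toy_cconv() converts one UTF-32 code point per call to ISO8859-1 and substitutes '?' for code points above U+00FF; that replacement character is this sketch's own implementation-defined choice, not necessarily what any real cconv conversion does. convert_counting() is likewise an illustrative name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * HYPOTHETICAL stand-in for cconv(3C): converts one UTF-32 code point
 * (native byte order) per call to ISO8859-1.  A code point above U+00FF
 * has no identical character, so it writes '?' (this sketch's own
 * implementation-defined choice) and returns 1 non-identical conversion.
 */
static size_t
toy_cconv(void *cd, char *inbuf, size_t *inlen, char *outbuf, size_t *outlen)
{
    uint32_t c;

    (void) cd;
    if (*inlen < sizeof (c)) {
        errno = EINVAL;
        return ((size_t)-1);
    }
    if (*outlen < 1) {
        errno = E2BIG;
        return ((size_t)-1);
    }
    memcpy(&c, inbuf, sizeof (c));
    *inlen -= sizeof (c);
    (*outlen)--;
    *outbuf = (c > 0xFF) ? '?' : (char)c;
    return ((c > 0xFF) ? 1 : 0);
}

/*
 * Convert n code points, adding up the per-call return values in *lossy.
 * Returns 0, or -1 with errno set by the converter.
 */
static int
convert_counting(uint32_t *in, size_t n, char *out, size_t outlen,
    size_t *lossy)
{
    char *ip = (char *)in;
    size_t il = n * sizeof (uint32_t);
    size_t ol = outlen;

    *lossy = 0;
    while (il > 0) {
        size_t il_before = il, ol_before = ol;
        size_t r = toy_cconv(NULL, ip, &il, out, &ol);

        if (r == (size_t)-1)
            return (-1);
        *lossy += r;            /* non-identical conversions in this call */
        ip += il_before - il;
        out += ol_before - ol;
    }
    return (0);
}
```

The (size_t)-1 check must come before the addition, since the failure value is not a count.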

Errors

The cconv() function will fail if:

EILSEQ: Input conversion stopped due to an input byte that does not belong to the input codeset.
E2BIG: Input conversion stopped due to lack of space in the output array.
EINVAL: Input conversion stopped due to an incomplete character or shift sequence at the end of the input array.

The cconv() function may fail if:

EBADF: The cd argument is not a valid open conversion descriptor.
ENOMEM: Insufficient storage space is available.

Examples

Example 1 Code Conversion from UTF-32 to ISO8859-1

The following example converts the first UTF-32 character sequence in the input array, ib[], into an ISO8859-1 character and then flushes any bytes saved in the conversion descriptor with a reset call.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <errno.h>
    #include <iconv.h>

        :

    #define MY_BUFFER_SIZE 24

    uint32_t ib[MY_BUFFER_SIZE];
    char ob[MY_BUFFER_SIZE];
    size_t il;
    size_t ol;
    size_t i;
    size_t ret;
    int err;
    cconv_t cd;

        :

    /*
     * As an example, initialize the input array, ib[], with two
     * Unicode characters, 'a' U+0061 and COMBINING TILDE U+0303.
     */
    ib[0] = 0x61;
    ib[1] = 0x303;
    il = sizeof (uint32_t) * 2;
    ol = MY_BUFFER_SIZE;

    cd = cconv_open("ISO8859-1", 0, "UTF-32", 0, 0);
    if (cd == (cconv_t)-1) {
        /* Code conversion not supported? */
        return (-1);
    }

        :

    /* Do the conversion; inbuf is a byte pointer, so cast ib[]. */
    ret = cconv(cd, (char *)ib, &il, ob, &ol);
    if (ret == (size_t)-1) {
        /* Save errno before cconv_close() can change it. */
        err = errno;
        (void) cconv_close(cd);

        /* Illegal character or bad conversion descriptor? */
        if (err == EILSEQ || err == EBADF)
            return (-2);

        /* Output array too small? */
        if (err == E2BIG)
            return (-3);

        /* Incomplete character sequence at the end of ib[], or other error. */
        return (-4);
    }

    /* Print any bytes produced; there may be none if bytes were saved. */
    for (i = 0; i < MY_BUFFER_SIZE - ol; i++)
        printf("Converted byte value = %d\n", (unsigned char)ob[i]);

    /*
     * Make sure to flush any character bytes possibly stored in
     * the conversion descriptor by doing a reset operation.
     */
    ol = MY_BUFFER_SIZE;
    ret = cconv(cd, NULL, &il, ob, &ol);
    if (ret != (size_t)-1 && ol < MY_BUFFER_SIZE)
        for (i = 0; i < MY_BUFFER_SIZE - ol; i++)
            printf("Converted byte value = %d\n", (unsigned char)ob[i]);

        :

    (void) cconv_close(cd);

Files

/usr/lib/iconv/*.bt: cconv code conversion binary table files for iconv(1), cconv(3C), and iconv(3C).

Attributes

See attributes(7) for descriptions of the following attributes:

ATTRIBUTE TYPE	ATTRIBUTE VALUE
Interface Stability	Committed
MT-Level	MT-Safe

Notes

A combining character sequence is a sequence of multiple characters starting with a base character followed by one or more combining mark characters and, optionally, control characters governing combining behavior. For instance, a base character 'A' followed by a combining acute accent and then another combining mark, dot above, is a combining character sequence.
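The definition above determines where the first character sequence in the input array ends, which is the unit cconv() converts per call for conversions that support combining sequences. The sketch below computes the length, in code points, of the first combining character sequence in a UTF-32 array. It is a deliberate simplification and an assumption of this sketch: only the Combining Diacritical Marks block, U+0300 through U+036F, counts as combining marks, and control characters and conjoining scripts are ignored. Actual code conversions define their own sequences, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplification (assumption of this sketch): only the Combining
 * Diacritical Marks block, U+0300..U+036F, is treated as combining marks.
 */
static int
is_combining_mark(uint32_t c)
{
    return (c >= 0x0300 && c <= 0x036F);
}

/* Number of code points in the combining sequence at the start of s[0..n). */
static size_t
first_sequence_length(const uint32_t *s, size_t n)
{
    size_t len;

    if (n == 0)
        return (0);
    /* The base character, then every combining mark that follows it. */
    for (len = 1; len < n && is_combining_mark(s[len]); len++)
        ;
    return (len);
}
```

For 'A' U+0301 (combining acute accent) U+0307 (combining dot above) 'B', the first sequence is three code points long; 'B' starts the next one.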

A conjoining character sequence, used in some Asian scripts, is a sequence of multiple characters starting with an initial character followed by one or more conjoining characters, combining characters, and, optionally, control characters governing conjoining or combining behavior.

A designator sequence for state-dependent encodings is a sequence of one or more characters that changes the state of the encoding. For instance, in the ISO-2022-JP encoding, the designator (also known as an escape sequence) ESC $ B (that is, 0x1b 0x24 0x42) indicates that the bytes following it are Japanese double-byte characters in a 7-bit encoding.
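The effect of designators can be seen by counting the characters in a short ISO-2022-JP byte string: a designator changes how the following bytes are grouped, but is not itself a graphic character. The hypothetical count_iso2022jp_chars() below starts in the ASCII state, recognizes only ESC ( B (ASCII) and ESC $ B (two bytes per character), and treats every other escape sequence as an error; it is an illustration, not a complete ISO-2022-JP decoder.

```c
#include <stddef.h>

/*
 * HYPOTHETICAL helper: count graphic characters in an ISO-2022-JP byte
 * string, tracking only ESC ( B (ASCII, one byte per character) and
 * ESC $ B (double-byte, two bytes per character).  Returns -1 for any
 * other escape sequence or for a truncated double-byte character.
 */
static long
count_iso2022jp_chars(const unsigned char *p, size_t n)
{
    int two_byte = 0;           /* initial shift state: ASCII */
    long count = 0;
    size_t i = 0;

    while (i < n) {
        if (p[i] == 0x1b) {
            if (i + 2 < n && p[i + 1] == '(' && p[i + 2] == 'B')
                two_byte = 0;
            else if (i + 2 < n && p[i + 1] == '$' && p[i + 2] == 'B')
                two_byte = 1;
            else
                return (-1);
            i += 3;             /* designator: changes state, no character */
        } else if (two_byte) {
            if (i + 1 >= n)
                return (-1);    /* incomplete double-byte character */
            i += 2;
            count++;
        } else {
            i++;
            count++;
        }
    }
    return (count);
}
```

For the bytes ESC $ B 0x30 0x21 ESC ( B 'A', the function counts two characters: one double-byte character and one ASCII character, while the two designators contribute none.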
