When supporting multibyte languages, it is important to understand the difference between multibyte, wide and Unicode characters, and the impact of these on software development.
In the Solaris operating environment, a multibyte character (or file code) is a sequence of one or more bytes that represents a single character. A multibyte string is a sequence of such characters terminated by a null byte, so a single string may contain characters of different lengths. A wide character (or process code), on the other hand, is a fixed-size object; in the Solaris operating environment, a wide character (wchar_t) is four bytes long. The Solaris operating environment also supports the Unicode UTF-8 format, which is itself a variable-length multibyte encoding.
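As a minimal sketch of the difference, the C fragment below compares the byte length of a multibyte string with the number of characters it contains. It assumes the program has already called setlocale(LC_ALL, "") and runs under a UTF-8 locale. The names mb_char_count() and describe_string() are hypothetical. The code relies on the POSIX behavior of mbstowcs(), which returns the required length without storing anything when the destination is a null pointer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of characters in a null-terminated
 * multibyte string under the current locale, or (size_t)-1 if the
 * string contains an invalid sequence. A NULL destination asks
 * mbstowcs() only for the count (POSIX behavior). */
size_t mb_char_count(const char *mbs)
{
    return mbstowcs(NULL, mbs, 0);
}

/* Hypothetical helper: print bytes (file code) versus characters,
 * plus the fixed size of one wide character (process code). */
void describe_string(const char *mbs)
{
    size_t chars = mb_char_count(mbs);
    if (chars == (size_t)-1) {
        printf("invalid multibyte sequence\n");
        return;
    }
    printf("%zu bytes, %zu characters, %zu bytes per wide character\n",
           strlen(mbs), chars, sizeof(wchar_t));
}
```

Under a UTF-8 locale, describe_string("caf\xc3\xa9") should report 5 bytes but 4 characters. sizeof(wchar_t) is 4 on Solaris, but the C standard does not fix it, so other platforms may differ.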
In many cases, a program does not need to distinguish double-byte (or three-byte) characters from single-byte characters. It is simpler to convert multibyte strings (file code) to wide-character strings (process code) before manipulating or processing text data, because every wide character then occupies the same number of bytes.
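A minimal sketch of the "convert first, then process" approach follows. The hypothetical helper to_wide() converts a whole multibyte string into a newly allocated wide-character string. It assumes setlocale() has already selected the desired locale, and it uses the POSIX null-destination form of mbstowcs() to size the buffer.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a null-terminated multibyte string
 * (file code) into a newly allocated wide-character string (process
 * code) using the current locale. Returns NULL on an invalid sequence
 * or allocation failure; the caller frees the result. */
wchar_t *to_wide(const char *mbs)
{
    size_t n = mbstowcs(NULL, mbs, 0);     /* characters, excluding L'\0' */
    if (n == (size_t)-1)
        return NULL;                       /* invalid multibyte sequence */

    wchar_t *wcs = malloc((n + 1) * sizeof *wcs);
    if (wcs == NULL)
        return NULL;

    mbstowcs(wcs, mbs, n + 1);             /* n + 1 so L'\0' is stored too */
    return wcs;
}
```

After this conversion, wcs[i] is the i-th character regardless of how many bytes it occupied in the file.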
The following APIs convert multibyte characters to wide characters:
mbstowcs(): Convert multibyte string to wide-character string
mbtowc(): Convert a multibyte character to a wide-character code
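mbtowc() decodes one character at a time and returns the number of bytes that character used. The sketch below shows that return value in action. The hypothetical count_multibyte_chars() counts the characters that occupy more than one byte, which is exactly the per-character bookkeeping that converting to wide characters avoids. It assumes the locale has already been set with setlocale().

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in a multibyte string that
 * occupy more than one byte, decoding one character at a time with
 * mbtowc(). Returns -1 on an invalid or incomplete sequence. */
int count_multibyte_chars(const char *mbs)
{
    size_t remaining = strlen(mbs);
    int count = 0;
    wchar_t wc;                            /* receives each decoded character */

    mbtowc(NULL, NULL, 0);                 /* reset shift state for stateful encodings */
    while (remaining > 0) {
        int len = mbtowc(&wc, mbs, remaining);
        if (len < 0)
            return -1;                     /* invalid or incomplete sequence */
        if (len > 1)
            count++;                       /* this character used several bytes */
        mbs += len;
        remaining -= (size_t)len;
    }
    return count;
}
```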
The following wstring(3C) APIs process wide-character strings:
wcscmp(): Compare wide-character strings
wcscpy(): Copy wide-character strings
wcslen(): Get length of wide-character string
wcschr(): Find character in wide-character string
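Once text is in process code, these functions operate on whole characters. In the minimal sketch below, the hypothetical helper wcs_after() uses wcschr(), wcslen() and wcscpy() to return a copy of the text that follows a separator. Because the search runs over wide characters, it cannot match a byte in the middle of a multibyte character. In Shift_JIS, for example, the second byte of a two-byte character can equal the ASCII backslash.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated copy of the part of wcs
 * that follows the first occurrence of sep, or NULL if sep does not
 * occur or allocation fails. The caller frees the result. */
wchar_t *wcs_after(const wchar_t *wcs, wchar_t sep)
{
    const wchar_t *hit = wcschr(wcs, sep);
    if (hit == NULL)
        return NULL;

    hit++;                                 /* skip the separator itself */
    wchar_t *copy = malloc((wcslen(hit) + 1) * sizeof *copy);
    if (copy != NULL)
        wcscpy(copy, hit);                 /* copies the terminating L'\0' */
    return copy;
}
```

The result can then be compared with wcscmp(), for example wcscmp(value, L"value") == 0, which also works character by character.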
File code is in multibyte format; process code is in wide-character format. Do not assume a particular character encoding for process code.
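A minimal sketch of both rules follows. Text is changed through a locale-aware function rather than by arithmetic on wchar_t values, which need not be Unicode code points in every locale. The result is then converted back to file code before output. The helper upcase_to_multibyte() is hypothetical. towupper() and wcstombs() are standard C functions that do not appear in the lists above, and the code relies on the POSIX null-destination form of wcstombs() to size the buffer.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: upper-case a wide string in place with towupper(),
 * which consults the current locale instead of assuming wchar_t values,
 * then convert it back to a newly allocated multibyte string (file code).
 * Returns NULL if a character has no multibyte form or malloc fails. */
char *upcase_to_multibyte(wchar_t *wcs)
{
    for (wchar_t *p = wcs; *p != L'\0'; p++)
        *p = (wchar_t)towupper((wint_t)*p);

    size_t n = wcstombs(NULL, wcs, 0);     /* bytes needed, excluding '\0' */
    if (n == (size_t)-1)
        return NULL;                       /* unrepresentable character */

    char *mbs = malloc(n + 1);
    if (mbs != NULL)
        wcstombs(mbs, wcs, n + 1);         /* n + 1 so '\0' is stored too */
    return mbs;
}
```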