Sun Studio 12: C User's Guide

6.7.5 C Language Features

To give even more flexibility to the programmer in an Asian-language environment, ISO C provides wide character constants and wide string literals. These have the same form as their non-wide versions, except that they are immediately prefixed by the letter L:

Multibyte characters are valid in both the regular and wide versions. The sequence of bytes necessary to produce the ideogram¥ is encoding-specific, but if it consists of more than one byte, the value of the character constant ’¥’ is implementation-defined, just as the value of ’ab’ is implementation-defined. Except for escape sequences, a regular string literal contains exactly the bytes specified between the quotes, including the bytes of each specified multibyte character.

When the compilation system encounters a wide character constant or wide string literal, each multibyte character is converted into a wide character, as if by calling the mbtowc() function. Thus, the type of L¥’ is wchar_t; the type of abc¥xyz is array of wchar_t with length eight. Just as with regular string literals, each wide string literal has an extra zero-valued element appended, but in these cases, it is a wchar_t with value zero.

Just as regular string literals can be used as a shorthand method for character array initialization, wide string literals can be used to initialize wchar_t arrays:


wchar_t *wp = L"a¥z";
wchar_t x[] = L"a¥z";
wchar_t y[] = {L’a’, L’¥’, L’z’, 0};
wchar_t z[] = {’a’, L’¥’, ’z’, ’\0’};

In the above example, the three arrays x, y, and z, and the array pointed to by wp, have the same length. All are initialized with identical values.

Finally, adjacent wide string literals are concatenated, just as with regular string literals. However, with the 1990 ISO/IEC C standard, adjacent regular and wide string literals produce undefined behavior. Also, the 1990 ISO/IEC C standard specifies that a compiler is not required to produce an error if it does not accept such concatenations.