Unicode is the universal character encoding standard used to represent text for computer processing. It provides a consistent way to encode multilingual text and facilitates the exchange of international text files.
The international standard for coding multilingual text is ISO/IEC 10646. Although ISO/IEC 10646 and the Unicode Standard contain the same characters and code points, the Unicode Standard provides additional information about the characters and their use.
Oracle Solaris 11 provides system-level support for the Unicode Standard Version 6.0 and ISO/IEC 10646:2011.
Each Unicode character is mapped to a code point, which is an integer between 0 and 1,114,111 (hexadecimal 10FFFF). Unicode code points are referred to using notation in the form U+nnnn, where nnnn is the code point's hexadecimal number, or by a text string describing the code point. For example, the lowercase letter "a" can be represented by U+0061 or by the text string "LATIN SMALL LETTER A".
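The relationship between a character, its code point, and its name can be checked with a short script. The following is a minimal sketch in Python, which is not part of the original text. The helper name `describe` and the constant `MAX_CODE_POINT` are illustrative assumptions, and the names returned come from the Unicode database bundled with the Python interpreter, which may be a later version than Unicode 6.0.

```python
import sys
import unicodedata

# Highest valid Unicode code point: 1,114,111 decimal, 10FFFF hexadecimal.
MAX_CODE_POINT = 0x10FFFF


def describe(ch):
    """Return (U+nnnn notation, character name) for a single character.

    Hypothetical helper for illustration only.
    """
    code_point = ord(ch)                 # character -> integer code point
    notation = f"U+{code_point:04X}"     # at least four uppercase hex digits
    return notation, unicodedata.name(ch)


print(describe("a"))                                  # ('U+0061', 'LATIN SMALL LETTER A')
print(unicodedata.lookup("LATIN SMALL LETTER A"))     # a
print(chr(0x0061))                                    # a
print(sys.maxunicode == MAX_CODE_POINT)               # True
```

The `:04X` format pads to four hexadecimal digits, so code points above U+FFFF are printed with five or six digits, matching the usual U+nnnn convention.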
Code points can be encoded using different character encoding schemes. In Oracle Solaris Unicode locales, the UTF-8 form is used. UTF-8 is a variable-length encoding form of Unicode that preserves ASCII character code values transparently (see UTF-8 Overview).
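To illustrate the variable-length property and ASCII transparency of UTF-8, the following Python sketch encodes a few sample characters and prints the resulting bytes. The helper name `utf8_bytes` and the choice of sample characters are assumptions made for this example. It shows general UTF-8 behavior and does not depend on any Oracle Solaris interface.

```python
def utf8_bytes(ch):
    """Encode a string as UTF-8 (hypothetical helper for illustration)."""
    return ch.encode("utf-8")


# Sample characters chosen to need 1, 2, 3, and 4 bytes respectively.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} byte(s))")

# ASCII text has the same byte values in UTF-8 as in ASCII.
print(utf8_bytes("Solaris") == "Solaris".encode("ascii"))   # True
```

Run as written, the loop should report one byte (61) for U+0061, two bytes (C3 A9) for U+00E9, three bytes (E2 82 AC) for U+20AC, and four bytes (F0 9F 98 80) for U+1F600.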
For more details on the Unicode Standard and ISO/IEC 10646 and their various representative forms, refer to the following sources:
The Unicode Standard, Version 6.0 from the Unicode Consortium
ISO/IEC 10646:2011, Information Technology-Universal Multiple-Octet Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane