UTF-8 is a variable-length encoding form of Unicode. This form is used in Oracle Solaris Unicode locales.
This form has two advantages. It is backward compatible with ASCII, because every ASCII character is encoded as the same single byte. It also avoids the complications of endianness (byte order), because it is defined as a sequence of single bytes. In UTF-8, each Unicode code point is represented by a sequence of one to four 8-bit bytes. The following table specifies the bit distribution for UTF-8. It shows the ranges of Unicode code points that correspond to one-byte, two-byte, three-byte, and four-byte sequences.
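| Code point range | Scalar value (bits) | First byte | Second byte | Third byte | Fourth byte |
|---|---|---|---|---|---|
| U+0000 to U+007F | 00000000 0xxxxxxx | 0xxxxxxx | | | |
| U+0080 to U+07FF | 00000yyy yyxxxxxx | 110yyyyy | 10xxxxxx | | |
| U+0800 to U+FFFF | zzzzyyyy yyxxxxxx | 1110zzzz | 10yyyyyy | 10xxxxxx | |
| U+10000 to U+10FFFF | 000uuuuu zzzzyyyy yyxxxxxx | 11110uuu | 10uuzzzz | 10yyyyyy | 10xxxxxx |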
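The one- to four-byte bit distribution described above can be sketched in a few lines of Python. This is a minimal illustration, not an Oracle Solaris interface. The function name `utf8_encode` is hypothetical. The sketch assumes the input is a single Unicode scalar value. It rejects the surrogate code points U+D800 to U+DFFF, which fall inside the three-byte range but are not valid in UTF-8.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode code point as UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogates are not Unicode scalar values and cannot be encoded.
        raise ValueError(f"surrogate code point: {cp:#x}")
    if cp <= 0x7F:
        # One byte, 0xxxxxxx: identical to the ASCII byte value.
        return bytes([cp])
    if cp <= 0x7FF:
        # Two bytes: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # Three bytes: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # One example per row of the table: 1, 2, 3, and 4 bytes.
    for ch in ["A", "é", "€", "😀"]:
        cp = ord(ch)
        encoded = utf8_encode(cp)
        # Cross-check against Python's built-in UTF-8 codec.
        assert encoded == ch.encode("utf-8")
        print(f"U+{cp:04X}", encoded.hex(" "))
    # Prints:
    # U+0041 41
    # U+00E9 c3 a9
    # U+20AC e2 82 ac
    # U+1F600 f0 9f 98 80
```

The output is a plain sequence of bytes, so it is the same on big-endian and little-endian machines. This is the byte-order independence mentioned above.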
For more details about the UTF-8 encoding form, refer to the following sources:
The Unicode Standard, Version 6.0, Chapter 3 (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf), Section 3.9 “Unicode Encoding Forms”, pp. 93-94