Oracle9i Globalization Support Guide Release 1 (9.0.1) Part Number A90236-02
This appendix introduces how Unicode assigns character codes and how those codes are represented in UTF-16 and UTF-8.
Table B-1 lists the UTF-16 and UTF-8 character codes for ranges of Unicode characters.
As shown in Table B-1, the UTF-16 codes for some characters (additional Chinese, Japanese, and Korean characters, and Private Use Area #2) are represented as two 16-bit units, called a surrogate pair. The first unit is the high surrogate, with values from 0xD800 to 0xDBFF. The second unit is the low surrogate, with values from 0xDC00 to 0xDFFF. Surrogate pairs let UTF-16 represent more than one million characters; without them, UTF-16 could represent at most 65,536 characters. Oracle's AL16UTF16 character set supports surrogate pairs.
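As a rough illustration of the surrogate-pair arithmetic, the Python sketch below splits a supplementary code point into a high and a low surrogate and recombines them. The function names are hypothetical (they are not part of any Oracle API); the split follows the standard UTF-16 rule, and the example cross-checks it against Python's built-in `utf-16-be` codec. It also shows where the "more than one million" figure comes from: 1,024 possible high surrogates times 1,024 possible low surrogates gives 1,048,576 additional code points.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value in 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high surrogate and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# 1,024 high surrogates x 1,024 low surrogates: the code points reachable
# only through surrogate pairs, beyond the 65,536 values of one 16-bit unit.
SUPPLEMENTARY_COUNT = (0xDBFF - 0xD800 + 1) * (0xDFFF - 0xDC00 + 1)

if __name__ == "__main__":
    cp = 0x20000  # first code point of CJK Unified Ideographs Extension B
    high, low = to_surrogate_pair(cp)
    print(hex(high), hex(low))  # 0xd840 0xdc00
    # Cross-check against Python's own big-endian UTF-16 encoder.
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert from_surrogate_pair(high, low) == cp
    print(SUPPLEMENTARY_COUNT)  # 1048576
```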
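The UTF-8 character codes in Table B-1 show that a character occupies 1, 2, 3, or 4 bytes in UTF-8, depending on its code point. Characters that UTF-16 represents as surrogate pairs occupy 4 bytes in UTF-8.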
Oracle's AL32UTF8 character set supports 1-byte, 2-byte, 3-byte, and 4-byte values. Oracle's UTF8 character set supports 1-byte, 2-byte, and 3-byte values, but not 4-byte values.
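The Python sketch below computes how many bytes standard UTF-8 uses for a given code point, which makes the AL32UTF8/UTF8 distinction concrete: a character for which it returns 4 is one that, per the statement above, AL32UTF8 can store but UTF8 cannot. The names `utf8_length` and `needs_4_byte_utf8` are hypothetical helpers for illustration only, not Oracle functions; the byte-length ranges are those of standard UTF-8, and the example checks them against Python's `utf-8` codec.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes standard UTF-8 uses to encode one code point."""
    # Surrogate code points (0xD800..0xDFFF) are not characters on their own.
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4  # supplementary characters: the ones UTF-16 stores as surrogate pairs


def needs_4_byte_utf8(text: str) -> bool:
    """Hypothetical helper: True if any character in text needs a 4-byte
    UTF-8 value (supported by AL32UTF8 but not by UTF8, per the text)."""
    return any(utf8_length(ord(ch)) == 4 for ch in text)


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\U00020000"]:
        n = utf8_length(ord(ch))
        assert n == len(ch.encode("utf-8"))  # agrees with Python's encoder
        print(f"U+{ord(ch):04X}: {n} byte(s)")
    print(needs_4_byte_utf8("caf\u00e9"), needs_4_byte_utf8("\U00020000"))
```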
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.