B Unicode Character Code Assignments
This appendix provides an introduction to Unicode character code assignments. It covers Unicode code ranges, UTF-16 encoding, and UTF-8 encoding.
B.1 Unicode Code Ranges
Table B-1 contains code ranges that have been allocated in Unicode for UTF-16 character codes.
Table B-1 Unicode Character Code Ranges for UTF-16 Character Codes
| Types of Characters | First 16 Bits | Second 16 Bits |
|---|---|---|
| ASCII | 0000 - 007F | - |
| European (except ASCII), Arabic, Hebrew | 0080 - 07FF | - |
| Indic, Thai, certain symbols (such as the euro symbol), Chinese, Japanese, Korean | 0800 - 0FFF<br>1000 - CFFF<br>D000 - D7FF<br>F900 - FFFF | - |
| Private Use Area #1 | E000 - EFFF<br>F000 - F8FF | - |
| Supplementary characters: Additional Chinese, Japanese, and Korean characters; historic characters; musical symbols; mathematical symbols | D800 - D8BF<br>D8C0 - DABF<br>DAC0 - DB7F | DC00 - DFFF<br>DC00 - DFFF<br>DC00 - DFFF |
| Private Use Area #2 | DB80 - DBBF<br>DBC0 - DBFF | DC00 - DFFF<br>DC00 - DFFF |
Table B-2 contains code ranges that have been allocated in Unicode for UTF-8 character codes.
Table B-2 Unicode Character Code Ranges for UTF-8 Character Codes
| Types of Characters | First Byte | Second Byte | Third Byte | Fourth Byte |
|---|---|---|---|---|
| ASCII | 00 - 7F | - | - | - |
| European (except ASCII), Arabic, Hebrew | C2 - DF | 80 - BF | - | - |
| Indic, Thai, certain symbols (such as the euro symbol), Chinese, Japanese, Korean | E0<br>E1 - EC<br>ED<br>EF | A0 - BF<br>80 - BF<br>80 - 9F<br>A4 - BF | 80 - BF<br>80 - BF<br>80 - BF<br>80 - BF | - |
| Private Use Area #1 | EE<br>EF | 80 - BF<br>80 - A3 | 80 - BF<br>80 - BF | - |
| Supplementary characters: Additional Chinese, Japanese, and Korean characters; historic characters; musical symbols; mathematical symbols | F0<br>F1 - F2<br>F3 | 90 - BF<br>80 - BF<br>80 - AF | 80 - BF<br>80 - BF<br>80 - BF | 80 - BF<br>80 - BF<br>80 - BF |
| Private Use Area #2 | F3<br>F4 | B0 - BF<br>80 - 8F | 80 - BF<br>80 - BF | 80 - BF<br>80 - BF |
Note:
A cell that contains only a hyphen (-) represents a nonapplicable code assignment. Character codes are shown in hexadecimal representation.
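The ranges in Table B-1 can be read as a simple lookup on the first 16-bit unit of a character. The following Python sketch transcribes those ranges into a list and returns the matching category. The names `TABLE_B1_FIRST_UNIT` and `classify_first_unit` and the shortened category labels are illustrative assumptions, not part of any Oracle API.

```python
# Inclusive ranges for the first 16-bit unit, transcribed from Table B-1.
# The labels and names below are hypothetical, chosen for this illustration.
TABLE_B1_FIRST_UNIT = [
    (0x0000, 0x007F, "ASCII"),
    (0x0080, 0x07FF, "European (except ASCII), Arabic, Hebrew"),
    (0x0800, 0xD7FF, "Indic, Thai, symbols, Chinese, Japanese, Korean"),
    (0xD800, 0xDB7F, "Supplementary characters"),
    (0xDB80, 0xDBFF, "Private Use Area #2"),
    (0xE000, 0xF8FF, "Private Use Area #1"),
    (0xF900, 0xFFFF, "Indic, Thai, symbols, Chinese, Japanese, Korean"),
]


def classify_first_unit(unit):
    """Return the Table B-1 category for the first 16-bit unit of a character."""
    for low, high, category in TABLE_B1_FIRST_UNIT:
        if low <= unit <= high:
            return category
    # 0xDC00-0xDFFF appears in Table B-1 only as the second unit of a pair.
    raise ValueError(f"0x{unit:04X} is not a valid first unit")


for unit in (0x0041, 0x00E9, 0x20AC, 0xE000, 0xD834, 0xDBC0):
    print(f"{unit:04X}: {classify_first_unit(unit)}")
```

A unit in the range DC00 to DFFF raises `ValueError` here, because the table lists that range only in the Second 16 Bits column.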
B.2 UTF-16 Encoding
As shown in Table B-1, the UTF-16 character codes for some characters (additional Chinese, Japanese, and Korean characters, historic characters, musical and mathematical symbols, and Private Use Area #2) are represented as two 16-bit units. These are supplementary characters. The first 16-bit value of a supplementary character is in the range 0xD800 to 0xDBFF, and the second 16-bit value is in the range 0xDC00 to 0xDFFF. With supplementary characters, UTF-16 character codes can represent more than one million characters. Without supplementary characters, only 65,536 characters can be represented. The AL16UTF16 character set in Oracle Database supports supplementary characters.
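The two-unit representation can be illustrated with the standard UTF-16 arithmetic: subtract 0x10000 from the code point, then place the upper 10 bits in the first unit and the lower 10 bits in the second. The following Python sketch is a minimal illustration that assumes the input is a supplementary code point (U+10000 to U+10FFFF). The function name `to_surrogate_pair` is hypothetical, and the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into its two 16-bit UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # 20-bit value
    first = 0xD800 + (offset >> 10)      # upper 10 bits: 0xD800-0xDBFF
    second = 0xDC00 + (offset & 0x3FF)   # lower 10 bits: 0xDC00-0xDFFF
    return first, second


# U+1D11E is MUSICAL SYMBOL G CLEF, a supplementary character.
first, second = to_surrogate_pair(0x1D11E)
print(f"{first:04X} {second:04X}")  # prints "D834 DD1E"

# Cross-check against Python's own big-endian UTF-16 encoder.
expected = chr(0x1D11E).encode("utf-16-be")
assert expected == first.to_bytes(2, "big") + second.to_bytes(2, "big")
```

Each unit carries 10 bits of payload, so the pairs cover 1,024 x 1,024 = 1,048,576 supplementary code points, which is where the figure of more than one million comes from.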
B.3 UTF-8 Encoding
The UTF-8 character codes in Table B-2 show that the following conditions are true:
- ASCII characters use 1 byte
- European (except ASCII), Arabic, and Hebrew characters require 2 bytes
- Indic, Thai, Chinese, Japanese, and Korean characters as well as certain symbols such as the euro symbol require 3 bytes
- Characters in the Private Use Area #1 require 3 bytes
- Supplementary characters require 4 bytes
- Characters in the Private Use Area #2 require 4 bytes
 
In Oracle Database, the AL32UTF8 character set supports 1-byte, 2-byte, 3-byte, and 4-byte values. The UTF8 character set supports 1-byte, 2-byte, and 3-byte values, but not 4-byte values.
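The byte counts listed for Table B-2 can be checked with any standard UTF-8 encoder. The following Python sketch encodes one sample character from each category and reports the encoded bytes. The sample characters and the helper name `utf8_bytes` are choices made for this illustration. The sketch uses Python's built-in `utf-8` codec and does not model the Oracle AL32UTF8 or UTF8 character sets.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a single character as a list of byte values."""
    return list(ch.encode("utf-8"))


# One sample character per category (samples chosen for illustration).
samples = [
    ("ASCII", "A"),                               # U+0041
    ("European (except ASCII)", "\u00e9"),        # U+00E9, e with acute accent
    ("Euro symbol", "\u20ac"),                    # U+20AC
    ("Private Use Area #1", "\ue000"),            # U+E000
    ("Supplementary character", "\U0001d11e"),    # U+1D11E, musical symbol
    ("Private Use Area #2", "\U00100000"),        # U+100000
]

for label, ch in samples:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"{label}: U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} bytes)")
```

For example, the euro symbol encodes as E2 82 AC, which falls in the E1 - EC row of Table B-2, and U+100000 encodes as F4 80 80 80, which falls in the F4 row for Private Use Area #2.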