Oracle9i Globalization Support Guide
Release 1 (9.0.1)

Part Number A90236-02

B
Unicode Character Code Assignments

This appendix offers an introduction to how Unicode assigns characters. This appendix contains:

Unicode Character Code Assignments

Table B-1 shows how each range of Unicode characters is encoded in UTF-16 and in UTF-8.

Table B-1 Unicode Character Code Assignments

UTF-16 Character Codes          UTF-8 Character Codes
First 16 Bits  Second 16 Bits   First Byte  Second Byte  Third Byte  Fourth Byte
-------------  --------------   ----------  -----------  ----------  -----------

ASCII
0000-007F                       00-7F

European (except ASCII), Arabic, Hebrew, etc.
0080-07FF                       C2-DF       80-BF

Indic, Thai, certain symbols (for example, euro), Chinese, Japanese, Korean, etc.
0800-0FFF                       E0          A0-BF        80-BF
1000-CFFF                       E1-EC       80-BF        80-BF
D000-D7FF                       ED          80-9F        80-BF
F900-FFFF                       EF          A4-BF        80-BF

Private Use Area #1
E000-EFFF                       EE          80-BF        80-BF
F000-F8FF                       EF          80-A3        80-BF

Additional Chinese/Japanese/Korean characters, historic characters, musical and mathematical symbols, etc.
D800-D8BF      DC00-DFFF        F0          90-BF        80-BF       80-BF
D8C0-DABF      DC00-DFFF        F1-F2       80-BF        80-BF       80-BF
DAC0-DB7F      DC00-DFFF        F3          80-AF        80-BF       80-BF

Private Use Area #2
DB80-DBBF      DC00-DFFF        F3          B0-BF        80-BF       80-BF
DBC0-DBFF      DC00-DFFF        F4          80-8F        80-BF       80-BF


Note:

Blank cells represent code assignments that do not apply. All character codes in this table are shown in hexadecimal.


UTF-16 Encoding

As shown in Table B-1, UTF-16 character codes for some characters (Additional Chinese/Japanese/Korean characters and Private Use Area #2) are represented as two 16-bit units. These are surrogate pairs. The first 16-bit value is the high surrogate, in the range 0xD800 to 0xDBFF. The second 16-bit value is the low surrogate, in the range 0xDC00 to 0xDFFF. With surrogate pairs, UTF-16 character codes can represent more than one million characters. Without surrogate pairs, at most 65,536 characters can be represented. Oracle's AL16UTF16 character set supports surrogate pairs.
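
The arithmetic behind a surrogate pair can be sketched in a few lines of Python. This is only an illustration of the standard UTF-16 rule, not Oracle code; the function names to_surrogate_pair and from_surrogate_pair and the constant SUPPLEMENTARY_COUNT are hypothetical names chosen for this example.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # upper 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # lower 10 bits select the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# 1,024 high surrogates combined with 1,024 low surrogates
SUPPLEMENTARY_COUNT = (0xDBFF - 0xD800 + 1) * (0xDFFF - 0xDC00 + 1)

if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1D11E)   # U+1D11E, a musical symbol
    print(f"{high:04X} {low:04X}")
    print(SUPPLEMENTARY_COUNT)
```

The count of 1,048,576 pair combinations, added to the 16-bit range, is where the "more than one million characters" figure comes from.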

See Also:

"Surrogate Characters" for further information regarding surrogate pairs 

UTF-8 Encoding

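The UTF-8 character codes in Table B-1 show that:

ASCII characters require one byte.

European (except ASCII), Arabic, and Hebrew characters require two bytes.

Indic, Thai, Chinese, Japanese, and Korean characters, and certain symbols such as the euro, require three bytes.

Characters in Private Use Area #1 require three bytes.

Additional Chinese/Japanese/Korean characters, historic characters, and musical and mathematical symbols require four bytes.

Characters in Private Use Area #2 require four bytes.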

Oracle's AL32UTF8 character set supports 1-byte, 2-byte, 3-byte, and 4-byte values. Oracle's UTF8 character set supports 1-byte, 2-byte, and 3-byte values, but not 4-byte values.
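
The byte ranges that Table B-1 lists for UTF-8 follow from the standard UTF-8 bit layout, which the following Python sketch reproduces. It is an illustration only and does not use any Oracle API; utf8_encode is a hypothetical helper name, and Python's built-in encoder serves as a cross-check.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes (1 to 4 bytes)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point <= 0x7F:                       # 1 byte: 00-7F
        return bytes([code_point])
    if code_point <= 0x7FF:                      # 2 bytes: C2-DF, 80-BF
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:                     # 3 bytes: E0-EF, then two trail bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),     # 4 bytes: F0-F4, then three trail bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    # One sample from each byte-length class in Table B-1
    for cp in (0x41, 0xE9, 0x20AC, 0x1D11E):
        encoded = utf8_encode(cp)
        assert encoded == chr(cp).encode("utf-8")
        print(f"U+{cp:04X}: {len(encoded)} byte(s): {encoded.hex(' ').upper()}")
```

The last sample, U+1D11E, lies outside the 16-bit range and therefore produces a 4-byte value, which is the case that AL32UTF8 supports and UTF8 does not.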


Copyright © 1996-2001, Oracle Corporation.

All Rights Reserved.