Unicode Compliance Standards
The Unicode Standard is the universal character-encoding scheme for written characters and text. It defines a consistent way of encoding multilingual text that enables the exchange of text data internationally and creates the foundation for global software.
Facts about Unicode:
Unicode is a very large character set containing the characters of virtually every written language.
In the 16-bit encoding form described here, each Unicode character occupies two bytes.
Two bytes can represent up to 65,536 (roughly 64,000) distinct characters. Unicode also has a mechanism called "surrogates," which uses pairs of two-byte units to describe approximately one million additional characters.
0x00 is a valid byte in a character.
For example, the character "A" is described as 0x00 0x41. Normal string functions, such as strlen() and strcpy(), treat a 0x00 byte as the end of the string, so they do not work with Unicode data.
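The facts above can be sketched in plain C. This is a minimal illustration, not product code: the function names (utf16_length, strlen_on_two_byte_A, to_surrogates) are hypothetical, and the sketch assumes 16-bit units stored in uint16_t. It shows why strlen() stops early on two-byte data, how a length function that works on 16-bit units behaves instead, and how a character beyond the first 65,536 is split into a surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts 16-bit units up to, but not including, the 0x0000 terminator. */
size_t utf16_length(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Returns what strlen() reports for "A" stored as the bytes 0x00 0x41. */
size_t strlen_on_two_byte_A(void)
{
    /* "A" as 0x00 0x41, followed by a two-byte terminator. */
    static const char bytes[] = { 0x00, 0x41, 0x00, 0x00 };

    /* strlen() treats the leading 0x00 as the end of the string. */
    return strlen(bytes);
}

/*
 * Splits a code point in the range 0x10000..0x10FFFF into a surrogate pair.
 * Returns 1 on success, or 0 if the code point fits in one 16-bit unit
 * or lies outside the Unicode range.
 */
int to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF) {
        return 0;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* upper 10 bits  */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* lower 10 bits  */
    return 1;
}
```

With these definitions, strlen_on_two_byte_A() returns 0 even though the buffer holds one character, while utf16_length() on the units 0x0041, 0x0042, 0x0000 returns 2. The surrogate range carries 20 bits of payload, which is where the "one million additional characters" figure comes from.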
Do not use the data type char. Instead, use JCHAR for Unicode characters and ZCHAR for non-Unicode characters. Use ZCHAR instead of char in code that needs to interface with non-Unicode APIs.
| Old Syntax (No Longer Available) | New Syntax (Non-Unicode) | New Syntax (Unicode) |
|---|---|---|
| char | ZCHAR | JCHAR |
| char *, PSTR | ZCHAR *, PZSTR | JCHAR *, PJSTR |
| 'A' | _Z('A') | _J('A') |
| "string" | _Z("string") | _J("string") |
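The following sketch shows how the new syntax might look in use. The real definitions of ZCHAR, JCHAR, _Z, and _J come from the product headers, which are not shown in this section. The typedefs and macros below are stand-ins assumed only so that the example compiles on its own with a C11 compiler, and the helper functions (zchar_length, jchar_length, jchar_copy) are hypothetical names, not product APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: stand-in definitions for illustration only.
 * Real code takes these from the product headers. */
typedef char           ZCHAR;  /* one-byte, non-Unicode character  */
typedef uint_least16_t JCHAR;  /* two-byte Unicode character       */
#define _Z(x) x                /* non-Unicode literal, unchanged   */
#define _J(x) u##x             /* C11 16-bit literal, e.g. u"text" */

/* ZCHAR data can still be passed to non-Unicode APIs such as strlen(). */
size_t zchar_length(const ZCHAR *s)
{
    return strlen(s);
}

/* Hypothetical helper: counts JCHAR units before the terminator. */
size_t jchar_length(const JCHAR *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: copies a JCHAR string, including its terminator.
 * The caller must supply a destination that is large enough. */
JCHAR *jchar_copy(JCHAR *dst, const JCHAR *src)
{
    size_t i = 0;
    while ((dst[i] = src[i]) != 0) {
        i++;
    }
    return dst;
}
```

Under these assumptions, a declaration such as `const JCHAR *name = _J("string");` replaces the old `char *name = "string";`, and length or copy operations go through routines that work on two-byte units rather than through strlen() or strcpy().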