The Java EE 5 Tutorial

Character Sets

A character set is a set of textual and graphic symbols, each of which is mapped to a set of nonnegative integers.

The first character set used in computing was US-ASCII. It is limited in that it can represent only American English. US-ASCII contains uppercase and lowercase Latin alphabets, numerals, punctuation, a set of control codes, and a few miscellaneous symbols.

Unicode defines a standardized, universal character set that can be extended to accommodate additions. When the Java program source file encoding doesn’t support Unicode, you can represent Unicode characters as escape sequences by using the notation \uXXXX, where XXXX is the character’s 16-bit representation in hexadecimal. For example, the Spanish version of the Duke’s Bookstore message file uses Unicode for non-ASCII characters:

{"TitleCashier", "Cajero"},
{"TitleBookDescription", "Descripci" + "\u00f3" + "n del
{"Visitor", "El visitante" + "\u00fa" + "mero "},
{"What", "Qu" + "\u00e9" + " libros leemos"},
{"Talk", " describe cómo los componentes de software de web
 pueden transformar la manera en que desarrollamos las
 aplicaciones para la web. Este libro es obligatorio para
 cualquier programador de respeto!"},
{"Start", "Empezar a Comprar"},