International Language Environments Guide for Oracle® Solaris 11.2

Updated: July 2014
 
 

International Components for Unicode

Oracle Solaris 11 adds the International Components for Unicode (ICU) C/C++ libraries to the available interfaces. ICU is a mature, widely used set of libraries providing Unicode and globalization support for software applications. ICU is portable and gives applications the same results on all platforms and between C/C++ and Java software.

Some of the services provided by ICU include:

  • Code Page Conversion – Convert text data to or from Unicode and nearly any other character set or encoding.

  • Collation – Compare strings according to the conventions and standards of a particular language, region, or country.

  • Formatting – Format numbers, dates, times, and currency amounts according to the chosen locale.

  • Time Calculations – Work with multiple types of calendars and a thorough set of time zone calculation APIs.

  • Unicode Support – ICU closely tracks the Unicode standard, providing easy access to all of the many Unicode character properties, Unicode normalization, case folding, and other fundamental operations as specified by the Unicode Standard.

  • Regular Expression – ICU regular expressions fully support Unicode while providing very competitive performance.

  • Bidirectional text (Bidi) – Support for handling text containing a mixture of left-to-right and right-to-left data.

  • Text Boundaries – Locate the positions of words, sentences, and paragraphs within a range of text, or identify locations that would be suitable for line wrapping when displaying the text.

ICU on Oracle Solaris 11 is split into two packages: library/icu contains only the libraries, while developer/icu delivers the header files and several utilities, such as uconv(1).

For more information, see the project's web site at http://site.icu-project.org. The libicui18n(3LIB), libicuio(3LIB), libicudata(3LIB), libicule(3LIB), libiculx(3LIB), libicutu(3LIB), and libicuuc(3LIB) man pages document how to use the libraries in Oracle Solaris.

uconv Utility

In addition to iconv(1), you can use the uconv(1) command, which is part of the International Components for Unicode (ICU) toolset, to convert text from one encoding to another. uconv supports 229 encodings and more than 1000 aliases.

The tool is delivered in the developer/icu package, which is not installed by default. To install the package, run the following command:

# pkg install developer/icu
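
Before or after installing the package, it can be useful to confirm whether uconv is reachable from the current shell. The following is a minimal sketch, not an official procedure: have_uconv is a hypothetical helper name, and the check only looks at PATH, so it does not prove which package delivered the binary.

```shell
#!/bin/sh
# have_uconv is a hypothetical helper name used only for this sketch.
# It succeeds only when a command named uconv is found on PATH.
have_uconv() {
    command -v uconv >/dev/null 2>&1
}

if have_uconv; then
    echo "uconv is available: $(command -v uconv)"
else
    echo "uconv is not on PATH; the developer/icu package may be missing"
fi
```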

To convert a text file from the cp1252 encoding to UTF-8, type:

$ uconv -f cp1252 -t UTF-8 -o file_in_utf8.txt file_in_cp1252_encoding.txt
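
To see what the conversion does at the byte level, the sketch below builds a tiny cp1252 sample and runs it through the same options. The file names sample_cp1252.txt and sample_utf8.txt are assumptions made for illustration, and the script skips the conversion when uconv is not installed.

```shell
#!/bin/sh
# Build a small cp1252 sample: "caf" followed by byte 0xE9,
# which is e-acute in cp1252 (written here as octal 351).
printf 'caf\351\n' > sample_cp1252.txt

if command -v uconv >/dev/null 2>&1; then
    # Same options as the command above: source encoding, target
    # encoding, output file, then the input file.
    uconv -f cp1252 -t UTF-8 -o sample_utf8.txt sample_cp1252.txt
    # Dump the result as hex bytes; e-acute should now appear as
    # the two-byte UTF-8 sequence c3 a9.
    od -An -tx1 sample_utf8.txt
else
    echo "uconv not available; skipping conversion" >&2
fi
```

If an encoding name is rejected, running uconv -l prints the encoding names that the installed ICU build recognizes.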

Another feature of uconv is transliteration: the conversion of letters from one script to another without translating the underlying words. The following example converts a piece of Greek text to Latin characters:

$ echo "Σολαρις" | uconv -x Greek-Latin -f utf-8 -t utf-8
Solaris
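
The transliteration step can also be wrapped in a small helper, sketched below. The function name transliterate is hypothetical, and the script assumes the input is UTF-8. It first uses the -L option to check that the Greek-Latin transliterator is known to the installed ICU build, then feeds in seven Greek letters written as octal byte escapes so that the script itself stays ASCII-only.

```shell
#!/bin/sh
# transliterate is a hypothetical wrapper used only for this sketch.
# $1 is an ICU transliterator ID; UTF-8 text is read from standard input.
transliterate() {
    uconv -x "$1" -f utf-8 -t utf-8
}

if command -v uconv >/dev/null 2>&1; then
    # -L prints the transliterator IDs known to the installed ICU build.
    # Splitting on spaces makes the match independent of the list layout.
    if uconv -L | tr ' ' '\n' | grep -qx 'Greek-Latin'; then
        echo "Greek-Latin is available"
    fi
    # The octal escapes are the UTF-8 bytes of the Greek letters
    # capital sigma, omicron, lambda, alpha, rho, iota, final sigma.
    printf '\316\243\316\277\316\273\316\261\317\201\316\271\317\202\n' |
        transliterate Greek-Latin
else
    echo "uconv not available; skipping transliteration" >&2
fi
```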

For more information about this tool's features, see the uconv(1) man page.