International Language Environments Guide for Oracle Solaris 11.1     Oracle Solaris 11.1 Information Library

Code Set Conversion

Support for code set conversion, also called character set (charset) conversion, is an essential part of the operating system because most applications rely on it to function properly.

The current release of Oracle Solaris also includes the International Components for Unicode (ICU), a widely used set of libraries and tools for Unicode support, software internationalization, and software globalization.

Oracle Solaris 11 includes various tools and libraries for code set conversion. The core code set conversion utility, iconv, is built around the iconv library in Oracle Solaris libc.

iconv Utility

The iconv(1) command-line utility converts characters or sequences of characters from one code set to another. It supports a wide range of code sets. Because code set names often differ among platforms, an aliasing mechanism in iconv makes many code sets available under multiple names. To list the code sets currently available on a system, run the following command:

$ /usr/bin/iconv -l
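
Because the listing can be long and one code set can appear under several aliases, it may help to filter the output. The following is a minimal sketch rather than part of the original guide: has_codeset is a hypothetical helper name, and because the exact format of the iconv -l output varies between platforms and releases, the match is deliberately loose.

```shell
# has_codeset NAME
# Hypothetical helper: succeeds if NAME matches any line of the
# `iconv -l` listing. NAME is treated as a case-insensitive
# regular expression, so aliases that contain it also match.
has_codeset() {
    iconv -l | grep -i -- "$1" > /dev/null
}

if has_codeset 'UTF-8'; then
    echo "UTF-8 is listed by iconv -l"
else
    echo "UTF-8 is not listed; check the installed iconv packages"
fi
```

A loose match is used here because it only answers whether some alias containing the name exists. It does not prove that a particular conversion pair is supported.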

Because iconv modules are delivered in multiple packages, you can extend the default list by installing additional packages. The default installation includes the system/library/iconv/utf-8 package, which covers the basic set of iconv modules for conversions among UTF-8, other Unicode code sets, and a selection of other code sets. Other packages are available in the System/Internationalization category in the Package Manager, or can be installed with the pkg(1) command by using the system/library/iconv/* name pattern.

The iconv -f option defines the source code set and the -t option defines the target code set. You can use iconv to convert a file, or standard input, and write the result to standard output as follows:

$ /usr/bin/iconv -f eucJP -t UTF-8 file.txt

This example converts the file file.txt from the eucJP code set (Extended UNIX Code Packed Format for Japanese) and writes the result in UTF-8 to standard output.
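
When no file operand is given, iconv reads standard input, so it can also sit in a pipeline. The sketch below illustrates this under stated assumptions: the code set name ISO-8859-1 is one common spelling and might differ on your system (check the iconv -l output), and the output file name in the final comment is a placeholder.

```shell
# Convert standard input from ISO-8859-1 to UTF-8.
# printf emits "caf" followed by the byte 0xE9 (octal 351), which is
# the letter e with an acute accent in ISO-8859-1.
# Assumption: the name ISO-8859-1 is accepted by your iconv.
converted=$(printf 'caf\351\n' | iconv -f ISO-8859-1 -t UTF-8)
printf '%s\n' "$converted"

# To keep the result of a file conversion, redirect standard output.
# The output file name is a placeholder:
#   iconv -f eucJP -t UTF-8 file.txt > file_utf8.txt
```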

In Oracle Solaris 11, iconv has been extended with flags that modify the behavior of the conversion in special situations, such as input that contains a character that is illegal in the source code set, or a valid character that has no identical counterpart in the target code set.

Flags such as //ILLEGAL_DISCARD, //NON_IDENTICAL_DISCARD, //IGNORE, and //TRANSLIT can also be used at the command line. For more information, see the iconv_open(3C) man page.


Note - Some of the iconv modules in Oracle Solaris might implement only a subset of the flags described in the iconv_open(3C) man page.
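
As a rough illustration of a flag used at the command line, the sketch below appends //IGNORE to the target code set name. It rests on several assumptions: that the target name ASCII is accepted on your system, that the module in use implements //IGNORE, and that the exit status after discarding characters may differ between implementations, which is why the status is not relied on.

```shell
# Input: "cafe" with an acute accent on the last letter, in UTF-8
# (the bytes 303 251 encode that accented letter).
# The accented letter has no counterpart in ASCII. Appending //IGNORE
# to the target name asks iconv to discard such characters.
# "|| true" is used because some implementations still return a
# non-zero exit status after discarding characters.
result=$(printf 'caf\303\251\n' | iconv -f UTF-8 -t ASCII//IGNORE 2>/dev/null) || true
printf '%s\n' "$result"
```

Discarding characters loses data silently, so this kind of flag is probably best reserved for cases where an approximate result is acceptable.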


For more information on iconv, see the iconv(1), iconv(3C), iconv_open(3C), and related man pages.

International Components for Unicode

Oracle Solaris 11 adds the International Components for Unicode (ICU) C/C++ libraries to the available interfaces. ICU is a mature, widely used set of libraries providing Unicode and globalization support for software applications. ICU is portable and gives applications the same results on all platforms and between C/C++ and Java software.

Some of the services provided by ICU include:

ICU on Oracle Solaris 11 is split into two packages: library/icu contains only the libraries, while developer/icu delivers header files and several utilities, such as uconv(1).

For more information, see the project's web site at http://site.icu-project.org. The libicui18n(3LIB), libicuio(3LIB), libicudata(3LIB), libicule(3LIB), libiculx(3LIB), libicutu(3LIB), and libicuuc(3LIB) man pages document how to use the libraries in Oracle Solaris.

uconv Utility

In addition to iconv(1), you can use the uconv(1) command, which is part of the International Components for Unicode (ICU) toolset, to convert text from one encoding to another. uconv supports 229 encodings along with more than 1000 aliases.

The tool is part of the developer/icu package, which is not installed by default. To install it, issue the following command:

# pkg install developer/icu

To convert text in the cp1252 encoding to UTF-8, type:

$ uconv -f cp1252 -t UTF-8 -o file_in_utf8.txt file_in_cp1252_encoding.txt
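
Like iconv, uconv reads standard input when no file is named. The following sketch assumes nothing beyond the -f and -t options shown above, and it first checks whether uconv is present because the developer/icu package is not installed by default.

```shell
# Convert standard input from cp1252 to UTF-8 with uconv.
# Byte 0xEF (octal 357) is the letter i with a diaeresis in cp1252.
if command -v uconv > /dev/null 2>&1; then
    uconv_out=$(printf 'na\357ve\n' | uconv -f cp1252 -t UTF-8)
    printf '%s\n' "$uconv_out"
else
    echo "uconv not found; install the developer/icu package first" >&2
fi
```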

Another feature of uconv is transliteration, that is, the conversion of letters from one script to another without translating the underlying words. The following example converts a piece of Greek text to Latin characters:

$ echo "Σολαρις" | uconv -x Greek-Latin -f utf-8 -t utf-8
Solaris

For more information about this tool's features, see the uconv(1) man page.

File Examiner (fsexam)

The File Encoding Examiner fsexam utility enables you to convert the name of a file, or the contents of a plain text file, from a legacy character encoding to UTF-8 encoding. The fsexam utility includes the following new features:

To add fsexam to your system, install the storage/fsexam package. For more information, see the fsexam(1) and fsexam(4) man pages.

Auto Encoding Finder (auto_ef)

Oracle Solaris includes auto_ef(1), a command-line utility that identifies the encoding of a file. auto_ef judges the encoding by attempting iconv code conversions and checking whether each conversion succeeds on the file. It also performs frequency analysis on the character sequences that appear in the file. For example:

$ auto_ef test_file
eucJP

With the -a option, it displays all possible encodings for the given file:

$ auto_ef -a test_file
eucJP           0.89
zh_CN.euc       0.40
ko_KR.euc       0.01
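
Because auto_ef prints a code set name, its result can in principle be passed to iconv. The sketch below is illustrative only: to_utf8_auto is a hypothetical helper name, and it assumes that the name auto_ef reports is one that iconv accepts with the -f option on your system. Detection is heuristic, so the converted output should be checked before the original file is discarded.

```shell
# to_utf8_auto FILE
# Hypothetical helper: detect the encoding of FILE with auto_ef, then
# convert FILE to UTF-8 on standard output with iconv.
# Assumption: the name printed by auto_ef is accepted by iconv -f.
to_utf8_auto() {
    enc=$(auto_ef "$1") || return 1
    # Guard against an empty detection result.
    [ -n "$enc" ] || return 1
    iconv -f "$enc" -t UTF-8 "$1"
}

# Usage (test_file is the sample file name used above; the output
# file name is a placeholder):
#   to_utf8_auto test_file > test_file.utf8
```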

To add auto_ef to your system, install the text/auto_ef package. For more information, see the auto_ef(1) man page.