International Language Environments Guide

Support for Code Set Independence

EUC is an abbreviation for Extended UNIX® Code. The Oracle Solaris operating system supports non-EUC encodings such as PC-Kanji (better known as Shift_JIS) in Japan, Big5 in Taiwan, and GBK in the People's Republic of China. Because a large part of the computer market demands non-EUC code set support, the current Oracle Solaris environment provides a framework that supports both EUC and non-EUC code sets. This support is called Code Set Independence, or CSI.

The goal of CSI is to remove dependencies on specific code sets or encoding methods from Oracle Solaris operating system libraries and commands. The CSI architecture enables the Oracle Solaris operating system to support any encoding that is safe for UNIX file systems. CSI supports a number of new code sets, such as UTF-8, PC-Kanji, and Big5.

CSI Approach

Code set independence enables application and platform software developers to keep their code independent of any particular encoding, such as UTF-8. CSI also makes it possible to adopt a new encoding without modifying the source code. This architectural approach differs from Java internationalization because applications do not have to depend on UTF-16.

Many existing internationalized applications (for example, Motif) automatically inherit CSI support from the underlying system. These applications work in the new locales without modification.

CSI is inherently independent of any code set. However, the following assumptions about file code encodings (code sets) still apply to the current Oracle Solaris system:

CSI-enabled Commands

This section lists the CSI-enabled commands in the current Oracle Solaris environment. The man page for each command includes an attribute section that indicates whether the command is CSI-enabled.

All commands are in the /usr/bin directory, unless otherwise noted.

/usr/lib/diffh         cat          pack
/usr/sbin/accept       catman       paste
/usr/sbin/reject       chgrp        pcat
/usr/ucb/lpr           chmod        pg
/usr/xpg4/bin/awk      chown        printf
/usr/xpg4/bin/cp       cmp          priocntl
/usr/xpg4/bin/date     col          ps
/usr/xpg4/bin/du       comm         pwd
/usr/xpg4/bin/ed       compress     rcp
/usr/xpg4/bin/edit     cpio         red
/usr/xpg4/bin/egrep    csh          remsh
/usr/xpg4/bin/env      csplit       rksh
/usr/xpg4/bin/ex       cut          rsh
/usr/xpg4/bin/expr     diff         rmdir
/usr/xpg4/bin/fgrep    diff3        script
/usr/xpg4/bin/lp       disable      sdiff
/usr/xpg4/bin/ls       echo         settime
/usr/xpg4/bin/more     expand       sh
/usr/xpg4/bin/mv       file         split
/usr/xpg4/bin/nice     find         strconf
/usr/xpg4/bin/nohup    fold         strings
/usr/xpg4/bin/od       ftp          sum
/usr/xpg4/bin/pr       gencat       tabs
/usr/xpg4/bin/rm       getopt       tar
/usr/xpg4/bin/sed      getoptcvt    tee
/usr/xpg4/bin/sort     head         touch
/usr/xpg4/bin/tail     join         tty
/usr/xpg4/bin/tr       jsh          uncompress
/usr/xpg4/bin/vedit    kill         unexpand
/usr/xpg4/bin/vi       ksh          uniq
/usr/xpg4/bin/view     lp           unpack
acctcom                man          wc
apropos                mkdir        whatis
batch                  msgfmt       write
bdiff                  news         xargs
cancel                 nroff        zcat

CSI-enabled Libraries

Nearly all functions in libc (/usr/lib/libc.so) are CSI-enabled. However, the following functions in libc are not CSI-enabled and therefore are EUC-dependent functions:

In the current Oracle Solaris environment, libgen (/usr/ccs/lib/libgen.a) and libcurses (/usr/ccs/lib/libcurses.a) are internationalized but not CSI-enabled.