EUC is an abbreviation for Extended UNIX® Code. The Oracle Solaris operating system supports non-EUC encodings such as PC-Kanji (better known as Shift_JIS) in Japan, Big5 in Taiwan, and GBK in the People's Republic of China. Because a large part of the computer market demands non-EUC code set support, the current Oracle Solaris environment provides a framework that supports both EUC and non-EUC code sets. This support is called Code Set Independence, or CSI.
The goal of CSI is to remove dependencies on specific code sets or encoding methods from Oracle Solaris operating system libraries and commands. The CSI architecture enables the Oracle Solaris operating system to support any encoding that is safe for the UNIX file system. CSI supports a number of new code sets, such as UTF-8, PC-Kanji, and Big5.
Code set independence enables application and platform software developers to keep their code independent of any particular encoding, such as UTF-8. CSI also makes it possible to adopt a new encoding without modifying the source code. This architectural approach differs from Java internationalization because applications do not have to be UTF-16–dependent.
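The idea of keeping code independent of any one encoding can be illustrated with a minimal sketch. It is written in Python rather than against the Oracle Solaris libc interfaces, and the function name `count_characters` is hypothetical. The point is only that the code set is taken from the current locale, or passed in by the caller, and is never hard-coded.

```python
import locale
from typing import Optional


def count_characters(data: bytes, codeset: Optional[str] = None) -> int:
    """Count the characters in `data` without assuming a specific encoding.

    `codeset` is a hypothetical parameter for this sketch. When it is
    omitted, the code set of the current locale is used.
    """
    if codeset is None:
        # Ask the locale for its code set instead of hard-coding one.
        codeset = locale.getpreferredencoding(False)
    return len(data.decode(codeset))


if __name__ == "__main__":
    text = "\u65e5\u672c\u8a9e"  # three Japanese characters
    for name in ("utf-8", "euc_jp", "shift_jis"):
        encoded = text.encode(name)
        print(name, len(encoded), "bytes,", count_characters(encoded, name), "characters")
```

The same function handles UTF-8, EUC, and PC-Kanji input because nothing in its logic depends on how many bytes a character occupies.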
Many existing internationalized applications (for example, Motif) automatically inherit CSI support from the underlying system. These applications work in the new locales without modification.
CSI is inherently independent of any particular code set. However, the following assumptions about file code encodings (code sets) still apply to the current Oracle Solaris system:
The NULL byte value (0x00) does not appear as any byte of a multibyte character. This assumption supports null-terminated multibyte character strings.
The ASCII slash character byte value (0x2f) does not appear as any byte of a multibyte character. This assumption supports UNIX path names.
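The two assumptions above can be checked for a sample of text with a short sketch. The function names are hypothetical, and the check covers only the characters passed in, not an entire code set. It also encodes each character separately, so it assumes a stateless encoding.

```python
def unsafe_characters(text, codeset):
    """Return the characters in `text` whose multibyte encoding in
    `codeset` contains a 0x00 or 0x2f byte."""
    unsafe = []
    for ch in text:
        encoded = ch.encode(codeset)
        # Only multibyte characters are of interest; single-byte NUL and
        # slash keep their usual meaning.
        if len(encoded) > 1 and (0x00 in encoded or 0x2F in encoded):
            unsafe.append(ch)
    return unsafe


def is_file_system_safe(text, codeset):
    """True if no character in `text` violates either assumption."""
    return not unsafe_characters(text, codeset)


if __name__ == "__main__":
    sample = "\u65e5\u672c\u8a9e"
    for name in ("utf-8", "euc_jp", "shift_jis", "utf-16-le"):
        print(name, is_file_system_safe(sample + "A", name))
```

A fixed-width encoding such as UTF-16 fails the check even for plain ASCII letters, because each letter is encoded with a 0x00 byte.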
This section lists the CSI-enabled commands in the current Oracle Solaris environment. The man page for each command includes an attribute section that indicates whether the command is CSI-enabled.
All commands are in the /usr/bin directory, unless otherwise noted.
/usr/lib/diffh | cat | pack
/usr/sbin/accept | catman | paste
/usr/sbin/reject | chgrp | pcat
/usr/ucb/lpr | chmod | pg
/usr/xpg4/bin/awk | chown | printf
/usr/xpg4/bin/cp | cmp | priocntl
/usr/xpg4/bin/date | col | ps
/usr/xpg4/bin/du | comm | pwd
/usr/xpg4/bin/ed | compress | rcp
/usr/xpg4/bin/edit | cpio | red
/usr/xpg4/bin/egrep | csh | remsh
/usr/xpg4/bin/env | csplit | rksh
/usr/xpg4/bin/ex | cut | rsh
/usr/xpg4/bin/expr | diff | rsmdir
/usr/xpg4/bin/fgrep | diff3 | script
/usr/xpg4/bin/lp | disable | sdiff
/usr/xpg4/bin/ls | echo | settime
/usr/xpg4/bin/more | expand | sh
/usr/xpg4/bin/mv | file | split
/usr/xpg4/bin/nice | find | strconf
/usr/xpg4/bin/nohup | fold | strings
/usr/xpg4/bin/od | ftp | sum
/usr/xpg4/bin/pr | gencat | tabs
/usr/xpg4/bin/rm | geteopt | tar
/usr/xpg4/bin/sed | getoptcvt | tee
/usr/xpg4/bin/sort | head | touch
/usr/xpg4/bin/tail | join | tty
/usr/xpg4/bin/tr | jsh | uncompress
/usr/xpg4/bin/vedit | kill | unexpand
/usr/xpg4/bin/vi | ksh | uniq
/usr/xpg4/bin/view | lp | unpack
acctcom | man | wc
apropos | mkdir | whatis
batch | msgfmt | write
bdiff | news | xargs
cancel | nroff | zcat
Nearly all functions in libc (/usr/lib/libc.so) are CSI-enabled. However, the following functions in libc are not CSI-enabled and therefore are EUC-dependent functions:
csetcol()
csetlen()
csetno()
euccol()
euclen()
eucscol()
getwidth()
wcsetno()
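A developer who wants to find uses of these EUC-dependent functions in existing C source could start from a sketch like the following. The function name `find_euc_dependent_calls` is hypothetical. The scan is a plain text match, so it also reports names that appear inside comments or string literals.

```python
import re

# The libc functions listed above as EUC-dependent.
EUC_DEPENDENT = (
    "csetcol", "csetlen", "csetno", "euccol",
    "euclen", "eucscol", "getwidth", "wcsetno",
)

# Match a listed name as a whole identifier followed by an opening parenthesis.
_CALL = re.compile(r"\b(" + "|".join(EUC_DEPENDENT) + r")\s*\(")


def find_euc_dependent_calls(source):
    """Return (line number, function name) pairs for each apparent call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    example = "int n = euclen(s);\nint w = wcsetno(wc);\n"
    for lineno, name in find_euc_dependent_calls(example):
        print(lineno, name)
```

Each reported call is a candidate for review when an application needs to run in a non-EUC locale.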
In the current Oracle Solaris environment, libgen (/usr/ccs/lib/libgen.a) and libcurses (/usr/ccs/lib/libcurses.a) are internationalized but not CSI-enabled.