The Java Desktop System is a fully Unicode-enabled, multilingual system that supports languages in the Unicode UTF-8 encoding. The Java Desktop System also provides codeset conversion to support legacy language encodings. This chapter describes issues that might arise when you migrate to Unicode multilingual computing.
The migration to Unicode multilingual computing affects a number of methods of importing and exporting data.
If a Windows partition exists on the local hard disk when you install the Java Desktop System, the installer software automatically mounts the partition. The installer software also sets the value of the iocharset option or the nls option in /etc/fstab to the correct value for the UTF-8 and zh_CN.gb18030 locales.
If you mount the partition manually, or if you mount the partition in other locales, you must set specific mount options so that multibyte file and directory names display correctly. Set the options as shown in the following table:
Table 12–1 Options to Set When You Mount Partitions in Locales
Locale | NTFS | VFAT |
---|---|---|
ja_JP.UTF-8 | nls=utf8 | iocharset=utf8,codepage=932 |
ko_KR.UTF-8 | nls=utf8 | iocharset=utf8,codepage=949 |
zh_CN.UTF-8 | nls=utf8 | iocharset=utf8,codepage=936 |
zh_CN.gb18030 | nls=gb2312 | iocharset=gb2312,codepage=936 |
zh_TW.UTF-8 | nls=utf8 | iocharset=utf8,codepage=950 (see note below) |
zh_HK.UTF-8 | nls=utf8 | iocharset=utf8,codepage=950 (see note below) |
en_US.UTF-8 | nls=utf8 | iocharset=utf8 |
de_DE.UTF-8 | nls=utf8 | iocharset=utf8 |
es_ES.UTF-8 | nls=utf8 | iocharset=utf8 |
fr_FR.UTF-8 | nls=utf8 | iocharset=utf8 |
it_IT.UTF-8 | nls=utf8 | iocharset=utf8 |
sv_SE.UTF-8 | nls=utf8 | iocharset=utf8 |
If you use the zh_TW.big5 or zh_HK.big5hkscs locales, use big5 instead of utf8 for the nls and iocharset options.
Sample entries for /etc/fstab for the Japanese UTF-8 locale are as follows:
/dev/sda1 /windows/C ntfs ro,users,gid=users,umask=0002,nls=utf8 0 0
/dev/sda2 /windows/D vfat users,gid=users,umask=0002,iocharset=utf8,codepage=932 0 0
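If you maintain /etc/fstab entries for several locales, the mapping in Table 12–1 can be captured in a small script. The following Python sketch is illustrative only: the function names mount_options and fstab_line are hypothetical, and keeping codepage=950 for the zh_TW.big5 and zh_HK.big5hkscs locales is an assumption, because the table and its note do not state a code page for those locales.

```python
# Sketch: derive encoding-related mount options from Table 12-1.
# mount_options() and fstab_line() are hypothetical helper names.

# locale -> (value for nls/iocharset, VFAT codepage or None)
LOCALE_OPTIONS = {
    "ja_JP.UTF-8": ("utf8", "932"),
    "ko_KR.UTF-8": ("utf8", "949"),
    "zh_CN.UTF-8": ("utf8", "936"),
    "zh_CN.gb18030": ("gb2312", "936"),
    "zh_TW.UTF-8": ("utf8", "950"),
    "zh_HK.UTF-8": ("utf8", "950"),
    # Per the note below the table, Big5 locales use big5 instead of utf8.
    # Keeping codepage=950 for them is an assumption.
    "zh_TW.big5": ("big5", "950"),
    "zh_HK.big5hkscs": ("big5", "950"),
    "en_US.UTF-8": ("utf8", None),
    "de_DE.UTF-8": ("utf8", None),
    "es_ES.UTF-8": ("utf8", None),
    "fr_FR.UTF-8": ("utf8", None),
    "it_IT.UTF-8": ("utf8", None),
    "sv_SE.UTF-8": ("utf8", None),
}


def mount_options(locale, fstype):
    """Return the nls/iocharset/codepage options for an NTFS or VFAT partition."""
    charset, codepage = LOCALE_OPTIONS[locale]  # KeyError for locales not in the table
    if fstype == "ntfs":
        return "nls=" + charset
    if fstype == "vfat":
        options = ["iocharset=" + charset]
        if codepage is not None:
            options.append("codepage=" + codepage)
        return ",".join(options)
    raise ValueError("unsupported file system type: " + fstype)


def fstab_line(device, mount_point, fstype, base_options, locale):
    """Assemble an /etc/fstab line with the dump and pass fields set to 0."""
    options = base_options + "," + mount_options(locale, fstype)
    return " ".join([device, mount_point, fstype, options, "0", "0"])


print(fstab_line("/dev/sda1", "/windows/C", "ntfs",
                 "ro,users,gid=users,umask=0002", "ja_JP.UTF-8"))
# prints: /dev/sda1 /windows/C ntfs ro,users,gid=users,umask=0002,nls=utf8 0 0
```

Raising KeyError for unlisted locales is deliberate: silently defaulting to utf8 would give wrong results for locales such as ja_JP.eucJP that the table does not cover.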
Removable media formatted on Microsoft Windows, such as floppy disks, Zip drives, and removable hard disks, typically use the FAT or VFAT file system. To browse multibyte file or directory names correctly, you might need to configure the codepage and iocharset mount options for these file system types.
For information about values to use for the codepage and iocharset options, see Table 12–1.
You can mount the following types of Microsoft Windows file system remotely:
File systems that are exported from another system with Server Message Block (SMB).
File systems that are shared with Common Internet File System (CIFS).
To browse multibyte file or directory names correctly, you might need to configure the codepage and iocharset mount options for these types of remote file systems.
For example, if you import data from a Japanese Windows system, a sample /etc/fstab entry is as follows:
//server/data /data smbfs iocharset=utf8,codepage=cp932,username=foo,password=bar 0 0
For information about values to use for the codepage and iocharset options, see Table 12–1. The value for the codepage option must begin with cp. For example, for the Japanese UTF-8 locale, use the value cp932, not 932.
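Because smbfs expects the cp prefix while Table 12–1 lists bare code page numbers, a script that generates smbfs entries has to add the prefix itself. This Python sketch assumes the share is written in the //server/share form that smbfs uses; the helper names are hypothetical, and the username and password are passed through unchanged.

```python
def smbfs_codepage(codepage):
    """Normalize a code page for smbfs: '932' -> 'cp932'; 'cp932' is kept as is."""
    value = codepage.strip().lower()
    return value if value.startswith("cp") else "cp" + value


def smbfs_fstab_line(share, mount_point, codepage, username, password, iocharset="utf8"):
    """Build an smbfs /etc/fstab line with the dump and pass fields set to 0."""
    options = ",".join([
        "iocharset=" + iocharset,
        "codepage=" + smbfs_codepage(codepage),  # smbfs needs the cp prefix
        "username=" + username,
        "password=" + password,
    ])
    return " ".join([share, mount_point, "smbfs", options, "0", "0"])
```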
The Java Desktop System can also use SMB to access file systems on remote UNIX and Linux systems. The remote system must run SMB server software, such as Samba, to export the file system. If the remote data is stored in a legacy encoding, the client can specify that encoding when it mounts the file system, and file names are then converted automatically.
Microsoft Office files are encoded in Unicode. StarOffice applications can read and write these Unicode-encoded files.
HTML files created in HTML editors such as Mozilla Composer, or HTML files saved by a web browser, usually contain a charset encoding tag. After you export or import such files, you can browse them with the Mozilla Navigator web browser or edit them with Mozilla Composer, both of which interpret the file according to its charset encoding tag.
Some HTML files might display garbled characters. This problem typically has one of the following causes:
The charset encoding tag is incorrect.
The charset encoding tag is missing.
To find the charset encoding tag in the HTML file, perform the following steps:
Open the file in Mozilla.
Choose View -> Page Info.
The charset information is at the bottom of the General tab, for example: Content-Type text/html; charset=us-ascii
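To check many files at once, you can look for the same charset declaration with a script instead of opening each file in Mozilla. The following Python sketch is an approximation, not how Mozilla determines encodings: it inspects only the first 1024 bytes, where HTML documents are expected to declare their encoding, and it ignores any charset sent by a web server. The function names are hypothetical.

```python
import re

# Matches the charset value in either form of <meta> declaration:
#   <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
#   <meta charset="utf-8">
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def charset_from_html(head):
    """Return the declared charset (lowercase) from the start of an HTML document, or None."""
    match = META_CHARSET.search(head)
    return match.group(1).decode("ascii").lower() if match else None


def find_charset_tag(path, scan_bytes=1024):
    """Read the beginning of an HTML file and return its declared charset, or None if missing."""
    with open(path, "rb") as f:
        return charset_from_html(f.read(scan_bytes))
```

A None result corresponds to the missing-tag case above; a value that disagrees with the real encoding corresponds to the incorrect-tag case.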
If the declared charset, us-ascii in this example, does not match the actual encoding of the file, the file might display garbled characters. To change the encoding of the HTML file, perform the following steps:
Open the file in Mozilla Composer.
Choose File -> Save as Charset.
Choose the correct encoding. Mozilla Composer automatically converts the file to that encoding and updates the charset tag accordingly.
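When you already know a file's actual encoding, a conversion similar to Save as Charset can be approximated in a script. In this Python sketch the function name recode_html is hypothetical; it rewrites only the first charset declaration it finds, and it does not add a declaration to files that lack one.

```python
import re

# Group 1: everything up to the charset value; group 2: the value itself.
CHARSET_DECL = re.compile(
    r"""(<meta[^>]*?charset\s*=\s*["']?)([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def recode_html(data, actual_encoding, target_encoding):
    """Re-encode HTML bytes and update the first charset declaration to match."""
    text = data.decode(actual_encoding)  # raises UnicodeDecodeError if the guess is wrong
    # A function replacement avoids backslash handling in the substituted value.
    text = CHARSET_DECL.sub(lambda m: m.group(1) + target_encoding, text, count=1)
    return text.encode(target_encoding)
```

For example, recode_html(data, "big5", "utf-8") converts a Big5 page that wrongly declares us-ascii into UTF-8 with a matching declaration.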
Most email messages are tagged with a MIME charset tag. Email and Calendar, the email application of the Java Desktop System, recognizes MIME charset tags, so you do not need to perform any encoding conversion.
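The charset tag is part of the Content-Type header of each MIME part, which is why a MIME-aware mail client can decode a message without help. The following Python sketch uses the standard email package to show this for a single-part message; the function name and the sample ISO-2022-JP message are illustrative assumptions, and a multipart message would need each part handled separately.

```python
import email


def decode_mime_text(raw):
    """Return (charset, text) for a single-part text message, using its MIME charset tag."""
    msg = email.message_from_bytes(raw)
    charset = msg.get_content_charset() or "us-ascii"  # RFC 2045 default when no tag is present
    body = msg.get_payload(decode=True)  # undoes base64 or quoted-printable if used
    return charset, body.decode(charset)


raw_message = (
    b"Content-Type: text/plain; charset=iso-2022-jp\r\n"
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n"
) + "日本語".encode("iso-2022-jp")
print(decode_mime_text(raw_message))
```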
Plain text files do not have a charset tag. If a plain text file is not UTF-8 encoded, you must convert it. Use the iconv utility to perform the encoding conversion. For example, to convert a plain text file encoded in Traditional Chinese Big5 to UTF-8, run the following command:
iconv -f big5 -t UTF-8 inputfilename > outputfilename
Alternatively, you can use File System Examiner to perform the encoding conversion. To start File System Examiner, click Launch, then choose Applications -> Utilities -> File System Examiner.
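Inside a larger Python tool, the same conversion that iconv performs can be done with Python's incremental codecs, which correctly handle multibyte characters that straddle read boundaries. This is a sketch rather than a replacement for iconv: the function names are hypothetical, and, like iconv without options, it stops with an error at the first invalid byte sequence instead of guessing.

```python
import codecs
import io


def convert_stream(src, dst, from_encoding="big5", to_encoding="utf-8", chunk_size=65536):
    """Copy bytes from src to dst, converting between encodings chunk by chunk."""
    decoder = codecs.getincrementaldecoder(from_encoding)()  # strict: raises on invalid input
    encoder = codecs.getincrementalencoder(to_encoding)()
    while True:
        chunk = src.read(chunk_size)
        final = not chunk  # an empty read means end of input
        # The decoder buffers a partial multibyte character until the next chunk.
        dst.write(encoder.encode(decoder.decode(chunk, final=final), final=final))
        if final:
            break


def convert_file(src_path, dst_path, from_encoding="big5", to_encoding="utf-8"):
    """File-based equivalent of: iconv -f big5 -t UTF-8 src_path > dst_path"""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        convert_stream(src, dst, from_encoding, to_encoding)


def convert_bytes(data, from_encoding="big5", to_encoding="utf-8", chunk_size=65536):
    """In-memory variant, convenient for testing."""
    out = io.BytesIO()
    convert_stream(io.BytesIO(data), out, from_encoding, to_encoding, chunk_size)
    return out.getvalue()
```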
You can use Text Editor to read and write text files in various character encodings automatically, so you do not need to convert the files manually. You can also specify an encoding explicitly when you open or save a file in Text Editor.
To start Text Editor, click Launch, then choose Applications -> Accessories -> Text Editor.
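Automatic handling of encodings commonly relies on trying candidate encodings in turn. The following Python sketch illustrates that general idea only; it is not Text Editor's actual detection logic, and the candidate list is an assumption to adapt to the legacy data you expect. Order matters: UTF-8 is tried first because it is strict, but some byte sequences decode without error in more than one legacy encoding, so a successful decode is a guess, not proof.

```python
def guess_decode(data, candidates=("utf-8", "big5", "euc-jp", "gb18030")):
    """Return (encoding, text) for the first candidate that decodes data without errors."""
    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of the candidate encodings can decode the data")
```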
If file and directory names that contain multibyte characters are not UTF-8 encoded, you must convert them. You can use File System Examiner to convert file and directory names from legacy character encodings to UTF-8 encoding. To start File System Examiner, click Launch, then choose Applications -> Utilities -> File System Examiner. For more information, see the online Help for File System Examiner.
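For scripted bulk conversion, renaming of this kind can be approximated as follows. This Python sketch is not File System Examiner's implementation: the function names are hypothetical, you must supply the legacy encoding, and names that already decode as valid UTF-8 are left alone, which is a heuristic. By default it performs a dry run that only reports the planned renames.

```python
import os


def utf8_name(raw_name, legacy_encoding):
    """Return a file name (bytes) re-encoded as UTF-8; valid UTF-8 names are unchanged."""
    try:
        raw_name.decode("utf-8")
        return raw_name  # already UTF-8 (this includes plain ASCII names)
    except UnicodeDecodeError:
        return raw_name.decode(legacy_encoding).encode("utf-8")


def convert_tree(top, legacy_encoding, dry_run=True):
    """Plan, and optionally perform, renames of every entry below top to UTF-8 names."""
    renames = []
    # Walking bottom-up records entries inside a directory before the directory
    # itself, so the old parent paths stay valid while the renames are applied.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top), topdown=False):
        for name in dirnames + filenames:
            new_name = utf8_name(name, legacy_encoding)
            if new_name != name:
                renames.append((os.path.join(dirpath, name), os.path.join(dirpath, new_name)))
    if not dry_run:
        for old_path, new_path in renames:
            if os.path.lexists(new_path):
                raise FileExistsError(new_path)  # never overwrite an existing entry
            os.rename(old_path, new_path)
    return renames
```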
When you use SMB from the file manager to access file or directory names on Microsoft Windows that are not UTF-8 encoded, you do not need to convert the names.
The following utilities have been enhanced to convert file and directory names from legacy character encodings to UTF-8 encoding when you extract the files to the Java Desktop System file system:
unzip
unzipsfx
funzip
ftp
For more information about the enhancements to these utilities, see the unzip and ftp man pages. In particular, see the information on the CODESET_LIST environment variable.
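ZIP archives created on legacy Windows systems usually store member names in the Windows code page without the UTF-8 flag. As an illustration of the concept, not of the enhanced unzip or of CODESET_LIST, the following Python sketch decodes such names with a code page you supply. It relies on the fact that Python's zipfile module decodes names without the UTF-8 flag as cp437 by default; on Python 3.11 and later, passing metadata_encoding to ZipFile achieves the same result directly. The function names and the cp932 default are assumptions.

```python
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: the member name is stored in UTF-8


def member_name(info, legacy_encoding):
    """Return the real member name, re-decoding legacy names with legacy_encoding."""
    if info.flag_bits & UTF8_NAME_FLAG:
        return info.filename
    # zipfile decoded the raw bytes as cp437; cp437 maps all 256 byte values,
    # so encoding back recovers the original bytes exactly.
    return info.filename.encode("cp437").decode(legacy_encoding)


def extract_with_legacy_names(zip_path, dest_dir, legacy_encoding="cp932"):
    """Extract every member; names are written in Python's file system encoding."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            info.filename = member_name(info, legacy_encoding)
            archive.extract(info, dest_dir)
```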
For applications that are not ready to migrate to Unicode UTF-8, you can create a launcher on a panel to start the application in a non-UTF-8 locale. You can also start command line interface (CLI) applications directly from the command line.
To create a launcher that starts an application in a non-UTF-8 locale, perform the following steps:
Right-click on the panel where you want to create the launcher.
Choose Add to Panel -> Launcher.
Use the following format to type the entry in the Command field of the Create Launcher dialog:
env LANG=locale LC_ALL=locale application-name
For example, if you want to launch an application called motif-app from the directory /usr/dt/bin in the Chinese Big5 locale, use the following string in the Command field:
env LANG=zh_TW.BIG5 LC_ALL=zh_TW.BIG5 /usr/dt/bin/motif-app
You might also need to specify the appropriate value for the LD_LIBRARY_PATH environment variable for the application.
Click OK to create the launcher on the panel.
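If you start legacy applications from a script instead of a panel launcher, the env command in the Command field corresponds to starting the process with a modified environment. The following Python sketch uses hypothetical helper names and is not part of the Java Desktop System; the locale must be installed on the system, and any LD_LIBRARY_PATH value depends on the application.

```python
import os
import subprocess


def legacy_env(locale, extra=None):
    """Copy the current environment with LANG and LC_ALL set to locale."""
    env = dict(os.environ)
    env["LANG"] = locale
    env["LC_ALL"] = locale
    if extra:
        env.update(extra)  # for example an LD_LIBRARY_PATH entry, if the application needs one
    return env


def launch_in_locale(command, locale, extra=None):
    """Start command the way 'env LANG=locale LC_ALL=locale command' would."""
    return subprocess.Popen(command, env=legacy_env(locale, extra))


# Example matching the launcher string above (assumes the program is installed):
# launch_in_locale(["/usr/dt/bin/motif-app"], "zh_TW.BIG5")
```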
To start a CLI application in a non-UTF-8 locale, perform the following steps:
Start the Terminal application in the legacy locale. To open a Terminal window in a legacy locale, enter the following command:
env LANG=locale LC_ALL=locale gnome-terminal --disable-factory
Run the CLI application in the Terminal window.
Alternatively, perform the following steps:
Start the Terminal application. To start Terminal, click Launch, then choose Applications -> Utilities -> Terminal.
Choose Terminal -> Set Character Encoding, then switch the character encoding from UTF-8 to the legacy encoding.
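In the current shell, set the LANG and LC_ALL environment variables to the legacy locale. For example, in a Bourne-compatible shell such as bash, enter the following command:
export LANG=zh_TW.BIG5 LC_ALL=zh_TW.BIG5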
Run the CLI application in the Terminal window.