The Java Desktop System is a fully Unicode-enabled, multilingual system that supports languages with Unicode UTF-8 encoding. The Java Desktop System also provides codeset conversion to support legacy language encodings.
The Java Desktop System is installed as the default desktop in all locales as part of the underlying operating system. However, Sun Microsystems provides full globalization support for the following locales within the Java Desktop System:
de_DE.UTF-8
fr_FR.UTF-8
es_ES.UTF-8
sv_SE.UTF-8
it_IT.UTF-8
ja_JP.UTF-8
ko_KR.UTF-8
zh_CN.UTF-8
zh_CN.GB18030
zh_TW.UTF-8
zh_TW.BIG5
The migration to Unicode multilingual computing affects several methods of importing and exporting data.
The file system types FAT and VFAT are typically used for floppy disks, zip drives, and removable hard disks on Microsoft Windows. The system administrator must configure the mount options codepage and iocharset for these file system types. For example, if you import from Traditional Chinese Windows, the settings must be as shown in the following table to browse the Traditional Chinese file names correctly.
Mount Option | Traditional Chinese Setting |
---|---|
codepage | 950 |
iocharset | big5 |
Sample entries for /etc/fstab for the Traditional Chinese example are as follows:
/dev/fd0h1440 /media/fd0h1440 vfat noauto,iocharset=big5,codepage=950
/dev/sda1 /media/iee1394disk vfat noauto,iocharset=big5,codepage=950
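The following Python sketch illustrates why the charset settings matter. It does not model the kernel file system driver; it only shows what happens when bytes written in code page 950 are read back with a mismatched or a matching character set. The file name is a made-up example.

```python
# Illustration only: a Big5 (code page 950) file name read with the wrong
# and the right character set. The file name itself is hypothetical.
name = "中文檔案.txt"

raw = name.encode("cp950")         # the name as bytes in code page 950
wrong = raw.decode("latin-1")      # misread as Latin-1: garbled characters
right = raw.decode("big5")         # read with the matching charset: intact

print(repr(wrong))
print(right)
```

With a mismatched charset the name is still present on the disk, but it cannot be displayed or typed correctly, which is the symptom that the iocharset and codepage options are meant to prevent.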
A system administrator must configure mount options codepage and iocharset to mount a remote Microsoft Windows file system shared using CIFS, or a file system exported from another system by SMB. For example, if you import the legacy files encoded in big5 on Traditional Chinese Windows, the iocharset parameter must be set to big5 and codepage must be set to 950 to browse the Traditional Chinese file names correctly. A sample /etc/fstab entry is as follows:
server:/data /data smbfs iocharset=big5,codepage=950,username=foo,password=bar
The Java Desktop System can remotely access a file system on UNIX and Linux systems by using SMB. The export server must run Samba or equivalent to export the remote file system. The client side can specify file system encoding if the legacy data is stored in legacy encodings. The codeset conversion of the filename is done automatically.
Microsoft Office files are encoded in Unicode. StarOffice applications can read and write the Unicode-encoded files without problems.
HTML files authored using HTML editors such as Mozilla Composer, or HTML files saved by a web browser, usually contain a charset encoding tag. You can browse such HTML files with the Mozilla Navigator web browser, or edit the files with Mozilla Composer, according to the encoding tag in the HTML file.
Some HTML files might be displayed as garbled characters. This problem typically has one of the following causes:
The charset encoding tag is incorrect.
The charset encoding tag is missing.
To find the charset encoding tag in the HTML file, perform the following actions:
Open the file with Mozilla.
Press Ctrl+I, or click View to open the View menu.
Click on Page Info.
The charset information is at the bottom of the General tab, for example: Content-Type text/html; charset=us-ascii
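The same check can be scripted when many files need to be inspected. The following is a minimal sketch, assuming the charset is declared in a meta tag near the top of the file; the function name and the regular expression are illustrative and are not part of Mozilla.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
CHARSET_RE = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)

def find_declared_charset(html_bytes):
    """Return the charset named in the first matching meta tag, or None."""
    match = CHARSET_RE.search(html_bytes[:4096])  # the tag normally sits in <head>
    return match.group(1).decode("ascii").lower() if match else None

sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=us-ascii"></head></html>'
)
print(find_declared_charset(sample))                               # us-ascii
print(find_declared_charset(b"<html><body>no tag</body></html>"))  # None
```

A result of None corresponds to the missing-tag case described above. A declared charset that does not match the real encoding of the file can only be detected by trying to decode the content.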
If the string charset=us-ascii does not match the actual encoding of the file, the file might appear broken. To change the encoding of the HTML file, perform the following actions:
Open the file with Mozilla Composer.
Open the File menu.
Select Save As Charset.
Choose the correct encoding. Mozilla Composer automatically converts the encoding and the charset tag as appropriate.
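For batch work, a comparable conversion can be sketched in Python. This is not how Mozilla Composer is implemented; it is an assumption-laden illustration in which you already know the real encoding of the file, and the function name convert_html is hypothetical.

```python
import re

def convert_html(html_bytes, actual_encoding, target_encoding="utf-8"):
    """Decode with the file's real encoding, rewrite the charset tag, re-encode."""
    text = html_bytes.decode(actual_encoding)
    # Replace only the charset value inside the first meta tag, if one exists.
    text = re.sub(
        r"(<meta[^>]*?charset\s*=\s*[\"']?)[A-Za-z0-9._:-]+",
        lambda m: m.group(1) + target_encoding,
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    return text.encode(target_encoding)

# A Big5 page that is wrongly tagged as us-ascii (sample data).
page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=us-ascii"></head>'
    "<body>中文</body></html>"
).encode("big5")

fixed = convert_html(page, "big5")
print(fixed.decode("utf-8"))
```

The sketch leaves a file without a charset tag untagged; adding a missing tag would need an extra step.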
Modern emails are tagged with the MIME charset tag. The mail application of the Java Desktop System, Evolution, accepts MIME charset tags. You do not need to perform any encoding conversion.
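To illustrate what a MIME charset tag looks like and how a mail client can use it, the following sketch parses a hand-written sample message with the Python standard email package. The addresses and message text are invented, and the sketch says nothing about how Evolution itself is implemented.

```python
from email import message_from_bytes, policy

# A minimal sample message whose body is encoded in Big5 and tagged as such.
raw_message = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=big5\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
) + "中文郵件".encode("big5")

msg = message_from_bytes(raw_message, policy=policy.default)
charset = msg.get_content_charset()   # charset named in the Content-Type header
body = msg.get_content()              # body decoded according to that charset

print(charset)
print(body)
```

Because the charset travels with the message, the reader does not have to guess the encoding, which is why no manual conversion is needed.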
Plain text files do not have a charset tag. If the files are not in UTF-8 encoding, encoding conversion is needed. For example, to convert a plain text file that is encoded in Traditional Chinese big5 to UTF-8, execute the following command:
iconv -f big5 -t UTF-8 inputfilename > outputfilename
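Where iconv is not available, or the conversion is part of a larger script, a rough Python analogue looks like the following. The function name and the temporary file names are illustrative assumptions.

```python
import tempfile
from pathlib import Path

def convert_text_file(src, dst, source_encoding="big5", target_encoding="utf-8"):
    """Rough analogue of: iconv -f big5 -t UTF-8 src > dst"""
    # decode() raises UnicodeDecodeError if the input is not valid Big5.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode(target_encoding))

# Demonstration with a temporary Big5 file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_bytes("繁體中文\n".encode("big5"))
    convert_text_file(src, dst)
    result = dst.read_text(encoding="utf-8")

print(result)
```

As with iconv, the source encoding must be known in advance, because a plain text file carries no tag from which it could be detected.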