Java Desktop System Release 3 Administration Guide

Chapter 12 Migration to Unicode Multilingual Computing

The Java Desktop System is a fully Unicode-enabled, multilingual system which supports languages with Unicode UTF-8 encoding. The Java Desktop System also provides codeset conversion to support legacy language encodings. This chapter describes issues that might arise when you migrate to Unicode multilingual computing.

Importing and Exporting Data

The migration to Unicode multilingual computing affects a number of methods of importing and exporting data.

Mounting a Local Microsoft Windows Partition on Linux Systems

If a Windows partition exists on the local hard disk when you install the Java Desktop System, the installer software automatically mounts the partition. The installer software also sets the value of the iocharset option or the nls option in /etc/fstab to the correct value for the UTF-8 and zh_CN.gb18030 locales.

If you mount the partition manually, or if you mount the partition in other locales, you must configure particular mount options to browse multibyte file or directory names correctly. You must set the options as shown in the following table:

Table 12–1 Options to Set When You Mount Partitions in Locales

Locale           NTFS          VFAT
ja_JP.UTF-8      nls=utf8      iocharset=utf8,codepage=932
ko_KR.UTF-8      nls=utf8      iocharset=utf8,codepage=949
zh_CN.UTF-8      nls=utf8      iocharset=utf8,codepage=936
zh_CN.gb18030    nls=gb2312    iocharset=gb2312,codepage=936
zh_TW.UTF-8      nls=utf8      iocharset=utf8,codepage=950 (see note below)
zh_HK.UTF-8      nls=utf8      iocharset=utf8,codepage=950 (see note below)
en_US.UTF-8      nls=utf8      iocharset=utf8
de_DE.UTF-8      nls=utf8      iocharset=utf8
es_ES.UTF-8      nls=utf8      iocharset=utf8
fr_FR.UTF-8      nls=utf8      iocharset=utf8
it_IT.UTF-8      nls=utf8      iocharset=utf8
sv_SE.UTF-8      nls=utf8      iocharset=utf8


Note –

If you use the zh_TW.big5 or zh_HK.big5hkscs locales, use big5 instead of utf8 for the nls and iocharset options.


Sample entries for /etc/fstab for the Japanese UTF-8 locale are as follows:

    /dev/sda1 /windows/C ntfs ro,users,gid=users,umask=0002,nls=utf8 0 0

    /dev/sda2 /windows/C vfat users,gid=users,umask=0002,iocharset=utf8,codepage=932
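For a manual mount, the options in Table 12–1 can be looked up with a small shell helper. The following is a minimal sketch and not part of the Java Desktop System. The function name mount_opts_for is hypothetical, and the big5 entries follow the note under the table.

```shell
# mount_opts_for LOCALE FSTYPE
# Prints the charset-related mount options from Table 12-1.
# Hypothetical helper for illustration; returns 1 for unknown input.
mount_opts_for() {
    locale_name=$1
    fstype=$2

    # Pick the charset name and the VFAT codepage for the locale.
    case $locale_name in
        ja_JP.UTF-8)                charset=utf8;   codepage=932 ;;
        ko_KR.UTF-8)                charset=utf8;   codepage=949 ;;
        zh_CN.UTF-8)                charset=utf8;   codepage=936 ;;
        zh_CN.gb18030)              charset=gb2312; codepage=936 ;;
        zh_TW.UTF-8|zh_HK.UTF-8)    charset=utf8;   codepage=950 ;;
        zh_TW.big5|zh_HK.big5hkscs) charset=big5;   codepage=950 ;;
        en_US.UTF-8|de_DE.UTF-8|es_ES.UTF-8|fr_FR.UTF-8|it_IT.UTF-8|sv_SE.UTF-8)
                                    charset=utf8;   codepage= ;;
        *) return 1 ;;
    esac

    # NTFS takes the nls option; VFAT takes iocharset plus an optional codepage.
    case $fstype in
        ntfs) printf 'nls=%s\n' "$charset" ;;
        vfat)
            if [ -n "$codepage" ]; then
                printf 'iocharset=%s,codepage=%s\n' "$charset" "$codepage"
            else
                printf 'iocharset=%s\n' "$charset"
            fi ;;
        *) return 1 ;;
    esac
}
```

The output can be passed to the -o option of the mount command, for example mount -t vfat -o "$(mount_opts_for ja_JP.UTF-8 vfat)" followed by a device and a mount point of your choice. Mounting normally requires root privileges.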

Mounting Removable Media on Linux Systems

On Microsoft Windows, floppy disks, Zip drives, and removable hard disks typically use the FAT and VFAT file systems. To browse multibyte file or directory names correctly, you might need to configure the codepage and iocharset mount options for these file system types.

For information about values to use for the codepage and iocharset options, see Table 12–1.

Mounting a Remote Microsoft Windows File System Using SMB on Linux Systems

You can mount the following types of Microsoft Windows file system remotely:

To browse multibyte file or directory names correctly, you might need to configure the codepage and iocharset mount options for these types of remote file system.

For example, if you import from Japanese Windows, a sample /etc/fstab entry is as follows:

    //server/data /data smbfs iocharset=utf8,codepage=cp932,username=foo,password=bar


Note –

For information about values to use for the codepage and iocharset options, see Table 12–1. The value for the codepage option must begin with cp. For example, for the Japanese UTF-8 locale, use the value cp932, not 932.


Mounting a Remote UNIX File System Using SMB

The Java Desktop System can use SMB to remotely access a file system on UNIX and Linux systems. The exporting server must run an SMB server, such as Samba, to export the remote file system. If the data on the server is stored in a legacy encoding, the client can specify the file system encoding. File names are then converted between codesets automatically.

Microsoft Office Files

Microsoft Office files are encoded in Unicode. StarOffice applications can read and write the Unicode-encoded files.

HTML Files

HTML files created in HTML editors such as Mozilla Composer, or HTML files saved by a web browser, usually contain a charset encoding tag. After exporting or importing, you can browse such HTML files with the Mozilla Navigator web browser, or edit the files with Mozilla Composer, according to the encoding tag in the HTML file.

Fixing Problems With HTML Files

Some HTML files might display incomprehensible characters. This problem is typically due to the following reasons:

To find the charset encoding tag in the HTML file, perform the following steps:

  1. Open the file in Mozilla.

  2. Choose View -> Page Info.

The charset information is at the bottom of the General tab, for example: Content-Type text/html; charset=us-ascii

If the charset value does not match the actual encoding of the file, the file might display incorrectly. To change the encoding of the HTML file, perform the following steps:

  1. Open the file in Mozilla Composer.

  2. Choose File -> Save as Charset.

  3. Choose the correct encoding. Mozilla Composer automatically converts the encoding and the charset tag as appropriate.
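The declared charset can also be checked from a terminal before you open the file. The following is a minimal sketch, assuming the charset is declared on a single line in the usual charset=value form. The function name html_charset is hypothetical, and the function does a plain text search rather than real HTML parsing.

```shell
# html_charset FILE
# Prints the charset value from the first line of FILE that declares one,
# or an empty line if no declaration is found. Hypothetical helper.
html_charset() {
    grep -i 'charset=' "$1" | head -n 1 |
        sed "s/.*[Cc][Hh][Aa][Rr][Ss][Ee][Tt]=[\"']\{0,1\}\([A-Za-z0-9._-]*\).*/\1/"
}
```

The value that the function prints is only what the file claims. It does not prove that the content is really stored in that encoding.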

Emails Saved as Portable Format

Most email messages are tagged with the MIME charset tag. The email application of the Java Desktop System, Email and Calendar, accepts MIME charset tags. You do not need to perform any encoding conversion.

Plain Text Files

Plain text files do not have a charset tag. If the files are not in UTF-8 encoding, encoding conversion is needed. Use the iconv utility to perform the encoding conversion. For example, to convert a plain text file encoded in Traditional Chinese big5 to UTF-8, execute the following command:

    iconv -f big5 -t UTF-8 inputfilename > outputfilename
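When many files need conversion, it can help to skip files that are already in UTF-8. The following is a minimal sketch that wraps the iconv utility. The function name to_utf8 is hypothetical. The check relies on iconv exiting with a non-zero status when its input is not valid in the source encoding.

```shell
# to_utf8 FROM_ENCODING INFILE OUTFILE
# Writes a UTF-8 copy of INFILE to OUTFILE. If INFILE is already valid
# UTF-8 it is copied unchanged; otherwise it is converted from
# FROM_ENCODING. Hypothetical helper for illustration.
to_utf8() {
    from=$1
    infile=$2
    outfile=$3

    # A UTF-8 to UTF-8 pass fails on invalid input, so it acts as a check.
    if iconv -f UTF-8 -t UTF-8 "$infile" > /dev/null 2>&1; then
        cp "$infile" "$outfile"
    else
        iconv -f "$from" -t UTF-8 "$infile" > "$outfile"
    fi
}
```

For example, to_utf8 big5 inputfilename outputfilename converts a Big5 file and leaves a UTF-8 file as it is. The validity check is a heuristic: a legacy-encoded file whose bytes happen to form valid UTF-8, such as a pure ASCII file, is copied without conversion.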

Alternatively, you can use File System Examiner to perform the encoding conversion. To start File System Examiner, click Launch, then choose Applications -> Utilities -> File System Examiner.

You can use Text Editor to read and write text files in various character encodings. Text Editor handles the encoding automatically, so that you do not need to perform a manual conversion. You can also specify an encoding explicitly when you open or save a file in Text Editor. To start Text Editor, click Launch, then choose Applications -> Accessories -> Text Editor.

File Names and Directory Names

If file names and directory names that use multibyte characters are not in UTF-8 encoding, encoding conversion is needed. You can use File System Examiner to convert file and directory names from legacy character encodings to UTF-8 encoding. To start File System Examiner, click Launch, then choose Applications -> Utilities -> File System Examiner. Refer to the online Help for File System Examiner for more information.
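Before you run a conversion, you might want to list the names that are not valid UTF-8. The following is a minimal sketch that uses the iconv utility as a validity check. It is not part of File System Examiner. The function names is_utf8_name and list_non_utf8_names are hypothetical, and names that contain newline characters are not handled.

```shell
# is_utf8_name NAME
# Succeeds if NAME is a valid UTF-8 byte sequence. Hypothetical helper.
is_utf8_name() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1
}

# list_non_utf8_names DIR
# Prints each path under DIR that is not valid UTF-8.
list_non_utf8_names() {
    find "$1" -print | while IFS= read -r path; do
        is_utf8_name "$path" || printf '%s\n' "$path"
    done
}
```

A path is reported if any component of the path is not valid UTF-8, so a file with a plain ASCII name inside a directory with a legacy-encoded name is also listed.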

When you use SMB from the file manager to access files or directories on Microsoft Windows whose names are not in UTF-8, no encoding conversion is needed.

Unzip and FTP Utilities on Linux Systems

The unzip and ftp utilities have been enhanced to convert file and directory names from legacy character encodings to UTF-8 encoding when you extract or transfer files to a Java Desktop System file system.

For more information about the enhancements to these utilities, see the unzip and ftp man pages. In particular, see the information on the CODESET_LIST environment variable.

Launching Applications in Legacy Locales

For applications that are not ready to migrate to Unicode UTF-8, you can create a launcher on a panel to start the application in a non-UTF-8 locale. You can also start command line interface (CLI) applications directly from the command line.

To Create a Launcher for an Application to Start in a Non-UTF-8 Locale

Perform the following steps:

  1. Right-click on the panel where you want to create the launcher.

  2. Choose Add to Panel -> Launcher.

  3. Use the following format to type the entry in the Command field of the Create Launcher dialog:

    env LANG=locale LC_ALL=locale application-name

    For example, if you want to launch an application called motif-app from the directory /usr/dt/bin in the Chinese Big5 locale, use the following string in the Command field:

    env LANG=zh_TW.BIG5 LC_ALL=zh_TW.BIG5 /usr/dt/bin/motif-app

    You might also need to specify the appropriate value for the LD_LIBRARY_PATH environment variable for the application.

  4. Click OK to create the launcher on the panel.
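The Command field format can be tried out in a terminal before you create the launcher. The following is a minimal sketch. The function name run_in_locale is hypothetical, and the example prints the variables instead of starting a real application.

```shell
# run_in_locale LOCALE COMMAND [ARGS...]
# Runs COMMAND with LANG and LC_ALL set to LOCALE for that process only.
# Hypothetical wrapper; the locale of the calling shell is left unchanged.
run_in_locale() {
    loc=$1
    shift
    env LANG="$loc" LC_ALL="$loc" "$@"
}

# Example: print the locale variables that the child process sees.
run_in_locale zh_TW.BIG5 sh -c 'echo "LANG=$LANG LC_ALL=$LC_ALL"'
```

The env command sets the variables only for the process that it starts. The legacy locale must be installed on the system for the application to use it. The locale -a command lists the installed locales.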

To Run a CLI Application in a Non-UTF-8 Locale

Perform the following steps:

  1. Start the Terminal application in the legacy locale. To open a Terminal window in a legacy locale, enter the following command:

    env LANG=locale LC_ALL=locale gnome-terminal --disable-factory

  2. Run the CLI application in the Terminal window.

Alternatively, perform the following steps:

  1. Start the Terminal application. To start Terminal, click Launch, then choose Applications -> Utilities -> Terminal.

  2. Choose Terminal -> Set Character Encoding, then switch the locale setting from UTF-8 to a legacy locale.

  3. Set the LANG and LC_ALL environment variables in the current shell to the legacy locale.

  4. Run the CLI application in the Terminal window.
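Setting the variables in the current shell can be sketched as follows for a Bourne-compatible shell. The locale name zh_TW.BIG5 is only an example, so substitute the legacy locale that your application needs.

```shell
# Set the legacy locale for the current shell and for every command
# started from it. The locale name is an example value.
LANG=zh_TW.BIG5
LC_ALL=zh_TW.BIG5
export LANG LC_ALL

# Confirm the settings that child processes inherit.
env | grep -E '^(LANG|LC_ALL)='
```

Unlike the env form, which affects one command, exported variables stay in effect until you change them or close the Terminal window.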