Java Desktop System Release 2 Release Notes

Importing And Exporting Data

The migration to Unicode multilingual computing affects several methods of importing and exporting data.

Removable Media

The system administrator must configure the codepage and iocharset mount options for the FAT and VFAT file systems, which are typically used on floppy disks, zip drives, and removable hard disks written on Microsoft Windows. For example, if you import files from Traditional Chinese Windows, use the settings in the following table so that Traditional Chinese file names are displayed correctly.

Mount Option    Traditional Chinese Setting
codepage        950
iocharset       big5

Sample entries for /etc/fstab for the Traditional Chinese example are as follows:

/dev/fd0h1440  /media/fd0h1440     vfat  noauto,iocharset=big5,codepage=950
/dev/sda1      /media/iee1394disk  vfat  noauto,iocharset=big5,codepage=950
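Because both options must be present on every VFAT entry, a quick check of the file can catch an entry that was missed. The following is a minimal sketch and not part of the Java Desktop System: the function name check_vfat_opts is hypothetical, and it assumes a POSIX awk and entries whose third and fourth fields are the file system type and the mount options, as in the sample entries.

```shell
#!/bin/sh
# Hypothetical helper: list vfat entries in an fstab-style file that lack
# either the iocharset or the codepage mount option.
check_vfat_opts() {
    # $1: path to an fstab-style file (defaults to /etc/fstab)
    awk '
        /^[[:space:]]*#/ { next }      # skip comment lines
        $3 == "vfat" {
            # field 4 holds the comma-separated mount options
            if ($4 !~ /(^|,)iocharset=/ || $4 !~ /(^|,)codepage=/)
                print $1 " " $2        # device and mount point
        }
    ' "${1:-/etc/fstab}"
}
```

Running check_vfat_opts with no argument reads /etc/fstab. No output means that every vfat entry sets both options.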

Mounting a Remote Microsoft Windows File System Using Samba

A system administrator must configure the codepage and iocharset mount options to mount a remote Microsoft Windows file system that is shared using CIFS, or a file system that is exported from another system by SMB. For example, if you import legacy files encoded in big5 on Traditional Chinese Windows, set iocharset to big5 and codepage to 950 so that Traditional Chinese file names are displayed correctly. A sample /etc/fstab entry is as follows:

server:/data /data smbfs iocharset=big5,codepage=950,username=foo,password=bar

Mounting a Remote UNIX File System Using Samba

The Java Desktop System can remotely access a file system on UNIX and Linux systems by using SMB. The export server must run Samba or an equivalent to export the remote file system. If the legacy data is stored in a legacy encoding, the client side can specify the file system encoding. The codeset conversion of file names is done automatically.

Microsoft Office Files

Microsoft Office files are encoded in Unicode. StarOffice applications can read and write these Unicode-encoded files without problems.

HTML Files

HTML files authored using HTML editors such as Mozilla Composer, or HTML files saved by a web browser, usually contain a charset encoding tag. After exporting or importing, you can browse such HTML files with the Mozilla Navigator web browser, or edit the files with Mozilla Composer, according to the encoding tag in the HTML file.

Fixing Broken HTML Files

Some HTML files might be displayed as garbage characters. This problem typically occurs when the charset encoding tag is incorrect or missing.

To find the charset encoding tag in the HTML file, perform the following actions:

  1. Open the file with Mozilla.

  2. Press Ctrl+I, or click View to open the View menu.

  3. Click Page Info.

The charset information is at the bottom of the General tab, for example:

Content-Type text/html; charset=us-ascii

If the string charset=us-ascii does not match the actual encoding of the file, the file might appear broken. To change the encoding of the HTML file, perform the following actions:

  1. Open the file with Mozilla Composer.

  2. Open the File menu.

  3. Select Save As Charset.

  4. Choose the correct encoding. Mozilla Composer automatically converts the encoding and the charset tag as appropriate.
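The charset tag shown in Page Info can also be read from a terminal, which helps when many files need to be checked before deciding which ones to save again. The sketch below is illustrative only: html_charset is a hypothetical helper name, it assumes a grep that supports the -o option (GNU and BSD grep do), and it reports only the first charset= value found in the file.

```shell
#!/bin/sh
# Hypothetical helper: print the first charset value declared in an HTML
# file, in lower case. Prints nothing if the file has no charset tag.
html_charset() {
    # $1: path to the HTML file
    grep -io 'charset=[^>]*' "$1" | head -n 1 |
        sed -e 's/^[^=]*=//' \
            -e "s/^[\"']//" \
            -e "s|[\"' ;/].*||" |
        tr '[:upper:]' '[:lower:]'
}
```

For a file that contains <meta http-equiv="Content-Type" content="text/html; charset=Big5">, the helper prints big5. An empty result suggests that the tag is missing.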

Emails Saved in Portable Format

Modern emails are tagged with the MIME charset tag. The mail application of the Java Desktop System, Evolution, accepts MIME charset tags. You do not need to perform any encoding conversion.

Plain Text Files

Plain text files do not have a charset tag. If the files are not in UTF-8 encoding, encoding conversion is needed. For example, to convert a plain text file encoded in Traditional Chinese big5 to UTF-8, execute the following command:

iconv -f big5 -t UTF-8 inputfilename > outputfilename
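Redirecting iconv output onto the input file would truncate that file before it is read, so when a file is converted in place it is safer to write to a temporary file first. A minimal sketch, assuming an iconv that recognizes the source encoding name; the function name convert_to_utf8 and the temporary file suffix are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert a text file to UTF-8 through a temporary
# file, and keep the result only if iconv reports success.
convert_to_utf8() {
    # $1: source encoding (for example big5)
    # $2: input file
    # $3: output file (may be the same path as the input file)
    tmp="$3.tmp.$$"
    if iconv -f "$1" -t UTF-8 "$2" > "$tmp"; then
        mv "$tmp" "$3"
    else
        # conversion failed: discard the partial output
        rm -f "$tmp"
        return 1
    fi
}
```

For example, convert_to_utf8 big5 notes.txt notes.txt replaces notes.txt with its UTF-8 version only if the whole file converts cleanly.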