The migration to Unicode multilingual computing affects several methods of importing and exporting data.
The system administrator must configure the mount options codepage and iocharset for FAT and VFAT file systems, which are typically used for floppy disks, Zip drives, and removable hard disks on Microsoft Windows. For example, if you import from Traditional Chinese Windows, use the settings shown in the following table to browse Traditional Chinese filenames correctly.
Mount Option | Traditional Chinese Setting |
---|---|
codepage | 950 |
iocharset | big5 |
Sample entries for /etc/fstab for the Traditional Chinese example are as follows:
Device | Mount Point | File System Type and Options |
---|---|---|
/dev/fd0h1440 | /media/fd0h1440 | vfat noauto,iocharset=big5,codepage=950 |
/dev/sda1 | /media/iee1394disk | vfat noauto,iocharset=big5,codepage=950 |
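To see why the charset settings must match the system that wrote the file names, the following Python sketch interprets one set of file name bytes in two ways. It does not mount anything or call any file system API. The sample name is hypothetical, and the sketch assumes only Python's built-in cp950 and latin-1 codecs.

```python
# Hypothetical sample file name; any Traditional Chinese text would do.
name = "中文.txt"

# The bytes a code page 950 (Traditional Chinese Windows) system would produce.
raw = name.encode("cp950")

# Interpreting the bytes with the matching charset recovers the name.
correct = raw.decode("cp950")

# Interpreting the same bytes with an unrelated charset yields different,
# unreadable text (latin-1 maps every byte to some character).
wrong = raw.decode("latin-1")

print(raw)              # the raw bytes, shown as an ASCII-safe repr
print(correct == name)  # True when the charset matches
print(wrong == name)    # False when it does not
```

The mismatched case is roughly what a file browser shows when the mount options do not correspond to the charset of the originating system.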
The system administrator must also configure the mount options codepage and iocharset to mount a remote Microsoft Windows file system that is shared using CIFS, or a file system that is exported from another system using SMB. For example, if you import legacy files encoded in big5 from Traditional Chinese Windows, iocharset must be set to big5 and codepage must be set to 950 to browse the Traditional Chinese filenames correctly. A sample /etc/fstab entry is as follows:
server:/data /data smbfs iocharset=big5,codepage=950,username=foo,password=bar
The Java Desktop System can remotely access a file system on UNIX and Linux systems by using SMB. The export server must run Samba or an equivalent to export the remote file system. The client side can specify the file system encoding if the legacy data is stored in legacy encodings. The codeset conversion of the filename is done automatically.
Microsoft Office files are encoded in Unicode. StarOffice applications can read and write Unicode-encoded files without problems.
HTML files authored using HTML editors such as Mozilla Composer, or HTML files saved by a web browser, usually contain a charset encoding tag. After exporting or importing, you can browse such HTML files with the Mozilla Navigator web browser, or edit the files with Mozilla Composer, according to the encoding tag in the HTML file.
Some HTML files might be displayed as garbled characters. This problem typically has one of the following causes:
The charset encoding tag is incorrect.
The charset encoding tag is missing.
To find the charset encoding tag in the HTML file, perform the following actions:
Open the file with Mozilla.
Press Ctrl+I, or click View to open the View menu.
Click Page Info.
The charset information is at the bottom of the General tab, for example: Content-Type text/html; charset=us-ascii
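When many files need checking, the declared charset can also be read directly from the file contents. The following Python sketch is one possible approach, not what Mozilla does internally: the function name find_declared_charset and the scan_bytes limit are assumptions made for illustration, and the pattern only looks for a charset=VALUE string near the top of the file.

```python
import re

# Matches charset=VALUE in a meta tag or Content-Type string (quotes optional).
_CHARSET_RE = re.compile(
    rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def find_declared_charset(data, scan_bytes=4096):
    """Return the declared charset in lowercase, or None if no tag is found.

    Only the first scan_bytes bytes are searched, on the assumption that
    the charset tag appears in the document head.
    """
    match = _CHARSET_RE.search(data[:scan_bytes])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=us-ascii"></head><body></body></html>'
)
print(find_declared_charset(sample))                    # us-ascii
print(find_declared_charset(b"<html><body></body></html>"))  # None
```

A result of None corresponds to the missing-tag case. A value that differs from the real encoding of the file corresponds to the incorrect-tag case.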
If the string charset=us-ascii does not match the actual encoding of the file, the file might appear broken. To change the encoding of the HTML file, perform the following actions:
Open the file with Mozilla Composer.
Open the File menu.
Select Save As Charset.
Choose the correct encoding. Mozilla Composer automatically converts the encoding and the charset tag as appropriate.
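For batch work outside the editor, a comparable conversion can be approximated in a script. The sketch below is an assumption-laden illustration rather than a description of how Mozilla Composer works: convert_html is a hypothetical helper, the caller must already know the real source encoding, and only the first charset=VALUE occurrence is rewritten.

```python
import re

# Captures the "charset=" prefix (with optional quote) and the old value.
_CHARSET_RE = re.compile(r"(charset\s*=\s*[\"']?)[A-Za-z0-9._:-]+", re.IGNORECASE)


def convert_html(data, source_encoding, target_encoding="utf-8"):
    """Decode HTML bytes, update the first charset tag, and re-encode.

    source_encoding must be the real encoding of the file, which may
    differ from the charset the file declares.
    """
    text = data.decode(source_encoding)
    # A function replacement avoids backreference ambiguity in the new value.
    text = _CHARSET_RE.sub(lambda m: m.group(1) + target_encoding, text, count=1)
    return text.encode(target_encoding)


legacy = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=big5"></head><body>中文</body></html>'
).encode("big5")

converted = convert_html(legacy, "big5")
print(b"charset=utf-8" in converted)  # True
```

If the file has no charset tag at all, this sketch converts the bytes but adds no tag, so the missing-tag problem would remain.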
Modern emails are tagged with the MIME charset tag. The mail application of the Java Desktop System, Evolution, accepts MIME charset tags. You do not need to perform any encoding conversion.
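To illustrate why no manual conversion is needed, the following Python sketch parses a minimal message whose Content-Type header carries a charset parameter. It uses Python's standard email package, not Evolution, and the message content is a made-up example.

```python
from email import message_from_bytes, policy

# A hypothetical minimal message: the header declares the body charset.
raw = (
    b"Content-Type: text/plain; charset=big5\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "中文".encode("big5")
    + b"\r\n"
)

msg = message_from_bytes(raw, policy=policy.default)

# The declared charset is available to any MIME-aware mail reader.
charset = msg.get_content_charset()

# get_content() decodes the body using the declared charset.
body = msg.get_content()

print(charset)
```

Because the charset travels with the message, the reader can decode the body without being told the encoding separately.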
Plain text files do not have a charset tag. If the files are not in UTF-8 encoding, encoding conversion is needed. For example, to convert a plain text file encoded in Traditional Chinese big5 to UTF-8, execute the following command:

iconv -f big5 -t UTF-8 inputfilename > outputfilename
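Where a script is more convenient than the command line, the same kind of conversion can be sketched in Python. The helper name convert_to_utf8 and the temporary file names are assumptions for illustration. Decoding is strict, so bytes that are not valid in the source encoding raise UnicodeDecodeError instead of being silently altered.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="big5"):
    """Read src as source_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Demonstration with a throwaway directory and hypothetical file names.
with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "inputfilename"
    dst = Path(workdir) / "outputfilename"
    src.write_bytes("中文\n".encode("big5"))

    convert_to_utf8(src, dst)
    result = dst.read_bytes()

print(result == "中文\n".encode("utf-8"))  # True
```

Working on bytes rather than text-mode files keeps line endings unchanged, so only the character encoding differs between input and output.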