Interoperability with Other Platforms

The following sections describe certain considerations for multi-platform environments.

NFS Server Considerations

The NFS version 4 protocol (the default in Oracle Solaris) uses UTF-8 to handle file names and other strings, so in most cases no character set adjustments are necessary. However, if some or all NFS clients use a specific character set other than UTF-8, you can declare it with the charset option of the share command.

For example, to share the /export directory using the ISO8859-1 character set, use the following command:

# share -o iso8859-1 /export

To share a directory using a specific character set for some systems only, use the charset=access_list option:

# share -o iso8859-1=isosystem.example.com,koi8-r=koisystem.example.com /export

All file and path names created by the clients are converted to UTF-8 at the server.
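
To illustrate what this conversion means at the byte level, the following sketch uses iconv(1) to convert a single name from ISO8859-1 to UTF-8. It is only an illustration: the NFS server performs the conversion internally, the sample name is hypothetical, and the code set names accepted by iconv can differ between implementations.

```shell
# Hypothetical file name "cafe" with an e-acute, as an ISO8859-1 client
# would send it: the final byte is 0xE9 (octal 351).
latin1_name=$(printf 'caf\351')

# Convert the name to UTF-8. The code set names are an assumption;
# check "iconv -l" for the names your implementation accepts.
utf8_name=$(printf '%s' "$latin1_name" | iconv -f ISO-8859-1 -t UTF-8)

# Show the resulting bytes in hexadecimal, without separators.
utf8_hex=$(printf '%s' "$utf8_name" | od -An -tx1 | tr -d ' \n')
echo "$utf8_hex"    # prints 636166c3a9
```

The single byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9, which is why a name that is not converted appears garbled to a client that expects the other encoding.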

For more information, see the share_nfs(8) man page.

File System Considerations

mount_pcfs(1M) does not support the MS-DOS codepages, so non-ASCII characters on FAT file systems created by MS-DOS, a legacy version of MS Windows, or the Linux "msdos" driver might be garbled. Later FAT implementations use Unicode for character representation and are fully supported on Oracle Solaris by default, for both reading and writing.

Archives Containing Non-ASCII Filenames

Archiving files whose names contain non-ASCII characters can cause problems because implementations of the various archive formats differ significantly in how they support such names, although the situation is improving.

Recent tar implementations on UNIX and similar systems support the pax interchange format specified by POSIX.1-2001, so non-ASCII file names are handled safely. On the MS Windows platform, a number of archival utilities store file names in the current codepage, so the names of files extracted from such archives can become garbled.

If the names of extracted files are garbled and the original codepage is known, use the convmv(1) tool to repair them. By default, convmv only reports the renames it would perform; add the --notest option to actually rename the files.

$ convmv -f cp437 -t utf8 my_extracted_filename
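
Before converting, it can help to identify which extracted names actually need repair. The following is a minimal sketch, assuming that iconv(1) exits with a non-zero status when its input is not valid UTF-8; is_utf8 is a hypothetical helper name, not part of any utility.

```shell
# is_utf8 NAME: succeed if NAME is a valid UTF-8 byte sequence.
# Hypothetical helper; relies on iconv failing on invalid input.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Report entries in the current directory whose names are not valid UTF-8.
for name in *; do
    is_utf8 "$name" || printf 'not UTF-8: %s\n' "$name"
done
```

Pure ASCII names always pass this check, so only names that contain bytes from a legacy codepage are reported. The check cannot tell which codepage was used; that still has to be known or guessed before running convmv.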

The original Zip specification sets the encoding of file names and file comments to IBM437. In 2007, PKWARE extended the specification to also allow UTF-8. In the meantime, various Zip implementations, mostly on the MS Windows platform, had adopted the strategy of using the current codepage as the file name encoding.

Info-ZIP's Zip 3.0, available in Oracle Solaris 10 and Oracle Solaris 11, stores file names in UTF-8, so if both the compression and the decompression utilities are this version, the file names in the archive are preserved.

When a Zip archive that uses a non-UTF-8 encoding to store the file names is extracted on Oracle Solaris, the file names might get garbled. You can use the convmv(1) tool to repair them, if the codepage is known:

$ convmv -f cp437 -t utf8 my-unzipped-filename