Working with non-Unicode characters

This section describes how to work with non-Unicode characters in the Discovery Framework.

Because the Discovery Framework is Java-based, it can only read Unicode or Latin-1 characters. For text in any other encoding, you can work around this limitation by converting the natively encoded file to ASCII, in which every non-ASCII character is written as a Unicode escape sequence. A converter such as native2ascii does this; it is freely available as part of the JDK (JDK 8 and earlier).
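As a rough illustration of what such a conversion produces, the following Java sketch replaces each character outside the 7-bit ASCII range with a Unicode escape. The class UnicodeEscaper and its method toAsciiEscapes are hypothetical names chosen for this example; they are not part of the Discovery Framework or the JDK, and the sketch does not reproduce every detail of native2ascii (for example, it does not read files or handle source encodings).

```java
/** Hypothetical helper for illustration only; not a Discovery Framework or JDK class. */
final class UnicodeEscaper {

    /** Returns the text with every char above 7-bit ASCII written as a backslash-u escape. */
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                // Plain ASCII passes through unchanged.
                out.append(c);
            } else {
                // Everything else becomes a four-digit hexadecimal escape.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The value is the Japanese word for "Japanese language",
        // written with escapes so this source file itself stays ASCII.
        String value = "name=\u65e5\u672c\u8a9e";
        System.out.println(toAsciiEscapes(value)); // prints name=\u65e5\u672c\u8a9e
    }
}
```

The output contains only ASCII characters, so it can be read safely by a component that expects Latin-1 or Unicode-escaped input.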

Keep in mind the following guidelines:

  1. Use UTF-8 as your encoding where possible. Single-byte encodings such as Latin-1 cannot represent Japanese characters.
  2. Pick a valid character set, such as Shift-JIS or UTF-8, and stick with it. You cannot change character sets midstream; if you do, you must re-enter your values.
  3. Make sure the character set your text editor saves files in matches the character set you specify to native2ascii (its -encoding option).
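The third guideline matters because bytes written in one character set and read in another do not yield the original text. The Java sketch below shows the effect, assuming UTF-8 as the editor's encoding and Latin-1 as the mismatched one; the class CharsetMismatchDemo and its roundTrip method are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class for illustration only. */
final class CharsetMismatchDemo {

    /** Encodes the text as UTF-8 bytes, then decodes those bytes with the given charset. */
    static String roundTrip(String text, Charset decodeWith) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes to keep this source ASCII.
        String original = "\u65e5\u672c\u8a9e";

        // Matching charsets: the text survives the round trip.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.UTF_8)));      // prints true

        // Mismatched charsets: each UTF-8 byte is misread as a separate Latin-1 character.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.ISO_8859_1))); // prints false
    }
}
```

The same kind of corruption occurs if a file is saved in one character set and then converted with a different one, and the converted file gives no warning that anything went wrong.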

For more information about working with non-Unicode characters, see the Liferay Portal website.