Selecting Character Sets

This section discusses selecting character sets.

When configuring your PeopleSoft system, you need to consider the character set (or sets) that will be in use on the following tiers:

  • Client.

  • Web server.

  • Application server.

  • Database server.

  • File attachment storage location (FTP site, HTTP repository, or database table).

  • Email.

Some operations of your PeopleSoft system require the interaction of multiple tiers. For example, the uploading of a file attachment involves the browser on the client, the web server, the application server, the database server, and ultimately the file attachment storage location. To ensure the correct transfer of data and files between these tiers, Oracle recommends configuring each server tier (web server, application server, database server, and file storage location) to use the same character set as follows:

  • If your PeopleSoft system operates in a multi-language environment, use a UTF-8 character set on each server tier.

  • If your PeopleSoft system operates in a single language environment, use the native language character set for that language on each server tier. Alternatively, you could use a UTF-8 character set on each server tier, which would provide more flexibility than using the native language character set.

Clients can always be configured to use the native language of the user of that workstation or browser.
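
The recommendation above can be checked mechanically. The following Python sketch is illustrative only: the alias table and the function names are hypothetical and are not part of PeopleTools. It assumes you have already collected the character set name reported by each server tier.

```python
# Hypothetical alias table: maps tier-specific names to one canonical label.
# Only a few illustrative names are listed; extend it for your own platforms.
CANONICAL = {
    "en_us.utf8": "utf-8",      # Linux shell locale
    "utf-8": "utf-8",           # application server setting
    "al32utf8": "utf-8",        # Oracle database character set
    "ja_jp.sjis": "shift_jis",  # Linux shell locale
    "sjis": "shift_jis",        # application server setting
}


def canonical_charset(name):
    """Return the canonical label for a tier setting, or None if unknown."""
    return CANONICAL.get(name.strip().lower())


def find_mismatches(tiers):
    """Return the tiers whose character set differs from the first tier's.

    Settings that are missing from the alias table are reported as well,
    because they cannot be confirmed to match.
    """
    items = list(tiers.items())
    expected = canonical_charset(items[0][1])
    mismatched = []
    for tier, name in items[1:]:
        actual = canonical_charset(name)
        if actual is None or actual != expected:
            mismatched.append(tier)
    return mismatched


server_tiers = {
    "web server": "en_US.utf8",
    "application server": "utf-8",
    "database server": "AL32UTF8",
    "FTP site": "ja_JP.sjis",  # deliberately inconsistent for the example
}
print(find_mismatches(server_tiers))
```

Client tiers are left out of the check on purpose, because clients can use the native language of each user.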

The following table depicts example character set settings across all tiers for three typical configurations—a multi-language environment, a single language environment (Western), and a single language environment (non-Western).

Note: This table shows examples for a particular combination of languages and platforms; your specific configuration could differ.

Tier (Platform) | Multi-Language | Single Language (Western: French) | Single Language (Non-Western: Japanese) | Where to Check
--- | --- | --- | --- | ---
Client (Windows) | Any (for example, English uses CP1252) | French (uses CP1252) | Japanese (uses CP932) | Start, Settings, Control Panel, Regional Options
Web server (Linux) – Shell processes | en_US.utf8 | fr_FR.iso885915 | ja_JP.sjis | locale command
Application server (Linux) – PSAPPSRV processes | utf-8 | latin15 | sjis | psappsrv.cfg [PSTOOLS] Character Set
Application server (Linux) – Email processes | utf-8 | utf-8 | utf-8 | psappsrv.cfg [SMTP Settings] SMTP Character Set
Application server (Linux) – Shell processes | en_US.utf8 | fr_FR.iso885915 | ja_JP.sjis | locale command
Database server (Oracle) | AL32UTF8 | WE8ISO8859P15 | JA16SJISTILDE | NLS_DATABASE_PARAMETERS
File attachments: FTP site [4] (Linux) – Shell processes | en_US.utf8 | fr_FR.iso885915 | ja_JP.sjis | locale command

[4] If the file attachment storage location is a database table or an HTTP repository, the character set of that location is determined by the configuration of the server tier that hosts it. Specifically, a database table used as a storage location depends on the database server settings; an HTTP file repository depends on the web server settings if the repository is deployed on the web server. The preceding table lists only an FTP site as a storage location because an FTP site can be deployed independently of the other server tiers.

Failure to configure character sets correctly across server tiers can result in garbled file names. To minimize character corruption issues, try to use the multi-language settings where possible.
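
To illustrate how a file name becomes garbled, the following Python sketch encodes a name in one character set and decodes it in another, which is roughly what happens when two tiers disagree. The function name and the sample file name are assumptions made for the example; the sketch does not reproduce PeopleTools behavior.

```python
def misread(name, stored_as, read_as):
    """Encode a file name in one character set and decode it in another."""
    return name.encode(stored_as).decode(read_as, errors="replace")


original = "résumé.txt"

# One tier writes the name as UTF-8; another tier interprets the bytes as CP1252.
print(misread(original, "utf-8", "cp1252"))  # prints "rÃ©sumÃ©.txt"

# When both tiers agree, the name survives unchanged.
print(misread(original, "utf-8", "utf-8"))   # prints "résumé.txt"
```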

The primary character set decision that you must make when installing a PeopleSoft implementation is which character set to use for the database system. Ideally, all databases are encoded in Unicode; however, in some cases Unicode requires several bytes to represent each character when only one byte may be required in a non-Unicode character set. Therefore, the PeopleSoft system enables you to use certain non-Unicode character sets for the database.

By using a Unicode encoded database, you can maintain a single database with data in any combination of languages. A single PeopleSoft application server can serve multiple users connecting to the mixed-language database, regardless of the language or character set of those users’ client machines. The only restriction on a user’s ability to access mixed-language data is the capability of the user’s client workstation to interpret, display, and accept keyboard entry of the characters from the various languages.

Most language or region-specific non-Unicode character sets provide sufficient characters for only a few languages. If you create a non-Unicode database, you must ensure that all of the characters for all of the languages that you plan on using can be represented in the character set that you choose.
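
One way to test a candidate non-Unicode character set before you commit to it is to try encoding representative sample data. The following Python sketch is a minimal illustration; the function name and sample text are hypothetical, and the Python codec names only approximate the character sets that your database platform provides.

```python
def unrepresentable(text, charset):
    """Return the distinct characters in text that charset cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "Café Zürich 東京"

# A Western European character set cannot hold the Japanese characters.
print(unrepresentable(sample, "iso8859_15"))  # prints ['東', '京']

# A Unicode encoding can represent every character in the sample.
print(unrepresentable(sample, "utf-8"))       # prints []
```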

The following table lists whether a PeopleSoft language is supported in a Unicode or non-Unicode database character set:

Language Code | Language            | Database Character Set
------------- | ------------------- | ----------------------
ARA           | Arabic              | Unicode
BUL           | Bulgarian           | Unicode
CFR           | Canadian French     | Unicode or non-Unicode
CRO           | Croatian            | Unicode
CZE           | Czech               | Unicode
DAN           | Danish              | Unicode or non-Unicode
DUT           | Dutch               | Unicode or non-Unicode
ENG           | US English          | Unicode or non-Unicode
FIN           | Finnish             | Unicode or non-Unicode
ESP           | Spanish             | Unicode or non-Unicode
FRA           | French              | Unicode or non-Unicode
GER           | German              | Unicode or non-Unicode
HUN           | Hungarian           | Unicode
ITA           | Italian             | Unicode or non-Unicode
JPN           | Japanese            | Unicode or non-Unicode
KOR           | Korean              | Unicode
NOR           | Norwegian           | Unicode or non-Unicode
POL           | Polish              | Unicode
POR           | Portuguese          | Unicode or non-Unicode
ROM           | Romanian            | Unicode
RUS           | Russian             | Unicode
SER           | Serbian             | Unicode
SLK           | Slovak              | Unicode
SLV           | Slovenian           | Unicode
SVE           | Swedish             | Unicode or non-Unicode
THA           | Thai                | Unicode
UKE           | UK English          | Unicode or non-Unicode
ZHS           | Simplified Chinese  | Unicode
ZHT           | Traditional Chinese | Unicode

Depending on the data that you store and how the database stores Unicode characters, a Unicode database can be significantly larger than a non-Unicode database. However, only the storage of character data is affected; the space that is required for non-character data, such as numbers and dates (which are stored by the database system as numbers), is not affected.

Depending on the database platform, you can use one of the four character set types (SBCS, nonshifting DBCS, shifting DBCS, or Unicode) when creating the database. However, the number of characters that you can store in each column is affected greatly by the type of character set that you choose for the database encoding.
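
The effect of the character set on storage can be seen by comparing encoded byte lengths. The following Python sketch is illustrative only: the sample words and the function name are assumptions, and it uses UTF-8 as the Unicode encoding, whereas the encoding that your database platform uses for Unicode data may differ.

```python
def encoded_size(text, charset):
    """Return the number of bytes needed to store text in charset."""
    return len(text.encode(charset))


# Each sample is paired with a non-Unicode character set suited to its language.
samples = [
    ("invoice", "latin-1"),    # English, SBCS
    ("reçu", "latin-1"),       # French, SBCS
    ("請求書", "shift_jis"),    # Japanese, DBCS
]

for word, legacy in samples:
    print(
        word,
        "characters:", len(word),
        legacy + " bytes:", encoded_size(word, legacy),
        "UTF-8 bytes:", encoded_size(word, "utf-8"),
    )
```

In this sample, the English word needs the same number of bytes either way, the French word grows from 4 to 5 bytes, and the Japanese word grows from 6 to 9 bytes. If a column's capacity is measured in bytes, the number of characters that fit shrinks accordingly.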

All data that is stored in memory and processed by the PeopleTools application server is held in Unicode. However, the application server can create files on the server (through PeopleCode file layout objects), as well as log and trace files, in either Unicode or a non-Unicode character set.

Each PeopleSoft application server is configured with a default character set, UTF-8. If a file operation must create a non-Unicode file, this character set is used, unless another character set is explicitly specified in the file operation. For example, if you create a file layout object to write a non-Unicode file, but you don’t specify in which character set the file should be created, the default non-Unicode character set of the application server is used.

Microsoft Windows enables you to change the default character set of the system, although as installed, the default character set matches the default locale of the Microsoft Windows installation. To change the system default locale (and therefore the character set), on Microsoft Windows servers, use the Control Panel’s Regional Options menu. In the Language settings for the system section, click the Set Default button.

When running on Unix/Linux, the PeopleSoft application server enables you to specify the default non-Unicode character set in the application server’s configuration file, which you select by using the PSADMIN tool. Any valid PeopleSoft character set with a character set type of SBCS or nonshifting DBCS is a valid default non-Unicode character set for PeopleSoft application servers that run on Unix/Linux.

You must consider the client components of PeopleTools when you are planning your language strategy. The requirements for language support on client workstations are different, depending on whether you are using the PeopleSoft Pure Internet Architecture or the PeopleTools development tools for Microsoft Windows.

This section discusses:

  • Character sets and fonts in the PeopleSoft Pure Internet Architecture.

  • Fonts and the PeopleTools development environment.

  • Fonts in PeopleSoft charts and PDF documents.

  • Input methods.

Character Sets and Fonts in the PeopleSoft Pure Internet Architecture

The PeopleSoft Pure Internet Architecture serves all HTML pages in the UTF-8 encoding of Unicode. This encoding is recognized automatically by the web browser, because the encoding of the page is announced in the HTTP header when the browser communicates with the web server. All browsers supported by PeopleTools can support UTF-8 encoded HTML pages.
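
As an illustration of how the encoding is announced, the following Python sketch extracts the charset parameter from a Content-Type header value, which is the information a browser relies on. The header value shown is a typical example and is an assumption; it was not captured from a PeopleSoft web server.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# A header that announces UTF-8, as described above.
print(charset_from_content_type("text/html; charset=UTF-8"))  # prints "utf-8"

# Without a charset parameter, the receiver has to guess the encoding.
print(charset_from_content_type("text/html"))                 # prints "None"
```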

However, the browser needs other components to correctly display and enter the vast array of characters that are available in Unicode. Specifically, you need appropriate fonts to display the various scripts in which you expect data to be maintained. In addition, you might need alternate keyboard layouts or, in the case of ideographic scripts such as Chinese, Japanese, and Korean, you need input method editors (IMEs) to convert sequences of keystrokes into ideographs. The requirement for alternate keyboard and IMEs is the same for both the PeopleSoft Pure Internet Architecture and the PeopleTools development environment.

Not all fonts contain a full repertoire of Unicode characters, because many fonts are tailored to address a specific list of languages and contain only the glyphs that are required by those languages. If you try to view Unicode data with a font that does not contain the appropriate characters for the displayed language, you will most likely see square boxes in place of the appropriate characters. The data has not been corrupted; there is just no glyph available in the current font for the character that the system is trying to display. For this reason, you may need to license or configure several fonts for a global PeopleSoft system.

The PeopleSoft Pure Internet Architecture includes a set of style sheets, defined with Application Designer, that determine the font that is used to display HTML pages. In some cases, the application data may contain characters that are not present in this font and that require a different font.

The Albany TrueType fonts shipped in the PS_HOME\fonts\truetype directory support all of the languages supported by the PeopleSoft system. Alternatively, you may need to obtain and configure fonts that contain the characters for the languages that you are planning to use, if your workstations are not already configured with these fonts. Obtain fonts from the following sources:

  • Many Microsoft Windows and other operating system applications are packaged with Unicode fonts containing glyphs covering a large range of languages.

    Microsoft Office is packaged with several fonts containing a large portion of the characters in Unicode, including the Microsoft Sans Serif font. Use these fonts in the PeopleSoft Pure Internet Architecture by specifying them in the Application Designer style sheet definitions or by following the browser-specific instructions in this section.

  • Many public domain fonts exist that contain a large character repertoire for use in web browsers. The unifont.org web site is one location to get additional information on public domain fonts.

    See Unicode Font Guide For Free/Libre Open Source Operating Systems.

  • Several font foundries license fonts for individual or corporate use.

    Some of these foundries include Monotype, Bitstream, and Tiro Typeworks.

Depending on your browser, you can also download fonts from your browser’s manufacturer.

To enable the display of GB18030 characters, you can use either the SimSun-18030 font from Microsoft or the Albany fonts shipped in the PS_HOME\fonts directory. Both of these fonts have glyphs for the supported ranges of the GB18030 character set.

Fonts and the PeopleTools Development Environment

PeopleTools enables you to specify the font that is used for all graphical components for all PeopleTools modules that run on Windows, such as Application Designer. Use these methods to specify fonts:

  • Configuration Manager font setting (Display tab)

    This setting affects the font that is used by all of the designer components of PeopleTools, including all of the text that is contained in the Microsoft Windows resource files.

    See Understanding PeopleTools Translation.

    Changing this font setting may be necessary if your workstation’s default locale does not contain the characters that are used for the language that you are attempting to display or maintain. For example, if you are attempting to view Japanese characters on an English Microsoft Windows workstation, you can change the PeopleSoft Configuration Manager font setting to select a font that contains the characters for the language that you are trying to display.

    The Albany TrueType fonts shipped in PS_HOME\fonts directory support all of the languages supported by the PeopleSoft system.

    In addition, several fonts that are shipped with Microsoft Windows and Microsoft Office, including Arial Unicode MS and Microsoft Sans Serif, contain a large number of glyphs covering most of the languages that are supported by the Unicode character set. Microsoft Windows can also be configured with fonts for most worldwide languages by selecting the required languages under the Regional Settings Control Panel menu.

  • PeopleCode font

    The PeopleCode editor in Application Designer also enables you to select a font for character display in the editor’s window itself. This is useful if the PeopleCode programs that you are working on contain Unicode characters. To set the font in Application Designer, open the PeopleCode program, select Edit, Display Fonts and Color.

Fonts in PeopleSoft Charts and PDF Documents

The operating system on the client workstation provides the fonts that the browser uses to render text in PeopleSoft Pure Internet Architecture pages. In certain circumstances, however, text is rendered on the server, and the application server provides the fonts. For example, for PeopleSoft charts sent as rendered images and for PDF documents produced by reporting tools such as SQR or BI Publisher, the application server might need to be configured with fonts that contain the needed glyphs. If the fonts are not configured correctly, squares or blanks might appear in PeopleSoft charts or PDF documents even though the same characters render correctly in the browser.

Note: The most common case in which fonts are rendered on the application server is when Java is used to draw charts or reports. The fonts shipped by PeopleSoft can be found in PS_HOME\fonts\ttf, as well as in PS_HOME\jre\lib\fonts.

See Understanding Report Template Types.

Input Methods

If users will enter translated data by using PeopleSoft Pure Internet Architecture or the PeopleTools development environment, you must ensure that an appropriate keyboard layout or input method editor is installed on the workstation.

Most alphabetic languages can be typed by using a relatively simple keyboard layout. Several specialized keyboard layouts exist for most languages; configure these keyboard layouts through your operating system. For example, a Spanish keyboard layout contains keys for the n-tilde character (ñ) and several other accented characters.

However, certain PeopleSoft hot keys do not work as expected on alternate, non-U.S. keyboard layouts. For example, Alt+', Alt+\, and Alt+/ do not produce the expected results on the AZERTY keyboard. This occurs because some keys on non-U.S. keyboards produce different key codes than the same key on a U.S. keyboard (also known as a QWERTY keyboard).

A solution to this problem can be found in the appendix.

See PeopleSoft Hot Keys Do Not Function As Expected on a non-U.S. Keyboard.

There are several ways of entering accented characters by using a nonlocalized keyboard. Your operating system manual can help you use specialized keyboard layouts, such as the English international layout, which enables you to enter accented characters by using two keystrokes. The Microsoft web site contains information about keyboards that are supported by Microsoft Windows and instructions for installing and configuring Windows keyboard layouts.

Ideographic languages, such as Chinese, Japanese, and Korean, require the use of a front-end processor to intercept multiple keyboard strokes and transform them into an ideographic character. These are known as IMEs, and they must be installed on each workstation where you plan to enter the ideographic languages.

Most localized versions of operating systems for these languages come preconfigured with IMEs that are appropriate for the language that is supported by the operating system. But on systems where the default locale is not Chinese, Japanese, or Korean, you may need to configure or license an IME from a third-party vendor. The PeopleSoft Pure Internet Architecture supports any IME that is supported by your browser. The designer tools in Microsoft Windows support all standard Microsoft IMEs.

The PeopleSoft system supports UTF-8 for outgoing Simple Mail Transfer Protocol (SMTP) email messages from PeopleTools application servers. In addition, the PeopleSoft system supports several additional encodings for outgoing email.

PeopleSoft application servers support the following for outgoing email:

  • UTF-8 (default).

  • ISO-2022-JP, Shift_JIS, EUC-JP (for Japanese).

  • ISO-2022-KR, EUC-KR (for Korean).

  • GBK, Big5, GB18030 (for Chinese).
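
The choice of encoding determines which text can be sent intact. The following Python sketch checks whether sample text survives a round trip through each encoding in the list above. It relies on the Python codecs of the same names, which only approximate the encoders that PeopleTools uses; the function name and the sample strings are assumptions for the example.

```python
import codecs

OUTGOING_CHARSETS = [
    "UTF-8",
    "ISO-2022-JP", "Shift_JIS", "EUC-JP",  # Japanese
    "ISO-2022-KR", "EUC-KR",               # Korean
    "GBK", "Big5", "GB18030",              # Chinese
]


def round_trips(text, charset):
    """Return True if text can be encoded in charset and decoded back intact."""
    try:
        return text.encode(charset).decode(charset) == text
    except UnicodeEncodeError:
        return False


korean = "안녕하세요"
for charset in OUTGOING_CHARSETS:
    codec_name = codecs.lookup(charset).name  # the Python codec behind the name
    print(charset, codec_name, round_trips(korean, charset))
```

In this sketch, the Korean sample survives in UTF-8, the two Korean encodings, and GB18030, but not in the Japanese encodings. This is one reason to leave the setting at UTF-8 unless recipients require a specific encoding.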

Specifying Email Character Sets

You specify an email character set in the SMTPCharacterSet parameter in the application server configuration file, psappsrv.cfg. By default, the SMTPCharacterSet parameter is set to UTF-8.

Note: You should specify a value for the SMTPCharacterSet. If you do not specify a value for the parameter, email is sent as-is, with no encoding. Leave the parameter set to the default value of UTF-8 if you are not certain about which value to use.

For example, to use ISO-2022-JP encoding for outgoing SMTP mail, in the psappsrv.cfg file, set the SMTPCharacterSet parameter to ISO-2022-JP, as shown in the following example:

[SMTP Settings]
...
SMTPCharacterSet=ISO-2022-JP
SMTPEncodingDLL=blank

You can also write your own SMTPEncodingDLL modules, if necessary.

Using Extended Japanese Characters

To use certain Windows-31J (also known as Microsoft CP932) characters—specifically, NEC special characters, NEC-selected IBM extended characters, IBM extension characters, and user-defined characters—in incoming or outgoing email messages with the ISO-2022-JP Japanese character set, you must complete additional configuration of your web server (for incoming email) and application server or PeopleSoft Process Scheduler (for outgoing email).

For incoming email on the web server, the following JVM setting must be added to the JAVA_OPTIONS_WIN32 parameter in the setenv.cmd file:

SET JAVA_OPTIONS_WIN32="-Dsun.nio.cs.map=x-windows-iso2022jp/ISO-2022-JP"

For outgoing email on the application server or PeopleSoft Process Scheduler, the following JVM option must be added to either the psappsrv.cfg file or the psprcs.cfg file, depending on whether the application server or an Application Engine (AE) program, respectively, will handle outgoing email messages. JVM options are set in the PSTOOLS section of the file:

[PSTOOLS]
...
JavaVM Options=-Dsun.nio.cs.map=x-windows-iso2022jp/ISO-2022-JP

In addition, your web server, application server, and PeopleSoft Process Scheduler must be using a Java Runtime Environment (JRE) or Java Development Kit (JDK) that is supported for extended Japanese characters. See the release notes on the My Oracle Support website.

See My Oracle Support, Knowledge, Tools and Technology, Documentation, Release Notes.
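
The need for this extra configuration can be seen by comparing a standard ISO-2022-JP encoder with a CP932 encoder. The following Python sketch uses Python's own codecs, not the Java charsets named in the JVM option, so it only illustrates the gap in character coverage; the function name and sample characters are assumptions for the example.

```python
def can_encode(text, charset):
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


nec_special = "①"  # U+2460, one of the NEC special characters in CP932
ordinary = "日本"   # common kanji

for charset in ("cp932", "iso2022_jp"):
    print(charset, can_encode(ordinary, charset), can_encode(nec_special, charset))
```

Both encoders accept the ordinary kanji, but only CP932 accepts the NEC special character. Closing that gap is the role of the mapping from ISO-2022-JP to the Windows variant in the JVM option shown above.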