Oracle® Real User Experience Insight User's Guide
Release 6.0.1 for Linux x86-64

Part Number E16359-02

G Working With National Language Support

This appendix provides a detailed discussion of the character encoding standards supported by RUEI when monitoring network traffic. Restrictions on the identification of such things as domain names, custom headers, and functional errors are highlighted. The operation of data masking and user ID matching when working with international character sets is also discussed.

G.1 Introduction

The Collector can monitor network traffic containing data in a wide variety of encoding standards. A complete list of the encoding standards currently supported by RUEI is shown in Table G-1.

Table G-1 Supported Encodings

Canonical Name | MIME Name (Footnote 1) | Description
-------------- | ---------------------- | -----------
Big5 | Big5 | Traditional Chinese.
EUC-JP | EUC-JP | EUC-encoding Japanese.
GB_2312-80 | GB_2312-80, gb2312, chinese | Chinese.
GBK | GBK, CP936, MS936, windows-936 | Simplified Chinese.
ISO-8859-1 | ISO-8859-1, ISO_8859-1, latin1 | Latin alphabet no. 1.
ISO-8859-10 | ISO-8859-10, latin6 | Latin alphabet no. 6 (Nordic).
ISO-8859-13 | ISO-8859-13 | Latin alphabet no. 7 (Baltic Rim).
ISO-8859-14 | ISO-8859-14, latin8 | Latin alphabet no. 8 (Celtic).
ISO-8859-15 | ISO-8859-15, latin9 | Latin alphabet no. 9.
ISO-8859-16 | ISO-8859-16, latin10 | Latin alphabet no. 10 (south-eastern Europe).
ISO-8859-2 | ISO-8859-2, ISO_8859-2, latin2 | Latin alphabet no. 2 (central and eastern Europe).
ISO-8859-3 | ISO-8859-3, latin3 | Latin alphabet no. 3 (southern Europe).
ISO-8859-4 | ISO-8859-4, latin4 | Latin alphabet no. 4 (northern Europe).
ISO-8859-5 | ISO-8859-5, cyrillic | Cyrillic.
ISO-8859-6 | ISO-8859-6, arabic | Arabic.
ISO-8859-7 | ISO-8859-7, greek | Greek.
ISO-8859-8 | ISO-8859-8, hebrew | Hebrew.
ISO-8859-9 | ISO-8859-9, latin5 | Latin alphabet no. 5 (Turkish).
KOI8-R | KOI8-R | Russian.
Shift_JIS | Shift_JIS, shift-JIS | Japanese.
US-ASCII | US-ASCII, ascii | American Standard Code for Information Interchange (ASCII).
UTF-32 | UTF-32 | 32-bit UCS transformation format. Also known as UCS-4.
UTF-16 | UTF-16 | 16-bit UCS transformation format, byte order identified by an optional byte-order mark.
UTF-16BE | UTF16BE | 16-bit unicode transformation format, big-endian byte order.
UTF-16LE | UTF16LE | 16-bit unicode transformation format, little-endian byte order.
UTF-32BE | UTF32BE | 32-bit unicode transformation format, big-endian byte order.
UTF-32LE | UTF32LE | 32-bit unicode transformation format, little-endian byte order.
UTF-8 | UTF-8 | 8-bit UCS transformation format.
windows-1250 | windows-1250 | Microsoft Windows Eastern European.
windows-1251 | windows-1251 | Microsoft Windows Cyrillic (Russian).
windows-1252 | windows-1252 | Microsoft Windows Latin.
windows-1253 | windows-1253 | Microsoft Windows Greek.
windows-1254 | windows-1254 | Microsoft Windows Turkish.
windows-1255 | windows-1255 | Microsoft Windows Hebrew.
windows-1256 | windows-1256 | Microsoft Windows Arabic.
windows-1257 | windows-1257 | Microsoft Windows Baltic.
windows-1258 | windows-1258 | Microsoft Windows Vietnamese.


Footnote 1 The name (and supported aliases) as recognized in the HTTP encoding declarations.
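
As an illustration of where such a declaration typically appears, the following Python sketch reads the charset parameter from an HTTP Content-Type header value. It is not RUEI's own parsing logic, which is not described here; the function name declared_charset is hypothetical, and the lowercasing of the returned name is a behavior of the Python standard library, not of RUEI.

```python
from email.message import Message

def declared_charset(content_type):
    """Return the charset declared in a Content-Type header value, or None.

    Hypothetical helper for illustration only. The standard library
    returns the charset name in lowercase.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

print(declared_charset("text/html; charset=Shift_JIS"))
print(declared_charset("text/html"))
```

The first call yields the declared name in lowercase, and the second yields None because no charset is declared.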

Note that vendor-specific Web site encodings may not be supported. Network traffic that uses an unsupported encoding is still recorded, but matching may not be possible. For example, the content of a page can still be viewed in the replay viewer, but the page's defined name may not be correctly associated with it.

Web Site Configuration

To correctly monitor a multi-byte Web site, it is essential that the Web site is properly configured. For example, if its Web server advertises UTF-8, but the actual pages are not UTF-8 encoded, RUEI cannot correctly monitor them, even though some Web browsers can detect the mismatch and display the pages correctly. As a result, such things as functional error and content checks will not operate correctly for these pages.
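
One way to spot this kind of misconfiguration before monitoring is to test whether a page body decodes cleanly in the encoding the server advertises. The following Python sketch assumes you have already obtained the raw page bytes and the advertised encoding name; the function name is hypothetical. A clean decode is a necessary condition only: single-byte encodings such as ISO-8859-1 accept any byte sequence, so a True result does not prove the declaration is correct.

```python
def matches_declared_encoding(body, declared):
    """Return True if the page bytes decode without error in the declared encoding.

    Hypothetical helper: a False result indicates a definite mismatch,
    while a True result only means no mismatch was detected.
    """
    try:
        body.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names unknown to Python.
        return False
    return True

# Sample page content that is really Shift_JIS encoded.
page = "日本語".encode("shift_jis")

print(matches_declared_encoding(page, "utf-8"))      # False: advertised UTF-8, but bytes are not
print(matches_declared_encoding(page, "shift_jis"))  # True
```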

G.2 Implementation Considerations

Data Masking

The Collector can be configured to omit the logging of sensitive information. This is described in Section 8.4, "Masking User Information". Only ASCII argument names are supported. The encoding used in the argument's content does not matter because it is replaced anyway.

Particular attention should be paid to variable names that contain a dollar ($) character. For example, foo$bar can be transmitted in monitored traffic as foo%24bar (this is browser dependent). In this case, to mask this variable correctly, the percent-encoded variable name should be specified.

Be aware that the variables to be masked must be specified in ASCII format, and must be specified exactly as they are reported within the Session Diagnostics facility. For example, the variable name user name would be reported within the Session Diagnostics facility as user%20name, but can also appear as user+name. Hence, both variable names should be specified for masking.
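
The encoded forms mentioned above can be generated with a short Python sketch such as the following. The function name masking_candidates is hypothetical, and the resulting list is only a set of candidates: the names actually reported in the Session Diagnostics facility remain the authority on what to specify.

```python
from urllib.parse import quote, quote_plus

def masking_candidates(name):
    """Return the forms in which an argument name may appear in traffic.

    Hypothetical helper: includes the raw name, the percent-encoded
    form, and the form in which spaces are sent as '+'.
    """
    variants = [name, quote(name, safe=""), quote_plus(name, safe="")]
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(variants))

print(masking_candidates("foo$bar"))    # ['foo$bar', 'foo%24bar']
print(masking_candidates("user name"))  # ['user name', 'user%20name', 'user+name']
```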

If the argument name contains non-ASCII characters, you should use the Session Diagnostics facility (described in Section 3.9, "Working With the Session Diagnostics Facility") to see how it is reported, and specify this reported name as the variable to be masked. In addition, you should regularly check the log files to ensure the data is being correctly masked.

Note that the restrictions and requirements described above for masking URL arguments also apply to any situation in which you want direct access to a URL argument, such as custom dimensions or application definitions.

Custom Headers and Cookies

All header names must be encoded in ASCII because this is required by the HTTP protocol. Within header contents, all non-ASCII characters are replaced by a placeholder.
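
The effect on header contents can be pictured with the following Python sketch. The placeholder character used by RUEI is not stated here, so the question mark below is an assumption made purely for illustration, as is the function name.

```python
def replace_non_ascii(value, placeholder="?"):
    """Replace every non-ASCII character in a header value with a placeholder.

    The '?' default is an assumed placeholder, not the documented one.
    """
    return "".join(ch if ord(ch) < 128 else placeholder for ch in value)

print(replace_non_ascii("session=café"))  # session=caf?
```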

User ID Matching

Within RUEI, user identification is first based on the HTTP Authorization field. If this is not found, the application's user identification scheme is used. This can be specified in terms of URLs, cookies, request or response headers, or XPath expressions. This is explained in Section 6.2.9, "Defining User Identification".

Because a URL argument is a name=value combination, the name part is specified as the source argument from which the user ID will be read. The value part is extracted and reported as the user ID. The specified source argument is subject to the same requirements as explained earlier for data masking. However, the value part of the combination can be specified in any supported encoding. RUEI attempts to translate the value from its native encoding (for example, Shift_JIS) to UTF-8 so that it can be rendered within the user interface in the native language (for example, Japanese).
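
The extraction and translation described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not RUEI's implementation: the function name, the argument name user, and the sample query string are all hypothetical, and the native encoding is assumed to be known in advance.

```python
from urllib.parse import quote, unquote_to_bytes

def user_id_from_query(query, source_arg, native_encoding):
    """Return the value of source_arg, decoded from its native encoding.

    Hypothetical helper: the name is compared exactly as it appears in
    the query string, and only the value is percent-decoded.
    """
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        if name == source_arg:
            raw = unquote_to_bytes(value.replace("+", " "))
            return raw.decode(native_encoding)
    return None

# Build a sample query whose value is percent-encoded Shift_JIS.
query = "lang=ja&user=" + quote("山田".encode("shift_jis"))

user_id = user_id_from_query(query, "user", "shift_jis")
print(user_id)                  # 山田
print(user_id.encode("utf-8"))  # the same text, now as UTF-8 bytes
```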

However, when the native encoding of the value is not known, the user ID cannot be properly rendered within the user interface, and the reported value is garbled. Due to the limitations of the HTTP protocol, user IDs on some Web sites may not be rendered as expected. In that case, it is recommended that you specify the fallback encoding that should be used. This is explained in Section 9.3, "Specifying the Fallback Collector Encoding". Note that the encoding specified for this setting is only applicable to URL and POST arguments. Content-based reporting (for example, functional errors) is not affected by this setting. Because this does not guarantee the correct rendering of all values, you should also review the Web site definitions, and verify that all user IDs are ASCII only.
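
The idea of a fallback encoding, together with the ASCII-only check, can be sketched in Python as follows. The function names and the choice of Shift_JIS as the fallback are assumptions for illustration; the actual setting is made through the RUEI configuration described in Section 9.3.

```python
def decode_value(raw, declared=None, fallback="shift_jis"):
    """Decode argument bytes with the declared encoding, or the fallback if none is known.

    Hypothetical helper: undecodable bytes become U+FFFD rather than
    raising an error, which mimics a garbled but reportable value.
    """
    encoding = declared or fallback
    return raw.decode(encoding, errors="replace")

def is_ascii_only(user_id):
    """Return True if the user ID contains only ASCII characters."""
    return user_id.isascii()

raw = "山田".encode("shift_jis")

print(decode_value(raw))                    # fallback applies: 山田
print(decode_value(raw, declared="utf-8"))  # wrong encoding: garbled value
print(is_ascii_only("jsmith"), is_ascii_only("山田"))
```

An ASCII-only user ID decodes identically in most of the supported encodings, which is why verifying this property avoids the problem altogether.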

G.3 Specifying Content Checks

Be aware that, when specifying page content checks, the content rendered within the client browser (and seen by the end user) may differ from the underlying HTML page source. This is because of underlying font, format, and link tags, as well as entity definitions, and so on. Hence, simply copying and pasting a portion of text from the rendered page within a client browser may not always work as expected.
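
The difference between rendered text and page source can be demonstrated with a small Python sketch. The sample HTML fragment and the function name are hypothetical, and the tag stripping is deliberately crude: it approximates what a browser displays and is not a real HTML renderer.

```python
import html
import re

def rendered_text(source):
    """Approximate the text a browser would display for an HTML fragment.

    Hypothetical helper: strips tags with a simple pattern, then
    resolves entity references such as &amp; and &eacute;.
    """
    without_tags = re.sub(r"<[^>]+>", "", source)
    return html.unescape(without_tags)

source = '<b>Caf&eacute;</b> &amp; <a href="/bar">bar</a> unavailable'
rendered = rendered_text(source)

print(rendered)            # Café & bar unavailable
print(rendered in source)  # False: the rendered text does not occur in the source
```

A content check built from the rendered string would therefore never match this page source, whereas a string copied from the source itself would.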

Normally, this problem can be overcome by copying and pasting from the View source facility within the client browser. However, for pages that use an encoding other than UTF-8, this approach does not work if you are using Internet Explorer 6 or 7. The reason for this is that IE uses Notepad as its source viewer, and this only supports UTF-8. As a result, the source may appear garbled, and cannot meaningfully be copied and pasted into RUEI.

Because Mozilla Firefox employs an internal HTML source rendering tool, it is always able to render the HTML source accurately, even for non-UTF-8 encodings. Therefore, it is recommended that you use this browser as the basis for content-based checks, and whenever an accurate rendition of the HTML source is required.