PeopleTools 8.51 PeopleBook: Global Technology

Running COBOL in a Unicode Environment

This chapter provides an overview of COBOL in a Unicode environment and discusses how to:

Run the COBOL conversion utility.
Fine-tune COBOL programs.

Understanding COBOL in a Unicode Environment

This section discusses:

Unicode encodings in PeopleSoft COBOL.
Expanded storage space requirements.
Special logic for parsing Unicode strings.
COBOL sorting.
Unicode-specific error messages.

Unicode Encodings in PeopleSoft COBOL

The character set that is used for PeopleSoft COBOL processing must match the character set for your database. If you created a Unicode database for the PeopleSoft system, you must also run COBOL in Unicode.

Note. In this document, the word character refers to a single character in any language, regardless of how many bytes are required to store the character.

The Unicode standard provides several methods of encoding Unicode characters into a byte stream. Each encoding has specific properties that make it suitable for use in different environments. The two main encodings that are important to understanding how PeopleSoft COBOL operates when running in Unicode are:

UCS-2 (2-byte Universal Character Set), which is the Unicode encoding that PeopleTools uses internally for data that is held in memory on the application server.

UCS-2 encodes all characters into a fixed storage space of two bytes.

UTF-8 (8-bit Unicode Transformation Format), which is the encoding that the PeopleSoft system uses in COBOL.

UTF-8 uses a format that varies from one to four bytes per character. Currently, the PeopleSoft system supports Unicode’s Basic Multilingual Plane (BMP), which requires one to three bytes per character. Four-byte UTF-8 characters are required to represent supplementary characters that are outside Unicode’s BMP.

In UTF-8, the actual number of bytes to encode a character can be determined by the first three bits of the first, and sometimes only, byte of a character. The following table shows how the bit pattern of the first byte is related to the number of bytes needed to encode the UTF-8 character.

Unicode Code Point Range	UTF-8 Bit Pattern	UTF-8 Character Length
U+0000 – U+007F	0xxxxxxx	One byte
U+0080 – U+07FF	110xxxxx 10xxxxxx	Two bytes
U+0800 – U+FFFF	1110xxxx 10xxxxxx 10xxxxxx	Three bytes

The x bit positions are filled with the bits of the character code number in binary representation. The rightmost x is the least-significant bit (a big-endian representation). In multi-byte sequences (for Unicode code points greater than U+007F), the number of leading 1 bits in the first byte is identical to the number of bytes in the entire sequence. In addition, each byte in a multi-byte sequence has the most significant bit set.
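As an illustration of the bit patterns in the table above, the following Python sketch derives the sequence length from the first byte of a character. The function name utf8_sequence_length is hypothetical and is not part of PeopleTools; the sketch covers only the one-byte to three-byte forms that the PeopleSoft system supports.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the byte count of the UTF-8 sequence that starts with lead_byte.

    Only the BMP forms (one to three bytes) are handled, matching the table.
    """
    if lead_byte & 0b10000000 == 0b00000000:   # 0xxxxxxx
        return 1
    if lead_byte & 0b11100000 == 0b11000000:   # 110xxxxx
        return 2
    if lead_byte & 0b11110000 == 0b11100000:   # 1110xxxx
        return 3
    # Continuation bytes (10xxxxxx) and four-byte lead bytes end up here.
    raise ValueError(f"0x{lead_byte:02X} is not a supported UTF-8 lead byte")


for ch in ("a", "ñ", "€"):
    encoded = ch.encode("utf-8")
    print(ch, utf8_sequence_length(encoded[0]), encoded.hex())
```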

This section includes Unicode encoding examples for the following characters:

Character	Unicode Code Point	Description
a	U+0061	Latin small letter a.
ñ	U+00F1	Latin small letter ñ.
€	U+20AC	Euro symbol.

The following table shows the difference in how UCS-2 and UTF-8 encode several characters:

Character	Unicode Code Point	UCS-2 Byte Values	UTF-8 Byte Values
a	U+0061	0x00 0x61	0x61
ñ	U+00F1	0x00 0xF1	0xC3 0xB1
€	U+20AC	0x20 0xAC	0xE2 0x82 0xAC
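The byte values in the table above can be reproduced with a short Python sketch. The helper name encode_both is hypothetical. The sketch assumes that big-endian UTF-16 is an acceptable stand-in for UCS-2, which holds for characters in the BMP.

```python
def encode_both(ch: str):
    """Return (UCS-2 bytes, UTF-8 bytes) for a single BMP character."""
    # For BMP characters, big-endian UTF-16 yields the same two bytes as UCS-2.
    return ch.encode("utf-16-be"), ch.encode("utf-8")


for ch in ("a", "ñ", "€"):
    ucs2, utf8 = encode_both(ch)
    print(f"U+{ord(ch):04X}  UCS-2: {ucs2.hex()}  UTF-8: {utf8.hex()}")
```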

The PeopleSoft system transparently handles the conversion between UCS-2 and UTF-8 when data is passed into the COBOL program from the database. If you are reading or writing files directly from a COBOL program, the input and output files are UTF-8 encoded when running PeopleSoft COBOL programs in Unicode.

See Also

The Unicode Standard

Expanded Storage Space Requirements

Moving to a COBOL Unicode environment means that character data can require up to three times the storage space that is required in a single-byte character environment, because a character that is encoded in UTF-8 varies in length from one to three bytes. To allow for this, all internal data definitions for character-type data in COBOL programs must be expanded to allow for three times as many bytes. This expansion is critical because in a Unicode PeopleSoft database, column sizes are based on character length, not byte length. A CHAR(10) column on a Unicode database stores 10 characters, regardless of how many bytes each character requires. Given the three-bytes-per-character maximum of UTF-8 in the PeopleSoft system (four-byte UTF-8 characters are not yet supported), the maximum byte size of this CHAR(10) column is 30 bytes. Therefore, a COBOL type of PIC X(30) may be required to store the contents of a CHAR(10) field on a Unicode database.
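The CHAR(10) arithmetic can be checked with a small Python sketch. The constant and function names are hypothetical, and the sample strings are chosen only to show the worst case and a typical case.

```python
MAX_UTF8_BYTES_PER_BMP_CHAR = 3  # four-byte characters are out of scope here


def expanded_pic_size(char_length: int) -> int:
    """Bytes to reserve for a column that is defined in characters."""
    return char_length * MAX_UTF8_BYTES_PER_BMP_CHAR


worst_case = "€" * 10      # ten characters, three bytes each in UTF-8
typical = "PeopleSoft"     # ten characters, one byte each in UTF-8

print(expanded_pic_size(10))                # bytes reserved for CHAR(10)
print(len(worst_case.encode("utf-8")))      # bytes actually used, worst case
print(len(typical.encode("utf-8")))         # bytes actually used, ASCII only
```

The reserved size is a ceiling: data that is mostly ASCII uses far fewer bytes, but the field must still be able to hold the worst case.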

The PeopleSoft system provides a COBOL conversion utility to automatically expand the character-data fields in the working storage to accommodate the number of bytes in the UTF-8 encoding scheme.

See Also

Running the COBOL Conversion Utility

Special Logic for Parsing Unicode Strings

In a non-Unicode COBOL installation, parsing through a string is easy because you can assume that all characters coming in are one byte in length. But in UTF-8, a character can vary between one byte and three bytes in length. Therefore, you must incorporate special logic to handle string parsing when you are dealing with characters in UTF-8 format.
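The following Python sketch shows the kind of logic that is involved: instead of advancing one byte at a time, the parser reads the first byte of each character to learn how many bytes to consume. The function name split_utf8_characters is hypothetical, and the sketch is not PeopleSoft code; a COBOL program would apply the same test to the bytes of a PIC X field.

```python
from typing import List


def split_utf8_characters(data: bytes) -> List[bytes]:
    """Split a UTF-8 byte string into one bytes object per character (BMP only)."""
    chars = []
    pos = 0
    while pos < len(data):
        lead = data[pos]
        if lead < 0x80:                 # 0xxxxxxx: one byte
            length = 1
        elif lead & 0xE0 == 0xC0:       # 110xxxxx: two bytes
            length = 2
        elif lead & 0xF0 == 0xE0:       # 1110xxxx: three bytes
            length = 3
        else:
            raise ValueError(f"unsupported lead byte at offset {pos}")
        if pos + length > len(data):
            raise ValueError(f"truncated character at offset {pos}")
        chars.append(data[pos:pos + length])
        pos += length
    return chars


print(split_utf8_characters("añ€".encode("utf-8")))
```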

See Also

Defining Single Character Arrays

COBOL Sorting

Any in-memory sorting that is performed by using COBOL functions is performed as a binary sort in the current character encoding that is used for COBOL processing and may not necessarily match the sort order that is returned by the database in response to an ORDER BY clause. If you require the database to return data that is sorted by using a binary sort of its encoding rather than the default linguistically correct sort, you must use the %BINARYSORT meta-Structured Query Language (meta-SQL) function around each column in the WHERE or ORDER BY clause where binary ordering is important.

However, for DB2 UDB for OS/390 and z/OS implementations, this binary sorting is equivalent only when the COBOL program runs on the DB2 UDB for OS/390 and z/OS server. If the program runs on a different platform, the binary sort that is produced in COBOL differs from the binary sort that is produced by the database, because the database is encoded in EBCDIC and the client uses an ASCII-based encoding. Therefore, use the %BINARYSORT meta-SQL function only in COBOL programs that are not run through RemoteCall (the DB2 UDB for OS/390 and z/OS platform is not supported as a RemoteCall server).
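The difference between an ASCII-based binary sort and an EBCDIC binary sort can be seen in a short Python sketch. It assumes code page 037 (Python's cp037 codec) as a representative EBCDIC encoding; the code page that a particular DB2 subsystem uses may differ. The function name and sample values are illustrative only.

```python
def binary_sort(values, encoding):
    """Sort strings by the raw byte values of the given encoding."""
    return sorted(values, key=lambda s: s.encode(encoding))


names = ["abc", "ABC", "123"]

# ASCII orders digits, then uppercase letters, then lowercase letters.
print(binary_sort(names, "ascii"))

# EBCDIC orders lowercase letters, then uppercase letters, then digits.
print(binary_sort(names, "cp037"))
```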

When running against non-z/OS systems, the %BINARYSORT function can be used in both RemoteCall and non-RemoteCall programs.

For example:

SELECT RECNAME FROM PSRECDEFN  WHERE %BINARYSORT(RECNAME) < %BINARYSORT('xxx')
SELECT RECNAME FROM PSRECDEFN  ORDER BY %BINARYSORT(RECNAME)

Note. Using the %BINARYSORT meta-SQL function in WHERE and ORDER BY clauses often negates the use of any indexes, because most databases can't use indexes for functional comparisons (for example, WHERE %BINARYSORT(column) > 'X'). Use this syntax only when equivalence between the sort order of SQL statement results and the COBOL in-memory sort order is absolutely required.

See Also

%BINARYSORT

Unicode-Specific Error Messages

These error messages can occur when you are running a COBOL program against a Unicode database:

Fetch failed: unsuccessful UCS-2 to UTF-8 conversion on column column_number.
Bind of parameter failed: unsuccessful UTF-8 to UCS-2 conversion on column column_number.
Attempting to use a non-Unicode API to access a Unicode database.
Attempting to use a non-Unicode COBOL with a Unicode database.
Attempting to use a Unicode API to access a non-Unicode database.
Fetch failed: the converted Unicode length of length exceeds the allocated buffer length length for column column_number.

These messages appear in the COBOL output log file.

Running the COBOL Conversion Utility

This section provides an overview of the COBOL conversion utility and discusses how to:

Run the conversion utility.
Identify converted COBOL programs.
Understand what is expanded.
Use utility directives.
View error logs.

Understanding the COBOL Conversion Utility

As delivered, PeopleSoft COBOL programs are configured to run only on non-Unicode databases. To run the PeopleSoft-delivered COBOL on a Unicode database, you must first convert it by using the PeopleTools COBOL conversion utility. This utility is typically called automatically by the PeopleSoft installation process; however, in certain circumstances, such as when you adapt COBOL code or apply a PeopleSoft-provided patch to a COBOL program, you may need to run the COBOL conversion utility manually.

Moving to a COBOL Unicode environment means that character data can require up to three times the storage space that is required in a single-byte character environment. To allow for this, all internal data definitions for character-type data in COBOL programs must be expanded to allow for three times as many bytes.

Adapt and apply patches to only one set of COBOL source code—non-Unicode source. It is much easier to write COBOL programs without having to remember to triple the size of your working storage as you go. Once your adaptation or patch is complete and you are ready to compile the program, first run it through the COBOL conversion utility, then compile it. This approach has several benefits over customizing the converted code:

You maintain a single source tree for all of your COBOL: the non-Unicode source.

This way you don’t run the risk of accidentally adapting both the non-Unicode COBOL programs and the Unicode-converted COBOL programs and potentially losing the modifications to the converted programs the next time you run the converter.

Although PeopleSoft developers test all delivered COBOL programs and patches in both Unicode and non-Unicode environments, only non-Unicode versions of the source are delivered.

Therefore, any time you apply a PeopleSoft COBOL patch to a Unicode system, the patched source code must be run through the COBOL converter. If you had already modified the post-converted source, the reconversion would obliterate your modifications.

If the COBOL conversion utility makes undesirable modifications to your code, do not modify the postconverted code. Instead, use the directives that the PeopleSoft system provides to tell the utility how specific lines of code should (or should not) be converted. This enables you to limit your changes to the nonconverted code and to keep the conversion completely automated.

In a non-Unicode (also known as ANSI) implementation, one character typically occupies one byte of storage. So for a 10-character field, you can define a PICTURE clause of PIC X(10). In a Unicode implementation, however, you must allow for the maximum number of storage bytes that are required for any character field. Therefore, in the Unicode environment, you must define this same 10-character field with a PICTURE clause of PIC X(30).

To accommodate the number of bytes in the UTF-8 encoding scheme, the PeopleSoft system provides a COBOL conversion utility to expand the character fields in the working storage.

Running the Conversion Utility

Use the following command syntax to run the COBOL conversion utility:

PS_HOME\bin\client\winx86\pscblucvrt.exe -s:Source_Directory -t:Destination_Directory [-r:TEMP_Directory]

Command	Description
-s:Source_Directory	Specify the source directory where the non-Unicode version of COBOL resides. For the directory, you must specify where the COBOL subdirectories reside (\BASE, \WIN32, \Unix, and so on). Example: -s:d:\PT8\SRC\CBL
-t:Destination_Directory	Specify where you want to place the expanded version of COBOL. The utility puts the modified source file in the same COBOL subdirectory in which it was found. Example: -t:d:\PT8\SRC\CBLUNICODE
-r or -rd:Temp_directory	See Viewing Error Logs. -r generates only the summary log file; -rd generates all of the log files.

The utility produces a new source file for each .CBL file that is found. These new files are placed in the PS_HOME\src\ directory.

As delivered, the PeopleSoft batch utilities that compile your COBOL programs include logic to convert all programs and copybooks before compiling. This logic is triggered only when the Unicode version of PeopleTools is installed.

Compiling COBOL in Microsoft Windows

Use the PS_HOME\setup\cbl2uni.bat command to convert all of the COBOL programs and copybooks that are found in the PS_HOME\src\cbl directory. After the conversion, the PS_HOME\src\cblunicode directory contains the expanded COBOL source code.

Compiling COBOL in Unix/Linux

Use the PS_HOME/install/pscbl.mak command to trigger the conversion before any COBOL programs are compiled. This utility stores all converted programs in the PS_HOME/src/cblunicode directory.

Identifying Converted COBOL Programs

When the COBOL conversion utility runs, it places a comment at the beginning of each COBOL program that it converts:

This comment line identifies converted programs in two ways:

A person looking at the program can tell whether it has been converted.
If you attempt to convert the COBOL source file again, this comment line prevents the conversion utility program from expanding the working storage of this COBOL source file again.

Understanding What Is Expanded

For the utility to recognize when it is appropriate to expand data, strict adherence to the PeopleSoft COBOL coding standards is required. The utility looks for certain code-style patterns to make these decisions.

The conversion utility expands all PIC X[(N)] data fields to triple their original size, with the following exceptions:

Exception 1: SQL buffer setup data.
Exception 2: Redefined character fields.
Exception 3: Fields that appear to be dates.
Exception 4: Arrays comprising a single character.

The utility also converts copybooks on the fly: the first time that a copybook is referenced inside a code module, it is expanded immediately.

The utility processes an entire set of COBOL modules in a single run. It maintains a record of what it has converted to avoid converting copybooks twice.

Note. The COBOL conversion utility ensures that edited lines do not go past the 72nd column. If the conversion would normally cause a line to exceed that limitation, the utility removes some of the blank spaces between the field name and the PIC X string so that the line fits in the allowed area.
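The basic expansion rule can be pictured with the following Python sketch. It is a simplified illustration and not the logic of the conversion utility itself: the function name triple_pic_x and the regular expression are assumptions, and the sketch ignores the four exceptions, copybooks, forms such as PIC XX, and the column-72 adjustment.

```python
import re

# Matches "PIC X" or "PIC X(n)"; the lookahead rejects forms such as PIC XX.
PIC_X = re.compile(r"PIC\s+X(?:\((\d+)\))?(?![\w(])")


def triple_pic_x(line: str) -> str:
    """Return the line with every PIC X[(n)] clause tripled in size."""
    def expand(match):
        size = int(match.group(1)) if match.group(1) else 1
        return f"PIC X({size * 3})"
    return PIC_X.sub(expand, line)


print(triple_pic_x("02  EMPLID   PIC X(20)."))
print(triple_pic_x("10  FILLER  PIC X  VALUE 'H'."))
print(triple_pic_x("02  EMPL_RCD  PIC 99  COMP."))
```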

Exception 1: SQL Buffer Setup Data

SQL buffer setup data that refers to the numeric or date data types of SELECT-SETUP or BIND-SETUP is not expanded.

For the interface to PTPSQLRT, a COBOL program passes a SELECT list (SELECT-DATA) and a descriptor area (SELECT-SETUP). The program also passes similar data and setup areas for bind variables. The descriptors that are passed are always character-type data with embedded values that signal the actual data type and length of the data fields. Because these descriptors represent the lengths of the associated data fields in the corresponding SELECT-DATA and BIND-DATA structures, the utility adjusts only the length of the descriptors that are representing character-type data.

Example 1: In the following code, the select list contains two character fields (EMPLID and NAME), a small integer (EMPL_RCD), and a date (EFFDT):

SELECT-SETUP.
02  FILLER   PIC X(20)   VALUE ALL 'C'.
02  FILLER   PIC X(2)   VALUE ALL 'S'.
02  FILLER   PIC X(10)   VALUE ALL 'D'.
02  FILLER   PIC X(30)   VALUE ALL 'C'.
SELECT-DATA.
02  EMPLID   PIC X(20).
02  EMPL_RCD   PIC 99   COMP.
02  EFFDT   PIC X(10).
02  NAME      PIC X(30).

In Unicode, the only fields that should be expanded are the two character fields (EMPLID and NAME). Numeric data is never affected by Unicode, and (according to the PeopleTools definition), dates are not affected either: they are treated as numeric strings and cannot have special characters.

Thus, the utility converts this code as follows:

SELECT-SETUP.
02  FILLER  PIC X(60)  VALUE ALL 'C'.
02  FILLER  PIC X(2)  VALUE ALL 'S'.
02  FILLER  PIC X(10)  VALUE ALL 'D'.
02  FILLER  PIC X(90)  VALUE ALL 'C'.

SELECT-DATA.
02  EMPLID  PIC X(60).
02  EMPL_RCD  PIC 99  COMP.
02  EFFDT  PIC X(10).
02  NAME    PIC X(90).

Example 2: The following code represents non-Unicode COBOL (COBOL that has not yet been expanded):

01  I-ERRL.
           05  SQL-STMT            PIC X(18)   VALUE
                                               'INPXPROC_I_ERRL'.
           05  BIND-SETUP.
               10  FILLER          PIC X(10)   VALUE ALL 'C'.
               10  FILLER          PIC X(6)    VALUE '0PPPPP'.
               10  FILLER          PIC X(4)    VALUE ALL 'I'.
               10  FILLER          PIC X       VALUE 'H'.
               10  FILLER          PIC X(18)   VALUE ALL 'C'.
               10  FILLER          PIC X(4)    VALUE ALL 'I'.
               10  FILLER          PIC X(4)    VALUE ALL 'N'.
               10  FILLER          PIC X(30)   VALUE ALL 'H'.
               10  FILLER          PIC X(30)   VALUE ALL 'C'.
               10  FILLER          PIC X(30)   VALUE ALL 'H'.
               10  FILLER          PIC X(10)   VALUE ALL 'C'.
               10  FILLER          PIC X(6)    VALUE '0PPPPP'.
               10  FILLER          PIC X(8)    VALUE '0PPPPPPP'.
               10  FILLER          PIC X       VALUE 'Z'.
           05  BIND-DATA.
               10  TSE-JOBID       PIC X(10)   VALUE SPACES.
               10  TSE-PROC-INSTANCE PIC 9(10) VALUE ZERO  COMP-3.
               10  TSE-SET-NBR     PIC 9(6)    VALUE ZERO  COMP.
               10  TSE-EDIT-TYPE   PIC X       VALUE SPACE.
               10  TSE-FIELDNAME   PIC X(18)   VALUE SPACES.
               10  MESSAGE-SET-NBR PIC 9(5)    VALUE ZERO  COMP.
               10  MESSAGE-NBR     PIC 9(5)    VALUE ZERO  COMP.
               10  MESSAGE-PARM    PIC X(30)   VALUE SPACES.
               10  MESSAGE-PARM2   PIC X(30)   VALUE SPACES.
               10  MESSAGE-PARM3   PIC X(30)   VALUE SPACES.
               10  BUSINESS-UNIT   PIC X(10)   VALUE SPACES.
               10  TRANSACTION-NBR PIC 9(10)   VALUE ZERO  COMP-3.
               10  SEQ-NBR         PIC 9(15)   VALUE ZERO  COMP-3.
               10  FILLER          PIC X       VALUE 'Z'.

The utility converts this code as follows:

01  I-ERRL.
           05  SQL-STMT            PIC X(54)   VALUE
                                               'INPXPROC_I_ERRL'.
           05  BIND-SETUP.
               10  FILLER          PIC X(30)   VALUE ALL 'C'.
               10  FILLER          PIC X(6)    VALUE '0PPPPP'.
               10  FILLER          PIC X(4)    VALUE ALL 'I'.
               10  FILLER          PIC X(3)   VALUE  ALL 'H'.
               10  FILLER          PIC X(54)   VALUE ALL 'C'.
               10  FILLER          PIC X(4)    VALUE ALL 'I'.
               10  FILLER          PIC X(4)    VALUE ALL 'N'.
               10  FILLER          PIC X(90)   VALUE ALL 'H'.
               10  FILLER          PIC X(90)   VALUE ALL 'C'.
               10  FILLER          PIC X(90)   VALUE ALL 'H'.
               10  FILLER          PIC X(30)   VALUE ALL 'C'.
               10  FILLER          PIC X(6)    VALUE '0PPPPP'.
               10  FILLER          PIC X(8)    VALUE '0PPPPPPP'.
               10  FILLER          PIC X       VALUE 'Z'.
           05  BIND-DATA.
               10  TSE-JOBID       PIC X(30)   VALUE SPACES.
               10  TSE-PROC-INSTANCE PIC 9(10) VALUE ZERO  COMP-3.
               10  TSE-SET-NBR     PIC 9(6)    VALUE ZERO  COMP.
               10  TSE-EDIT-TYPE   PIC X(3)    VALUE SPACES.
               10  TSE-FIELDNAME   PIC X(54)   VALUE SPACES.
               10  MESSAGE-SET-NBR PIC 9(5)    VALUE ZERO  COMP.
               10  MESSAGE-NBR     PIC 9(5)    VALUE ZERO  COMP.
               10  MESSAGE-PARM    PIC X(90)   VALUE SPACES.
               10  MESSAGE-PARM2   PIC X(90)   VALUE SPACES.
               10  MESSAGE-PARM3   PIC X(90)   VALUE SPACES.
               10  BUSINESS-UNIT   PIC X(30)   VALUE SPACES.
               10  TRANSACTION-NBR PIC 9(10)   VALUE ZERO  COMP-3.
               10  SEQ-NBR         PIC 9(15)   VALUE ZERO  COMP-3.
               10  FILLER          PIC X       VALUE 'Z'.

Exception 2: Redefined Character Fields

Character fields that are redefined to a numeric field (and group-level fields that contain such character fields) are not expanded. In instances where the redefined field is also redefined as a character field, the original character field and the redefinition that is a character field are expanded.

Example 1: In this example, the DB-PIC-PRECIS-CHAR is not expanded:

07  DB-PIC-PRECIS-CHAR  PIC X(2).
07  DB-PIC-PRECIS-NUM   REDEFINES
     DB-PIC-PRECIS-CHAR PIC 9(2).

Example 2: In this example, the I-REMIT-ADDR-SEQ is not expanded:

02  I-REMIT-ADDR-SEQ        PIC 9(04).
02  I-REMIT-ADDR-SEQ-C REDEFINES
    I-REMIT-ADDR-SEQ        PIC X(04).

Example 3: In this example, the original definition is a character-type field. Although some of the redefined fields are numeric fields, all of the character fields, including the original definition, are expanded.

02  MSGDATA1                PIC X(30)   VALUE SPACES.
02  FILLER       REDEFINES MSGDATA1.
    03  MSGDATA1-INT        PIC Z(9)9-.
    03  INT-FILL1           PIC X(19).
02  FILLER       REDEFINES MSGDATA1.
    03  MSGDATA1-DOL        PIC Z(9)9.99-.
    03  DOL-FILL1           PIC X(16).
02  FILLER       REDEFINES MSGDATA1.
    03  MSGDATA1-DEC        PIC Z(9)9.9(5)-.
    03  DEC-FILL1           PIC X(13).

Exception 3: Fields That Appear to Be Dates

Fields and group-level fields that appear to be dates are not expanded, unless the EXPAND directive is specified for this field or group-level field.

The following table describes the criteria that are used to determine fields or group-level fields as dates:

DATE Data Type	Field or Group-Level Field Name	Field Length or Total Length of a Group-Level Field*
Date	Contains -DT or DATE	10
Time	Contains -TM or TIME	15
Date-Time	Contains -DTTM, DATE, or TIME	26

* When calculating the total length, the utility considers that a group-level field may contain REDEFINE fields. The length of the REDEFINE field is not included when determining the total length of the group field.
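The criteria in the table can be expressed as a small lookup, sketched here in Python. The names DATE_RULES and classify_date_field are hypothetical, and the sketch covers only the name and length tests from the table; it does not parse COBOL or calculate group lengths.

```python
DATE_RULES = (
    # (data type, name fragments, field or group total length)
    ("date", ("-DT", "DATE"), 10),
    ("time", ("-TM", "TIME"), 15),
    ("datetime", ("-DTTM", "DATE", "TIME"), 26),
)


def classify_date_field(name: str, total_length: int):
    """Return 'date', 'time', or 'datetime' if the field looks like one, else None."""
    upper = name.upper()
    for data_type, fragments, length in DATE_RULES:
        if total_length == length and any(frag in upper for frag in fragments):
            return data_type
    return None


print(classify_date_field("START-DATE", 10))      # name and length both match
print(classify_date_field("BUSINESS-UNIT", 10))   # length matches, name does not
```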

Example 1: The field in this example is not expanded:

10  START-DATE          PIC X(10)   VALUE SPACES.

Example 2: The fields in this example are not expanded:

02  W-EMP-BIRTHDATE.
03 YEAR             PIC 9(4).
03 FILLER           PIC X(1).
03 MONTH            PIC 99.
03 FILLER           PIC X(1).
03 DAYS             PIC 99.

Example 3: The fields in this example are not expanded:

03  PAY-DATE-TIME.
04  PAY-DTTM-DATE   PIC X(10).
04  PAY-DTTM-DELIM1 PIC X         VALUE '-'.
04  PAY-DTTM-TIME   PIC X(15).

Example 4: The fields in this example are not expanded:

05  BEGIN-DTTM-TIME.
    07  SYS-HOUR            PIC 99      VALUE ZERO.
    07  FILLER              PIC X       VALUE SPACE.
    07  SYS-MINUTE          PIC 99      VALUE ZERO.
    07  FILLER              PIC X       VALUE SPACE.
    07  SYS-SECOND          PIC 99      VALUE ZERO.
    07  FILLER              PIC X       VALUE SPACE.
    07  SYS-MICRO-SECOND    PIC 9(6)    VALUE ZERO.

Example 5: In this example, the group field contains REDEFINES fields. The conversion utility does not expand the fields because the group field meets the date criteria: it has a total length of 10 (the redefining fields are not counted) and the field name includes the -DT string.

02  END-DT.
    03  END-DT-YY           PIC X(4).
    03  END-DT-YY-NUM       REDEFINES END-DT-YY     PIC 9999.
    03  FILLER              PIC X.
    03  END-DT-MM           PIC XX.
    03  END-DT-MM-NUM       REDEFINES END-DT-MM     PIC 99.
    03  FILLER              PIC X.
    03  END-DT-DD           PIC XX.
    03  END-DT-DD-NUM       REDEFINES END-DT-DD     PIC 99.

Exception 4: Arrays Comprising a Single Character

For arrays that comprise a single character, the PIC clause is expanded for character data, but the OCCURS clause is not expanded. However, if the data name ends with -POS, -CHAR, or -BYTE, the OCCURS clause is expanded, instead of the element size.

Example 1: In this example, the field is expanded:

01  CHAR-ARRAY  PIC X  OCCURS 80 TIMES.

Is expanded to:

01  CHAR-ARRAY  PIC X(3)  OCCURS 80 TIMES.

Example 2: In this example, the data name ends with -POS; therefore, the OCCURS clause is expanded, instead of the element size:

01  CHAR-POS  PIC X  OCCURS 80 TIMES.

Is expanded to:

01  CHAR-POS  PIC X  OCCURS 240 TIMES.
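The two outcomes for single-character arrays can be summarized in a short Python sketch. The function name expand_single_char_array is hypothetical, and the sketch models only the naming rule that is described in this section.

```python
OCCURS_SUFFIXES = ("-POS", "-CHAR", "-BYTE")


def expand_single_char_array(name: str, occurs: int):
    """Return (element size, occurrence count) after expansion of PIC X OCCURS n."""
    if name.upper().endswith(OCCURS_SUFFIXES):
        # The array is treated as a run of bytes: keep PIC X, triple OCCURS.
        return 1, occurs * 3
    # The array is treated as a run of characters: triple PIC X, keep OCCURS.
    return 3, occurs


print(expand_single_char_array("CHAR-ARRAY", 80))
print(expand_single_char_array("CHAR-POS", 80))
```

Either way the array occupies 240 bytes; what changes is whether each element holds one whole character or one byte.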

Using Utility Directives

The COBOL conversion utility accepts various directives in the first six columns of COBOL code. Use these directives to override the utility’s normal mode of processing for a single source code line or for a block of lines.

Directive	Description	Purpose
NOCBGN	No conversion: begin	Starting with this line, do not perform expansions.
NOCEND	No conversion: end	End the NOCBGN directive following this line (the directive line is not expanded).
NOCLN	No conversion: line	Do not perform expansions in this single line.
COCCUR	Convert occurrence	Expand the OCCURS clause instead of the PIC clause in this line.
EXPEOF	Expand end of field	Expand a group item by increasing the length of the last field in the group.
EXPAND	Instruct utility to expand field	Force expansion of fields that would normally not be expanded because they appear to be date, time, or datetime fields.

The following examples use existing PeopleSoft COBOL programs to illustrate possible uses for the utility directives.

NOCBGN, NOCEND, and NOCLN Directives

One of the COBOL programs for PeopleSoft Human Resources has a unique way of setting the PAY-PERIODS group field. The program defines an 88-level definition based on the concatenated value of the five, one-column, character-type fields. If the conversion utility were to convert the program without the special directives, none of the cases that are defined in the 88-level field would ever be true.

NOCBGN     03  PAY-PERIODS.
               88  PAY-PERIODS-ALL                 VALUE 'YYYYY'.
               88  PAY-PERIODS-ALL-BIWEEKLY        VALUE 'YYYNN'.
               88  PAY-PERIODS-ALL-SEMIMONTHLY     VALUE 'YYNNN'.
               88  PAY-PERIODS-NONE                VALUE 'NNNNN'.
               04  PAY-PERIOD1         PIC X.
               04  PAY-PERIOD2         PIC X.
               04  PAY-PERIOD3         PIC X.
               04  PAY-PERIOD4         PIC X.
               04  PAY-PERIOD5         PIC X.
           03  FILLER REDEFINES PAY-PERIODS.
               04  PAY-PERIOD          PIC X   OCCURS 5.
                   88  PAY-PERIOD-YES              VALUE 'Y'.
NOCEND             88  PAY-PERIOD-NO               VALUE 'N'.

       01  S-DEDPDS.
           02  SQL-STMT                PIC X(18)   VALUE 'PSPDCFSA_S_DEDPDS'.
           02  BIND-SETUP.
               03  FILLER              PIC X(10)   VALUE ALL 'C'.
               03  FILLER              PIC X(10)   VALUE ALL 'H'.
               03  FILLER              PIC X(10)   VALUE ALL 'D'.
               03  FILLER              PIC X(10)   VALUE ALL 'A'.
NOCBGN         03  FILLER              PIC X       VALUE ALL 'C'.
               03  FILLER              PIC X       VALUE ALL 'H'.
               03  FILLER              PIC X       VALUE ALL 'C'.
               03  FILLER              PIC X       VALUE ALL 'H'.
NOCEND         03  FILLER              PIC X       VALUE ALL 'C'.
               03  FILLER              PIC X       VALUE 'Z'.
           02  BIND-DATA.
               03  COMPANY             PIC X(10).
               03  PAYGROUP            PIC X(10).
               03  PAY-END-DT          PIC X(10).
               03  YEAR-END-DT         PIC X(10).
NOCLN          03  PAY-PERIODS         PIC X(5).
               03  FILLER              PIC X       VALUE 'Z'.

COCCUR Directive

The conversion utility doesn't normally expand the size of the array in this line from one of the PeopleTools COBOL programs. Using the COCCUR directive ensures that the OCCURS clause is expanded:

       02  PARM.
COCCUR     05  PARM-CH  OCCURS 30 TIMES  PIC X.

EXPEOF Directive

In the following example, the FIELDNAME group-level field is broken down to check the first 4 characters of the string. In this instance, it makes more sense to adjust the length of the FILLER field. By using the EXPEOF directive, you direct the utility to expand the FILLER field to a length of 50:

EXPEOF 02  FIELDNAME.
           03  FIELDNAME4      PIC X(4)    VALUE SPACE.
               88  FIELDNAME-TSE           VALUE 'TSE_'.
           03  FILLER          PIC X(14)   VALUE SPACE.
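One way to arrive at the length of 50 is sketched below in Python: the group as a whole is tripled (18 bytes become 54), the leading field keeps its original size, and the last field absorbs the difference. The function name expeof_last_field_length is hypothetical, and the calculation is inferred from the example rather than taken from the utility.

```python
def expeof_last_field_length(field_lengths):
    """Return the new length of the last field in the group.

    field_lengths holds the unexpanded lengths of the group's fields, in order.
    """
    expanded_total = sum(field_lengths) * 3      # the group triples in size
    unchanged = sum(field_lengths[:-1])          # leading fields keep their size
    return expanded_total - unchanged


# FIELDNAME4 is PIC X(4) and FILLER is PIC X(14).
print(expeof_last_field_length([4, 14]))
```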

Viewing Error Logs

The COBOL conversion utility produces a set of error and warning logs with messages that identify nonstandard code styles and inconsistencies. The utility also logs expansion actions that may require manual review.

The utility produces the following logs:

Exception log	This log contains warnings that occurred because of ambiguous working storage definitions. You may need to modify code or add utility directives to resolve the issues logged.
Exception BIND/SELECT log	This log contains warnings and errors that occurred because of ambiguous BIND and SETUP definitions.
Exception date log	This log lists all group-level date fields that are detected by the utility.
Summary log	This log provides general statistics regarding the number of programs that are processed.

When you specify the -r flag, you see only the summary log. Set the -rd flag on the conversion utility command line if you want the utility to produce all of the detail logs: exception, BIND/SELECT, and exception date.

Messages from the Exception Log

The following tables summarize all of the messages that can appear in the three exception log files. Errors indicate problems that are encountered by the conversion utility.

Message	Type	Note
Non-matching conversion block found in line line number.	Error	Detected the NOCBGN directive, but couldn’t find the corresponding NOCEND.
Error in determining numeric length in line line number.	Error	Couldn’t decipher the numeric PICTURE clause.
The size of the one character array will be expanded in line line number.	Warning	Detected a one-character array where the field contains the string -BYTE, -POS, or -CHAR.
A one-character array is found in line number. The conversion routine will expand this to PIC X(3).	Warning	None.
Unable to find the copy library copy library name.	Error	Couldn't locate the copy library file that the program references.

Messages from the Exception BIND/SELECT Log

The following table lists messages from the BIND/SELECT log:

Message	Type	Note
Didn't find the corresponding DATA section for DATA field name in line line number.	Error	Detected either a BIND-DATA or a SELECT-DATA, but cannot find the SETUP group field.
No delimiter found on group field name section in line line number.	Warning	Didn’t find a FILLER field with a value Z in a DATA or SETUP group field.
Unable to convert the group field name section due to problems reading the Copylib.	Error	Couldn't locate a copy library that DATA or SETUP references.
The group field name found in line line number has a mismatch count.	Warning and error	The number of columns in DATA doesn’t match the count for the corresponding SETUP.
Incompatible date type match for field in line line number.	Error	The data type definition in SETUP doesn’t correspond to the data type in DATA.

Messages from the Exception Date Log

The following table lists messages from the exception date log:

Message	Type	Note
Date/time/datetime detected and will not be expanded in line.	Warning	None.
Verify if a date/time/datetime field in line number: line number.	Warning	Found a character-type field or group field with a total length of 10, 15, or 26 that could be a date, time, or datetime.

Fine-Tuning COBOL Programs

Although the COBOL conversion makes most of the changes that are needed to run COBOL in a Unicode environment, some manual fine-tuning may still be necessary.

This section discusses how to:

Identify Unicode and non-Unicode data.
Specify column lengths in dynamic SQL.
Define single character arrays.

Identifying Unicode and Non-Unicode Data

A COBOL program may need to determine whether it's dealing with non-Unicode or Unicode data. For example, if the program parses a string character by character, it must apply different logic depending on whether the string is non-Unicode or Unicode. The program can get this information from the ENCODING-MODE-SW field in the PTCSQLRT copy library (ANSI mode is the same as non-Unicode):

03  ENCODING-MODE-SW    PIC X(3)    VALUE SPACE.
    88  ANSI-MODE                   VALUE 'A'.
    88  UNICODE-MODE                VALUE 'U'.

The ENCODING-MODE-SW value is set by the COBOL application programming interface (API), which determines which type of data it's dealing with by checking the value of the UNICODE_ENABLED field in the PSSTATUS table. When the value of the UNICODE_ENABLED flag is set to 1 (true), this signals the COBOL API that it is accessing a Unicode database.

The COBOL API also performs the necessary translations between the UTF-8 encodings that are required by COBOL and the UCS-2 encodings that are used elsewhere in the PeopleSoft system.

Specifying Column Lengths in Dynamic SQL

Perhaps the biggest effort in getting COBOL fully functional in a Unicode environment is setting up the bind parameters and select buffers of any dynamic SQL statements.

Programs that use dynamic SQL must specify the column lengths for bind or select fields before calling the PTPDYSQL program. Within a COBOL program, there are two ways that you can assign bind parameters and select buffers of dynamic SQL statements:

By using a predefined working storage area with the dynamic SQL statement.

This method is similar to the method that is used for stored SQL statements. In this case, PTPDYSQL adjusts the length of character-data fields that are passed to PTPSQLRT. This is necessary because the COBOL Unicode conversion utility expands only the working storage fields; it does not modify the length of fields that are hard-coded in the PROCEDURE DIVISION of the COBOL programs.

Because PTPDYSQL sends the correct length to PTPSQLRT, no changes to the COBOL program are necessary.
By using a buffer array.

At runtime, this array is partitioned based on the properties of all of the fields that are referenced by the dynamic SQL statement. The properties of those fields are retrieved from the PSDBFIELD table, and include both the field’s data type and the field’s length.

In this case, you must modify the COBOL program to adjust the length that is specified for a character field. Multiply the length by three, because a character in the BMP can occupy up to three bytes in UTF-8.

To adjust the length of the character field appropriately, the program must recognize the encoding scheme that is used by the COBOL API. The program can take advantage of the ENCODING-MODE-SW field in PTCSQLRT to determine when the length of the field needs to be adjusted.

This example illustrates the use of a buffer array to calculate the length of a character field in the Unicode environment:

    02  SELECT-DATA.
        03  FIELDNAME           PIC X(54)   VALUE SPACE.
        03  FIELDNUM            PIC 9(3)    VALUE ZERO COMP.
        03  DEFFIELDNAME        PIC X(90)   VALUE SPACES.
        03  FIELDLEN            PIC 9(3)    VALUE ZERO COMP.
        03  FIELDTYPE           PIC 9(2)    VALUE ZERO COMP.
            88  RDM-CHAR                    VALUE ZERO.
            88  RDM-LONG-CHAR               VALUE 1.
            88  RDM-NUMBER                  VALUE 2.
            88  RDM-SIGNED-NUMBER           VALUE 3.
            88  RDM-DATE                    VALUE 4.
            88  RDM-TIME                    VALUE 5.
            88  RDM-DATETIME                VALUE 6.
        03  DECIMALPOS          PIC 9(2)    VALUE ZERO COMP.
        03  FILLER              PIC X       VALUE 'Z'.
    .
    .
    .
    MOVE CORR SELECT-DATA OF S-RECFLD
        TO FLD-ARRAY OF RECFLD (FLD-IDX)
*   CONVERT FIELD TYPE FROM PSDBFIELD TYPE TO SQLRT CODE.
    MOVE FIELDLEN OF S-RECFLD
        TO SETUP-LENGTH OF RECFLD (FLD-IDX)
    IF RDM-CHAR OF S-RECFLD
        SET SETUP-TYPE-CHAR OF RECFLD (FLD-IDX) TO TRUE
        IF UNICODE-MODE OF SQLRT
            COMPUTE SETUP-LENGTH OF RECFLD (FLD-IDX)
                = SETUP-LENGTH OF RECFLD (FLD-IDX) * 3
        END-IF
    END-IF
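The length rule in the example above can be restated as a small Python sketch. The function and constant names are hypothetical and exist only for illustration; the mode values 'A' and 'U' mirror the ENCODING-MODE-SW condition names.

```python
# Hypothetical names; only the 'A'/'U' values and the factor of three
# come from the text above.
ANSI_MODE = "A"
UNICODE_MODE = "U"
MAX_BYTES_PER_BMP_CHAR = 3


def setup_length(field_len: int, is_char_field: bool, encoding_mode: str) -> int:
    """Return the buffer length to reserve for one field of a dynamic SQL statement."""
    if is_char_field and encoding_mode == UNICODE_MODE:
        # A character field may need up to three bytes per character in UTF-8.
        return field_len * MAX_BYTES_PER_BMP_CHAR
    # Non-character fields and ANSI-mode character fields keep their length.
    return field_len


print(setup_length(30, True, UNICODE_MODE))
print(setup_length(30, True, ANSI_MODE))
```

Only character fields are expanded; numeric, date, and time fields keep the length retrieved from PSDBFIELD.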

Defining Single Character Arrays

Some COBOL programs define single-character arrays to parse or examine a string of characters, one character at a time. In a Unicode environment, be sure that you’re examining the string one character at a time, not one byte at a time.

This example shows a code fragment in which the program is examining a string one character at a time:

    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X OCCURS 256 TIMES
                                INDEXED BY CHAR-IDX.
            88  FIELD-DELIM     VALUE '*'.
    .
    .
    .
    SET CHAR-IDX TO 1
    SEARCH CHAR-ARRAY
        WHEN FIELD-DELIM(CHAR-IDX)
            SET W-OFFSET TO CHAR-IDX
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
    END-SEARCH

The intent of the code in the previous example is to examine each character of the array, looking for the first delimiter character. When that character is found, the code displays the position of the delimiter.

In a non-Unicode environment that uses only the Latin1 character set, this works because there is one byte (one array element) per character. In a Unicode environment (or in a non-Unicode environment that allows double-byte character sets), this fails because what could potentially be examined is the second or third byte of a two- or three-byte character. It's possible for the second or third byte to match the bit pattern of the delimiter character, thus falsely passing the test and ending the search loop.
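The false match can be reproduced with a short Python sketch. It uses the double-byte case, with Python's cp932 codec (a Shift-JIS variant) chosen as an assumption for illustration, because there the second byte of a character can equal an ASCII byte. In UTF-8 every byte of a multi-byte character is 0x80 or higher, so the same collision would require a delimiter byte in that range. The helper names are hypothetical.

```python
DELIMITER = b"\\"  # single-byte delimiter, 0x5C


def naive_find(data: bytes, delimiter: bytes) -> int:
    """Compare every byte to the delimiter; return a 1-based position or 0."""
    for offset in range(len(data)):
        if data[offset:offset + 1] == delimiter:
            return offset + 1
    return 0


def character_aware_find(text: str, delimiter: str, encoding: str) -> int:
    """Find the delimiter as a whole character; return its 1-based byte position or 0."""
    char_index = text.find(delimiter)
    if char_index < 0:
        return 0
    return len(text[:char_index].encode(encoding)) + 1


text = "ソ\\A"                 # katakana SO, then the real delimiter, then 'A'
data = text.encode("cp932")    # bytes 0x83 0x5C 0x5C 0x41

print(naive_find(data, DELIMITER))                 # stops inside the first character
print(character_aware_find(text, "\\", "cp932"))   # position of the real delimiter
```

The byte-by-byte search stops at the second byte of the first character, one position before the real delimiter.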

To correct this situation, you must know the length (in bytes) of each character that is being processed. A new COBOL function, PTPSTRFN, is available that returns the length of a character so that the code can take this into account when performing a character search. The PTPSTRFN subroutine works for both Unicode character sets and ANSI double-byte character sets.

The PTPSTRFN subroutine offers two ways of retrieving the byte length of a character:

By requesting the length of a single character.
By requesting a map of an entire character string.

Choose this option if the application program needs to get the length information of all characters within a string.

Requesting the Length of a Single Character

The input parameters to the PTPSTRFN function are:

Parameter	Value	Notes
ACTION-TYPE	ACTION-CHARLEN	None.
CHAR-CD	The character whose length you want to verify.	This variable is included in the PTCSTRFN.CBL copy library.

These values are returned:

Parameter	Value	Notes
CHAR-LENGTH	The subroutine returns one of the following values, representing the length of the character that is referenced by CHAR-CD: ONE-BYTE, TWO-BYTES, or THREE-BYTES.	This variable is included in the PTCSTRFN.CBL copy library.
STRFN-RC	Returns one of the following values: STRFN-RC-OK or STRFN-INVALID-ACTION.	This variable is included in the PTCSTRFN.CBL copy library.
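For UTF-8 data, the three length categories follow from the leading bits of a character's first byte. The following Python sketch shows that classification; it is an illustration of the UTF-8 rule, not the PTPSTRFN implementation, and the function name is hypothetical.

```python
def utf8_char_length(first_byte: int) -> int:
    """Return the byte length (1 to 3) of the UTF-8 character that starts with first_byte."""
    if first_byte & 0b1000_0000 == 0b0000_0000:   # 0xxxxxxx: one-byte character
        return 1
    if first_byte & 0b1110_0000 == 0b1100_0000:   # 110xxxxx: two-byte character
        return 2
    if first_byte & 0b1111_0000 == 0b1110_0000:   # 1110xxxx: three-byte character
        return 3
    # Continuation bytes (10xxxxxx) and four-byte lead bytes (outside the BMP)
    # are rejected in this sketch.
    raise ValueError(f"not the first byte of a BMP character: 0x{first_byte:02X}")


for sample in ("A", "ó", "世"):
    print(sample, utf8_char_length(sample.encode("utf-8")[0]))
```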

At the beginning of this section, a code fragment examined a string one character at a time, looking for the first delimiter character. After the code is modified for Unicode, it looks like this:

    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X OCCURS 256 TIMES
                                INDEXED BY CHAR-IDX.
            88  FIELD-DELIM     VALUE '*'.
    01  STR-FUNC COPY 'PTCSTRFN'.
    .
    .
    .
    SET CHAR-IDX TO 1
    SEARCH CHAR-ARRAY
        WHEN FIELD-DELIM(CHAR-IDX)
            SET W-OFFSET TO CHAR-IDX
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
        WHEN OTHER
            MOVE CHAR-POS(CHAR-IDX) TO CHAR-CD OF STR-FUNC
            CALL 'PTPSTRFN' USING ACTION-CHARLEN
                                  STR-FUNC
*           SKIP THE REMAINING BYTES OF A MULTI-BYTE CHARACTER.
            IF TWO-BYTES OF STR-FUNC
                SET CHAR-IDX UP BY 1
            ELSE
                IF THREE-BYTES OF STR-FUNC
                    SET CHAR-IDX UP BY 2
                END-IF
            END-IF
    END-SEARCH

The modification ensures that the code continues to function properly in a Unicode environment. However, the modification is guaranteed to work only when the delimiter character for which the program is searching is one byte in length.

Consider the following code fragment in which the delimiter character that the program is searching for may be longer than one byte:

    01  W-DELIMITER             PIC X(3) VALUE 'some extended character'.
    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X OCCURS 256 TIMES
                                INDEXED BY CHAR-IDX
                                           CHAR-IDX2
                                           CHAR-IDX3.
            88  FIELD-DELIM     VALUE '*'.
    .
    .
    .
    SET CHAR-IDX TO 1
    SEARCH CHAR-ARRAY
        WHEN CHAR-POS(CHAR-IDX) = W-DELIMITER
            SET W-OFFSET TO CHAR-IDX
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
    END-SEARCH

For this code to work in a Unicode environment, a Unicode-specific search algorithm must be used. Ensure that the program always compares the correct bytes from the array (up to three bytes, based on the current character length) to the fixed three-byte field containing the search value.

The proper search method looks like this:

    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X OCCURS 256 TIMES
                                INDEXED BY CHAR-IDX
                                           CHAR-IDX2
                                           CHAR-IDX3.
            88  FIELD-DELIM     VALUE '*'.
    01  STR-FUNC COPY 'PTCSTRFN'.
    .
    .
    .
    SET CHAR-IDX TO 1
    PERFORM UNTIL CHAR-IDX > 256
*       REMEMBER WHERE THE CURRENT CHARACTER STARTS.
        SET W-OFFSET TO CHAR-IDX
        MOVE CHAR-POS(CHAR-IDX) TO CHAR-CD OF STR-FUNC
        CALL 'PTPSTRFN' USING ACTION-CHARLEN
                              STR-FUNC
        INITIALIZE W-WORK
*       COPY ALL BYTES OF THE CURRENT CHARACTER INTO W-WORK.
        EVALUATE TRUE
            WHEN ONE-BYTE OF STR-FUNC
                MOVE CHAR-POS(CHAR-IDX) TO W-WORK
                SET CHAR-IDX UP BY 1
            WHEN TWO-BYTES OF STR-FUNC
                SET CHAR-IDX2 TO CHAR-IDX
                SET CHAR-IDX2 UP BY 1
                STRING CHAR-POS(CHAR-IDX)
                       CHAR-POS(CHAR-IDX2)
                       DELIMITED BY SIZE
                       INTO W-WORK
                END-STRING
                SET CHAR-IDX UP BY 2
            WHEN THREE-BYTES OF STR-FUNC
                SET CHAR-IDX2 TO CHAR-IDX
                SET CHAR-IDX2 UP BY 1
                SET CHAR-IDX3 TO CHAR-IDX
                SET CHAR-IDX3 UP BY 2
                STRING CHAR-POS(CHAR-IDX)
                       CHAR-POS(CHAR-IDX2)
                       CHAR-POS(CHAR-IDX3)
                       DELIMITED BY SIZE
                       INTO W-WORK
                END-STRING
                SET CHAR-IDX UP BY 3
            WHEN OTHER
                DISPLAY '**ERROR** INVALID CHARACTER LENGTH'
                <ABEND>
        END-EVALUATE
        IF W-WORK = W-DELIMITER
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
        END-IF
    END-PERFORM

As you can see from the previous example, searching a string array for a particular value that may be an extended character can be difficult; if possible, avoid such a search.
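The same walk, one whole character at a time, is easier to follow in a compact Python sketch. It assumes UTF-8 data limited to the BMP; the function names are hypothetical and the length helper stands in for the PTPSTRFN call.

```python
def char_len(first_byte: int) -> int:
    """Byte length of the UTF-8 character starting with first_byte (BMP only)."""
    if first_byte < 0x80:
        return 1
    if first_byte & 0xE0 == 0xC0:
        return 2
    if first_byte & 0xF0 == 0xE0:
        return 3
    raise ValueError(f"invalid character length for lead byte 0x{first_byte:02X}")


def find_delimiter(data: bytes, delimiter: bytes) -> int:
    """Return the 1-based byte position of the first delimiter character, or 0."""
    pos = 0
    while pos < len(data):
        length = char_len(data[pos])
        # Compare all bytes of the current character with the delimiter.
        if data[pos:pos + length] == delimiter:
            return pos + 1
        pos += length          # advance by a whole character, never a single byte
    return 0


data = "é→cd".encode("utf-8")
print(find_delimiter(data, "→".encode("utf-8")))
```

Here the delimiter is a three-byte character that follows a two-byte character, so the search reports byte position 3.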

Requesting a Map of an Entire Character String

The input parameters to the PTPSTRFN function are:

Parameter	Value	Notes
ACTION-TYPE	ACTION-STRMAP	None.
STRING-LENGTH	Length of the character string that is passed as String Parameter 1.	This variable is included in the PTCSTRFN.CBL copy library.
String Parameter 1	The character string.	None.
String Parameter 2	The buffer area to be updated by the subroutine.	None.

This table lists the values that are returned:

Parameter	Value	Notes
String Parameter 2	This buffer is updated with the appropriate values. This field contains at least one of these values: 1 (the next character position is part of a one-byte character); 2X (the next two character positions are part of a two-byte character); 3XX (the next three character positions are part of a three-byte character).	Refer to the example following this table to see how the function works.
STRFN-RC	Returns one of the following values: STRFN-RC-OK or STRFN-INVALID-ACTION.	This variable is included in the PTCSTRFN.CBL copy library.
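The map format in the table above can be illustrated with a short Python sketch that produces the same 1, 2X, and 3XX markers for a UTF-8 byte string. This is an approximation written for illustration, not the PTPSTRFN implementation, and the function name is hypothetical.

```python
def build_string_map(data: bytes) -> str:
    """Return a map with one marker per byte: '1', '2X', or '3XX' per character."""
    markers = []
    for char in data.decode("utf-8"):
        length = len(char.encode("utf-8"))
        if length > 3:
            # Four-byte characters lie outside the BMP and are not mapped here.
            raise ValueError(f"character outside the BMP: {char!r}")
        # The digit marks the first byte; each remaining byte is marked 'X'.
        markers.append(str(length) + "X" * (length - 1))
    return "".join(markers)


print(build_string_map("Når".encode("utf-8")))
print(build_string_map("当A".encode("utf-8")))
```

The map always has the same number of positions as the input has bytes, so a program can index both buffers with the same subscript.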

The following sample COBOL code provides an example of how the PTPSTRFN COBOL function can be used to map the byte length of an entire character string:

    01  W-WORK.
        02  LANGUAGE                PIC X(20).
        02  UNICODE-TEXT            PIC X(300).
        02  UNICODE-TEXT-MAP        PIC X(300).
        02  DATA-LEN                PIC 9(3) COMP.
        02  BYTE-POS-MAX            PIC 9(4) COMP.
        02  COUNTERS.
            05  COUNT-1BYTE-CHAR    PIC 9(02) VALUE ZEROS.
            05  COUNT-2BYTE-CHAR    PIC 9(02) VALUE ZEROS.
            05  COUNT-3BYTE-CHAR    PIC 9(02) VALUE ZEROS.
    01  BYTE-ARRAY.
        02  BYTE-POS                PIC X OCCURS 300 TIMES
                                    INDEXED BY BYTE-IDX.
            88  ONE-BYTE-CHAR       VALUE '1'.
            88  TWO-BYTES-CHAR      VALUE '2'.
            88  THREE-BYTES-CHAR    VALUE '3'.
            88  BYTE-STRING-END     VALUE SPACE.
    01  STR-FUNC COPY 'PTCSTRFN'.
    .
    .
    .
    Code to retrieve the text from the database and assign to the appropriate fields
    .
    .
    .
*   Initialize the string map before calling the function
    MOVE SPACES TO UNICODE-TEXT-MAP
    CALL 'PTPSTRFN' USING ACTION-STRMAP
                          STR-FUNC
                          UNICODE-TEXT
                          UNICODE-TEXT-MAP
    IF NOT STRFN-RC-OK OF STR-FUNC
        <ABEND PROGRAM>
    END-IF
    MOVE 300 TO BYTE-POS-MAX
    MOVE 300 TO DATA-LEN OF W-WORK
    MOVE UNICODE-TEXT-MAP TO BYTE-ARRAY
*   Drop trailing spaces from the map to find the data length
    PERFORM VARYING BYTE-IDX FROM 300 BY -1
            UNTIL BYTE-IDX <= 1
               OR NOT BYTE-STRING-END(BYTE-IDX)
        SUBTRACT 1 FROM DATA-LEN OF W-WORK
    END-PERFORM
*   Initialize counters
    MOVE ZEROS TO COUNT-1BYTE-CHAR
    MOVE ZEROS TO COUNT-2BYTE-CHAR
    MOVE ZEROS TO COUNT-3BYTE-CHAR
*   Tally the characters, starting from the first map position
    SET BYTE-IDX TO 1
    PERFORM UNTIL BYTE-IDX > DATA-LEN OF W-WORK
        EVALUATE TRUE
            WHEN ONE-BYTE-CHAR(BYTE-IDX)
                ADD 1 TO COUNT-1BYTE-CHAR
            WHEN TWO-BYTES-CHAR(BYTE-IDX)
                ADD 1 TO COUNT-2BYTE-CHAR
            WHEN THREE-BYTES-CHAR(BYTE-IDX)
                ADD 1 TO COUNT-3BYTE-CHAR
        END-EVALUATE
        SET BYTE-IDX UP BY 1
    END-PERFORM
    DISPLAY ' LANGUAGE = ' LANGUAGE
    DISPLAY ' UTF8 TEXT: (LENGTH = ' DATA-LEN ')'
    DISPLAY ' ' UNICODE-TEXT
    DISPLAY ' '
    DISPLAY ' UTF8 BYTE MAPPING:'
    DISPLAY ' ' UNICODE-TEXT-MAP
    DISPLAY ' '
    DISPLAY ' TALLY:'
    DISPLAY ' NUMBER OF ONE-BYTE CHAR: ' COUNT-1BYTE-CHAR
    DISPLAY ' NUMBER OF TWO-BYTES CHAR: ' COUNT-2BYTE-CHAR
    DISPLAY ' NUMBER OF THREE-BYTES CHAR: ' COUNT-3BYTE-CHAR
    DISPLAY ' '
    DISPLAY ' '

The following table provides sample Unicode text as input for the PTPSTRFN function:

Language	Sample Unicode Text
Catalan	Quan el món vol conversar, parla Unicode
Chinese (Simplified)	当世界需要沟通时，请用Unicode！
Chinese (Traditional)	當世界需要溝通時，請用統一碼（Unicode）
Danish	Når verden vil tale, taler den Unicode
Dutch	Als de wereld wil praten, spreekt hij Unicode
English	When the world wants to talk, it speaks Unicode
Esperanto	Kiam la mondo volas paroli, ĝi parolas Unicode
Finnish	Kun maailma haluaa puhua, se puhuu Unicodea
French	Quand le monde veut communiquer, il parle en Unicode

For each row in the table, the code performs the following functions:

Calls PTPSTRFN to get the string mapping of the UTF-8 character string of the text that is retrieved for the UNITEXT field.
Displays the UTF-8 string equivalent of the text.
Tallies the number of one-byte, two-byte, and three-byte characters of the text.

The output of this program appears as follows:

Note. Certain strings appear to be garbled. This is because the system running the program has printed the output by individual bytes and not by multi-byte characters.

    LANGUAGE = Catalan
    UTF8 TEXT: (LENGTH = 0041)
    Quan el mÃ³n vol conversar, parla Unicode

    UTF8 BYTE MAPPING:
    1111111112X111111111111111111111111111111

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 039
    NUMBER OF TWO-BYTES CHAR: 001
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Chinese (Simplified)
    UTF8 TEXT: (LENGTH = 0043)
    å½”ä¸–ç•Œéœ?è¦?æ²Ÿé?šæ—¶ï¼Œè¯·ç”̈Unicodeï¼?

    UTF8 BYTE MAPPING:
    3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 007
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 012

    LANGUAGE = Chinese (Traditional
    UTF8 TEXT: (LENGTH = 0055)
    ç•¶ä¸–ç•Œéœ?è¦?æº?é?šæ™?ï¼Œè«?ç”̈çµ±ä¸?ç¢¼ï¼∘Unicodeï

    UTF8 BYTE MAPPING:
    3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 007
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 016

    LANGUAGE = Danish
    UTF8 TEXT: (LENGTH = 0039)
    NÃ¥r verden vil tale, taler den Unicode

    UTF8 BYTE MAPPING:
    12X111111111111111111111111111111111111

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 037
    NUMBER OF TWO-BYTES CHAR: 001
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Dutch
    UTF8 TEXT: (LENGTH = 0045)
    Als de wereld wil praten, spreekt hij Unicode

    UTF8 BYTE MAPPING:
    111111111111111111111111111111111111111111111

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 045
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = English
    UTF8 TEXT: (LENGTH = 0047)
    When the world wants to talk, it speaks Unicode

    UTF8 BYTE MAPPING:
    11111111111111111111111111111111111111111111111

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 047
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Esperanto
    UTF8 TEXT: (LENGTH = 0047)
    Kiam la mondo volas paroli, Ä¥i parolas Unicode

    UTF8 BYTE MAPPING:
    11111111111111111111111111112X11111111111111111

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 045
    NUMBER OF TWO-BYTES CHAR: 001
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Finnish
    UTF8 TEXT: (LENGTH = 0043)
    Kun maailma haluaa puhua, se puhuu Unicodea

    UTF8 BYTE MAPPING:
    1111111111111111111111111111111111111111111

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 043
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = French
    UTF8 TEXT: (LENGTH = 0052)
    Quand le monde veut communiquer, il parle en Unicode

    UTF8 BYTE MAPPING:
    1111111111111111111111111111111111111111111111111111

    TALLY:
    NUMBER OF ONE-BYTE CHAR: 052
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000
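The tallies in this output can be cross-checked from the byte maps alone. The following Python sketch mirrors the tally step of the sample program under the assumption that the map is padded with trailing spaces, as in the 300-byte buffer; the function name is hypothetical.

```python
def tally(string_map: str) -> dict:
    """Count one-, two-, and three-byte characters from a PTPSTRFN-style map."""
    trimmed = string_map.rstrip(" ")      # ignore the unused end of the buffer
    return {
        "length": len(trimmed),           # total bytes of text
        "one_byte": trimmed.count("1"),
        "two_bytes": trimmed.count("2"),
        "three_bytes": trimmed.count("3"),
    }


danish_map = "12X" + "1" * 36                     # map shown for the Danish sample
chinese_map = "3XX" * 11 + "1" * 7 + "3XX"        # map shown for Chinese (Simplified)

print(tally(danish_map.ljust(300)))
print(tally(chinese_map.ljust(300)))
```

Each digit in the map starts exactly one character, so counting the digits gives the number of characters of each length, and the trimmed map length gives the byte length of the text.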