Running COBOL in a Unicode Environment

This chapter provides an overview of COBOL in a Unicode environment and discusses how to run the COBOL conversion utility and fine-tune COBOL programs.

Understanding COBOL in a Unicode Environment

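This section discusses Unicode encodings in PeopleSoft COBOL, expanded storage space requirements, special logic for parsing Unicode strings, COBOL sorting, and Unicode-specific error messages.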

Unicode Encodings in PeopleSoft COBOL

 

The character set that is used for PeopleSoft COBOL processing must match the character set for your database. If you created a Unicode database for the PeopleSoft system, you must also run COBOL in Unicode.

Note. In this document, the word character refers to a single character in any language, regardless of how many bytes are required to store the character.

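The Unicode standard provides several methods of encoding Unicode characters into a byte stream. Each encoding has specific properties that make it suitable for use in different environments. The two main encodings that are important to understanding how PeopleSoft COBOL operates when running in Unicode are UCS-2, a fixed-width encoding that stores every character in two bytes, and UTF-8, a variable-width encoding that stores each character that the PeopleSoft system supports in one to three bytes.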

In UTF-8, the actual number of bytes to encode a character can be determined by the first three bits of the first, and sometimes only, byte of a character. The following table shows how the bit pattern of the first byte is related to the number of bytes needed to encode the UTF-8 character.

| Unicode Code Point Range | UTF-8 Bit Pattern | UTF-8 Character Length |
|---|---|---|
| U+0000 – U+007F | 0xxxxxxx | One byte |
| U+0080 – U+07FF | 110xxxxx 10xxxxxx | Two bytes |
| U+0800 – U+FFFF | 1110xxxx 10xxxxxx 10xxxxxx | Three bytes |

The x bit positions are filled with the bits of the character code number in binary representation. The rightmost x is the least-significant bit (a big-endian representation). In multi-byte sequences (for Unicode code points greater than U+007F), the number of leading 1 bits in the first byte is identical to the number of bytes in the entire sequence. In addition, each byte in a multi-byte sequence has the most significant bit set.
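The bit-filling rule can be made concrete with a short Python sketch. It is only an illustration of the table above, not PeopleSoft code; the function name `utf8_encode` is hypothetical, and the sketch deliberately stops at U+FFFF because that is the range the table covers.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point (U+0000..U+FFFF) using the three bit patterns."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only U+0000..U+FFFF is covered by the table")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point <= 0x7F:
        # 0xxxxxxx: the seven payload bits fit in a single byte.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx: five high bits, then six low bits.
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b00111111)])
    # 1110xxxx 10xxxxxx 10xxxxxx: four high bits, six middle bits, six low bits.
    return bytes([0b11100000 | (code_point >> 12),
                  0b10000000 | ((code_point >> 6) & 0b00111111),
                  0b10000000 | (code_point & 0b00111111)])


if __name__ == "__main__":
    for cp in (0x0061, 0x00F1, 0x20AC):
        print(f"U+{cp:04X} ->", " ".join(f"0x{b:02X}" for b in utf8_encode(cp)))
```

The results agree with Python's built-in UTF-8 codec for the same code points, which is a convenient way to check the hand-built patterns.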

This section includes Unicode encoding examples for the following characters:

| Character | Unicode Code Point | Description |
|---|---|---|
| a | U+0061 | Latin small letter a. |
| ñ | U+00F1 | Latin small letter ñ. |
| € | U+20AC | Euro symbol. |

The following table shows the difference in how UCS-2 and UTF-8 encode several characters:

| Character | Unicode Code Point | UCS-2 Byte Values | UTF-8 Byte Values |
|---|---|---|---|
| a | U+0061 | 0x00 0x61 | 0x61 |
| ñ | U+00F1 | 0x00 0xF1 | 0xC3 0xB1 |
| € | U+20AC | 0x20 0xAC | 0xE2 0x82 0xAC |

The PeopleSoft system transparently handles the conversion between UCS-2 and UTF-8 when data is passed into the COBOL program from the database. If you are reading or writing files directly from a COBOL program, the input and output files are UTF-8 encoded when running PeopleSoft COBOL programs in Unicode.
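To see the two encodings side by side outside of PeopleSoft, the following Python sketch prints the byte values of a character in both forms. It assumes Python's `utf-16-be` codec as a stand-in for UCS-2 (the two produce the same bytes for characters up to U+FFFF); the helper name `byte_values` is hypothetical.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes the way the comparison table does, for example '0xC3 0xB1'."""
    return " ".join(f"0x{b:02X}" for b in data)


def byte_values(character: str) -> tuple:
    """Return (UCS-2 byte values, UTF-8 byte values) for one character."""
    # Assumption: big-endian UTF-16 equals UCS-2 for code points up to U+FFFF.
    ucs2 = character.encode("utf-16-be")
    utf8 = character.encode("utf-8")
    return hex_bytes(ucs2), hex_bytes(utf8)


if __name__ == "__main__":
    for ch in ("a", "\u00f1", "\u20ac"):
        ucs2, utf8 = byte_values(ch)
        print(f"U+{ord(ch):04X}  UCS-2: {ucs2:<10}  UTF-8: {utf8}")
```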

See Also

The Unicode Standard

Expanded Storage Space Requirements

Moving to a COBOL Unicode environment means that character data can require up to three times the storage space that it requires in a single-byte character environment, because a character that is encoded in UTF-8 occupies from one to three bytes. To allow for this, all internal data definitions for character-type data in COBOL programs must be expanded to allow for three times as many bytes. This expansion is critical because in a Unicode PeopleSoft database, column sizes are calculated based on a character length, not a byte length. So, a CHAR(10) column on a Unicode database allows the storage of 10 characters, regardless of how many bytes each character takes to store. Given the three-bytes-per-character maximum requirement of UTF-8 (four-byte UTF-8 characters are not yet supported by the PeopleSoft system), the maximum byte size of this CHAR(10) column is 30 bytes. Therefore, a COBOL type of PIC X(30) may be required to store the contents of a CHAR(10) field on a Unicode database.
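The sizing arithmetic can be checked with a few lines of Python. This is a sketch of the rule only; the constant and function names are hypothetical and are not part of any PeopleSoft interface.

```python
# Assumption taken from the text: at most three UTF-8 bytes per supported character.
MAX_UTF8_BYTES_PER_CHARACTER = 3


def expanded_pic_length(column_characters: int) -> int:
    """Bytes to reserve for a column that is sized in characters."""
    return column_characters * MAX_UTF8_BYTES_PER_CHARACTER


if __name__ == "__main__":
    column = 10                      # CHAR(10)
    worst_case = "\u20ac" * column   # ten euro symbols, three bytes each
    best_case = "a" * column         # ten ASCII letters, one byte each
    print(expanded_pic_length(column))            # bytes to reserve
    print(len(worst_case.encode("utf-8")))        # bytes used by the worst case
    print(len(best_case.encode("utf-8")))         # bytes used by the best case
```

Reserving the worst case wastes space for mostly ASCII data, but it means that any 10-character value the database can return fits in the COBOL field.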

The PeopleSoft system provides a COBOL conversion utility to automatically expand the character-data fields in the working storage to accommodate the number of bytes in the UTF-8 encoding scheme.

See Also

Running the COBOL Conversion Utility

Special Logic for Parsing Unicode Strings

In a non-Unicode COBOL installation, parsing through a string is easy because you can assume that all characters coming in are one byte in length. But in UTF-8, a character can vary between one byte and three bytes in length. Therefore, you must incorporate special logic to handle string parsing when you are dealing with characters in UTF-8 format.
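As a rough illustration of what that special logic involves, the following Python sketch walks a UTF-8 byte buffer and steps by the length of each character instead of by one byte. It is not PeopleSoft code; the function name `split_characters` is hypothetical, and the sketch handles only the one- to three-byte forms described in this chapter.

```python
from typing import List


def split_characters(buffer: bytes) -> List[bytes]:
    """Split a UTF-8 buffer into per-character byte groups (1 to 3 bytes each)."""
    characters = []
    position = 0
    while position < len(buffer):
        first = buffer[position]
        if first < 0x80:                # 0xxxxxxx: one-byte character
            length = 1
        elif first >> 5 == 0b110:       # 110xxxxx: two-byte character
            length = 2
        elif first >> 4 == 0b1110:      # 1110xxxx: three-byte character
            length = 3
        else:
            raise ValueError(f"unexpected lead byte 0x{first:02X} at offset {position}")
        characters.append(buffer[position:position + length])
        position += length              # step by character, not by byte
    return characters


if __name__ == "__main__":
    data = "a\u00f1o\u20ac".encode("utf-8")
    print(len(data), "bytes,", len(split_characters(data)), "characters")
```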

See Also

Identifying Unicode and Non-Unicode Data

Defining Single Character Arrays

COBOL Sorting

Any in-memory sorting that is performed by using COBOL functions is performed as a binary sort in the current character encoding that is used for COBOL processing and may not necessarily match the sort order that is returned by the database in response to an ORDER BY clause. If you require the database to return data that is sorted by using a binary sort of its encoding rather than the default linguistically correct sort, you must use the %BINARYSORT meta-Structured Query Language (meta-SQL) function around each column in the WHERE or ORDER BY clause where binary ordering is important.

However, for DB2 UDB for OS/390 and z/OS implementations, the two binary sorts are equivalent only when the COBOL program runs on the DB2 UDB for OS/390 and z/OS server itself. For example, when the COBOL program runs on a client machine, the binary sort that is produced in COBOL differs from the binary sort that is produced by the database, because the database is encoded in EBCDIC and the client is in an ASCII-based encoding. Therefore, use the %BINARYSORT meta-SQL function only in COBOL programs that are not run through RemoteCall (the DB2 UDB for OS/390 and z/OS platform is not supported as a RemoteCall server).
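The effect of the encoding on a binary sort is easy to reproduce. The Python sketch below sorts the same three strings by their encoded bytes, once in ASCII and once in an EBCDIC code page. The choice of `cp037` is an assumption made for illustration; the code page that a given DB2 installation uses may differ, and the helper name `binary_sort` is hypothetical.

```python
def binary_sort(values, encoding):
    """Sort strings by the raw bytes they have in the given encoding."""
    return sorted(values, key=lambda value: value.encode(encoding))


if __name__ == "__main__":
    names = ["abc", "ABC", "123"]
    # ASCII places digits before uppercase letters before lowercase letters.
    print(binary_sort(names, "ascii"))
    # EBCDIC (cp037 here) places lowercase before uppercase before digits.
    print(binary_sort(names, "cp037"))
```

Both results are valid binary sorts, which is why a binary sort on an EBCDIC server and a binary sort on an ASCII-based client cannot be treated as equivalent.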

When running against non-z/OS systems, the %BINARYSORT function can be used in both RemoteCall and non-RemoteCall programs.

For example:

SELECT RECNAME
  FROM PSRECDEFN
 WHERE %BINARYSORT(RECNAME) < %BINARYSORT('xxx')

SELECT RECNAME
  FROM PSRECDEFN
 ORDER BY %BINARYSORT(RECNAME)

Note. Using the %BINARYSORT meta-SQL token in WHERE and ORDER BY clauses often negates the use of any indexes, because most databases can't use indexes for functional comparisons (for example, WHERE %BINARYSORT(column) > 'X'). Use this syntax only when the order of the SQL statement results absolutely must match the COBOL in-memory sort order.

See Also

%BINARYSORT

Unicode-Specific Error Messages

These error messages can occur when you are running a COBOL program against a Unicode database:

These messages appear in the COBOL output log file.

Running the COBOL Conversion Utility

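This section provides an overview of the COBOL conversion utility and discusses how to run the conversion utility, identify converted COBOL programs, understand what is expanded, use utility directives, and view error logs.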

Understanding the COBOL Conversion Utility

As delivered, PeopleSoft COBOL programs are configured to run only on non-Unicode databases. To run the PeopleSoft-delivered COBOL on a Unicode database, you must first convert it by using the PeopleTools COBOL conversion utility. This utility is typically called automatically by the PeopleSoft installation process; however, in certain circumstances, such as when you adapt COBOL code or apply a PeopleSoft-provided patch to a COBOL program, you may need to run the COBOL conversion utility manually.

Moving to a COBOL Unicode environment means that character data can require up to three times the storage space that it requires in a single-byte character environment. To allow for this, all internal data definitions for character-type data in COBOL programs must be expanded to allow for three times as many bytes.

Adapt and apply patches to only one set of COBOL source code—non-Unicode source. It is much easier to write COBOL programs without having to remember to triple the size of your working storage as you go. Once your adaptation or patch is complete and you are ready to compile the program, first run it through the COBOL conversion utility, then compile it. This approach has several benefits over customizing the converted code:

If the COBOL conversion utility makes modifications to your code that are undesirable, do not modify the converted code. Instead, use the directives that the PeopleSoft system provides to tell the utility how specific lines of code should (or should not) be converted. This enables you to limit your changes to the nonconverted code and to keep the conversion completely automated.

In a non-Unicode (also known as ANSI) implementation, one character typically occupies one byte of storage. So for a 10-character field, you can define a PICTURE clause of PIC X(10). In a Unicode implementation, however, you must allow for the maximum number of storage bytes that are required for any character field. Therefore, in the Unicode environment, you must define this same 10-character field with a PICTURE clause of PIC X(30).

To accommodate the number of bytes in the UTF-8 encoding scheme, the PeopleSoft system provides a COBOL conversion utility to expand the character fields in the working storage.

 

Running the Conversion Utility

Use the following command syntax to run the COBOL conversion utility:

PS_HOME\bin\client\winx86\pscblucvrt.exe -s:Source_Directory -t:Destination_Directory [-r:TEMP_Directory]

| Option | Description |
|---|---|
| -s:Source_Directory | Specify the source directory where the non-Unicode version of COBOL resides. For the directory, you must specify where the COBOL subdirectories reside (\BASE, \WIN32, \Unix, and so on). Example: -s:d:\PT8\SRC\CBL |
| -t:Destination_Directory | Specify where you want to place the expanded version of COBOL. The utility puts the modified source file in the same COBOL subdirectory in which it was found. Example: -t:d:\PT8\SRC\CBLUNICODE |
| -r or -rd:Temp_directory | See Viewing Error Logs. -r generates only the summary log file; -rd generates all of the log files. |

The utility produces a new source file for each .CBL file that is found. These new files are placed in the PS_HOME\src\ directory.

As delivered, the PeopleSoft batch utilities that compile your COBOL programs include logic to convert all programs and copybooks before compiling. This logic is triggered only when the Unicode version of PeopleTools is installed.

Compiling COBOL in Microsoft Windows

Use the PS_HOME\setup\cbl2uni.bat command to convert all of the COBOL programs and copybooks that are found in the PS_HOME\src\cbl directory. After the conversion, PS_HOME\src\cblunicode contains the expanded COBOL source code.

Compiling COBOL in Unix/Linux

Use the PS_HOME/install/pscbl.mak command to trigger the conversion before any COBOL programs are compiled. This utility stores all converted programs in the PS_HOME/src/cblunicode directory.

Identifying Converted COBOL Programs

When the COBOL conversion utility runs, it places a comment at the beginning of each COBOL program that it converts:

This comment line identifies converted programs in two ways:

Understanding What Is Expanded

For the utility to recognize when it is appropriate to expand data, strict adherence to the PeopleSoft COBOL coding standards is required. The utility looks for certain code-style patterns to make these decisions.

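The conversion utility expands all PIC X[(N)] data fields to triple their original size, with four exceptions that are described later in this section: SQL buffer setup data, redefined character fields, fields that appear to be dates, and arrays comprising a single character.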

The utility also converts copybooks on the fly: the first time that a copybook is referenced inside a code module, it is expanded immediately.

The utility processes an entire set of COBOL modules in a single run. It maintains a record of what it has converted to avoid converting copybooks twice.

Note. The COBOL conversion utility ensures that edited lines do not go past the 72nd column. If the conversion would normally cause a line to exceed that limitation, the utility removes some of the blank spaces between the field name and the PIC X string so that the line fits in the allowed area.
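The basic tripling rule, before any exception is applied, can be pictured with a toy Python function. This is emphatically not the conversion utility: it is a hypothetical sketch that handles only the `PIC X` and `PIC X(N)` forms on a single line and ignores the exceptions, the directives, copybooks, and the column 72 limit.

```python
import re

# Matches "PIC X" or "PIC X(N)" but not "PIC XX" or numeric pictures such as "PIC 9(4)".
_PIC_X = re.compile(r"\bPIC\s+X(?:\((\d+)\))?(?![\w(])")


def expand_line(line: str) -> str:
    """Triple the length of every PIC X[(N)] clause found in one source line."""
    def triple(match):
        original = int(match.group(1)) if match.group(1) else 1
        return f"PIC X({original * 3})"
    return _PIC_X.sub(triple, line)


if __name__ == "__main__":
    print(expand_line("10  TSE-JOBID      PIC X(10) VALUE SPACES."))
    print(expand_line("10  TSE-EDIT-TYPE  PIC X VALUE SPACE."))
    print(expand_line("10  SEQ-NBR        PIC 9(15) VALUE ZERO COMP-3."))
```

A rule this simple expands every character field, including dates and fields that are redefined as numbers, which is why the utility also has to apply the exceptions.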

Exception 1: SQL Buffer Setup Data

SQL buffer setup data that refers to the numeric or date data types of SELECT-SETUP or BIND-SETUP is not expanded.

For the interface to PTPSQLRT, a COBOL program passes a SELECT list (SELECT-DATA) and a descriptor area (SELECT-SETUP). The program also passes similar data and setup areas for bind variables. The descriptors that are passed are always character-type data with embedded values that signal the actual data type and length of the data fields. Because these descriptors represent the lengths of the associated data fields in the corresponding SELECT-DATA and BIND-DATA structures, the utility adjusts only the length of the descriptors that are representing character-type data.

Example 1: In the following code, the select list contains two character fields (EMPLID and NAME), a small integer (EMPL_RCD), and a date (EFFDT):

SELECT-SETUP.
    02  FILLER     PIC X(20)  VALUE ALL 'C'.
    02  FILLER     PIC X(2)   VALUE ALL 'S'.
    02  FILLER     PIC X(10)  VALUE ALL 'D'.
    02  FILLER     PIC X(30)  VALUE ALL 'C'.
SELECT-DATA.
    02  EMPLID     PIC X(20).
    02  EMPL_RCD   PIC 99     COMP.
    02  EFFDT      PIC X(10).
    02  NAME       PIC X(30).

In Unicode, the only fields that should be expanded are the two character fields (EMPLID and NAME). Numeric data is never affected by Unicode, and (according to the PeopleTools definition), dates are not affected either: they are treated as numeric strings and cannot have special characters.

Thus, the utility converts this code as follows:

SELECT-SETUP.
    02  FILLER     PIC X(60)  VALUE ALL 'C'.
    02  FILLER     PIC X(2)   VALUE ALL 'S'.
    02  FILLER     PIC X(10)  VALUE ALL 'D'.
    02  FILLER     PIC X(90)  VALUE ALL 'C'.
SELECT-DATA.
    02  EMPLID     PIC X(60).
    02  EMPL_RCD   PIC 99     COMP.
    02  EFFDT      PIC X(10).
    02  NAME       PIC X(90).
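The descriptor rule for this example can be summarized in a small Python sketch. It models each setup entry as a (type code, length) pair and triples only the entries whose code is 'C'. The pair representation, the function name, and the input lengths of 20 and 30 are assumptions made for illustration, and the sketch covers only the three codes that appear in Example 1 (C, S, and D).

```python
CHARACTER_CODE = "C"  # the code that marks character data in Example 1


def expand_descriptors(descriptors):
    """Triple the length of character descriptors; leave all others unchanged."""
    return [
        (code, length * 3 if code == CHARACTER_CODE else length)
        for code, length in descriptors
    ]


if __name__ == "__main__":
    # EMPLID, EMPL_RCD, EFFDT, NAME with assumed pre-conversion lengths.
    select_setup = [("C", 20), ("S", 2), ("D", 10), ("C", 30)]
    print(expand_descriptors(select_setup))
```

Because the same rule is applied to the setup area and to the data area, the descriptor lengths continue to describe the data fields after conversion.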

Example 2: The following code represents non-Unicode COBOL (COBOL that has not yet been expanded):

01  I-ERRL.
    05  SQL-STMT               PIC X(18)  VALUE 'INPXPROC_I_ERRL'.
    05  BIND-SETUP.
        10  FILLER             PIC X(10)  VALUE ALL 'C'.
        10  FILLER             PIC X(6)   VALUE '0PPPPP'.
        10  FILLER             PIC X(4)   VALUE ALL 'I'.
        10  FILLER             PIC X      VALUE 'H'.
        10  FILLER             PIC X(18)  VALUE ALL 'C'.
        10  FILLER             PIC X(4)   VALUE ALL 'I'.
        10  FILLER             PIC X(4)   VALUE ALL 'N'.
        10  FILLER             PIC X(30)  VALUE ALL 'H'.
        10  FILLER             PIC X(30)  VALUE ALL 'C'.
        10  FILLER             PIC X(30)  VALUE ALL 'H'.
        10  FILLER             PIC X(10)  VALUE ALL 'C'.
        10  FILLER             PIC X(6)   VALUE '0PPPPP'.
        10  FILLER             PIC X(8)   VALUE '0PPPPPPP'.
        10  FILLER             PIC X      VALUE 'Z'.
    05  BIND-DATA.
        10  TSE-JOBID          PIC X(10)  VALUE SPACES.
        10  TSE-PROC-INSTANCE  PIC 9(10)  VALUE ZERO COMP-3.
        10  TSE-SET-NBR        PIC 9(6)   VALUE ZERO COMP.
        10  TSE-EDIT-TYPE      PIC X      VALUE SPACE.
        10  TSE-FIELDNAME      PIC X(18)  VALUE SPACES.
        10  MESSAGE-SET-NBR    PIC 9(5)   VALUE ZERO COMP.
        10  MESSAGE-NBR        PIC 9(5)   VALUE ZERO COMP.
        10  MESSAGE-PARM       PIC X(30)  VALUE SPACES.
        10  MESSAGE-PARM2      PIC X(30)  VALUE SPACES.
        10  MESSAGE-PARM3      PIC X(30)  VALUE SPACES.
        10  BUSINESS-UNIT      PIC X(10)  VALUE SPACES.
        10  TRANSACTION-NBR    PIC 9(10)  VALUE ZERO COMP-3.
        10  SEQ-NBR            PIC 9(15)  VALUE ZERO COMP-3.
        10  FILLER             PIC X      VALUE 'Z'.

The utility converts this code as follows:

01  I-ERRL.
    05  SQL-STMT               PIC X(54)  VALUE 'INPXPROC_I_ERRL'.
    05  BIND-SETUP.
        10  FILLER             PIC X(30)  VALUE ALL 'C'.
        10  FILLER             PIC X(6)   VALUE '0PPPPP'.
        10  FILLER             PIC X(4)   VALUE ALL 'I'.
        10  FILLER             PIC X(3)   VALUE ALL 'H'.
        10  FILLER             PIC X(54)  VALUE ALL 'C'.
        10  FILLER             PIC X(4)   VALUE ALL 'I'.
        10  FILLER             PIC X(4)   VALUE ALL 'N'.
        10  FILLER             PIC X(90)  VALUE ALL 'H'.
        10  FILLER             PIC X(90)  VALUE ALL 'C'.
        10  FILLER             PIC X(90)  VALUE ALL 'H'.
        10  FILLER             PIC X(30)  VALUE ALL 'C'.
        10  FILLER             PIC X(6)   VALUE '0PPPPP'.
        10  FILLER             PIC X(8)   VALUE '0PPPPPPP'.
        10  FILLER             PIC X      VALUE 'Z'.
    05  BIND-DATA.
        10  TSE-JOBID          PIC X(30)  VALUE SPACES.
        10  TSE-PROC-INSTANCE  PIC 9(10)  VALUE ZERO COMP-3.
        10  TSE-SET-NBR        PIC 9(6)   VALUE ZERO COMP.
        10  TSE-EDIT-TYPE      PIC X(3)   VALUE SPACES.
        10  TSE-FIELDNAME      PIC X(54)  VALUE SPACES.
        10  MESSAGE-SET-NBR    PIC 9(5)   VALUE ZERO COMP.
        10  MESSAGE-NBR        PIC 9(5)   VALUE ZERO COMP.
        10  MESSAGE-PARM       PIC X(90)  VALUE SPACES.
        10  MESSAGE-PARM2      PIC X(90)  VALUE SPACES.
        10  MESSAGE-PARM3      PIC X(90)  VALUE SPACES.
        10  BUSINESS-UNIT      PIC X(30)  VALUE SPACES.
        10  TRANSACTION-NBR    PIC 9(10)  VALUE ZERO COMP-3.
        10  SEQ-NBR            PIC 9(15)  VALUE ZERO COMP-3.
        10  FILLER             PIC X      VALUE 'Z'.

Exception 2: Redefined Character Fields

Character fields that are redefined to a numeric field (and group-level fields that contain such character fields) are not expanded. In instances where the redefined field is also redefined as a character field, the original character field and the redefinition that is a character field are expanded.

Example 1: In this example, the DB-PIC-PRECIS-CHAR is not expanded:

07  DB-PIC-PRECIS-CHAR  PIC X(2).
07  DB-PIC-PRECIS-NUM   REDEFINES DB-PIC-PRECIS-CHAR  PIC 9(2).

Example 2: In this example, the I-REMIT-ADDR-SEQ is not expanded:

02  I-REMIT-ADDR-SEQ    PIC 9(04).
02  I-REMIT-ADDR-SEQ-C  REDEFINES I-REMIT-ADDR-SEQ  PIC X(04).

Example 3: In this example, the original definition is a character-type field. Although some of the redefined fields are numeric fields, all of the character fields, including the original definition, are expanded.

02  MSGDATA1          PIC X(30)  VALUE SPACES.
02  FILLER REDEFINES MSGDATA1.
    03  MSGDATA1-INT  PIC Z(9)9-.
    03  INT-FILL1     PIC X(19).
02  FILLER REDEFINES MSGDATA1.
    03  MSGDATA1-DOL  PIC Z(9)9.99-.
    03  DOL-FILL1     PIC X(16).
02  FILLER REDEFINES MSGDATA1.
    03  MSGDATA1-DEC  PIC Z(9)9.9(5)-.
    03  DEC-FILL1     PIC X(13).

Exception 3: Fields That Appear to Be Dates

Fields and group-level fields that appear to be dates are not expanded, unless the EXPAND directive is specified for this field or group-level field.

The following table describes the criteria that are used to determine fields or group-level fields as dates:

| Date Data Type | Field or Group-Level Field Name | Field Length or Total Length of a Group-Level Field* |
|---|---|---|
| Date | Contains -DT or DATE | 10 |
| Time | Contains -TM or TIME | 15 |
| Date-Time | Contains -DTTM, DATE, or TIME | 26 |

* When calculating the total length, the utility considers that a group-level field may contain REDEFINE fields. The length of the REDEFINE field is not included when determining the total length of the group field.
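The criteria in the table amount to a name test combined with a length test. The Python sketch below expresses them as data and applies them to a field name and a total length. It is a hypothetical illustration, not the utility's logic: it assumes that "contains" means a plain substring match and that the caller has already excluded REDEFINES fields from the total length.

```python
# (name fragments, required total length) for date, time, and date-time fields.
DATE_RULES = (
    (("-DT", "DATE"), 10),
    (("-TM", "TIME"), 15),
    (("-DTTM", "DATE", "TIME"), 26),
)


def looks_like_date(field_name: str, total_length: int) -> bool:
    """Return True when a name and length pair matches any row of the table."""
    return any(
        total_length == required_length
        and any(fragment in field_name for fragment in fragments)
        for fragments, required_length in DATE_RULES
    )


if __name__ == "__main__":
    for name, length in (("START-DATE", 10), ("PAY-DATE-TIME", 26),
                         ("BUSINESS-UNIT", 10), ("MESSAGE-PARM", 30)):
        print(name, length, looks_like_date(name, length))
```

A field such as BUSINESS-UNIT has a date-sized length but no date-like name, so it is expanded normally. A name-and-length heuristic can also misfire in the other direction, which is the case the EXPAND directive exists for.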

Example 1: The field in this example is not expanded:

10 START-DATE PIC X(10) VALUE SPACES.

Example 2: The fields in this example are not expanded:

02  W-EMP-BIRTHDATE.
    03  YEAR    PIC 9(4).
    03  FILLER  PIC X(1).
    03  MONTH   PIC 99.
    03  FILLER  PIC X(1).
    03  DAYS    PIC 99.

Example 3: The fields in this example are not expanded:

03  PAY-DATE-TIME.
    04  PAY-DTTM-DATE    PIC X(10).
    04  PAY-DTTM-DELIM1  PIC X  VALUE '-'.
    04  PAY-DTTM-TIME    PIC X(15).

Example 4: The fields in this example are not expanded:

05  BEGIN-DTTM-TIME.
    07  SYS-HOUR          PIC 99    VALUE ZERO.
    07  FILLER            PIC X     VALUE SPACE.
    07  SYS-MINUTE        PIC 99    VALUE ZERO.
    07  FILLER            PIC X     VALUE SPACE.
    07  SYS-SECOND        PIC 99    VALUE ZERO.
    07  FILLER            PIC X     VALUE SPACE.
    07  SYS-MICRO-SECOND  PIC 9(6)  VALUE ZERO.

Example 5: In this example, the group field contains REDEFINE fields. The conversion utility does not expand the fields because the group field meets the criteria for a date: it has a total length of 10 (the REDEFINE fields are not counted) and the field name includes the -DT string.

02  END-DT.
    03  END-DT-YY      PIC X(4).
    03  END-DT-YY-NUM  REDEFINES END-DT-YY  PIC 9999.
    03  FILLER         PIC X.
    03  END-DT-MM      PIC XX.
    03  END-DT-MM-NUM  REDEFINES END-DT-MM  PIC 99.
    03  FILLER         PIC X.
    03  END-DT-DD      PIC XX.
    03  END-DT-DD-NUM  REDEFINES END-DT-DD  PIC 99.

Exception 4: Arrays Comprising a Single Character

For arrays that comprise a single character, the PIC clause is expanded for character data, but the OCCURS clause is not expanded. However, if the data name ends with -POS, -CHAR, or -BYTE, the OCCURS clause is expanded, instead of the element size.

Example 1: In this example, the field is expanded:

01  CHAR-ARRAY  PIC X  OCCURS 80 TIMES.

Is expanded to:

01  CHAR-ARRAY  PIC X(3)  OCCURS 80 TIMES.

Example 2: In this example, the data name ends with -POS; therefore, the OCCURS clause is expanded, instead of the element size:

01  CHAR-POS  PIC X  OCCURS 80 TIMES.

Is expanded to:

01  CHAR-POS  PIC X  OCCURS 240 TIMES.

Using Utility Directives

The COBOL conversion utility accepts various directives in the first six columns of COBOL code. Use these directives to override the utility’s normal mode of processing for a single source code line or for a block of lines.

| Directive | Description | Purpose |
|---|---|---|
| NOCBGN | No conversion: begin | Starting with this line, do not perform expansions. |
| NOCEND | No conversion: end | End the NOCBGN directive following this line (the directive line is not expanded). |
| NOCLN | No conversion: line | Do not perform expansions in this single line. |
| COCCUR | Convert occurrence | Expand the OCCURS clause instead of the PIC clause in this line. |
| EXPEOF | Expand end of field | Expand a group item by increasing the length of the last field in the group. |
| EXPAND | Instruct utility to expand field | Force expansion of fields that would normally not be expanded because they appear to be date, time, or datetime fields. |

The following examples use existing PeopleSoft COBOL programs to illustrate possible uses for the utility directives.

NOCBGN, NOCEND, and NOCLN Directives

One of the COBOL programs for PeopleSoft Human Resources has a unique way of setting the PAY-PERIODS group field. The program defines an 88-level definition based on the concatenated value of the five, one-column, character-type fields. If the conversion utility were to convert the program without the special directives, none of the cases that are defined in the 88-level field would ever be true.

NOCBGN     03  PAY-PERIODS.
               88  PAY-PERIODS-ALL              VALUE 'YYYYY'.
               88  PAY-PERIODS-ALL-BIWEEKLY     VALUE 'YYYNN'.
               88  PAY-PERIODS-ALL-SEMIMONTHLY  VALUE 'YYNNN'.
               88  PAY-PERIODS-NONE             VALUE 'NNNNN'.
               04  PAY-PERIOD1  PIC X.
               04  PAY-PERIOD2  PIC X.
               04  PAY-PERIOD3  PIC X.
               04  PAY-PERIOD4  PIC X.
               04  PAY-PERIOD5  PIC X.
           03  FILLER REDEFINES PAY-PERIODS.
               04  PAY-PERIOD   PIC X  OCCURS 5.
                   88  PAY-PERIOD-YES  VALUE 'Y'.
NOCEND             88  PAY-PERIOD-NO   VALUE 'N'.

       01  S-DEDPDS.
           02  SQL-STMT  PIC X(18)  VALUE 'PSPDCFSA_S_DEDPDS'.
           02  BIND-SETUP.
               03  FILLER  PIC X(10)  VALUE ALL 'C'.
               03  FILLER  PIC X(10)  VALUE ALL 'H'.
               03  FILLER  PIC X(10)  VALUE ALL 'D'.
               03  FILLER  PIC X(10)  VALUE ALL 'A'.
NOCBGN         03  FILLER  PIC X      VALUE ALL 'C'.
               03  FILLER  PIC X      VALUE ALL 'H'.
               03  FILLER  PIC X      VALUE ALL 'C'.
               03  FILLER  PIC X      VALUE ALL 'H'.
NOCEND         03  FILLER  PIC X      VALUE ALL 'C'.
               03  FILLER  PIC X      VALUE 'Z'.
           02  BIND-DATA.
               03  COMPANY      PIC X(10).
               03  PAYGROUP     PIC X(10).
               03  PAY-END-DT   PIC X(10).
               03  YEAR-END-DT  PIC X(10).
NOCLN          03  PAY-PERIODS  PIC X(5).
               03  FILLER       PIC X  VALUE 'Z'.

COCCUR Directive

The conversion utility doesn't normally expand the size of the array in this line from one of the PeopleTools COBOL programs. Using the COCCUR directive ensures that the OCCURS clause is expanded:

       02  PARM.
COCCUR     05  PARM-CH  OCCURS 30 TIMES  PIC X.

EXPEOF Directive

In the following example, the FIELDNAME group-level field is broken down to check the first 4 characters of the string. In this instance, it makes more sense to adjust the length of the FILLER field. By using the EXPEOF directive, you direct the utility to expand the FILLER field to a length of 50:

EXPEOF 02  FIELDNAME.
           03  FIELDNAME4  PIC X(4)   VALUE SPACE.
               88  FIELDNAME-TSE      VALUE 'TSE_'.
           03  FILLER      PIC X(14)  VALUE SPACE.
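The length of 50 follows from simple arithmetic: the group is 18 bytes long (4 + 14), three times that is 54, and the first field keeps its original 4 bytes, which leaves 50 for the last field. A small Python sketch of that calculation follows; the function name is hypothetical, and the rule it encodes (triple the group total and give all of the growth to the last field) is inferred from this one example.

```python
def expeof_lengths(field_lengths):
    """Triple a group's total length by growing only its last field."""
    expanded_total = sum(field_lengths) * 3
    unchanged = list(field_lengths[:-1])
    # Every field but the last keeps its size; the last field absorbs the rest.
    return unchanged + [expanded_total - sum(unchanged)]


if __name__ == "__main__":
    # FIELDNAME4 is PIC X(4) and FILLER is PIC X(14) before conversion.
    print(expeof_lengths([4, 14]))
```

Keeping FIELDNAME4 at four bytes means that the 88-level test against 'TSE_' still compares exactly four bytes after conversion.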

Viewing Error Logs

The COBOL conversion utility produces a set of error and warning logs with messages that identify nonstandard code styles and inconsistencies. The utility also logs expansion actions that may require manual review.

The utility produces the following logs:

Exception log

This log contains warnings that occurred because of ambiguous working storage definitions. You may need to modify code or add utility directives to resolve the issues logged.

Exception BIND/SELECT log

This log contains warnings and errors that occurred because of ambiguous BIND and SETUP definitions.

Exception date log

This log lists all group-level date fields that are detected by the utility.

Summary log

This log provides general statistics regarding the number of programs that are processed.

When you specify the -r flag, you see only the summary log. Set the -rd flag on the conversion utility command line if you want the utility to produce all of the detail logs: exception, BIND/SELECT, and exception date.

Messages from the Exception Log

The following tables summarize all of the messages that can appear in the three exception log files. Errors indicate problems that are encountered by the conversion utility.

| Message | Type | Note |
|---|---|---|
| Non-matching conversion block found in line line number. | Error | Detected the NOCBGN directive, but couldn’t find the corresponding NOCEND. |
| Error in determining numeric length in line line number. | Error | Couldn’t decipher the numeric PICTURE clause. |
| The size of the one character array will be expanded in line line number. | Warning | Detected a one-character array where the field contains the string -BYTE, -POS, or -CHAR. |
| A one-character array is found in line number. The conversion routine will expand this to PIC X(3). | Warning | None. |
| Unable to find the copy library copy library name. | Error | Couldn't locate the copy library file that the program references. |

Messages from the Exception BIND/SELECT Log

The following table lists messages from the BIND/SELECT log:

| Message | Type | Note |
|---|---|---|
| Didn't find the corresponding DATA section for DATA field name in line line number. | Error | Detected either a BIND-DATA or a SELECT-DATA, but cannot find the SETUP group field. |
| No delimiter found on group field name section in line line number. | Warning | Didn’t find a FILLER field with a value Z in a DATA or SETUP group field. |
| Unable to convert the group field name section due to problems reading the Copylib. | Error | Couldn't locate a copy library that DATA or SETUP references. |
| The group field name found in line line number has a mismatch count. | Warning and error | The number of columns in DATA doesn’t match the count for the corresponding SETUP. |
| Incompatible date type match for field in line line number. | Error | The data type definition in SETUP doesn’t correspond to the data type in DATA. |

Messages from the Exception Date Log

The following table lists messages from the exception date log:

| Message | Type | Note |
|---|---|---|
| Date/time/datetime detected and will not be expanded in line. | Warning | None. |
| Verify if a date/time/datetime field in line number: line number. | Warning | Found a character-type field or group field with a total length of 10, 15, or 26 that could be a date, time, or datetime. |

Fine-Tuning COBOL Programs

Although the COBOL conversion makes most of the changes that are needed to run COBOL in a Unicode environment, some manual fine-tuning may still be necessary.

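This section discusses how to identify Unicode and non-Unicode data, specify column lengths in dynamic SQL, and define single character arrays.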

Identifying Unicode and Non-Unicode Data

A COBOL program may need to determine whether it's dealing with non-Unicode or Unicode data. For example, if the program parses a string character by character, it must apply different logic depending on whether the string is non-Unicode or Unicode. The program can get this information from the ENCODING-MODE-SW in the PTCSQLRT copy library (ANSI-Mode is the same as non-Unicode):

03  ENCODING-MODE-SW  PIC X(3)  VALUE SPACE.
    88  ANSI-MODE     VALUE 'A'.
    88  UNICODE-MODE  VALUE 'U'.

The ENCODING-MODE-SW value is set by the COBOL application programming interface (API), which determines which type of data it's dealing with by checking the value of the UNICODE_ENABLED field in the PSSTATUS table. When the value of the UNICODE_ENABLED flag is set to 1 (true), this signals the COBOL API that it is accessing a Unicode database.

The COBOL API also performs the necessary translations between the UTF-8 encodings that are required by COBOL and the UCS-2 encodings that are used elsewhere in the PeopleSoft system.

Specifying Column Lengths in Dynamic SQL

Perhaps the biggest effort in getting COBOL fully functional in a Unicode environment is setting up the bind parameters and select buffers of any dynamic SQL statements.

Programs that use dynamic SQL must specify the column lengths for bind or select fields before calling the PTPDYSQL program. Within a COBOL program, there are two ways that you can assign bind parameters and select buffers of dynamic SQL statements:

To adjust the length of the character field appropriately, the program must recognize the encoding scheme that is used by the COBOL API. The program can take advantage of the ENCODING-MODE-SW field in PTCSQLRT to determine when the length of the field needs to be adjusted.

This example illustrates the use of a buffer array to calculate the length of a character field in the Unicode environment:

02  SELECT-DATA.
    03  FIELDNAME     PIC X(54)  VALUE SPACE.
    03  FIELDNUM      PIC 9(3)   VALUE ZERO  COMP.
    03  DEFFIELDNAME  PIC X(90)  VALUE SPACES.
    03  FIELDLEN      PIC 9(3)   VALUE ZERO  COMP.
    03  FIELDTYPE     PIC 9(2)   VALUE ZERO  COMP.
        88  RDM-CHAR           VALUE ZERO.
        88  RDM-LONG-CHAR      VALUE 1.
        88  RDM-NUMBER         VALUE 2.
        88  RDM-SIGNED-NUMBER  VALUE 3.
        88  RDM-DATE           VALUE 4.
        88  RDM-TIME           VALUE 5.
        88  RDM-DATETIME       VALUE 6.
    03  DECIMALPOS    PIC 9(2)   VALUE ZERO  COMP.
    03  FILLER        PIC X      VALUE 'Z'.
.
.
.
    MOVE CORR SELECT-DATA OF S-RECFLD
        TO FLD-ARRAY OF RECFLD (FLD-IDX)

*   CONVERT FIELD TYPE FROM PSDBFIELD TYPE TO SQLRT CODE.

    MOVE FIELDLEN OF S-RECFLD
        TO SETUP-LENGTH OF RECFLD (FLD-IDX)

    IF RDM-CHAR OF S-RECFLD
        SET SETUP-TYPE-CHAR OF RECFLD (FLD-IDX) TO TRUE
        IF UNICODE-MODE OF SQLRT
            COMPUTE SETUP-LENGTH OF RECFLD (FLD-IDX) =
                    SETUP-LENGTH OF RECFLD (FLD-IDX) * 3
        END-IF
    END-IF

See Also

Identifying Unicode and Non-Unicode Data

Defining Single Character Arrays

Some COBOL programs define single-character arrays to parse or examine a string of characters, one character at a time. In a Unicode environment, be sure that you’re examining the string one character at a time, not one byte at a time.

This example shows a code fragment in which the program is examining a string one character at a time:

01  CHAR-ARRAY.
    02  CHAR-POS  PIC X  OCCURS 256 TIMES
                         INDEXED BY CHAR-IDX.
        88  FIELD-DELIM  VALUE '*'.
.
.
.
    SET CHAR-IDX TO 1
    SEARCH CHAR-ARRAY
        WHEN FIELD-DELIM(CHAR-IDX)
            SET W-OFFSET TO CHAR-IDX
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
    END-SEARCH

The intent of the code in the previous example is to examine each character of the array, looking for the first delimiter character. When that character is found, the code displays the position of the delimiter.

In a non-Unicode environment that uses only the Latin1 character set, this works because there is one byte (one array element) per character. In a Unicode environment (or in a non-Unicode environment that allows double-byte character sets), this fails because what could potentially be examined is the second or third byte of a two- or three-byte character. It's possible for the second or third byte to match the bit pattern of the delimiter character, thus falsely passing the test and ending the search loop.
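The false match is easiest to reproduce with a double-byte character set. The Python sketch below uses Shift-JIS as an example of such a set and the backslash as the delimiter; both choices, and the two helper names, are assumptions made for illustration. In Shift-JIS, the katakana character U+30BD is encoded as 0x83 0x5C, and its second byte has the same value as a backslash.

```python
def byte_wise_find(buffer: bytes, delimiter: bytes) -> int:
    """Search one byte at a time, as a single-character array search does."""
    return buffer.find(delimiter)


def character_wise_find(buffer: bytes, delimiter: str, encoding: str) -> int:
    """Decode first, then search whole characters; returns a character position."""
    return buffer.decode(encoding).find(delimiter)


if __name__ == "__main__":
    # U+30BD followed by an asterisk; the text contains no backslash at all.
    data = "\u30bd*".encode("shift_jis")
    print(data)                                           # raw bytes
    print(byte_wise_find(data, b"\\"))                    # offset of the false match
    print(character_wise_find(data, "\\", "shift_jis"))   # -1: no real backslash
```

The byte-wise search reports a delimiter inside the two-byte character, while the character-wise search correctly reports that none is present.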

To correct this situation, you must know the length (in bytes) of each character that is being processed. A new COBOL function, PTPSTRFN, is available that returns the length of a character so that the code can take this into account when performing a character search. The PTPSTRFN subroutine works for both Unicode character sets and ANSI double-byte character sets.

The PTPSTRFN subroutine offers two ways of retrieving the byte length of a character:

Requesting the Length of a Single Character

The input parameters to the PTPSTRFN function are:

| Parameter | Value | Notes |
| --- | --- | --- |
| ACTION-TYPE | ACTION-CHARLEN | None. |
| CHAR-CD | The character whose length you want to verify. | This variable is included in the PTCSTRFN.CBL copy library. |

These values are returned:

| Parameter | Value | Notes |
| --- | --- | --- |
| CHAR-LENGTH | One of the following values, representing the length of the character that is referenced by CHAR-CD: ONE-BYTE, TWO-BYTES, or THREE-BYTES. | This variable is included in the PTCSTRFN.CBL copy library. |
| STRFN-RC | One of the following values: STRFN-RC-OK or STRFN-INVALID-ACTION. | This variable is included in the PTCSTRFN.CBL copy library. |
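For UTF-8 data, the one-, two-, and three-byte results correspond to the lead-byte bit patterns shown in the table earlier in this chapter. The following Python sketch shows how such a length can be derived from the first byte alone. It is an illustration under that assumption, not the implementation of PTPSTRFN, and `utf8_char_length` is a hypothetical name.

```python
def utf8_char_length(lead_byte: int) -> int:
    """Return 1, 2, or 3 for a UTF-8 lead byte.

    Patterns: 0xxxxxxx (one byte), 110xxxxx (two bytes), 1110xxxx (three
    bytes). Any other value, such as a continuation byte, raises ValueError.
    """
    if (lead_byte & 0b1000_0000) == 0:
        return 1
    if (lead_byte & 0b1110_0000) == 0b1100_0000:
        return 2
    if (lead_byte & 0b1111_0000) == 0b1110_0000:
        return 3
    raise ValueError(f"not a one-, two-, or three-byte lead byte: {lead_byte:#04x}")


# One character of each length: 'A' (U+0041), 'ó' (U+00F3), '世' (U+4E16).
lengths = [utf8_char_length(ch.encode("utf-8")[0]) for ch in "Aó世"]
```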

At the beginning of this section, there was a code fragment in which the program was examining a string, one character at a time, looking for the first delimiter character:

    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X  OCCURS 256 TIMES
                                       INDEXED BY CHAR-IDX.
            88  FIELD-DELIM            VALUE '*'.
    . . .
    . . .
    SET CHAR-IDX TO 1
    SEARCH CHAR-POS
        WHEN FIELD-DELIM(CHAR-IDX)
            SET W-OFFSET TO CHAR-IDX
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
    END-SEARCH

After the code is modified for Unicode, it looks like this:

    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X  OCCURS 256 TIMES
                                       INDEXED BY CHAR-IDX.
            88  FIELD-DELIM            VALUE '*'.
    01  STR-FUNC                COPY 'PTCSTRFN'.
    . . .
    . . .
    SET CHAR-IDX TO 1
    SEARCH CHAR-POS
        WHEN FIELD-DELIM(CHAR-IDX)
            SET W-OFFSET TO CHAR-IDX
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
        WHEN OTHER
   * Skip the trailing bytes of a two- or three-byte character
            MOVE CHAR-POS(CHAR-IDX) TO CHAR-CD OF STR-FUNC
            CALL 'PTPSTRFN' USING ACTION-CHARLEN
                                  STR-FUNC
            IF TWO-BYTES OF STR-FUNC
                SET CHAR-IDX UP BY 1
            ELSE
                IF THREE-BYTES OF STR-FUNC
                    SET CHAR-IDX UP BY 2
                END-IF
            END-IF
    END-SEARCH

The modification ensures that the code continues to function properly in a Unicode environment. However, it is guaranteed to work only when the delimiter character for which the program is searching is one byte in length.
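The idea behind the modification, which is to test only the first byte of each character and then skip that character's remaining bytes, can be sketched in Python as follows. This is an illustration only. It assumes well-formed UTF-8 data limited to one-, two-, and three-byte characters, and `char_len` and `find_delimiter` are hypothetical names that stand in for the PTPSTRFN call and the search loop.

```python
def char_len(lead_byte: int) -> int:
    """Byte length of the UTF-8 character that starts with lead_byte.

    Assumption: lead_byte is the first byte of a well-formed one-, two-,
    or three-byte character.
    """
    if lead_byte < 0x80:
        return 1
    return 2 if lead_byte < 0xE0 else 3


def find_delimiter(data: bytes, delim: bytes = b"*") -> int:
    """Return the 1-based byte position of the first one-byte delimiter,
    or 0 if the delimiter is not present."""
    pos = 0
    while pos < len(data):
        length = char_len(data[pos])
        if length == 1 and data[pos:pos + 1] == delim:
            return pos + 1
        pos += length          # skip the trailing bytes of a longer character
    return 0


# 'ó' occupies two bytes, so the delimiter is the fifth byte.
position = find_delimiter("món*vol".encode("utf-8"))
```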

Consider the following code fragment in which the delimiter character that the program is searching for may be longer than one byte:

    01  W-DELIMITER             PIC X(3)  VALUE 'some extended character'.
    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X  OCCURS 256 TIMES
                                       INDEXED BY CHAR-IDX
                                                  CHAR-IDX2
                                                  CHAR-IDX3.
            88  FIELD-DELIM            VALUE '*'.
    . . .
    . . .
    SET CHAR-IDX TO 1
    SEARCH CHAR-POS
        WHEN CHAR-POS(CHAR-IDX) = W-DELIMITER
            SET W-OFFSET TO CHAR-IDX
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
    END-SEARCH

For this code to work in a Unicode environment, a Unicode-specific search algorithm must be used. Ensure that the program always compares the correct bytes from the array (up to three bytes, based on the current character length) to the fixed three-byte field containing the search value.

The proper search method looks like this:

    01  W-DELIMITER             PIC X(3)  VALUE 'some extended character'.
   * W-WORK is assumed to be a three-byte work field like W-DELIMITER
    01  W-WORK                  PIC X(3).
    01  CHAR-ARRAY.
        02  CHAR-POS            PIC X  OCCURS 256 TIMES
                                       INDEXED BY CHAR-IDX
                                                  CHAR-IDX2
                                                  CHAR-IDX3.
            88  FIELD-DELIM            VALUE '*'.
    01  STR-FUNC                COPY 'PTCSTRFN'.
    . . .
    . . .
    SET CHAR-IDX TO 1
    PERFORM UNTIL CHAR-IDX > 256
        MOVE CHAR-POS(CHAR-IDX) TO CHAR-CD OF STR-FUNC
        CALL 'PTPSTRFN' USING ACTION-CHARLEN
                              STR-FUNC
   * Save the starting position of this character before advancing
        SET W-OFFSET TO CHAR-IDX
        INITIALIZE W-WORK
        EVALUATE TRUE
            WHEN ONE-BYTE OF STR-FUNC
                MOVE CHAR-POS(CHAR-IDX) TO W-WORK
                SET CHAR-IDX UP BY 1
            WHEN TWO-BYTES OF STR-FUNC
                SET CHAR-IDX2 TO CHAR-IDX
                SET CHAR-IDX2 UP BY 1
                STRING CHAR-POS(CHAR-IDX)
                       CHAR-POS(CHAR-IDX2)
                       DELIMITED BY SIZE INTO W-WORK
                END-STRING
                SET CHAR-IDX UP BY 2
            WHEN THREE-BYTES OF STR-FUNC
                SET CHAR-IDX2 TO CHAR-IDX
                SET CHAR-IDX2 UP BY 1
                SET CHAR-IDX3 TO CHAR-IDX
                SET CHAR-IDX3 UP BY 2
                STRING CHAR-POS(CHAR-IDX)
                       CHAR-POS(CHAR-IDX2)
                       CHAR-POS(CHAR-IDX3)
                       DELIMITED BY SIZE INTO W-WORK
                END-STRING
                SET CHAR-IDX UP BY 3
            WHEN OTHER
                DISPLAY '**ERROR** INVALID CHARACTER LENGTH'
                <ABEND>
        END-EVALUATE
        IF W-WORK = W-DELIMITER
            DISPLAY 'FIELD DELIMITER FOUND AT POSITION ' W-OFFSET
        END-IF
    END-PERFORM

As you can see from the previous example, searching a string array for a particular value that may be an extended character can be difficult; if possible, avoid such a search.
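For comparison, the same character-by-character comparison is sketched below in Python: determine the length of the current character, compare exactly that many bytes with the search value, and advance by the same amount. This is an illustration only. It assumes well-formed UTF-8 data limited to one-, two-, and three-byte characters, and `char_len` and `find_character` are hypothetical names.

```python
def char_len(lead_byte: int) -> int:
    """Byte length of the UTF-8 character that starts with lead_byte.

    Assumption: lead_byte is the first byte of a well-formed one-, two-,
    or three-byte character.
    """
    if lead_byte < 0x80:
        return 1
    return 2 if lead_byte < 0xE0 else 3


def find_character(data: bytes, delimiter: str) -> int:
    """Return the 1-based byte position where delimiter starts, or 0."""
    target = delimiter.encode("utf-8")
    pos = 0
    while pos < len(data):
        length = char_len(data[pos])
        # Compare all bytes of the current character with the search value.
        if data[pos:pos + length] == target:
            return pos + 1
        pos += length
    return 0


# 'a' is one byte, '世' is three bytes, so '§' starts at the fifth byte.
position = find_character("a世§b".encode("utf-8"), "§")
```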

Requesting a Map of an Entire Character String

The input parameters to the PTPSTRFN function are:

| Parameter | Value | Notes |
| --- | --- | --- |
| ACTION-TYPE | ACTION-STRMAP | None. |
| STRING-LENGTH | Length of the character string String Parameter 1. | This variable is included in the PTCSTRFN.CBL copy library. |
| String Parameter 1 | The character string. | None. |
| String Parameter 2 | The buffer area to be updated by the subroutine. | None. |

This table lists the values that are returned:

| Parameter | Value | Notes |
| --- | --- | --- |
| String Parameter 2 | This buffer is updated with the appropriate values. This field contains at least one of these values: 1 (the next character position is part of a one-byte character); 2X (the next two character positions are part of a two-byte character); 3XX (the next three character positions are part of a three-byte character). | Refer to the example following this table to see how the function works. |
| STRFN-RC | One of the following values: STRFN-RC-OK or STRFN-INVALID-ACTION. | This variable is included in the PTCSTRFN.CBL copy library. |
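The map format (1, 2X, and 3XX) can be illustrated with a short Python sketch that builds the same kind of map from UTF-8 bytes. This shows the format only and is not the PTPSTRFN implementation. It assumes well-formed UTF-8 data limited to one-, two-, and three-byte characters, and `string_map` is a hypothetical name.

```python
def string_map(data: bytes) -> str:
    """Build a byte map for UTF-8 data: '1' for a one-byte character,
    '2X' for a two-byte character, '3XX' for a three-byte character."""
    parts = []
    pos = 0
    while pos < len(data):
        lead = data[pos]
        # Assumption: lead is the first byte of a well-formed character.
        length = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3
        parts.append(str(length) + "X" * (length - 1))
        pos += length
    return "".join(parts)


# Nine one-byte characters, then 'ó' (two bytes), then thirty more
# one-byte characters.
catalan_map = string_map("Quan el món vol conversar, parla Unicode".encode("utf-8"))
```

The map always has the same length in bytes as the input, so position N of the map describes byte N of the string.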

The following sample COBOL code provides an example of how the PTPSTRFN COBOL function can be used to map the byte length of an entire character string:

    01  W-WORK.
        02  LANGUAGE            PIC X(20).
        02  UNICODE-TEXT        PIC X(300).
        02  UNICODE-TEXT-MAP    PIC X(300).
        02  DATA-LEN            PIC 9(4)  COMP.
        02  BYTE-POS-MAX        PIC 9(4)  COMP.
        02  COUNTERS.
            05  COUNT-1BYTE-CHAR    PIC 9(03)  VALUE ZEROS.
            05  COUNT-2BYTE-CHAR    PIC 9(03)  VALUE ZEROS.
            05  COUNT-3BYTE-CHAR    PIC 9(03)  VALUE ZEROS.
    01  BYTE-ARRAY.
        02  BYTE-POS            PIC X  OCCURS 300 TIMES
                                       INDEXED BY BYTE-IDX.
            88  ONE-BYTE-CHAR          VALUE '1'.
            88  TWO-BYTES-CHAR         VALUE '2'.
            88  THREE-BYTES-CHAR       VALUE '3'.
            88  BYTE-STRING-END        VALUE SPACE.
    01  STR-FUNC                COPY 'PTCSTRFN'.
    . . .
    . . .
   * (Code to retrieve the text from the database and assign it to
   *  the appropriate fields)
    . . .
    . . .
   * Initialize the string map before calling the function
    MOVE SPACES TO UNICODE-TEXT-MAP
   * Length of String Parameter 1 (UNICODE-TEXT)
    MOVE 300 TO STRING-LENGTH OF STR-FUNC
    CALL 'PTPSTRFN' USING ACTION-STRMAP
                          STR-FUNC
                          UNICODE-TEXT
                          UNICODE-TEXT-MAP
    IF NOT STRFN-RC-OK OF STR-FUNC
        <ABEND PROGRAM>
    END-IF

    MOVE 300 TO BYTE-POS-MAX
    MOVE 300 TO DATA-LEN OF W-WORK
    MOVE UNICODE-TEXT-MAP TO BYTE-ARRAY

   * Subtract trailing spaces to get the length of the data in bytes
    PERFORM VARYING BYTE-IDX FROM 300 BY -1
            UNTIL BYTE-IDX <= 1
               OR NOT BYTE-STRING-END(BYTE-IDX)
        SUBTRACT 1 FROM DATA-LEN OF W-WORK
    END-PERFORM

   * Initialize counters
    MOVE ZEROS TO COUNT-1BYTE-CHAR
    MOVE ZEROS TO COUNT-2BYTE-CHAR
    MOVE ZEROS TO COUNT-3BYTE-CHAR

   * Count characters, starting from the first map position
    SET BYTE-IDX TO 1
    PERFORM UNTIL BYTE-IDX > DATA-LEN OF W-WORK
        EVALUATE TRUE
            WHEN ONE-BYTE-CHAR(BYTE-IDX)
                ADD 1 TO COUNT-1BYTE-CHAR
            WHEN TWO-BYTES-CHAR(BYTE-IDX)
                ADD 1 TO COUNT-2BYTE-CHAR
            WHEN THREE-BYTES-CHAR(BYTE-IDX)
                ADD 1 TO COUNT-3BYTE-CHAR
        END-EVALUATE
        SET BYTE-IDX UP BY 1
    END-PERFORM

    DISPLAY ' LANGUAGE = ' LANGUAGE
    DISPLAY ' UTF8 TEXT: (LENGTH = ' DATA-LEN ')'
    DISPLAY ' ' UNICODE-TEXT
    DISPLAY ' '
    DISPLAY ' UTF8 BYTE MAPPING:'
    DISPLAY ' ' UNICODE-TEXT-MAP
    DISPLAY ' '
    DISPLAY ' TALLY:'
    DISPLAY ' NUMBER OF ONE-BYTE CHAR: ' COUNT-1BYTE-CHAR
    DISPLAY ' NUMBER OF TWO-BYTES CHAR: ' COUNT-2BYTE-CHAR
    DISPLAY ' NUMBER OF THREE-BYTES CHAR: ' COUNT-3BYTE-CHAR
    DISPLAY ' '
    DISPLAY ' '

The following table provides sample Unicode text as input for the PTPSTRFN function:

| Language | Sample Unicode Text |
| --- | --- |
| Catalan | Quan el món vol conversar, parla Unicode |
| Chinese (Simplified) | 当世界需要沟通时,请用Unicode! |
| Chinese (Traditional) | 當世界需要溝通時,請用統一碼(Unicode) |
| Danish | Når verden vil tale, taler den Unicode |
| Dutch | Als de wereld wil praten, spreekt hij Unicode |
| English | When the world wants to talk, it speaks Unicode |
| Esperanto | Kiam la mondo volas paroli, ĝi parolas Unicode |
| Finnish | Kun maailma haluaa puhua, se puhuu Unicodea |
| French | Quand le monde veut communiquer, il parle en Unicode |

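For each row in the table, the code requests a byte map of the text from PTPSTRFN, determines the length of the text in bytes, tallies the one-byte, two-byte, and three-byte characters, and displays the results.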

The output of this program appears as follows:

Note. Certain strings appear to be garbled. This is because the system running the program has printed the output by individual bytes and not by multi-byte characters.

    LANGUAGE = Catalan
    UTF8 TEXT: (LENGTH = 0041)
    Quan el món vol conversar, parla Unicode
    UTF8 BYTE MAPPING:
    1111111112X111111111111111111111111111111
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 039
    NUMBER OF TWO-BYTES CHAR: 001
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Chinese (Simplified)
    UTF8 TEXT: (LENGTH = 0043)
    彔世界éœ?è¦?沟é?šæ—¶ï¼Œè¯·ç”̈Unicodeï¼?
    UTF8 BYTE MAPPING:
    3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 007
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 012

    LANGUAGE = Chinese (Traditional
    UTF8 TEXT: (LENGTH = 0055)
    當世界éœ?è¦?æº?é?šæ™?,è«?ç”̈çµ±ä¸?碼ï¼∘Unicodeï
    UTF8 BYTE MAPPING:
    3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 007
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 016

    LANGUAGE = Danish
    UTF8 TEXT: (LENGTH = 0039)
    NÃ¥r verden vil tale, taler den Unicode
    UTF8 BYTE MAPPING:
    12X111111111111111111111111111111111111
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 037
    NUMBER OF TWO-BYTES CHAR: 001
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Dutch
    UTF8 TEXT: (LENGTH = 0045)
    Als de wereld wil praten, spreekt hij Unicode
    UTF8 BYTE MAPPING:
    111111111111111111111111111111111111111111111
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 045
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = English
    UTF8 TEXT: (LENGTH = 0047)
    When the world wants to talk, it speaks Unicode
    UTF8 BYTE MAPPING:
    11111111111111111111111111111111111111111111111
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 047
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Esperanto
    UTF8 TEXT: (LENGTH = 0047)
    Kiam la mondo volas paroli, Ä¥i parolas Unicode
    UTF8 BYTE MAPPING:
    11111111111111111111111111112X11111111111111111
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 045
    NUMBER OF TWO-BYTES CHAR: 001
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = Finnish
    UTF8 TEXT: (LENGTH = 0043)
    Kun maailma haluaa puhua, se puhuu Unicodea
    UTF8 BYTE MAPPING:
    1111111111111111111111111111111111111111111
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 043
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000

    LANGUAGE = French
    UTF8 TEXT: (LENGTH = 0052)
    Quand le monde veut communiquer, il parle en Unicode
    UTF8 BYTE MAPPING:
    1111111111111111111111111111111111111111111111111111
    TALLY:
    NUMBER OF ONE-BYTE CHAR: 052
    NUMBER OF TWO-BYTES CHAR: 000
    NUMBER OF THREE-BYTES CHAR: 000
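The LENGTH and TALLY figures in the output follow directly from the byte map: the length is the number of map positions before the trailing spaces, and each 1, 2, or 3 in the map marks the start of one character. The following Python sketch reproduces the Danish figures from the map shown above. It is an illustration only, and `tally` is a hypothetical name.

```python
def tally(byte_map: str) -> dict:
    """Derive the data length and per-size character counts from a byte map.

    The map is assumed to be padded on the right with spaces, as a
    fixed-length buffer would be.
    """
    used = byte_map.rstrip(" ")
    return {
        "length": len(used),
        "one_byte": used.count("1"),
        "two_bytes": used.count("2"),
        "three_bytes": used.count("3"),
    }


# Danish map from the output above, padded to a 300-byte buffer.
danish_map = ("12X" + "1" * 36).ljust(300)
danish = tally(danish_map)
```

The X positions are not counted because they mark the second and third bytes of a character that has already been counted at its first byte.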