Fine-Tuning COBOL Programs for the z/OS Unicode Environment

This section discusses fine tuning COBOL programs for the z/OS Unicode environment.

Although the COBOL Unicode conversion utility for z/OS makes most of the changes that are needed to run COBOL in a Unicode environment on z/OS, some manual fine-tuning may still be necessary.

A COBOL program may need to determine whether it’s dealing with non-Unicode data, Unicode data for Microsoft Windows/Unix in UTF-8 format, or Unicode data for the z/OS environment in UCS-2 format.

The conversion utility can get this information from the ENCODING-MODE-SW data field in the PTCSQLRT copy library. ANSI-mode is the same as non-Unicode. Unicode-mode represents Unicode processing on Microsoft Windows/Unix using UTF-8, and National-mode represents Unicode processing on z/OS using UCS-2.

03  ENCODING-MODE-SW    PIC X(1)    VALUE SPACE.
88  ANSI-MODE                       VALUE 'A'.
88  UNICODE-MODE                    VALUE 'U'.
88  NATIONAL-MODE                   VALUE 'N'.

The ENCODING-MODE-SW field value is set by default during installation. If when installing the file server you select the Unicode database option and build the PeopleSoft Process Scheduler environment on z/OS by running the PeopleSoft server transfer process, the value is automatically set to National-mode in the z/OS Unicode environment.

Note: It is very important to distinguish Unicode processing on Microsoft Windows/Unix from Unicode processing on z/OS. Unicode processing on Microsoft Windows/Unix requires expanding storage size by explicitly updating the field length. In the z/OS environment, Unicode values are stored using different data type (national), and you need not expand the storage size explicitly.

The COBOL Unicode conversion utility for z/OS changes the usage of alphanumeric character fields (PIC X) to usage NATIONAL (PIC N). This change works for the majority of the data fields, but may have side effects.

As in UCS-2 encoding, any character takes up two bytes. The size of one PIC N element is two bytes. The conversion from PIC X to PIC N implicitly doubles the field size. This can be a problem when a field must be a specific byte size. For example, in the following example, the PIC X field is used as filler to extract a portion of byte stream from numeric data field. This type of PIC X field should not be converted to PIC N.

01  W-WORK.

           03  NUM-OUT-AREA            PIC X(16).
           03  NUM-OUT-0               REDEFINES NUM-OUT-AREA
                                       PIC S9(31) COMP-3.
           03  NUM-OUT-1               REDEFINES NUM-OUT-AREA
                                       PIC S9(30)V9(1) COMP-3.
           03  NUM-OUT-2               REDEFINES NUM-OUT-AREA
                                       PIC S9(29)V9(2) COMP-3.
           03  NUM-OUT-3               REDEFINES NUM-OUT-AREA
                                       PIC S9(28)V9(3) COMP-3.

           03  FILLER                  REDEFINES NUM-OUT-AREA.
               05  FILLER              PIC X(12).
               05  NUM-OUT-INT         PIC S9(8) COMP.
           03  FILLER                  REDEFINES NUM-OUT-AREA.
               05  FILLER              PIC X(14).
05  NUM-OUT-SMALLINT    PIC S9(4) COMP.

The utility converts all alphanumeric character fields to PIC N, except for a very limited case where it can detect it should not convert this field data type. For the cases like that shown in the previous example, you must manually modify the code and test it.

Additional Considerations

Consider the following:

  • Be very careful when deciding not to convert a PIC X field. COBOL does not allow moving data between PIC X and PIC N field without character set conversion between EBCDIC and Unicode, which adds extra complexity in your program and can cause performance overhead.

  • You cannot use national and non-national data in some statements, such as STRING and UNSTRING.

  • PIC N is two-byte field but content is not limited to Unicode strings. You can store binary, packed decimal or pointer values in the PIC N field (using redefinition) and can copy the value to another PIC N field. Therefore, it is not always necessary to use PIC X field to store non-Unicode values. Use the PeopleSoft-delivered COBOL Unicode conversion utility for z/OS to convert all fields to PIC N unless the field cannot be converted to PIC N (instead of trying to convert only the field that can contain international characters).

The COBOL Unicode conversion utility for z/OS updates COBOL programs to use Unicode for all character data for internal processing, including numbers and date. However, there maybe cases where the Unicode data must be converted to non-Unicode character set, typically when communicating with external systems that do not fully support Unicode.

The following table lists IBM Enterprise COBOL statements and functions that convert between Unicode and non-Unicode character sets.

Statement/Function

Description

MOVE

This statement converts data in system EBCDIC character set to Unicode in the following syntax:

move PIC X field or alphanumeric string literal to PIC N field

Note: Using the statement move PIC N field or national literal to PIC X causes a compiler error. When converting from Unicode data item to non-Unicode, you must insert the DISPLAY-OF() function to convert the Unicode data to non-Unicode first.

DISPLAY-OF()

This function converts data from Unicode to a non-Unicode character. The DISPLAY-OF() function has two forms:

  • The following converts from Unicode to system EBCDIC character set.

    FUNCTION DISPLAY-OF(PIC N field or national literal)
  • The following converts from Unicode to a character set specified in the CCSID parameter.

    FUNCTION DISPLAY-OF(PIC N field or national literal, CCSID)

NATIONAL-OF()

This function converts data from a non-Unicode character set to Unicode. NATIONAL-OF() function has two forms:

  • The following converts from the system EBCDIC character set to Unicode.

    FUNCTION NATIONAL-OF(PIC X field or alphanumeric literal)
  • The following converts from a character set specified in the CCSID parameter to Unicode.

    FUNCTION NATIONAL-OF(PIC X field or alphanumeric literal, CCSID)

Note: To change the newline characters, use one of transformations - NewLine LF, NewLine CR, or NewLine CRLF at the end of the command. This utility is available to use in the bin directory.