Fine-Tuning COBOL Programs

Although the COBOL conversion makes most of the changes that are needed to run COBOL in a Unicode environment, some manual fine-tuning may still be necessary.

This section discusses how to fine tune COBOL programs.

A COBOL program may need to determine whether it's dealing with non-Unicode or Unicode data. For example, if the program parses a string character, it must apply different logic depending on whether the string is non-Unicode or Unicode. The program can get this information from the ENCODING-MODE-SW in the PTCSQLRT copy library (ANSI-Mode is the same as non-Unicode):

03  ENCODING-MODE-SW    PIC X(3)    VALUE SPACE.
                   88  ANSI-MODE                   VALUE 'A'.
                   88  UNICODE-MODE                VALUE 'U'.

The ENCODING-MODE-SW value is set by the COBOL application programming interface (API), which determines which type of data it's dealing with by checking the value of the UNICODE_ENABLED field in the PSSTATUS table. When the value of the UNICODE_ENABLED flag is set to 1 (true), this signals the COBOL API that it is accessing a Unicode database.

The COBOL API also performs the necessary translations between the UTF-8 encodings that are required by COBOL and the UCS-2 encodings that are used elsewhere in the PeopleSoft system.

Perhaps the biggest effort in getting COBOL fully functional in a Unicode environment is setting up the bind parameters and select buffers of any dynamic SQL statements.

Programs that use dynamic SQL must specify the column lengths for bind or select fields before calling the PTPDYSQL program. Within a COBOL program, there are two ways that you can assign bind parameters and select buffers of dynamic SQL statements:

  • By using a predefined working storage area with the dynamic SQL statement.

    This method is similar to the method that is used for stored SQL statements. In this case, PTPDYSQL adjusts the length of character-data fields that are passed to PTPSQLRT. This is necessary because the COBOL Unicode conversion utility expands only the working storage fields; it does not modify the length of fields that are hard-coded in the PROGRAM-DIVISION section of the COBOL programs.

    Because PTPDYSQL sends the correct length to PTPSQLRT, no changes to the COBOL program are necessary.

  • By using a buffer array.

    At runtime, this array is partitioned based on the properties of all of the fields that are referenced by the dynamic SQL statement. The properties of those fields are retrieved from the PSDBFIELD table, and include both the field’s data type and the field’s length.

    In this case, you must modify the COBOL program to adjust the length that is specified for a character field. Adjust the length by a factor of three.

To adjust the length of the character field appropriately, the program must recognize the encoding scheme that is used by the COBOL API. The program can take advantage of the ENCODING-MODE-SW field in PTCSQLRT to determine when the length of the field needs to be adjusted.

This example illustrates the use of a buffer array to calculate the length of a character field in the Unicode environment:

02  SELECT-DATA.
               03  FIELDNAME           PIC X(54)   VALUE SPACE.
               03  FIELDNUM            PIC 9(3)    VALUE ZERO  COMP.
               03  DEFFIELDNAME        PIC X(90)   VALUE SPACES.
               03  FIELDLEN            PIC 9(3)    VALUE ZERO  COMP.
               03  FIELDTYPE           PIC 9(2)    VALUE ZERO  COMP.
                   88  RDM-CHAR                    VALUE ZERO.
                   88  RDM-LONG-CHAR               VALUE 1.
                   88  RDM-NUMBER                  VALUE 2.
                   88  RDM-SIGNED-NUMBER           VALUE 3.
                   88  RDM-DATE                    VALUE 4.
                   88  RDM-TIME                    VALUE 5.
                   88  RDM-DATETIME                VALUE 6.
               03  DECIMALPOS          PIC 9(2)    VALUE ZERO  COMP.
               03  FILLER              PIC X       VALUE 'Z'.
                   88  RDM-NUMBER                  VALUE 2.
                   88  RDM-SIGNED-NUMBER           VALUE 3.
                   88  RDM-DATE                    VALUE 4.
                   88  RDM-TIME                    VALUE 5.
                   88  RDM-DATETIME                VALUE 6.
               03  DECIMALPOS          PIC 9(2)    VALUE ZERO  COMP.
               03  FILLER              PIC X       VALUE 'Z'.
       . . .
MOVE CORR SELECT-DATA OF S-RECFLD
                        TO FLD-ARRAY OF RECFLD (FLD-IDX)
            *  CONVERT FIELD TYPE FROM PSDBFIELD TYPE TO SQLRT CODE.
MOVE FIELDLEN OF S-RECFLD TO SETUP-LENGTH OF RECFLD (FLD-IDX)
IF RDM-CHAR OF S-RECFLD
           SET SETUP-TYPE-CHAR OF RECFLD (FLD-IDX) TO TRUE
                     IF UNICODE-MODE OF SQLRT
                 COMPUTE SETUP-LENGTH OF RECFLD (FLD-IDX) =
                      SETUP-LENGTH OF RECFLD (FLD-IDX) * 3
           END-IF
           END-IF

Some COBOL programs define single-character arrays to parse or examine a string of characters, one character at a time. In a Unicode environment, be sure that you’re examining the string one character at a time, not one byte at a time.

This example shows a code fragment in which the program is examining a string one character at a time:

01  CHAR-ARRAY.
    02  CHAR-POS    PIC X  OCCURS 256 TIMES
          INDEXED BY CHAR-IDX.
        88  FIELD-DELIM        VALUE ‘*’.
. . .
. . .
SET CHAR-IDX  TO  1
SEARCH CHAR-ARRAY
    WHEN FIELD-DELIM(CHAR-IDX)
        SET W-OFFSET  TO  CHAR-IDX
        DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
END-SEARCH

The intent of the code in the previous example is to examine each character of the array, looking for the first delimiter character. When that character is found, the code displays the position of the delimiter.

In a non-Unicode environment that uses only the Latin1 character set, this works because there is one byte (one array element) per character. In a Unicode environment (or in a non-Unicode environment that allows double-byte character sets), this fails because what could potentially be examined is the second or third byte of a two- or three-byte character. It's possible for the second or third byte to match the bit pattern of the delimiter character, thus falsely passing the test and ending the search loop.

To correct this situation, you must know the length (in bytes) of each character that is being processed. A new COBOL function, PTPSTRFN, is available that returns the length of a character so that the code can take this into account when performing a character search. The PTPSTRFN subroutine works for both Unicode character sets and ANSI double-byte characters sets.

The PTPSTRFN subroutine offers two ways of retrieving the byte length of a character:

  • By requesting the length of a single character.

  • By requesting a map of an entire character string.

    Choose this option if the application program needs to get the length information of all characters within a string.

Requesting the Length of a Single Character

The input parameters to the PTPSTRFN function are:

Parameter

Value

Notes

ACTION-TYPE

ACTION-CHARLEN

 None.

CHAR-CD

The character whose length you want to verify.

This variable is included in the PTCSTRFN.CBL copy library.

These values are returned:

Parameter

Value

Notes

CHAR-LENGTH

The subroutine returns one of the following values, representing the length of the character that is referenced by CHAR-CD:

  • ONE-BYTE

  • TWO-BYTES

  • THREE-BYTES

This variable is included in the PTCSTRFN.CBL copy library.

STRFN-RC

Returns one of the following values:

  • STRFN-RC-OK

  • STRFN-INVALID-ACTION

This variable is included in the PTCSTRFN.CBL copy library.

At the beginning of this section, there was a code fragment in which the program was examining a string, one character at a time, looking for the first delimiter character:

01  CHAR-ARRAY.
    02  CHAR-POS    PIC X  OCCURS 256 TIMES
          INDEXED BY CHAR-IDX.
        88  FIELD-DELIM        VALUE ‘*’.
. . .
. . .
SET CHAR-IDX  TO  1
SEARCH CHAR-ARRAY
    WHEN FIELD-DELIM(CHAR-IDX)
        SET W-OFFSET  TO  CHAR-IDX
        DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
END-SEARCH

After the code is modified for Unicode, it looks like this:

01  CHAR-ARRAY.
    02  CHAR-POS    PIC X  OCCURS 256 TIMES
        INDEXED BY CHAR-IDX.
        88  FIELD-DELIM     VALUE ‘*’.
01   STR-FUNC                  COPY ‘PTCSTRFN’. 
       . . .
. . .
SET CHAR-IDX  TO  1
SEARCH CHAR-ARRAY
    WHEN FIELD-DELIM(CHAR-IDX)
        SET W-OFFSET  TO  CHAR-IDX
        DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
    WHEN OTHER
       MOVE CHAR-POS(CHAR-IDX)   TO CHAR-CD OF STR-FUNC
       CALL 'PTPSTRFN'  USING   ACTION-CHARLEN

STR-FUNC
IF TWO-BYTES OF STR-FUNC
          SET CHAR-IDX UP BY 1
       ELSE IF THREE-BYTES OF STR-FUNC
          SET CHAR-IDX UP BY 2
       END-IF
       END-SEARCH

The modification ensures that the code continues to function properly in a Unicode environment. However, we can be sure that modification works only when the delimiter character for which the program is searching is one byte in length.

Consider the following code fragment in which the delimiter character that the program is searching for may be longer than one byte:

01  W-DELIMITER    PIC X(3)  VALUE 'some extended character'.
01  CHAR-ARRAY.
    02  CHAR-POS        PIC X  OCCURS 256 TIMES
          INDEXED BY CHAR-IDX
                     CHAR-IDX2
           CHAR-IDX3.
        88  FIELD-DELIM      VALUE ‘*’.
. . .
. . .
SET CHAR-IDX  TO  1
SEARCH CHAR-ARRAY
        WHEN CHAR-POS(CHAR-IDX)  =  W-DELIMITER
    SET W-OFFSET  TO  CHAR-IDX
    DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
END-SEARCH

For this code to work in a Unicode environment, a Unicode-specific search algorithm must be used. Ensure that the program always compares the correct bytes from the array (up to three bytes, based on the current character length) to the fixed three-byte field containing the search value.

The proper search method looks like this:

01  CHAR-ARRAY.
    02  CHAR-POS    PIC X  OCCURS 256 TIMES
        INDEXED BY CHAR-IDX.
        88  FIELD-DELIM         VALUE ‘*’.
01  STR-FUNC                  COPY ‘PTCSTRFN’.
       . . .
. . .
SET CHAR-IDX  TO  1
PERFORM UNTIL CHAR-IDX > 256
    MOVE CHAR-POS(CHAR-IDX)   TO CHAR-CD OF STR-FUNC
    CALL 'PTPSTRFN'    USING   ACTION-CHARLEN
                               STR-FUNC
    INITIALIZE W-WORK
    EVALUATE TRUE
        WHEN ONE-BYTE OF STR-FUNC
            MOVE CHAR-POS(CHAR-IDX)  TO  W-WORK
            SET CHAR-IDX UP BY 1
        WHEN TWO-BYTES OF STR-FUNC
            SET CHAR-IDX2  TO  CHAR-IDX
            SET CHAR-IDX2  UP  BY 1
            STRING  CHAR-POS(CHAR-IDX)
                    CHAR-POS(CHAR-IDX2)
                    DELIMITED BY SIZE
                    INTO W-WORK
            END-STRING
            SET CHAR-IDX UP BY 2
        WHEN THREE-BYTES OF STR-FUNC
            SET CHAR-IDX2  TO  CHAR-IDX
            SET CHAR-IDX2  UP  BY 1
            SET CHAR-IDX3  TO  CHAR-IDX
            SET CHAR-IDX3  UP  BY 2
            STRING  CHAR-POS(CHAR-IDX)
                    CHAR-POS(CHAR-IDX2)
                    CHAR-POS(CHAR-IDX3)
                    DELIMITED BY SIZE
                    INTO W-WORK
            END-STRING
            SET CHAR-IDX UP BY 3
    WHEN OTHER
        DISPLAY ‘**ERROR** INVALID CHARACTER LENGTH’
        <ABEND>
    END-EVALUATE
    IF W-WORK  =  W-DELIMITER
        SET W-OFFSET  TO  CHAR-IDX
        DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
    END-IF
END-PERFORM

As you can see from the previous example, searching a string array for a particular value that may be an extended character can be difficult; if possible, avoid such a search.

Requesting a Map of an Entire Character String

The input parameters to the PTPSTRFN function are:

Parameter

Value

Notes

ACTION-TYPE

ACTION-STRMAP.

None.

STRING-LENGTH

Length of the character string String Parameter 1.

This variable is included in the PTCSTRFN.CBL copy library.

String Parameter 1

The character string.

None.

String Parameter 2

The buffer area to be updated by the subroutine.

None.

This table lists the values that are returned:

Parameter

Value

Notes

String Parameter 2

This buffer is updated with the appropriate values. This field contains at least one of these values:

  • 1

    The next character position is part of a one-byte character.

  • 2X

    The next two character positions are part of a two-byte character.

  • 3XX

    The next three character positions are part of a three-byte character.

Refer to the example following this table to see how the function works.

STRFN-RC

Returns one of the following values:

  • STRFN-RC-OK

  • STRFN-INVALID-ACTION

This variable is included in the PTCSTRFN.CBL copy library.

The following sample COBOL code provides an example of how the PTPSTRFN COBOL function can be used to map the byte length of an entire character string:

       01  W-WORK.
           02  LANGUAGE                PIC X(20).
           02  UNICODE-TEXT            PIC X(300).
           02  UNICODE-TEXT-MAP        PIC X(300).
           02  DATA-LEN                PIC 9(3)    COMP.
           02  BYTE-POS-MAX            PIC 9(4)    COMP.
           02  COUNTERS.
               05  COUNT-1BYTE-CHAR    PIC 9(02)   VALUE ZEROS.
               05  COUNT-2BYTE-CHAR    PIC 9(02)   VALUE ZEROS.
               05  COUNT-3BYTE-CHAR    PIC 9(02)   VALUE ZEROS.
       01  BYTE-ARRAY.
           02  BYTE-POS                PIC X   OCCURS 300   TIMES
                                        INDEXED BY BYTE-IDX.
               88  ONE-BYTE-CHAR               VALUE '1'.
               88  TWO-BYTES-CHAR              VALUE '2'.
               88  THREE-BYTES-CHAR            VALUE '3'.
               88  BYTE-STRING-END             VALUE SPACE.
       01  STR-FUNC                   COPY 'PTPSTRFN'.
       . . .
       . . .
        Code to retrieve the text from the database and assign to the appropriate fields
       . . .
       . . .
      * Initialize the string map before calling the function
          MOVE SPACES    TO UNICODE-TEXT-MAP
          CALL 'PTPSTRFN'    USING   ACTION-STRMAP
                                     STR-FUNC
                                     UNICODE-TEXT
                                     UNICODE-TEXT-MAP
          IF NOT STRFN-RC-OF OF STR-FUNC
              <ABEND PROGRAM>
          END-IF
          SET BYTE-POS-MAX TO 300
          MOVE 300 TO DATA-LEN OF W-WORK
          MOVE UNICODE-TEXT-MAP  TO  BYTE-ARRAY
          PERFORM VARYING BYTE-IDX FROM 300  BY -1
              UNTIL BYTE-IDX  <=  1
              OR NOT BYTE-STRING-END(BYTE-IDX)
               SUBTRACT 1  FROM  DATA-LEN OF W-WORK
          END-PERFORM
      * Initialize counters
          MOVE ZEROS   TO COUNT-1BYTE-CHAR
          MOVE ZEROS   TO COUNT-2BYTE-CHAR
          MOVE ZEROS   TO COUNT-3BYTE-CHAR
          PERFORM UNTIL BYTE-IDX > DATA-LEN OF W-WORK
              EVALUATE TRUE
                 WHEN ONE-BYTE-CHAR
                      ADD 1 TO COUNT-1BYTE-CHAR
                 WHEN TWO-BYTES-CHAR
                      ADD 1 TO COUNT-2BYTE-CHAR
                 WHEN THREE-BYTES-CHAR
                      ADD 1 TO COUNT-3BYTE-CHAR
              END-EVALUATE
              SET BYTE-IDX UP BY 1
          END-PERFORM
          DISPLAY ' LANGUAGE = ' LANGUAGE
          DISPLAY '   UTF8 TEXT: (LENGTH = ' DATA-LEN ')'
          DISPLAY '          ' UNICODE-TEXT
          DISPLAY ' '
          DISPLAY '   UTF8 BYTE MAPPING:'
          DISPLAY '          ' UNICODE-TEXT-MAP
          DISPLAY ' '
          DISPLAY '   TALLY:'
          DISPLAY '     NUMBER OF ONE-BYTE CHAR:    '
                         COUNT-1BYTE-CHAR
          DISPLAY '     NUMBER OF TWO-BYTES CHAR:   '
                         COUNT-2BYTE-CHAR
          DISPLAY '     NUMBER OF THREE-BYTES CHAR: '
                         COUNT-3BYTE-CHAR
          DISPLAY ' '
          DISPLAY ' '

The following table provides sample Unicode text as input for the PTPSTRFN function:

Language

Sample Unicode Text

Catalan

Quan el món vol conversar, parla Unicode

Chinese (Simplified)

当世界需要沟通时,请用Unicode!

Chinese (Traditional)

當世界需要溝通時,請用統一碼(Unicode)

Danish

Når verden vil tale, taler den Unicode

Dutch

Als de wereld wil praten, spreekt hij Unicode

English

When the world wants to talk, it speaks Unicode

Finnish

Kun maailma haluaa puhua, se puhuu Unicodea

French

Quand le monde veut communiquer, il parle en Unicode

For each row in the table, the code performs the following functions:

  • Calls PTPSTRFN to get the string mapping of the UTF-8 character string of the text that is retrieved for the UNITEXT field.

  • Displays the UTF-8 string equivalent of the text.

  • Tallies the number of one-byte, two-byte, and three-byte characters of the text.

The output of this program appears as follows:

Note: Certain strings appear to be garbled. This is because the system running the program has printed the output by individual bytes and not by multi-byte characters.

LANGUAGE = Catalan
   UTF8 TEXT: (LENGTH = 0041)
          Quan el món vol conversar, parla Unicode
   UTF8 BYTE MAPPING:
          1111111112X111111111111111111111111111111
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    039
     NUMBER OF TWO-BYTES CHAR:   001
     NUMBER OF THREE-BYTES CHAR: 000
  
 LANGUAGE = Chinese (Simplified)
   UTF8 TEXT: (LENGTH = 0043)
          彔世界éœ?è¦?沟é?šæ—¶ï¼Œè¯·ç”̈Unicodeï¼?
   UTF8 BYTE MAPPING:
          3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    007
     NUMBER OF TWO-BYTES CHAR:   000
     NUMBER OF THREE-BYTES CHAR: 012
  
 LANGUAGE = Chinese (Traditional
   UTF8 TEXT: (LENGTH = 0055)
          當世界éœ?è¦?æº?é?šæ™?,è«?ç”̈çµ±ä¸?碼ï¼∘Unicodeï
   UTF8 BYTE MAPPING:
          3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    007
     NUMBER OF TWO-BYTES CHAR:   000
     NUMBER OF THREE-BYTES CHAR: 016
  
 LANGUAGE = Danish
   UTF8 TEXT: (LENGTH = 0039)
          NÃ¥r verden vil tale, taler den Unicode
   UTF8 BYTE MAPPING:
          12X111111111111111111111111111111111111
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    037
     NUMBER OF TWO-BYTES CHAR:   001
     NUMBER OF THREE-BYTES CHAR: 000
  
 LANGUAGE = Dutch
   UTF8 TEXT: (LENGTH = 0045)
          Als de wereld wil praten, spreekt hij Unicode
   UTF8 BYTE MAPPING:
          111111111111111111111111111111111111111111111
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    045
     NUMBER OF TWO-BYTES CHAR:   000
     NUMBER OF THREE-BYTES CHAR: 000
  
 LANGUAGE = English
   UTF8 TEXT: (LENGTH = 0047)
          When the world wants to talk, it speaks Unicode
   UTF8 BYTE MAPPING:
          11111111111111111111111111111111111111111111111
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    047
     NUMBER OF TWO-BYTES CHAR:   000
     NUMBER OF THREE-BYTES CHAR: 000
  
 LANGUAGE = Finnish
   UTF8 TEXT: (LENGTH = 0043)
          Kun maailma haluaa puhua, se puhuu Unicodea
   UTF8 BYTE MAPPING:
          1111111111111111111111111111111111111111111
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    043
     NUMBER OF TWO-BYTES CHAR:   000
     NUMBER OF THREE-BYTES CHAR: 000 
  
 LANGUAGE = French
   UTF8 TEXT: (LENGTH = 0052)
          Quand le monde veut communiquer, il parle en Unicode
   UTF8 BYTE MAPPING:
          1111111111111111111111111111111111111111111111111111
   TALLY:
     NUMBER OF ONE-BYTE CHAR:    052
     NUMBER OF TWO-BYTES CHAR:   000
     NUMBER OF THREE-BYTES CHAR: 000