Fine-Tuning COBOL Programs
Although the COBOL conversion makes most of the changes that are needed to run COBOL in a Unicode environment, some manual fine-tuning may still be necessary.
This section discusses how to fine tune COBOL programs.
A COBOL program may need to determine whether it's dealing with non-Unicode or Unicode data. For example, if the program parses a string character, it must apply different logic depending on whether the string is non-Unicode or Unicode. The program can get this information from the ENCODING-MODE-SW in the PTCSQLRT copy library (ANSI-Mode is the same as non-Unicode):
03 ENCODING-MODE-SW PIC X(3) VALUE SPACE.
88 ANSI-MODE VALUE 'A'.
88 UNICODE-MODE VALUE 'U'.
The ENCODING-MODE-SW value is set by the COBOL application programming interface (API), which determines which type of data it's dealing with by checking the value of the UNICODE_ENABLED field in the PSSTATUS table. When the value of the UNICODE_ENABLED flag is set to 1 (true), this signals the COBOL API that it is accessing a Unicode database.
The COBOL API also performs the necessary translations between the UTF-8 encodings that are required by COBOL and the UCS-2 encodings that are used elsewhere in the PeopleSoft system.
Perhaps the biggest effort in getting COBOL fully functional in a Unicode environment is setting up the bind parameters and select buffers of any dynamic SQL statements.
Programs that use dynamic SQL must specify the column lengths for bind or select fields before calling the PTPDYSQL program. Within a COBOL program, there are two ways that you can assign bind parameters and select buffers of dynamic SQL statements:
By using a predefined working storage area with the dynamic SQL statement.
This method is similar to the method that is used for stored SQL statements. In this case, PTPDYSQL adjusts the length of character-data fields that are passed to PTPSQLRT. This is necessary because the COBOL Unicode conversion utility expands only the working storage fields; it does not modify the length of fields that are hard-coded in the PROGRAM-DIVISION section of the COBOL programs.
Because PTPDYSQL sends the correct length to PTPSQLRT, no changes to the COBOL program are necessary.
By using a buffer array.
At runtime, this array is partitioned based on the properties of all of the fields that are referenced by the dynamic SQL statement. The properties of those fields are retrieved from the PSDBFIELD table, and include both the field’s data type and the field’s length.
In this case, you must modify the COBOL program to adjust the length that is specified for a character field. Adjust the length by a factor of three.
To adjust the length of the character field appropriately, the program must recognize the encoding scheme that is used by the COBOL API. The program can take advantage of the ENCODING-MODE-SW field in PTCSQLRT to determine when the length of the field needs to be adjusted.
This example illustrates the use of a buffer array to calculate the length of a character field in the Unicode environment:
02 SELECT-DATA.
03 FIELDNAME PIC X(54) VALUE SPACE.
03 FIELDNUM PIC 9(3) VALUE ZERO COMP.
03 DEFFIELDNAME PIC X(90) VALUE SPACES.
03 FIELDLEN PIC 9(3) VALUE ZERO COMP.
03 FIELDTYPE PIC 9(2) VALUE ZERO COMP.
88 RDM-CHAR VALUE ZERO.
88 RDM-LONG-CHAR VALUE 1.
88 RDM-NUMBER VALUE 2.
88 RDM-SIGNED-NUMBER VALUE 3.
88 RDM-DATE VALUE 4.
88 RDM-TIME VALUE 5.
88 RDM-DATETIME VALUE 6.
03 DECIMALPOS PIC 9(2) VALUE ZERO COMP.
03 FILLER PIC X VALUE 'Z'.
88 RDM-NUMBER VALUE 2.
88 RDM-SIGNED-NUMBER VALUE 3.
88 RDM-DATE VALUE 4.
88 RDM-TIME VALUE 5.
88 RDM-DATETIME VALUE 6.
03 DECIMALPOS PIC 9(2) VALUE ZERO COMP.
03 FILLER PIC X VALUE 'Z'.
. . .
MOVE CORR SELECT-DATA OF S-RECFLD
TO FLD-ARRAY OF RECFLD (FLD-IDX)
* CONVERT FIELD TYPE FROM PSDBFIELD TYPE TO SQLRT CODE.
MOVE FIELDLEN OF S-RECFLD TO SETUP-LENGTH OF RECFLD (FLD-IDX)
IF RDM-CHAR OF S-RECFLD
SET SETUP-TYPE-CHAR OF RECFLD (FLD-IDX) TO TRUE
IF UNICODE-MODE OF SQLRT
COMPUTE SETUP-LENGTH OF RECFLD (FLD-IDX) =
SETUP-LENGTH OF RECFLD (FLD-IDX) * 3
END-IF
END-IF
Some COBOL programs define single-character arrays to parse or examine a string of characters, one character at a time. In a Unicode environment, be sure that you’re examining the string one character at a time, not one byte at a time.
This example shows a code fragment in which the program is examining a string one character at a time:
01 CHAR-ARRAY.
02 CHAR-POS PIC X OCCURS 256 TIMES
INDEXED BY CHAR-IDX.
88 FIELD-DELIM VALUE ‘*’.
. . .
. . .
SET CHAR-IDX TO 1
SEARCH CHAR-ARRAY
WHEN FIELD-DELIM(CHAR-IDX)
SET W-OFFSET TO CHAR-IDX
DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
END-SEARCH
The intent of the code in the previous example is to examine each character of the array, looking for the first delimiter character. When that character is found, the code displays the position of the delimiter.
In a non-Unicode environment that uses only the Latin1 character set, this works because there is one byte (one array element) per character. In a Unicode environment (or in a non-Unicode environment that allows double-byte character sets), this fails because what could potentially be examined is the second or third byte of a two- or three-byte character. It's possible for the second or third byte to match the bit pattern of the delimiter character, thus falsely passing the test and ending the search loop.
To correct this situation, you must know the length (in bytes) of each character that is being processed. A new COBOL function, PTPSTRFN, is available that returns the length of a character so that the code can take this into account when performing a character search. The PTPSTRFN subroutine works for both Unicode character sets and ANSI double-byte characters sets.
The PTPSTRFN subroutine offers two ways of retrieving the byte length of a character:
By requesting the length of a single character.
By requesting a map of an entire character string.
Choose this option if the application program needs to get the length information of all characters within a string.
Requesting the Length of a Single Character
The input parameters to the PTPSTRFN function are:
Parameter |
Value |
Notes |
---|---|---|
ACTION-TYPE |
ACTION-CHARLEN |
None. |
CHAR-CD |
The character whose length you want to verify. |
This variable is included in the PTCSTRFN.CBL copy library. |
These values are returned:
Parameter |
Value |
Notes |
---|---|---|
CHAR-LENGTH |
The subroutine returns one of the following values, representing the length of the character that is referenced by CHAR-CD:
|
This variable is included in the PTCSTRFN.CBL copy library. |
STRFN-RC |
Returns one of the following values:
|
This variable is included in the PTCSTRFN.CBL copy library. |
At the beginning of this section, there was a code fragment in which the program was examining a string, one character at a time, looking for the first delimiter character:
01 CHAR-ARRAY.
02 CHAR-POS PIC X OCCURS 256 TIMES
INDEXED BY CHAR-IDX.
88 FIELD-DELIM VALUE ‘*’.
. . .
. . .
SET CHAR-IDX TO 1
SEARCH CHAR-ARRAY
WHEN FIELD-DELIM(CHAR-IDX)
SET W-OFFSET TO CHAR-IDX
DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
END-SEARCH
After the code is modified for Unicode, it looks like this:
01 CHAR-ARRAY.
02 CHAR-POS PIC X OCCURS 256 TIMES
INDEXED BY CHAR-IDX.
88 FIELD-DELIM VALUE ‘*’.
01 STR-FUNC COPY ‘PTCSTRFN’.
. . .
. . .
SET CHAR-IDX TO 1
SEARCH CHAR-ARRAY
WHEN FIELD-DELIM(CHAR-IDX)
SET W-OFFSET TO CHAR-IDX
DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
WHEN OTHER
MOVE CHAR-POS(CHAR-IDX) TO CHAR-CD OF STR-FUNC
CALL 'PTPSTRFN' USING ACTION-CHARLEN
STR-FUNC
IF TWO-BYTES OF STR-FUNC
SET CHAR-IDX UP BY 1
ELSE IF THREE-BYTES OF STR-FUNC
SET CHAR-IDX UP BY 2
END-IF
END-SEARCH
The modification ensures that the code continues to function properly in a Unicode environment. However, we can be sure that modification works only when the delimiter character for which the program is searching is one byte in length.
Consider the following code fragment in which the delimiter character that the program is searching for may be longer than one byte:
01 W-DELIMITER PIC X(3) VALUE 'some extended character'.
01 CHAR-ARRAY.
02 CHAR-POS PIC X OCCURS 256 TIMES
INDEXED BY CHAR-IDX
CHAR-IDX2
CHAR-IDX3.
88 FIELD-DELIM VALUE ‘*’.
. . .
. . .
SET CHAR-IDX TO 1
SEARCH CHAR-ARRAY
WHEN CHAR-POS(CHAR-IDX) = W-DELIMITER
SET W-OFFSET TO CHAR-IDX
DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
END-SEARCH
For this code to work in a Unicode environment, a Unicode-specific search algorithm must be used. Ensure that the program always compares the correct bytes from the array (up to three bytes, based on the current character length) to the fixed three-byte field containing the search value.
The proper search method looks like this:
01 CHAR-ARRAY.
02 CHAR-POS PIC X OCCURS 256 TIMES
INDEXED BY CHAR-IDX.
88 FIELD-DELIM VALUE ‘*’.
01 STR-FUNC COPY ‘PTCSTRFN’.
. . .
. . .
SET CHAR-IDX TO 1
PERFORM UNTIL CHAR-IDX > 256
MOVE CHAR-POS(CHAR-IDX) TO CHAR-CD OF STR-FUNC
CALL 'PTPSTRFN' USING ACTION-CHARLEN
STR-FUNC
INITIALIZE W-WORK
EVALUATE TRUE
WHEN ONE-BYTE OF STR-FUNC
MOVE CHAR-POS(CHAR-IDX) TO W-WORK
SET CHAR-IDX UP BY 1
WHEN TWO-BYTES OF STR-FUNC
SET CHAR-IDX2 TO CHAR-IDX
SET CHAR-IDX2 UP BY 1
STRING CHAR-POS(CHAR-IDX)
CHAR-POS(CHAR-IDX2)
DELIMITED BY SIZE
INTO W-WORK
END-STRING
SET CHAR-IDX UP BY 2
WHEN THREE-BYTES OF STR-FUNC
SET CHAR-IDX2 TO CHAR-IDX
SET CHAR-IDX2 UP BY 1
SET CHAR-IDX3 TO CHAR-IDX
SET CHAR-IDX3 UP BY 2
STRING CHAR-POS(CHAR-IDX)
CHAR-POS(CHAR-IDX2)
CHAR-POS(CHAR-IDX3)
DELIMITED BY SIZE
INTO W-WORK
END-STRING
SET CHAR-IDX UP BY 3
WHEN OTHER
DISPLAY ‘**ERROR** INVALID CHARACTER LENGTH’
<ABEND>
END-EVALUATE
IF W-WORK = W-DELIMITER
SET W-OFFSET TO CHAR-IDX
DISPLAY ‘FIELD DELIMITER FOUND AT POSITION ‘ W-OFFSET
END-IF
END-PERFORM
As you can see from the previous example, searching a string array for a particular value that may be an extended character can be difficult; if possible, avoid such a search.
Requesting a Map of an Entire Character String
The input parameters to the PTPSTRFN function are:
Parameter |
Value |
Notes |
---|---|---|
ACTION-TYPE |
ACTION-STRMAP. |
None. |
STRING-LENGTH |
Length of the character string String Parameter 1. |
This variable is included in the PTCSTRFN.CBL copy library. |
String Parameter 1 |
The character string. |
None. |
String Parameter 2 |
The buffer area to be updated by the subroutine. |
None. |
This table lists the values that are returned:
Parameter |
Value |
Notes |
---|---|---|
String Parameter 2 |
This buffer is updated with the appropriate values. This field contains at least one of these values:
|
Refer to the example following this table to see how the function works. |
STRFN-RC |
Returns one of the following values:
|
This variable is included in the PTCSTRFN.CBL copy library. |
The following sample COBOL code provides an example of how the PTPSTRFN COBOL function can be used to map the byte length of an entire character string:
01 W-WORK.
02 LANGUAGE PIC X(20).
02 UNICODE-TEXT PIC X(300).
02 UNICODE-TEXT-MAP PIC X(300).
02 DATA-LEN PIC 9(3) COMP.
02 BYTE-POS-MAX PIC 9(4) COMP.
02 COUNTERS.
05 COUNT-1BYTE-CHAR PIC 9(02) VALUE ZEROS.
05 COUNT-2BYTE-CHAR PIC 9(02) VALUE ZEROS.
05 COUNT-3BYTE-CHAR PIC 9(02) VALUE ZEROS.
01 BYTE-ARRAY.
02 BYTE-POS PIC X OCCURS 300 TIMES
INDEXED BY BYTE-IDX.
88 ONE-BYTE-CHAR VALUE '1'.
88 TWO-BYTES-CHAR VALUE '2'.
88 THREE-BYTES-CHAR VALUE '3'.
88 BYTE-STRING-END VALUE SPACE.
01 STR-FUNC COPY 'PTPSTRFN'.
. . .
. . .
Code to retrieve the text from the database and assign to the appropriate fields
. . .
. . .
* Initialize the string map before calling the function
MOVE SPACES TO UNICODE-TEXT-MAP
CALL 'PTPSTRFN' USING ACTION-STRMAP
STR-FUNC
UNICODE-TEXT
UNICODE-TEXT-MAP
IF NOT STRFN-RC-OF OF STR-FUNC
<ABEND PROGRAM>
END-IF
SET BYTE-POS-MAX TO 300
MOVE 300 TO DATA-LEN OF W-WORK
MOVE UNICODE-TEXT-MAP TO BYTE-ARRAY
PERFORM VARYING BYTE-IDX FROM 300 BY -1
UNTIL BYTE-IDX <= 1
OR NOT BYTE-STRING-END(BYTE-IDX)
SUBTRACT 1 FROM DATA-LEN OF W-WORK
END-PERFORM
* Initialize counters
MOVE ZEROS TO COUNT-1BYTE-CHAR
MOVE ZEROS TO COUNT-2BYTE-CHAR
MOVE ZEROS TO COUNT-3BYTE-CHAR
PERFORM UNTIL BYTE-IDX > DATA-LEN OF W-WORK
EVALUATE TRUE
WHEN ONE-BYTE-CHAR
ADD 1 TO COUNT-1BYTE-CHAR
WHEN TWO-BYTES-CHAR
ADD 1 TO COUNT-2BYTE-CHAR
WHEN THREE-BYTES-CHAR
ADD 1 TO COUNT-3BYTE-CHAR
END-EVALUATE
SET BYTE-IDX UP BY 1
END-PERFORM
DISPLAY ' LANGUAGE = ' LANGUAGE
DISPLAY ' UTF8 TEXT: (LENGTH = ' DATA-LEN ')'
DISPLAY ' ' UNICODE-TEXT
DISPLAY ' '
DISPLAY ' UTF8 BYTE MAPPING:'
DISPLAY ' ' UNICODE-TEXT-MAP
DISPLAY ' '
DISPLAY ' TALLY:'
DISPLAY ' NUMBER OF ONE-BYTE CHAR: '
COUNT-1BYTE-CHAR
DISPLAY ' NUMBER OF TWO-BYTES CHAR: '
COUNT-2BYTE-CHAR
DISPLAY ' NUMBER OF THREE-BYTES CHAR: '
COUNT-3BYTE-CHAR
DISPLAY ' '
DISPLAY ' '
The following table provides sample Unicode text as input for the PTPSTRFN function:
Language |
Sample Unicode Text |
---|---|
Catalan |
Quan el món vol conversar, parla Unicode |
Chinese (Simplified) |
当世界需要沟通时,请用Unicode! |
Chinese (Traditional) |
當世界需要溝通時,請用統一碼(Unicode) |
Danish |
Når verden vil tale, taler den Unicode |
Dutch |
Als de wereld wil praten, spreekt hij Unicode |
English |
When the world wants to talk, it speaks Unicode |
Finnish |
Kun maailma haluaa puhua, se puhuu Unicodea |
French |
Quand le monde veut communiquer, il parle en Unicode |
For each row in the table, the code performs the following functions:
Calls PTPSTRFN to get the string mapping of the UTF-8 character string of the text that is retrieved for the UNITEXT field.
Displays the UTF-8 string equivalent of the text.
Tallies the number of one-byte, two-byte, and three-byte characters of the text.
The output of this program appears as follows:
Note: Certain strings appear to be garbled. This is because the system running the program has printed the output by individual bytes and not by multi-byte characters.
LANGUAGE = Catalan
UTF8 TEXT: (LENGTH = 0041)
Quan el món vol conversar, parla Unicode
UTF8 BYTE MAPPING:
1111111112X111111111111111111111111111111
TALLY:
NUMBER OF ONE-BYTE CHAR: 039
NUMBER OF TWO-BYTES CHAR: 001
NUMBER OF THREE-BYTES CHAR: 000
LANGUAGE = Chinese (Simplified)
UTF8 TEXT: (LENGTH = 0043)
彔世界éœ?è¦?沟é?šæ—¶ï¼Œè¯·ç”̈Unicodeï¼?
UTF8 BYTE MAPPING:
3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX
TALLY:
NUMBER OF ONE-BYTE CHAR: 007
NUMBER OF TWO-BYTES CHAR: 000
NUMBER OF THREE-BYTES CHAR: 012
LANGUAGE = Chinese (Traditional
UTF8 TEXT: (LENGTH = 0055)
當世界éœ?è¦?æº?é?šæ™?,è«?ç”̈çµ±ä¸?碼ï¼∘Unicodeï
UTF8 BYTE MAPPING:
3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX3XX11111113XX
TALLY:
NUMBER OF ONE-BYTE CHAR: 007
NUMBER OF TWO-BYTES CHAR: 000
NUMBER OF THREE-BYTES CHAR: 016
LANGUAGE = Danish
UTF8 TEXT: (LENGTH = 0039)
NÃ¥r verden vil tale, taler den Unicode
UTF8 BYTE MAPPING:
12X111111111111111111111111111111111111
TALLY:
NUMBER OF ONE-BYTE CHAR: 037
NUMBER OF TWO-BYTES CHAR: 001
NUMBER OF THREE-BYTES CHAR: 000
LANGUAGE = Dutch
UTF8 TEXT: (LENGTH = 0045)
Als de wereld wil praten, spreekt hij Unicode
UTF8 BYTE MAPPING:
111111111111111111111111111111111111111111111
TALLY:
NUMBER OF ONE-BYTE CHAR: 045
NUMBER OF TWO-BYTES CHAR: 000
NUMBER OF THREE-BYTES CHAR: 000
LANGUAGE = English
UTF8 TEXT: (LENGTH = 0047)
When the world wants to talk, it speaks Unicode
UTF8 BYTE MAPPING:
11111111111111111111111111111111111111111111111
TALLY:
NUMBER OF ONE-BYTE CHAR: 047
NUMBER OF TWO-BYTES CHAR: 000
NUMBER OF THREE-BYTES CHAR: 000
LANGUAGE = Finnish
UTF8 TEXT: (LENGTH = 0043)
Kun maailma haluaa puhua, se puhuu Unicodea
UTF8 BYTE MAPPING:
1111111111111111111111111111111111111111111
TALLY:
NUMBER OF ONE-BYTE CHAR: 043
NUMBER OF TWO-BYTES CHAR: 000
NUMBER OF THREE-BYTES CHAR: 000
LANGUAGE = French
UTF8 TEXT: (LENGTH = 0052)
Quand le monde veut communiquer, il parle en Unicode
UTF8 BYTE MAPPING:
1111111111111111111111111111111111111111111111111111
TALLY:
NUMBER OF ONE-BYTE CHAR: 052
NUMBER OF TWO-BYTES CHAR: 000
NUMBER OF THREE-BYTES CHAR: 000