10 Understanding Unicode Compliance Standards


10.1 Unicode Compliance Standards

The Unicode Standard is the universal character-encoding scheme for written characters and text. It defines a consistent way of encoding multilingual text that enables the exchange of text data internationally and creates the foundation for global software.

Facts about Unicode:

  • Unicode is a very large character set containing the characters of virtually every written language.

  • Unicode uses two bytes per character (the UTF-16 encoding form).

    Two bytes can represent up to 65,536 distinct values. Unicode also has a mechanism called "surrogates," which uses pairs of 2-byte units to describe an additional 1,048,576 (roughly one million) characters.

  • 0x00 is a valid byte in a character.

    For example, the character "A" is described as 0x00 0x41. Because normal string functions, such as strlen() and strcpy(), treat a 0x00 byte as the end of the string, they do not work with Unicode data.
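The two facts above can be checked with a short, self-contained C sketch that uses no JD Edwards headers. The byte array mirrors the big-endian 0x00 0x41 layout from the text, and the helper names are hypothetical. It shows that strlen() reports a length of 0 for the 2-byte string "AB", and how a code point above U+FFFF is split into a surrogate pair using the standard UTF-16 formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "AB" as big-endian 2-byte characters plus a 2-byte NULL terminator,
   matching the 0x00 0x41 layout described above for "A". */
static const unsigned char big_endian_ab[] = { 0x00, 0x41, 0x00, 0x42, 0x00, 0x00 };

/* Count 2-byte characters up to the 2-byte terminator 0x00 0x00. */
static size_t char_count_be(const unsigned char *p)
{
    size_t n = 0;
    while (p[2 * n] != 0 || p[2 * n + 1] != 0)
        n++;
    return n;
}

/* Split a code point above U+FFFF into a UTF-16 surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000u && cp <= 0x10FFFFu);
    uint32_t v = cp - 0x10000u;                /* 20-bit value */
    *hi = (uint16_t)(0xD800u + (v >> 10));     /* high surrogate: top 10 bits */
    *lo = (uint16_t)(0xDC00u + (v & 0x3FFu));  /* low surrogate: low 10 bits */
}
```

strlen((const char *)big_endian_ab) stops at the very first byte and returns 0, while char_count_be() returns 2. Each surrogate carries 10 bits, so a pair covers 1024 × 1024 = 1,048,576 additional code points.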

Do not use the data type char. Instead, use JCHAR for Unicode characters and ZCHAR for non-Unicode characters. Use ZCHAR instead of char in code that needs to interface with non-Unicode APIs.

Old Syntax (No Longer Available)   New Syntax (Non-Unicode)   New Syntax (Unicode)
char                               ZCHAR                      JCHAR
char *, PSTR                       ZCHAR *, PZSTR             JCHAR *, PJSTR
'A'                                _Z('A')                    _J('A')
"string"                           _Z("string")               _J("string")
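One way such type and literal macros can be built in standard C11 is sketched below. This is an assumption for illustration, not the EnterpriseOne definition: uint_least16_t (the type behind C11's char16_t) stands in for JCHAR, and the stand-in _J() pastes the u prefix onto its argument. It shows why "string" occupies 7 bytes as a ZCHAR array but 14 bytes as a JCHAR array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; the real definitions come
   from the JD Edwards EnterpriseOne headers and may differ. */
typedef char           ZCHAR;   /* non-Unicode character: 1 byte */
typedef uint_least16_t JCHAR;   /* Unicode character: 2 bytes (char16_t's type) */
typedef ZCHAR         *PZSTR;   /* non-Unicode string pointer, replaces PSTR */
typedef JCHAR         *PJSTR;   /* Unicode string pointer */

#define _Z(x) x          /* non-Unicode literal: left unchanged */
#define _J(x) u ## x     /* Unicode literal: pastes the C11 u prefix, e.g. u"..." */

static_assert(sizeof(JCHAR) == 2, "this sketch assumes a 2-byte JCHAR");

static const ZCHAR zzGreeting[] = _Z("string");  /* 6 chars + NULL = 7 bytes */
static const JCHAR szGreeting[] = _J("string");  /* 7 JCHARs = 14 bytes */
static const JCHAR jcA = _J('A');                /* value 0x0041 */
```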

10.2 Unicode String Functions

Two versions of all string functions exist: one for Unicode and one for non-Unicode. Naming standards for Unicode and non-Unicode string functions are:

  • jdeSxxxxxx() indicates a Unicode string function

  • jdeZSxxxx() indicates a non-Unicode string function

Some of the replacement functions include:

Old String Functions   New String Functions (Non-Unicode)   New String Functions (Unicode)
strcpy()               jdeZStrcpy()                         jdeStrcpy()
strlen()               jdeZStrlen()                         jdeStrlen()
strstr()               jdeZStrstr()                         jdeStrstr()
sprintf()              jdeZSprintf()                        jdeSprintf()
strncpy()              jdeZStrncpy()                        jdeStrncpy()

Note:

The function jdestrcpy() was in use before the migration to Unicode. The Unicode slimer replaced existing calls to jdestrcpy() with jdeStrncpyTerminate(). Going forward, use jdeStrncpyTerminate() wherever you previously used jdestrcpy().

Do not use traditional string functions, such as strcpy(), strlen(), and printf(). The jdeStrxxxxxx functions measure strings in characters, so pass them character counts rather than the result of the sizeof() operator, which returns a byte count.

When using jdeStrncpy(), the third parameter is the number of characters, not the number of bytes.

The DIM() macro gives the number of characters of an array. Given "JCHAR a[10];", DIM(a) returns 10, while sizeof(a) returns 20. "strncpy (a, b, sizeof (a));" needs to become "jdeStrncpy (a, b, DIM (a));".
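A minimal sketch of the DIM() idea, assuming DIM() is the usual sizeof(a) / sizeof((a)[0]) element-count macro and using uint_least16_t as a stand-in for JCHAR. jstrncpy() is a hypothetical character-counting copy written only to illustrate that the length argument counts characters; it is not the real jdeStrncpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint_least16_t JCHAR;   /* hypothetical stand-in for the 2-byte JCHAR */

/* Assumed definition: number of elements in a true array. */
#define DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical copy modeled on the description of jdeStrncpy(): n is a
   count of characters, not bytes. Copies at most n characters and pads the
   rest of the n-character range with NULLs, like strncpy(). */
static JCHAR *jstrncpy(JCHAR *dst, const JCHAR *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    for (; i < n && src[i] != 0; i++)
        dst[i] = src[i];
    for (; i < n; i++)
        dst[i] = 0;
    return dst;
}
```

As in the F38112 example, passing DIM(a) - 1 leaves the last element free for a terminator. DIM() works only on true arrays; applied to a pointer it silently returns the wrong value.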

10.2.1 Example: Using Unicode String Functions

This example shows how to use Unicode string functions:

/**********************************************************************
 In this example jdeStrncpy replaces strncpy. Also sizeof is
 replaced by DIM.
 *********************************************************************/
/* Set key to F38112 */

/*Unicode Compliant*/
jdeStrncpy(dsKey1F38112.dxdcto,
          (const JCHAR *)(dsF4311ZDetail->pwdcto),
          DIM(dsKey1F38112.dxdcto) - 1);

10.3 Unicode Memory Functions

The memset() function changes memory byte by byte. For example, if buf is declared as JCHAR buf[5], then memset(buf, ' ', sizeof(buf)); sets all 10 bytes of buf to the value 0x20, the space character. Because a Unicode character is 2 bytes, each character becomes 0x2020, which is the dagger character (†) in Unicode.

A new function, jdeMemset(), sets memory character by character rather than byte by byte. This function takes a void pointer, a JCHAR, and the number of bytes to set. Use jdeMemset(buf, _J(' '), sizeof(buf)); to set each character of the Unicode string buf to 0x0020. When using jdeMemset(), the third parameter, sizeof(buf), is a number of bytes, not characters.

Note:

You can use memset() to fill a memory block with NULL, because a NULL JCHAR consists of two 0x00 bytes. For all other characters, use jdeMemset(). You can also use jdeMemset() for the NULL character.

10.3.1 Example: Using jdeMemset when Setting Characters to Values other than NULL

This example shows how to use jdeMemset when setting characters to values other than NULL:

/**********************************************************************
 In this example memset is replaced by jdeMemset. We need to change
 memset to jdeMemset because we are setting each character of the
 string to a value other than NULL. Also, because jdeMemset works in
 bytes, we cannot just subtract 1 from sizeof(szSubsidiaryBlank) to
 prevent the last character from being set to ' '. We must multiply
 1 by sizeof(JCHAR).
 *********************************************************************/

/*Unicode Compliant*/
jdeMemset((void *)(szSubsidiaryBlank), _J(' '),
          (sizeof(szSubsidiaryBlank) - (1*sizeof(JCHAR))));
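The difference between byte-wise and character-wise filling, and the byte count used in the example above, can be sketched in plain C. jmemset() here is a hypothetical stand-in that follows the description of jdeMemset() given earlier (a void pointer, a JCHAR, and a byte count); it is not the EnterpriseOne implementation, and uint_least16_t stands in for JCHAR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint_least16_t JCHAR;   /* hypothetical stand-in for the 2-byte JCHAR */

/* Hypothetical stand-in for jdeMemset(): fills nBytes of memory one JCHAR
   at a time. Any odd trailing byte is left untouched. */
static void *jmemset(void *buf, JCHAR ch, size_t nBytes)
{
    JCHAR *p = buf;
    assert(buf != NULL);
    for (size_t i = 0; i < nBytes / sizeof(JCHAR); i++)
        p[i] = ch;
    return buf;
}

/* The byte-wise mistake: every byte becomes 0x20, so every JCHAR becomes 0x2020. */
static void blank_bytewise(JCHAR *buf, size_t nBytes)
{
    memset(buf, ' ', nBytes);
}

/* The pattern from the example: blank every character except the last, so
   a NULL terminator already in the last position is preserved. */
static void blank_all_but_last(JCHAR *sz, size_t nBytes)
{
    jmemset(sz, (JCHAR)' ', nBytes - 1 * sizeof(JCHAR));
}
```

The caller computes nBytes with sizeof on the array itself; inside a function that receives only a pointer, sizeof no longer reports the array size.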

10.4 Pointer Arithmetic

When advancing a JCHAR pointer, advance it by the correct number of elements. In this example, the intent is to initialize each member of an array of JCHAR strings to blank. Inside the for loop, after a value is assigned to one member of the array, the pointer is advanced to the next member by adding the maximum string length to it. Because pStringPtr is declared as a pointer to JCHAR, adding MAXSTRLENGTH to pStringPtr moves it forward MAXSTRLENGTH characters, so that it points to the next member of the array of strings.

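#define MAXSTRLENGTH 10
JCHAR             *pStringPtr;         /* current string in the array */
LPMATH_NUMERIC    pmnPointerToF3007;
int               i;
/* pStringPtr and pmnPointerToF3007 are initialized elsewhere */
for (i = (iDayOfTheWeek + iNumberOfDaysInMonth); i < CALENDARDAYS; i++)
{
  FormatMathNumeric(pStringPtr, &pmnPointerToF3007[i]);
  /* Advance MAXSTRLENGTH characters (MAXSTRLENGTH * 2 bytes) to the next string */
  pStringPtr = pStringPtr + MAXSTRLENGTH;
}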

These tables illustrate the effect of adding MAXSTRLENGTH to pStringPtr. The top row in both tables contains memory locations; the bottom rows contain the contents of those memory locations.

The arrow indicates the memory location that pStringPtr points to before MAXSTRLENGTH is added to pStringPtr.

Figure 10-1 Example 1 of Unicode Pointer Arithmetic

Figure 10-2 Example 2 of Unicode Pointer Arithmetic

The arrow indicates the memory location that pStringPtr points to after MAXSTRLENGTH is added to pStringPtr. Because pStringPtr is declared as a pointer to JCHAR, adding 10 to it advances it 20 bytes.

Figure 10-3 Example 3 of Pointer Arithmetic

If pStringPtr is instead advanced by MAXSTRLENGTH * sizeof(JCHAR), it moves 40 bytes, twice as far as intended, which results in memory corruption.
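The scaling described above can be verified with a small, self-contained sketch. It assumes a 2-byte JCHAR (uint_least16_t stands in for it) and a hypothetical array of three fixed-width strings; bytes_advanced() measures how many bytes a pointer moves when an integer is added to a JCHAR *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint_least16_t JCHAR;   /* hypothetical stand-in for the 2-byte JCHAR */

#define MAXSTRLENGTH 10
#define NSTRINGS     3          /* hypothetical number of strings in the array */

/* NSTRINGS fixed-width JCHAR strings laid out back to back. */
static JCHAR strings[NSTRINGS * MAXSTRLENGTH];

/* Bytes moved when `step` is added to a JCHAR pointer. */
static ptrdiff_t bytes_advanced(size_t step)
{
    assert(step <= NSTRINGS * MAXSTRLENGTH);   /* stay within the array */
    JCHAR *p = strings;
    JCHAR *q = p + step;                       /* scaled by sizeof(JCHAR) */
    return (const unsigned char *)q - (const unsigned char *)p;
}

/* Blank each string, keeping its last character as a NULL terminator. */
static void blank_all(void)
{
    JCHAR *pStringPtr = strings;
    for (int i = 0; i < NSTRINGS; i++) {
        for (int j = 0; j < MAXSTRLENGTH - 1; j++)
            pStringPtr[j] = (JCHAR)' ';
        pStringPtr[MAXSTRLENGTH - 1] = 0;
        pStringPtr = pStringPtr + MAXSTRLENGTH;  /* next string, 20 bytes on */
    }
}
```

bytes_advanced(MAXSTRLENGTH) is 20, while bytes_advanced(MAXSTRLENGTH * sizeof(JCHAR)) is 40, which would skip every other string in the loop.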

10.5 Offsets

When adding an offset to a pointer to derive the location of another variable or entity, first determine how the offset was created, in particular whether it counts bytes or characters.

In this example, lpKeyStruct->CacheKey[n].nOffset is added to lpData to arrive at the location of a Cache Key segment. The offset for the segment was created using the ANSI C macro offsetof, which returns a number of bytes. Therefore, to arrive at the location of the Cache Key segment, cast the data structure pointer to BYTE * before adding the offset.

lpTemp1 = (BYTE *)lpData + lpKeyStruct->CacheKey[n].nOffset;
lpTemp2 = (BYTE *)lpKey  + lpKeyStruct->CacheKey[n].nOffset;

In a non-Unicode environment, lpData could have been cast to CHAR *, because a character is one byte in a non-Unicode environment. In a Unicode environment, however, lpData must be cast explicitly to BYTE * rather than JCHAR *: a JCHAR is 2 bytes, so adding the byte offset to a JCHAR * would move the pointer twice as far as intended.
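A minimal sketch of the difference, assuming a 2-byte JCHAR (uint_least16_t stands in for it) and a hypothetical cache record made of JCHAR key segments. Both helpers are illustrative only; they show that a byte offset from offsetof lands on the right member through a byte pointer and lands too far through a JCHAR pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint_least16_t JCHAR;   /* hypothetical stand-in for the 2-byte JCHAR */

/* Hypothetical cache record made of three JCHAR key segments. */
typedef struct {
    JCHAR szKey1[3];
    JCHAR szKey2[4];
    JCHAR szKey3[5];
} CACHE_REC;

/* Correct: offsetof() counts bytes, so add the offset to a byte pointer. */
static void *segment_by_bytes(void *lpData, size_t nOffset)
{
    assert(lpData != NULL);
    return (unsigned char *)lpData + nOffset;
}

/* Wrong: adding the same byte offset to a JCHAR * moves
   nOffset * sizeof(JCHAR) bytes, twice as far as intended. */
static void *segment_by_jchars(void *lpData, size_t nOffset)
{
    assert(lpData != NULL);
    return (JCHAR *)lpData + nOffset;
}
```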

10.6 MATH_NUMERIC APIs

The string members of the MATH_NUMERIC data structure are in ZCHAR (non-Unicode) format. The JD Edwards EnterpriseOne Common Library API includes several functions that retrieve and manipulate these strings in both JCHAR (Unicode) and ZCHAR (non-Unicode) formats.

To retrieve the string value of a MATH_NUMERIC data type in JCHAR format, use the FormatMathNumeric API function. This example illustrates the use of this function:

/* Declare variables */
JCHAR      szJobNumber[MAXLEN_MATH_NUMERIC+1] = _J("\0");
/* Retrieve the string value of the job number */
FormatMathNumeric(szJobNumber, &lpDS->mnJobnumber);

To retrieve the string value of a MATH_NUMERIC data type in ZCHAR format, use the jdeMathGetRawString API function. This example illustrates the use of this function:

/* Declare variables */
ZCHAR      *zzJobNumber = NULL;
/* Retrieve the string value of the job number. The returned ZCHAR pointer
 * is stored in a pointer, because a C array cannot be assigned to. */
zzJobNumber = jdeMathGetRawString(&lpDS->mnJobnumber);

Another commonly used MATH_NUMERIC API function is jdeMathSetCurrencyCode, which updates the currency code member of a MATH_NUMERIC data structure. Two versions of this function exist: jdeMathSetCurrencyCode updates the currency code with a ZCHAR value, and jdeMathSetCurrencyCodeUNI updates it with a JCHAR value. This example illustrates the use of both functions:

/* Declare variables */
ZCHAR      zzCurrencyCode[4] = _Z("USD");
JCHAR      szCurrencyCode[4] = _J("USD");
/* Set the currency code using a ZCHAR value */
jdeMathSetCurrencyCode(&lpDS->mnAmount, (ZCHAR *) zzCurrencyCode);
/* Set the currency code using a JCHAR value */
jdeMathSetCurrencyCodeUNI(&lpDS->mnAmount, (JCHAR *) szCurrencyCode);

10.7 Third-Party APIs

Some third-party application programming interfaces (APIs) do not support Unicode character strings. In these cases, you must convert character strings to non-Unicode format before calling the API, and convert the results back to Unicode format for storage in JD Edwards EnterpriseOne. Use these guidelines when programming for a non-Unicode API:

  • Declare a Unicode and a non-Unicode variable for each API string parameter.

  • Convert the Unicode strings to non-Unicode strings before calling the API.

  • Call the API passing the non-Unicode strings in the parameter list.

  • Convert the returned non-Unicode strings to Unicode strings for storage in JD Edwards EnterpriseOne.

10.7.1 Example: Third-Party API

This example calls a third-party API named GetStateName that accepts a two-character state code and returns a 30-character state name:

/* Declare variables */
JCHAR  szStateCode[3] = _J("CO");    /* Unicode state code */
JCHAR  szStateName[31] = _J("\0");   /* Unicode state name */
ZCHAR  zzStateCode[3] = _Z("\0");    /* Non-Unicode state code */
ZCHAR  zzStateName[31] = _Z("\0");   /* Non-Unicode state name */
BOOL   bReturnStatus = FALSE;        /* API return flag */
/* Convert the Unicode state code to a non-Unicode string */
jdeFromUnicode(zzStateCode, szStateCode, DIM(zzStateCode), NULL);
/* Call the non-Unicode third-party API */
bReturnStatus = GetStateName(zzStateCode, zzStateName);
/* Convert the returned non-Unicode state name to Unicode for storage in
 * JD Edwards EnterpriseOne */
if (bReturnStatus)
{
  jdeToUnicode(szStateName, zzStateName, DIM(szStateName), NULL);
}
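The four guidelines can be exercised end to end in plain C without EnterpriseOne. In this sketch, from_unicode() and to_unicode() are simplified, hypothetical stand-ins for jdeFromUnicode() and jdeToUnicode() that handle only 7-bit ASCII (the real functions convert through a code page), and GetStateName() is a stub for the third-party API named in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef char           ZCHAR;   /* hypothetical stand-ins for the EnterpriseOne types */
typedef uint_least16_t JCHAR;

/* Simplified stand-in for jdeFromUnicode(): copies at most dstChars - 1
   characters, keeps 7-bit ASCII and replaces anything else with '?'. */
static void from_unicode(ZCHAR *dst, const JCHAR *src, size_t dstChars)
{
    size_t i = 0;
    assert(dstChars > 0);
    for (; i + 1 < dstChars && src[i] != 0; i++)
        dst[i] = (src[i] < 0x80) ? (ZCHAR)src[i] : '?';
    dst[i] = '\0';
}

/* Simplified stand-in for jdeToUnicode(): widens each byte to a JCHAR,
   which is only correct for ASCII input. */
static void to_unicode(JCHAR *dst, const ZCHAR *src, size_t dstChars)
{
    size_t i = 0;
    assert(dstChars > 0);
    for (; i + 1 < dstChars && src[i] != '\0'; i++)
        dst[i] = (JCHAR)(unsigned char)src[i];
    dst[i] = 0;
}

/* Stub for the non-Unicode third-party API from the example. */
static int GetStateName(const ZCHAR *zzCode, ZCHAR *zzName)
{
    if (strcmp(zzCode, "CO") == 0) {
        strcpy(zzName, "Colorado");
        return 1;
    }
    return 0;
}
```

As in the example, each size argument is the character count of the destination buffer, and the returned name is converted back only when the API reports success.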

10.8 Flat-File APIs

JD Edwards EnterpriseOne APIs such as jdeFprintf() do not convert data. This means that the default flat file I/O for character data is in Unicode. If the consumers of flat files generated by JD Edwards EnterpriseOne are not Unicode enabled, they cannot read those files correctly. For these cases, use an additional set of APIs that convert the data.

An interactive application allows users to configure flat file encoding based on attributes such as application name, application version name, user name, and environment name. The converting API set provides counterparts to these file I/O functions: fwrite/fread, fprintf/fscanf, fputs/fgets, and fputc/fgetc. These APIs convert the data using the code page specified in the configuration application. They take one additional parameter, lpBhvrCom, which the conversion function uses to find the configuration for the calling application or version.

Call the converting APIs only when a process outside of JD Edwards EnterpriseOne writes or reads the flat file data. If the file is simply a work file or a debugging file that is written and read only by JD Edwards EnterpriseOne, use the non-converting APIs (for example, jdeFprintf()).
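To see why a non-Unicode reader cannot consume an unconverted file, compare the bytes produced for the same text. This hedged sketch uses only standard C file I/O: write_raw() approximates a non-converting Unicode write by emitting each 2-byte JCHAR as-is, and write_converted() approximates a converting API configured for a single-byte, ASCII-compatible code page. Neither is an EnterpriseOne API, and uint_least16_t stands in for JCHAR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint_least16_t JCHAR;   /* hypothetical stand-in for the 2-byte JCHAR */

/* Approximates a non-converting write: each JCHAR goes out as its two raw
   bytes, in the machine's byte order. Returns the number of bytes written. */
static long write_raw(FILE *fp, const JCHAR *s)
{
    long n = 0;
    for (; *s != 0; s++)
        n += (long)fwrite(s, 1, sizeof *s, fp);
    return n;
}

/* Approximates a converting write to a single-byte, ASCII-compatible code
   page: one byte per character. Returns the number of bytes written. */
static long write_converted(FILE *fp, const JCHAR *s)
{
    long n = 0;
    for (; *s != 0; s++) {
        assert(*s < 0x80);               /* this sketch handles ASCII only */
        if (fputc((int)*s, fp) != EOF)
            n++;
    }
    return n;
}
```

For ASCII text, half of the bytes in the raw output are 0x00, which is what breaks byte-oriented readers outside JD Edwards EnterpriseOne.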

10.8.1 Example: Flat-File APIs

This example writes text to a flat file that would only be read by JD Edwards EnterpriseOne. Encoding in the file will be Unicode.

FILE *fp;
fp = jdeFopen(_J("c:/testBSFNZ.txt"), _J("w+"));
if (fp != NULL)
{
  jdeFprintf(fp, _J("%s%d\n"), _J("Line "), 1);
  jdeFclose(fp);
}

This example writes text to a flat file that would be read by third-party systems. Encoding in the file will be based on the encoding configured.

FILE *fp;
fp = jdeFopen(_J("c:/testBSFNZ.txt"), _J("w+"));
if (fp != NULL)
{
  jdeFprintfConvert(lpBhvrCom, fp, _J("%s%d\n"), _J("Line "), 1);
  jdeFclose(fp);
}