Sun Studio 12: C User's Guide

Chapter 7 Converting Applications for a 64-Bit Environment

This chapter provides the information you need for writing code for the 32-bit or the 64-bit compilation environment.

Once you try to write or modify code for both the 32-bit and 64-bit compilation environments, you face two basic issues:

Maintaining a single source base with as few #ifdefs as possible is usually better than maintaining multiple source trees. Therefore, this chapter provides guidelines for writing code that works correctly in both 32-bit and 64-bit compilation environments. In some cases, the conversion of current code requires only a recompilation and relinking with the 64-bit libraries. However, for those cases where code changes are required, this chapter discusses the tools and strategies that make conversion easier.

7.1 Overview of the Data Model Differences

The biggest difference between the 32-bit and the 64-bit compilation environments is the change in data-type models.

The C data-type model for 32-bit applications is the ILP32 model, so named because integers, longs, and pointers are 32-bit data types. The LP64 data model, so named because longs and pointers grow to 64 bits, is the creation of a consortium of companies across the industry. The remaining C types, int, long long, short, and char, are the same in both data-type models.

Regardless of the data-type model, the standard relationship between C integral types holds true:

sizeof (char) <= sizeof (short) <= sizeof (int) <= sizeof (long)

The following table lists the basic C data types and their corresponding sizes in bits for both the ILP32 and LP64 data models.

Table 7–1 Data Type Size for ILP32 and LP64

C Data Type    ILP32    LP64
char           8        8
short          16       16
int            32       32
long           32       64
long long      64       64
pointer        32       64
enum           32       32
float          32       32
double         64       64
long double    128      128

It is not unusual for current 32-bit applications to assume that integers, pointers, and longs are the same size. Because the sizes of longs and pointers change in the LP64 data model, this change alone can cause many ILP32-to-LP64 conversion problems.
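The size assumptions described here can be checked directly. The following is a minimal sketch that reports the data model from type sizes alone. The function names data_model_name and size_ordering_holds are hypothetical and not part of any system header, and the byte counts assume 8-bit bytes.

```c
#include <stddef.h>

/* Hypothetical helper: names the data model from type sizes alone.
   Assumes 8-bit bytes, so 4 bytes means 32 bits and 8 means 64 bits. */
const char *data_model_name(void)
{
    if (sizeof (int) == 4 && sizeof (long) == 4 && sizeof (void *) == 4)
        return "ILP32";
    if (sizeof (int) == 4 && sizeof (long) == 8 && sizeof (void *) == 8)
        return "LP64";
    return "other";
}

/* Returns 1 if the standard size ordering of the integral types holds. */
int size_ordering_holds(void)
{
    return sizeof (char) <= sizeof (short) &&
        sizeof (short) <= sizeof (int) &&
        sizeof (int) <= sizeof (long);
}
```

Code that silently assumes sizeof (int) == sizeof (long) == sizeof (void *) behaves as intended only when data_model_name() reports ILP32.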

In addition, it becomes very important to examine declarations and casts; how expressions are evaluated can be affected when the types change. The effects of standard C conversion rules are influenced by the change in data-type sizes. To adequately show what you intend, you need to explicitly declare the types of constants. You can also use casts in expressions to make certain that the expression is evaluated the way you intend. This is particularly true in the case of sign extension, where explicit casting is essential for demonstrating intent.

7.2 Implementing Single Source Code

The following sections describe some of the available resources that you can use to write single-source code that supports 32-bit and 64-bit compilation.

7.2.1 Derived Types

Use the system derived types to make code safe for both the 32-bit and the 64-bit compilation environment. In general, it is good programming practice to use derived types to allow for change. When you use derived data-types, only the system derived types need to change due to data model changes, or due to a port.

The system include files <sys/types.h> and <inttypes.h> contain constants, macros, and derived types that are helpful in making applications 32-bit and 64-bit safe.

7.2.1.1 <sys/types.h>

Include <sys/types.h> in an application source file to gain access to the definition of _LP64 and _ILP32. This header also contains a number of basic derived types that should be used whenever appropriate. In particular, the following are of special interest:

All of these types remain 32-bit quantities in the ILP32 compilation environment and grow to 64-bit quantities in the LP64 compilation environment.

7.2.1.2 <inttypes.h>

The include file <inttypes.h> provides constants, macros, and derived types that help you make your code compatible with explicitly sized data items, independent of the compilation environment. It contains mechanisms for manipulating 8-bit, 16-bit, 32-bit, and 64-bit objects. The file is part of the 1999 ISO/IEC C standard, and its contents track the proposals that led to its inclusion in that standard. The file will soon be updated to fully conform with the 1999 ISO/IEC C standard.

The following sections provide more information about the basic features of <inttypes.h>.

Fixed-Width Integer Types

The fixed-width integer types that <inttypes.h> provides include signed integer types, such as int8_t, int16_t, int32_t, and int64_t, and unsigned integer types, such as uint8_t, uint16_t, uint32_t, and uint64_t.

Derived types defined as the smallest integer types that can hold the specified number of bits include int_least8_t,…, int_least64_t, uint_least8_t,…, uint_least64_t.

It is safe to use an int or unsigned int for such operations as loop counters and file descriptors; it is also safe to use a long for an array index. However, do not use these fixed-width types indiscriminately. Use fixed-width types for explicit binary representations of the following:

Helpful Types Such as uintptr_t

The <inttypes.h> file includes signed and unsigned integer types large enough to hold a pointer. These are given as intptr_t and uintptr_t. In addition, <inttypes.h> provides intmax_t and uintmax_t, which are the longest (in bits) signed and unsigned integer types available.

Use the uintptr_t type as the integral type for pointers instead of a fundamental type such as unsigned long. Even though an unsigned long is the same size as a pointer in both the ILP32 and LP64 data models, using uintptr_t means that only the definition of uintptr_t is affected if the data model changes. This makes your code portable to many other systems. It is also a clearer way to express your intentions in C.

The intptr_t and uintptr_t types are extremely useful for casting pointers when you want to perform address arithmetic. Use intptr_t and uintptr_t types instead of long or unsigned long for this purpose.
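As an illustration of address arithmetic through uintptr_t, the following sketch rounds a pointer up to an alignment boundary. The helper name align_up is hypothetical, and the code assumes that the alignment is a nonzero power of two.

```c
#include <inttypes.h>
#include <stddef.h>

/* Hypothetical helper: rounds p up to the next multiple of alignment.
   Assumes alignment is a nonzero power of two. */
void *align_up(void *p, size_t alignment)
{
    uintptr_t addr = (uintptr_t)p;      /* wide enough in ILP32 and LP64 */
    uintptr_t mask = (uintptr_t)alignment - 1;

    return (void *)((addr + mask) & ~mask);
}
```

Because the arithmetic is done in uintptr_t, the same source gives the intended result whether pointers are 32 bits or 64 bits wide.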

Constant Macros

Use the macros INT8_C(c), …, INT64_C(c), UINT8_C(c),…, UINT64_C(c) to specify the size and sign of a given constant. Basically, these macros place an l, ul, ll, or ull at the end of the constant, if necessary. For example, INT64_C(1) appends ll to the constant 1 for ILP32 and an l for LP64.

Use the INTMAX_C(c) and UINTMAX_C(c) macros to make a constant the biggest type. These macros can be very useful for specifying the type of constants described in 7.3 Converting to the LP64 Data Type Model.
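The following sketch shows one way the constant macros might be used so that a shift or a mask is carried out in 64 bits in either data model. The helper names bit64 and low40 are hypothetical.

```c
#include <inttypes.h>

/* Hypothetical helper: returns a 64-bit value with only bit n set.
   Valid for 0 <= n <= 62. INT64_C makes the constant 64 bits wide
   before the shift, in ILP32 and LP64 alike. */
int64_t bit64(int n)
{
    return INT64_C(1) << n;
}

/* Hypothetical helper: keeps the low 40 bits of v. */
uint64_t low40(uint64_t v)
{
    return v & UINT64_C(0xFFFFFFFFFF);
}
```

Without the macro, the expression 1 << n would be evaluated as an int, and any n of 32 or more would be out of range.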

Limits

The limits defined by <inttypes.h> are constants that specify the minimum and maximum values of various integer types. This includes minimum and maximum values for each of the fixed-width types such as INT8_MIN,…, INT64_MIN, INT8_MAX,…, INT64_MAX, and their unsigned counterparts.

The <inttypes.h> file also provides the minimum and maximum for each of the least-sized types. These include INT_LEAST8_MIN,…, INT_LEAST64_MIN, INT_LEAST8_MAX,…, INT_LEAST64_MAX, as well as their unsigned counterparts.

Finally, <inttypes.h> defines the minimum and maximum value of the largest supported integer types. These include INTMAX_MIN and INTMAX_MAX and their corresponding unsigned versions.
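A typical use of these limits is a range check before a value is narrowed. The following is a minimal sketch; the helper name fits_in_int32 is hypothetical.

```c
#include <inttypes.h>

/* Hypothetical helper: returns 1 if v can be stored in an int32_t
   without loss, 0 otherwise. */
int fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

A caller can test a 64-bit value with this function before assigning it to a 32-bit field, rather than discovering the truncation later.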

Format String Macros

The <inttypes.h> file also includes the macros that specify the printf(3S) and scanf(3S) format specifiers. Essentially, these macros prepend the format specifier with an l or ll to identify the argument as a long or long long, given that the number of bits in the argument is built into the name of the macro.

There are macros for printf(3S) that print both the smallest and largest integer types in decimal, octal, unsigned, and hexadecimal formats as the following example shows:


int64_t i;
printf("i =%" PRIx64 "\n", i);

Similarly, there are macros for scanf(3S) that read both the smallest and largest integer types in decimal, octal, unsigned, and hexadecimal formats.


uint64_t u;
scanf("%" SCNu64, &u);

Do not use these macros indiscriminately. They are best used in conjunction with the fixed-width types discussed in Fixed-Width Integer Types.
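The following sketch pairs the format macros with fixed-width types in both directions, writing to and reading from strings rather than standard I/O. The helper names format_hex64 and parse_u64 are hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: writes v as hexadecimal text into buf.
   Returns the number of characters that the full text requires. */
int format_hex64(char *buf, size_t size, uint64_t v)
{
    return snprintf(buf, size, "%" PRIx64, v);
}

/* Hypothetical helper: parses unsigned decimal text into *out.
   Returns 1 on success, that is, when sscanf converts one item. */
int parse_u64(const char *text, uint64_t *out)
{
    return sscanf(text, "%" SCNu64, out) == 1;
}
```

The format strings contain no l or ll of their own; the macros supply whichever length modifier matches the 64-bit type in the current compilation environment.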

7.2.2 Tools

The lint program's -errchk option detects potential 64-bit porting problems. You can also specify cc -v, which directs the compiler to perform additional, stricter semantic checks than it performs without -v. The -v option also enables certain lint-like checks on the named files.

When you enhance code to be 64-bit safe, use the header files present in the Solaris operating system because these files have the correct definition of the derived types and data structures for the 64-bit compilation environment.

7.2.2.1 lint

Use lint to check code that is written for both the 32-bit and the 64-bit compilation environment. Specify the -errchk=longptr64 option to generate LP64 warnings. This option checks portability to an environment in which long integers and pointers are 64 bits and plain integers are 32 bits. It checks assignments of pointer expressions and long integer expressions to plain integers, even when explicit casts are used.

Use the -errchk=longptr64,signext option to find code where the normal ISO C value-preserving rules allow the extension of the sign of a signed-integral value in an expression of unsigned-integral type.

Use the -Xarch=v9 option of lint when you want to check code that you intend to run only in the Solaris 64-bit SPARC environment. Use -Xarch=amd64 when you want to check code that you intend to run in the x86 64-bit environment.

When lint generates warnings, it prints the line number of the offending code, a message that describes the problem, and whether or not a pointer is involved. The warning message also indicates the sizes of the involved data types. When you know a pointer is involved and you know the size of the data types, you can find specific 64-bit problems and avoid the pre-existing problems between 32-bit and smaller types.

Be aware, however, that even though lint gives warnings about potential 64-bit problems, it cannot detect all problems. Also, in many cases, code that is intentional and correct for the application generates a warning.

You can suppress the warning for a given line of code by placing a comment of the form NOTE(LINTED("<optional message>")) on the previous line. This is useful when you want lint to ignore certain lines of code such as casts and assignments. Exercise extreme care when you use the NOTE(LINTED("<optional message>")) comment because it can mask real problems. When you use NOTE, add #include <note.h> to the source file. Refer to the lint man page for more information.

7.3 Converting to the LP64 Data Type Model

The examples that follow illustrate some of the more common problems you are likely to encounter when you convert code. Where appropriate, the corresponding lint warnings are shown.

7.3.1 Integer and Pointer Size Change

Since integers and pointers are the same size in the ILP32 compilation environment, some code relies on this assumption. Pointers are often cast to int or unsigned int for address arithmetic. Instead, cast your pointers to long because long and pointers are the same size in both the ILP32 and LP64 data-type models. Better still, use uintptr_t rather than unsigned long because it expresses your intent more closely and makes the code more portable, insulating it against future changes. Consider the following example:


char *p;
p = (char *) ((int)p & PAGEOFFSET);
%
warning: conversion of pointer loses bits

Here is the modified version:


char *p;
p = (char *) ((uintptr_t)p & PAGEOFFSET);

7.3.2 Integer and Long Size Change

Because integers and longs are never really distinguished in the ILP32 data-type model, your existing code probably uses them indiscriminately. Modify any code that uses integers and longs interchangeably so it conforms to the requirements of both the ILP32 and LP64 data-type models. While an integer and a long are both 32 bits in the ILP32 data-type model, a long is 64 bits in the LP64 data-type model.

Consider the following example:


int waiting;
long w_io;
long w_swap;
...
waiting = w_io + w_swap;

%
warning: assignment of 64-bit integer to 32-bit integer
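The warning above can be addressed in more than one way, depending on what the application needs. The following sketch shows two possibilities: keeping the result in a long, or narrowing to int only after a range check. The function names total_waiting and narrow_to_int are hypothetical.

```c
#include <limits.h>

/* Option 1: keep the sum in a long so that no bits are lost. */
long total_waiting(long w_io, long w_swap)
{
    return w_io + w_swap;
}

/* Option 2 (hypothetical helper): narrow to int only when the value
   fits. Returns 0 and stores the value on success, -1 otherwise. */
int narrow_to_int(long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return -1;
    *out = (int)value;
    return 0;
}
```

In the ILP32 model the range check in narrow_to_int can never fail, so the same source serves both environments.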

Furthermore, large arrays of longs or unsigned longs can cause serious performance degradation in the LP64 data-type model as compared to arrays of ints or unsigned ints. They can also cause significantly more cache misses and consume more memory.

Therefore, if int works just as well as long for the application purposes, it’s better to use int rather than long.

This is also an argument for using arrays of ints instead of arrays of pointers. Some C applications suffer from serious performance degradation after conversion to the LP64 data-type model because they rely on many large arrays of pointers.

7.3.3 Sign Extension

Sign extension is a common problem when you convert to the 64-bit compilation environment because the type conversion and promotion rules are somewhat obscure. To prevent sign extension problems, use explicit casting to achieve the intended results.

To understand why sign extension occurs, it helps to understand the conversion rules for ISO C. The conversion rules that seem to cause the most sign extension problems between the 32-bit and the 64-bit compilation environment come into effect during the following operations:

When the following example is compiled as a 64-bit program, the addr variable becomes sign-extended, even though both addr and a.base are unsigned types.


% cat test.c
#include <stdio.h>

struct foo {
  unsigned int base:19, rehash:13;
};

int
main(int argc, char *argv[])
{
  struct foo a;
  unsigned long addr;

  a.base = 0x40000;
  addr = a.base << 13;  /* Sign extension here! */
  printf("addr 0x%lx\n", addr);

  addr = (unsigned int)(a.base << 13); /* No sign extension here! */
  printf("addr 0x%lx\n", addr);
  return (0);
}

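This sign extension occurs because the conversion rules are applied as follows. First, a.base is converted from unsigned int to int by integral promotion, because an int can represent every value of a 19-bit unsigned bit-field. The expression a.base << 13 therefore has type int, and at this point no sign extension has taken place. Finally, the int result is converted to long and then to unsigned long for the assignment to addr, and the sign extension occurs in the conversion from int to long. The output of the 64-bit program shows the effect: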


% cc -o test64 -xarch=v9 test.c
% ./test64
addr 0xffffffff80000000
addr 0x80000000
%

When this same example is compiled as a 32-bit program it does not display any sign extension:


% cc -o test test.c
% ./test

addr 0x80000000
addr 0x80000000

For a more detailed discussion of the conversion rules, refer to the ISO C standard. Also included in this standard are useful rules for ordinary arithmetic conversions and integer constants.
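An alternative to casting the shifted result is to widen the operand before the shift. The following is a minimal sketch of that approach, reusing the struct foo layout from the example above; the function name base_address is hypothetical.

```c
struct foo {
    unsigned int base:19, rehash:13;
};

/* Converts the bit-field to unsigned long before shifting, so the shift
   is carried out in an unsigned type and no sign bit is ever produced. */
unsigned long base_address(const struct foo *a)
{
    return (unsigned long)a->base << 13;
}
```

With base set to 0x40000, this returns 0x80000000 in both the ILP32 and LP64 data-type models, because the value never passes through a signed int.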

7.3.4 Pointer Arithmetic Instead of Integers

In general, using pointer arithmetic works better than using integers because pointer arithmetic is independent of the data model, whereas integer arithmetic on addresses might not be. Also, you can usually simplify your code by using pointer arithmetic. Consider the following example:


int *end;
int *p;
p = malloc(4 * NUM_ELEMENTS);
end = (int *)((unsigned int)p + 4 * NUM_ELEMENTS);

%
warning: conversion of pointer loses bits

Here is the modified version:


int *end;
int *p;
p = malloc(sizeof (*p) * NUM_ELEMENTS);
end = p + NUM_ELEMENTS;

7.3.5 Structures

Check the internal data structures in an application for holes. The compiler inserts extra padding between fields in a structure to meet alignment requirements, and more of this padding is allocated when long or pointer fields grow to 64 bits for the LP64 data-type model. In the 64-bit compilation environment on SPARC platforms, all types of structures are aligned to the size of the largest member within them. When you repack a structure, follow the simple rule of moving the long and pointer fields to the beginning of the structure. Consider the following structure definition:


struct bar {
   int i;
   long j;
   int k;
   char *p;
};   /* sizeof (struct bar) = 32 */

Here is the same structure with the long and pointer data types defined at the beginning of the structure:


struct bar {
  char *p;
  long j;
  int i;
  int k;
};   /* sizeof (struct bar) = 24 */
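The effect of the reordering can be measured rather than assumed. The following sketch computes the padding in each layout as the structure size minus the sum of the member sizes. The names bar_original, bar_repacked, padding_original, and padding_repacked are hypothetical, chosen so that both layouts can coexist in one file.

```c
#include <stddef.h>

struct bar_original {       /* field order from the first definition */
    int i;
    long j;
    int k;
    char *p;
};

struct bar_repacked {       /* long and pointer fields moved first */
    char *p;
    long j;
    int i;
    int k;
};

/* Bytes of padding = structure size minus the sum of the member sizes. */
size_t padding_original(void)
{
    return sizeof (struct bar_original) -
        (2 * sizeof (int) + sizeof (long) + sizeof (char *));
}

size_t padding_repacked(void)
{
    return sizeof (struct bar_repacked) -
        (2 * sizeof (int) + sizeof (long) + sizeof (char *));
}
```

In the LP64 sizes quoted above, 32 bytes against 24 bytes, the difference is the 8 bytes of padding that follow the two int fields in the original order.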

7.3.6 Unions

Be sure to check unions because their fields can change size between the ILP32 and the LP64 data-type models.


typedef union {
   double _d;
   long _l[2];
} llx_t;

Here is the modified version:


typedef union {
   double _d;
   int _l[2];
} llx_t;
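The reason for the change is that long _l[2] occupies 16 bytes in the LP64 data-type model, so the array no longer overlays the 8-byte double. The following sketch checks that the int version does overlay it exactly. The function name union_overlays_double is hypothetical, and the check assumes a 32-bit int and a 64-bit double.

```c
typedef union {
    double _d;
    int _l[2];
} llx_t;

/* Returns 1 when the integer view exactly overlays the double. */
int union_overlays_double(void)
{
    return sizeof (llx_t) == sizeof (double) &&
        sizeof (((llx_t *)0)->_l) == sizeof (double);
}
```

The sizeof operand is not evaluated, so the null pointer in the second comparison is never dereferenced.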

7.3.7 Type Constants

A lack of precision can cause the loss of data in some constant expressions. Be explicit when you specify the data types in your constant expression. Specify the type of each integer constant by adding some combination of {u,U,l,L}. You can also use casts to specify the type of a constant expression. Consider the following example:


int i = 32;
long j = 1 << i; /* RHS is a 32-bit int expression, so j does not get */
                 /* 1L << 32; shifting an int by 32 bits is undefined */

Here is the modified version:


int i = 32;
long j = 1L << i;

7.3.8 Beware of Implicit Declarations

If you use -xc99=none, the C compiler assumes that any function or variable that is used in a module and not defined or declared externally is an integer. Any longs and pointers used in this way are truncated by the compiler’s implicit integer declaration. Place the appropriate extern declaration for the function or variable in a header and not in the C module. Include this header in any C module that uses the function or variable. If this is a function or variable defined by the system headers, you still need to include the proper header in the code. Consider the following example:


int
main(int argc, char *argv[])
{
  char *name = getlogin();
  printf("login = %s\n", name);
  return (0);
}

%
warning: improper pointer/integer combination: op "="
warning: cast to pointer from 32-bit integer
implicitly declared to return int
getlogin        printf

Here is the modified version, which includes the proper headers:


#include <unistd.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
  char *name = getlogin();
  (void) printf("login = %s\n", name);
  return (0);
}

7.3.9 sizeof( ) Is an Unsigned long

In the LP64 data-type model, sizeof() has the effective type of an unsigned long. Occasionally, sizeof() is passed to a function expecting an argument of type int, or assigned or cast to an integer. In some cases, this truncation causes loss of data.


long a[50];
unsigned char size = sizeof (a);

%
warning: 64-bit constant truncated to 8 bits by assignment
warning: initializer does not fit or is out of range: 0x190
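One way to avoid this truncation is to store the result in a size_t, which is wide enough for any sizeof result in either data model. The following is a minimal sketch; the function name long_array_bytes is hypothetical.

```c
#include <stddef.h>

/* Returns the size in bytes of a 50-element array of long. */
size_t long_array_bytes(void)
{
    long a[50];
    size_t size = sizeof (a);   /* 200 in ILP32, 400 (0x190) in LP64 */

    return size;
}
```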

7.3.10 Use Casts to Show Your Intentions

Relational expressions can be tricky because of conversion rules. You should be very explicit about how you want the expression to be evaluated by adding casts wherever necessary.
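As an illustration, the following sketch compares a long with an unsigned int, first with the default conversions and then with explicit casts. The function names less_default and less_explicit are hypothetical.

```c
/* Default conversions: the result depends on the data model.
   ILP32: k is converted to unsigned long, so -1 becomes a huge value.
   LP64:  u is converted to long, so the comparison is signed. */
int less_default(long k, unsigned int u)
{
    return k < u;
}

/* Explicit casts: both operands are compared as long long, which is
   64 bits in both models and can hold every unsigned int value. */
int less_explicit(long k, unsigned int u)
{
    return (long long)k < (long long)u;
}
```

For k equal to -1 and u equal to 1, less_default yields 0 in the ILP32 model and 1 in the LP64 model, whereas less_explicit yields 1 in both.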

7.3.11 Check Format String Conversion Operation

Make sure the format strings for printf(3S), sprintf(3S), scanf(3S), and sscanf(3S) can accommodate long or pointer arguments. For pointer arguments, the conversion operation given in the format string should be %p to work in both the 32-bit and 64-bit compilation environments.


char *buf;
struct dev_info *devi;
...
(void) sprintf(buf, "di%x", (void *)devi);

%
warning: function argument (number) type inconsistent with format
sprintf (arg 3)     void *: (format) int

Here is the modified version:


char *buf;
struct dev_info *devi;
...
(void) sprintf(buf, "di%p", (void *)devi);

For long arguments, the long size specification, l, should be prepended to the conversion operation character in the format string. Furthermore, check to be sure that the storage pointed to by buf is large enough to contain 16 digits.


size_t nbytes;
u_long align, addr, raddr, alloc;
printf("kalloca:%d%%%d from heap got%x.%x returns%x\n",
nbytes, align, (int)raddr, (int)(raddr + alloc), (int)addr);

%
warning: cast of 64-bit integer to 32-bit integer
warning: cast of 64-bit integer to 32-bit integer
warning: cast of 64-bit integer to 32-bit integer

Here is the modified version:


size_t nbytes;
u_long align, addr, raddr, alloc;
printf("kalloca:%lu%%%lu from heap got%lx.%lx returns%lx\n",
    (u_long)nbytes, align, raddr, raddr + alloc, addr);

7.4 Other Considerations

The remaining guidelines highlight common problems encountered when converting an application to a full 64-bit program.

7.4.1 Derived Types That Have Grown in Size

A number of derived types have changed to now represent 64-bit quantities in the 64-bit application compilation environment. This change does not affect 32-bit applications; however, any 64-bit applications that consume or export data described by these types need to be reevaluated. An example of this is in applications that directly manipulate the utmp(4) or utmpx(4) files. For correct operation in the 64-bit application environment, do not attempt to directly access these files. Instead, use the getutxent(3C) and related family of functions.

7.4.2 Check for Side Effects of Changes

Be aware that a type change in one area can result in an unexpected 64-bit conversion in another area. For example, check all the callers of a function that previously returned an int and now returns an ssize_t.

7.4.3 Check Whether Literal Uses of long Still Make Sense

A variable that is defined as a long is 32 bits in the ILP32 data-type model and 64 bits in the LP64 data-type model. Where it is possible, avoid problems by redefining the variable and use a more portable derived type.

Related to this, a number of derived types have changed under the LP64 data-type model. For example, pid_t remains a long in the 32-bit environment, but under the 64-bit environment, a pid_t is an int.

7.4.4 Use #ifdef for Explicit 32-bit Versus 64-bit Prototypes

In some cases, specific 32-bit and 64-bit versions of an interface are unavoidable. You can distinguish these by specifying the _LP64 or _ILP32 feature test macros in the headers. Similarly, code that runs in 32-bit and 64-bit environments needs to utilize the appropriate #ifdefs, depending on the compilation mode.
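The following is a minimal sketch of this kind of conditional code. It assumes that _LP64 is defined for 64-bit compilation, either by <sys/types.h> or by the compiler itself; the function name long_bits is hypothetical.

```c
#include <sys/types.h>

/* Hypothetical interface: reports the width of long, in bits, that the
   code was compiled for. Only the _LP64 test differs between builds. */
int long_bits(void)
{
#if defined(_LP64)
    return 64;      /* 64-bit compilation: long and pointers are 64 bits */
#else
    return 32;      /* otherwise assume the ILP32 model */
#endif
}
```

Keeping the #ifdef inside one small function or header, as here, limits how much of the source has to differ between the two compilation modes.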

7.4.5 Calling Convention Changes

When you pass structures by value and compile the code for a 64-bit environment, the structure is passed in registers rather than as a pointer to a copy if it is small enough. This can cause problems if you try to pass structures between C code and handwritten assembly code.

Floating point parameters work in a similar fashion; some floating point values passed by value are passed in floating point registers.

7.4.6 Algorithm Changes

After your code is safe for the 64-bit environment, review your code again to verify that the algorithms and data structures still make sense. The data types are larger, so data structures might use more space. The performance of your code might change as well. Given these concerns, you might need to modify your code appropriately.

7.5 Checklist for Getting Started

Use the following checklist to help you convert your code to 64-bit.