Guidelines for Converting to LP64 Data Type Model - Oracle® Solaris 64-bit Developer's Guide

4.4 Guidelines for Converting to LP64 Data Type Model

When using lint, remember that not all problems result in lint warnings, nor do all lint warnings indicate that a change is required. Examine each possibility for intent. The examples that follow illustrate some of the more common problems you are likely to encounter when converting code. Where appropriate, the corresponding lint warnings are shown.

4.4.1 Do Not Assume `int` and Pointers Are the Same Size

Because integers and pointers are the same size in the ILP32 compilation environment, some code relies on this assumption. Pointers are often cast to int or unsigned int for address arithmetic. Instead, cast pointers to long or unsigned long, because long and pointers are the same size in both the ILP32 and LP64 data-type models. Better still, use uintptr_t rather than unsigned long explicitly: it expresses your intent more closely and makes the code more portable, insulating it against future changes. Consider the following example:

char *p;
p = (char *) ((int)p & PAGEOFFSET);
%
warning: conversion of pointer loses bits

The modified version is:

char *p;
p = (char *) ((uintptr_t)p & PAGEOFFSET);
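
The same idea can be sketched as a pair of small helpers. This is an illustration only: the names EXAMPLE_PAGESIZE, EXAMPLE_PAGEOFFSET, page_offset, and page_base are hypothetical, and the 4 KB page size is an assumption rather than a value taken from any system header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size, for illustration only; real code would query the system. */
#define	EXAMPLE_PAGESIZE	((uintptr_t)4096)
#define	EXAMPLE_PAGEOFFSET	(EXAMPLE_PAGESIZE - 1)

/* Return the offset of p within its (assumed 4 KB) page. */
static size_t
page_offset(const void *p)
{
	return ((size_t)((uintptr_t)p & EXAMPLE_PAGEOFFSET));
}

/* Round p down to the start of its page, keeping all address bits. */
static void *
page_base(void *p)
{
	/* The mask has pointer width, so ~ leaves the upper bits set. */
	return ((void *)((uintptr_t)p & ~EXAMPLE_PAGEOFFSET));
}
```

Because the mask itself has type uintptr_t, inverting it yields a full-width mask in both ILP32 and LP64. An int-typed mask would need a cast before the inversion.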

4.4.2 Do Not Assume int and long Are the Same Size

Because int and long were never really distinguished in ILP32, a lot of existing code uses them indiscriminately while implicitly or explicitly assuming that they are interchangeable. Any code that makes this assumption must be changed to work for both ILP32 and LP64. While an int and a long are both 32-bit in the ILP32 data model, in the LP64 data model, a long is 64-bit. For example:

int waiting;
long w_io;
long w_swap;
...
waiting = w_io + w_swap;

%
warning: assignment of 64-bit integer to 32-bit integer

Furthermore, large arrays of long or unsigned long can cause serious performance degradation in the LP64 data-type model as compared to arrays of int or unsigned int. Large arrays of long or unsigned long can also cause significantly more cache misses and consume more memory.

Therefore, if int works just as well as long for the application purposes, use int rather than long.

This argument also applies to using arrays of int instead of arrays of pointers. Some C applications suffer from serious performance degradation after conversion to the LP64 data-type model because they rely on many large arrays of pointers.
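
As a rough sketch of the size difference, the following declares the same table twice. The names NUM_COUNTERS, counters_int, counters_long, and counter_bump are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define	NUM_COUNTERS	1024	/* hypothetical table size */

/* Values known to fit in 32 bits: int stays 4 bytes in ILP32 and LP64. */
int counters_int[NUM_COUNTERS];

/* The same table declared long doubles in size under LP64. */
long counters_long[NUM_COUNTERS];

/* Add one to a counter; idx must be a valid index into the table. */
static int
counter_bump(size_t idx)
{
	assert(idx < NUM_COUNTERS);
	return (++counters_int[idx]);
}
```

Comparing sizeof (counters_int) with sizeof (counters_long) in each compilation environment shows how much extra memory the long version needs.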

For more information about the capabilities of the C compilers and lint, see Oracle Developer Studio 12.6: C User's Guide.

4.4.3 Sign Extension

Unintended sign extension is a common problem when converting to 64-bit. It is hard to detect this problem before it occurs, although the lint command warns about sign extension when you specify -errchk=signext. Furthermore, the type conversion and promotion rules are somewhat obscure. To fix unintended sign extension problems, you must use explicit casting to achieve the intended results.

To understand why sign extension occurs, it helps to understand the conversion rules for ISO C. The conversion rules that seem to cause the most sign extension problems between the 32-bit and the 64-bit compilation environment come into effect during the following operations:

Integral promotion

You can use a char, short, enumerated type, or bit-field, whether signed or unsigned, in any expression that calls for an integer. If an integer can hold all possible values of the original type, the value is converted to an integer; otherwise, the value is converted to an unsigned integer.

Conversion between signed and unsigned integers

When an integer with a negative sign is promoted to an unsigned integer of the same or larger type, it is first promoted to the signed equivalent of the larger type and then converted to the unsigned value.

For more information about the conversion rules, refer to the ISO C standard. Also included in this standard are useful rules for ordinary arithmetic conversions and integer constants.
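
The signed-to-unsigned rule can be seen in isolation with two small helpers. This is a minimal sketch, and the function names widen_signed and widen_unsigned are hypothetical.

```c
/*
 * A negative int converted directly to unsigned long is sign-extended
 * first, so every upper bit of the result is set in LP64.
 */
static unsigned long
widen_signed(int v)
{
	return ((unsigned long)v);
}

/*
 * Converting to unsigned int first discards the sign, so the
 * widening to unsigned long zero-extends.
 */
static unsigned long
widen_unsigned(int v)
{
	return ((unsigned long)(unsigned int)v);
}
```

In LP64, widen_signed(-1) yields 0xffffffffffffffff while widen_unsigned(-1) yields 0xffffffff. In ILP32 both yield 0xffffffff, which is why the difference goes unnoticed until the code is rebuilt as 64-bit.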

When the following example is compiled as a 64-bit program, the addr variable becomes sign-extended, even though both addr and a.base are unsigned types.

Example 1 Compiling the test.c 64-bit Program

$ cat test.c
#include <stdio.h>

struct foo {
  unsigned int base:19, rehash:13;
};

int
main(int argc, char *argv[])
{
  struct foo a;
  unsigned long addr;

  a.base = 0x40000;
  addr = a.base << 13;  /* Sign extension here! */
  printf("addr 0x%lx\n", addr);

  addr = (unsigned int)(a.base << 13); /* No sign extension here! */
  printf("addr 0x%lx\n", addr);
  return (0);
}

This sign extension occurs because the conversion rules are applied as follows:

a.base is converted from an unsigned int to an int because of the integral promotion rule. Thus, the expression a.base << 13 is of type int, but no sign extension has yet occurred.
The expression a.base << 13 is of type int, but it is converted to a long and then to an unsigned long before being assigned to addr, because of signed and unsigned integer promotion rules. The sign extension occurs when it is converted from an int to a long.

$ cc -o test64 -m64 test.c
$ ./test64
addr 0xffffffff80000000
addr 0x80000000

When this same example is compiled as a 32-bit program it does not display any sign extension:

$ cc -o test -m32 test.c
$ ./test
addr 0x80000000
addr 0x80000000

4.4.4 Use Pointer Arithmetic Instead of Integers

Using pointer arithmetic usually works better than integers because pointer arithmetic is independent of the data model, whereas integers might not be. Also, you can usually simplify your code by using pointer arithmetic. Consider the following example:

int *end;
int *p;
p = malloc(4 * NUM_ELEMENTS);
end = (int *)((unsigned int)p + 4 * NUM_ELEMENTS);

%
warning: conversion of pointer loses bits

The modified version is:

int *end; 
int *p;
p = malloc(sizeof (*p) * NUM_ELEMENTS);
end = p + NUM_ELEMENTS;

4.4.5 Internal Data Structure Checking

Check the internal data structures in an application for holes. The compiler inserts extra padding between fields in a structure to meet alignment requirements, and more of this padding is allocated when long or pointer fields grow to 64-bit for the LP64 data-type model. In the 64-bit compilation environment on SPARC platforms, all types of structures are aligned to the size of the largest member within them. When you repack a structure, follow the simple rule of moving the long and pointer fields to the beginning of the structure. However, this rule for repacking might affect performance depending on which members end up on the same cache line. Consider the following structure definition:

struct bar {
		int i;
		long j;
		int k;
		char *p;
};			/* sizeof (struct bar) = 32 */

The following example shows the same structure with the long and pointer data types defined at the beginning of the structure:

struct bar {
		char *p;
		long j;
		int i;
		int k;
};			/* sizeof (struct bar) = 24 */

Note - The alignment of fundamental types is different in the i386 and amd64 ABIs. See Alignment Issues.
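
One way to look for holes is to compare the size of a structure with the sum of its member sizes. The following is a sketch under that assumption; the names bar_loose, bar_packed, MEMBER_BYTES, loose_padding, and packed_padding are hypothetical and stand in for the two layouts of struct bar shown above.

```c
#include <stddef.h>

struct bar_loose {		/* original field order */
	int i;
	long j;
	int k;
	char *p;
};

struct bar_packed {		/* long and pointer fields first */
	char *p;
	long j;
	int i;
	int k;
};

/* Sum of the member sizes, which is the same for both layouts. */
#define	MEMBER_BYTES	(sizeof (int) * 2 + sizeof (long) + sizeof (char *))

/* Bytes of compiler-inserted padding in each layout. */
static size_t
loose_padding(void)
{
	return (sizeof (struct bar_loose) - MEMBER_BYTES);
}

static size_t
packed_padding(void)
{
	return (sizeof (struct bar_packed) - MEMBER_BYTES);
}
```

The exact numbers depend on the ABI, so treat the results as something to measure in each compilation environment. The offsetof macro from <stddef.h> can then locate where in the structure the padding falls.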

4.4.6 Check Unions

Be sure to check unions because their fields might have changed sizes between ILP32 and LP64. For example:

typedef union {
       double   _d;
       long _l[2];
} llx_t;

The modified version is:

typedef union {
       double _d;
       int _l[2];
} llx_t;
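
A variation on the modified union uses a fixed-width type and a compile-time size check. This sketch assumes a C11 compiler (for _Static_assert) and an 8-byte IEEE 754 double. The names llx32_t and double_word are hypothetical, and int32_t is used here in place of int only to make the intended width explicit.

```c
#include <assert.h>
#include <stdint.h>

typedef union {
	double	_d;
	int32_t	_l[2];	/* two 32-bit halves in both ILP32 and LP64 */
} llx32_t;		/* hypothetical name */

/* C11 compile-time check that the halves exactly overlay the double. */
_Static_assert(sizeof (((llx32_t *)0)->_l) == sizeof (double),
    "integer view must overlay the double exactly");

/* Return the raw 32-bit word at index idx (0 or 1) of d's representation. */
static uint32_t
double_word(double d, int idx)
{
	llx32_t u;

	assert(idx == 0 || idx == 1);
	u._d = d;
	return ((uint32_t)u._l[idx]);
}
```

If a later change widens one of the members, the build fails at the _Static_assert line instead of silently reading the wrong bytes at run time.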

4.4.7 Specify Constant Types

A lack of precision can cause the loss of data in some constant expressions. Be explicit when you specify the data types in your constant expression. Specify the type of each integer constant by adding some combination of {u,U,l,L}. You can also use casts to specify the type of a constant expression. Consider the following example:

int i = 32;
long j = 1 << i;		/* j will get 0 because RHS is integer expression */

The modified version is:

int i = 32;
long j = 1L << i;
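
Note that 1L is only 64 bits wide in LP64. When the same source must also produce a 64-bit result in ILP32, a fixed-width constant is one option. The following is a minimal sketch; the helper name bit64 is hypothetical, and UINT64_C comes from <stdint.h>.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a mask with the given bit set; bit must be in the range 0..63.
 * UINT64_C(1) makes the left operand 64 bits wide in both data models,
 * so the shift is carried out at 64-bit width.
 */
static uint64_t
bit64(unsigned int bit)
{
	assert(bit < 64);
	return (UINT64_C(1) << bit);
}
```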

4.4.8 Beware of Implicit Declaration

If you use -std=c90 or -xc99=none, the C compiler assumes that any function or variable that is used in a module and is not defined or declared externally is an integer. Any long and pointer data used in this way is truncated by the compiler's implicit integer declaration. Place the appropriate extern declaration for the function or variable in a header and not in the C module. Include this header in any C module that uses the function or variable. Even if the function or variable is defined by the system headers, you must include the proper header in the code. Consider the following example:

int
main(int argc, char *argv[])
{
		char *name = getlogin();
		printf("login = %s\n", name);
		return (0);
}

%
warning: improper pointer/integer combination: op "="
warning: cast to pointer from 32-bit integer
implicitly declared to return int 
getlogin        printf

The proper headers are now in the following modified version:

#include <unistd.h>
#include <stdio.h>
 
int
main(int argc, char *argv[])
{
		char *name = getlogin();
		(void) printf("login = %s\n", name);
		return (0);
}

4.4.9 `sizeof` Is an `unsigned long` in LP64

In the LP64 data model, sizeof() has the effective type of an unsigned long. Occasionally, sizeof() is passed to a function expecting an argument of type int, or assigned or cast to an integer. In some cases, this truncation causes loss of data.

long a[50];
unsigned char size = sizeof (a);

%
warning: 64-bit constant truncated to 8 bits by assignment
warning: initializer does not fit or is out of range: 0x190
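
One possible remedy is to keep the result in a size_t and narrow it only after a range check. The helper below is a hypothetical sketch, not part of any system library.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: narrow a size_t to int only after a range check.
 * Returns 0 on success, or -1 if the value would not fit in an int.
 */
static int
size_to_int(size_t n, int *out)
{
	if (n > (size_t)INT_MAX)
		return (-1);
	*out = (int)n;
	return (0);
}
```

For the array above, declaring size_t size = sizeof (a); holds the full value in both data models, and size_to_int(size, &n) can be used where an interface insists on an int.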

4.4.10 Use Casts to Show Your Intentions

Relational expressions can be tricky because of conversion rules. You should be very explicit about how you want the expression to be evaluated by adding casts wherever necessary.
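
As an illustration of the kind of surprise this guideline is meant to prevent, consider comparing a signed and an unsigned value. The function names below are hypothetical, and the sketch shows only one of several ways to make the intent explicit.

```c
/*
 * Hazard: i is converted to unsigned int before the comparison,
 * so a negative i compares as a very large value.
 */
static int
less_than_implicit(int i, unsigned int u)
{
	return (i < u);
}

/* Explicit: handle the negative case first, then compare as unsigned. */
static int
less_than_explicit(int i, unsigned int u)
{
	return (i < 0 || (unsigned int)i < u);
}
```

With these definitions, less_than_implicit(-1, 1u) is false even though -1 is mathematically less than 1, while less_than_explicit(-1, 1u) is true.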

4.4.11 Check Format String Conversion Operations

The format strings for printf(3C), sprintf(3C), scanf(3C), and sscanf(3C) might need to be changed for long or pointer arguments. For pointer arguments, the conversion operation given in the format string should be %p to work in both the 32-bit and 64-bit environments. For example:

char *buf;
struct dev_info *devi;
...
(void) sprintf(buf, "di%x", (void *)devi);

%
warning: function argument (number) type inconsistent with format
sprintf (arg 3)     void *: (format) int

The modified version is:

char *buf;
struct dev_info *devi;
...
(void) sprintf(buf, "di%p", (void *)devi);

Also check to be sure that the storage pointed to by buf is large enough to contain 16 digits. For long arguments, the long size specification, l, should be prepended to the conversion operation character in the format string. For example:

    size_t nbytes;
    ulong_t align, addr, raddr, alloc;
    printf("kalloca:%d%%%d from heap got %x.%x returns %x\n", 
            nbytes, align, (int)raddr, (int)(raddr + alloc), (int)addr);

produces the warnings:

warning: cast of 64-bit integer to 32-bit integer
warning: cast of 64-bit integer to 32-bit integer
warning: cast of 64-bit integer to 32-bit integer

The following code will produce clean results:

    size_t nbytes;
    ulong_t align, addr, raddr, alloc;
    printf("kalloca:%lu%%%lu from heap got %lx.%lx returns %lx\n", 
            nbytes, align, raddr, raddr + alloc, addr);

Note - The PRI* macros available in inttypes.h can help make format strings portable between 32-bit and 64-bit environments.
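
A minimal sketch of the PRI* macros in use follows. The function name format_addr and the message layout are hypothetical. PRIu64 and PRIxPTR are standard macros from <inttypes.h> that expand to the correct conversion for uint64_t and uintptr_t in each environment.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a 64-bit count and a pointer-sized address into buf.
 * Returns the snprintf() result: the length the full string needs.
 */
static int
format_addr(char *buf, size_t len, uint64_t count, uintptr_t addr)
{
	assert(buf != NULL);
	return (snprintf(buf, len, "count=%" PRIu64 " addr=0x%" PRIxPTR,
	    count, addr));
}
```

Using snprintf() rather than sprintf() also guards against the buffer-size concern raised earlier in this section, because the output is truncated at len bytes.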

4.4.12 Compiling LP64 Programs

The following guidelines can help you improve performance when converting applications to 64-bit:

- 64-bit compilation supports large files by default. Therefore, the following CPPFLAGS are no longer required when compiling:
  - CPPFLAGS += -D_FILE_OFFSET_BITS=64
  - CPPFLAGS += -D_LARGEFILE64_SOURCE
  - CPPFLAGS += -D_LARGEFILE_SOURCE
- For single-threaded code, use the putc_unlocked() function instead of putc() and the getc_unlocked() function instead of getc(). You can also use buffers.
- When converting code to 64-bit, replace the explicit 64-bit interfaces and data types with the generic interfaces and data types. For example, there is no need to use fopen64() or off64_t.
- When converting code to 64-bit, ensure that every dlopen() call in the code loads a 64-bit library.
- When compiling an LP64 program, pay attention to variables that are declared long. The code might assume that int and long are the same size, for example, when it assigns the value of a long variable to an int variable.
- In code that uses the select() interfaces, especially the fd masks, check how big a set of fds you actually want to handle. The default value for FD_SETSIZE is 1024 in 32-bit and 65536 in 64-bit. Programs can, with care, #define FD_SETSIZE to another value before including any system headers to choose a different size. If you do not override it, every fd_set variable grows to 8 KB (often on the stack), and operations such as FD_ZERO, copying fd masks, or searching for the set bits in a mask returned by select() have to operate on 8 KB at a time.
- Without the 256 fd limit of the 32-bit stdio, 64-bit programs run with higher fd limits, so it becomes even more important to use closefrom() or fdwalk() instead of for (i = 0; i < max_fd ; i++) loops. For more information, see the closefrom(3C) and fdwalk(3C) man pages.
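
The stdio guideline can be sketched as a simple copy loop. The function name copy_stream is hypothetical, and the sketch assumes that no other thread touches either stream during the call, which is the condition under which the unlocked variants are safe. The _POSIX_C_SOURCE definition is an assumption about how the POSIX declarations are made visible. Some build environments expose them by default.

```c
#define	_POSIX_C_SOURCE	200809L	/* assumption: request POSIX declarations */
#include <stdio.h>

/*
 * Copy in to out one character at a time and return the number of
 * bytes copied. Assumes no other thread uses either stream meanwhile.
 */
static long
copy_stream(FILE *in, FILE *out)
{
	long n = 0;
	int c;

	while ((c = getc_unlocked(in)) != EOF) {
		if (putc_unlocked(c, out) == EOF)
			break;
		n++;
	}
	return (n);
}
```

In multithreaded code, the same loop could be bracketed with flockfile() and funlockfile() on each stream so that the lock is taken once rather than for every character.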
