2.10 Pointers and Arrays

2.10.1 Pointers and Addresses
2.10.2 Pointer Safety
2.10.3 Array Declarations and Storage
2.10.4 Pointer and Array Relationship
2.10.5 Pointer Arithmetic
2.10.6 Generic Pointers
2.10.7 Multi-Dimensional Arrays
2.10.8 Pointers to DTrace Objects
2.10.9 Pointers and Address Spaces

Pointers are memory addresses of data objects in the operating system kernel or in the address space of a user process. D provides the ability to create and manipulate pointers and store them in variables and associative arrays. This chapter describes the D syntax for pointers, operators that can be applied to create or access pointers, and the relationship between pointers and fixed-size scalar arrays. Also discussed are issues relating to the use of pointers in different address spaces.

Note

If you are an experienced C or C++ programmer, you can skim most of this chapter as the D pointer syntax is the same as the corresponding ANSI C syntax. You should read Section 2.10.8, “Pointers to DTrace Objects” and Section 2.10.1, “Pointers and Addresses” as they describe features and issues that are specific to DTrace.

2.10.1 Pointers and Addresses

The Oracle Linux operating system uses a technique called virtual memory to provide each user process with its own virtual view of the memory resources on your system. A virtual view of memory resources is referred to as an address space, which associates a range of address values (either [0 ... 0xffffffff] for a 32-bit address space or [0 ... 0xffffffffffffffff] for a 64-bit address space) with a set of translations that the operating system and hardware use to convert each virtual address to a corresponding physical memory location. Pointers in D are data objects that store an integer virtual address value and associate it with a D type that describes the format of the data stored at the corresponding memory location.

You can declare a D variable to be of pointer type by first specifying the type of the referenced data and then appending an asterisk (*) to the type name to indicate you want to declare a pointer type. For example, the statement:

int *p;

declares a D global variable named p that is a pointer to an integer. This declaration means that p itself is a 64-bit integer, whose value is the address of another integer located somewhere in memory. Because the compiled form of your D code is executed at probe firing time inside the operating system kernel itself, D pointers are typically pointers associated with the kernel's address space. You can use the arch command to determine the number of bits used for pointers by the active operating system kernel.

If you want to create a pointer to a data object inside of the kernel, you can compute its address using the & operator. For example, the operating system kernel source code declares an unsigned long max_pfn variable. You could trace the address of this variable by tracing the result of applying the & operator to the name of that object in D:

trace(&`max_pfn);

The * operator can be used to refer to the object addressed by the pointer, and acts as the inverse of the & operator. For example, the following two D code fragments are equivalent in meaning:

p = &`max_pfn; trace(*p);

trace(`max_pfn); 

The first fragment creates a D global variable pointer p. Because the max_pfn object is of type unsigned long, the type of the result of &`max_pfn is unsigned long * (that is, pointer to unsigned long). Tracing the value of *pfollows the pointer back to the data object max_pfn. This fragment is therefore the same as the second fragment, which directly traces the value of the data object by using its name.

2.10.2 Pointer Safety

If you are a C or C++ programmer, you may be a bit frightened after reading the previous section because you know that misuse of pointers in your programs can cause your programs to crash. DTrace is a robust, safe environment for executing your D programs where these mistakes cannot cause program crashes. You may indeed write a buggy D program, but invalid D pointer accesses do not cause DTrace or the operating system kernel to fail or crash in any way. Instead, the DTrace software detects any invalid pointer accesses, disables your instrumentation, and reports the problem back to you for debugging.

If you have programmed in the Java programming language, you probably know that the Java language does not support pointers for precisely the same reasons of safety. Pointers are needed in D because they are an intrinsic part of the operating system's implementation in C, but DTrace implements the same kind of safety mechanisms found in the Java programming language that prevent buggy programs from damaging themselves or each other. DTrace's error reporting is similar to the run-time environment for the Java programming language that detects a programming error and reports an exception back to you.

To see DTrace's error handling and reporting, write a deliberately bad D program using pointers. In an editor, type the following D program and save it in a file named badptr.d:

BEGIN
{
  x = (int *)NULL;
  y = *x;
  trace(y);
}

The badptr.d program creates a D pointer named x that is a pointer to int. The program assigns this pointer the special invalid pointer value NULL, which is a built-in alias for address 0. By convention, address 0 is always defined to be invalid so that NULL can be used as a sentinel value in C and D programs. The program uses a cast expression to convert NULL to be a pointer to an integer. The program then dereferences the pointer using the expression *x, and assigns the result to another variable y, and then attempts to trace y. When the D program is executed, DTrace detects an invalid pointer access when the statement y = *x is executed and reports the error:

# dtrace -s badptr.d
dtrace: script 'badptr.d' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN):
invalid address (0x0) in action #2 at DIF offset 4
^C
#

The other problem that can arise from programs that use invalid pointers is an alignment error. Alignment errors in D usually indicate that your pointer has an invalid or corrupt value due to a bug in your D program.

For details about the DTrace error mechanism, see Section 11.1.3, “ERROR Probe”.

2.10.3 Array Declarations and Storage

D supports scalar arrays in addition to the dynamic associative arrays described in Section 2.9, “Variables”. Scalar arrays are a fixed-length group of consecutive memory locations that each store a value of the same type. Scalar arrays are accessed by referring to each location with an integer starting from zero. Scalar arrays correspond directly in concept and syntax with arrays in C and C++. Scalar arrays are not used as frequently in D as associative arrays and their more advanced counterparts aggregations, but you might need to use them to access existing operating system array data structures declared in C. Aggregations are described in Chapter 3, Aggregations.

A D scalar array of 5 integers is declared by using the type int and suffixing the declaration with the number of elements in square brackets, for example:

int a[5];

Figure 2.2, “Scalar Array Representation” shows a visual representation of the array storage:

Figure 2.2 Scalar Array Representation

The diagram illustrates the elements a[0] through a[4] of the array declared as a[5] arranged side-by-side in memory.


The D expression a[0] refers to the first array element, a[1] refers to the second, and so on. From a syntactic perspective, scalar arrays and associative arrays are very similar. You can declare an associative array of five integers referenced by an integer key as follows:

int a[int];

and also reference this array using the expression a[0]. But from a storage and implementation perspective, the two arrays are very different. The static array a consists of five consecutive memory locations numbered from zero and the index refers to an offset in the storage allocated for the array. An associative array, on the other hand, has no predefined size and does not store elements in consecutive memory locations. In addition, associative array keys have no relationship to the corresponding value storage location. You can access associative array elements a[0] and a[-5] and only two words of storage will be allocated by DTrace which may or may not be consecutive. Associative array keys are abstract names for the corresponding value that have no relationship to the value storage locations.

If you create an array using an initial assignment and use a single integer expression as the array index (for example, a[0] = 2), the D compiler will always create a new associative array, even though in this expression a could also be interpreted as an assignment to a scalar array. Scalar arrays must be predeclared in this situation so that the D compiler can see the definition of the array size and infer that the array is a scalar array.

2.10.4 Pointer and Array Relationship

Pointers and arrays have a special relationship in D, just as they do in ANSI C. An array is represented by a variable that is associated with the address of its first storage location. A pointer is also the address of a storage location with a defined type, so D permits the use of the array [] index notation with both pointer variables and array variables. For example, the following two D fragments are equivalent in meaning:

p = &a[0]; trace(p[2]);

trace(a[2]); 

In the first fragment, the pointer p is assigned to the address of the first array element in a by applying the & operator to the expression a[0]. The expression p[2] traces the value of the third array element (index 2). Because p now contains the same address associated with a, this expression yields the same value as a[2], shown in the second fragment. One consequence of this equivalence is that C and D permit you to access any index of any pointer or array. Array bounds checking is not performed for you by the compiler or DTrace runtime environment. If you access memory beyond the end of an array's predefined value, you either get an unexpected result or DTrace reports an invalid address error, as shown in the previous example. As always, you cannot damage DTrace itself or your operating system, but you do need to debug your D program.

The difference between pointers and arrays is that a pointer variable refers to a separate piece of storage that contains the integer address of some other storage. An array variable names the array storage itself, not the location of an integer that in turn contains the location of the array. Figure 2.3, “Pointer and Array Storage” illustrates this difference.

Figure 2.3 Pointer and Array Storage

The diagram illustrates the pointer p with the value 0x12345678, which is the address of the first element (a[0]) of the array declared as a[5].


This difference is manifested in the D syntax if you attempt to assign pointers and scalar arrays. If x and y are pointer variables, the expression x = y is legal; it copies the pointer address in y to the storage location named by x. If x and y are scalar array variables, the expression x = y is not legal. Arrays may not be assigned as a whole in D. However, an array variable or symbol name can be used in any context where a pointer is permitted. If p is a pointer and a is an array, the statement p = a is permitted; this statement is equivalent to the statement p = &a[0].

2.10.5 Pointer Arithmetic

Since pointers are just integers used as addresses of other objects in memory, D provides a set of features for performing arithmetic on pointers. However, pointer arithmetic is not identical to integer arithmetic. Pointer arithmetic implicitly adjusts the underlying address by multiplying or dividing the operands by the size of the type referenced by the pointer. The following D fragment illustrates this property:

int *x;

BEGIN
{
  trace(x);
  trace(x + 1);
  trace(x + 2);
}

This fragment creates an integer pointer x and then trace its value, its value incremented by one, and its value incremented by two. If you create and execute this program, DTrace reports the integer values 0, 4, and 8.

Since x is a pointer to an int (size 4 bytes), incrementing x adds 4 to the underlying pointer value. This property is useful when using pointers to refer to consecutive storage locations such as arrays. For example, if x were assigned to the address of an array a like the one shown in Figure 2.3, “Pointer and Array Storage”, the expression x + 1 would be equivalent to the expression &a[1]. Similarly, the expression *(x + 1) would refer to the value a[1]. Pointer arithmetic is implemented by the D compiler whenever a pointer value is incremented using the +, ++, or =+ operators. Pointer arithmetic is also applied when an integer is subtracted from a pointer on the left-hand side, when a pointer is subtracted from another pointer, or when the -- operator is applied to a pointer. For example, the following D program would trace the result 2:

int *x, *y;
int a[5];

BEGIN
{
  x = &a[0];
  y = &a[2];
  trace(y - x);
}

2.10.6 Generic Pointers

Sometimes it is useful to represent or manipulate a generic pointer address in a D program without specifying the type of data referred to by the pointer. Generic pointers can be specified using the type void *, where the keyword void represents the absence of specific type information, or using the built-in type alias uintptr_t which is aliased to an unsigned integer type of size appropriate for a pointer in the current data model. You may not apply pointer arithmetic to an object of type void *, and these pointers cannot be dereferenced without casting them to another type first. You can cast a pointer to the uintptr_t type when you need to perform integer arithmetic on the pointer value.

Pointers to void may be used in any context where a pointer to another data type is required, such as an associative array tuple expression or the right-hand side of an assignment statement. Similarly, a pointer to any data type may be used in a context where a pointer to void is required. To use a pointer to a non-void type in place of another non-void pointer type, an explicit cast is required. You must always use explicit casts to convert pointers to integer types such as uintptr_t, or to convert these integers back to the appropriate pointer type.

2.10.7 Multi-Dimensional Arrays

Multi-dimensional scalar arrays are used infrequently in D, but are provided for compatibility with ANSI C and for observing and accessing operating system data structures created using this capability in C. A multi-dimensional array is declared as a consecutive series of scalar array sizes enclosed in square brackets [] following the base type. For example, to declare a fixed-size two-dimensional rectangular array of integers of dimensions 12 rows by 34 columns, you would write the declaration:

int a[12][34];

A multi-dimensional scalar array is accessed using similar notation. For example, to access the value stored at row 0 and column 1, you would write the D expression:

a[0][1]

Storage locations for multi-dimensional scalar array values are computed by multiplying the row number by the total number of columns declared, and then adding the column number.

You should be careful not to confuse the multi-dimensional array syntax with the D syntax for associative array accesses (that is, a[0][1] is not the same as a[0,1]). If you use an incompatible tuple with an associative array or attempt an associative array access of a scalar array, the D compiler reports an appropriate error message and refuses to compile your program.

2.10.8 Pointers to DTrace Objects

The D compiler prohibits you from using the & operator to obtain pointers to DTrace objects such as associative arrays, built-in functions, and variables. You are prohibited from obtaining the address of these variables so that the DTrace runtime environment is free to relocate them as needed between probe firings in order to more efficiently manage the memory required for your programs. If you create composite structures, it is possible to construct expressions that do retrieve the kernel address of your DTrace object storage. You should avoid creating such expressions in your D programs. If you need to use such an expression, do not rely on the address being the same across probe firings.

In ANSI C, pointers can also be used to perform indirect function calls or to perform assignments, such as placing an expression using the unary * dereference operator on the left-hand side of an assignment operator. In D, these types of expressions using pointers are not permitted. You may only assign values directly to D variables using their name or by applying the array index operator [] to a D scalar or associative array. You may only call functions defined by the DTrace environment by name as specified in Chapter 4, Actions and Subroutines. Indirect function calls using pointers are not permitted in D.

2.10.9 Pointers and Address Spaces

A pointer is an address that provides a translation within some virtual address space to a piece of physical memory. DTrace executes your D programs within the address space of the operating system kernel itself. The Oracle Linux system manages many address spaces: one for the operating system kernel, and one for each user process. Since each address space provides the illusion that it can access all of the memory on the system, the same virtual address pointer value can be reused across address spaces but translate to different physical memory. Therefore, when writing D programs that use pointers, you must be aware of the address space corresponding to the pointers you intend to use.

For example, if you use the syscall provider to instrument entry to a system call that takes a pointer to an integer or array of integers as an argument (for example, pipe()), it would not be valid to dereference that pointer or array using the * or [] operators because the address in question is an address in the address space of the user process that performed the system call. Applying the * or [] operators to this address in D would result in a kernel address space access, which would result in an invalid address error or in returning unexpected data to your D program depending upon whether the address happened to match a valid kernel address.

To access user process memory from a DTrace probe, you must apply one of the copyin, copyinstr, or copyinto functions described in Chapter 4, Actions and Subroutines to the user address space pointer. Take care when writing your D programs to name and comment variables storing user addresses appropriately to avoid confusion. You can also store user addresses as uintptr_t so you do not accidentally compile D code that dereferences them. Techniques for using DTrace on user processes are described in Chapter 12, User Process Tracing.