Using the Debugger to Locate an Exception - Oracle® Solaris Studio 12.4: Numerical Computation Guide

4.4.1 Using the Debugger to Locate an Exception

This section gives examples showing how to use dbx to investigate the cause of a floating-point exception and locate the instruction that raised it. To use the source-level debugging features of dbx, compile programs with the -g flag. Refer to Oracle Solaris Studio 12.4: Debugging a Program With dbx for more information.

Consider the following C program:

#include <stdio.h>
#include <math.h>

double sqrtm1(double x)
{
   return sqrt(x) - 1.0;
}

int main(void)
{
   double x, y;

   x = -4.2;
   y = sqrtm1(x);
   printf("%g  %g\n", x, y);
   return 0;
}

Compiling and running this program produces:

-4.2  NaN

The NaN in the output suggests that an invalid operation exception might have occurred. To determine whether this is the case, you can recompile with the -ftrap option to enable trapping on invalid operations, then use dbx to run the program and stop when a SIGFPE signal is delivered. Alternatively, you can use dbx without recompiling the program, either by linking with a startup routine that enables the invalid operation trap or by enabling the trap manually.

4.4.1.1 Using dbx to Locate the Instruction Causing an Exception

The simplest way to locate the code that causes a floating-point exception is to recompile with the -g and -ftrap flags and then use dbx to track down the location where the exception occurs. First, recompile the program as follows:

example% cc -g -ftrap=invalid ex.c -lm

Compiling with -g allows you to use the source-level debugging features of dbx. Specifying -ftrap=invalid causes the program to run with trapping enabled for invalid operation exceptions. Next, invoke dbx, issue the catch fpe command to stop when a SIGFPE signal is delivered, and run the program. On SPARC-based systems, the result resembles this:

example% dbx a.out
Reading a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) catch fpe
(dbx) run
Running: a.out 
(process id 2773)
signal FPE (invalid floating point operation) in __sqrt at 0x7fa9839c
0x7fa9839c: __sqrt+0x005c:      srlx     %o1, 63, %l5
Current function is sqrtm1
    5   return sqrt(x) - 1.0;
(dbx) print x
x = -4.2
(dbx)

The output shows that the exception occurred in the sqrtm1 function as a result of attempting to take the square root of a negative number.

You can also use dbx to identify the cause of an exception in code that has not been compiled with -g, such as a library routine. In this case, dbx will not be able to give the source file and line number, but it can show the instruction that raised the exception. Again, the first step is to recompile the main program with -ftrap:

example% cc -ftrap=invalid ex.c -lm

Now invoke dbx, use the catch fpe command, and run the program. When an invalid operation exception occurs, dbx stops at an instruction following the one that caused the exception. To find the instruction that caused the exception, disassemble several instructions and look for the last floating-point instruction prior to the instruction at which dbx has stopped. On SPARC-based systems, the result might resemble the following transcript.

example% dbx a.out
Reading a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) catch fpe
(dbx) run
Running: a.out 
(process id 2931)
signal FPE (invalid floating point operation) in __sqrt at 0x7fa9839c
0x7fa9839c: __sqrt+0x005c:      srlx     %o1, 63, %l5
(dbx) dis __sqrt+0x50/4
dbx: warning: unknown language, 'c' assumed
0x7fa98390: __sqrt+0x0050:      neg      %o4, %o1
0x7fa98394: __sqrt+0x0054:      srlx     %o2, 63, %l6
0x7fa98398: __sqrt+0x0058:      fsqrtd   %f0, %f2
0x7fa9839c: __sqrt+0x005c:      srlx     %o1, 63, %l5
(dbx) print $f0f1 
$f0f1 = -4.2
(dbx) print $f2f3
$f2f3 = -NaN.0
(dbx)

The output shows that the exception was caused by an fsqrtd instruction. Examining the source register shows that the exception was a result of attempting to take the square root of a negative number.

On x86-based systems, because instructions do not have a fixed length, finding the correct address from which to disassemble the code might involve some trial and error. In this example, the exception occurs close to the beginning of a function, so we can disassemble from there. Note that this output assumes the program has been compiled with the -xlibmil flag. The following output might be a typical result.

example% dbx a.out
Reading a.out
Reading ld.so.1
Reading libc.so.1
(dbx) catch fpe
(dbx) run
Running: a.out 
(process id 18566)
signal FPE (invalid floating point operation) in sqrtm1 at 0x80509ab
0x080509ab: sqrtm1+0x001b:      fstpl    0xffffffe0(%ebp)
(dbx) dis sqrtm1+0x16/5
dbx: warning: unknown language, 'c' assumed
0x080509a6: sqrtm1+0x0016:      fsqrt    
0x080509a8: sqrtm1+0x0018:      addl     $0x00000008,%esp
0x080509ab: sqrtm1+0x001b:      fstpl    0xffffffe0(%ebp)
0x080509ae: sqrtm1+0x001e:      fwait    
0x080509af: sqrtm1+0x001f:      movsd    0xffffffe0(%ebp),%xmm0
(dbx) print $st0
$st0 = -4.20000000000000017763568394002504647e+00
(dbx)

The output reveals that the exception was caused by an fsqrt instruction. Examination of the floating-point registers reveals that the exception was a result of attempting to take the square root of a negative number.

4.4.1.2 Enabling Traps Without Recompilation

In the preceding examples, trapping on invalid operation exceptions was enabled by recompiling the main program with the -ftrap flag. In some cases, recompiling the main program might not be possible, so you might need to resort to other means to enable trapping. There are several ways to do this.

When you are using dbx, you can enable traps manually by directly modifying the floating-point status register. This can be somewhat tricky because the operating system does not enable the floating-point unit until the first time it is used within a program, at which point the floating-point state is initialized with all traps disabled. Thus, you cannot manually enable trapping until after the program has executed at least one floating-point instruction. In our example, the floating-point unit has already been accessed by the time the sqrtm1 function is called, so we can set a breakpoint on entry to that function, enable trapping on invalid operation exceptions, instruct dbx to stop on the receipt of a SIGFPE signal, and continue execution. On SPARC-based systems, the steps are as follows. Note the use of the assign command to modify the %fsr to enable trapping on invalid operation exceptions:

example% dbx a.out
Reading a.out
... etc.
(dbx) stop in sqrtm1
dbx: warning: 'sqrtm1' has no debugger info -- will trigger on first instruction
(2) stop in sqrtm1
(dbx) run
Running: a.out 
(process id 23086)
stopped in sqrtm1 at 0x106d8
0x000106d8: sqrtm1       :      save    %sp, -0x70, %sp
(dbx) assign $fsr=0x08000000
dbx: warning: unknown language, 'c' assumed
(dbx) catch fpe
(dbx) cont
signal FPE (invalid floating point operation) in __sqrt at 0xff36b3c4
0xff36b3c4: __sqrt+0x003c:      be      __sqrt+0x98
(dbx)

On x86-based systems, the same process might look like this:

example% dbx a.out
Reading a.out
... etc.
(dbx) stop in sqrtm1 
dbx: warning: 'sqrtm1' has no debugger info -- will trigger on first instruction
(2) stop in sqrtm1
(dbx) run    
Running: a.out 
(process id 25055)
stopped in sqrtm1 at 0x80506b0
0x080506b0: sqrtm1     :        pushl  %ebp
(dbx) assign $fctrl=0x137e
dbx: warning: unknown language, 'c' assumed
(dbx) catch fpe
(dbx) cont 
signal FPE (invalid floating point operation) in sqrtm1 at 0x8050696
0x08050696: sqrtm1+0x0016:      fstpl  -16(%ebp)
(dbx)

In the example above, the assign command unmasks (that is, enables trapping on) the invalid operation exception in the floating-point control word. If a program uses SSE2 instructions, you must unmask exceptions in the MXCSR register to enable trapping on exceptions raised by those instructions.
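
The values passed to assign in the transcripts above follow from the layout of each control register. The sketch below, added here for illustration, spells out the arithmetic. The macro and function names are hypothetical; the bit positions are those of the NVM bit in the SPARC %fsr trap enable mask (bit 27), the invalid operation mask in the x87 control word (bit 0), and the invalid operation mask in the MXCSR (bit 7). The starting values used in the note that follows the code are assumptions: 0x137f is inferred from the x86 transcript, and 0x1f80 is the architectural reset value of the MXCSR.

```c
/* Hypothetical names for the invalid operation control bits. */
#define SPARC_FSR_NVM  0x08000000u  /* %fsr trap enable mask, bit 27 */
#define X87_CW_IM      0x0001u      /* x87 control word mask, bit 0  */
#define MXCSR_IM       0x0080u      /* MXCSR mask, bit 7             */

/* SPARC: a trap is enabled by SETTING its bit in the trap enable mask. */
unsigned int sparc_fsr_trap_invalid(unsigned int fsr)
{
    return fsr | SPARC_FSR_NVM;
}

/* x87: a trap is enabled by CLEARING (unmasking) its mask bit. */
unsigned int x87_unmask_invalid(unsigned int control_word)
{
    return control_word & ~X87_CW_IM;
}

/* SSE/SSE2: the same convention as x87, applied to the MXCSR. */
unsigned int mxcsr_unmask_invalid(unsigned int mxcsr)
{
    return mxcsr & ~MXCSR_IM;
}
```

Under these assumptions, sparc_fsr_trap_invalid(0) gives 0x08000000, the value assigned to $fsr above, and x87_unmask_invalid(0x137f) gives 0x137e, the value assigned to $fctrl. For code that uses SSE2 instructions, mxcsr_unmask_invalid(0x1f80) gives 0x1f00.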

You can also enable trapping without recompiling the main program or using dbx by establishing an initialization routine that enables traps. This might be useful, for example, if you want to abort the program when an exception occurs without running under a debugger. There are two ways to establish such a routine.

If the object files and libraries that make up the program are available, you can enable trapping by relinking the program with an appropriate initialization routine. First, create a C source file similar to the following:

#include <ieeefp.h>
 
#pragma init (trapinvalid)
 
void trapinvalid()
{
     /* FP_X_INV et al are defined in ieeefp.h */
     fpsetmask(FP_X_INV);
}

Compile this file to create an object file and link the original program with this object file:

example% cc -c init.c
example% cc ex.o init.o -lm
example% a.out
Arithmetic Exception

If relinking is not possible but the program has been dynamically linked, you can enable trapping by using the shared object preloading facility of the runtime linker. To do this on SPARC-based systems, create the same C source file as above, but compile as follows:

example% cc -Kpic -G -ztext init.c -o init.so -lc

To enable trapping, add the path name of the init.so object to the list of preloaded shared objects specified by the environment variable LD_PRELOAD:

example% env LD_PRELOAD=./init.so a.out
Arithmetic Exception

See the Oracle Solaris 11.2 Linkers and Libraries Guide for more information about creating and preloading shared objects.

In principle, you can change the way any floating-point control modes are initialized by preloading a shared object as described above. However, initialization routines in shared objects, whether preloaded or explicitly linked, are executed by the runtime linker before it passes control to the startup code that is part of the main executable. The startup code then establishes any nondefault modes selected via the -ftrap, -fround, -fns (SPARC), or -fprecision (x86) compiler flags; executes any initialization routines that are part of the main executable, including those that are statically linked; and finally passes control to the main program. Therefore, on SPARC-based systems, remember the following:

Any floating-point control modes established by initialization routines in shared objects, such as the traps enabled in the example above, will remain in effect throughout the execution of the program unless they are overridden.
Any nondefault modes selected via the compiler flags will override modes established by initialization routines in shared objects (but default modes selected via compiler flags will not override previously established modes).
Any modes established either by initialization routines that are part of the main executable or by the main program itself will override both.

On x86-based systems, the situation is slightly more complicated. In general, the startup code automatically supplied by the compiler resets all floating-point modes to the default by calling the __fpstart routine (found in the standard C library, libc) before establishing any nondefault modes selected by the -fround, -ftrap, or -fprecision flags and passing control to the main program. As a consequence, to enable trapping or change any other default floating-point mode on x86-based systems by preloading a shared object with an initialization routine, you must override the __fpstart routine so that it does not reset the default floating-point modes. The substitute __fpstart routine should still perform the rest of the initialization functions that the standard routine does, however. The following code shows one way to do this. This code assumes that the host platform is running Oracle Solaris 10 or a later release.

#include <ieeefp.h>
#include <sys/sysi86.h>
 
#pragma init (trapinvalid)
 
void trapinvalid()
{
     /* FP_X_INV et al are defined in ieeefp.h */
     fpsetmask(FP_X_INV);
}
 
extern int  __fltrounds(), __flt_rounds;
extern int  _fp_hw, _sse_hw;
 
void __fpstart()
{
    /* perform the same floating point initializations as
       the standard __fpstart() function but leave all
       floating point modes as is */
    __flt_rounds = __fltrounds();
    (void) sysi86(SI86FPHW, &_fp_hw);
 
    /* set the following variable to 0 instead if the host
       platform does not support SSE2 instructions */
    _sse_hw = 1;
}
