C++ User's Guide

-xO[level]

Specifies the optimization level. In general, program execution speed depends on the level of optimization: the higher the level, the faster the program runs.

If -xO[level] is not specified, only a very basic level of optimization (limited to local common subexpression elimination and dead code analysis) is performed. A program's performance might be significantly improved when it is compiled with an optimization level. Use of -O (which implies -xO2) is recommended for most programs.
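As a rough illustration of the two default transformations named above, the following sketch shows a function as written and a hand-simplified equivalent. The function names are hypothetical, and the second form only suggests what local common subexpression elimination and removal of a dead store might produce; it is not actual compiler output.

```cpp
// Hypothetical illustration; not actual compiler output.

// As written: a * b is computed twice in one basic block, and the
// first store to y is never read.
int before(int a, int b, int c) {
    int x = a * b + c;
    int y = 0;        // dead store: overwritten before it is read
    y = a * b - c;    // repeats the subexpression a * b
    return x + y;
}

// Hand-simplified equivalent: the shared subexpression is computed
// once and the dead store is gone.
int after(int a, int b, int c) {
    int t = a * b;    // common subexpression, evaluated once
    int x = t + c;
    int y = t - c;
    return x + y;
}
```

Both functions return the same value for the same arguments; only the amount of work differs.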

Generally, the higher the level of optimization with which a program is compiled, the better the runtime performance. However, higher optimization levels can result in increased compilation time and larger executable files.

In a few cases, -xO2 might perform better than the other levels, and -xO3 might outperform -xO4. Try compiling with each level to see if you have one of these rare cases.

If the optimizer runs out of memory, it tries to recover by retrying the current procedure at a lower level of optimization. The optimizer resumes subsequent procedures at the original level specified in the -xO option.

There are five levels that you can use with -xO. The following sections describe how they operate on the SPARC platform and the x86 platform.

Values

On the SPARC Platform:

-xO is equivalent to -xO2.

-xO1 does only the minimum amount of optimization (peephole), which is postpass, assembly-level optimization. Do not use -xO1 unless using -xO2 or -xO3 results in excessive compilation time, or you are running out of swap space.

-xO2 does basic local and global optimization. This level does not optimize references or definitions for external or indirect variables. In general, this level results in minimum code size.

-xO3, in addition to optimizations performed at the -xO2 level, also optimizes references and definitions for external variables. This level does not trace the effects of pointer assignments. When compiling either device drivers that are not properly protected by volatile or programs that modify external variables from within signal handlers, use -xO2. In general, -xO3 results in increased code size. If you are running out of swap space, use -xO2.
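The signal-handler case can be sketched as follows. This is a minimal example, assuming a flag that a handler sets and ordinary code reads; the names got_signal, on_signal, and wait_for_signal are hypothetical. Declaring the flag volatile tells the compiler that its value can change outside the visible flow of control, which is the protection the paragraph above refers to.

```cpp
#include <csignal>

// External variable written by the signal handler and read by normal
// code. Without volatile, an optimizer that tracks external variables
// may keep the value in a register and miss the handler's update.
volatile std::sig_atomic_t got_signal = 0;

// Signal handler: only sets the flag.
extern "C" void on_signal(int) {
    got_signal = 1;
}

// Installs the handler, raises SIGINT in this process, and reports
// whether the handler's write to the flag was observed.
bool wait_for_signal() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);                  // handler runs before raise returns
    bool seen = (got_signal != 0);
    std::signal(SIGINT, SIG_DFL);        // restore the default action
    return seen;
}
```

Code that cannot be changed to use volatile in this way is the kind for which the text recommends staying at -xO2.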

-xO4 does automatic inlining of functions contained in the same file in addition to performing -xO3 optimizations. This automatic inlining usually improves execution speed but sometimes makes it worse. In general, this level results in increased code size.
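To make the idea of same-file inlining concrete, the sketch below shows a small helper and its caller in one file, followed by a hand-inlined version of the caller. All names are hypothetical, and whether the optimizer actually inlines a given call is its own decision; the second function only shows what the effect of inlining looks like.

```cpp
// Hypothetical illustration of a candidate for automatic inlining.

// Small helper defined in the same file as its caller.
static int square(int v) {
    return v * v;
}

// Calls the helper once per element; each call is a possible
// inlining site.
int sum_of_squares(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += square(data[i]);
    }
    return total;
}

// Hand-inlined equivalent: the call is replaced by the helper's body,
// which removes call overhead but duplicates code at each call site.
int sum_of_squares_inlined(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += data[i] * data[i];
    }
    return total;
}
```

The duplicated body at each call site is the reason this level tends to increase code size.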

-xO5 generates the highest level of optimization. It is suitable only for the small fraction of a program that uses the largest fraction of computer time. This level uses optimization algorithms that take more compilation time or that do not have as high a certainty of improving execution time. Optimization at this level is more likely to improve performance if it is done with profile feedback. See "-xprofile=p".

On the x86 Platform:

-xO1 preloads arguments from memory and causes cross jumping (tail merging), as well as the single pass of the default optimization.

-xO2 schedules both high- and low-level instructions and performs improved spill analysis, loop memory-reference elimination, register lifetime analysis, enhanced register allocation, global common subexpression elimination, as well as the optimization done by level 1.

-xO3 performs loop strength reduction and inlining, as well as the optimization done by level 2.
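Loop strength reduction replaces a costly operation inside a loop with a cheaper one that yields the same values. The sketch below shows the transformation done by hand, assuming a strided array walk; the function names are hypothetical and the second form is not actual compiler output.

```cpp
#include <cstddef>

// As written: each iteration multiplies the loop index by the stride.
long sum_strided(const long* a, std::size_t n, std::size_t stride) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += a[i * stride];
    }
    return total;
}

// Strength-reduced by hand: the multiplication is replaced by a
// running offset that is advanced with an addition.
long sum_strided_reduced(const long* a, std::size_t n, std::size_t stride) {
    long total = 0;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += a[offset];
        offset += stride;
    }
    return total;
}
```

Both functions read the same elements in the same order, so they return the same sum.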

-xO4 performs architecture-specific optimization, as well as the optimization done by level 3.

-xO5 generates the highest level of optimization. It uses optimization algorithms that take more compilation time or that do not have as high a certainty of improving execution time.

Interactions

Debugging with -g does not suppress -xO[level], but -xO[level] limits -g in certain ways.

The -xO3 and -xO4 options reduce the utility of debugging so that you cannot display variables from dbx, but you can still use the dbx where command to get a symbolic traceback.

The -xinline option has no effect at optimization levels below -xO3. At -xO4, the optimizer decides which functions to inline without the -xinline option being specified, by attempting to determine which functions will improve performance if they are inlined. If you force inlining of a function with -xinline, you might actually diminish performance.

Warnings

If you optimize at -xO3 or -xO4 with very large procedures (thousands of lines of code in a single procedure), the optimizer might require an unreasonable amount of memory. In such cases, machine performance can be degraded.

To prevent this degradation from taking place, use the limit command to limit the amount of virtual memory available to a single process (see the csh(1) man page). For example, to limit virtual memory to 16 megabytes:

demo% limit datasize 16M

This command causes the optimizer to try to recover if it reaches 16 megabytes of data space.

The limit cannot be greater than the total available swap space of the machine, and should be small enough to permit normal use of the machine while a large compilation is in progress.

The best setting for data size depends on the degree of optimization requested and on the amounts of real memory and virtual memory available.

To find the actual swap space, type: swap -l

To find the actual real memory, type: dmesg | grep mem

See also

-fast, -xprofile=p, csh(1) man page