Select options that optimize execution performance.
-fast provides high performance for certain benchmark applications. However, the particular choice of options may or may not be appropriate for your application. Use -fast as a starting point for compiling your application for best performance, but expect that additional tuning may still be required. If your program behaves improperly when compiled with -fast, look closely at the individual options that make up -fast and invoke only those that are appropriate to your program and preserve correct behavior.
Note also that a program compiled with -fast may show good performance and accurate results with some data sets, but not with others. Avoid using -fast to compile programs that depend on particular properties of floating-point arithmetic.
Because some of the options selected by -fast have linking implications, if you compile and link in separate steps be sure to link with -fast also.
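The separate compile and link steps can be sketched as follows. This is a minimal illustration, assuming hypothetical source files main.f95 and sub.f95 and a hypothetical output name app. The commands are only printed, so the sketch runs without a compiler installed.

```shell
#!/bin/sh
# Keep -fast in one variable so the compile and link steps use the same flags.
# main.f95, sub.f95, and app are hypothetical names.
FFLAGS="-fast"

compile_cmd="f95 $FFLAGS -c main.f95 sub.f95"
link_cmd="f95 $FFLAGS -o app main.o sub.o"

# Print the commands for review instead of running them.
echo "$compile_cmd"
echo "$link_cmd"
```

Holding the flags in a single variable is one way to make sure the link step does not silently omit -fast.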
-fast selects the following options:
The -xtarget=native hardware target.
If the program is intended to run on a different target than the compilation machine, follow the -fast with a code-generator option. For example: f95 -fast -xtarget=ultraT2 ...
The -depend option analyzes loops for data dependencies and possible restructuring. (This option is always enabled when compiling at optimization levels -xO3 and greater.)
The -libmil option for system-supplied inline expansion templates.
For C functions that depend on exception handling, follow -fast by -nolibmil (as in -fast -nolibmil). With -libmil, exceptions cannot be detected with errno or matherr(3m).
The -fsimple=2 option for aggressive floating-point optimizations.
-fsimple=2 is unsuitable if strict IEEE 754 standards compliance is required. See -fsimple[={1|2|0}].
The -dalign option to generate double loads and stores for double and quad data in common blocks. Using this option can generate nonstandard Fortran data alignment in common blocks.
The -xlibmopt option selects optimized math library routines.
-pad=local inserts padding between local variables, where appropriate, to improve cache usage. (SPARC)
-xvector=lib transforms certain math library calls within DO loops to single calls to a vectorized library equivalent routine with vector arguments. (SPARC)
-fma=fused enables automatic generation of floating-point fused multiply-add instructions.
-fns selects non-standard floating-point arithmetic exception handling and gradual underflow. See -fns[={yes|no}].
-fround=nearest is selected because -xvector and -xlibmopt require it. (Oracle Solaris)
Trapping on common floating-point exceptions, -ftrap=common, is enabled with f95.
-nofstore cancels forcing expressions to have the precision of the result. (x86)
-xregs=frameptr on x86 allows the compiler to use the frame-pointer register as a general-purpose register. See the description of -xregs=frameptr for details, especially if compiling mixed C, C++, and Fortran source code. Specify -xregs=no%frameptr after -fast and the frame-pointer register will not be used as a general-purpose register. (x86)
It is possible to add or subtract from this list by following the -fast option with other options, as in:
f95 -fast -fsimple=1 -xnolibmopt ...
which overrides the -fsimple=2 option and disables the -xlibmopt selected by -fast.
Because -fast invokes -dalign, -fns, and -fsimple=2, programs compiled with -fast can exhibit nonstandard floating-point arithmetic, nonstandard alignment of data, and nonstandard ordering of expression evaluation. These selections might not be appropriate for most programs.
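As a hedged sketch of trimming the list, the command below keeps -fast but follows it with more conservative floating-point settings. The values -fsimple=0 and -fns=no are taken from the option syntax cited above. The source file name prog.f95 is hypothetical, and whether these two overrides are sufficient for a given program is an assumption to verify. The command is printed rather than executed, so the sketch runs without a compiler installed.

```shell
#!/bin/sh
# Options that follow -fast override the values it selects.
# prog.f95 is a hypothetical source file.
OVERRIDES="-fsimple=0 -fns=no"
cmd="f95 -fast $OVERRIDES prog.f95"

# Print the command for review instead of running it.
echo "$cmd"
```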
Note that the set of options selected by the -fast flag can change with each compiler release. Invoking the compiler with -dryrun displays the -fast expansion:
<sparc>% f95 -dryrun -fast |& grep ###
###     command line files and options (expanded):
### -dryrun -xO5 -xarch=sparcvis2 -xcache=64/32/4:1024/64/4 -xchip=ultra3i -xdepend=yes -xpad=local -xvector=lib -dalign -fsimple=2 -fns=yes -ftrap=common -xlibmil -xlibmopt -fround=nearest
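To compare the expansion between compiler releases, it can help to list one option per line. The sketch below is illustrative only: it uses the sample output above as its input and assumes the options appear on a single line prefixed with ###. With a real compiler, a Bourne-compatible shell can produce the input with f95 -dryrun -fast 2>&1 | grep '###', since the |& form shown above is csh syntax.

```shell
#!/bin/sh
# Sample expansion copied from the example above; real output varies by
# release and platform.
sample='###     command line files and options (expanded):
### -dryrun -xO5 -xarch=sparcvis2 -xcache=64/32/4:1024/64/4 -xchip=ultra3i -xdepend=yes -xpad=local -xvector=lib -dalign -fsimple=2 -fns=yes -ftrap=common -xlibmil -xlibmopt -fround=nearest'

# Keep the line that starts with "### -", strip the prefix,
# put each option on its own line, and sort for stable comparison.
options=$(printf '%s\n' "$sample" | grep '^### -' | sed 's/^### //' | tr ' ' '\n' | sort)

printf '%s\n' "$options"
```

Saving this list once per release and comparing the files with diff shows which options were added or dropped.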