Sun Studio 12 Update 1: C++ User's Guide

A.2.167 -xprefetch[=a[,a...]]

Enable prefetch instructions on those architectures that support prefetch.

Explicit prefetching should only be used under special circumstances that are supported by measurements.

a must be one of the following values.

Table A–43 The -xprefetch Values

| Value | Meaning |
|---|---|
| auto | Enable automatic generation of prefetch instructions. |
| no%auto | Disable automatic generation of prefetch instructions. |
| explicit | (SPARC) Enable explicit prefetch macros. |
| no%explicit | (SPARC) Disable explicit prefetch macros. |
| latx:factor | Adjust the compiler’s assumed prefetch-to-load and prefetch-to-store latencies by the specified factor. You can only combine this flag with -xprefetch=auto. The factor must be a positive floating-point or integer number. |
| yes | Obsolete, do not use. Use -xprefetch=auto,explicit instead. |
| no | Obsolete, do not use. Use -xprefetch=no%auto,no%explicit instead. |

With -xprefetch and -xprefetch=auto the compiler is free to insert prefetch instructions into the code it generates. This may result in a performance improvement on architectures that support prefetch.

If you are running computationally intensive codes on large multiprocessors, you might find it advantageous to use -xprefetch=latx:factor. This option instructs the code generator to adjust the default latency time between a prefetch and its associated load or store by the specified factor.

The prefetch latency is the hardware delay between the execution of a prefetch instruction and the time the data being prefetched is available in the cache. The compiler assumes a prefetch latency value when determining how far apart to place a prefetch instruction and the load or store instruction that uses the prefetched data.


Note –

The assumed latency between a prefetch and a load may not be the same as the assumed latency between a prefetch and a store.
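To make the role of the latency factor concrete, the following C++ sketch models how a scaled latency could translate into the distance between a prefetch and its use. It is an illustrative model only, not the compiler's actual algorithm: the function names, the base latencies, and the cycles-per-iteration figure are all hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model, not the compiler's real heuristic.
// Scale an assumed base latency (in cycles) by the latx factor.
double scaled_latency(double base_latency_cycles, double factor) {
    assert(factor > 0.0);  // the option requires a positive factor
    return base_latency_cycles * factor;
}

// Number of loop iterations ahead of its use at which a prefetch would
// have to be issued so that the data arrives in time, given how many
// cycles one iteration takes.
int prefetch_distance(double base_latency_cycles, double factor,
                      double cycles_per_iteration) {
    assert(cycles_per_iteration > 0.0);
    double latency = scaled_latency(base_latency_cycles, factor);
    return static_cast<int>(std::ceil(latency / cycles_per_iteration));
}
```

With an assumed base latency of 100 cycles and an 8-cycle loop body, this model gives a distance of 13 iterations for a factor of 1.0, 25 for a factor of 2.0, and 7 for a factor of 0.5. Because the load and store latencies may differ, the same factor can produce different distances for loads and stores.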


The compiler tunes the prefetch mechanism for good performance across a wide range of machines and applications, so the default tuning may not be optimal for a particular application. For memory-intensive applications, especially applications intended to run on large multiprocessors, you may be able to obtain better performance by increasing the prefetch latency values. To increase the values, use a factor that is greater than 1 (one). A factor between 0.5 and 2.0 will most likely provide the maximum performance.

For applications with data sets that reside entirely within the external cache, you may be able to obtain better performance by decreasing the prefetch latency values. To decrease the values, use a factor that is less than 1 (one).

To use the -xprefetch=latx:factor option, start with a factor value near 1.0 and run performance tests against the application. Then increase or decrease the factor, as appropriate, and run the performance tests again. Continue adjusting the factor and running the performance tests until you achieve optimum performance. When you change the factor in small steps, you will typically see no performance difference for a few steps, then a sudden difference, after which performance levels off again.
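The tuning procedure can be organized as a small sweep. The C++ sketch below shows one possible shape for the bookkeeping: it generates evenly spaced candidate factors, formats the matching option string, and picks the fastest of a set of measured run times. The helper names are hypothetical, and the timings must come from your own performance tests; nothing here runs the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format the option for one latency factor,
// for example "-xprefetch=auto,latx:1.25".
std::string latx_option(double factor) {
    assert(factor > 0.0);  // the factor must be positive
    std::ostringstream os;
    os << "-xprefetch=auto,latx:" << std::fixed << std::setprecision(2)
       << factor;
    return os.str();
}

// Evenly spaced candidate factors from lo to hi, both inclusive.
std::vector<double> candidate_factors(double lo, double hi, int steps) {
    assert(lo > 0.0 && hi >= lo && steps >= 1);
    std::vector<double> out;
    for (int i = 0; i <= steps; ++i)
        out.push_back(lo + (hi - lo) * i / steps);
    return out;
}

// Index of the smallest measured run time (the first one on ties).
std::size_t fastest(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < seconds.size(); ++i)
        if (seconds[i] < seconds[best]) best = i;
    return best;
}
```

For example, candidate_factors(0.5, 2.0, 6) yields seven factors in steps of 0.25. Because performance tends to change in sudden jumps rather than smoothly, a coarse sweep followed by a finer one around the jump is usually enough.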

A.2.167.1 Defaults

The default is -xprefetch=auto,explicit. This default adversely affects applications that have essentially non-linear memory access patterns. Specify -xprefetch=no%auto,no%explicit to override the default.

The default of auto is assumed unless explicitly overridden with an argument of no%auto or an argument of no. For example, -xprefetch=explicit is the same as -xprefetch=explicit,auto.

The default of explicit is assumed unless explicitly overridden with an argument of no%explicit or an argument of no. For example, -xprefetch=auto is the same as -xprefetch=auto,explicit.

If only -xprefetch is specified, -xprefetch=auto,explicit is assumed.

If automatic prefetching is enabled, but a latency factor is not specified, then -xprefetch=latx:1.0 is assumed.

Interactions

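This option accumulates instead of overriding: when -xprefetch appears more than once on the command line, the values from each occurrence are combined rather than a later occurrence replacing an earlier one.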

The sun_prefetch.h header file provides the macros for specifying explicit prefetch instructions. The prefetch instructions are placed in the executable approximately where the macros appear in the source.

To use the explicit prefetch instructions, you must be on an architecture that supports them, include sun_prefetch.h, and either omit -xprefetch from the compiler command or use -xprefetch, -xprefetch=auto,explicit, or -xprefetch=explicit.

If you call the macros and include the sun_prefetch.h header file, but specify -xprefetch=no%explicit, the explicit prefetches will not appear in your executable.

The use of latx:factor is valid only when automatic prefetching is enabled. That is, latx:factor is ignored unless automatic prefetching is also in effect, as in -xprefetch=auto,latx:factor.
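The default and interaction rules above can be summarized as a small state model. The C++ sketch below is one reading of those rules, not the compiler's implementation: the type and function names are hypothetical, and the treatment of a bare -xprefetch that follows other occurrences is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical model of the resolved -xprefetch state.
struct PrefetchSettings {
    bool auto_on;      // automatic prefetch generation
    bool explicit_on;  // explicit prefetch macros
    double latx;       // requested latency factor, 1.0 if none given
    PrefetchSettings() : auto_on(true), explicit_on(true), latx(1.0) {}
    // latx:factor only has an effect while automatic prefetching is on.
    bool latx_applies() const { return auto_on; }
};

// Apply one value list (the text after "-xprefetch=").
// An empty string models a bare -xprefetch (assumed to mean auto,explicit).
void apply(PrefetchSettings& s, const std::string& values) {
    if (values.empty()) { s.auto_on = true; s.explicit_on = true; return; }
    std::string::size_type start = 0;
    while (start <= values.size()) {
        std::string::size_type comma = values.find(',', start);
        if (comma == std::string::npos) comma = values.size();
        const std::string v = values.substr(start, comma - start);
        if (v == "auto") s.auto_on = true;
        else if (v == "no%auto") s.auto_on = false;
        else if (v == "explicit") s.explicit_on = true;
        else if (v == "no%explicit") s.explicit_on = false;
        else if (v == "yes") { s.auto_on = true; s.explicit_on = true; }
        else if (v == "no") { s.auto_on = false; s.explicit_on = false; }
        else if (v.compare(0, 5, "latx:") == 0) {
            const double f = std::atof(v.c_str() + 5);
            assert(f > 0.0);  // the factor must be positive
            s.latx = f;
        }
        start = comma + 1;
    }
}

// Occurrences accumulate left to right, starting from the defaults.
PrefetchSettings resolve(const std::vector<std::string>& occurrences) {
    PrefetchSettings s;
    for (std::size_t i = 0; i < occurrences.size(); ++i)
        apply(s, occurrences[i]);
    return s;
}
```

In this model, -xprefetch=no%auto followed by -xprefetch=explicit leaves automatic prefetching disabled and explicit macros enabled, and a latx factor given together with no%auto is recorded but reported as having no effect.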

Warnings

Explicit prefetching should only be used under special circumstances that are supported by measurements.

Because the compiler tunes the prefetch mechanism for optimal performance across a wide range of machines and applications, you should only use -xprefetch=latx:factor when the performance tests indicate there is a clear benefit. The assumed prefetch latencies may change from release to release. Therefore, retesting the effect of the latency factor on performance whenever switching to a different release is highly recommended.