Sun Studio 12: C User's Guide

B.2.128 -xprefetch[=val[,val]]

Enable prefetch instructions on those architectures that support prefetch.

Explicit prefetching should only be used under special circumstances that are supported by measurements.

val must be one of the following:

Table B–35 The -xprefetch Flags

Flag             Meaning

latx:factor      Adjust the compiler’s assumed prefetch-to-load and prefetch-to-store latencies by the specified factor. You can only combine this flag with -xprefetch=auto. See B.2.128.1 Prefetch Latency Ratio.
[no%]auto        [Disable] Enable automatic generation of prefetch instructions
[no%]explicit    (SPARC) [Disable] Enable explicit prefetch macros
yes              Obsolete - do not use. Use -xprefetch=auto,explicit instead.
no               Obsolete - do not use. Use -xprefetch=no%auto,no%explicit instead.

The default is -xprefetch=auto,explicit. This default adversely affects applications that have essentially non-linear memory access patterns. Specify -xprefetch=no%auto,no%explicit to override the default.

The sun_prefetch.h header file provides the macros that you can use to specify explicit prefetch instructions. The compiler places each prefetch instruction in the executable at approximately the point that corresponds to where the macro appears in the source.
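The following sketch illustrates where an explicit prefetch typically sits relative to the load that uses the data. It does not use the macros from sun_prefetch.h. PREFETCH_READ is a hypothetical wrapper introduced only for this example, mapped to the GCC/Clang __builtin_prefetch builtin when that is available and to a no-op otherwise. PREFETCH_AHEAD is likewise an assumed distance, not a recommended value. With the Sun compiler you would substitute the appropriate macro from sun_prefetch.h.

```c
#include <stddef.h>

/* PREFETCH_READ is a hypothetical wrapper for this sketch, not a macro
 * from sun_prefetch.h. Replace its definition with the corresponding
 * macro from sun_prefetch.h when building with the Sun compiler. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)(addr))   /* no-op fallback */
#endif

/* Assumed prefetch distance in array elements; choose by measurement. */
#define PREFETCH_AHEAD 16

/* Sum an array, requesting each element PREFETCH_AHEAD iterations
 * before the load that reads it. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            PREFETCH_READ(&a[i + PREFETCH_AHEAD]);
        total += a[i];
    }
    return total;
}
```

A prefetch only requests data, so the function returns the same result whether or not the prefetch is honored. Any benefit has to be confirmed by measurement.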

B.2.128.1 Prefetch Latency Ratio

The prefetch latency is the hardware delay between the execution of a prefetch instruction and the time the data being prefetched is available in the cache.

The factor must be a positive number of the form n.n.

The compiler assumes a prefetch latency value when determining how far apart to place a prefetch instruction and the load or store instruction that uses the prefetched data. The assumed latency between a prefetch and a load may not be the same as the assumed latency between a prefetch and a store.
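As a rough illustration of how an assumed latency could translate into a prefetch distance, the sketch below uses a common textbook model: the number of loop iterations between the prefetch and its use is the scaled latency divided by the cost of one iteration, rounded up. This is not the compiler's documented algorithm. The function name, its parameters, and the cycle counts used with it are assumptions made for this example.

```c
/* Illustrative model only, not the compiler's actual heuristic.
 * base_latency_cycles:   assumed prefetch-to-load (or -store) latency
 * latx_factor:           the factor given to latx:factor
 * cycles_per_iteration:  assumed cost of one loop iteration
 * Returns the number of iterations ahead to prefetch, or 0 if any
 * argument is not positive. */
long prefetch_distance(double base_latency_cycles,
                       double latx_factor,
                       double cycles_per_iteration)
{
    double scaled;
    long distance;

    if (base_latency_cycles <= 0.0 || latx_factor <= 0.0 ||
        cycles_per_iteration <= 0.0)
        return 0;

    scaled = base_latency_cycles * latx_factor;

    /* Round up so the data has arrived by the time it is used. */
    distance = (long)(scaled / cycles_per_iteration);
    if ((double)distance * cycles_per_iteration < scaled)
        distance++;

    return distance;
}
```

Under this model, with an assumed 200-cycle latency and 10 cycles per iteration, a factor of 1.0 gives a distance of 20 iterations and a factor of 1.5 gives 30.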

The compiler tunes the prefetch mechanism for optimal performance across a wide range of machines and applications. This tuning may not always be optimal. For memory-intensive applications, especially applications intended to run on large multiprocessors, you may be able to obtain better performance by increasing the prefetch latency values. To increase the values, use a factor that is greater than 1 (one). A value between 0.5 and 2.0 will most likely provide the maximum performance.

For applications with data sets that reside entirely within the external cache, you may be able to obtain better performance by decreasing the prefetch latency values. To decrease the values, use a factor that is less than one.

To use the latx:factor suboption, start with a factor value near 1.0 and run performance tests against the application. Then increase or decrease the factor, as appropriate, and run the performance tests again. Continue adjusting the factor and running the performance tests until you achieve optimum performance. When you change the factor in small steps, you typically see no performance difference for a few steps, then a sudden change, after which performance levels off again.
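The tuning procedure above needs a repeatable measurement. The sketch below is one possible timing kernel, not part of the product. The function names, the stride-1 summation workload, and the file and output names in the comments are assumptions chosen for illustration. In practice you would time a representative part of your own application and keep every other compiler option fixed between builds.

```c
#include <stddef.h>
#include <time.h>

/* Example rebuilds, one per factor (file and output names are hypothetical):
 *   cc -xprefetch=auto,latx:0.8 bench.c -o bench_0.8
 *   cc -xprefetch=auto,latx:1.0 bench.c -o bench_1.0
 *   cc -xprefetch=auto,latx:1.2 bench.c -o bench_1.2
 */

/* Stride-1 reduction: a linear access pattern of the kind that
 * automatic prefetching targets. */
double sum_array(const double *data, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Run `repeats` passes over the array and return the CPU time used,
 * in seconds. The accumulated sum is stored in *sink so the work
 * cannot be optimized away. */
double time_sum(const double *data, size_t n, int repeats, double *sink)
{
    clock_t start = clock();
    double total = 0.0;
    int r;

    for (r = 0; r < repeats; r++)
        total += sum_array(data, n);

    *sink = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_sum on an array much larger than the external cache, record the time for each build, and compare the results across factor values.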