CHAPTER 2 - Sun Studio 10 New Features and Enhancements

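Sun Studio 10 replaces Sun Studio 9. New features in the Sun Studio 10 release include updates to the following compilers, libraries, and tools:

C Compiler
C++ Compiler
Fortran Compiler
Command-line Debugger dbx
OpenMP API
Interval Arithmetic
Sun Performance Library
dmake
Performance Analysis Tools
Integrated Development Environment (IDE)
Documentation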

In most sections, there is a table that lists the new features of that component. The table has two columns, where the left-hand column provides a short description of the feature, and the right-hand column has a longer description.

C Compiler

TABLE 2-1 C Compiler New Features
Feature	Description
OpenMP parallel programming API	The API is now enabled on 32-bit and 64-bit x86 based systems running the Solaris OS.
New `-xarch` option	`-xarch=amd64` specifies compilation for the 64-bit AMD instruction set. The C compiler now predefines `__amd64` and `__x86_64` when you specify `-xarch=amd64`.
New `-xtarget` option	`-xtarget=opteron` specifies the `-xarch`, `-xchip`, and `-xcache` settings for 32-bit AMD compilation.
New `-xregs` flag on x86 based systems	A new x86-only flag for the `-xregs` option, `-xregs=[no%]frameptr`, lets you use the frame-pointer register as an unallocated callee-saves register to increase the run-time performance of applications.
New `-Xarch=amd64` option for `lint`	The C utility `lint` now accepts a new option, `-Xarch=amd64`. See the `lint`(1) man page for more information.
`-xarch=generic64` on x86 based systems	The existing `-xarch=generic64` option now supports the x86 platform in addition to the traditional SPARC platform.
`-xipo` on x86 based systems	The `-xipo` option is now available on x86 based systems.

C++ Compiler

TABLE 2-2 C++ Compiler New Features
Feature	Description
OpenMP parallel programming API	The API is now enabled on 32-bit and 64-bit x86 based systems running the Solaris OS.
New `-xarch` option	`-xarch=amd64` specifies compilation for the 64-bit AMD instruction set. The C++ compiler now predefines `__amd64` and `__x86_64` when you specify `-xarch=amd64`.
New `-xtarget` option	`-xtarget=opteron` specifies the `-xarch`, `-xchip`, and `-xcache` settings for 32-bit AMD compilation.
New `-xregs` flag on x86 based systems	A new x86-only flag for the `-xregs` option, `-xregs=[no%]frameptr`, lets you use the frame-pointer register as an unallocated callee-saves register to increase the run-time performance of applications.
`-xarch=generic64` on x86 based systems	The existing `-xarch=generic64` option now supports the x86 platform in addition to the traditional SPARC platform.
`-xipo` on x86 based systems	The `-xipo` option is now available on x86 based systems.
Template-template parameters	You can specify a template definition with parameters that are themselves templates, rather than types or values. Recall that a template instantiated on a type is itself a type. For examples, see Examples of Template-Template Parameters.
Access rules for nested classes	In default mode, the C++ compiler in this release allows nested classes the same access to member classes that member functions have. For more information, see Nested Class Access Rules.
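The predefined macros listed in the tables above can be tested at compile time to select 64-bit x86 code paths. The following is a minimal sketch only; the function names `target_description` and `pointer_bits` are hypothetical and are not part of any Sun Studio header.

```cpp
#include <climits>
#include <string>

// Describe the target this translation unit was compiled for, based on the
// macros the tables say are predefined under -xarch=amd64.
std::string target_description() {
#if defined(__amd64) || defined(__x86_64)
    return "64-bit x86 (amd64)";
#else
    return "not amd64";
#endif
}

// Pointer width in bits, a quick cross-check of the data model in use.
constexpr unsigned pointer_bits() {
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}
```

Other compilers may predefine the same macros on 64-bit x86 targets, so a check like this says something about the target architecture, not about which compiler is in use.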

Examples of Template-Template Parameters

This section provides two code examples, one that does not use template-template parameters and one that does.

This example does not use template-template parameters because MyClass<int> is a type.

template<typename T> class MyClass { ... };

std::list< MyClass<int> > x;

In this example, class template C has a parameter that is a class template, and object x is an instance of C using class template A as its argument. Member y of x has type A<int>.

// ordinary class template
template<typename T> class A {
    T x;
};

// class template having a template parameter
template < template<typename U> class V > class C {
    V<int> y;
};

// instantiate C on template
C<A> x;
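A self-contained variant of the template-template example can be checked at compile time. This sketch makes the members public and adds a second member `z` purely for illustration; both are assumptions that go beyond the original example.

```cpp
#include <type_traits>

// Ordinary class template (mirrors A in the text).
template <typename T>
class A {
public:
    T x;
};

// Class template whose parameter is itself a class template (mirrors C).
template <template <typename U> class V>
class C {
public:
    V<int> y;     // V is instantiated here, inside C
    V<double> z;  // the same template argument reused with another type
};

// Instantiate C on the template A, not on a type such as A<int>.
C<A> x;

// Compile-time checks that the members have the types the text describes.
static_assert(std::is_same<decltype(x.y), A<int>>::value, "y should be A<int>");
static_assert(std::is_same<decltype(x.z), A<double>>::value, "z should be A<double>");
```

The `static_assert` lines require a compiler that supports C++11 or later; they are a convenience for this sketch and were not available in the compiler release described here.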

Nested Class Access Rules

The C++ compiler, in default standard mode, now allows nested classes to access private members of the enclosing class.

The C++ standard says that nested classes have no special access to members of the enclosing class. However, most people feel this restriction is not justified because member functions have access to private members, so member classes should too. In the following example, function foo tries to access a private member of class outer. According to the C++ standard, the function has no access unless it is declared a friend function:

class outer {
    int i; // private in outer
    class inner {
        int foo(outer* p) {
            return p->i; // invalid
        }
    };
};

The C++ Committee is in the process of adopting a change to the access rules giving the same access to member classes that member functions have. Many compilers have implemented this rule in anticipation of the changed language rule.

To restore the old compiler behavior, disallowing the access, use the compiler option -features=no%nestedaccess. The default is -features=nestedaccess.
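The relaxed access rule can be exercised with a small program. The sketch below is illustrative only: the constructor, the public `inner` class, and the helper `read_private` are assumptions added so that the access can be called from outside the class.

```cpp
class outer {
    int i;  // private in outer

public:
    explicit outer(int value) : i(value) {}

    class inner {
    public:
        // Reads a private member of the enclosing class. This is accepted
        // when nested classes have the same access as member functions.
        int foo(const outer* p) const {
            return p->i;
        }
    };
};

// Hypothetical helper: builds an outer object and reads its private member
// through the nested class.
int read_private(int value) {
    outer o(value);
    outer::inner in;
    return in.foo(&o);
}
```

Under the stricter rule (for example, when compiling with -features=no%nestedaccess), the `return p->i;` line would be expected to be rejected unless `inner` is declared a friend of `outer`.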

Fortran Compiler

TABLE 2-3 Fortran Compiler New Features
Feature	Description
OpenMP parallel programming API	The API is now enabled on 32-bit and 64-bit x86 based systems running the Solaris OS.
New `-xarch` option	`-xarch=amd64` specifies compilation for the 64-bit AMD instruction set. The Fortran compiler now predefines `__amd64` and `__x86_64` when you specify `-xarch=amd64`.
New `-xtarget` option	`-xtarget=opteron` specifies the `-xarch`, `-xchip`, and `-xcache` settings for 32-bit AMD compilation.
`-xarch=generic64` on x86 based systems	The existing `-xarch=generic64` option now supports the x86 platform in addition to the traditional SPARC platform.
`-xipo` on x86 based systems	The `-xipo` option is now available on x86 based systems.
Binary (unformatted) file sharing between big-endian and little-endian platforms	A new compiler flag `-xfilebyteorder` provides support of binary I/O files when moving between SPARC based systems and x86 based systems. The flag identifies the byte-order and byte-alignment of unformatted I/O files. For more information, see Binary File Sharing Between Big-endian and Little-endian Platforms.

Binary File Sharing Between Big-endian and Little-endian Platforms

A new compiler flag -xfilebyteorder provides support of binary I/O files when moving between SPARC based systems and x86 based systems. The flag identifies the byte-order and byte-alignment of unformatted I/O files.

-xfilebyteorder={[littlemax_align:{%all,unitno,filename}],[bigmax_align:{%all,unitno,filename}],[native:{%all,unitno,filename}]}:

max_align

Maximum byte alignment for the target platform. Values are 1, 2, 4, 8, and 16. The alignment applies to Fortran VAX structures and Fortran 95 derived types which use platform-dependent alignments for compatibility with C structures.

littlemax_align:{%all,unitno,filename}

List of files or unit numbers that are "little-endian" files used on a system where the maximum byte alignment is max_align. For example, little4 describes a 32-bit x86 file while little16 describes a 64-bit x86 file.

bigmax_align:{%all,unitno,filename}

List of files or unit numbers that are "big-endian" files used on a system where the maximum byte alignment is max_align.

native:{%all,unitno,filename}

List of files or unit numbers that are native files of the same byte order and alignment used by the compiling processor system.

%all

Specifies all files and logical units except those opened as "SCRATCH" or named explicitly in this option. Can be used to describe default files not explicitly listed by this flag. %all can only appear once.

unitno

Fortran logical unit number opened by the program.

filename

Fortran file name opened by the program.

This option does not apply to files opened with STATUS=scratch. I/O operations done on these files are always with the byte-order and byte-alignment of the native processor.

The first default, when -xfilebyteorder is not specified on the compiler command line, is -xfilebyteorder=native:%all. The option must be specified with at least one argument. That is, at least one of the little:, big:, or native: parameters must be present.

Files not explicitly declared by this flag are assumed to be native files. For example, compiling with -xfilebyteorder=little4:zfile.out declares zfile.out to be a little-endian 32-bit x86 file with a 4-byte maximum data alignment rule, and all other files are native files.

When the byte-order specified for a file is the same as the native processor but a different alignment is specified, the appropriate padding will be used even though no byte swapping is done. For example, this would be the case when compiling with -xarch=amd64 for 64-bit x86 and -xfilebyteorder=little4:filename is specified.
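The effect of the maximum alignment on padding can be illustrated with a little arithmetic. The sketch below assumes that each field is aligned to the smaller of its natural alignment and max_align, as in common C structure layouts; the rule the Fortran compiler actually applies is not spelled out here, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Offset at which a field of natural alignment `natural` is placed when it
// follows `offset` bytes of earlier data, under the cap `max_align`.
// Assumption: effective alignment = min(natural, max_align).
std::size_t aligned_offset(std::size_t offset, std::size_t natural,
                           std::size_t max_align) {
    const std::size_t align = std::min(natural, max_align);
    return (offset + align - 1) / align * align;  // round up to a multiple
}

// Offset of an 8-byte real that follows a single 4-byte integer in a record.
std::size_t real8_offset_after_int4(std::size_t max_align) {
    return aligned_offset(4, 8, max_align);
}
```

Under these assumptions, a derived type holding a 4-byte integer followed by an 8-byte real places the real at offset 4 in a little4 file and at offset 8 in a little16 file. Both files are little-endian, so the difference is four bytes of padding, with no byte swapping involved.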

The declared types in data records shared between big-endian and little-endian platforms must have the same sizes. For example, a file produced by a SPARC executable compiled with -xtypemap=integer:64,real:64,double:128 cannot be read by an x86 executable compiled with -xtypemap=integer:64,real:64,double:64 since the default double precision data types will have different sizes.

Shared I/O files must not contain VAX UNION/MAP data structures since it is not possible for the compiler to know how the UNION data should be interpreted. Declaring a file containing UNION data with the -xfilebyteorder flag will result in a runtime error.
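Conceptually, reading a big-endian file on a little-endian system means reversing the bytes of each data item. The following sketch shows that conversion for a 32-bit integer. It illustrates the idea only and is not the Fortran runtime library's implementation; both function names are hypothetical.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit value (big-endian <-> little-endian).
std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Interpret four bytes read from a big-endian file as an unsigned 32-bit
// integer, independent of the byte order of the host.
std::uint32_t from_big_endian(const unsigned char bytes[4]) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}
```

A conversion of this kind needs to know where each item starts and how large it is, which is consistent with the restriction on UNION/MAP data, where the layout of the bytes is ambiguous.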

Command-line Debugger dbx

TABLE 2-4 `dbx` New Features
Feature	Description
AMD64 architecture support	64-bit `dbx` now supports the AMD64 architecture.

As in Sun Studio software for SPARC based systems, Sun Studio software for x86 based systems includes two dbx binaries, a 32-bit dbx that can debug 32-bit programs only, and a 64-bit dbx that can debug both 32-bit and 64-bit programs.

When you start dbx, it determines which of its binaries to execute. On the 64-bit Solaris OS, the 64-bit dbx is the default.

OpenMP API

TABLE 2-5 OpenMP API New Features
Feature	Description
Availability on x86 based systems running the Solaris 10 OS.	The same OpenMP API features already available for the Solaris OS on SPARC based systems are now available with the Sun Studio compilers on 32-bit or 64-bit x86 based systems running the Solaris 10 OS.
`libmtsk`	The multitasking library, `libmtsk`, is now a shared library and is part of the Solaris 10 OS.
Nested parallelism	Nested parallelism is supported in this release. It is disabled by default, and requires that you set the `OMP_NESTED` environment variable or make a runtime call to the `omp_set_nested()` function to enable it. With nested parallelism enabled, calls to most `omp_` functions made from within a parallel region will not be ignored. Calls to adjust the parallel environment (for example, `omp_set_num_threads()` or `omp_set_dynamic()`) affect only the subsequent parallel regions at the same or inner nesting level encountered by the thread.
Default behavior for threads	The default behavior for threads is now SLEEP. The previous default was SPIN. To restore the previous behavior, use `SUNW_MP_THR_IDLE=SPIN`.
`SUNW_MP_NUM_POOL_THREADS` environment variable	`SUNW_MP_NUM_POOL_THREADS` specifies the size (maximum number of threads) of the thread pool. The thread pool contains only non-user threads, that is, threads that the `libmtsk` library creates. It does not include user threads such as the main thread. Setting `SUNW_MP_NUM_POOL_THREADS` to 0 forces the thread pool to be empty, and all parallel regions will be executed by one thread. The value specified should be a non-negative integer. The default value is 1023. This environment variable can prevent a single process from creating too many threads, which might happen, for example, with recursively nested parallel regions.
`SUNW_MP_MAX_NESTED_LEVELS` environment variable	`SUNW_MP_MAX_NESTED_LEVELS` specifies the maximum depth of active parallel regions. Any parallel region that has an active nested depth greater than `SUNW_MP_MAX_NESTED_LEVELS` will be executed by a single thread. The value should be a positive integer. The default is 4. The outermost parallel region has a depth level of 1.
`SUNW_MP_GUIDED_WEIGHT` environment variable	`SUNW_MP_GUIDED_WEIGHT` sets the weighting value used by `libmtsk` for loops with the `GUIDED` schedule. `libmtsk` uses the following formula to compute the chunk sizes for `GUIDED` loops: chunk_size=num_unassigned_iterations/(weight*num_threads) where num_unassigned_iterations is the number of iterations in the loop that have not yet been assigned to any thread, weight is a floating-point constant (default 2.0 in this release, 1.0 previously), and num_threads is the number of threads used to execute the loop. The value specified for `SUNW_MP_GUIDED_WEIGHT` must be a positive, non-zero floating-point constant. `libmtsk` will use that value as weight in the `GUIDED` chunk size calculation.
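The chunk-size formula for `GUIDED` loops in the table above can be traced with a short simulation. This is a sketch under stated assumptions: the table does not say how `libmtsk` rounds the result, so the code rounds up, and the function name `guided_chunks` is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Successive chunk sizes for a GUIDED loop, following the formula
//   chunk_size = num_unassigned_iterations / (weight * num_threads)
// Assumption (not from the source): the result is rounded up, so every
// chunk contains at least one iteration.
std::vector<long> guided_chunks(long iterations, int num_threads, double weight) {
    std::vector<long> chunks;
    long unassigned = iterations;
    while (unassigned > 0) {
        long chunk = static_cast<long>(
            std::ceil(unassigned / (weight * num_threads)));
        if (chunk > unassigned) chunk = unassigned;  // never hand out too much
        chunks.push_back(chunk);
        unassigned -= chunk;
    }
    return chunks;
}
```

For 100 iterations on 4 threads, this sketch yields a first chunk of 13 iterations with the new default weight of 2.0 and 25 iterations with the previous weight of 1.0. A larger weight therefore produces smaller chunks, and more of them.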

Interval Arithmetic

Sun Performance Library

TABLE 2-6 Sun Performance Library New Features
Feature	Description
64-bit Solaris OS support	This release of Sun Performance Library includes support for the 64-bit Solaris OS on x86 based systems.

The 64-bit x86 version of Sun Performance Library is functionally identical to the SPARC v9 version, with the following exceptions:

To link with the high performance amd64 optimized library, use the -xarch=amd64 flag. For example:

dmake

TABLE 2-7 dmake New Features
Feature	Description
New `DMAKE_OUTPUT_MODE` environment variable	A new environment variable or makefile macro, `DMAKE_OUTPUT_MODE`, allows the format of the log file to be changed. By default, or when `DMAKE_OUTPUT_MODE` is set to `TXT1`, `dmake` prints additional lines of system information to the log file, and commands with output are repeated. When `DMAKE_OUTPUT_MODE` is set to `TXT2`, the system information is omitted and commands are never repeated. For details, refer to the `ENVIRONMENT/MACROS` section of the `dmake`(1) man page. (Note that the environment variable is incorrectly described in the man page; the correct values for `DMAKE_OUTPUT_MODE` are `TXT1` and `TXT2`.)
Unix2003 compliance	You can force Unix2003 compliance by setting `DMAKE_COMPAT_MODE=POSIX`.
Grid engine support	Specify grid engine support by setting `DMAKE_MODE=grid`.
Control of system overloading	Control system overloading with `DMAKE_ADJUST_MAX_JOBS`.
Improvements to memory usage	Improvements to memory usage are included in this release.

Performance Analysis Tools

TABLE 2-8 Performance Analysis Tools New Features
Feature	Description
Changes to experiment format	Changes have been made to the experiment format. The log now has an entry that gives the size of the targets in bits. Also, the version has changed from 9.1 to 9.2, so new experiments are not readable by older tools, but older experiments are readable using Sun Studio 10 tools.
`er_kernel` utility	A new `er_kernel` utility is now available on the Solaris 10 OS only. DTrace permissions are required to use this `er_kernel` utility.
Increased precision for performance metrics	The precision for percentage metrics in the Performance Analyzer and the `er_print` utility has increased from one to two decimal places.
Direct editing of the experiment Notes file	Direct editing of the experiment Notes file has been added to the Performance Analyzer.
New options to display function names	New options to display function names are now available in the Performance Analyzer and `er_print` command.
Enhanced metrics selection	Metrics selection has been enhanced in the Performance Analyzer. You can select or clear the display of all metrics at once.
Collector GUI changes	The menu used for following descendants has been moved to the Collect Experiment tab. In addition to the on and off options, the menu now supports the all option and extended hardware counter overflow profiling features.
Enhancements to hardware counter overflow profiling	Hardware counter overflow profiling has been enhanced to work with larger numbers of processors, including x86-based processors. The enhancement is available using the `collect -h` command, the `collector hwprofile` command in `dbx`, and the Performance Analyzer GUI.
New `appendfile` option	The `appendfile` option has been added to the `er_print` utility. This option allows output from the `er_print` utility to be appended to the end of an existing file.
Change in default behavior of `er_src` utility	The default behavior of the `er_src` utility has changed to be the same behavior as the following command: `er_src -source all -1 object`.
J2SE technology location	The Performance Analyzer and `collect` utility now use the default location of the J2SE technology where the product installer has installed it.
New `collect -J java_args` option	The `collect -J java_args` option provides a means of passing flag arguments to the Java installation being used for profiling.
Sampling behavior changes during pause and resume	Sample data is generated prior to a pause and following a resume, but not when the collector is paused.
Pseudo function for JVM functions	The name of the pseudo function for Java Virtual Machine (JVM)^[1] functions in Java Mode has been changed from `<JVM-Overhead>` to `<JVM-System>`.
`<Unknown>` subtypes	The names of the `<Unknown>` subtypes of Java functions have been changed to be more comprehensible.
`.er.rc` file paths	The paths of processed `.er.rc` files are now displayed in the Error/Warning Logs window for the Performance Analyzer and the `stderr` for the `er_print` and `er_src` utilities.
`JDK_1_4_2_HOME` environment variable	The environment variable `JDK_1_4_2_HOME`, which used to define the Java path to be used for data collection, is now obsolete.
Heap profiling	The heap profiling for Java programs is now obsolete since it will not be supported in JVM 1.5.
Extended options for `collect -j`	The `collect` utility will accept the values on or off and also a path to the Java installation to use for profiling.

Integrated Development Environment (IDE)

TABLE 2-9 IDE New Features
Feature	Description
Script execution capability	You can now execute scripts directly from the IDE.
ss_attach on Linux operating system	The ss_attach feature is now available in Sun Studio software running on the Linux operating system.

[1] The terms "Java Virtual Machine" and "JVM" mean a Virtual Machine for the Java platform.
