CHAPTER 2

Sun Studio 10 New Features and Enhancements

Sun™ Studio 10 replaces Sun™ Studio 9. New features in the Sun Studio 10 release include updates to the compilers, libraries, and tools described in the following sections.

In most sections, there is a table that lists the new features of that component. The table has two columns, where the left-hand column provides a short description of the feature, and the right-hand column has a longer description.



Note - To find the Sun Studio 10 documentation described in this chapter, see the documentation index installed with the product software at /opt/SUNWspro/docs/index.html. If your software is not installed in the /opt directory, contact your system administrator for the equivalent path on your system or network.




C Compiler


TABLE 2-1 C Compiler New Features

Feature

Description

OpenMP parallel programming API

The API is now enabled on 32-bit and 64-bit x86 based systems running the Solaris OS.

New -xarch option

-xarch=amd64 specifies compilation for the 64-bit AMD instruction set. The C compiler now predefines __amd64 and __x86_64 when you specify -xarch=amd64.

New -xtarget option

-xtarget=opteron specifies the -xarch, -xchip, and -xcache settings for 32-bit AMD compilation.

New -xregs flag on x86 based systems

A new x86-only flag for the -xregs option, -xregs=[no%]frameptr, lets you use the frame-pointer register as an unallocated callee-saves register to increase the run-time performance of applications.

New -Xarch=amd64 option for lint

The C utility lint now accepts a new option -Xarch=amd64. See the lint(1) man page for more information.

-xarch=generic64 on x86 based systems

The existing -xarch=generic64 option now supports the x86 platform in addition to the traditional SPARC platform.

-xipo on x86 based systems

The -xipo option is now available on x86 based systems.




Note - You must specify -xarch=amd64 to the right of -fast and -xtarget on the command line to generate 64-bit code. For example, specify cc -fast -xarch=amd64 or cc -xtarget=opteron -xarch=amd64. The new -xtarget=opteron option does not automatically generate 64-bit code. It expands to -xarch=sse2, -xchip=opteron, and -xcache=64/64/2:1024/64/16, which results in 32-bit code. The -fast option also results in 32-bit code because it is a macro which also defines -xtarget=native.




C++ Compiler


TABLE 2-2 C++ Compiler New Features

Feature

Description

OpenMP parallel programming API

The API is now enabled on 32-bit and 64-bit x86 based systems running the Solaris OS.

New -xarch option

-xarch=amd64 specifies compilation for the 64-bit AMD instruction set. The C++ compiler now predefines __amd64 and __x86_64 when you specify -xarch=amd64.

New -xtarget option

-xtarget=opteron specifies the -xarch, -xchip, and -xcache settings for 32-bit AMD compilation.

New -xregs flag on x86 based systems

A new x86-only flag for the -xregs option, -xregs=[no%]frameptr, lets you use the frame-pointer register as an unallocated callee-saves register to increase the run-time performance of applications.

-xarch=generic64 on x86 based systems

The existing -xarch=generic64 option now supports the x86 platform in addition to the traditional SPARC platform.

-xipo on x86 based systems

The -xipo option is now available on x86 based systems.

Template-template parameters

You can specify a template definition with parameters that are themselves templates, rather than types or values. Recall that a template instantiated on a type is itself a type. For examples, see Examples of Template-Template Parameters.

Access rules for nested classes

In default mode, the C++ compiler in this release allows nested classes the same access to member classes that member functions have. For more information, see Nested Class Access Rules.




Note - You must specify -xarch=amd64 to the right of -fast and -xtarget on the command line to generate 64-bit code. For example, specify CC -fast -xarch=amd64 or CC -xtarget=opteron -xarch=amd64. The new -xtarget=opteron option does not automatically generate 64-bit code. It expands to -xarch=sse2, -xchip=opteron, and -xcache=64/64/2:1024/64/16, which results in 32-bit code. The -fast option also results in 32-bit code because it is a macro which also defines -xtarget=native.



Examples of Template-Template Parameters

This section provides two code examples, one that does not use template-template parameters and one that does.

This example does not use template-template parameters because MyClass<int> is a type.


#include <list>

template<typename T> class MyClass { /* ... */ };
std::list< MyClass<int> > x;

In this example, class template C has a parameter that is a class template, and object x is an instance of C using class template A as its argument. Member y of C has type A<int>.


// ordinary class template
template<typename T> class A {
    T x;
};
// class template having a template parameter
template < template<typename U> class V > class C {
    V<int> y;
};
// instantiate C on template
C<A> x;

Nested Class Access Rules

The C++ compiler, in default standard mode, now allows nested classes to access private members of the enclosing class.

The C++ standard says that nested classes have no special access to members of the enclosing class. However, most people feel this restriction is not justified because member functions have access to private members, so member classes should too. In the following example, function foo tries to access a private member of class outer. According to the C++ standard, the function has no access unless it is declared a friend function:


class outer {
    int i; // private in outer
    class inner {
        int foo(outer* p) {
            return p->i; // invalid
        }
    };
};

The C++ Committee is in the process of adopting a change to the access rules giving the same access to member classes that member functions have. Many compilers have implemented this rule in anticipation of the changed language rule.

To restore the old compiler behavior, disallowing the access, use the compiler option -features=no%nestedaccess. The default is -features=nestedaccess.


Fortran Compiler


TABLE 2-3 Fortran Compiler New Features

Feature

Description

OpenMP parallel programming API

The API is now enabled on 32-bit and 64-bit x86 based systems running the Solaris OS.

New -xarch option

-xarch=amd64 specifies compilation for the 64-bit AMD instruction set. The Fortran compiler now predefines __amd64 and __x86_64 when you specify -xarch=amd64.

New -xtarget option

-xtarget=opteron specifies the -xarch, -xchip, and -xcache settings for 32-bit AMD compilation.

-xarch=generic64 on x86 based systems

The existing -xarch=generic64 option now supports the x86 platform in addition to the traditional SPARC platform.

-xipo on x86 based systems

The -xipo option is now available on x86 based systems.

 

Binary (unformatted) file sharing between big-endian and little-endian platforms

A new compiler flag, -xfilebyteorder, supports sharing binary I/O files between SPARC based systems and x86 based systems. The flag identifies the byte order and byte alignment of unformatted I/O files. For more information, see Binary File Sharing Between Big-endian and Little-endian Platforms.




Note - You must specify -xarch=amd64 to the right of -fast and -xtarget on the command line to generate 64-bit code. For example, specify f95 -fast -xarch=amd64 or f95 -xtarget=opteron -xarch=amd64. The new -xtarget=opteron option does not automatically generate 64-bit code. It expands to -xarch=sse2, -xchip=opteron, and -xcache=64/64/2:1024/64/16, which results in 32-bit code. The -fast option also results in 32-bit code because it is a macro which also defines -xtarget=native.



Binary File Sharing Between Big-endian and Little-endian Platforms

A new compiler flag, -xfilebyteorder, supports sharing binary I/O files between SPARC based systems and x86 based systems. The flag identifies the byte order and byte alignment of unformatted I/O files.

The syntax of the flag is:

-xfilebyteorder={[littlemax_align:{%all,unitno,filename}],[bigmax_align:{%all,unitno,filename}],[native:{%all,unitno,filename}]}


max_align

 

Maximum byte alignment for the target platform. Values are 1, 2, 4, 8, and 16. The alignment applies to Fortran VAX structures and Fortran 95 derived types which use platform-dependent alignments for compatibility with C structures.

littlemax_align:{%all,unitno,filename}

List of files or unit numbers that are "little-endian" files used on a system where the maximum byte alignment is max_align. For example, little4 describes a 32-bit x86 file while little16 describes a 64-bit x86 file.

bigmax_align:{%all,unitno,filename}

 

List of files or unit numbers that are "big-endian" files used on a system where the maximum byte alignment is max_align.

native:{%all,unitno,filename}

 

List of files or unit numbers that are native files, with the same byte order and alignment as the compiling processor platform.

%all

Specifies all files and logical units except those opened as "SCRATCH" or named explicitly in this option. Can be used to describe default files not explicitly listed by this flag. %all can only appear once.

unitno

Fortran logical unit number opened by the program.

filename

Fortran file name opened by the program.


This option does not apply to files opened with STATUS="SCRATCH". I/O operations on these files always use the byte order and byte alignment of the native processor.

When -xfilebyteorder is not specified on the compiler command line, the default is -xfilebyteorder=native:%all. When the option is specified, it must have at least one argument. That is, at least one of the littlemax_align:, bigmax_align:, or native: parameters must be present.

Files not explicitly declared by this flag are assumed to be native files. For example, compiling with -xfilebyteorder=little4:zfile.out declares zfile.out to be a little-endian 32-bit x86 file with a 4-byte maximum data alignment rule, and all other files are native files.

When the byte-order specified for a file is the same as the native processor but a different alignment is specified, the appropriate padding will be used even though no byte swapping is done. For example, this would be the case when compiling with -xarch=amd64 for 64-bit x86 and -xfilebyteorder=little4:filename is specified.

The declared types in data records shared between big-endian and little-endian platforms must have the same sizes. For example, a file produced by a SPARC executable compiled with -xtypemap=integer:64,real:64,double:128 cannot be read by an x86 executable compiled with -xtypemap=integer:64,real:64,double:64 since the default double precision data types will have different sizes.

Shared I/O files must not contain VAX UNION/MAP data structures since it is not possible for the compiler to know how the UNION data should be interpreted. Declaring a file containing UNION data with the -xfilebyteorder flag will result in a runtime error.


Command-line Debugger dbx


TABLE 2-4 dbx New Features

Feature

Description

AMD64 architecture support

64-bit dbx now supports the AMD64 architecture.

 


As in Sun Studio software for SPARC based systems, Sun Studio software for x86 based systems includes two dbx binaries, a 32-bit dbx that can debug 32-bit programs only, and a 64-bit dbx that can debug both 32-bit and 64-bit programs.

When you start dbx, it determines which of its binaries to execute. On the 64-bit Solaris OS, the 64-bit dbx is the default.


OpenMP API


TABLE 2-5 OpenMP API New Features

Feature

Description

Availability on x86 based systems running the Solaris 10 OS

The same OpenMP API features already available for the Solaris OS on SPARC based systems are now available with the Sun Studio compilers on 32-bit or 64-bit x86 based systems running the Solaris 10 OS.

libmtsk

The multitasking library, libmtsk, is now a shared library and is part of the Solaris 10 OS.

Nested parallelism

Nested parallelism is supported in this release. It is disabled by default. To enable it, set the OMP_NESTED environment variable or make a runtime call to the omp_set_nested() function. With nested parallelism enabled, calls to most omp_ functions made from within a parallel region are not ignored. Calls to adjust the parallel environment (for example, omp_set_num_threads() or omp_set_dynamic()) affect only the subsequent parallel regions at the same or inner nesting level encountered by the thread.

Default behavior for threads

The default behavior for threads is now SLEEP. The previous default was SPIN. To restore the previous behavior, use SUNW_MP_THR_IDLE=SPIN.

SUNW_MP_NUM_POOL_THREADS environment variable

SUNW_MP_NUM_POOL_THREADS specifies the size (maximum number of threads) of the thread pool. The thread pool contains only non-user threads--threads that the libmtsk library creates. It does not include user threads such as the main thread. Setting SUNW_MP_NUM_POOL_THREADS to 0 forces the thread pool to be empty and all parallel regions will be executed by one thread. The value specified should be a non-negative integer. The default value is 1023. This environment variable can prevent a single process from creating too many threads, which is something that might happen, for example, with recursively nested parallel regions.

SUNW_MP_MAX_NESTED_LEVELS environment variable

SUNW_MP_MAX_NESTED_LEVELS specifies the maximum depth of active parallel regions. Any parallel region that has an active nested depth greater than SUNW_MP_MAX_NESTED_LEVELS will be executed by a single thread. The value should be a positive integer. The default is 4. The outermost parallel region has a depth level of 1.

SUNW_MP_GUIDED_WEIGHT environment variable

SUNW_MP_GUIDED_WEIGHT sets the weighting value used by libmtsk for loops with the GUIDED schedule. libmtsk uses the following formula to compute the chunk sizes for GUIDED loops:

chunk_size = num_unassigned_iterations / (weight * num_threads)

where num_unassigned_iterations is the number of iterations in the loop that have not yet been assigned to any thread, weight is a floating-point constant (default 2.0 in this release, 1.0 previously), and num_threads is the number of threads used to execute the loop. The value specified for SUNW_MP_GUIDED_WEIGHT must be a positive floating-point constant. libmtsk will use that value as weight in the GUIDED chunk size calculation.



Interval Arithmetic

There are no new interval arithmetic features in this release.


Sun Performance Library


TABLE 2-6 Sun Performance Library New Features

Feature

Description

64-bit Solaris OS support

This release of Sun Performance Library includes support for the 64-bit Solaris OS on x86 based systems.


The 64-bit x86 version of Sun Performance Library is functionally identical to the SPARC v9 version, with the following exceptions:

To link with the high performance amd64 optimized library, use the -xarch=amd64 flag. For example:

f95 -xarch=amd64 example.f -xlic_lib=sunperf


dmake


TABLE 2-7 dmake New Features

Feature

Description

New DMAKE_OUTPUT_MODE environment variable

 

A new environment variable or makefile macro, DMAKE_OUTPUT_MODE, allows the format of the log file to be changed. By default, or when DMAKE_OUTPUT_MODE is set to TXT1, dmake prints additional lines of system information to the log file, and commands with output are repeated. When DMAKE_OUTPUT_MODE is set to TXT2, the system information is omitted and commands are never repeated. For details, refer to the ENVIRONMENT/MACROS section of the dmake(1) man page. (Note that the environment variable is incorrectly described in the man page; the correct values for DMAKE_OUTPUT_MODE are TXT1 and TXT2.)

Unix2003 compliance

You can force Unix2003 compliance by setting DMAKE_COMPAT_MODE=POSIX.

Grid engine support

Specify grid engine support by setting DMAKE_MODE=grid.

Control of system overloading

Control system overloading with DMAKE_ADJUST_MAX_JOBS.

Improvements to memory usage

Improvements to memory usage are included in this release.



Performance Analysis Tools


TABLE 2-8 Performance Analysis Tools New Features

Feature

Description

Changes to experiment format

Changes have been made to the experiment format. The log now has an entry that gives the size of the targets in bits. Also, the version has changed from 9.1 to 9.2, so new experiments are not readable by older tools, but older experiments are readable using Sun Studio 10 tools.

er_kernel utility

A new er_kernel utility is available on the Solaris 10 OS only. Using it requires DTrace permissions.

Increased precision for performance metrics

The precision for percentage metrics in the Performance Analyzer and the er_print utility has increased from one to two decimal places.

Direct editing of the experiment Notes file

Direct editing of the experiment Notes file has been added to the Performance Analyzer.

New options to display function names

New options to display function names are now available in the Performance Analyzer and er_print command.

Enhanced metrics selection

Metrics selection has been enhanced in the Performance Analyzer. You can select or clear the display of all metrics at once.

Collector GUI changes

The menu used for following descendants has been moved to the Collect Experiment tab. In addition to the on and off options, the menu now supports the all option and extended hardware counter overflow profiling features.

Enhancements to hardware counter overflow profiling

Hardware counter overflow profiling has been enhanced to work with larger numbers of processors, including x86-based processors. The enhancement is available using the collect -h command, the collector hwprofile command in dbx, and the Performance Analyzer GUI.

New appendfile option

The appendfile option has been added to the er_print utility. This option allows output from the er_print utility to be appended to the end of an existing file.

Change in default behavior of er_src utility

The default behavior of the er_src utility has changed to be the same behavior as the following command: er_src -source all -1 object.

J2SE technology location

By default, the Performance Analyzer and the collect utility now use the J2SE technology in the location where the product installer installed it.

New collect -J java_args option

The collect -J java_args option provides a means of passing flag arguments to the Java installation being used for profiling.

Sampling behavior changes during pause and resume

Sample data is generated prior to a pause and following a resume, but not when the collector is paused.

Pseudo function for JVM functions

The name of the pseudo function for Java Virtual Machine (JVM)[1] functions in Java Mode has been changed from <JVM-Overhead> to <JVM-System>.

<Unknown> subtypes

The names of the <Unknown> subtypes of Java functions have been changed to be easier to understand.

.er.rc file paths

The paths of processed .er.rc files are now displayed in the Error/Warning Logs window for the Performance Analyzer and the stderr for the er_print and er_src utilities.

JDK_1_4_2_HOME environment variable

The environment variable JDK_1_4_2_HOME, which used to define the Java path to be used for data collection, is now obsolete.

Heap profiling

Heap profiling for Java programs is now obsolete because it will not be supported in JVM 1.5.

Extended options for collect -j

The collect -j option accepts the values on or off, and also accepts a path to the Java installation to use for profiling.



Integrated Development Environment (IDE)


TABLE 2-9 IDE New Features

Feature

Description

Script execution capability

You can now execute scripts directly from the IDE.

ss_attach on Linux operating system

The ss_attach feature is now available in Sun Studio software running on the Linux operating system.



Documentation

See the Latest News page on the developer portal at http://developers.sun.com/prodtech/cc/support_index.html for information that updates the Sun Studio 10 documentation.


[1] The terms "Java Virtual Machine" and "JVM" mean a Virtual Machine for the Java™ platform.