Sun Studio 12 Update 1: Performance Analyzer

Explicit Multithreading

A simple program executes in a single thread, on a single LWP (lightweight process) in the Solaris OS. Multithreaded executables call a thread creation function and pass it the target function to execute. When the target function exits, the thread is destroyed.

The Solaris OS supports two thread implementations: Solaris threads and POSIX threads (Pthreads). Beginning with the Solaris 10 OS, both thread implementations are included in libc.so.

With Solaris threads, newly created threads begin execution at a function called _thread_start(), which calls the function passed in the thread creation call. For any call stack involving the target as executed by this thread, the top of the stack is _thread_start(), and there is no connection to the caller of the thread creation function. Inclusive metrics associated with the created thread therefore propagate up only as far as _thread_start() and the <Total> function. In addition to creating the threads, the Solaris threads implementation also creates LWPs to execute them. Each thread is bound to a specific LWP.

Pthreads is available in the Solaris 10 OS as well as in the Linux OS for explicit multithreading.

In both environments, to create a new thread, the application calls the Pthread API function pthread_create(), passing a pointer to an application-defined start routine as one of the function arguments.
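The call described above can be sketched as follows. This is a minimal illustration, not code from the Performance Analyzer documentation: the names worker() and run_worker() are hypothetical, and the only library calls used are the standard pthread_create() and pthread_join().

```c
#include <pthread.h>

/* Application-defined start routine (hypothetical name): this is the
   "target" function that the new thread executes. */
static void *worker(void *arg)
{
    int *value = arg;
    *value *= 2;    /* do some work on the caller's data */
    return arg;     /* returning from the start routine ends the thread */
}

/* Hypothetical helper: creates one thread that runs worker() on a copy of
   input, waits for it to finish, and stores the doubled value in *result.
   Returns 0 on success or the nonzero error code from the Pthread call. */
int run_worker(int input, int *result)
{
    pthread_t tid;
    int value = input;
    int rc;

    /* The third argument is the pointer to the start routine;
       the fourth is the argument passed to it. */
    rc = pthread_create(&tid, NULL, worker, &value);
    if (rc != 0)
        return rc;

    /* Wait for the thread so that value is safe to read. */
    rc = pthread_join(tid, NULL);
    if (rc != 0)
        return rc;

    *result = value;
    return 0;
}
```

Calling run_worker(21, &r) from a main() function creates the thread, runs worker() in it, and leaves the doubled value in r. In a profile of such a program, worker() is the function that appears as the application-defined start routine.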

On the Solaris OS, when a new pthread starts execution, it calls the _lwp_start() function. On the Solaris 10 OS, _lwp_start() calls an intermediate function _thr_setup(), which then calls the application-defined start routine that was specified in pthread_create().

On the Linux OS, when the new pthread starts execution, it runs a Linux-specific system function, clone(). This function calls another internal initialization function, pthread_start_thread(), which in turn calls the application-defined start routine that was specified in pthread_create(). The Linux metrics-gathering functions available to the Collector are thread-specific. Therefore, when the collect utility runs, it interposes a metrics-gathering function, named collector_root(), between pthread_start_thread() and the application-defined thread start routine.
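The general idea of placing a function between the thread library's startup code and the application's start routine can be sketched with a wrapper, as shown below. This is only an illustration of the pattern under stated assumptions. It does not show how the Collector installs collector_root(), and every name in it (start_info, root_trampoline(), create_with_root(), app_start(), run_demo()) is hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical record carrying the real start routine and its argument. */
struct start_info {
    void *(*start)(void *);  /* application-defined start routine */
    void *arg;               /* argument originally intended for it */
    int *entered;            /* illustrative flag, set when the root function runs */
};

/* Hypothetical root function, playing the role the text gives to
   collector_root(): it runs first in the new thread and then calls
   the application's start routine. */
static void *root_trampoline(void *p)
{
    struct start_info info = *(struct start_info *)p;

    free(p);
    *info.entered = 1;            /* per-thread setup would happen here */
    return info.start(info.arg);  /* hand control to the application */
}

/* Hypothetical wrapper around pthread_create() that starts the thread
   at root_trampoline() instead of at the application's routine. */
static int create_with_root(pthread_t *tid, void *(*start)(void *),
                            void *arg, int *entered)
{
    struct start_info *info = malloc(sizeof *info);
    int rc;

    if (info == NULL)
        return -1;
    info->start = start;
    info->arg = arg;
    info->entered = entered;

    rc = pthread_create(tid, NULL, root_trampoline, info);
    if (rc != 0)
        free(info);  /* the thread never started, so the record is still ours */
    return rc;
}

/* Example application start routine: increments the counter it is given. */
static void *app_start(void *arg)
{
    int *counter = arg;

    *counter += 1;
    return arg;
}

/* Runs app_start() through the wrapper and waits for the thread.
   Returns 0 on success, nonzero on failure. */
int run_demo(int *entered, int *counter)
{
    pthread_t tid;
    int rc = create_with_root(&tid, app_start, counter, entered);

    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

In this sketch, a call stack sampled inside app_start() would show root_trampoline() as its caller, which parallels the way collector_root() appears between pthread_start_thread() and the application-defined start routine.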