Multithreaded Programming Guide

Chapter 1 Covering Multithreading Basics

The word multithreading can be translated as multiple threads of control or multiple flows of control. While a traditional UNIX process contains a single thread of control, multithreading (MT) separates a process into many execution threads. Each of these threads runs independently.

This chapter explains some multithreading terms, benefits, and concepts. If you are ready to start using multithreading, skip to Chapter 2, Basic Threads Programming.

If you need in-depth information about multithreaded programming, see the Related Books section of the preface.

Multithreading Terms

Table 1–1 introduces some of the terms that are used in this book.

Table 1–1 Multithreading Terms


Term	Definition
Process	The UNIX environment (file descriptors, user ID, and so on) that is created with the `fork(2)` system call and set up to run a program.
Thread	A sequence of instructions executed within the context of a process.
POSIX `pthreads`	A threads interface that is POSIX threads compliant. See Solaris Multithreading Libraries and Standards for more information.
Solaris `threads`	A Sun Microsystems threads interface that is not POSIX threads compliant. A predecessor of pthreads.
Single-threaded	Restricts access to a single thread. Execution is through sequential processing, limited to one thread of control.
Multithreading	Allows access to two or more threads. Execution occurs in more than one thread of control, using parallel or concurrent processing.
User-level or Application-level threads	Threads managed by threads routines in user space, as opposed to kernel space. The POSIX pthreads and Solaris threads APIs are used to create and handle user threads. In this manual, and in general, a thread is a user-level thread. Note – Because this manual is for application programmers, kernel thread programming is not discussed.
Lightweight processes	Kernel threads, also called LWPs, that execute kernel code and system calls. LWPs are managed by the system thread scheduler, and cannot be directly controlled by the application programmer. Beginning with Solaris 9, every user-level thread has a dedicated LWP. This is known as a 1:1 thread model.
Bound thread (obsolete term)	Prior to Solaris 9, a user-level thread that is permanently bound to one LWP. Beginning with Solaris 9, every thread has a dedicated LWP, so all threads are bound threads. The concept of an unbound thread no longer exists.
Unbound thread (obsolete term)	Prior to Solaris 9, a user-level thread that is not necessarily bound to one LWP. Beginning with Solaris 9, every thread has a dedicated LWP, so the concept of unbound threads no longer exists.
Attribute object	Contains opaque data types and related manipulation functions. These data types and functions standardize some of the configurable aspects of POSIX threads, mutual exclusion locks (mutexes), and condition variables.
Mutual exclusion locks	Objects used to lock and unlock access to shared data. Such objects are also known as mutexes.
Condition variables	Objects used to block threads until a change of state.
Read-write locks	Objects used to allow multiple read-only access to shared data, but exclusive access for modification of that data.
Counting semaphore	A memory-based synchronization mechanism in which a non-negative integer count is used to coordinate access by multiple threads to shared resources.
Parallelism	A condition that arises when at least two threads are executing simultaneously.
Concurrency	A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.

Solaris Multithreading Libraries and Standards

The concept of multithreaded programming goes back to at least the 1960s. Multithreaded programming development on UNIX systems began in the middle 1980s. While agreement existed about what multithreading is and the features necessary to support multithreading, the interfaces used to implement multithreading have varied greatly in the past.

For several years, POSIX (Portable Operating System Interface) 1003.4a worked on standards for multithreaded programming. The standard was eventually ratified and is now part of The Single UNIX Specification (SUS). The latest specification is available at The Open Group website. Beginning with the Solaris 10 release, the Solaris OS conforms to The Open Group's UNIX 03 Product Standard, or SUSv3.

Before the POSIX standard was ratified, the Solaris multithreading API was implemented in the Solaris libthread library, which was developed by Sun and later became the basis for the UNIX International (UI) threads standard. The libthread library was introduced in the Solaris 2.2 release in 1993. Support for the POSIX standard was added with the libpthread API in the Solaris 2.5 release in 1995, and both APIs have been available since. The libthread and libpthread libraries were merged into the standard libc C library beginning in the Solaris 10 release.

The libthread and libpthread libraries are maintained to provide backward compatibility for both runtime and compilation environments. The libthread.so.1 and libpthread.so.1 shared objects are implemented as filters on libc.so.1. See the libthread(3LIB) and libpthread(3LIB) man pages for more information.

While both thread libraries are supported, the POSIX library should be used in most cases. The threads(5) man page documents the differences and similarities between POSIX threads and Solaris threads.

This Multithreaded Programming Guide is based on the latest revision of the POSIX standard IEEE Std 1003.1:2001 (also known as ISO/IEC 9945:2003 and as The Single UNIX Specification, Version 3).

Subjects specific to Solaris threads are covered in Chapter 6, Programming With Solaris Threads.

Benefiting From Multithreading

This section briefly describes the benefits of multithreading.

Multithreading your code can help in the following areas:

Improving Application Responsiveness

Any program in which many activities are not dependent upon each other can be redesigned so that each independent activity is defined as a thread. For example, the user of a multithreaded GUI does not have to wait for one activity to complete before starting another activity.

Using Multiprocessors Efficiently

Typically, applications that express concurrency requirements with threads need not take into account the number of available processors. The performance of the application improves transparently with additional processors because the operating system takes care of scheduling threads for the number of processors that are available. When multicore processors and multithreaded processors are available, a multithreaded application's performance scales appropriately because the cores and threads are viewed by the OS as processors.

Numerical algorithms and numerical applications with a high degree of parallelism, such as matrix multiplications, can run much faster when implemented with threads on a multiprocessor.

Note –

In this manual, whenever multiprocessors are discussed, the context applies also to multicore and multithreaded processors unless noted otherwise.

Improving Program Structure

Many programs are more efficiently structured as multiple independent or semi-independent units of execution instead of as a single, monolithic thread. For example, a non-threaded program that performs many different tasks might need to devote much of its code just to coordinating the tasks. When the tasks are programmed as threads, the code can be simplified. Multithreaded programs, especially programs that provide service to multiple concurrent users, can be more adaptive to variations in user demands than single-threaded programs.

Using Fewer System Resources

Programs that use two or more processes that access common data through shared memory are applying more than one thread of control.

However, each process has a full address space and operating environment state. The cost of creating and maintaining this large amount of state information makes each process much more expensive than a thread in both time and space.

In addition, the inherent separation between processes can require a major effort by the programmer. This effort includes handling communication between the threads in different processes, or synchronizing their actions. When the threads are in the same process, communication and synchronization become much easier.

Combining Threads and RPC

By combining threads and a remote procedure call (RPC) package, you can exploit nonshared-memory multiprocessors, such as a collection of workstations. This combination distributes your application relatively easily and treats the collection of workstations as a multiprocessor.

For example, one thread might create additional threads. Each of these children could then place a remote procedure call, invoking a procedure on another workstation. Although the original thread has merely created threads that are now running in parallel, this parallelism involves other computers.

Note –

The Message Passing Interface (MPI) might be a more effective approach to parallelizing applications that run across distributed systems. See http://www-unix.mcs.anl.gov/mpi/ for more information about MPI.

The Sun HPC ClusterTools™ software includes Open MPI Message Passing Interface (OMPI), which is an open source implementation of MPI. See the Sun HPC ClusterTools product page for more information about ClusterTools.

Multithreading Concepts

This section introduces basic concepts of multithreading.

Concurrency and Parallelism

In a multithreaded process on a single processor, the processor can switch execution resources between threads, resulting in concurrent execution. Concurrency indicates that more than one thread is making progress, but the threads are not actually running simultaneously. The switching between threads happens quickly enough that the threads might appear to run simultaneously.

In the same multithreaded process in a shared-memory multiprocessor environment, each thread in the process can run on a separate processor at the same time. The result is parallel execution, which is true simultaneous execution. When the number of threads in a process is less than or equal to the number of processors available, the operating system's thread support system ensures that each thread runs on a different processor. For example, consider a matrix multiplication programmed with four threads on a system with two dual-core processors. Each thread can run on its own core, so four rows of the result are computed at the same time.

Multithreading Structure

Traditional UNIX already supports the concept of threads. Each process contains a single thread, so programming with multiple processes is programming with multiple threads. However, a process is also an address space, and creating a process involves creating a new address space.

Creating a thread is less expensive than creating a new process because the newly created thread uses the current process address space. The time that is required to switch between threads is less than the time required to switch between processes. A switch between threads is faster because no switching between address spaces occurs.

Communication between the threads of one process is simple because the threads share everything, most importantly the address space. So, data produced by one thread is immediately available to all the other threads in the process.

However, this sharing of data leads to a different set of challenges for the programmer. Care must be taken to synchronize threads to protect data from being modified by more than one thread at once, or from being read by some threads while being modified by another thread at the same time. See Thread Synchronization for more information.

User-Level Threads

Threads are the primary programming interface in multithreaded programming. Threads are visible only from within the process, where the threads share all process resources like address space, open files, and so on.

User-Level Threads State

The following state is unique to each thread.

Thread ID
Register state, including program counter (PC) and stack pointer
Stack
Signal mask
Priority
Thread-private storage

Threads share the process instructions and most of the process data. For that reason, a change in shared data by one thread can be seen by the other threads in the process. When a thread needs to interact with other threads in the same process, the thread can do so without involving the operating environment.

Note –

User-level threads are so named to distinguish them from kernel-level threads, which are the concern of systems programmers only. Because this book is for application programmers, kernel-level threads are not discussed.

Thread Scheduling

The POSIX standard specifies three scheduling policies: first-in-first-out (SCHED_FIFO), round-robin (SCHED_RR), and custom (SCHED_OTHER). SCHED_FIFO is a queue-based scheduler with different queues for each priority level. SCHED_RR is like FIFO except that each thread has an execution time quota.

Both SCHED_FIFO and SCHED_RR are POSIX Realtime extensions. Threads executing with these policies are in the Solaris Real-Time (RT) scheduling class, normally requiring special privilege. SCHED_OTHER is the default scheduling policy. Threads executing with the SCHED_OTHER policy are in the traditional Solaris Time-Sharing (TS) scheduling class.

Solaris provides other scheduling classes, namely the Interactive timesharing (IA) class, the Fair-Share (FSS) class, and the Fixed-Priority (FX) class. Such specialized classes are not discussed here. See the Solaris priocntl(2) manual page for more information.

See LWPs and Scheduling Classes for information about the SCHED_OTHER policy.

Two scheduling scopes are available: process scope (PTHREAD_SCOPE_PROCESS) and system scope (PTHREAD_SCOPE_SYSTEM). Threads with differing scope states can coexist on the same system and even in the same process. Process scope causes such threads to contend for resources only with other such threads in the same process. System scope causes such threads to contend with all other threads in the system. In practice, beginning with the Solaris 9 release, the system makes no distinction between these two scopes.

Thread Cancellation

A thread can request the termination of any other thread in the process. The target thread, the one being cancelled, can keep cancellation requests pending as well as perform application-specific cleanup when the thread acts upon the cancellation request.

The pthreads cancellation feature permits either asynchronous or deferred termination of a thread. Asynchronous cancellation can occur at any time. Deferred cancellation can occur only at defined points. Deferred cancellation is the default type.

Thread Synchronization

Synchronization enables you to control program flow and access to shared data for concurrently executing threads.

The four synchronization models are mutex locks, read/write locks, condition variables, and semaphores.

Mutex locks allow only one thread at a time to execute a specific section of code, or to access specific data.
Read/write locks permit concurrent reads and exclusive writes to a protected shared resource. To modify a resource, a thread must first acquire the exclusive write lock. An exclusive write lock is not permitted until all read locks have been released.
Condition variables block threads until a particular condition is true.
Counting semaphores typically coordinate access to resources. The count is the limit on how many threads can have concurrent access to the data protected by the semaphore. When the count is reached, the semaphore causes the calling thread to block until the count changes. A binary semaphore (with a count of one) is similar in operation to a mutex lock.

Using the 64-bit Architecture

For application developers, the major difference between the Solaris 64-bit and 32-bit environments is the C-language data type model used. The 64-bit environment uses the LP64 model, where longs and pointers are 64 bits wide. All other fundamental data types remain the same as in the 32-bit implementation. The 32-bit environment uses the ILP32 model, where ints, longs, and pointers are 32 bits.

The following summary briefly describes the major features and considerations for using the 64-bit environment:

Large Virtual Address Space

In the 64-bit environment, a process can have up to 64 bits of virtual address space, or 18 exabytes. The larger virtual address space is 4 billion times the current 4 Gbyte maximum of a 32-bit process. Because of hardware restrictions, however, some platforms might not support the full 64 bits of address space.

A large address space increases the number of threads that can be created with the default stack size. The default stack size is 1 megabyte for 32-bit processes and 2 megabytes for 64-bit processes. The number of threads with the default stack size is approximately 2000 on a 32-bit system and 8000 billion on a 64-bit system.

Kernel Memory Readers

The kernel is an LP64 object that uses 64-bit data structures internally. This means that existing 32-bit applications that use libkvm, /dev/mem, or /dev/kmem do not work properly and must be converted to 64-bit programs.

/proc Restrictions

A 32-bit program that uses /proc is able to look at 32-bit processes but is unable to understand a 64-bit process. The existing interfaces and data structures that describe the process are not large enough to contain the 64-bit quantities. Such programs must be recompiled as 64-bit programs to work for both 32-bit processes and 64-bit processes.

64-bit Libraries

32-bit applications are required to link with 32-bit libraries and 64-bit applications are required to link with 64-bit libraries. With the exception of those libraries that have become obsolete, all of the system libraries are provided in both 32-bit versions and 64-bit versions.

64-bit Arithmetic

64-bit arithmetic has long been available in previous 32-bit Solaris releases. The 64-bit implementation now provides full 64-bit machine registers for integer operations and parameter passing.

Large Files

If an application requires only large file support, the application can remain 32-bit and use the Large Files interface. To take full advantage of 64-bit capabilities, the application must be converted to 64-bit.