Sun Microsystems Documentation

Making Sense of Parallel Programming Terms

By Susan Morgan, January 2008

This article explains common parallel and multithreading concepts, and differentiates between the hardware and software aspects of parallel processing. It briefly explains the hardware architectures that make parallel processing possible. The article describes several popular parallel programming models. It also makes connections between parallel processing concepts and related Sun hardware and software offerings.

Parallel Processing and Programming Terms

The terms parallel computing, parallel processing, and parallel programming are sometimes used in ambiguous ways, or are not clearly defined and differentiated. Parallel computing encompasses all the technologies used to run multiple tasks simultaneously on multiple processors. Parallel processing, or parallelism, is accomplished by dividing a single runtime task into multiple smaller, independent tasks. The tasks can execute simultaneously when more than one processor is available; if only one processor is available, they execute sequentially. On a modern high-speed single processor the tasks might appear to run at the same time, but in reality they are interleaved, because a single processor can execute only one of them at any given moment.

Parallel programming, or multithreaded programming, is the software methodology used to implement parallel processing. The program must include instructions to inform the runtime system which parts of the application can be executed simultaneously. The program is then said to be parallelized. Parallel programming is a performance optimization technique that attempts to reduce the “wall clock” runtime of an application by enabling the program to handle many activities simultaneously.

Parallel programming can be implemented using several different software interfaces, or parallel programming models.


Parallel Processing

Parallel processing is a general term for the process of dividing tasks into multiple subtasks that can execute at the same time. These subtasks are known as threads, which are runtime entities that are able to independently execute a stream of instructions. Parallel processing can occur at the hardware level and at the software level. Distinguishing between these types of parallel processing is important. At the software level, an application might be rewritten to take advantage of parallelism in the code. With the right hardware support, such as a multiprocessing system, the threads can then execute simultaneously at runtime. If not enough processors or cores are available for all the threads to run simultaneously, certain tasks might still execute one after the other. The common way to describe such non-parallel execution is to say these tasks execute sequentially or serially.

Parallelism in the Hardware

Execution of a parallel application is dependent on hardware design. However, even when the system is capable of parallel execution, the software must still divide, schedule, and manage the tasks.

Parallelism by Software Programming Models

The Solaris OS kernel and most Solaris services have been multithreaded and optimized for many years in order to take advantage of multiprocessor architectures. Sun continues to invest in parallelizing and optimizing Solaris software to fully support emerging parallel architectures. For a single application to benefit from a multiprocessor architecture, including clusters and grids, the program must be parallelized using one of the parallel programming models. In all cases, the performance gained from parallelism must outweigh the processing overhead that comes with the programming model, such as the cost of creating and managing threads.

The programming model used in any application depends on the underlying hardware architecture of the system on which the application is expected to run. Specifically, the developer must distinguish between a shared memory system and a distributed memory system. In a shared memory architecture, the application can transparently access any memory location. A multicore processor is an example of a shared memory system. In a distributed memory environment, the application can only transparently access the memory of the node it is running on. Access to the memory on another node has to be explicitly arranged within the application. Clusters and grids are examples of distributed memory systems.

For more information about parallel computing software models, see the technical article Developing Applications for Parallel Computing by Liang Chen.

Shared Memory Programming Models

Shared memory, or multithreaded, programming is sometimes also called threaded programming. In this context, threads are lightweight processes, which are processes that exist within a single operating system process. Threads share the same memory address space and state information of the process that contains them. The containing process is sometimes also called the parent process. The shared memory model is supported on computers that have multiple processors, where each core or processor has access to the same shared memory. Such a system has a single address space. Communication and data exchange between the threads takes place through shared memory.

Parallel programming can be implemented for shared memory systems using models such as POSIX threads (Pthreads), Solaris threads, or OpenMP.

Distributed Memory Programming Models

Developers can implement the parallelism in an application by using a very low-level communication interface, such as sockets, between networked computers. However, this approach is the equivalent of programming in assembly language: powerful, but minimal and laborious. As a result, an application parallelized with such an API can be hard to maintain and extend.

The Message Passing Interface (MPI) model is commonly used to parallelize applications for a cluster of computers, or a grid. Like OpenMP, this interface is an additional software layer on top of basic OS functionality. MPI is built on top of a software networking interface, such as sockets, with a protocol such as TCP/IP. MPI provides a rich set of communication routines, and is widely available.

An MPI program is a sequential C, C++, or Fortran program that runs as a set of cooperating processes on some or all of the processors or cores in the cluster. The programmer implements the distribution of the tasks and the communication between them, and decides how the work is allocated to the various processes. To this end, the program is augmented with calls to MPI library functions, for example, to send information to and receive information from other processes.

MPI is a very explicit programming model. Although some convenience functionality is provided, such as a global broadcast operation, the developer has to specifically design the parallel application for this programming model. Many low-level details also need to be handled explicitly.

The advantage of MPI is that an application can run on any type of cluster that has the software to support the MPI programming model. Although originally MPI programs mainly ran on clusters of single-processor workstations or PCs, running an MPI application on one or more shared memory computers is now common. An optimized MPI implementation can then also take advantage of the faster communication over shared memory for those processes executing on the same system.


Hybrid Programming Models

With the emergence of multicore systems, an increasing number of clusters and grids are parallel systems with two layers. Within a single node, fast communication through shared memory can be exploited, and a networking protocol can be used to communicate across the nodes. Programs can take advantage of both shared memory and distributed memory.

The MPI model can be used to run parallel applications on clusters of multicore systems. MPI applications run across the nodes as well as within each node, so both parallelization layers, shared and distributed, could be used through MPI. In certain situations, however, adding the finer-grained parallelization offered by a shared memory programming model such as Pthreads or OpenMP is more efficient. Typically, parallel execution over the nodes is achieved through MPI. Within one node, Pthreads or OpenMP is used. When two programming models are used in one application, the application is said to be parallelized with a hybrid or mixed-mode programming model.

Another hybrid programming model that is sometimes used combines Pthreads and OpenMP. This type of application runs on a single shared memory system: the application creates threads explicitly with Pthreads, and each of those threads is further parallelized with OpenMP, taking advantage of the additional parallelism available within the node.

Sun Parallel Application Development Software

Sun offers software products to support the technologies discussed in this article.

For Shared Memory Systems

Sun software for shared memory systems includes:

Threads – POSIX threads and Solaris threads libraries are both included in the Solaris libc library.

OpenMP – An implementation of OpenMP for C, C++ and Fortran is included in the Sun Studio software, which is free to download. The -xopenmp compile and link-time option instructs the Sun Studio compiler to recognize OpenMP directives and runtime functions in a program. The OpenMP runtime support library, libmtsk, provides support for thread management, synchronization, and scheduling of work. The library is implemented on top of the POSIX threads library.

For Distributed Memory Systems

An implementation of MPI is included in Sun HPC ClusterTools. This product also includes driver compile scripts and tools to query and manage the jobs at runtime. Note that multiple versions of Sun HPC ClusterTools are available. The ClusterTools 5 and ClusterTools 6 software includes the Sun implementation of MPI, called Sun MPI. The ClusterTools 7 software includes the newer open-source implementation of MPI, called Open MPI. The Sun HPC ClusterTools 7.1 Software Migration Guide describes the differences between Sun MPI and Open MPI to help in upgrading applications that use Sun MPI functions to run with Open MPI. For complete ClusterTools information, see Sun HPC ClusterTools 7.1 Documentation.

Hardware and Software for HPC Cluster and Grid

Sun also offers hardware and software products for implementing and managing HPC clusters, and products for grid computing.

See Sun High Performance Computing for more information about HPC at Sun, including the Sun Constellation System, an HPC integrated environment including high performance system and storage hardware, software, and developer tools.
