OpenMP uses the fork-join model of parallel execution. When a thread encounters a parallel construct, it creates a team composed of itself and zero or more additional helper threads. The encountering thread becomes the master of the new team. All team members execute the code in the parallel region. When a thread finishes its work within the parallel region, it waits at the implicit barrier at the end of the region. Once all team members have arrived at the barrier, the threads can leave it. The master thread then continues executing user code beyond the end of the parallel construct, while the helper threads wait to be summoned to join other teams.
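The fork-join sequence described above can be sketched in C roughly as follows. The function name run_team and its parameter are hypothetical, chosen only for illustration. The code assumes a compiler with OpenMP support enabled (for example through a flag such as -fopenmp); without it the pragmas are ignored and the region runs on a single thread.

```c
#include <stdio.h>

/* Hypothetical helper: forks a team, lets every member check in,
   and returns how many members took part once the team has joined. */
int run_team(int requested)
{
    int arrived = 0;

    /* Fork: the encountering thread becomes the master of a new team. */
    #pragma omp parallel num_threads(requested)
    {
        /* Every team member, the master included, executes this block. */
        #pragma omp atomic
        arrived++;
    }   /* Join: implicit barrier, all members arrive before anyone leaves. */

    /* Only the master thread continues past the parallel construct. */
    printf("team of %d thread(s) finished the region\n", arrived);
    return arrived;
}
```

Calling run_team(4) from a main function should report a team of between one and four threads, depending on what the runtime grants.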
OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the team executing a nested parallel region consists of one thread only (the thread that encountered the nested parallel construct). If nested parallelism is enabled, then the new team may consist of more than one thread.
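As a rough illustration of the difference, the sketch below runs a parallel region nested inside another and reports the largest inner team it observes. The function name inner_team_size is hypothetical. The sketch assumes an OpenMP 3.0 or later runtime and controls nesting with omp_set_max_active_levels; some older runtimes may additionally need nesting switched on through the OMP_NESTED environment variable or omp_set_nested, in which case the inner team stays at one thread even when nesting is requested here.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the largest inner team size seen
   when one parallel region is nested inside another. */
int inner_team_size(int allow_nesting)
{
    int largest = 1;   /* a team always contains the encountering thread */

#ifdef _OPENMP
    /* One active level: nested regions run with a single thread.
       Two active levels: a nested region may get a larger team. */
    omp_set_max_active_levels(allow_nesting ? 2 : 1);
#else
    (void)allow_nesting;   /* serial build: nothing to configure */
#endif

    #pragma omp parallel num_threads(2)      /* outer region */
    {
        #pragma omp parallel num_threads(2)  /* nested region */
        {
            int n = 1;
#ifdef _OPENMP
            n = omp_get_num_threads();       /* size of the inner team */
#endif
            #pragma omp critical
            {
                if (n > largest)
                    largest = n;
            }
        }
    }
    return largest;
}
```

With nesting disabled, inner_team_size(0) returns 1. With nesting enabled, inner_team_size(1) may return 2, but the runtime is free to grant fewer threads.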
The OpenMP runtime library maintains a pool of helper threads that can be used to work on parallel regions. When a thread encounters a parallel construct and requests a team of more than one thread, it checks the pool and takes idle threads from it, making them part of the team. If the pool does not contain enough idle threads, the encountering thread might get fewer helper threads than it requested. When the team finishes executing the parallel region, the helper threads are returned to the pool.
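Because the team can be smaller than requested, code that depends on the team size should query it inside the region rather than assume the request was honored. A minimal sketch, using a hypothetical function named granted_team_size and assuming OpenMP support is enabled at compile time:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: requests a team of the given size and
   returns the size the runtime actually provided. */
int granted_team_size(int requested)
{
    int granted = 1;   /* serial fallback: only the encountering thread */

    #pragma omp parallel num_threads(requested)
    {
        /* One thread records the team size; the others wait at the
           implicit barrier that ends the single construct. */
        #pragma omp single
        {
#ifdef _OPENMP
            granted = omp_get_num_threads();
#endif
        }
    }
    return granted;
}
```

A caller can compare the return value of granted_team_size(8) with 8 to detect that fewer helper threads were available than it asked for.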