Collective calls on a communicator are matched according to the order in which each process issues them. All the processes in a given communicator must therefore make the same collective calls in the same order. In a threaded program, you can avoid the effects of this restriction on the threads of a given process by using a different communicator for each thread.
No process that belongs to the communicator may omit a particular collective call. If one process skips a call, the matching calls on the other processes are left "dangling."
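The matching rule can be illustrated with a small model. The following is not MPI code: it is a minimal Python sketch that represents each process as a list of (communicator, operation) pairs in issue order and reports the first point at which the processes disagree. The names find_mismatch and per_communicator_order, the communicator labels, and the operation labels are all hypothetical, and the sketch assumes that every process belongs to every communicator it is given.

```python
from collections import defaultdict


def per_communicator_order(calls):
    """Split one process's issue-ordered (comm, op) list into per-communicator op lists."""
    order = defaultdict(list)
    for comm, op in calls:
        order[comm].append(op)
    return dict(order)


def find_mismatch(calls_by_process):
    """Return None if all processes issue the same collectives, in the same
    order, on each communicator; otherwise return (comm, position) of the
    first divergence. Assumes every process belongs to every communicator."""
    orders = [per_communicator_order(calls) for calls in calls_by_process]
    comms = sorted({comm for order in orders for comm in order})
    for comm in comms:
        sequences = [order.get(comm, []) for order in orders]
        longest = max(len(seq) for seq in sequences)
        for position in range(longest):
            # A process that has run out of calls contributes None here,
            # which models an omitted ("dangling") collective.
            ops = {seq[position] if position < len(seq) else None
                   for seq in sequences}
            if len(ops) > 1:
                return (comm, position)
    return None


# Both processes issue the same collectives in the same order.
matched = [
    [("world", "bcast"), ("world", "reduce")],
    [("world", "bcast"), ("world", "reduce")],
]

# The second process omits the barrier, so the first one is left dangling.
dangling = [
    [("world", "bcast"), ("world", "barrier")],
    [("world", "bcast")],
]

# Two threads per process share one communicator; thread scheduling makes
# the issue order differ between the two processes.
shared_comm = [
    [("world", "bcast"), ("world", "reduce")],
    [("world", "reduce"), ("world", "bcast")],
]

# Same interleaving, but each thread uses its own communicator.
per_thread_comm = [
    [("comm_a", "bcast"), ("comm_b", "reduce")],
    [("comm_b", "reduce"), ("comm_a", "bcast")],
]

for name, case in [("matched", matched), ("dangling", dangling),
                   ("shared_comm", shared_comm),
                   ("per_thread_comm", per_thread_comm)]:
    print(name, find_mismatch(case))
```

In this model, the shared-communicator case fails because the order on the single communicator depends on how the threads happen to interleave, while the per-thread case passes because each communicator sees only one thread's calls. In a real MPI program, the separate communicators would typically be created with a call such as MPI_Comm_dup.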