This glossary describes computer floating-point arithmetic terms. It also describes terms and acronyms associated with parallel processing.
This symbol, "||", appended to a term designates it as associated with parallel processing.
A measure of how well one number approximates another. For example, the accuracy of a computed result often reflects the extent to which errors in the computation cause it to differ from the mathematically exact result. Accuracy can be expressed in terms of significant digits (e.g., "The result is accurate to six digits") or more generally in terms of the preservation of relevant
mathematical properties (e.g., "The result has the correct algebraic sign").
A number of processors working simultaneously, each handling one element of the array, so that a single operation can apply to all elements of the array in parallel.
See cache, direct mapped cache, fully associative cache, set associative cache.
Computer control behavior where a specific operation is begun upon receipt of an indication (signal) that a particular event has occurred. Asynchronous control relies on synchronization mechanisms called locks to coordinate processors. See also mutual exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
A synchronization mechanism for coordinating tasks even when data accesses are not involved. A barrier is analogous to a gate. Processors or threads operating in parallel reach the gate at different times, but none can pass through until all processors reach the gate. For example, suppose at the end of each day, all bank tellers are required to tally the amount of money that was deposited, and the amount that was withdrawn. These totals are then reported to the bank vice president, who must check the grand totals to verify debits equal credits. The tellers operate at their own speeds; that is, they finish totaling their transactions at different times. The barrier mechanism prevents tellers from leaving for home before the grand total is checked. If debits do not equal credits, all tellers must return to their desks to find the error. The barrier is removed after the vice president obtains a satisfactory grand total.
The sum of the base-2 exponent and a constant (bias) chosen to make the stored exponent's range non-negative. For example, the exponent of 2-100 is stored in IEEE single precision format as (-100) + (single precision bias of 127) = 27.
The interval between any two consecutive powers of two.
A thread is waiting for a resource or data; such as, return data from a pending disk read, or waiting for another thread to unlock a resource.
For Solaris threads, a thread permanently assigned to a particular LWP is called a bound thread. Bound threads can be scheduled on a real-time basis in strict priority with respect to all other active threads in the system, not only within a process. An LWP is an entity that can be scheduled with the same default scheduling priority as any UNIX process.
Small, fast, hardware-controlled memory that acts as a buffer between a processor and main memory. Cache contains a copy of the most recently used memory locations--addresses and contents--of instructions and data. Every address reference goes first to cache. If the desired instruction or data is not in cache, a cache miss occurs. The contents are fetched across the bus from main memory into the CPU register specified in the instruction being executed and a copy is also written to cache. It is likely that the same location will be used again soon, and, if so, the address is found in cache, resulting in a cache hit. If a write to that address occurs, the hardware not only writes to cache, but can also generate a write-through to main memory.
See also associativity, circuit switching, direct mapped cache, fully associative cache, MBus, packet switching, set associative cache, write-back, write-through, XDBus.
A program does not access all of its code or data at once with equal probability. Having recently accessed information in cache increases the probability of finding information locally without having to access memory. The principle of locality states that programs access a relatively small portion of their address space at any instant of time. There are two different types of locality: temporal and spatial.
Temporal locality (locality in time) is the tendency to reuse recently accessed items. For example, most programs contain loops, so that instructions and data are likely to be accessed repeatedly. Temporal locality retains recently accessed items closer to the processor in cache rather than requiring a memory access. See also cache, competitive-caching, false sharing, write-invalidate, write-update.
Spatial locality (locality in space) is the tendency to reference items whose addresses are close to other recently accessed items. For example, accesses to elements of an array or record show a natural spatial locality. Caching takes advantage of spatial locality by moving blocks (multiple contiguous words) from memory into cache and closer to the processor. See also cache, competitive-caching, false sharing, write-invalidate, write-update.
A hardware feature of some pipeline architectures that allows the result of an operation to be used immediately as an operand for a second operation, simultaneously with the writing of the result to its destination register. The total cycle time of two chained operations is less than the sum of the stand-alone cycle times for the instructions. For example, the TI 8847 supports chaining of consecutive fadd, fsub, and fmul (of the same precision). Chained faddd/fmuld requires 12 cycles, while consecutive unchained faddd/fmuld requires 17 cycles.
A mechanism for caches to communicate with each other as well as with main memory. A dedicated connection (circuit) is established between caches or between cache and main memory. While a circuit is in place no other traffic can travel over the bus.
In systems with multiple caches, the mechanism that ensures that all processors see the same image of memory at all times.
The three floating point exceptions overflow, invalid, and division are collectively referred to as the common exceptions for the purposes of ieee_flags(3m) and ieee_handler(3m). They are called common exceptions because they are commonly trapped as errors.
Competitive-caching maintains cache coherence by using a hybrid of write-invalidate and write-update. Competitive-caching uses a counter to age shared data. Shared data is purged from cache based on a least-recently-used (LRU) algorithm. This can cause shared data to become private data again, thus eliminating the need for the cache coherency protocol to access memory (via backplane bandwidth) to keep multiple copies synchronized. See also cache, cache locality, false sharing, write-invalidate, write-update.
The execution of two or more active threads or processes in parallel. On a uniprocessor apparent concurrence is accomplished by rapidly switching between threads. On a multiprocessor system true parallel execution can be achieved. See also asynchronous control, multiprocessor system, thread.
Processes that execute in parallel in multiple processors or asynchronously on a single processor. Concurrent processes can interact with each other, and one process can suspend execution pending receipt of information from another process or the occurrence of an external event. See also process, sequential processes.
For Solaris threads, a condition variable enables threads to atomically block until a condition is satisfied. The condition is tested under the protection of a mutex lock. When the condition is false, a thread blocks on a condition variable and atomically releases the mutex waiting for the condition to change. When another thread changes the condition, it can signal the associated condition variable to cause one or more waiting threads to wake up, reacquire the mutex, and re-evaluate the condition. Condition variables can be used to synchronize threads in this process and other processes if the variable is allocated in memory that is writable and shared among the cooperating processes and have been initialized for this behavior.
In multitasking operating systems, such as the SunOS operating system, processes run for a fixed time quantum. At the end of the time quantum, the CPU receives a signal from the timer, interrupts the currently running process, and prepares to run a new process. The CPU saves the registers for the old process, and then loads the registers for the new process. Switching from the old process state to the new is known as a context switch. Time spent switching contexts is system overhead; the time required depends on the number of registers, and on whether there are special instructions to save the registers associated with a process.
control flow model||
The von Neumann model of a computer. This model specifies flow of control; that is, which instruction is executed at each step of a program. All Sun workstations are instances of the von Neumann model. See also data flow model, demand-driven dataflow.
An indivisible section of code that can only be executed by one thread at a time and is not interruptible by other threads; such as, code that accesses a shared variable. See also mutual exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
A resource that can only be in use by at most one thread at any given time. Where several asynchronous threads are required to coordinate their access to a critical resource, they do so by synchronization mechanisms. See also mutual exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
data flow model||
This computer model specifies what happens to data, and ignores instruction order. That is, computations move forward by nature of availability of data values instead of the availability of instructions. See also control flow model, demand-driven dataflow.
In multithreading, a situation where two or more threads simultaneously access a shared resource. The results are indeterminate depending on the order in which the threads accessed the resource. This situation, called a data race, can produce different results when a program is run repeatedly with the same input. See also mutual exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
A situation that can arise when two (or more) separately active processes compete for resources. Suppose that process P requires resources X and Y and requests their use in that order at the same time that process Q requires resources Y and X and asks for them in that order. If process P has acquired resource X and simultaneously process Q has acquired resource Y, then neither process can proceed--each process requires a resource that has been allocated to the other process.
The value that is delivered as the result of a floating-point operation that caused an exception.
A task is enabled for execution by a processor when its results are required by another task that is also enabled; such as, a graph reduction model. A graph reduction program consists of reducible expressions that are replaced by their computed values as the computation progresses through time. Most of the time, the reductions are done in parallel--nothing prevents parallel reductions except the availability of data from previous reductions. See also control flow model, data flow model.
Older nomenclature for subnormal number.
direct mapped cache||
A direct mapped cache is a one-way set associative cache. That is, each cache entry holds one block and forms a single set with one element. See also cache, cache locality, false sharing, fully associative cache, set associative cache, write-invalidate, write-update.
distributed memory architecture||
A combination of local memory and processors at each node of the interconnect network topology. Each processor can directly access only a portion of the total memory of the system. Message passing is used to communicate between any two processors, and there is no global, shared memory. Therefore, when a data structure must be shared, the program issues send/receive messages to the process that owns that structure. See also interprocess communication, message passing.
Using two words to represent a number in order to keep or increase precision. On SPARC® workstations, double precision is the 64-bit IEEE double precision.
An arithmetic exception arises when an attempted atomic arithmetic operation has no result that is acceptable universally. The meanings of atomic and acceptable vary with time and place.
The component of a floating-point number that signifies the integer power to which the base is raised in determining the value of the represented number.
A condition that occurs in cache when two unrelated data accessed independently by two threads reside in the same block. This block can end up 'ping-ponging' between caches for no valid reason. Recognizing such a case and rearranging the data structure to eliminate the false sharing greatly increases cache performance. See also cache, cache locality.
floating-point number system
A system for representing a subset of real numbers in which the spacing between representable numbers is not a fixed, absolute constant. Such a system is characterized by a base, a sign, a significand, and an exponent (usually biased). The value of the number is the signed product of its significand and the base raised to the power of the unbiased exponent.
fully associative cache||
A fully associative cache with m entries is an m-way set associative cache. That is, it has a single set with m blocks. A cache entry can reside in any of the m blocks within that set. See also cache, cache locality, direct mapped cache, false sharing, set associative cache, write-invalidate, write-update.
When a floating-point operation underflows, return a subnormal number instead of 0. This method of handling underflow minimizes the loss of accuracy in floating-point calculations on small numbers.
Extra bits used by hardware to ensure correct rounding, not accessible by software. For example, IEEE double precision operations use three hidden bits to compute a 56-bit result that is then rounded to 53 bits.
IEEE Standard 754
The standard for binary floating-point arithmetic developed by the Institute of Electrical and Electronics Engineers, published in 1985.
A fragment of assembly language code that is substituted for the function call it defines, during the inlining pass of Sun Studio compilers. Used (for example) by the math library in in-line template files (libm.il) in order to access hardware implementations of trigonometric functions and other elementary functions from C programs.
interconnection network topology||
Interconnection topology describes how the processors are connected. All networks consist of switches whose links go to processor-memory nodes and to other switches. There are four generic forms of topology: star, ring, bus, and fully-connected network. Star topology consists of a single hub processor with the other processors directly connected to the single hub, the non-hub processors are not directly connected to each other. In ring topology all processors are on a ring and communication is generally in one direction around the ring. Bus topology is noncyclic, with all nodes connected; consequently, traffic travels in both directions, and some form of arbitration is needed to determine which processor can use the bus at any particular time. In a fully-connected (crossbar) network, every processor has a bidirectional link to every other processor.
Commercially-available parallel processors use multistage network topologies. A multistage network topology is characterized by 2-dimensional grid, and boolean n-cube.
Message passing among active processes. See also circuit switching, distributed memory architecture, MBus, message passing, packet switching, shared memory, XDBus.
See interprocess communication.
Solaris threads are implemented as a user-level library, using the kernel's threads of control, that are called light-weight processes (LWPs). In the Solaris environment, a process is a collection of LWPs that share memory. Each LWP has the scheduling priority of a UNIX process and shares the resources of that process. LWPs coordinate their access to the shared memory by using synchronization mechanisms such as locks. An LWP can be thought of as a virtual CPU that executes code or system calls. The threads library schedules threads on a pool of LWPs in the process, in much the same way as the kernel schedules LWPs on a pool of processors. Each LWP is independently dispatched by the kernel, performs independent system calls, incurs independent page faults, and runs in parallel on a multiprocessor system. The LWPs are scheduled by the kernel onto the available CPU resources according to their scheduling class and priority.
A mechanism for enforcing a policy for serializing access to shared data. A thread or process uses a particular lock in order to gain access to shared memory protected by that lock. The locking and unlocking of data is voluntary in the sense that only the programmer knows what must be locked. See also data race, mutual exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
See light-weight process.
MBus is a bus specification for a processor/memory/IO interconnect. It is licensed by SPARC International to several silicon vendors who produce interoperating CPU modules, IO interfaces and memory controllers. MBus is a circuit-switched protocol combining read requests and response on a single bus. MBus level I defines uniprocessor signals; MBus level II defines multiprocessor extensions for the write-invalidate cache coherence mechanism.
A medium that can retain information for subsequent retrieval. The term is most frequently used for referring to a computer's internal storage that can be directly addressed by machine instructions. See also cache, distributed memory, shared memory.
In the distributed memory architecture, a mechanism for processes to communicate with each other. There is no shared data structure in which they deposit messages. Message passing allows a process to send data to another process and for the intended recipient to synchronize with the arrival of the data.
See Multiple Instruction Multiple Data, shared memory.
In the Solaris environment, function calls inside libraries are either mt-safe or not mt-safe; mt-safe code is also called "re-entrant" code. That is, several threads can simultaneously call a given function in a module and it is up to the function code to handle this. The assumption is that data shared between threads is only accessed by module functions. If mutable global data is available to clients of a module, appropriate locks must also be made visible in the interface. Furthermore, the module function cannot be made re-entrant unless the clients are assumed to use the locks consistently and at appropriate times. See also single-lock strategy.
Multiple Instruction Multiple Data||
System model where many processors can be simultaneously executing different instructions on different data. Furthermore, these processors operate in a largely autonomous manner as if they are separate computers. They have no central controller, and they typically do not operate in lock-step fashion. Most real world banks run this way. Tellers do not consult with one another, nor do they perform each step of every transaction at the same time. Instead, they work on their own, until a data access conflict occurs. Processing of transactions occurs without concern for timing or customer order. But customers A and B must be explicitly prevented from simultaneously accessing the joint AB account balance. MIMD relies on synchronization mechanisms called locks to coordinate access to shared resources. See also mutual exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
multiple read single write||
In a concurrent environment, the first process to access data for writing has exclusive access to it, making concurrent write access or simultaneous read and write access impossible. However, the data can be read by multiple readers.
See multiprocessor system.
In a shared memory multiprocessor machine each CPU and cache module are connected together via a bus that also includes memory and IO connections. The bus enforces a cache coherency protocol. See also cache, coherence, Mbus, XDBus.
A system in which more than one processor can be active at any given time. While the processors are actively executing separate processes, they run completely asynchronously. However, synchronization between processors is essential when they access critical system resources or critical regions of system code. See also critical region, critical resource, multithreading, uniprocessor system.
In a uniprocessor system, a large number of threads appear to be running in parallel. This is accomplished by rapidly switching between threads.
Applications that can have more than one thread or processor active at one time. Multithreaded applications can run in both uniprocessor systems and multiprocessor systems. See also bound thread, mt-safe, single-lock strategy, thread, unbound thread, uniprocessor.
Synchronization variable to implement the mutual exclusion mechanism. See also condition variable, mutual exclusion.
In a concurrent environment, the ability of a thread to update a critical resource without accesses from competing threads. See also critical region, critical resource.
Stands for Not a Number. A symbolic entity that is encoded in floating-point format.
In IEEE arithmetic, a number with a biased exponent that is neither zero nor maximal (all 1's), representing a subset of the normal range of real numbers with a bounded small relative error.
In the shared memory architecture, a mechanism for caches to communicate with each other as well as with main memory. In packet switching, traffic is divided into small segments called packets that are multiplexed onto the bus. A packet carries identification that enables cache and memory hardware to determine whether the packet is destined for it or to send the packet on to its ultimate destination. Packet switching allows bus traffic to be multiplexed and unordered (not sequenced) packets to be put on the bus. The unordered packets are reassembled at the destination (cache or main memory). See also cache, shared memory.
A model of the world that is used to formulate a computer solution to a problem. Paradigms provide a context in which to understand and solve a real-world problem. Because a paradigm is a model, it abstracts the details of the problem from the reality, and in doing so, makes the problem easier to solve. Like all abstractions, however, the model can be inaccurate because it only approximates the real world. See also Multiple Instruction Multiple Data, Single Instruction Multiple Data, Single Instruction Single Data, Single Program Multiple Data.
In a multiprocessor system, true parallel execution is achieved where a large number of threads or processes can be active at one time. See also concurrence, multiprocessor system, multithreading, uniprocessor.
See concurrent processes, multithreading.
If the total function applied to the data can be divided into distinct processing phases, different portions of data can flow along from phase to phase; such as a compiler with phases for lexical analysis, parsing, type checking, code generation and so on. As soon as the first program or module has passed the lexical analysis phase, it can be passed on to the parsing phase while the lexical analyzer starts on the second program or module. See also array processing, vector processing.
A hardware feature where operations are reduced to multiple stages, each of which takes (typically) one cycle to complete. The pipeline is filled when new operations can be issued each cycle. If there are no dependencies among instructions in the pipe, new results can be delivered each cycle. Chaining implies pipelining of dependent instructions. If dependent instructions cannot be chained, when the hardware does not support chaining of those particular instructions, then the pipeline stalls.
A quantitative measure of the density of representable numbers. For example, in a binary floating point format that has a precision of 53 significant bits, there are 253 representable numbers between any two adjacent powers of two (within the range of normal numbers). Do not confuse precision with accuracy, which expresses how closely one number approximates another.
A unit of activity characterized by a single sequential thread of execution, a current state, and an associated set of system resources.
A NaN (not a number) that propagates through almost every arithmetic operation without raising new exceptions.
The base number of any system of numbers. For example, 2 is the radix of a binary system, and 10 is the radix of the decimal system of numeration. SPARC workstations use radix-2 arithmetic; IEEE Std 754 is a radix-2 arithmetic standard.
Inexact results must be rounded up or down to obtain representable values. When a result is rounded up, it is increased to the next representable value. When rounded down, it is reduced to the preceding representable value.
The error introduced when a real number is rounded to a machine-representable number. Most floating-point calculations incur roundoff error. For any one floating-point operation, IEEE Std 754 specifies that the result shall not incur more than one rounding error.
Synchronization mechanism for controlling access to critical resources by cooperating asynchronous threads. See also semaphore.
A special-purpose data type introduced by E. W. Dijkstra that coordinates access to a particular resource or set of shared resources. A semaphore has an integer value (that cannot become negative) with two operations allowed on it. The signal (V or up) operation increases the value by one, and in general indicates that a resource has become free. The wait (P or down) operation decreases the value by one, when that can be done without the value going negative, and in general indicates that a free resource is about to start being used. See also semaphore lock.
Processes that execute in such a manner that one must finish before the next begins. See also concurrent processes, process.
set associative cache||
In a set associative cache, there are a fixed number of locations (at least two) where each block can be placed. A set associative cache with n locations for a block is called an n-way set associative cache. An n-way set associative cache consists of more than one set, each of which consists of n blocks. A block can be placed in any location (element) of that set. Increasing the associativity level (number of blocks in a set) increases the cache hit rate. See also cache, cache locality, false sharing, write-invalidate, write-update.
shared memory architecture||
In a bus-connected multiprocessor system, processes or threads communicate through a global memory shared by all processors. This shared data segment is placed in the address space of the cooperating processes between their private data and stack segments. Subsequent tasks spawned by fork() copy all but the shared data segment in their address space. Shared memory requires program language extensions and library routines to support the model.
A NaN (not a number) that raises the invalid operation exception whenever it appears as an operand.
The component of a floating-point number that is multiplied by a signed power of the base to determine the value of the number. In a normalized number, the significand consists of a single nonzero digit to the left of the radix point and a fraction to the right.
See Single Instruction Multiple Data.
Single Instruction Multiple Data||
System model where there are many processing elements, but they are designed to execute the same instruction at the same time; that is, one program counter is used to sequence through a single copy of the program. SIMD is especially useful for solving problems that have lots of data that needs to be updated on a wholesale basis; such as numerical calculations that are regular. Many scientific and engineering applications (such as, image processing, particle simulation, and finite element methods) naturally fall into the SIMD paradigm. See also array processing, pipeline, vector processing.
Single Instruction Single Data||
The conventional uniprocessor model, with a single processor fetching and executing a sequence of instructions that operate on the data items specified within them. This is the original von Neumann model of the operation of a computer.
Using one computer word to represent a number.
Single Program Multiple Data||
A form of asynchronous parallelism where simultaneous processing of different data occurs without lock-step coordination. In SPMD, processors can execute different instructions at the same time; such as, different branches of an if-then-else statement.
In the single-lock strategy, a thread acquires a single, application-wide mutex lock whenever any thread in the application is running and releases the lock before the thread blocks. The single-lock strategy requires cooperation from all modules and libraries in the system to synchronize on the single lock. Because only one thread can be accessing shared data at any given time, each thread has a consistent view of memory. This strategy is quite effective in a uniprocessor, provided shared memory is put into a consistent state before the lock is released and that the lock is released often enough to allow other threads to run. Furthermore, in uniprocessor systems, concurrency is diminished if the lock is not dropped during most I/O operations. The single-lock strategy cannot be applied in a multiprocessor system.
See Single Instruction Single Data.
The most popular protocol for maintaining cache coherency is called snooping. Cache controllers monitor or snoop on the bus to determine whether or not the cache contains a copy of a shared block.
For reads, multiple copies can reside in the cache of different processors, but because the processors need the most recent copy, all processors must get new values after a write. See also cache, competitive-caching, false sharing, write-invalidate, write-update.
For writes, a processor must have exclusive access to write to cache. Writes to unshared blocks do not cause bus traffic. The consequence of a write to shared data is either to invalidate all other copies or to update the shared copies with the value being written. See also cache, competitive-caching, false sharing, write-invalidate, write-update.
Threads use a spin lock to test a lock variable over and over until some other task releases the lock. That is, the waiting thread spins on the lock until the lock is cleared. Then, the waiting thread sets the lock while inside the critical region. After work in the critical region is complete, the thread clears the spin lock so another thread can enter the critical region. The difference between a spin lock and a mutex is that an attempt to get a mutex held by someone else will block and release the LWP; a spin lock does not release the LWP. See also mutex lock.
See Single Program Multiple Data.
Standard Error is the Unix file pointer to standard error output. This file is opened when a program is started.
Flushing the underflowed result of an arithmetic operation to zero.
In IEEE arithmetic, a nonzero floating point number with a biased exponent of zero. The subnormal numbers are those between zero and the smallest normal number.
A flow of control within a single UNIX process address space. Solaris threads provide a light-weight form of concurrent task, allowing multiple threads of control in a common user-address space, with minimal scheduling and communication overhead. Threads share the same address space, file descriptors (when one thread opens a file, the other threads can read it), data structures, and operating system state. A thread has a program counter and a stack to keep track of local variables and return addresses. Threads interact through the use of shared data and thread synchronization operations. See also bound thread, light-weight processes, multithreading, unbound thread.
See interconnection network topology.
The radix complement of a binary numeral, formed by subtracting each digit from 1, then adding 1 to the least significant digit and executing any required carries. For example, the two's complement of 1101 is 0011.
Stands for unit in last place. In binary formats, the least significant bit of the significand, bit 0, is the unit in the last place.
Stands for ulp of x truncated in working format.
For Solaris threads, threads scheduled onto a pool of LWPs are called unbound threads. The threads library invokes and assigns LWPs to execute runnable threads. If the thread becomes blocked on a synchronization mechanism (such as a mutex lock) the state of the thread is saved in process memory. The threads library then assigns another thread to the LWP. See also bound thread, multithreading, thread.
A condition that occurs when the result of a floating-point arithmetic operation is so small that it cannot be represented as a normal number in the destination floating-point format with only normal roundoff.
A uniprocessor system has only one processor active at any given time. This single processor can run multithreaded applications as well as the conventional single instruction single data model. See also multithreading, single instruction single data, single-lock strategy.
Processing of sequences of data in a uniform manner, a common occurrence in manipulation of matrices (whose elements are vectors) or other arrays of data. This orderly progression of data can capitalize on the use of pipeline processing. See also array processing, pipeline.
An ordered set of characters that are stored, addressed, transmitted and operated on as a single entity within a given computer. In the context of SPARC workstations, a word is 32 bits.
In IEEE arithmetic, a number created from a value that otherwise overflows or underflows by adding a fixed offset to its exponent to position the wrapped value in the normal number range. Wrapped results are not currently produced on SPARC workstations.
Write policy for maintaining coherency between cache and main memory. Write-back (also called copy back or store in) writes only to the block in local cache. Writes occur at the speed of cache memory. The modified cache block is written to main memory only when the corresponding memory address is referenced by another processor. The processor can write within a cache block multiple times and writes it to main memory only when referenced. Because every write does not go to memory, write-back reduces demands on bus bandwidth. See also cache, coherence, write-through.
Maintains cache coherence by reading from local caches until a write occurs. To change the value of a variable the writing processor first invalidates all copies in other caches. The writing processor is then free to update its local copy until another processor asks for the variable. The writing processor issues an invalidation signal over the bus and all caches check to see if they have a copy; if so, they must invalidate the block containing the word. This scheme allows multiple readers, but only a single writer. Write-invalidate use the bus only on the first write to invalidate the other copies; subsequent local writes do not result in bus traffic, thus reducing demands on bus bandwidth. See also cache, cache locality, coherence, false sharing, write-update.
Write policy for maintaining coherency between cache and main memory. Write-through (also called store through) writes to main memory as well as to the block in local cache. Write-through has the advantage that main memory has the most current copy of the data. See also cache, coherence, write-back.
Write-update, also known as write-broadcast, maintains cache coherence by immediately updating all copies of a shared variable in all caches. This is a form of write-through because all writes go over the bus to update copies of shared data. Write-update has the advantage of making new values appear in cache sooner, which can reduce latency. See also cache, cache locality, coherence, false sharing, write-invalidate.
The XDBus specification uses low-impedance GTL (Gunning Transceiver Logic) transceiver signalling to drive longer backplanes at higher clock rates. XDBus supports a large number of CPUs with multiple interleaved memory banks for increased throughput. XDBus uses a packet switched protocol with split requests and responses for more efficient bus utilization. XDBus also defines an interleaving scheme so that one, two or four separate bus data paths can be used as a single backplane for increased throughput. XDBus supports write-invalidate, write-update and competitive-caching coherency schemes, and has several congestion control mechanisms. See also cache, coherence, competitive-caching, write-invalidate, write-update.