SSE Instructions
SSE instructions are an extension of the SIMD execution model introduced with the
MMX technology. SSE instructions are divided into four subgroups:
- SIMD single-precision floating-point instructions that operate on the XMM registers
- MXCSR state management instructions
- 64-bit SIMD integer instructions that operate on the MMX registers
- Instructions that provide cache control, prefetch, and instruction ordering functionality
SIMD Single-Precision Floating-Point Instructions (SSE)
The SSE SIMD instructions operate on packed and scalar single-precision floating-point values located
in the XMM registers or memory.
Data Transfer Instructions (SSE)
The SSE data transfer instructions move packed and scalar single-precision floating-point operands between
XMM registers and between XMM registers and memory.
Table 3-27 Data Transfer Instructions (SSE)

| Instruction | Description |
|---|---|
| MOVAPS | move four aligned packed single-precision floating-point values between XMM registers or memory |
| MOVHLPS | move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register |
| MOVHPS | move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory |
| MOVLHPS | move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register |
| MOVLPS | move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory |
| MOVMSKPS | extract sign mask from four packed single-precision floating-point values |
| MOVSS | move scalar single-precision floating-point value between XMM registers or memory |
| MOVUPS | move four unaligned packed single-precision floating-point values between XMM registers or memory |

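A minimal sketch in C of how these data transfer instructions are usually reached from compiled code, assuming an x86 or x86-64 target and a compiler that ships `<xmmintrin.h>` (GCC and Clang do). The helper names are hypothetical, and the intrinsic-to-instruction mapping noted in the comments (for example `_mm_loadu_ps` to MOVUPS) is what compilers typically emit, not a guarantee.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Copy four floats from any address to a 16-byte-aligned destination.
 * _mm_loadu_ps typically compiles to MOVUPS and _mm_store_ps to MOVAPS. */
void copy4_to_aligned(float *dst16, const float *src)
{
    __m128 v = _mm_loadu_ps(src); /* MOVUPS: no alignment requirement */
    _mm_store_ps(dst16, v);       /* MOVAPS: faults unless dst16 is 16-byte aligned */
}

/* Replace element 0 of a four-float array, keeping elements 1-3
 * (register-to-register MOVSS). */
void replace_low(float *inout, float x)
{
    __m128 v = _mm_loadu_ps(inout);
    _mm_storeu_ps(inout, _mm_move_ss(v, _mm_set_ss(x)));
}

/* Return a 4-bit mask whose bit i is the sign bit of element i (MOVMSKPS). */
int sign_mask4(const float *src)
{
    return _mm_movemask_ps(_mm_loadu_ps(src));
}
```

For `{1, -2, 3, -4}`, `sign_mask4` returns `0xA` because elements 1 and 3 are negative. The aligned form (MOVAPS) raises a fault on a misaligned address, which is why only the destination of `copy4_to_aligned` carries an alignment requirement.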
Packed Arithmetic Instructions (SSE)
SSE packed arithmetic instructions perform packed and scalar arithmetic operations on packed and scalar
single-precision floating-point operands.
Table 3-28 Packed Arithmetic Instructions (SSE)

| Instruction | Description |
|---|---|
| ADDPS | add packed single-precision floating-point values |
| ADDSS | add scalar single-precision floating-point values |
| DIVPS | divide packed single-precision floating-point values |
| DIVSS | divide scalar single-precision floating-point values |
| MAXPS | return maximum packed single-precision floating-point values |
| MAXSS | return maximum scalar single-precision floating-point values |
| MINPS | return minimum packed single-precision floating-point values |
| MINSS | return minimum scalar single-precision floating-point values |
| MULPS | multiply packed single-precision floating-point values |
| MULSS | multiply scalar single-precision floating-point values |
| RCPPS | compute reciprocals of packed single-precision floating-point values |
| RCPSS | compute reciprocal of scalar single-precision floating-point values |
| RSQRTPS | compute reciprocals of square roots of packed single-precision floating-point values |
| RSQRTSS | compute reciprocal of square root of scalar single-precision floating-point values |
| SQRTPS | compute square roots of packed single-precision floating-point values |
| SQRTSS | compute square root of scalar single-precision floating-point values |
| SUBPS | subtract packed single-precision floating-point values |
| SUBSS | subtract scalar single-precision floating-point values |

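The sketch below, again assuming `<xmmintrin.h>` on an x86 target, combines a few of the packed and scalar arithmetic instructions. One detail worth knowing: RCPPS/RCPSS and RSQRTPS/RSQRTSS return approximations (Intel documents a maximum relative error of 1.5 * 2^-12), so the hypothetical `approx_recip` helper refines the estimate with one Newton-Raphson step. The other helper names are hypothetical as well.

```c
#include <xmmintrin.h>

/* Element-wise a*b + c on four floats using MULPS then ADDPS
 * (two separate roundings, not a fused multiply-add). */
void muladd4(float *out, const float *a, const float *b, const float *c)
{
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* MULPS */
    _mm_storeu_ps(out, _mm_add_ps(prod, _mm_loadu_ps(c)));       /* ADDPS */
}

/* Square root of each element (SQRTPS, correctly rounded). */
void sqrt4(float *out, const float *in)
{
    _mm_storeu_ps(out, _mm_sqrt_ps(_mm_loadu_ps(in)));
}

/* RCPSS gives only an approximate 1/d; one Newton-Raphson step,
 * x1 = x0 * (2 - d * x0), roughly doubles the number of correct bits. */
float approx_recip(float d)
{
    __m128 dv = _mm_set_ss(d);
    __m128 x0 = _mm_rcp_ss(dv); /* RCPSS */
    __m128 x1 = _mm_mul_ss(x0, _mm_sub_ss(_mm_set_ss(2.0f), _mm_mul_ss(dv, x0)));
    return _mm_cvtss_f32(x1);
}
```

When full precision is required, DIVSS or SQRTSS remain the exact alternatives; the reciprocal approximations trade accuracy for latency.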
Comparison Instructions (SSE)
The SSE compare instructions compare packed and scalar single-precision floating-point operands.
Table 3-29 Comparison Instructions (SSE)

| Instruction | Description |
|---|---|
| CMPPS | compare packed single-precision floating-point values |
| CMPSS | compare scalar single-precision floating-point values |
| COMISS | perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |
| UCOMISS | perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |

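CMPPS and CMPSS write a per-element mask of all ones or all zeros rather than setting EFLAGS, whereas COMISS and UCOMISS set ZF, PF, and CF. A sketch assuming `<xmmintrin.h>`; the helper names are hypothetical, and `_mm_cmplt_ps` is one of several intrinsics that encode a particular CMPPS predicate in the instruction's immediate.

```c
#include <xmmintrin.h>

/* Count elements where a[i] < b[i]: CMPPS produces an all-ones or all-zeros
 * mask per element, and MOVMSKPS gathers the mask sign bits into an int. */
int count_less4(const float *a, const float *b)
{
    __m128 m = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* CMPPS, less-than predicate */
    int bits = _mm_movemask_ps(m);                              /* MOVMSKPS */
    return (bits & 1) + ((bits >> 1) & 1) + ((bits >> 2) & 1) + ((bits >> 3) & 1);
}

/* UCOMISS sets ZF, PF, and CF; the intrinsic turns the flags into 0 or 1.
 * COMISS (_mm_comilt_ss) differs in signaling invalid for QNaN operands too. */
int scalar_less(float a, float b)
{
    return _mm_ucomilt_ss(_mm_set_ss(a), _mm_set_ss(b));
}
```

The mask form is what makes branch-free code possible: the result of CMPPS can feed directly into the logical instructions that follow.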
Logical Instructions (SSE)
The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations
on packed single-precision floating-point operands.
Table 3-30 Logical Instructions (SSE)

| Instruction | Description |
|---|---|
| ANDNPS | perform bitwise logical AND NOT of packed single-precision floating-point values |
| ANDPS | perform bitwise logical AND of packed single-precision floating-point values |
| ORPS | perform bitwise logical OR of packed single-precision floating-point values |
| XORPS | perform bitwise logical XOR of packed single-precision floating-point values |

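Because the logical instructions operate on raw bits, they are commonly used to manipulate sign bits and to build branch-free selects from comparison masks. A sketch under the same `<xmmintrin.h>` assumption, with hypothetical helper names; note that ANDNPS complements its first operand, so `_mm_andnot_ps(m, x)` computes `~m & x`.

```c
#include <xmmintrin.h>

/* |x| for four floats: clear bit 31 of each element with ANDNPS. */
void abs4(float *out, const float *in)
{
    __m128 sign = _mm_set1_ps(-0.0f); /* only the sign bit set in each element */
    _mm_storeu_ps(out, _mm_andnot_ps(sign, _mm_loadu_ps(in)));
}

/* Branch-free select: out[i] = (a[i] > b[i]) ? a[i] : b[i].
 * The CMPPS mask chooses a where all ones and b where all zeros. */
void max4_select(float *out, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    __m128 m = _mm_cmpgt_ps(va, vb);                                 /* CMPPS */
    __m128 r = _mm_or_ps(_mm_and_ps(m, va), _mm_andnot_ps(m, vb));   /* ANDPS, ANDNPS, ORPS */
    _mm_storeu_ps(out, r);
}
```

MAXPS computes the same result directly; the select pattern is shown because it generalizes to choosing between any two vectors.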
Shuffle and Unpack Instructions (SSE)
The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in packed
single-precision floating-point operands.
Table 3-31 Shuffle and Unpack Instructions (SSE)

| Instruction | Description |
|---|---|
| SHUFPS | shuffle values in packed single-precision floating-point operands |
| UNPCKHPS | unpack and interleave the two high-order values from two packed single-precision floating-point operands |
| UNPCKLPS | unpack and interleave the two low-order values from two packed single-precision floating-point operands |

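A sketch of SHUFPS, UNPCKLPS, and UNPCKHPS, assuming `<xmmintrin.h>`. The `_MM_SHUFFLE` macro from that header packs four 2-bit element selectors into the instruction's immediate byte: result elements 0 and 1 are picked from the first operand, elements 2 and 3 from the second. The helper names are hypothetical.

```c
#include <xmmintrin.h>

/* Reverse four floats: SHUFPS with the same register as both sources.
 * _MM_SHUFFLE(0, 1, 2, 3) selects elements 3, 2, 1, 0 for result lanes 0..3. */
void reverse4(float *out, const float *in)
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* lo = {a0, b0, a1, b1} (UNPCKLPS), hi = {a2, b2, a3, b3} (UNPCKHPS). */
void interleave4(float *lo, float *hi, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    _mm_storeu_ps(lo, _mm_unpacklo_ps(va, vb));
    _mm_storeu_ps(hi, _mm_unpackhi_ps(va, vb));
}
```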
Conversion Instructions (SSE)
The SSE conversion instructions convert packed and individual doubleword integers to packed and
scalar single-precision floating-point values, and packed and scalar single-precision floating-point values to doubleword integers.
Table 3-32 Conversion Instructions (SSE)

| Instruction | Description |
|---|---|
| CVTPI2PS | convert packed doubleword integers to packed single-precision floating-point values |
| CVTPS2PI | convert packed single-precision floating-point values to packed doubleword integers |
| CVTSI2SS | convert doubleword integer to scalar single-precision floating-point value |
| CVTSS2SI | convert scalar single-precision floating-point value to a doubleword integer |
| CVTTPS2PI | convert with truncation packed single-precision floating-point values to packed doubleword integers |
| CVTTSS2SI | convert with truncation scalar single-precision floating-point value to scalar doubleword integer |

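The difference between CVTSS2SI and CVTTSS2SI is the rounding: the former uses the rounding mode held in MXCSR, the latter always truncates toward zero (the behavior of a C cast). A sketch assuming `<xmmintrin.h>` and the default MXCSR rounding mode, round to nearest with ties to even; the helper names are hypothetical.

```c
#include <xmmintrin.h>

/* CVTSS2SI: rounds according to MXCSR.RC (round to nearest even by default). */
int to_int_rounded(float x) { return _mm_cvtss_si32(_mm_set_ss(x)); }

/* CVTTSS2SI: always truncates toward zero. */
int to_int_truncated(float x) { return _mm_cvttss_si32(_mm_set_ss(x)); }

/* CVTSI2SS: convert a doubleword integer into the low element. */
float from_int(int i)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), i));
}
```

Under the default mode, `to_int_rounded(2.5f)` yields 2 and `to_int_rounded(3.5f)` yields 4 (ties go to the even integer), while `to_int_truncated(-1.7f)` yields -1.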
MXCSR State Management Instructions (SSE)
The MXCSR state management instructions save and restore the state of the MXCSR
control and status register.
Table 3-33 MXCSR State Management Instructions (SSE)

| Instruction | Description |
|---|---|
| LDMXCSR | load %mxcsr register |
| STMXCSR | save %mxcsr register state |

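STMXCSR and LDMXCSR are exposed in C as `_mm_getcsr` and `_mm_setcsr`. This sketch, assuming `<xmmintrin.h>`, changes only the rounding-control field (bits 13 and 14, selected by `_MM_ROUND_MASK`) and returns the previous register value so the caller can restore it. The helper names are hypothetical. Compilers may not treat MXCSR as a dependency of surrounding floating-point code, so code that relies on a temporary rounding mode is worth checking in the generated assembly.

```c
#include <xmmintrin.h>

/* Switch MXCSR.RC to mode (_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP,
 * or _MM_ROUND_TOWARD_ZERO) and return the old MXCSR for later restoration. */
unsigned int set_rounding_mode(unsigned int mode)
{
    unsigned int old = _mm_getcsr();                          /* STMXCSR */
    _mm_setcsr((old & ~(unsigned int)_MM_ROUND_MASK) | mode); /* LDMXCSR */
    return old;
}

/* Read back only the rounding-control bits. */
unsigned int current_rounding_mode(void)
{
    return _mm_getcsr() & _MM_ROUND_MASK;
}
```

Restoring with `_mm_setcsr(old)` also restores the exception masks and sticky flags that were in effect before the change.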
64-Bit SIMD Integer Instructions (SSE)
The SSE 64-bit SIMD integer instructions perform operations on packed bytes, words, or
doublewords in MMX registers.
Table 3-34 64-Bit SIMD Integer Instructions (SSE)

| Instruction | Description |
|---|---|
| PAVGB | compute average of packed unsigned byte integers |
| PAVGW | compute average of packed unsigned word integers |
| PEXTRW | extract word |
| PINSRW | insert word |
| PMAXSW | maximum of packed signed word integers |
| PMAXUB | maximum of packed unsigned byte integers |
| PMINSW | minimum of packed signed word integers |
| PMINUB | minimum of packed unsigned byte integers |
| PMOVMSKB | move byte mask |
| PMULHUW | multiply packed unsigned word integers and store high result |
| PSADBW | compute sum of absolute differences |
| PSHUFW | shuffle packed word integers in MMX register |

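These instructions take `__m64` operands held in MMX registers, so they are reached through the MMX-typed intrinsics declared in `<xmmintrin.h>`. A sketch assuming GCC or Clang on x86 or x86-64 (MSVC does not provide the `__m64` intrinsics for x64 targets); the helper names are hypothetical. `_mm_empty` emits EMMS, an MMX instruction rather than an SSE one, which clears the MMX state before any x87 floating-point code runs.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h> /* also provides __m64 and the MMX helpers */

/* Sum of absolute differences of eight unsigned byte pairs (PSADBW).
 * The 16-bit sum lands in the low word; the upper bits are zeroed. */
unsigned int sad8(const uint8_t a[8], const uint8_t b[8])
{
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    unsigned int r = (unsigned int)_mm_cvtsi64_si32(_mm_sad_pu8(va, vb)) & 0xFFFFu;
    _mm_empty(); /* EMMS */
    return r;
}

/* Rounded average (a + b + 1) >> 1 of eight unsigned byte pairs (PAVGB). */
void avg8(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    __m64 r = _mm_avg_pu8(va, vb);
    memcpy(out, &r, 8);
    _mm_empty(); /* EMMS */
}
```

PAVGB rounds up on ties, so the average of 255 and 0 is 128, not 127.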
Miscellaneous Instructions (SSE)
The following instructions control caching, prefetching, and instruction ordering.
Table 3-35 Miscellaneous Instructions (SSE)

| Instruction | Description |
|---|---|
| MASKMOVQ | non-temporal store of selected bytes from an MMX register into memory |
| MOVNTPS | non-temporal store of four packed single-precision floating-point values from an XMM register into memory |
| MOVNTQ | non-temporal store of quadword from an MMX register into memory |
| PREFETCHNTA | prefetch data into non-temporal cache structure and into a location close to the processor |
| PREFETCHT0 | prefetch data into all levels of the cache hierarchy |
| PREFETCHT1 | prefetch data into level 2 cache and higher |
| PREFETCHT2 | prefetch data into level 3 cache and higher, or an implementation-specific choice |
| SFENCE | serialize store operations |

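A sketch of the cacheability controls, assuming `<xmmintrin.h>`: `_mm_stream_ps` is the usual spelling of MOVNTPS, `_mm_sfence` of SFENCE, and `_mm_prefetch` with a `_MM_HINT_*` constant selects a member of the PREFETCHh family. The helper names and the prefetch distance of 64 floats are hypothetical tuning choices, not recommendations.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Fill a 16-byte-aligned buffer with MOVNTPS (bypassing the caches), then
 * SFENCE so the weakly ordered stores are globally visible before any
 * later store. n must be a multiple of 4. */
void stream_fill(float *dst16, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst16 + i, v); /* MOVNTPS: dst16 + i must be 16-byte aligned */
    _mm_sfence();                    /* SFENCE */
}

/* Sum an array while hinting data a few cache lines ahead into all levels. */
float sum_with_prefetch(const float *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i % 16 == 0 && i + 64 < n) /* stay in bounds: out-of-range pointers are UB in C */
            _mm_prefetch((const char *)(p + i + 64), _MM_HINT_T0); /* PREFETCHT0 */
        s += p[i];
    }
    return s;
}
```

A prefetch is only a hint and does not change program results; the SFENCE matters mainly when another agent, such as a thread polling a flag written after the fill, must observe the streamed data.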