SSE Instructions
SSE instructions are an extension of the SIMD execution model introduced with
the MMX technology. SSE instructions are divided into four subgroups:
- SIMD single-precision floating-point instructions that operate on the XMM registers
- MXCSR state management instructions
- 64–bit SIMD integer instructions that operate on the MMX registers
- Instructions that provide cache control, prefetch, and instruction ordering functionality

SIMD Single-Precision Floating-Point Instructions (SSE)
The SSE SIMD instructions operate on packed and scalar single-precision floating-point values
located in the XMM registers or memory.
Data Transfer Instructions (SSE)
The SSE data transfer instructions move packed and scalar single-precision floating-point operands between
XMM registers and between XMM registers and memory.
Table 3-27 Data Transfer Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| MOVAPS | move four aligned packed single-precision floating-point values between XMM registers or memory |
| MOVHLPS | move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register |
| MOVHPS | move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory |
| MOVLHPS | move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register |
| MOVLPS | move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory |
| MOVMSKPS | extract sign mask from four packed single-precision floating-point values |
| MOVSS | move scalar single-precision floating-point value between XMM registers or memory |
| MOVUPS | move four unaligned packed single-precision floating-point values between XMM registers or memory |

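The data transfer instructions can be exercised from C through the SSE intrinsics in `<xmmintrin.h>`. The following is a minimal sketch, assuming an x86 target with SSE enabled; the helper names `sign_mask4` and `high_to_low` are hypothetical, and a compiler usually, but not necessarily, emits the instruction named in each comment.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: bit i of the result is the sign bit of v[i]. */
int sign_mask4(const float *v)
{
    assert(v != NULL);
    __m128 x = _mm_loadu_ps(v);   /* unaligned load; usually compiles to MOVUPS */
    return _mm_movemask_ps(x);    /* usually compiles to MOVMSKPS */
}

/* Hypothetical helper: out = { b[2], b[3], a[2], a[3] }. */
void high_to_low(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    /* The high quadword of b becomes the low quadword of the result;
       the high quadword of a is kept. Usually compiles to MOVHLPS. */
    __m128 r = _mm_movehl_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);        /* unaligned store; usually MOVUPS */
}
```

The unaligned forms are used here so that the callers need not guarantee 16-byte alignment; `_mm_load_ps` and `_mm_store_ps` are the aligned counterparts that correspond to MOVAPS.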
Packed Arithmetic Instructions (SSE)
SSE packed arithmetic instructions perform packed and scalar arithmetic operations on packed and
scalar single-precision floating-point operands.
Table 3-28 Packed Arithmetic Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| ADDPS | add packed single-precision floating-point values |
| ADDSS | add scalar single-precision floating-point values |
| DIVPS | divide packed single-precision floating-point values |
| DIVSS | divide scalar single-precision floating-point values |
| MAXPS | return maximum packed single-precision floating-point values |
| MAXSS | return maximum scalar single-precision floating-point values |
| MINPS | return minimum packed single-precision floating-point values |
| MINSS | return minimum scalar single-precision floating-point values |
| MULPS | multiply packed single-precision floating-point values |
| MULSS | multiply scalar single-precision floating-point values |
| RCPPS | compute reciprocals of packed single-precision floating-point values |
| RCPSS | compute reciprocal of scalar single-precision floating-point values |
| RSQRTPS | compute reciprocals of square roots of packed single-precision floating-point values |
| RSQRTSS | compute reciprocal of square root of scalar single-precision floating-point values |
| SQRTPS | compute square roots of packed single-precision floating-point values |
| SQRTSS | compute square root of scalar single-precision floating-point values |
| SUBPS | subtract packed single-precision floating-point values |
| SUBSS | subtract scalar single-precision floating-point values |

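The difference between the packed (PS) and scalar (SS) forms is easiest to see side by side. The sketch below, written in C with the `<xmmintrin.h>` intrinsics, assumes an x86 target with SSE enabled; `add_packed` and `add_scalar` are hypothetical helper names, and the instruction named in each comment is what a compiler would typically emit.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: out[i] = a[i] + b[i] for all four elements. */
void add_packed(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    __m128 r = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* usually ADDPS */
    _mm_storeu_ps(out, r);
}

/* Hypothetical helper: out[0] = a[0] + b[0]; out[1..3] are copied from a. */
void add_scalar(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    __m128 r = _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* usually ADDSS */
    _mm_storeu_ps(out, r);
}
```

The scalar form operates only on the lowest element and passes the upper three elements of the first operand through unchanged.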
Comparison Instructions (SSE)
The SSE compare instructions compare packed and scalar single-precision floating-point operands.
Table 3-29 Comparison Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| CMPPS | compare packed single-precision floating-point values |
| CMPSS | compare scalar single-precision floating-point values |
| COMISS | perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |
| UCOMISS | perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |

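The compare instructions report their results in two different ways: CMPPS and CMPSS write an all-ones or all-zeros mask into each destination element, while COMISS and UCOMISS set flags in EFLAGS. A minimal C sketch of both styles follows, assuming an x86 target with SSE enabled; `less_than_mask` and `first_less` are hypothetical names, and the instruction named in each comment is only the usual compiler choice.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: bit i of the result is set when a[i] < b[i]. */
int less_than_mask(const float *a, const float *b)
{
    assert(a != NULL && b != NULL);
    /* Each element of m is all ones where the predicate holds, else all zeros
       (usually CMPPS with the less-than predicate). */
    __m128 m = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(m);    /* collect the four sign bits into an int */
}

/* Hypothetical helper: returns 1 when a < b, 0 otherwise (usually COMISS). */
int first_less(float a, float b)
{
    return _mm_comilt_ss(_mm_set_ss(a), _mm_set_ss(b));
}
```

The mask form suits branch-free selection with the logical instructions, whereas the flag-setting form suits an ordinary conditional branch.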
Logical Instructions (SSE)
The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations
on packed single-precision floating-point operands.
Table 3-30 Logical Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| ANDNPS | perform bitwise logical AND NOT of packed single-precision floating-point values |
| ANDPS | perform bitwise logical AND of packed single-precision floating-point values |
| ORPS | perform bitwise logical OR of packed single-precision floating-point values |
| XORPS | perform bitwise logical XOR of packed single-precision floating-point values |

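Because the logical instructions act on the raw bit patterns, they are commonly used to manipulate the sign bit. As an illustration, the following C sketch clears the sign bit of four values with an AND NOT; it assumes an x86 target with SSE enabled, and `abs4` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: out[i] = |in[i]| for four elements. */
void abs4(const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    /* -0.0f has only the sign bit set, so this is a per-element sign mask. */
    __m128 sign = _mm_set1_ps(-0.0f);
    /* _mm_andnot_ps(m, x) computes (~m) & x, which clears the sign bits
       (usually compiles to ANDNPS). */
    _mm_storeu_ps(out, _mm_andnot_ps(sign, _mm_loadu_ps(in)));
}
```

Note the operand order: the first operand is the one that is inverted.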
Shuffle and Unpack Instructions (SSE)
The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in
packed single-precision floating-point operands.
Table 3-31 Shuffle and Unpack Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| SHUFPS | shuffles values in packed single-precision floating-point operands |
| UNPCKHPS | unpacks and interleaves the two high-order values from two single-precision floating-point operands |
| UNPCKLPS | unpacks and interleaves the two low-order values from two single-precision floating-point operands |

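The element selection performed by SHUFPS and the interleaving performed by UNPCKLPS can be sketched in C as follows. This assumes an x86 target with SSE enabled; `reverse4` and `interleave_low` are hypothetical helper names, and the instruction named in each comment is the typical, not guaranteed, compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: out = { in[3], in[2], in[1], in[0] }. */
void reverse4(const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    __m128 x = _mm_loadu_ps(in);
    /* _MM_SHUFFLE lists the source indices from the highest result element
       down to the lowest. The two low result elements come from the first
       operand and the two high ones from the second (usually SHUFPS). */
    _mm_storeu_ps(out, _mm_shuffle_ps(x, x, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Hypothetical helper: out = { a[0], b[0], a[1], b[1] }. */
void interleave_low(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    __m128 r = _mm_unpacklo_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* usually UNPCKLPS */
    _mm_storeu_ps(out, r);
}
```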
Conversion Instructions (SSE)
The SSE conversion instructions convert packed and individual doubleword integers into packed and
scalar single-precision floating-point values, and vice versa.
Table 3-32 Conversion Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| CVTPI2PS | convert packed doubleword integers to packed single-precision floating-point values |
| CVTPS2PI | convert packed single-precision floating-point values to packed doubleword integers |
| CVTSI2SS | convert doubleword integer to scalar single-precision floating-point value |
| CVTSS2SI | convert scalar single-precision floating-point value to a doubleword integer |
| CVTTPS2PI | convert with truncation packed single-precision floating-point values to packed doubleword integers |
| CVTTSS2SI | convert with truncation scalar single-precision floating-point value to scalar doubleword integer |

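The practical difference between CVTSS2SI and CVTTSS2SI is the rounding behavior: the former rounds according to the rounding mode in MXCSR, the latter always truncates toward zero. A small C sketch follows, assuming an x86 target with SSE enabled and the default MXCSR rounding mode (round to nearest, ties to even); the helper names are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: converts using the current MXCSR rounding mode
   (usually compiles to CVTSS2SI). */
int round_convert(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Hypothetical helper: converts with truncation toward zero
   (usually compiles to CVTTSS2SI). */
int trunc_convert(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

Under the default rounding mode, `round_convert(2.5f)` yields 2 and `round_convert(3.5f)` yields 4, while `trunc_convert(-2.9f)` yields -2.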
MXCSR State Management Instructions (SSE)
The MXCSR state management instructions save and restore the state of the MXCSR
control and status register.
Table 3-33 MXCSR State Management Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| LDMXCSR | load %mxcsr register |
| STMXCSR | save %mxcsr register state |

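A typical use of the MXCSR state management instructions is to change the rounding-control field temporarily and restore it afterward. The C sketch below uses the `_mm_getcsr` and `_mm_setcsr` intrinsics, which usually compile to STMXCSR and LDMXCSR; it assumes an x86 target with SSE enabled, and `swap_rounding_mode` is a hypothetical helper name.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: installs a new rounding mode (one of the
   _MM_ROUND_* constants) and returns the previous rounding mode. */
unsigned int swap_rounding_mode(unsigned int mode)
{
    /* The mode must fit entirely within the rounding-control field. */
    assert((mode & ~(unsigned int)_MM_ROUND_MASK) == 0);

    unsigned int old = _mm_getcsr();                     /* usually STMXCSR */
    /* Replace only the rounding-control bits; keep all other MXCSR bits. */
    _mm_setcsr((old & ~(unsigned int)_MM_ROUND_MASK) | mode); /* usually LDMXCSR */
    return old & _MM_ROUND_MASK;
}
```

A caller would save the returned value and pass it back to `swap_rounding_mode` when done, so that later floating-point code sees the original mode.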
64–Bit SIMD Integer Instructions (SSE)
The SSE 64–bit SIMD integer instructions perform operations on packed bytes, words, or
doublewords in MMX registers.
Table 3-34 64–Bit SIMD Integer Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| PAVGB | compute average of packed unsigned byte integers |
| PAVGW | compute average of packed unsigned word integers |
| PEXTRW | extract word |
| PINSRW | insert word |
| PMAXSW | maximum of packed signed word integers |
| PMAXUB | maximum of packed unsigned byte integers |
| PMINSW | minimum of packed signed word integers |
| PMINUB | minimum of packed unsigned byte integers |
| PMOVMSKB | move byte mask |
| PMULHUW | multiply packed unsigned integers and store high result |
| PSADBW | compute sum of absolute differences |
| PSHUFW | shuffle packed integer word in MMX register |

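As an illustration of the 64-bit SIMD integer group, the following C sketch averages eight unsigned bytes with the `_mm_avg_pu8` intrinsic, which corresponds to PAVGB on MMX operands. It assumes an x86 target whose compiler supports the `__m64` intrinsics in `<xmmintrin.h>`; `average_bytes8` is a hypothetical helper name, and some compilers may implement the operation with XMM registers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics, including the 64-bit integer forms */

/* Hypothetical helper: out[i] = (a[i] + b[i] + 1) >> 1 for eight bytes. */
void average_bytes8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    __m64 x, y, r;
    memcpy(&x, a, sizeof x);      /* load eight bytes without aliasing issues */
    memcpy(&y, b, sizeof y);
    r = _mm_avg_pu8(x, y);        /* rounded average; corresponds to PAVGB */
    memcpy(out, &r, sizeof r);
    _mm_empty();                  /* clear MMX state before any x87 code runs */
}
```

The average rounds up on ties, so averaging 1 and 2 gives 2.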
Miscellaneous Instructions (SSE)
The following instructions control caching, prefetching, and instruction ordering.
Table 3-35 Miscellaneous Instructions (SSE)
| Mnemonic | Description |
| --- | --- |
| MASKMOVQ | non-temporal store of selected bytes from an MMX register into memory |
| MOVNTPS | non-temporal store of four packed single-precision floating-point values from an XMM register into memory |
| MOVNTQ | non-temporal store of quadword from an MMX register into memory |
| PREFETCHNTA | prefetch data into non-temporal cache structure and into a location close to the processor |
| PREFETCHT0 | prefetch data into all levels of the cache hierarchy |
| PREFETCHT1 | prefetch data into level 2 cache and higher |
| PREFETCHT2 | prefetch data into level 2 cache and higher |
| SFENCE | serialize store operations |
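
Non-temporal stores are usually paired with SFENCE so that the written data is ordered before later stores. The following C sketch fills a buffer with `_mm_stream_ps` and then issues `_mm_sfence`; it assumes an x86 target with SSE enabled, a 16-byte-aligned destination, and a length that is a multiple of four. `stream_fill` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: writes value into dst[0..n-1] using non-temporal stores. */
void stream_fill(float *dst, size_t n, float value)
{
    assert(dst != NULL);
    assert(((uintptr_t)dst & 15u) == 0);   /* the streaming store needs 16-byte alignment */
    assert(n % 4 == 0);                    /* whole 4-float vectors only */

    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);         /* usually compiles to MOVNTPS */

    _mm_sfence();                          /* SFENCE: order the stores above */
}
```

A buffer with suitable alignment can be obtained with `_mm_malloc(size, 16)` and released with `_mm_free`.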