3.21 SSE2 Instructions
SSE2 instructions are an extension of the SIMD execution model introduced
with the MMX technology and the SSE extensions. SSE2 instructions are divided
into four subgroups:
Packed and scalar double-precision floating-point instructions
Packed single-precision floating-point conversion instructions
128-bit SIMD integer instructions
Instructions that provide cache control and instruction ordering functionality
3.21.1 SSE2 Packed and Scalar Double-Precision Floating-Point Instructions
The SSE2 packed and scalar double-precision floating-point instructions
operate on double-precision floating-point operands.
3.21.1.1 SSE2 Data Movement Instructions
The SSE2 data movement instructions
move double-precision floating-point data between XMM registers and memory.
Table 52 SSE2 Data Movement Instructions
| Mnemonic | Description | Notes |
| MOVAPD | move two aligned packed double-precision floating-point values between XMM registers and memory | |
| MOVHPD | move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory | |
| MOVLPD | move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory | |
| MOVMSKPD | extract sign mask from two packed double-precision floating-point values | |
| MOVSD | move scalar double-precision floating-point value between XMM registers and memory | |
| MOVUPD | move two unaligned packed double-precision floating-point values between XMM registers and memory | |
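The difference between the aligned and unaligned moves can be sketched in C. This is an illustration only: it assumes a compiler that ships the SSE2 intrinsics header `<emmintrin.h>` (on 32-bit x86 an option such as `-msse2` may be needed), the function names `copy_pair` and `sign_mask` are hypothetical, and the instruction named in each comment is what compilers typically emit rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Copy two doubles. src may have any alignment; dst must be 16-byte aligned. */
void copy_pair(const double *src, double *dst)
{
    assert(((uintptr_t)dst & 15) == 0);  /* an aligned store faults otherwise */
    __m128d x = _mm_loadu_pd(src);       /* unaligned load, typically MOVUPD */
    _mm_store_pd(dst, x);                /* aligned store, typically MOVAPD */
}

/* 2-bit sign mask: bit 0 comes from v[0], bit 1 from v[1] (typically MOVMSKPD). */
int sign_mask(const double *v)
{
    return _mm_movemask_pd(_mm_loadu_pd(v));
}
```

For the pair {1.5, -2.5}, `sign_mask` returns 2 because only the high element is negative.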
3.21.1.2 SSE2 Packed Arithmetic Instructions
The SSE2 arithmetic instructions
operate on packed and scalar double-precision floating-point operands.
Table 53 SSE2 Packed Arithmetic Instructions
| Mnemonic | Description | Notes |
| ADDPD | add packed double-precision floating-point values | |
| ADDSD | add scalar double-precision floating-point values | |
| DIVPD | divide packed double-precision floating-point values | |
| DIVSD | divide scalar double-precision floating-point values | |
| MAXPD | return maximum packed double-precision floating-point values | |
| MAXSD | return maximum scalar double-precision floating-point value | |
| MINPD | return minimum packed double-precision floating-point values | |
| MINSD | return minimum scalar double-precision floating-point value | |
| MULPD | multiply packed double-precision floating-point values | |
| MULSD | multiply scalar double-precision floating-point values | |
| SQRTPD | compute packed square roots of packed double-precision floating-point values | |
| SQRTSD | compute scalar square root of scalar double-precision floating-point value | |
| SUBPD | subtract packed double-precision floating-point values | |
| SUBSD | subtract scalar double-precision floating-point values | |
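As a rough illustration of how packed and scalar arithmetic combine, the following C sketch computes a Euclidean norm two elements at a time. It assumes the compiler intrinsics in `<emmintrin.h>`. The function name `norm2` and the even-length restriction are choices made for this example, and the instruction names in the comments are the usual mapping, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Euclidean norm of n doubles; n must be even in this sketch. */
double norm2(const double *a, size_t n)
{
    assert(n % 2 == 0);
    __m128d acc = _mm_setzero_pd();
    for (size_t i = 0; i < n; i += 2) {
        __m128d x = _mm_loadu_pd(a + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(x, x));   /* MULPD, then ADDPD */
    }
    /* Fold the high lane into the low lane, then take one scalar root. */
    __m128d hi  = _mm_unpackhi_pd(acc, acc);
    __m128d sum = _mm_add_sd(acc, hi);             /* ADDSD */
    return _mm_cvtsd_f64(_mm_sqrt_sd(sum, sum));   /* SQRTSD */
}
```

The packed instructions do the bulk of the work in the loop; the scalar forms only finish the reduction on the low element.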
3.21.1.3 SSE2 Logical Instructions
The SSE2 logical instructions operate
on packed double-precision floating-point values.
Table 54 SSE2 Logical Instructions
| Mnemonic | Description | Notes |
| ANDNPD | perform bitwise logical AND NOT of packed double-precision floating-point values | |
| ANDPD | perform bitwise logical AND of packed double-precision floating-point values | |
| ORPD | perform bitwise logical OR of packed double-precision floating-point values | |
| XORPD | perform bitwise logical XOR of packed double-precision floating-point values | |
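A common use of the logical instructions is sign-bit manipulation. The C sketch below clears the sign bit of each element to obtain absolute values. It assumes the `<emmintrin.h>` intrinsics; `abs_inplace` is a hypothetical name, and the even-length requirement is a simplification for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Replace each element with its absolute value; n must be even in this sketch. */
void abs_inplace(double *a, size_t n)
{
    assert(n % 2 == 0);
    const __m128d sign = _mm_set1_pd(-0.0);   /* only the sign bit set in each lane */
    for (size_t i = 0; i < n; i += 2) {
        __m128d x = _mm_loadu_pd(a + i);
        x = _mm_andnot_pd(sign, x);           /* (~sign) & x, typically ANDNPD */
        _mm_storeu_pd(a + i, x);
    }
}
```

Replacing `_mm_andnot_pd(sign, x)` with `_mm_xor_pd(sign, x)` would negate each element instead, which corresponds to XORPD.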
3.21.1.4 SSE2 Compare Instructions
The SSE2 compare instructions compare
packed and scalar double-precision floating-point values and return the results
of the comparison either to the destination operand or to the EFLAGS register.
Table 55 SSE2 Compare Instructions
| Mnemonic | Description | Notes |
| CMPPD | compare packed double-precision floating-point values | |
| CMPSD | compare scalar double-precision floating-point values | |
| COMISD | perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |
| UCOMISD | perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |
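The two result styles, a mask in the destination operand versus flags in EFLAGS, can be sketched in C as follows. The example assumes the `<emmintrin.h>` intrinsics. `count_less` and `scalar_less` are hypothetical names, and the instruction mapping in the comments is the typical one.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Count elements strictly less than limit; n must be even in this sketch. */
size_t count_less(const double *a, size_t n, double limit)
{
    assert(n % 2 == 0);
    const __m128d lim = _mm_set1_pd(limit);
    size_t count = 0;
    for (size_t i = 0; i < n; i += 2) {
        /* CMPPD with the less-than predicate: a lane is all ones where true. */
        __m128d m = _mm_cmplt_pd(_mm_loadu_pd(a + i), lim);
        int bits = _mm_movemask_pd(m);        /* one bit per lane */
        count += (size_t)((bits & 1) + (bits >> 1));
    }
    return count;
}

/* Scalar comparison returning 0 or 1; typically COMISD plus a flag test. */
int scalar_less(double x, double y)
{
    return _mm_comilt_sd(_mm_set_sd(x), _mm_set_sd(y));
}
```

The packed form produces a per-lane mask that can feed the logical instructions directly, while the scalar form suits ordinary branching.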
3.21.1.5 SSE2 Shuffle and Unpack Instructions
The
SSE2 shuffle and unpack instructions operate on packed double-precision floating-point
operands.
Table 56 SSE2 Shuffle and Unpack Instructions
| Mnemonic | Description | Notes |
| SHUFPD | shuffle values in packed double-precision floating-point operands | |
| UNPCKHPD | unpack and interleave the high values from two packed double-precision floating-point operands | |
| UNPCKLPD | unpack and interleave the low values from two packed double-precision floating-point operands | |
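To make the interleaving concrete, the C sketch below splits an array of interleaved (real, imaginary) pairs into two arrays and swaps the halves of each pair. It assumes the `<emmintrin.h>` intrinsics. The function names and the complex-number framing are illustrative assumptions, not part of this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Split n interleaved (re, im) pairs into separate arrays; n must be even. */
void deinterleave(const double *z, double *re, double *im, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        __m128d z0 = _mm_loadu_pd(z + 2 * i);       /* {re[i],   im[i]}   */
        __m128d z1 = _mm_loadu_pd(z + 2 * i + 2);   /* {re[i+1], im[i+1]} */
        _mm_storeu_pd(re + i, _mm_unpacklo_pd(z0, z1));  /* low values, UNPCKLPD */
        _mm_storeu_pd(im + i, _mm_unpackhi_pd(z0, z1));  /* high values, UNPCKHPD */
    }
}

/* Swap re and im within each of n pairs, in place. */
void swap_re_im(double *z, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        __m128d x = _mm_loadu_pd(z + 2 * i);
        /* Immediate 1: low result = x[1], high result = x[0] (SHUFPD). */
        _mm_storeu_pd(z + 2 * i, _mm_shuffle_pd(x, x, 1));
    }
}
```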
3.21.1.6 SSE2 Conversion Instructions
The SSE2 conversion instructions
convert packed and individual doubleword integers into packed and scalar double-precision
floating-point values (and vice versa). These instructions also convert between
packed and scalar single-precision and double-precision floating-point values.
Table 57 SSE2 Conversion Instructions
| Mnemonic | Description | Notes |
| CVTDQ2PD | convert packed doubleword integers to packed double-precision floating-point values | |
| CVTPD2DQ | convert packed double-precision floating-point values to packed doubleword integers | |
| CVTPD2PI | convert packed double-precision floating-point values to packed doubleword integers in an MMX register | |
| CVTPD2PS | convert packed double-precision floating-point values to packed single-precision floating-point values | |
| CVTPI2PD | convert packed doubleword integers from an MMX register or memory to packed double-precision floating-point values | |
| CVTPS2PD | convert packed single-precision floating-point values to packed double-precision floating-point values | |
| CVTSD2SI | convert scalar double-precision floating-point value to a doubleword integer | |
| CVTSD2SS | convert scalar double-precision floating-point value to scalar single-precision floating-point value | |
| CVTSI2SD | convert doubleword integer to scalar double-precision floating-point value | |
| CVTSS2SD | convert scalar single-precision floating-point value to scalar double-precision floating-point value | |
| CVTTPD2DQ | convert with truncation packed double-precision floating-point values to packed doubleword integers | |
| CVTTPD2PI | convert with truncation packed double-precision floating-point values to packed doubleword integers in an MMX register | |
| CVTTSD2SI | convert with truncation scalar double-precision floating-point value to a doubleword integer | |
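The practical difference between the plain and the truncating conversions is the rounding behavior. The C sketch below shows both scalar forms and one packed integer-to-double conversion. It assumes the `<emmintrin.h>` intrinsics and the default MXCSR rounding mode (round to nearest, ties to even); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Round using the current MXCSR rounding mode (typically CVTSD2SI). */
int round_current(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* Round toward zero regardless of MXCSR (typically CVTTSD2SI). */
int round_trunc(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* Convert n 32-bit integers to doubles; n must be even in this sketch. */
void ints_to_doubles(const int32_t *in, double *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        __m128i v = _mm_loadl_epi64((const __m128i *)(in + i)); /* two doublewords */
        _mm_storeu_pd(out + i, _mm_cvtepi32_pd(v));             /* CVTDQ2PD */
    }
}
```

Under the default rounding mode, `round_current(2.5)` yields 2 while `round_trunc(2.9)` also yields 2, but for different reasons: the first rounds a tie to the even neighbor, the second discards the fraction.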
3.21.2 SSE2 Packed Single-Precision Floating-Point Instructions
The
SSE2 packed single-precision floating-point instructions operate on single-precision
floating-point and integer operands.
Table 58 SSE2 Packed Single-Precision Floating-Point Instructions
| Mnemonic | Description | Notes |
| CVTDQ2PS | convert packed doubleword integers to packed single-precision floating-point values | |
| CVTPS2DQ | convert packed single-precision floating-point values to packed doubleword integers | |
| CVTTPS2DQ | convert with truncation packed single-precision floating-point values to packed doubleword integers | |
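A minimal C sketch of the two float-to-integer conversions follows. It assumes the `<emmintrin.h>` intrinsics and the default MXCSR rounding mode for the non-truncating form. The function name `floats_to_ints` and its `truncate` flag are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Convert n floats to 32-bit integers, four at a time; n must be a multiple of 4.
   truncate != 0 selects round-toward-zero, otherwise the MXCSR mode is used. */
void floats_to_ints(const float *in, int32_t *out, size_t n, int truncate)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 f = _mm_loadu_ps(in + i);
        __m128i r = truncate ? _mm_cvttps_epi32(f)   /* typically CVTTPS2DQ */
                             : _mm_cvtps_epi32(f);   /* typically CVTPS2DQ */
        _mm_storeu_si128((__m128i *)(out + i), r);
    }
}
```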
3.21.3 SSE2 128-Bit SIMD Integer Instructions
The
SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords
contained in XMM and MMX registers.
Table 59 SSE2 128-Bit SIMD Integer Instructions
| Mnemonic | Description | Notes |
| MOVDQ2Q | move quadword integer from XMM to MMX registers | |
| MOVDQA | move aligned double quadword | |
| MOVDQU | move unaligned double quadword | |
| MOVQ2DQ | move quadword integer from MMX to XMM registers | |
| PADDQ | add packed quadword integers | |
| PMULUDQ | multiply packed unsigned doubleword integers | |
| PSHUFD | shuffle packed doublewords | |
| PSHUFHW | shuffle packed high words | |
| PSHUFLW | shuffle packed low words | |
| PSLLDQ | shift double quadword left logical | |
| PSRLDQ | shift double quadword right logical | |
| PSUBQ | subtract packed quadword integers | |
| PUNPCKHQDQ | unpack high quadwords | |
| PUNPCKLQDQ | unpack low quadwords | |
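Two of the integer operations in the table can be sketched in C: a packed 64-bit add and a doubleword shuffle. The example assumes the `<emmintrin.h>` intrinsics. `add_u64` and `reverse4` are hypothetical names, and the instruction names in the comments are the typical mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* dst[i] = a[i] + b[i] with wraparound; n must be even in this sketch. */
void add_u64(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        __m128i x = _mm_loadu_si128((const __m128i *)(a + i));  /* MOVDQU */
        __m128i y = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi64(x, y)); /* PADDQ */
    }
}

/* Reverse the order of four 32-bit elements in place (PSHUFD). */
void reverse4(uint32_t *v)
{
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* Result element k takes source element 3 - k. */
    x = _mm_shuffle_epi32(x, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_si128((__m128i *)v, x);
}
```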
3.21.4 SSE2 Miscellaneous Instructions
The SSE2 instructions described
below provide additional functionality for caching non-temporal data when
storing data from XMM registers to memory, and provide additional control
of instruction ordering on store operations.
Table 60 SSE2 Miscellaneous Instructions
| Mnemonic | Description | Notes |
| CLFLUSH | flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy | |
| LFENCE | serializes load operations | |
| MASKMOVDQU | non-temporal store of selected bytes from an XMM register into memory | |
| MFENCE | serializes load and store operations | |
| MOVNTDQ | non-temporal store of double quadword from an XMM register into memory | |
| MOVNTI | non-temporal store of a doubleword from a general-purpose register into memory | movntiq valid only under -m64 |
| MOVNTPD | non-temporal store of two packed double-precision floating-point values from an XMM register into memory | |
| PAUSE | improves the performance of spin-wait loops | |
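A non-temporal store followed by a fence might look like the following C sketch. It assumes the `<emmintrin.h>` intrinsics. The function name `fill_nt` and its alignment and length requirements are choices made for this example, and whether bypassing the cache actually helps depends on the workload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available) */

/* Fill n doublewords with value using non-temporal stores.
   dst must be 16-byte aligned and n a multiple of 4 in this sketch. */
void fill_nt(int32_t *dst, size_t n, int32_t value)
{
    assert(((uintptr_t)dst & 15) == 0 && n % 4 == 0);
    const __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);   /* typically MOVNTDQ */
    _mm_mfence();   /* MFENCE: order the stores before later loads and stores */
}
```

Non-temporal stores are weakly ordered, so the fence is placed after the loop, before any code that depends on the data being globally visible.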