SSE2 Instructions
SSE2 instructions are an extension of the SIMD execution model introduced with
the MMX technology and the SSE extensions. SSE2 instructions are divided into
four subgroups:
- Packed and scalar double-precision floating-point instructions
- Packed single-precision floating-point conversion instructions
- 128-bit SIMD integer instructions
- Instructions that provide cache control and instruction ordering functionality
SSE2 Packed and Scalar Double-Precision Floating-Point Instructions
The SSE2 packed and scalar double-precision floating-point instructions operate on double-precision floating-point
operands.
SSE2 Data Movement Instructions
The SSE2 data movement instructions move double-precision floating-point data between XMM registers and
memory.
Table 3-36 SSE2 Data Movement Instructions
| Mnemonic | Description |
|----------|-------------|
| MOVAPD | move two aligned packed double-precision floating-point values between XMM registers and memory |
| MOVHPD | move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory |
| MOVLPD | move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory |
| MOVMSKPD | extract sign mask from two packed double-precision floating-point values |
| MOVSD | move scalar double-precision floating-point value between XMM registers and memory |
| MOVUPD | move two unaligned packed double-precision floating-point values between XMM registers and memory |
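The data movement instructions can be tried from C through the SSE2 intrinsics in `<emmintrin.h>`. The following is a minimal sketch, not assembler source: `sum_pair` is a hypothetical helper, and the instruction named in each comment is what compilers typically emit for that intrinsic, which they are free to change.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: adds the two doubles at p and reports their sign bits. */
double sum_pair(const double *p, int *sign_mask)
{
    double lo, hi;
    __m128d v = _mm_loadu_pd(p);      /* unaligned 128-bit load, typically MOVUPD */

    *sign_mask = _mm_movemask_pd(v);  /* MOVMSKPD: bit 0 = sign of p[0], bit 1 = sign of p[1] */
    _mm_storel_pd(&lo, v);            /* store low quadword (MOVLPD or MOVSD) */
    _mm_storeh_pd(&hi, v);            /* store high quadword (MOVHPD) */
    return lo + hi;
}
```

Replacing `_mm_loadu_pd` with `_mm_load_pd` selects the aligned form (MOVAPD), which requires the address to be 16-byte aligned and faults otherwise.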
SSE2 Packed Arithmetic Instructions
The SSE2 arithmetic instructions operate on packed and scalar double-precision floating-point operands.
Table 3-37 SSE2 Packed Arithmetic Instructions
| Mnemonic | Description |
|----------|-------------|
| ADDPD | add packed double-precision floating-point values |
| ADDSD | add scalar double-precision floating-point values |
| DIVPD | divide packed double-precision floating-point values |
| DIVSD | divide scalar double-precision floating-point values |
| MAXPD | return maximum packed double-precision floating-point values |
| MAXSD | return maximum scalar double-precision floating-point value |
| MINPD | return minimum packed double-precision floating-point values |
| MINSD | return minimum scalar double-precision floating-point value |
| MULPD | multiply packed double-precision floating-point values |
| MULSD | multiply scalar double-precision floating-point values |
| SQRTPD | compute packed square roots of packed double-precision floating-point values |
| SQRTSD | compute scalar square root of scalar double-precision floating-point value |
| SUBPD | subtract packed double-precision floating-point values |
| SUBSD | subtract scalar double-precision floating-point values |
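The difference between the packed (PD) and scalar (SD) forms is easiest to see side by side. The sketch below uses C intrinsics from `<emmintrin.h>` and assumes nothing about alignment; the function names `hypot_pair` and `add_low_only` are made up for illustration.

```c
#include <emmintrin.h>

/* Packed form: out[i] = sqrt(x[i]*x[i] + y[i]*y[i]) for both lanes at once. */
void hypot_pair(const double *x, const double *y, double *out)
{
    __m128d vx = _mm_loadu_pd(x);
    __m128d vy = _mm_loadu_pd(y);
    __m128d sum = _mm_add_pd(_mm_mul_pd(vx, vx),   /* MULPD */
                             _mm_mul_pd(vy, vy));  /* MULPD, then ADDPD */
    _mm_storeu_pd(out, _mm_sqrt_pd(sum));          /* SQRTPD */
}

/* Scalar form: only the low lane is added; the high lane is copied from a. */
void add_low_only(const double *a, const double *b, double *out)
{
    __m128d r = _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b));  /* ADDSD */
    _mm_storeu_pd(out, r);
}
```

With `a = {1.0, 2.0}` and `b = {10.0, 20.0}`, `add_low_only` stores `{11.0, 2.0}`: the high element passes through unchanged.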
SSE2 Logical Instructions
The SSE2 logical instructions operate on packed double-precision floating-point values.
Table 3-38 SSE2 Logical Instructions
| Mnemonic | Description |
|----------|-------------|
| ANDNPD | perform bitwise logical AND NOT of packed double-precision floating-point values |
| ANDPD | perform bitwise logical AND of packed double-precision floating-point values |
| ORPD | perform bitwise logical OR of packed double-precision floating-point values |
| XORPD | perform bitwise logical XOR of packed double-precision floating-point values |
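A common use of the logical instructions is manipulating the IEEE 754 sign bit directly. As an illustrative sketch in C intrinsics (the helper names are hypothetical), clearing the sign bit with AND NOT gives an absolute value and flipping it with XOR negates:

```c
#include <emmintrin.h>

/* out[i] = |in[i]|: clear the sign bit of each lane. */
void abs_pair(const double *in, double *out)
{
    const __m128d sign = _mm_set1_pd(-0.0);      /* only the sign bit set in each lane */
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_andnot_pd(sign, v));  /* ANDNPD computes (~sign) & v */
}

/* out[i] = -in[i]: flip the sign bit of each lane. */
void negate_pair(const double *in, double *out)
{
    const __m128d sign = _mm_set1_pd(-0.0);
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_xor_pd(sign, v));     /* XORPD */
}
```

Note the operand order of `_mm_andnot_pd`: the first argument is the one that is inverted.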
SSE2 Compare Instructions
The SSE2 compare instructions compare packed and scalar double-precision floating-point values and return
the result of the comparison either to the destination operand (CMPPD, CMPSD) or to the
EFLAGS register (COMISD, UCOMISD).
Table 3-39 SSE2 Compare Instructions
| Mnemonic | Description |
|----------|-------------|
| CMPPD | compare packed double-precision floating-point values |
| CMPSD | compare scalar double-precision floating-point values |
| COMISD | perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register |
| UCOMISD | perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register |
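The two result styles look quite different in use. In the sketch below (C intrinsics; `less_than_mask` and `scalar_less` are hypothetical names), the CMPPD-style compare produces a per-lane mask of all ones or all zeros in an XMM register, while the COMISD-style compare yields a single truth value derived from EFLAGS:

```c
#include <emmintrin.h>

/* Returns a 2-bit mask: bit i is set when a[i] < b[i]. */
int less_than_mask(const double *a, const double *b)
{
    /* CMPPD with the less-than predicate: each lane becomes all ones or all zeros. */
    __m128d m = _mm_cmplt_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    return _mm_movemask_pd(m);  /* collect the top bit of each lane */
}

/* Returns 1 when a < b, 0 otherwise. */
int scalar_less(double a, double b)
{
    /* COMISD sets EFLAGS; the intrinsic turns the flags into an int. */
    return _mm_comilt_sd(_mm_set_sd(a), _mm_set_sd(b));
}
```

COMISD and UCOMISD differ only in exception behavior: COMISD signals an invalid-operation exception for any NaN operand, whereas UCOMISD does so only for signaling NaNs.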
SSE2 Shuffle and Unpack Instructions
The SSE2 shuffle and unpack instructions operate on packed double-precision floating-point operands.
Table 3-40 SSE2 Shuffle and Unpack Instructions
| Mnemonic | Description |
|----------|-------------|
| SHUFPD | shuffle values in packed double-precision floating-point operands |
| UNPCKHPD | unpack and interleave the high values from two packed double-precision floating-point operands |
| UNPCKLPD | unpack and interleave the low values from two packed double-precision floating-point operands |
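To make the lane movement concrete, here is a small sketch in C intrinsics; the function names and the row-major 2x2 layout are assumptions for illustration. Interleaving the low and high halves of two rows transposes a 2x2 matrix, and SHUFPD with an immediate of 1 swaps the two lanes of a register.

```c
#include <emmintrin.h>

/* Transpose a row-major 2x2 matrix m into t. */
void transpose2x2(const double *m, double *t)
{
    __m128d r0 = _mm_loadu_pd(m);      /* {m00, m01} */
    __m128d r1 = _mm_loadu_pd(m + 2);  /* {m10, m11} */
    _mm_storeu_pd(t,     _mm_unpacklo_pd(r0, r1));  /* UNPCKLPD -> {m00, m10} */
    _mm_storeu_pd(t + 2, _mm_unpackhi_pd(r0, r1));  /* UNPCKHPD -> {m01, m11} */
}

/* Swap the two doubles of a pair. */
void swap_pair(const double *in, double *out)
{
    __m128d v = _mm_loadu_pd(in);
    /* SHUFPD: immediate bit 0 picks the result's low lane from the first operand
       (1 = its high lane), bit 1 picks the high lane from the second (0 = its low lane). */
    _mm_storeu_pd(out, _mm_shuffle_pd(v, v, 1));
}
```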
SSE2 Conversion Instructions
The SSE2 conversion instructions convert packed and individual doubleword integers into packed and
scalar double-precision floating-point values (and vice versa). These instructions also convert between packed and
scalar single-precision and double-precision floating-point values.
Table 3-41 SSE2 Conversion Instructions
| Mnemonic | Description |
|----------|-------------|
| CVTDQ2PD | convert packed doubleword integers to packed double-precision floating-point values |
| CVTPD2DQ | convert packed double-precision floating-point values to packed doubleword integers |
| CVTPD2PI | convert packed double-precision floating-point values to packed doubleword integers |
| CVTPD2PS | convert packed double-precision floating-point values to packed single-precision floating-point values |
| CVTPI2PD | convert packed doubleword integers to packed double-precision floating-point values |
| CVTPS2PD | convert packed single-precision floating-point values to packed double-precision floating-point values |
| CVTSD2SI | convert scalar double-precision floating-point value to a doubleword integer |
| CVTSD2SS | convert scalar double-precision floating-point value to scalar single-precision floating-point value |
| CVTSI2SD | convert doubleword integer to scalar double-precision floating-point value |
| CVTSS2SD | convert scalar single-precision floating-point value to scalar double-precision floating-point value |
| CVTTPD2DQ | convert with truncation packed double-precision floating-point values to packed doubleword integers |
| CVTTPD2PI | convert with truncation packed double-precision floating-point values to packed doubleword integers |
| CVTTSD2SI | convert with truncation scalar double-precision floating-point value to a doubleword integer |
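Two of these conversions are sketched below with C intrinsics: widening integers to doubles, which is always exact, and narrowing a double to single precision and back, which may lose bits. The helper names are hypothetical, and the narrowing step assumes the default round-to-nearest mode in MXCSR.

```c
#include <emmintrin.h>

/* Convert two 32-bit integers to two doubles. */
void ints_to_doubles(const int *in, double *out)
{
    __m128i vi = _mm_loadl_epi64((const __m128i *)in);  /* load 64 bits = two ints */
    _mm_storeu_pd(out, _mm_cvtepi32_pd(vi));            /* CVTDQ2PD */
}

/* Round-trip a double through single precision. */
double through_float(double x)
{
    __m128 f = _mm_cvtpd_ps(_mm_set_sd(x));   /* CVTPD2PS: narrow to float */
    return _mm_cvtsd_f64(_mm_cvtps_pd(f));    /* CVTPS2PD: widen back, take low lane */
}
```

A value such as 0.5 survives the round trip unchanged, while 0.1 does not, because it is not exactly representable in either format and the two formats round it differently.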
SSE2 Packed Single-Precision Floating-Point Instructions
The SSE2 packed single-precision floating-point instructions convert between packed single-precision floating-point values and packed doubleword integers.
Table 3-42 SSE2 Packed Single-Precision Floating-Point Instructions
| Mnemonic | Description |
|----------|-------------|
| CVTDQ2PS | convert packed doubleword integers to packed single-precision floating-point values |
| CVTPS2DQ | convert packed single-precision floating-point values to packed doubleword integers |
| CVTTPS2DQ | convert with truncation packed single-precision floating-point values to packed doubleword integers |
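The practical difference between CVTPS2DQ and CVTTPS2DQ is the rounding rule. The following sketch (C intrinsics, hypothetical function name) converts the same four floats both ways; it assumes MXCSR is in its default round-to-nearest-even mode, which governs the non-truncating form.

```c
#include <emmintrin.h>

/* Convert four floats to ints using the current rounding mode and using truncation. */
void floats_to_ints(const float *in, int *rounded, int *truncated)
{
    __m128 v = _mm_loadu_ps(in);  /* SSE load of four floats */
    _mm_storeu_si128((__m128i *)rounded,   _mm_cvtps_epi32(v));   /* CVTPS2DQ */
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(v));  /* CVTTPS2DQ */
}
```

For the input `{1.5f, 2.5f, -1.5f, 2.75f}` the rounded result is `{2, 2, -2, 3}` (ties go to the even integer) and the truncated result is `{1, 2, -1, 2}`.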
SSE2 128-Bit SIMD Integer Instructions
The SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords contained
in XMM and MMX registers.
Table 3-43 SSE2 128-Bit SIMD Integer Instructions
| Mnemonic | Description |
|----------|-------------|
| MOVDQ2Q | move quadword integer from XMM to MMX registers |
| MOVDQA | move aligned double quadword |
| MOVDQU | move unaligned double quadword |
| MOVQ2DQ | move quadword integer from MMX to XMM registers |
| PADDQ | add packed quadword integers |
| PMULUDQ | multiply packed unsigned doubleword integers |
| PSHUFD | shuffle packed doublewords |
| PSHUFHW | shuffle packed high words |
| PSHUFLW | shuffle packed low words |
| PSLLDQ | shift double quadword left logical |
| PSRLDQ | shift double quadword right logical |
| PSUBQ | subtract packed quadword integers |
| PUNPCKHQDQ | unpack high quadwords |
| PUNPCKLQDQ | unpack low quadwords |
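A few of the integer instructions are sketched here with C intrinsics operating on XMM registers. The function names are hypothetical, and unaligned loads and stores are used so that no alignment is assumed.

```c
#include <emmintrin.h>
#include <stdint.h>

/* out[i] = a[i] + b[i] for two 64-bit lanes (wraps on overflow). */
void add_u64_pairs(const uint64_t *a, const uint64_t *b, uint64_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);          /* MOVDQU */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(va, vb));   /* PADDQ */
}

/* out[0] = a[0] * b[0], out[1] = a[2] * b[2], each as a full 64-bit product. */
void mul_even_u32(const uint32_t *a, const uint32_t *b, uint64_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* PMULUDQ uses only doublewords 0 and 2 of each operand. */
    _mm_storeu_si128((__m128i *)out, _mm_mul_epu32(va, vb));
}

/* Reverse the order of four doublewords. */
void reverse_u32x4(const uint32_t *in, uint32_t *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* PSHUFD: _MM_SHUFFLE(0, 1, 2, 3) selects source lanes 3, 2, 1, 0 for result lanes 0..3. */
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)));
}
```

Because PMULUDQ ignores the odd-numbered doublewords, the second and fourth elements of the inputs to `mul_even_u32` do not affect the result.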
SSE2 Miscellaneous Instructions
The SSE2 instructions described below provide additional functionality for caching non-temporal data when
storing data from XMM registers to memory, and provide additional control of instruction
ordering on store operations.
Table 3-44 SSE2 Miscellaneous Instructions
| Mnemonic | Description | Notes |
|----------|-------------|-------|
| CLFLUSH | flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy | |
| LFENCE | serializes load operations | |
| MASKMOVDQU | non-temporal store of selected bytes from an XMM register into memory | |
| MFENCE | serializes load and store operations | |
| MOVNTDQ | non-temporal store of double quadword from an XMM register into memory | |
| MOVNTI | non-temporal store of a doubleword from a general-purpose register into memory | movntiq valid only under -xarch=amd64 |
| MOVNTPD | non-temporal store of two packed double-precision floating-point values from an XMM register into memory | |
| PAUSE | improves the performance of spin-wait loops | |
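As a rough illustration of how these instructions are used together, the sketch below fills a buffer with non-temporal stores followed by a fence, and shows a bounded spin-wait loop that issues PAUSE between polls. Both functions are hypothetical examples written with C intrinsics; the `volatile` flag is a simplification, and production code would normally use proper atomic operations.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Fill n_bytes at dst with a repeated 32-bit value, bypassing the cache.
   Assumes dst is 16-byte aligned and n_bytes is a multiple of 16. */
void fill_nt(void *dst, int32_t value, size_t n_bytes)
{
    __m128i v = _mm_set1_epi32(value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < n_bytes / 16; i++)
        _mm_stream_si128(p + i, v);  /* MOVNTDQ: weakly ordered non-temporal store */
    _mm_mfence();                    /* MFENCE: order the stores before later loads and stores */
}

/* Poll *flag up to max_spins times; returns 1 if it became nonzero, else 0. */
int spin_wait(volatile int *flag, unsigned max_spins)
{
    while (max_spins--) {
        if (*flag)
            return 1;
        _mm_pause();                 /* PAUSE: hint that this is a spin-wait loop */
    }
    return 0;
}
```

The fence matters because non-temporal stores are weakly ordered: without it, another processor could observe later stores before the streamed data.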