SSE2 instructions are an extension of the SIMD execution model introduced with the MMX technology and the SSE extensions. SSE2 instructions are divided into four subgroups:

- Packed and scalar double-precision floating-point instructions
- Packed single-precision floating-point conversion instructions
- 128-bit SIMD integer instructions
- Instructions that provide cache control and instruction ordering functionality

The SSE2 packed and scalar double-precision floating-point instructions operate on double-precision floating-point operands.
The SSE2 data movement instructions move double-precision floating-point data between XMM registers and memory.
Table 3–36 SSE2 Data Movement Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | MOVAPD | move two aligned packed double-precision floating-point values between XMM registers and memory | |
| | MOVHPD | move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory | |
| | MOVLPD | move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory | |
| | MOVMSKPD | extract sign mask from two packed double-precision floating-point values | |
| | MOVSD | move scalar double-precision floating-point value between XMM registers and memory | |
| | MOVUPD | move two unaligned packed double-precision floating-point values between XMM registers and memory | |

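As an illustration of the data movement instructions above, the following is a minimal sketch in C using the SSE2 intrinsics from `<emmintrin.h>` rather than assembler source. The function names `copy_pair`, `join_halves`, and `sign_mask` are hypothetical, and the instruction named in each comment is only the one a compiler typically selects for that intrinsic; the actual choice is up to the compiler.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Copy two doubles with an unaligned load and store (typically MOVUPD). */
void copy_pair(const double *src, double *dst)
{
    __m128d v = _mm_loadu_pd(src);
    _mm_storeu_pd(dst, v);
}

/* Build one XMM value from two separate doubles and store it to out[0..1].
   The low half is typically loaded with MOVSD or MOVLPD, the high half
   with MOVHPD. */
void join_halves(const double *lo, const double *hi, double *out)
{
    __m128d v = _mm_load_sd(lo);   /* low = *lo, high = 0.0 */
    v = _mm_loadh_pd(v, hi);       /* high = *hi, low unchanged */
    _mm_storeu_pd(out, v);
}

/* Return the 2-bit sign mask of a[0] and a[1] (typically MOVMSKPD):
   bit 0 is the sign of a[0], bit 1 is the sign of a[1]. */
int sign_mask(const double *a)
{
    return _mm_movemask_pd(_mm_loadu_pd(a));
}
```

The aligned form (`_mm_load_pd`, typically MOVAPD) could be used instead of `_mm_loadu_pd` when the address is known to be 16-byte aligned.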
The SSE2 arithmetic instructions operate on packed and scalar double-precision floating-point operands.
Table 3–37 SSE2 Packed Arithmetic Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | ADDPD | add packed double-precision floating-point values | |
| | ADDSD | add scalar double-precision floating-point values | |
| | DIVPD | divide packed double-precision floating-point values | |
| | DIVSD | divide scalar double-precision floating-point values | |
| | MAXPD | return maximum packed double-precision floating-point values | |
| | MAXSD | return maximum scalar double-precision floating-point value | |
| | MINPD | return minimum packed double-precision floating-point values | |
| | MINSD | return minimum scalar double-precision floating-point value | |
| | MULPD | multiply packed double-precision floating-point values | |
| | MULSD | multiply scalar double-precision floating-point values | |
| | SQRTPD | compute packed square roots of packed double-precision floating-point values | |
| | SQRTSD | compute scalar square root of scalar double-precision floating-point value | |
| | SUBPD | subtract packed double-precision floating-point values | |
| | SUBSD | subtract scalar double-precision floating-point values | |

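The difference between the packed and scalar forms can be sketched with C intrinsics from `<emmintrin.h>`. This is an illustration only: `hypot2` and `max_scalar` are hypothetical names, and the instructions named in the comments are the ones a compiler would typically emit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Packed form: out[i] = sqrt(a[i]*a[i] + b[i]*b[i]) for i = 0 and 1.
   Both lanes are processed by each instruction
   (typically MULPD, ADDPD, SQRTPD). */
void hypot2(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    __m128d sum = _mm_add_pd(_mm_mul_pd(va, va), _mm_mul_pd(vb, vb));
    _mm_storeu_pd(out, _mm_sqrt_pd(sum));
}

/* Scalar form: only the low lane takes part (typically MAXSD). */
double max_scalar(double x, double y)
{
    __m128d r = _mm_max_sd(_mm_set_sd(x), _mm_set_sd(y));
    return _mm_cvtsd_f64(r);   /* extract the low double */
}
```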
The SSE2 logical instructions operate on packed double-precision floating-point values.
Table 3–38 SSE2 Logical Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | ANDNPD | perform bitwise logical AND NOT of packed double-precision floating-point values | |
| | ANDPD | perform bitwise logical AND of packed double-precision floating-point values | |
| | ORPD | perform bitwise logical OR of packed double-precision floating-point values | |
| | XORPD | perform bitwise logical XOR of packed double-precision floating-point values | |

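A common use of the logical instructions is manipulating the sign bit of double-precision values. The sketch below, written with C intrinsics from `<emmintrin.h>`, is one way to do this; `abs_pair` and `negate_pair` are hypothetical names, and the instruction in each comment is the typical compiler choice, not a guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Clear the sign bit of both lanes: out[i] = |in[i]|.
   _mm_andnot_pd(m, v) computes (~m) & v (typically ANDNPD). */
void abs_pair(const double *in, double *out)
{
    __m128d sign = _mm_set1_pd(-0.0);   /* only the sign bit is set */
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_andnot_pd(sign, v));
}

/* Flip the sign bit of both lanes: out[i] = -in[i] (typically XORPD). */
void negate_pair(const double *in, double *out)
{
    __m128d sign = _mm_set1_pd(-0.0);
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_xor_pd(v, sign));
}
```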
The SSE2 compare instructions compare packed and scalar double-precision floating-point values and return the results of the comparison to either the destination operand or to the EFLAGS register.
Table 3–39 SSE2 Compare Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | CMPPD | compare packed double-precision floating-point values | |
| | CMPSD | compare scalar double-precision floating-point values | |
| | COMISD | perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |
| | UCOMISD | perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |

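The two result styles (a mask in the destination operand versus flags in EFLAGS) can be illustrated with C intrinsics from `<emmintrin.h>`. This is a sketch under the assumption that the compiler maps the intrinsics to the instructions named in the comments; `select_smaller` and `less_than` are hypothetical names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Mask style: the compare writes all ones or all zeros into each lane
   (typically CMPPD with the less-than predicate). The mask then selects
   between a and b without branching:
   out[i] = (a[i] < b[i]) ? a[i] : b[i]. */
void select_smaller(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    __m128d mask = _mm_cmplt_pd(va, vb);
    __m128d r = _mm_or_pd(_mm_and_pd(mask, va), _mm_andnot_pd(mask, vb));
    _mm_storeu_pd(out, r);
}

/* Flag style: the compare sets EFLAGS (typically COMISD) and the
   intrinsic returns the outcome as an int, 1 if x < y and 0 otherwise. */
int less_than(double x, double y)
{
    return _mm_comilt_sd(_mm_set_sd(x), _mm_set_sd(y));
}
```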
The SSE2 shuffle and unpack instructions operate on packed double-precision floating-point operands.
Table 3–40 SSE2 Shuffle and Unpack Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | SHUFPD | shuffle values in packed double-precision floating-point operands | |
| | UNPCKHPD | unpack and interleave the high values from two packed double-precision floating-point operands | |
| | UNPCKLPD | unpack and interleave the low values from two packed double-precision floating-point operands | |

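To make the lane movement concrete, here is a small sketch with C intrinsics from `<emmintrin.h>`. The names `swap_halves` and `transpose2x2` are hypothetical, and the instructions in the comments are the typical compiler output for these intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Swap the two doubles in a pair (typically SHUFPD with immediate 1):
   bit 0 of the immediate picks the low result from the first operand,
   bit 1 picks the high result from the second operand. */
void swap_halves(const double *in, double *out)
{
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_shuffle_pd(v, v, 1));
}

/* Transpose a row-major 2x2 matrix m[4] into t[4].
   Interleaving the low values gives the first column,
   interleaving the high values gives the second
   (typically UNPCKLPD and UNPCKHPD). */
void transpose2x2(const double *m, double *t)
{
    __m128d row0 = _mm_loadu_pd(m);        /* m00, m01 */
    __m128d row1 = _mm_loadu_pd(m + 2);    /* m10, m11 */
    _mm_storeu_pd(t,     _mm_unpacklo_pd(row0, row1));  /* m00, m10 */
    _mm_storeu_pd(t + 2, _mm_unpackhi_pd(row0, row1));  /* m01, m11 */
}
```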
The SSE2 conversion instructions convert packed and individual doubleword integers into packed and scalar double-precision floating-point values (and vice versa). These instructions also convert between packed and scalar single-precision and double-precision floating-point values.
Table 3–41 SSE2 Conversion Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | CVTDQ2PD | convert packed doubleword integers to packed double-precision floating-point values | |
| | CVTPD2DQ | convert packed double-precision floating-point values to packed doubleword integers | |
| | CVTPD2PI | convert packed double-precision floating-point values to packed doubleword integers | |
| | CVTPD2PS | convert packed double-precision floating-point values to packed single-precision floating-point values | |
| | CVTPI2PD | convert packed doubleword integers to packed double-precision floating-point values | |
| | CVTPS2PD | convert packed single-precision floating-point values to packed double-precision floating-point values | |
| | CVTSD2SI | convert scalar double-precision floating-point value to a doubleword integer | |
| | CVTSD2SS | convert scalar double-precision floating-point value to scalar single-precision floating-point value | |
| | CVTSI2SD | convert doubleword integer to scalar double-precision floating-point value | |
| | CVTSS2SD | convert scalar single-precision floating-point value to scalar double-precision floating-point value | |
| | CVTTPD2DQ | convert with truncation packed double-precision floating-point values to packed doubleword integers | |
| | CVTTPD2PI | convert with truncation packed double-precision floating-point values to packed doubleword integers | |
| | CVTTSD2SI | convert with truncation scalar double-precision floating-point value to a doubleword integer | |

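The difference between the plain and "with truncation" conversions is easiest to see in code. The sketch below uses C intrinsics from `<emmintrin.h>`; the names `round_current_mode`, `round_toward_zero`, and `ints_to_doubles` are hypothetical. It assumes the MXCSR rounding mode is at its default (round to nearest), which is what the plain conversion follows.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Convert using the current MXCSR rounding mode (typically CVTSD2SI).
   Under the default mode this rounds to the nearest integer. */
int round_current_mode(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* Convert with truncation toward zero regardless of the rounding mode
   (typically CVTTSD2SI). */
int round_toward_zero(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* Convert two 32-bit integers to two doubles (typically CVTDQ2PD).
   Only the two low doublewords of the source are used. */
void ints_to_doubles(const int *in, double *out)
{
    __m128i vi = _mm_set_epi32(0, 0, in[1], in[0]);
    _mm_storeu_pd(out, _mm_cvtepi32_pd(vi));
}
```

The truncating form matches the behavior of a C cast from `double` to `int`, which is why compilers commonly use it for such casts.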
The SSE2 packed single-precision floating-point instructions operate on single-precision floating-point and integer operands.
Table 3–42 SSE2 Packed Single-Precision Floating-Point Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | CVTDQ2PS | convert packed doubleword integers to packed single-precision floating-point values | |
| | CVTPS2DQ | convert packed single-precision floating-point values to packed doubleword integers | |
| | CVTTPS2DQ | convert with truncation packed single-precision floating-point values to packed doubleword integers | |

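These conversions work on four lanes at a time. A minimal sketch with C intrinsics from `<emmintrin.h>` follows; `floats_to_ints_trunc` and `ints_to_floats` are hypothetical names, and the instructions in the comments are the typical compiler choice.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Convert four floats to four 32-bit integers, truncating toward zero
   (typically CVTTPS2DQ). */
void floats_to_ints_trunc(const float *in, int *out)
{
    __m128i r = _mm_cvttps_epi32(_mm_loadu_ps(in));
    _mm_storeu_si128((__m128i *)out, r);
}

/* Convert four 32-bit integers to four floats (typically CVTDQ2PS). */
void ints_to_floats(const int *in, float *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_ps(out, _mm_cvtepi32_ps(v));
}
```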
The SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords contained in XMM and MMX registers.
Table 3–43 SSE2 128-Bit SIMD Integer Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | MOVDQ2Q | move quadword integer from XMM to MMX registers | |
| | MOVDQA | move aligned double quadword | |
| | MOVDQU | move unaligned double quadword | |
| | MOVQ2DQ | move quadword integer from MMX to XMM registers | |
| | PADDQ | add packed quadword integers | |
| | PMULUDQ | multiply packed unsigned doubleword integers | |
| | PSHUFD | shuffle packed doublewords | |
| | PSHUFHW | shuffle packed high words | |
| | PSHUFLW | shuffle packed low words | |
| | PSLLDQ | shift double quadword left logical | |
| | PSRLDQ | shift double quadword right logical | |
| | PSUBQ | subtract packed quadword integers | |
| | PUNPCKHQDQ | unpack high quadwords | |
| | PUNPCKLQDQ | unpack low quadwords | |

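A few of the integer instructions above can be sketched with C intrinsics from `<emmintrin.h>`. The function names are hypothetical, and the instruction in each comment is what a compiler typically emits for the intrinsic, not a guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Add two pairs of 64-bit integers; each lane wraps on overflow
   (typically PADDQ). Loads and stores are unaligned (typically MOVDQU). */
void add_u64_pairs(const uint64_t *a, const uint64_t *b, uint64_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(va, vb));
}

/* Multiply the unsigned doublewords at positions 0 and 2 of a and b,
   producing two full 64-bit products (typically PMULUDQ).
   Positions 1 and 3 are ignored. */
void mul_even_u32(const uint32_t *a, const uint32_t *b, uint64_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mul_epu32(va, vb));
}

/* Reverse the order of four doublewords (typically PSHUFD). */
void reverse_u32x4(const uint32_t *in, uint32_t *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out,
                     _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)));
}
```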
The SSE2 instructions described below provide additional functionality for caching non-temporal data when storing data from XMM registers to memory, and provide additional control of instruction ordering on store operations.
Table 3–44 SSE2 Miscellaneous Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| | CLFLUSH | flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy | |
| | LFENCE | serializes load operations | |
| | MASKMOVDQU | non-temporal store of selected bytes from an XMM register into memory | |
| | MFENCE | serializes load and store operations | |
| | MOVNTDQ | non-temporal store of double quadword from an XMM register into memory | |
| | MOVNTI | non-temporal store of a doubleword from a general-purpose register into memory | movntiq valid only under -xarch=amd64 |
| | MOVNTPD | non-temporal store of two packed double-precision floating-point values from an XMM register into memory | |
| | PAUSE | improves the performance of spin-wait loops | |

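As a rough illustration of how non-temporal stores, fences, and PAUSE are typically combined, the following sketch uses C intrinsics from `<emmintrin.h>`. The names `fill_stream` and `spin_until_set` are hypothetical. The spin loop reads a `volatile int` purely to show where PAUSE goes; real synchronization code would use proper atomic operations.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Fill n doubles with a value using non-temporal stores
   (typically MOVNTPD), then fence (MFENCE) so the stores are ordered
   before any later loads and stores.
   Assumptions: dst is 16-byte aligned and n is even. */
void fill_stream(double *dst, size_t n, double value)
{
    __m128d v = _mm_set1_pd(value);
    size_t i;
    for (i = 0; i + 1 < n; i += 2)
        _mm_stream_pd(dst + i, v);
    _mm_mfence();
}

/* Poll a flag at most max_spins times, issuing PAUSE between polls.
   Returns 1 if the flag was seen nonzero, 0 if the limit was reached. */
int spin_until_set(const volatile int *flag, int max_spins)
{
    int i;
    for (i = 0; i < max_spins; i++) {
        if (*flag)
            return 1;
        _mm_pause();   /* hint to the processor that this is a spin-wait */
    }
    return 0;
}
```

Non-temporal stores are intended for data that will not be read again soon, so that writing it does not displace more useful cache lines.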