x86 Assembly Language Reference Manual

Updated: November 2020
 
 

3.26 SSE2 Instructions

SSE2 instructions are an extension of the SIMD execution model introduced with the MMX technology and the SSE extensions. SSE2 instructions are divided into four subgroups:

  • Packed and scalar double-precision floating-point instructions

  • Packed single-precision floating-point conversion instructions

  • 128-bit SIMD integer instructions

  • Instructions that provide cache control and instruction ordering functionality

3.26.1 SSE2 Packed and Scalar Double-Precision Floating-Point Instructions

The SSE2 packed and scalar double-precision floating-point instructions operate on double-precision floating-point operands.

3.26.1.1 SSE2 Data Movement Instructions

The SSE2 data movement instructions move double-precision floating-point data between XMM registers and memory.

Table 57  SSE2 Data Movement Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| movapd | MOVAPD | move two aligned packed double-precision floating-point values between XMM registers and memory | |
| movhpd | MOVHPD | move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory | |
| movlpd | MOVLPD | move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory | |
| movmskpd | MOVMSKPD | extract sign mask from two packed double-precision floating-point values | |
| movsd | MOVSD | move scalar double-precision floating-point value between XMM registers and memory | |
| movupd | MOVUPD | move two unaligned packed double-precision floating-point values between XMM registers and memory | |
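
The difference between the aligned and unaligned moves, and the shape of the sign mask, can be sketched in C with the SSE2 intrinsics from `<emmintrin.h>`. This is an illustration only: the function name `copy_pair_and_sign_mask` is hypothetical, and the instruction named in each comment is what compilers commonly emit for that intrinsic, not a guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Loads two doubles from a 16-byte-aligned source, stores them to a
   destination with no alignment requirement, and returns the sign bits
   of the two values as a 2-bit mask. */
int copy_pair_and_sign_mask(const double *aligned_src, double *dst)
{
    __m128d v = _mm_load_pd(aligned_src);  /* commonly movapd: source must be 16-byte aligned */
    _mm_storeu_pd(dst, v);                 /* commonly movupd: any alignment */
    return _mm_movemask_pd(v);             /* commonly movmskpd: bit 0 = low value, bit 1 = high value */
}
```

For a source holding { -1.5, 2.0 }, only the low value is negative, so the returned mask is 1.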

3.26.1.2 SSE2 Packed Arithmetic Instructions

The SSE2 arithmetic instructions operate on packed and scalar double-precision floating-point operands.

Table 58  SSE2 Packed Arithmetic Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| addpd | ADDPD | add packed double-precision floating-point values | |
| addsd | ADDSD | add scalar double-precision floating-point values | |
| divpd | DIVPD | divide packed double-precision floating-point values | |
| divsd | DIVSD | divide scalar double-precision floating-point values | |
| maxpd | MAXPD | return maximum packed double-precision floating-point values | |
| maxsd | MAXSD | return maximum scalar double-precision floating-point value | |
| minpd | MINPD | return minimum packed double-precision floating-point values | |
| minsd | MINSD | return minimum scalar double-precision floating-point value | |
| mulpd | MULPD | multiply packed double-precision floating-point values | |
| mulsd | MULSD | multiply scalar double-precision floating-point values | |
| sqrtpd | SQRTPD | compute packed square roots of packed double-precision floating-point values | |
| sqrtsd | SQRTSD | compute scalar square root of scalar double-precision floating-point value | |
| subpd | SUBPD | subtract packed double-precision floating-point values | |
| subsd | SUBSD | subtract scalar double-precision floating-point values | |
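
The packed and scalar forms differ in how many values they touch. A minimal C sketch using the `<emmintrin.h>` intrinsics shows this; the function names `add_packed` and `add_scalar` are hypothetical, and the mnemonics in the comments are what compilers commonly emit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Packed add (commonly addpd): both values are added. */
void add_packed(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* Scalar add (commonly addsd): only the low values are added;
   the high value of the result is copied from the first operand. */
void add_scalar(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

With a = { 1.0, 2.0 } and b = { 10.0, 20.0 }, the packed form yields { 11.0, 22.0 } and the scalar form yields { 11.0, 2.0 }.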

3.26.1.3 SSE2 Logical Instructions

The SSE2 logical instructions operate on packed double-precision floating-point values.

Table 59  SSE2 Logical Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| andnpd | ANDNPD | perform bitwise logical AND NOT of packed double-precision floating-point values | |
| andpd | ANDPD | perform bitwise logical AND of packed double-precision floating-point values | |
| orpd | ORPD | perform bitwise logical OR of packed double-precision floating-point values | |
| xorpd | XORPD | perform bitwise logical XOR of packed double-precision floating-point values | |
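
Because these instructions work on the raw bit patterns, a common use is manipulating the sign bit. The following C sketch, with the hypothetical function names `abs_pair` and `negate_pair`, assumes the `<emmintrin.h>` intrinsics and the IEEE 754 layout in which `-0.0` has only the sign bit set.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Absolute value of both values: (NOT sign_mask) AND v, commonly andnpd. */
void abs_pair(const double in[2], double out[2])
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);  /* sign bit set in each value */
    _mm_storeu_pd(out, _mm_andnot_pd(sign_mask, _mm_loadu_pd(in)));
}

/* Negation of both values: sign_mask XOR v, commonly xorpd. */
void negate_pair(const double in[2], double out[2])
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    _mm_storeu_pd(out, _mm_xor_pd(sign_mask, _mm_loadu_pd(in)));
}
```

Note the operand order of the AND NOT operation: the first operand is the one that is inverted.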

3.26.1.4 SSE2 Compare Instructions

The SSE2 compare instructions compare packed and scalar double-precision floating-point values and return the results of the comparison to either the destination operand or to the EFLAGS register.

Table 60  SSE2 Compare Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cmppd | CMPPD | compare packed double-precision floating-point values | |
| cmpsd | CMPSD | compare scalar double-precision floating-point values | |
| comisd | COMISD | perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |
| ucomisd | UCOMISD | perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |
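
The two result styles described above, a mask written to the destination operand versus flags in EFLAGS, can be sketched in C as follows. The function names are hypothetical. The intrinsic `_mm_cmplt_pd` selects the less-than predicate of the packed compare, and `_mm_comilt_sd` exposes the flag-based result as an `int`; the mnemonics in the comments are what compilers commonly emit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Packed compare (commonly cmppd): each result value is all ones where
   a < b and all zeros otherwise. The mask is then reduced to two bits. */
int lanes_less_than(const double a[2], const double b[2])
{
    __m128d mask = _mm_cmplt_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    return _mm_movemask_pd(mask);  /* bit 0 = low value, bit 1 = high value */
}

/* Scalar compare through EFLAGS (commonly comisd): returns 1 if a < b. */
int low_less_than(double a, double b)
{
    return _mm_comilt_sd(_mm_set_sd(a), _mm_set_sd(b));
}
```

For a = { 1.0, 5.0 } and b = { 2.0, 3.0 }, only the low comparison is true, so `lanes_less_than` returns 1.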

3.26.1.5 SSE2 Shuffle and Unpack Instructions

The SSE2 shuffle and unpack instructions operate on packed double-precision floating-point operands.

Table 61  SSE2 Shuffle and Unpack Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| shufpd | SHUFPD | shuffle values in packed double-precision floating-point operands | |
| unpckhpd | UNPCKHPD | unpack and interleave the high values from two packed double-precision floating-point operands | |
| unpcklpd | UNPCKLPD | unpack and interleave the low values from two packed double-precision floating-point operands | |
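
As an illustration of what shuffling and interleaving mean for two-element operands, the C sketch below swaps the two values of one operand and transposes a 2x2 matrix stored row by row. The function names are hypothetical, and the mnemonics in the comments are what compilers commonly emit for the `<emmintrin.h>` intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Swap the two values (commonly shufpd with immediate 1):
   bit 0 of the immediate picks the low result from the first operand,
   bit 1 picks the high result from the second operand. */
void swap_pair(const double in[2], double out[2])
{
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_shuffle_pd(v, v, 1));
}

/* Transpose a 2x2 row-major matrix { a, b, c, d } into { a, c, b, d }. */
void transpose_2x2(const double m[4], double t[4])
{
    __m128d row0 = _mm_loadu_pd(m);      /* { a, b } */
    __m128d row1 = _mm_loadu_pd(m + 2);  /* { c, d } */
    _mm_storeu_pd(t,     _mm_unpacklo_pd(row0, row1));  /* commonly unpcklpd: { a, c } */
    _mm_storeu_pd(t + 2, _mm_unpackhi_pd(row0, row1));  /* commonly unpckhpd: { b, d } */
}
```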

3.26.1.6 SSE2 Conversion Instructions

The SSE2 conversion instructions convert packed and individual doubleword integers into packed and scalar double-precision floating-point values (and vice versa). These instructions also convert between packed and scalar single-precision and double-precision floating-point values.

Table 62  SSE2 Conversion Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cvtdq2pd | CVTDQ2PD | convert packed doubleword integers to packed double-precision floating-point values | |
| cvtpd2dq | CVTPD2DQ | convert packed double-precision floating-point values to packed doubleword integers | |
| cvtpd2pi | CVTPD2PI | convert packed double-precision floating-point values to packed doubleword integers in an MMX register | |
| cvtpd2ps | CVTPD2PS | convert packed double-precision floating-point values to packed single-precision floating-point values | |
| cvtpi2pd | CVTPI2PD | convert packed doubleword integers from an MMX register or memory to packed double-precision floating-point values | |
| cvtps2pd | CVTPS2PD | convert packed single-precision floating-point values to packed double-precision floating-point values | |
| cvtsd2si | CVTSD2SI | convert scalar double-precision floating-point value to a doubleword integer | |
| cvtsd2ss | CVTSD2SS | convert scalar double-precision floating-point value to scalar single-precision floating-point value | |
| cvtsi2sd | CVTSI2SD | convert doubleword integer to scalar double-precision floating-point value | |
| cvtss2sd | CVTSS2SD | convert scalar single-precision floating-point value to scalar double-precision floating-point value | |
| cvttpd2dq | CVTTPD2DQ | convert with truncation packed double-precision floating-point values to packed doubleword integers | |
| cvttpd2pi | CVTTPD2PI | convert with truncation packed double-precision floating-point values to packed doubleword integers in an MMX register | |
| cvttsd2si | CVTTSD2SI | convert with truncation scalar double-precision floating-point value to a doubleword integer | |
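
The practical difference between the plain and the "with truncation" conversions is the rounding rule. The plain forms round according to the current MXCSR rounding mode, while the truncating forms always round toward zero. The C sketch below uses hypothetical function names and assumes the MXCSR rounding mode is still at its default, round to nearest even.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Commonly cvtsd2si: rounds using the current MXCSR rounding mode
   (round to nearest even unless the program has changed it). */
int round_to_int(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* Commonly cvttsd2si: always truncates toward zero. */
int truncate_to_int(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
```

Under that assumption, 2.7 converts to 3 with rounding and to 2 with truncation, and 2.5 rounds to 2 because ties go to the even integer.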

3.26.2 SSE2 Packed Single-Precision Floating-Point Instructions

The SSE2 packed single-precision floating-point instructions operate on single-precision floating-point and integer operands.

Table 63  SSE2 Packed Single-Precision Floating-Point Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cvtdq2ps | CVTDQ2PS | convert packed doubleword integers to packed single-precision floating-point values | |
| cvtps2dq | CVTPS2DQ | convert packed single-precision floating-point values to packed doubleword integers | |
| cvttps2dq | CVTTPS2DQ | convert with truncation packed single-precision floating-point values to packed doubleword integers | |
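
A short C sketch of the three conversions, each handling four values per instruction. The function names are hypothetical, the rounded form assumes the default MXCSR rounding mode (round to nearest even), and the mnemonics in the comments are what compilers commonly emit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Four 32-bit integers to four floats (commonly cvtdq2ps). */
void ints_to_floats(const int32_t in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)in)));
}

/* Four floats to four integers, current rounding mode (commonly cvtps2dq). */
void floats_to_ints_rounded(const float in[4], int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(_mm_loadu_ps(in)));
}

/* Four floats to four integers, truncating toward zero (commonly cvttps2dq). */
void floats_to_ints_truncated(const float in[4], int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(_mm_loadu_ps(in)));
}
```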

3.26.3 SSE2 128-Bit SIMD Integer Instructions

The SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords contained in XMM and MMX registers.

Table 64  SSE2 128-Bit SIMD Integer Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| movdq2q | MOVDQ2Q | move quadword integer from XMM to MMX registers | |
| movdqa | MOVDQA | move aligned double quadword | |
| movdqu | MOVDQU | move unaligned double quadword | |
| movq2dq | MOVQ2DQ | move quadword integer from MMX to XMM registers | |
| paddq | PADDQ | add packed quadword integers | |
| pmuludq | PMULUDQ | multiply packed unsigned doubleword integers | |
| pshufd | PSHUFD | shuffle packed doublewords | |
| pshufhw | PSHUFHW | shuffle packed high words | |
| pshuflw | PSHUFLW | shuffle packed low words | |
| pslldq | PSLLDQ | shift double quadword left logical | |
| psrldq | PSRLDQ | shift double quadword right logical | |
| psubq | PSUBQ | subtract packed quadword integers | |
| punpckhqdq | PUNPCKHQDQ | unpack high quadwords | |
| punpcklqdq | PUNPCKLQDQ | unpack low quadwords | |
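
Two of these operations are easy to misread from their one-line descriptions. The unsigned doubleword multiply uses only the low doubleword of each quadword and produces full 64-bit products, and the doubleword shuffle selects each result element with a 2-bit field of an immediate. The C sketch below uses hypothetical function names; the mnemonics in the comments are what compilers commonly emit for the `<emmintrin.h>` intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Commonly pmuludq: multiplies elements 0 and 2 of a and b as unsigned
   32-bit integers and returns the two 64-bit products. Elements 1 and 3
   are ignored. */
void mul_even_u32(const uint32_t a[4], const uint32_t b[4], uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* commonly movdqu */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mul_epu32(va, vb));
}

/* Commonly pshufd: reverses the order of the four doublewords. */
void reverse_u32(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)));
}
```

With a = { 0xFFFFFFFF, 7, 3, 9 } and b = { 2, 7, 5, 9 }, the products are 0x1FFFFFFFE and 15; the first product does not fit in 32 bits, which is why the result elements are quadwords.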

3.26.4 SSE2 Miscellaneous Instructions

The SSE2 instructions described in the following table provide additional functionality for caching non-temporal data when storing data from XMM registers to memory, and provide additional control of instruction ordering on store operations.

Table 65  SSE2 Miscellaneous Instructions
| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| clflush | CLFLUSH | flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy | |
| lfence | LFENCE | serializes load operations | |
| maskmovdqu | MASKMOVDQU | non-temporal store of selected bytes from an XMM register into memory | |
| mfence | MFENCE | serializes load and store operations | |
| movntdq | MOVNTDQ | non-temporal store of double quadword from an XMM register into memory | |
| movnti | MOVNTI | non-temporal store of a doubleword from a general-purpose register into memory | movntiq valid only under -m64 |
| movntpd | MOVNTPD | non-temporal store of two packed double-precision floating-point values from an XMM register into memory | |
| pause | PAUSE | improves the performance of spin-wait loops | |
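
The following C sketch shows one way non-temporal stores, a fence, and `pause` might be combined. It is an illustration under stated assumptions, not a recommended synchronization design: the function names are hypothetical, the destination buffer is assumed to be 16-byte aligned, and the `volatile` flag stands in for whatever atomic mechanism a real program would use. The mnemonics in the comments are what compilers commonly emit for the `<emmintrin.h>` intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Fills dst (assumed 16-byte aligned) with value using non-temporal
   stores, then fences so the stores are ordered before later accesses. */
void fill_streaming(uint32_t *dst, size_t count, uint32_t value)
{
    __m128i v = _mm_set1_epi32((int)value);
    size_t i;

    for (i = 0; i + 4 <= count; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);      /* commonly movntdq */
    for (; i < count; i++)
        _mm_stream_si32((int *)(dst + i), (int)value);  /* commonly movnti */

    _mm_mfence();                                       /* mfence */
}

/* Polls *flag up to max_spins times, issuing pause between polls.
   Returns 1 if the flag was seen nonzero, 0 if the spin budget ran out. */
int spin_until_set(const volatile int *flag, unsigned max_spins)
{
    while (max_spins-- > 0) {
        if (*flag != 0)
            return 1;
        _mm_pause();                                    /* pause */
    }
    return 0;
}
```

The bounded spin count is a design choice for the sketch so that the loop always terminates; a production spin-wait would usually fall back to blocking instead of giving up.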