x86 Assembly Language Reference Manual

SSE2 Instructions

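SSE2 instructions are an extension of the SIMD execution model introduced with the MMX technology and the SSE extensions. SSE2 instructions are divided into four subgroups: packed and scalar double-precision floating-point instructions, packed single-precision floating-point conversion instructions, 128-bit SIMD integer instructions, and miscellaneous instructions for cacheability control and instruction ordering.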

SSE2 Packed and Scalar Double-Precision Floating-Point Instructions

The SSE2 packed and scalar double-precision floating-point instructions operate on double-precision floating-point operands.

SSE2 Data Movement Instructions

The SSE2 data movement instructions move double-precision floating-point data between XMM registers and memory.

Table 3–36 SSE2 Data Movement Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| movapd | MOVAPD | move two aligned packed double-precision floating-point values between XMM registers and memory | |
| movhpd | MOVHPD | move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory | |
| movlpd | MOVLPD | move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory | |
| movmskpd | MOVMSKPD | extract sign mask from two packed double-precision floating-point values | |
| movsd | MOVSD | move scalar double-precision floating-point value between XMM registers and memory | |
| movupd | MOVUPD | move two unaligned packed double-precision floating-point values between XMM registers and memory | |
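
The data movement mnemonics above can be exercised from C as a rough sketch. The example assumes GCC or Clang extended inline assembly on an x86-64 target, and uses the source-first, destination-last operand order of the Solaris mnemonics. The helper names sign_mask_pd and set_high_pd are hypothetical and exist only for illustration.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: bit 0 = sign of v[0], bit 1 = sign of v[1]. */
static int sign_mask_pd(const double v[2])
{
    int mask;
    __asm__ (
        "movupd %1, %%xmm0\n\t"      /* unaligned load: memory -> XMM0 */
        "movmskpd %%xmm0, %0"        /* two sign bits -> general-purpose register */
        : "=r" (mask)
        : "m" (*(const double (*)[2]) v)
        : "xmm0");
    return mask;
}

/* Hypothetical helper: replaces the high quadword of dst with *hi. */
static void set_high_pd(double dst[2], const double *hi)
{
    __asm__ (
        "movupd %0, %%xmm0\n\t"      /* load both doubles */
        "movhpd %1, %%xmm0\n\t"      /* overwrite only the high quadword */
        "movupd %%xmm0, %0"          /* store both doubles back */
        : "+m" (*(double (*)[2]) dst)
        : "m" (*hi)
        : "xmm0");
}
```

movupd is used instead of movapd because the arrays passed in are not guaranteed to be 16-byte aligned; movapd faults on a misaligned memory operand.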

 

SSE2 Packed Arithmetic Instructions

The SSE2 arithmetic instructions operate on packed and scalar double-precision floating-point operands.

Table 3–37 SSE2 Packed Arithmetic Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| addpd | ADDPD | add packed double-precision floating-point values | |
| addsd | ADDSD | add scalar double-precision floating-point values | |
| divpd | DIVPD | divide packed double-precision floating-point values | |
| divsd | DIVSD | divide scalar double-precision floating-point values | |
| maxpd | MAXPD | return maximum packed double-precision floating-point values | |
| maxsd | MAXSD | return maximum scalar double-precision floating-point value | |
| minpd | MINPD | return minimum packed double-precision floating-point values | |
| minsd | MINSD | return minimum scalar double-precision floating-point value | |
| mulpd | MULPD | multiply packed double-precision floating-point values | |
| mulsd | MULSD | multiply scalar double-precision floating-point values | |
| sqrtpd | SQRTPD | compute packed square roots of packed double-precision floating-point values | |
| sqrtsd | SQRTSD | compute scalar square root of scalar double-precision floating-point value | |
| subpd | SUBPD | subtract packed double-precision floating-point values | |
| subsd | SUBSD | subtract scalar double-precision floating-point values | |
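
As an illustration of the difference between the scalar (sd) and packed (pd) forms, the following sketch chains three scalar instructions and then performs one packed addition. It assumes GCC or Clang extended inline assembly on x86-64; the helper names hypot_sd and add_pd are hypothetical.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: sqrt(a*a + b*b) using scalar instructions only. */
static double hypot_sd(double a, double b)
{
    __asm__ (
        "mulsd %0, %0\n\t"           /* a = a * a */
        "mulsd %1, %1\n\t"           /* b = b * b */
        "addsd %1, %0\n\t"           /* a = a + b */
        "sqrtsd %0, %0"              /* a = sqrt(a) */
        : "+x" (a), "+x" (b));
    return a;
}

/* Hypothetical helper: dst[i] += src[i] for both elements at once. */
static void add_pd(double dst[2], const double src[2])
{
    __asm__ (
        "movupd %0, %%xmm0\n\t"      /* load dst[0], dst[1] */
        "movupd %1, %%xmm1\n\t"      /* load src[0], src[1] */
        "addpd %%xmm1, %%xmm0\n\t"   /* one instruction, two additions */
        "movupd %%xmm0, %0"          /* store both sums */
        : "+m" (*(double (*)[2]) dst)
        : "m" (*(const double (*)[2]) src)
        : "xmm0", "xmm1");
}
```

The scalar forms touch only the low quadword of the destination register, while the packed forms operate on both quadwords.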

 

SSE2 Logical Instructions

The SSE2 logical instructions operate on packed double-precision floating-point values.

Table 3–38 SSE2 Logical Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| andnpd | ANDNPD | perform bitwise logical AND NOT of packed double-precision floating-point values | |
| andpd | ANDPD | perform bitwise logical AND of packed double-precision floating-point values | |
| orpd | ORPD | perform bitwise logical OR of packed double-precision floating-point values | |
| xorpd | XORPD | perform bitwise logical XOR of packed double-precision floating-point values | |
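
A common use of the logical instructions is sign-bit manipulation. The sketch below, which assumes GCC or Clang extended inline assembly on x86-64, clears the sign bit with andnpd and flips it with xorpd. The helper names abs_sd and negate_sd are hypothetical. Note that andnpd inverts the destination operand, not the source, before the AND.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: |x| by clearing the sign bit. */
static double abs_sd(double x)
{
    double mask = -0.0;              /* only the sign bit is set */
    __asm__ (
        "andnpd %1, %0"              /* mask = (NOT mask) AND x */
        : "+x" (mask)
        : "x" (x));
    return mask;
}

/* Hypothetical helper: -x by flipping the sign bit. */
static double negate_sd(double x)
{
    double mask = -0.0;              /* only the sign bit is set */
    __asm__ (
        "xorpd %1, %0"               /* x = x XOR mask */
        : "+x" (x)
        : "x" (mask));
    return x;
}
```

Both instructions operate on the full 128-bit register; only the low quadword is meaningful here because the helpers return a single double.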

 

SSE2 Compare Instructions

The SSE2 compare instructions compare packed and scalar double-precision floating-point values and return the results of the comparison to either the destination operand or to the EFLAGS register.

Table 3–39 SSE2 Compare Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cmppd | CMPPD | compare packed double-precision floating-point values | |
| cmpsd | CMPSD | compare scalar double-precision floating-point values | |
| comisd | COMISD | perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |
| ucomisd | UCOMISD | perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register | |
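
The two result styles described above (EFLAGS versus a mask in the destination operand) can be sketched as follows, assuming GCC or Clang extended inline assembly on x86-64. The helper names greater_sd and less_mask_pd are hypothetical. In cmppd, the immediate 1 selects the less-than predicate.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: 1 if a > b, else 0 (also 0 if either is NaN). */
static int greater_sd(double a, double b)
{
    unsigned char r;
    __asm__ (
        "ucomisd %2, %1\n\t"         /* compare a with b, set ZF/PF/CF */
        "seta %0"                    /* r = 1 when CF = 0 and ZF = 0 */
        : "=q" (r)
        : "x" (a), "x" (b)
        : "cc");
    return r;
}

/* Hypothetical helper: bit i of the result is set where a[i] < b[i]. */
static int less_mask_pd(const double a[2], const double b[2])
{
    int mask;
    __asm__ (
        "movupd %1, %%xmm0\n\t"          /* load a */
        "movupd %2, %%xmm1\n\t"          /* load b */
        "cmppd $1, %%xmm1, %%xmm0\n\t"   /* all-ones per element where a < b */
        "movmskpd %%xmm0, %0"            /* collapse the masks to two bits */
        : "=r" (mask)
        : "m" (*(const double (*)[2]) a), "m" (*(const double (*)[2]) b)
        : "xmm0", "xmm1");
    return mask;
}
```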

 

SSE2 Shuffle and Unpack Instructions

The SSE2 shuffle and unpack instructions operate on packed double-precision floating-point operands.

Table 3–40 SSE2 Shuffle and Unpack Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| shufpd | SHUFPD | shuffle values in packed double-precision floating-point operands | |
| unpckhpd | UNPCKHPD | unpack and interleave the high values from two packed double-precision floating-point operands | |
| unpcklpd | UNPCKLPD | unpack and interleave the low values from two packed double-precision floating-point operands | |
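
The following sketch shows one shuffle and one unpack, assuming GCC or Clang extended inline assembly on x86-64. The helper names swap_pd and unpack_low_pd are hypothetical. With the same register as both operands, an immediate of 1 makes shufpd exchange the two quadwords.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: exchanges v[0] and v[1]. */
static void swap_pd(double v[2])
{
    __asm__ (
        "movupd %0, %%xmm0\n\t"              /* load v[0], v[1] */
        "shufpd $1, %%xmm0, %%xmm0\n\t"      /* low <- old high, high <- old low */
        "movupd %%xmm0, %0"                  /* store the swapped pair */
        : "+m" (*(double (*)[2]) v)
        :
        : "xmm0");
}

/* Hypothetical helper: out = { a[0], b[0] }. */
static void unpack_low_pd(const double a[2], const double b[2], double out[2])
{
    __asm__ (
        "movupd %1, %%xmm0\n\t"              /* load a */
        "movupd %2, %%xmm1\n\t"              /* load b */
        "unpcklpd %%xmm1, %%xmm0\n\t"        /* xmm0 = { a[0], b[0] } */
        "movupd %%xmm0, %0"                  /* store the interleaved pair */
        : "=m" (*(double (*)[2]) out)
        : "m" (*(const double (*)[2]) a), "m" (*(const double (*)[2]) b)
        : "xmm0", "xmm1");
}
```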

 

SSE2 Conversion Instructions

The SSE2 conversion instructions convert packed and individual doubleword integers into packed and scalar double-precision floating-point values (and vice versa). These instructions also convert between packed and scalar single-precision and double-precision floating-point values.

Table 3–41 SSE2 Conversion Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cvtdq2pd | CVTDQ2PD | convert packed doubleword integers to packed double-precision floating-point values | |
| cvtpd2dq | CVTPD2DQ | convert packed double-precision floating-point values to packed doubleword integers | |
| cvtpd2pi | CVTPD2PI | convert packed double-precision floating-point values to packed doubleword integers (MMX register destination) | |
| cvtpd2ps | CVTPD2PS | convert packed double-precision floating-point values to packed single-precision floating-point values | |
| cvtpi2pd | CVTPI2PD | convert packed doubleword integers (MMX register or memory source) to packed double-precision floating-point values | |
| cvtps2pd | CVTPS2PD | convert packed single-precision floating-point values to packed double-precision floating-point values | |
| cvtsd2si | CVTSD2SI | convert scalar double-precision floating-point value to a doubleword integer | |
| cvtsd2ss | CVTSD2SS | convert scalar double-precision floating-point value to scalar single-precision floating-point value | |
| cvtsi2sd | CVTSI2SD | convert doubleword integer to scalar double-precision floating-point value | |
| cvtss2sd | CVTSS2SD | convert scalar single-precision floating-point value to scalar double-precision floating-point value | |
| cvttpd2dq | CVTTPD2DQ | convert with truncation packed double-precision floating-point values to packed doubleword integers | |
| cvttpd2pi | CVTTPD2PI | convert with truncation packed double-precision floating-point values to packed doubleword integers (MMX register destination) | |
| cvttsd2si | CVTTSD2SI | convert with truncation scalar double-precision floating-point value to a doubleword integer | |
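
The practical difference between the plain and "with truncation" conversions shows up on values with a fractional part. The sketch below assumes GCC or Clang extended inline assembly on x86-64 and the default MXCSR rounding mode (round to nearest even). The helper names round_sd, trunc_sd, and int_to_sd are hypothetical.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64
 * and the default MXCSR rounding mode (round to nearest even). */

/* Hypothetical helper: converts using the current MXCSR rounding mode. */
static int round_sd(double x)
{
    int r;
    __asm__ ("cvtsd2si %1, %0" : "=r" (r) : "x" (x));
    return r;
}

/* Hypothetical helper: converts by discarding the fraction (toward zero). */
static int trunc_sd(double x)
{
    int r;
    __asm__ ("cvttsd2si %1, %0" : "=r" (r) : "x" (x));
    return r;
}

/* Hypothetical helper: converts a doubleword integer to a double. */
static double int_to_sd(int n)
{
    double d;
    __asm__ ("cvtsi2sd %1, %0" : "=x" (d) : "r" (n));
    return d;
}
```

Because the integer operands are declared as int, the compiler substitutes 32-bit register names and the assembler selects the doubleword forms.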

 

SSE2 Packed Single-Precision Floating-Point Instructions

The SSE2 packed single-precision floating-point instructions operate on single-precision floating-point and integer operands.

Table 3–42 SSE2 Packed Single-Precision Floating-Point Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cvtdq2ps | CVTDQ2PS | convert packed doubleword integers to packed single-precision floating-point values | |
| cvtps2dq | CVTPS2DQ | convert packed single-precision floating-point values to packed doubleword integers | |
| cvttps2dq | CVTTPS2DQ | convert with truncation packed single-precision floating-point values to packed doubleword integers | |
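
As a sketch of these four-element conversions, the helpers below convert in both directions, assuming GCC or Clang extended inline assembly on x86-64. The helper names trunc_ps_to_dq and dq_to_ps are hypothetical. The movups load is an SSE instruction, and movdqu appears in the SSE2 integer table.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: out[i] = (int) in[i], truncating toward zero. */
static void trunc_ps_to_dq(const float in[4], int out[4])
{
    __asm__ (
        "movups %1, %%xmm0\n\t"          /* load four floats */
        "cvttps2dq %%xmm0, %%xmm0\n\t"   /* four floats -> four doublewords */
        "movdqu %%xmm0, %0"              /* store four doublewords */
        : "=m" (*(int (*)[4]) out)
        : "m" (*(const float (*)[4]) in)
        : "xmm0");
}

/* Hypothetical helper: out[i] = (float) in[i]. */
static void dq_to_ps(const int in[4], float out[4])
{
    __asm__ (
        "movdqu %1, %%xmm0\n\t"          /* load four doublewords */
        "cvtdq2ps %%xmm0, %%xmm0\n\t"    /* four doublewords -> four floats */
        "movups %%xmm0, %0"              /* store four floats */
        : "=m" (*(float (*)[4]) out)
        : "m" (*(const int (*)[4]) in)
        : "xmm0");
}
```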

 

SSE2 128–Bit SIMD Integer Instructions

The SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords contained in XMM and MMX registers.

Table 3–43 SSE2 128–Bit SIMD Integer Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| movdq2q | MOVDQ2Q | move quadword integer from XMM to MMX registers | |
| movdqa | MOVDQA | move aligned double quadword | |
| movdqu | MOVDQU | move unaligned double quadword | |
| movq2dq | MOVQ2DQ | move quadword integer from MMX to XMM registers | |
| paddq | PADDQ | add packed quadword integers | |
| pmuludq | PMULUDQ | multiply packed unsigned doubleword integers | |
| pshufd | PSHUFD | shuffle packed doublewords | |
| pshufhw | PSHUFHW | shuffle packed high words | |
| pshuflw | PSHUFLW | shuffle packed low words | |
| pslldq | PSLLDQ | shift double quadword left logical | |
| psrldq | PSRLDQ | shift double quadword right logical | |
| psubq | PSUBQ | subtract packed quadword integers | |
| punpckhqdq | PUNPCKHQDQ | unpack high quadwords | |
| punpcklqdq | PUNPCKLQDQ | unpack low quadwords | |
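
Two of the integer instructions are sketched below, assuming GCC or Clang extended inline assembly on x86-64. The helper names add_q and reverse_d are hypothetical. The pshufd immediate 0x1b encodes the selectors 3, 2, 1, 0 for destination doublewords 0 through 3, which reverses the order.

```c
#include <stdint.h>

/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: dst[i] += src[i] on two 64-bit lanes (wraps on overflow). */
static void add_q(uint64_t dst[2], const uint64_t src[2])
{
    __asm__ (
        "movdqu %0, %%xmm0\n\t"              /* load dst */
        "movdqu %1, %%xmm1\n\t"              /* load src */
        "paddq %%xmm1, %%xmm0\n\t"           /* two independent 64-bit adds */
        "movdqu %%xmm0, %0"                  /* store the sums */
        : "+m" (*(uint64_t (*)[2]) dst)
        : "m" (*(const uint64_t (*)[2]) src)
        : "xmm0", "xmm1");
}

/* Hypothetical helper: reverses the order of four doublewords. */
static void reverse_d(uint32_t v[4])
{
    __asm__ (
        "movdqu %0, %%xmm0\n\t"              /* load four doublewords */
        "pshufd $0x1b, %%xmm0, %%xmm0\n\t"   /* dst[i] = src[3 - i] */
        "movdqu %%xmm0, %0"                  /* store the reversed vector */
        : "+m" (*(uint32_t (*)[4]) v)
        :
        : "xmm0");
}
```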

 

SSE2 Miscellaneous Instructions

The SSE2 instructions described below provide additional functionality for caching non-temporal data when storing data from XMM registers to memory, and provide additional control of instruction ordering on store operations.

Table 3–44 SSE2 Miscellaneous Instructions

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| clflush | CLFLUSH | flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy | |
| lfence | LFENCE | serializes load operations | |
| maskmovdqu | MASKMOVDQU | non-temporal store of selected bytes from an XMM register into memory | |
| mfence | MFENCE | serializes load and store operations | |
| movntdq | MOVNTDQ | non-temporal store of double quadword from an XMM register into memory | |
| movnti | MOVNTI | non-temporal store of a doubleword from a general-purpose register into memory | movntiq valid only under -xarch=amd64 |
| movntpd | MOVNTPD | non-temporal store of two packed double-precision floating-point values from an XMM register into memory | |
| pause | PAUSE | improves the performance of spin-wait loops | |
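
A minimal sketch of a non-temporal store followed by a fence, and of pause inside a spin-wait loop, is shown below. It assumes GCC or Clang extended inline assembly on x86-64; the helper names store_nt and spin_until_set are hypothetical, and whether a non-temporal store helps performance depends on the workload.

```c
/* Sketch only: assumes GCC or Clang extended inline assembly on x86-64. */

/* Hypothetical helper: stores v to *p with a non-temporal hint, then fences. */
static void store_nt(int *p, int v)
{
    __asm__ volatile (
        "movnti %1, %0\n\t"          /* doubleword store that hints cache bypass */
        "mfence"                     /* order it against later loads and stores */
        : "=m" (*p)
        : "r" (v)
        : "memory");
}

/* Hypothetical helper: waits until *flag becomes nonzero. */
static void spin_until_set(volatile int *flag)
{
    while (!*flag) {
        __asm__ volatile ("pause");  /* tell the processor this is a spin-wait */
    }
}
```

In a real program the flag would be set by another thread; the loop body runs pause once per failed check.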