x86 Assembly Language Reference Manual

SSE2 Instructions

SSE2 instructions are an extension of the SIMD execution model introduced with the MMX technology and the SSE extensions. SSE2 instructions are divided into four subgroups:

Packed and scalar double-precision floating-point instructions
Packed single-precision floating-point conversion instructions
128–bit SIMD integer instructions
Instructions that provide cache control and instruction ordering functionality

SSE2 Packed and Scalar Double-Precision Floating-Point Instructions

The SSE2 packed and scalar double-precision floating-point instructions operate on double-precision floating-point operands.

SSE2 Data Movement Instructions

The SSE2 data movement instructions move double-precision floating-point data between XMM registers and memory.

Table 3–36 SSE2 Data Movement Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`movapd`	`MOVAPD`	move two aligned packed double-precision floating-point values between XMM registers and memory
`movhpd`	`MOVHPD`	move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory
`movlpd`	`MOVLPD`	move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory
`movmskpd`	`MOVMSKPD`	extract sign mask from two packed double-precision floating-point values
`movsd`	`MOVSD`	move scalar double-precision floating-point value between XMM registers and memory
`movupd`	`MOVUPD`	move two unaligned packed double-precision floating-point values between XMM registers and memory
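
As an illustration of the data movement instructions, the following is a minimal C sketch that uses the SSE2 intrinsics declared in `<emmintrin.h>`. It assumes an x86 compiler with SSE2 enabled. The helper names `sign_mask2` and `set_high` are hypothetical, and the instruction named in each comment is what compilers typically emit for that intrinsic, not a guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return the 2-bit sign mask of v[0..1].
   Bit 0 holds the sign of v[0], bit 1 the sign of v[1]. */
int sign_mask2(const double *v)
{
    __m128d x = _mm_loadu_pd(v);   /* unaligned 128-bit load, typically movupd */
    return _mm_movemask_pd(x);     /* typically movmskpd */
}

/* Hypothetical helper: overwrite only the high quadword of dst[0..1]. */
void set_high(double *dst, const double *hi)
{
    __m128d x = _mm_loadu_pd(dst); /* typically movupd */
    x = _mm_loadh_pd(x, hi);       /* typically movhpd; low quadword unchanged */
    _mm_storeu_pd(dst, x);         /* unaligned store, typically movupd */
}
```

The unaligned forms are used here so that the sketch does not depend on 16-byte alignment of the caller's arrays; `_mm_load_pd` and `_mm_store_pd` (typically `movapd`) require aligned addresses.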

SSE2 Packed Arithmetic Instructions

The SSE2 arithmetic instructions operate on packed and scalar double-precision floating-point operands.

Table 3–37 SSE2 Packed Arithmetic Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`addpd`	`ADDPD`	add packed double-precision floating-point values
`addsd`	`ADDSD`	add scalar double-precision floating-point values
`divpd`	`DIVPD`	divide packed double-precision floating-point values
`divsd`	`DIVSD`	divide scalar double-precision floating-point values
`maxpd`	`MAXPD`	return maximum packed double-precision floating-point values
`maxsd`	`MAXSD`	return maximum scalar double-precision floating-point value
`minpd`	`MINPD`	return minimum packed double-precision floating-point values
`minsd`	`MINSD`	return minimum scalar double-precision floating-point value
`mulpd`	`MULPD`	multiply packed double-precision floating-point values
`mulsd`	`MULSD`	multiply scalar double-precision floating-point values
`sqrtpd`	`SQRTPD`	compute packed square roots of packed double-precision floating-point values
`sqrtsd`	`SQRTSD`	compute scalar square root of scalar double-precision floating-point value
`subpd`	`SUBPD`	subtract packed double-precision floating-point values
`subsd`	`SUBSD`	subtract scalar double-precision floating-point values
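
The difference between the packed and scalar forms can be sketched in C with the SSE2 intrinsics from `<emmintrin.h>`. This is an illustrative sketch that assumes an x86 compiler with SSE2 enabled. The function names are hypothetical, and the instructions in the comments are the ones compilers typically select.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: out[i] = a[i] + b[i] for both elements. */
void add_packed(const double *a, const double *b, double *out)
{
    __m128d s = _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)); /* typically addpd */
    _mm_storeu_pd(out, s);
}

/* Hypothetical helper: out[0] = a[0] + b[0]; out[1] is copied from a[1]. */
void add_scalar(const double *a, const double *b, double *out)
{
    __m128d s = _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b)); /* typically addsd */
    _mm_storeu_pd(out, s);
}
```

With `a = {1, 2}` and `b = {10, 20}`, the packed form produces `{11, 22}`, while the scalar form produces `{11, 2}` because only the low element is added and the high element passes through from the first operand.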

SSE2 Logical Instructions

The SSE2 logical instructions operate on packed double-precision floating-point values.

Table 3–38 SSE2 Logical Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`andnpd`	`ANDNPD`	perform bitwise logical AND NOT of packed double-precision floating-point values
`andpd`	`ANDPD`	perform bitwise logical AND of packed double-precision floating-point values
`orpd`	`ORPD`	perform bitwise logical OR of packed double-precision floating-point values
`xorpd`	`XORPD`	perform bitwise logical XOR of packed double-precision floating-point values
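
A common use of the logical instructions is sign manipulation, because the sign of a double-precision value is a single bit. The following C sketch uses the SSE2 intrinsics from `<emmintrin.h>` and assumes an x86 compiler with SSE2 enabled. The names `abs2` and `neg2` are hypothetical, and the instructions in the comments are what compilers typically emit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: out[i] = |in[i]| by clearing the sign bit. */
void abs2(const double *in, double *out)
{
    const __m128d sign = _mm_set1_pd(-0.0);      /* only the sign bit set in each element */
    __m128d x = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_andnot_pd(sign, x));  /* (~sign) & x, typically andnpd */
}

/* Hypothetical helper: out[i] = -in[i] by flipping the sign bit. */
void neg2(const double *in, double *out)
{
    const __m128d sign = _mm_set1_pd(-0.0);
    __m128d x = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_xor_pd(sign, x));     /* typically xorpd */
}
```

Note the operand order of AND NOT: the first operand is the one that is inverted.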

SSE2 Compare Instructions

The SSE2 compare instructions compare packed and scalar double-precision floating-point values and return the results of the comparison to either the destination operand or to the EFLAGS register.

Table 3–39 SSE2 Compare Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cmppd`	`CMPPD`	compare packed double-precision floating-point values
`cmpsd`	`CMPSD`	compare scalar double-precision floating-point values
`comisd`	`COMISD`	perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register
`ucomisd`	`UCOMISD`	perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register
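
The two result styles, a mask in the destination operand versus flags in EFLAGS, can be sketched in C with the SSE2 intrinsics from `<emmintrin.h>`. This sketch assumes an x86 compiler with SSE2 enabled. The helper names are hypothetical, and the comments name the instructions compilers typically select.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: bit i of the result is set when a[i] < b[i].
   The compare writes an all-ones or all-zeros mask per element;
   the sign bits of that mask are then collected into an integer. */
int lt_mask2(const double *a, const double *b)
{
    __m128d m = _mm_cmplt_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)); /* cmppd, less-than predicate */
    return _mm_movemask_pd(m);                                  /* typically movmskpd */
}

/* Hypothetical helper: compare two scalars through the flag-setting form. */
int low_lt(double a, double b)
{
    /* typically comisd followed by a test of the resulting flags */
    return _mm_comilt_sd(_mm_set_sd(a), _mm_set_sd(b));
}
```

The mask form suits branch-free selection across both elements, while the flag form suits an ordinary conditional branch on a single value.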

SSE2 Shuffle and Unpack Instructions

The SSE2 shuffle and unpack instructions operate on packed double-precision floating-point operands.

Table 3–40 SSE2 Shuffle and Unpack Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`shufpd`	`SHUFPD`	shuffle values in packed double-precision floating-point operands
`unpckhpd`	`UNPCKHPD`	unpack and interleave the high values from two packed double-precision floating-point operands
`unpcklpd`	`UNPCKLPD`	unpack and interleave the low values from two packed double-precision floating-point operands
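
The element selection performed by these instructions can be sketched in C with the SSE2 intrinsics from `<emmintrin.h>`. The sketch assumes an x86 compiler with SSE2 enabled. The function names are hypothetical, and the instructions in the comments are the typical compiler output.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: out = { in[1], in[0] }.
   Immediate bit 0 selects the element taken from the first operand,
   bit 1 the element taken from the second operand. */
void swap2(const double *in, double *out)
{
    __m128d x = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_shuffle_pd(x, x, 1)); /* typically shufpd $1 */
}

/* Hypothetical helper: out = { a[1], b[1] }. */
void high_pair(const double *a, const double *b, double *out)
{
    __m128d r = _mm_unpackhi_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)); /* typically unpckhpd */
    _mm_storeu_pd(out, r);
}

/* Hypothetical helper: out = { a[0], b[0] }. */
void low_pair(const double *a, const double *b, double *out)
{
    __m128d r = _mm_unpacklo_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)); /* typically unpcklpd */
    _mm_storeu_pd(out, r);
}
```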

SSE2 Conversion Instructions

The SSE2 conversion instructions convert packed and individual doubleword integers into packed and scalar double-precision floating-point values (and vice versa). These instructions also convert between packed and scalar single-precision and double-precision floating-point values.

Table 3–41 SSE2 Conversion Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cvtdq2pd`	`CVTDQ2PD`	convert packed doubleword integers to packed double-precision floating-point values
`cvtpd2dq`	`CVTPD2DQ`	convert packed double-precision floating-point values to packed doubleword integers
`cvtpd2pi`	`CVTPD2PI`	convert packed double-precision floating-point values to packed doubleword integers in an MMX register
`cvtpd2ps`	`CVTPD2PS`	convert packed double-precision floating-point values to packed single-precision floating-point values
`cvtpi2pd`	`CVTPI2PD`	convert packed doubleword integers in an MMX register or memory to packed double-precision floating-point values
`cvtps2pd`	`CVTPS2PD`	convert packed single-precision floating-point values to packed double-precision floating-point values
`cvtsd2si`	`CVTSD2SI`	convert scalar double-precision floating-point value to a doubleword integer
`cvtsd2ss`	`CVTSD2SS`	convert scalar double-precision floating-point value to scalar single-precision floating-point value
`cvtsi2sd`	`CVTSI2SD`	convert doubleword integer to scalar double-precision floating-point value
`cvtss2sd`	`CVTSS2SD`	convert scalar single-precision floating-point value to scalar double-precision floating-point value
`cvttpd2dq`	`CVTTPD2DQ`	convert with truncation packed double-precision floating-point values to packed doubleword integers
`cvttpd2pi`	`CVTTPD2PI`	convert with truncation packed double-precision floating-point values to packed doubleword integers in an MMX register
`cvttsd2si`	`CVTTSD2SI`	convert with truncation scalar double-precision floating-point value to a doubleword integer
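
The practical difference between the plain and "with truncation" conversions is the rounding behavior: the plain forms round according to the current MXCSR rounding mode, while the truncating forms always round toward zero. The C sketch below uses the SSE2 intrinsics from `<emmintrin.h>`. It assumes an x86 compiler with SSE2 enabled and the default MXCSR rounding mode (round to nearest, ties to even). The helper names are hypothetical, and the instructions in the comments are what compilers typically emit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: convert using the current MXCSR rounding mode. */
int round_i32(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));   /* typically cvtsd2si */
}

/* Hypothetical helper: convert with truncation toward zero. */
int trunc_i32(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));  /* typically cvttsd2si */
}

/* Hypothetical helper: widen two 32-bit integers to two doubles. */
void ints_to_doubles(int a, int b, double *out)
{
    __m128i v = _mm_set_epi32(0, 0, b, a);  /* doubleword 0 = a, doubleword 1 = b */
    _mm_storeu_pd(out, _mm_cvtepi32_pd(v)); /* typically cvtdq2pd */
}
```

Under the default rounding mode, `round_i32(2.5)` gives 2 and `round_i32(3.5)` gives 4, whereas `trunc_i32(2.9)` gives 2 and `trunc_i32(-2.9)` gives -2.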

SSE2 Packed Single-Precision Floating-Point Instructions

The SSE2 packed single-precision floating-point instructions operate on single-precision floating-point and integer operands.

Table 3–42 SSE2 Packed Single-Precision Floating-Point Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cvtdq2ps`	`CVTDQ2PS`	convert packed doubleword integers to packed single-precision floating-point values
`cvtps2dq`	`CVTPS2DQ`	convert packed single-precision floating-point values to packed doubleword integers
`cvttps2dq`	`CVTTPS2DQ`	convert with truncation packed single-precision floating-point values to packed doubleword integers

SSE2 128–Bit SIMD Integer Instructions

The SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords contained in XMM and MMX registers.

Table 3–43 SSE2 128–Bit SIMD Integer Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`movdq2q`	`MOVDQ2Q`	move quadword integer from XMM to MMX registers
`movdqa`	`MOVDQA`	move aligned double quadword
`movdqu`	`MOVDQU`	move unaligned double quadword
`movq2dq`	`MOVQ2DQ`	move quadword integer from MMX to XMM registers
`paddq`	`PADDQ`	add packed quadword integers
`pmuludq`	`PMULUDQ`	multiply packed unsigned doubleword integers
`pshufd`	`PSHUFD`	shuffle packed doublewords
`pshufhw`	`PSHUFHW`	shuffle packed high words
`pshuflw`	`PSHUFLW`	shuffle packed low words
`pslldq`	`PSLLDQ`	shift double quadword left logical by a byte count
`psrldq`	`PSRLDQ`	shift double quadword right logical by a byte count
`psubq`	`PSUBQ`	subtract packed quadword integers
`punpckhqdq`	`PUNPCKHQDQ`	unpack high quadwords
`punpcklqdq`	`PUNPCKLQDQ`	unpack low quadwords
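
Two of the less obvious entries in this table are the widening multiply and the whole-register shift, which counts in bytes rather than bits. The following C sketch uses the SSE2 intrinsics from `<emmintrin.h>` and assumes an x86 compiler with SSE2 enabled. The function names are hypothetical, and the instructions in the comments are the typical compiler output.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: out[0] = a0 * b0 and out[1] = a1 * b1 as full
   64-bit products. Only doublewords 0 and 2 of each operand are read. */
void mul_u32x2(uint32_t a0, uint32_t a1, uint32_t b0, uint32_t b1,
               uint64_t *out)
{
    __m128i a = _mm_set_epi32(0, (int)a1, 0, (int)a0);
    __m128i b = _mm_set_epi32(0, (int)b1, 0, (int)b0);
    __m128i p = _mm_mul_epu32(a, b);            /* typically pmuludq */
    _mm_storeu_si128((__m128i *)out, p);        /* typically movdqu */
}

/* Hypothetical helper: return v[0] + v[1] with 64-bit wraparound. */
uint64_t hsum_u64(const uint64_t *v)
{
    uint64_t out[2];
    __m128i x  = _mm_loadu_si128((const __m128i *)v); /* typically movdqu */
    __m128i hi = _mm_srli_si128(x, 8);          /* shift right by 8 bytes, typically psrldq */
    __m128i s  = _mm_add_epi64(x, hi);          /* typically paddq */
    _mm_storeu_si128((__m128i *)out, s);
    return out[0];
}
```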

SSE2 Miscellaneous Instructions

The SSE2 instructions described below provide additional control over caching when storing non-temporal data from XMM and general-purpose registers to memory, and over the ordering of load and store operations.

Table 3–44 SSE2 Miscellaneous Instructions


Solaris Mnemonic	Intel/AMD Mnemonic	Description	Notes
`clflush`	`CLFLUSH`	flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy
`lfence`	`LFENCE`	serializes load operations
`maskmovdqu`	`MASKMOVDQU`	non-temporal store of selected bytes from an XMM register into memory
`mfence`	`MFENCE`	serializes load and store operations
`movntdq`	`MOVNTDQ`	non-temporal store of double quadword from an XMM register into memory
`movnti`	`MOVNTI`	non-temporal store of a doubleword from a general-purpose register into memory	`movntiq` valid only under `-xarch=amd64`
`movntpd`	`MOVNTPD`	non-temporal store of two packed double-precision floating-point values from an XMM register into memory
`pause`	`PAUSE`	improves the performance of spin-wait loops
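
As a rough illustration of how these instructions are used together, the C sketch below writes a buffer with non-temporal stores, fences the stores, and polls a flag with a spin-wait hint. It uses the SSE2 intrinsics from `<emmintrin.h>` and assumes an x86 compiler with SSE2 enabled. The helper names and the `max_spins` parameter are hypothetical, and the instructions in the comments are what compilers typically emit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: fill dst[0..n-1] with value using non-temporal
   stores, then fence so the stores are ordered before later accesses. */
void fill_nt(int *dst, size_t n, int value)
{
    size_t i;
    for (i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value);  /* typically movnti */
    _mm_mfence();                         /* mfence; stronger than store ordering alone requires */
}

/* Hypothetical helper: poll *flag up to max_spins times.
   Returns 1 if the flag was observed nonzero, 0 otherwise. */
int spin_until_set(volatile int *flag, int max_spins)
{
    int i;
    for (i = 0; i < max_spins; i++) {
        if (*flag)
            return 1;
        _mm_pause();                      /* pause: spin-wait loop hint */
    }
    return 0;
}
```

The bounded loop in `spin_until_set` is a choice made for this sketch so that it always terminates; a real spin-wait would normally be paired with another thread that sets the flag.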