SSE2 Instructions - x86 Assembly Language Reference Manual

3.21 SSE2 Instructions

SSE2 instructions are an extension of the SIMD execution model introduced with the MMX technology and the SSE extensions. SSE2 instructions are divided into four subgroups:

Packed and scalar double-precision floating-point instructions
Packed single-precision floating-point conversion instructions
128-bit SIMD integer instructions
Instructions that provide cache control and instruction ordering functionality

3.21.1 SSE2 Packed and Scalar Double-Precision Floating-Point Instructions

The SSE2 packed and scalar double-precision floating-point instructions operate on double-precision floating-point operands.

3.21.1.1 SSE2 Data Movement Instructions

The SSE2 data movement instructions move double-precision floating-point data between XMM registers and memory.

Table 52 SSE2 Data Movement Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`movapd`	`MOVAPD`	move two aligned packed double-precision floating-point values between XMM registers and memory
`movhpd`	`MOVHPD`	move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory
`movlpd`	`MOVLPD`	move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory
`movmskpd`	`MOVMSKPD`	extract sign mask from two packed double-precision floating-point values
`movsd`	`MOVSD`	move scalar double-precision floating-point value between XMM registers and memory
`movupd`	`MOVUPD`	move two unaligned packed double-precision floating-point values between XMM registers and memory
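
As an illustration of the data movement instructions, the following sketch wraps a few of them in C functions using GCC-style extended inline assembly, which takes operands in the same source-then-destination order as the Oracle Solaris mnemonics. It assumes an x86-64 target and a GCC- or Clang-compatible compiler; the function names `sign_mask_pd` and `pack_pd` are hypothetical and do not come from this manual.

```c
/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* Load two doubles with movupd and return their sign bits via movmskpd:
   bit 0 is the sign of v[0], bit 1 is the sign of v[1]. */
int sign_mask_pd(const double v[2])
{
    int mask;
    __asm__("movupd   %1, %%xmm0\n\t"   /* unaligned load of both values */
            "movmskpd %%xmm0, %0"       /* extract the two sign bits */
            : "=r"(mask)
            : "m"(*(const double (*)[2])v)
            : "xmm0");
    return mask;
}

/* Build a packed pair from two separate doubles with movlpd and movhpd. */
void pack_pd(double lo, double hi, double out[2])
{
    __asm__("movlpd %1, %%xmm0\n\t"     /* low quadword  = lo */
            "movhpd %2, %%xmm0\n\t"     /* high quadword = hi */
            "movupd %%xmm0, %0"         /* unaligned store of both values */
            : "=m"(*(double (*)[2])out)
            : "m"(lo), "m"(hi)
            : "xmm0");
}
```

The sketch uses `movupd` because the arrays carry no alignment guarantee; `movapd` could be substituted only where the memory operand is known to be 16-byte aligned.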

3.21.1.2 SSE2 Packed Arithmetic Instructions

The SSE2 arithmetic instructions operate on packed and scalar double-precision floating-point operands.

Table 53 SSE2 Packed Arithmetic Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`addpd`	`ADDPD`	add packed double-precision floating-point values
`addsd`	`ADDSD`	add scalar double-precision floating-point values
`divpd`	`DIVPD`	divide packed double-precision floating-point values
`divsd`	`DIVSD`	divide scalar double-precision floating-point values
`maxpd`	`MAXPD`	return maximum packed double-precision floating-point values
`maxsd`	`MAXSD`	return maximum scalar double-precision floating-point value
`minpd`	`MINPD`	return minimum packed double-precision floating-point values
`minsd`	`MINSD`	return minimum scalar double-precision floating-point value
`mulpd`	`MULPD`	multiply packed double-precision floating-point values
`mulsd`	`MULSD`	multiply scalar double-precision floating-point values
`sqrtpd`	`SQRTPD`	compute packed square roots of packed double-precision floating-point values
`sqrtsd`	`SQRTSD`	compute scalar square root of scalar double-precision floating-point value
`subpd`	`SUBPD`	subtract packed double-precision floating-point values
`subsd`	`SUBSD`	subtract scalar double-precision floating-point values
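
The difference between the packed (`pd`) and scalar (`sd`) forms can be sketched as follows. This is a minimal example under the assumption of an x86-64 target and a compiler that accepts GCC-style extended inline assembly; `add_pd` and `hypot_sd` are hypothetical helper names chosen for illustration.

```c
/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* Packed form: addpd adds both pairs of doubles in one instruction. */
void add_pd(const double a[2], const double b[2], double out[2])
{
    __asm__("movupd %1, %%xmm0\n\t"
            "movupd %2, %%xmm1\n\t"
            "addpd  %%xmm1, %%xmm0\n\t"   /* xmm0 = {a[0]+b[0], a[1]+b[1]} */
            "movupd %%xmm0, %0"
            : "=m"(*(double (*)[2])out)
            : "m"(*(const double (*)[2])a), "m"(*(const double (*)[2])b)
            : "xmm0", "xmm1");
}

/* Scalar form: mulsd, addsd and sqrtsd touch only the low double.
   Computes sqrt(x*x + y*y) without any overflow protection. */
double hypot_sd(double x, double y)
{
    __asm__("mulsd  %0, %0\n\t"           /* x = x * x */
            "mulsd  %1, %1\n\t"           /* y = y * y */
            "addsd  %1, %0\n\t"           /* x = x + y */
            "sqrtsd %0, %0"               /* x = sqrt(x) */
            : "+x"(x), "+x"(y));
    return x;
}
```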

3.21.1.3 SSE2 Logical Instructions

The SSE2 logical instructions operate on packed double-precision floating-point values.

Table 54 SSE2 Logical Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`andnpd`	`ANDNPD`	perform bitwise logical AND NOT of packed double-precision floating-point values
`andpd`	`ANDPD`	perform bitwise logical AND of packed double-precision floating-point values
`orpd`	`ORPD`	perform bitwise logical OR of packed double-precision floating-point values
`xorpd`	`XORPD`	perform bitwise logical XOR of packed double-precision floating-point values
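
A common use of the logical instructions is manipulating the sign bit of a floating-point value directly. The sketch below shows one way this might look, assuming an x86-64 target and GCC-style extended inline assembly; the helper names `abs_pd` and `neg_pd` and the mask constants are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* Clear the sign bit of both values with andpd (absolute value). */
void abs_pd(const double in[2], double out[2])
{
    static const uint64_t keep[2] = { 0x7fffffffffffffffULL,
                                      0x7fffffffffffffffULL };
    __asm__("movupd %1, %%xmm0\n\t"
            "movupd %2, %%xmm1\n\t"
            "andpd  %%xmm1, %%xmm0\n\t"   /* xmm0 = in & ~sign bit */
            "movupd %%xmm0, %0"
            : "=m"(*(double (*)[2])out)
            : "m"(*(const double (*)[2])in),
              "m"(*(const uint64_t (*)[2])keep)
            : "xmm0", "xmm1");
}

/* Flip the sign bit of both values with xorpd (negation). */
void neg_pd(const double in[2], double out[2])
{
    static const uint64_t sign[2] = { 0x8000000000000000ULL,
                                      0x8000000000000000ULL };
    __asm__("movupd %1, %%xmm0\n\t"
            "movupd %2, %%xmm1\n\t"
            "xorpd  %%xmm1, %%xmm0\n\t"   /* xmm0 = in ^ sign bit */
            "movupd %%xmm0, %0"
            : "=m"(*(double (*)[2])out)
            : "m"(*(const double (*)[2])in),
              "m"(*(const uint64_t (*)[2])sign)
            : "xmm0", "xmm1");
}
```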

3.21.1.4 SSE2 Compare Instructions

The SSE2 compare instructions compare packed and scalar double-precision floating-point values and return the results of the comparison to either the destination operand or to the EFLAGS register.

Table 55 SSE2 Compare Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cmppd`	`CMPPD`	compare packed double-precision floating-point values
`cmpsd`	`CMPSD`	compare scalar double-precision floating-point values
`comisd`	`COMISD`	perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register
`ucomisd`	`UCOMISD`	perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register
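
The two result styles, a per-element mask in the destination operand versus condition flags in EFLAGS, can be sketched as follows. The example assumes an x86-64 target and GCC-style extended inline assembly; `less_mask_pd` and `is_less_sd` are hypothetical names. The immediate `$1` passed to `cmppd` selects the less-than predicate.

```c
/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* cmppd writes an all-ones or all-zeros mask per element into the
   destination; movmskpd then condenses the mask to two bits.
   Bit i of the result is set when a[i] < b[i]. */
int less_mask_pd(const double a[2], const double b[2])
{
    int mask;
    __asm__("movupd   %1, %%xmm0\n\t"
            "movupd   %2, %%xmm1\n\t"
            "cmppd    $1, %%xmm1, %%xmm0\n\t"  /* predicate 1: less-than */
            "movmskpd %%xmm0, %0"
            : "=r"(mask)
            : "m"(*(const double (*)[2])a), "m"(*(const double (*)[2])b)
            : "xmm0", "xmm1");
    return mask;
}

/* ucomisd reports its result in ZF, PF and CF instead of a register.
   Returns 1 when a < b, and 0 otherwise (including when either is NaN). */
int is_less_sd(double a, double b)
{
    unsigned char r;
    __asm__("ucomisd %1, %2\n\t"   /* compare b against a */
            "seta    %0"           /* CF=0 and ZF=0, that is, b > a */
            : "=q"(r)
            : "x"(a), "x"(b)
            : "cc");
    return r;
}
```

Testing `b > a` with `seta` rather than `a < b` with `setb` is a deliberate choice in this sketch: an unordered result sets CF, so `setb` would also report true when an operand is a NaN.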

3.21.1.5 SSE2 Shuffle and Unpack Instructions

The SSE2 shuffle and unpack instructions operate on packed double-precision floating-point operands.

Table 56 SSE2 Shuffle and Unpack Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`shufpd`	`SHUFPD`	shuffle values in packed double-precision floating-point operands
`unpckhpd`	`UNPCKHPD`	unpack and interleave the high values from two packed double-precision floating-point operands
`unpcklpd`	`UNPCKLPD`	unpack and interleave the low values from two packed double-precision floating-point operands
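
To make the element movement concrete, here is a small sketch of `shufpd` and `unpcklpd`, assuming an x86-64 target and GCC-style extended inline assembly. The function names `swap_pd` and `interleave_low_pd` are hypothetical.

```c
/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* shufpd with the same register as source and destination and an
   immediate of 1 exchanges the low and high doubles. */
void swap_pd(const double in[2], double out[2])
{
    __asm__("movupd %1, %%xmm0\n\t"
            "shufpd $1, %%xmm0, %%xmm0\n\t"  /* xmm0 = {in[1], in[0]} */
            "movupd %%xmm0, %0"
            : "=m"(*(double (*)[2])out)
            : "m"(*(const double (*)[2])in)
            : "xmm0");
}

/* unpcklpd interleaves the low doubles of destination and source. */
void interleave_low_pd(const double a[2], const double b[2], double out[2])
{
    __asm__("movupd   %1, %%xmm0\n\t"
            "movupd   %2, %%xmm1\n\t"
            "unpcklpd %%xmm1, %%xmm0\n\t"    /* xmm0 = {a[0], b[0]} */
            "movupd   %%xmm0, %0"
            : "=m"(*(double (*)[2])out)
            : "m"(*(const double (*)[2])a), "m"(*(const double (*)[2])b)
            : "xmm0", "xmm1");
}
```

Replacing `unpcklpd` with `unpckhpd` in the second function would instead yield `{a[1], b[1]}`.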

3.21.1.6 SSE2 Conversion Instructions

The SSE2 conversion instructions convert packed and individual doubleword integers into packed and scalar double-precision floating-point values (and vice versa). These instructions also convert between packed and scalar single-precision and double-precision floating-point values.

Table 57 SSE2 Conversion Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cvtdq2pd`	`CVTDQ2PD`	convert packed doubleword integers to packed double-precision floating-point values
`cvtpd2dq`	`CVTPD2DQ`	convert packed double-precision floating-point values to packed doubleword integers
`cvtpd2pi`	`CVTPD2PI`	convert packed double-precision floating-point values to packed doubleword integers in an MMX register
`cvtpd2ps`	`CVTPD2PS`	convert packed double-precision floating-point values to packed single-precision floating-point values
`cvtpi2pd`	`CVTPI2PD`	convert packed doubleword integers from an MMX register or memory to packed double-precision floating-point values
`cvtps2pd`	`CVTPS2PD`	convert packed single-precision floating-point values to packed double-precision floating-point values
`cvtsd2si`	`CVTSD2SI`	convert scalar double-precision floating-point value to a doubleword integer
`cvtsd2ss`	`CVTSD2SS`	convert scalar double-precision floating-point value to scalar single-precision floating-point value
`cvtsi2sd`	`CVTSI2SD`	convert doubleword integer to scalar double-precision floating-point value
`cvtss2sd`	`CVTSS2SD`	convert scalar single-precision floating-point value to scalar double-precision floating-point value
`cvttpd2dq`	`CVTTPD2DQ`	convert with truncation packed double-precision floating-point values to packed doubleword integers
`cvttpd2pi`	`CVTTPD2PI`	convert with truncation packed double-precision floating-point values to packed doubleword integers in an MMX register
`cvttsd2si`	`CVTTSD2SI`	convert with truncation scalar double-precision floating-point value to a doubleword integer
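
The distinction between the plain and "with truncation" conversions can be sketched as follows. The example assumes an x86-64 target, GCC-style extended inline assembly, and that the MXCSR rounding mode is still at its default of round-to-nearest-even; the names `round_sd` and `trunc_sd` are hypothetical.

```c
/* Illustrative sketch: assumes x86-64, GCC-style extended inline asm,
   and the default MXCSR rounding mode (round to nearest, ties to even). */

/* cvtsd2si rounds according to the current MXCSR rounding mode. */
int round_sd(double x)
{
    int r;
    __asm__("cvtsd2si %1, %0" : "=r"(r) : "x"(x));
    return r;
}

/* cvttsd2si always truncates toward zero, regardless of MXCSR. */
int trunc_sd(double x)
{
    int r;
    __asm__("cvttsd2si %1, %0" : "=r"(r) : "x"(x));
    return r;
}
```

Under those assumptions, `round_sd(2.5)` yields 2 and `round_sd(3.5)` yields 4, because ties go to the even integer, while `trunc_sd(3.9)` yields 3.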

3.21.2 SSE2 Packed Single-Precision Floating-Point Instructions

The SSE2 packed single-precision floating-point instructions operate on single-precision floating-point and integer operands.

Table 58 SSE2 Packed Single-Precision Floating-Point Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cvtdq2ps`	`CVTDQ2PS`	convert packed doubleword integers to packed single-precision floating-point values
`cvtps2dq`	`CVTPS2DQ`	convert packed single-precision floating-point values to packed doubleword integers
`cvttps2dq`	`CVTTPS2DQ`	convert with truncation packed single-precision floating-point values to packed doubleword integers
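
As a sketch of how these conversions process four elements at a time, the following example assumes an x86-64 target and GCC-style extended inline assembly. The helper names `int_to_float_ps` and `float_to_int_trunc_ps` are hypothetical; `movups` is the SSE unaligned single-precision move and `movdqu` is the SSE2 unaligned integer move.

```c
#include <stdint.h>

/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* cvtdq2ps converts four packed doubleword integers to four floats. */
void int_to_float_ps(const int32_t in[4], float out[4])
{
    __asm__("movdqu   %1, %%xmm0\n\t"
            "cvtdq2ps %%xmm0, %%xmm0\n\t"
            "movups   %%xmm0, %0"
            : "=m"(*(float (*)[4])out)
            : "m"(*(const int32_t (*)[4])in)
            : "xmm0");
}

/* cvttps2dq converts four floats to integers, truncating toward zero. */
void float_to_int_trunc_ps(const float in[4], int32_t out[4])
{
    __asm__("movups    %1, %%xmm0\n\t"
            "cvttps2dq %%xmm0, %%xmm0\n\t"
            "movdqu    %%xmm0, %0"
            : "=m"(*(int32_t (*)[4])out)
            : "m"(*(const float (*)[4])in)
            : "xmm0");
}
```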

3.21.3 SSE2 128-Bit SIMD Integer Instructions

The SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords contained in XMM and MMX registers.

Table 59 SSE2 128-Bit SIMD Integer Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description
`movdq2q`	`MOVDQ2Q`	move quadword integer from XMM to MMX registers
`movdqa`	`MOVDQA`	move aligned double quadword
`movdqu`	`MOVDQU`	move unaligned double quadword
`movq2dq`	`MOVQ2DQ`	move quadword integer from MMX to XMM registers
`paddq`	`PADDQ`	add packed quadword integers
`pmuludq`	`PMULUDQ`	multiply packed unsigned doubleword integers
`pshufd`	`PSHUFD`	shuffle packed doublewords
`pshufhw`	`PSHUFHW`	shuffle packed high words
`pshuflw`	`PSHUFLW`	shuffle packed low words
`pslldq`	`PSLLDQ`	shift double quadword left logical
`psrldq`	`PSRLDQ`	shift double quadword right logical
`psubq`	`PSUBQ`	subtract packed quadword integers
`punpckhqdq`	`PUNPCKHQDQ`	unpack high quadwords
`punpcklqdq`	`PUNPCKLQDQ`	unpack low quadwords
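
Two of the integer instructions, `paddq` and `pshufd`, are sketched below, assuming an x86-64 target and GCC-style extended inline assembly. The function names `add_q` and `reverse_d` are hypothetical. In `pshufd`, each 2-bit field of the immediate selects the source doubleword for one destination position, so `0x1b` (binary 00 01 10 11) reverses the four doublewords.

```c
#include <stdint.h>

/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* paddq adds two pairs of 64-bit integers; each sum wraps on overflow. */
void add_q(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    __asm__("movdqu %1, %%xmm0\n\t"
            "movdqu %2, %%xmm1\n\t"
            "paddq  %%xmm1, %%xmm0\n\t"
            "movdqu %%xmm0, %0"
            : "=m"(*(uint64_t (*)[2])out)
            : "m"(*(const uint64_t (*)[2])a),
              "m"(*(const uint64_t (*)[2])b)
            : "xmm0", "xmm1");
}

/* pshufd with immediate 0x1b reverses the order of four doublewords. */
void reverse_d(const uint32_t in[4], uint32_t out[4])
{
    __asm__("movdqu %1, %%xmm0\n\t"
            "pshufd $0x1b, %%xmm0, %%xmm0\n\t"
            "movdqu %%xmm0, %0"
            : "=m"(*(uint32_t (*)[4])out)
            : "m"(*(const uint32_t (*)[4])in)
            : "xmm0");
}
```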

3.21.4 SSE2 Miscellaneous Instructions

The SSE2 instructions described below provide additional functionality for caching non-temporal data when storing data from XMM registers to memory, and provide additional control of instruction ordering on store operations.

Table 60 SSE2 Miscellaneous Instructions

Oracle Solaris Mnemonic	Intel/AMD Mnemonic	Description	Notes
`clflush`	`CLFLUSH`	flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy
`lfence`	`LFENCE`	serializes load operations
`maskmovdqu`	`MASKMOVDQU`	non-temporal store of selected bytes from an XMM register into memory
`mfence`	`MFENCE`	serializes load and store operations
`movntdq`	`MOVNTDQ`	non-temporal store of double quadword from an XMM register into memory
`movnti`	`MOVNTI`	non-temporal store of a doubleword from a general-purpose register into memory	`movntiq` valid only under `-m64`
`movntpd`	`MOVNTPD`	non-temporal store of two packed double-precision floating-point values from an XMM register into memory
`pause`	`PAUSE`	improves the performance of spin-wait loops
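
A rough sketch of how a non-temporal store, a fence, and `pause` might be used from C follows. It assumes an x86-64 target and GCC-style extended inline assembly; `store_nt` and `spin_until_set` are hypothetical helpers, and the bounded spin count is an assumption added so that the loop always terminates.

```c
/* Illustrative sketch: assumes x86-64 and GCC-style extended inline asm. */

/* Write value with movnti (non-temporal store), then issue mfence so
   the store is ordered before any later loads and stores. */
void store_nt(int *dst, int value)
{
    __asm__ volatile("movnti %1, %0\n\t"
                     "mfence"
                     : "=m"(*dst)
                     : "r"(value)
                     : "memory");
}

/* Poll *flag up to max_spins times, executing pause between polls.
   Returns 1 if the flag was seen nonzero, 0 if the budget ran out. */
int spin_until_set(volatile int *flag, int max_spins)
{
    while (max_spins-- > 0) {
        if (*flag)
            return 1;
        __asm__ volatile("pause");
    }
    return 0;
}
```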