x86 Assembly Language Reference Manual

SSE Instructions

SSE instructions are an extension of the SIMD execution model introduced with the MMX technology. SSE instructions are divided into four subgroups:

SIMD single-precision floating-point instructions that operate on the XMM registers
MXCSR state management instructions
64–bit SIMD integer instructions that operate on the MMX registers
Instructions that provide cache control, prefetch, and instruction ordering functionality

SIMD Single-Precision Floating-Point Instructions (SSE)

The SSE SIMD instructions operate on packed and scalar single-precision floating-point values located in the XMM registers or memory.

Data Transfer Instructions (SSE)

The SSE data transfer instructions move packed and scalar single-precision floating-point operands between XMM registers and between XMM registers and memory.

Table 3–27 Data Transfer Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`movaps`	`MOVAPS`	move four aligned packed single-precision floating-point values between XMM registers or memory
`movhlps`	`MOVHLPS`	move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register
`movhps`	`MOVHPS`	move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory
`movlhps`	`MOVLHPS`	move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register
`movlps`	`MOVLPS`	move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory
`movmskps`	`MOVMSKPS`	extract sign mask from four packed single-precision floating-point values
`movss`	`MOVSS`	move scalar single-precision floating-point value between XMM registers or memory
`movups`	`MOVUPS`	move four unaligned packed single-precision floating-point values between XMM registers or memory
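The data transfer instructions can be sketched in C with GCC/Clang extended inline assembly on an x86-64 target. The assembly uses AT&T operand order (source first, destination last), as the Solaris mnemonics do. The helper names `sign_mask4` and `copy4_aligned` are hypothetical. The sketch shows `movups` loading from an address with no alignment guarantee, `movmskps` collecting the four sign bits into a general-purpose register, and `movaps`, which faults unless its memory operand is 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: load four floats from a possibly unaligned address
   with movups, then return the sign mask built by movmskps
   (bit i is the sign bit of element i). */
static int sign_mask4(const float *p)
{
    int mask;
    __asm__("movups   %1, %%xmm0\n\t"
            "movmskps %%xmm0, %0"
            : "=r"(mask)
            : "m"(*(const float (*)[4])p)
            : "xmm0");
    return mask;
}

/* Hypothetical helper: copy four floats between 16-byte-aligned buffers
   with movaps; a misaligned memory operand raises a general-protection fault. */
static void copy4_aligned(float *dst, const float *src)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    __asm__("movaps %1, %%xmm0\n\t"
            "movaps %%xmm0, %0"
            : "=m"(*(float (*)[4])dst)
            : "m"(*(const float (*)[4])src)
            : "xmm0");
}
```

For example, `sign_mask4` applied to {1.0, -2.0, 3.0, -4.0} sets bits 1 and 3 and returns 0xA.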

Packed Arithmetic Instructions (SSE)

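The SSE packed arithmetic instructions perform add, subtract, multiply, divide, maximum, minimum, reciprocal, square root, and reciprocal square root operations on packed and scalar single-precision floating-point operands. The packed forms (suffix ps) operate on all four values in an XMM register; the scalar forms (suffix ss) operate only on the low value and leave the upper three values of the destination unchanged.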

Table 3–28 Packed Arithmetic Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`addps`	`ADDPS`	add packed single-precision floating-point values
`addss`	`ADDSS`	add scalar single-precision floating-point values
`divps`	`DIVPS`	divide packed single-precision floating-point values
`divss`	`DIVSS`	divide scalar single-precision floating-point values
`maxps`	`MAXPS`	return maximum packed single-precision floating-point values
`maxss`	`MAXSS`	return maximum scalar single-precision floating-point values
`minps`	`MINPS`	return minimum packed single-precision floating-point values
`minss`	`MINSS`	return minimum scalar single-precision floating-point values
`mulps`	`MULPS`	multiply packed single-precision floating-point values
`mulss`	`MULSS`	multiply scalar single-precision floating-point values
`rcpps`	`RCPPS`	compute approximate reciprocals of packed single-precision floating-point values
`rcpss`	`RCPSS`	compute approximate reciprocal of scalar single-precision floating-point value
`rsqrtps`	`RSQRTPS`	compute approximate reciprocals of square roots of packed single-precision floating-point values
`rsqrtss`	`RSQRTSS`	compute approximate reciprocal of square root of scalar single-precision floating-point value
`sqrtps`	`SQRTPS`	compute square roots of packed single-precision floating-point values
`sqrtss`	`SQRTSS`	compute square root of scalar single-precision floating-point values
`subps`	`SUBPS`	subtract packed single-precision floating-point values
`subss`	`SUBSS`	subtract scalar single-precision floating-point values
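The difference between the packed and scalar arithmetic forms can be sketched as follows, assuming GCC/Clang extended inline assembly on x86-64; `mul_add4` and `sqrt_scalar` are hypothetical names. The packed helper loads its operands with `movups` first, because the memory operand of a legacy-encoded packed instruction such as `mulps` must be 16-byte aligned. The scalar helper uses only the low value of the register.

```c
/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i = 0..3.
   The multiply and the add round separately (this is not a fused multiply-add). */
static void mul_add4(float out[4], const float a[4], const float b[4],
                     const float c[4])
{
    __asm__("movups %1, %%xmm0\n\t"
            "movups %2, %%xmm1\n\t"
            "mulps  %%xmm1, %%xmm0\n\t"   /* xmm0 = a * b in all four lanes */
            "movups %3, %%xmm1\n\t"
            "addps  %%xmm1, %%xmm0\n\t"   /* xmm0 = a * b + c */
            "movups %%xmm0, %0"
            : "=m"(*(float (*)[4])out)
            : "m"(*(const float (*)[4])a), "m"(*(const float (*)[4])b),
              "m"(*(const float (*)[4])c)
            : "xmm0", "xmm1");
}

/* Hypothetical helper: scalar square root with sqrtss (low lane only). */
static float sqrt_scalar(float x)
{
    float r;
    __asm__("sqrtss %1, %0" : "=x"(r) : "x"(x));
    return r;
}
```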

Comparison Instructions (SSE)

The SSE compare instructions compare packed and scalar single-precision floating-point operands.

Table 3–29 Comparison Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cmpps`	`CMPPS`	compare packed single-precision floating-point values
`cmpss`	`CMPSS`	compare scalar single-precision floating-point values
`comiss`	`COMISS`	perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register
`ucomiss`	`UCOMISS`	perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register
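The two styles of comparison can be sketched as follows, assuming GCC/Clang extended inline assembly on x86-64; `less_mask4` and `greater_scalar` are hypothetical names. `cmpps` writes an all-ones or all-zeros mask per element, selected by an immediate predicate (1 means less-than). `comiss` instead sets ZF, PF, and CF. An unordered result (a NaN operand) sets all three flags, so `seta` reports false.

```c
/* Hypothetical helper: return a 4-bit mask whose bit i is set when a[i] < b[i]. */
static int less_mask4(const float a[4], const float b[4])
{
    int mask;
    __asm__("movups   %1, %%xmm0\n\t"
            "movups   %2, %%xmm1\n\t"
            "cmpps    $1, %%xmm1, %%xmm0\n\t"  /* predicate 1 (LT): lane = a < b ? ~0 : 0 */
            "movmskps %%xmm0, %0"             /* gather the per-lane sign bits */
            : "=r"(mask)
            : "m"(*(const float (*)[4])a), "m"(*(const float (*)[4])b)
            : "xmm0", "xmm1");
    return mask;
}

/* Hypothetical helper: return 1 if a > b, 0 otherwise (including NaN operands). */
static int greater_scalar(float a, float b)
{
    unsigned char gt;
    __asm__("comiss %2, %1\n\t"   /* compare a with b; sets ZF, PF, CF */
            "seta   %0"           /* CF = 0 and ZF = 0: ordered and a > b */
            : "=q"(gt)
            : "x"(a), "x"(b)
            : "cc");
    return gt;
}
```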

Logical Instructions (SSE)

The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations on packed single-precision floating-point operands.

Table 3–30 Logical Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`andnps`	`ANDNPS`	perform bitwise logical AND NOT of packed single-precision floating-point values
`andps`	`ANDPS`	perform bitwise logical AND of packed single-precision floating-point values
`orps`	`ORPS`	perform bitwise logical OR of packed single-precision floating-point values
`xorps`	`XORPS`	perform bitwise logical XOR of packed single-precision floating-point values
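A common use of the logical instructions is manipulating the IEEE 754 sign bit (bit 31) of each value. The following is a minimal sketch under the same assumptions (GCC/Clang inline assembly, x86-64); `abs4`, `neg4`, `abs_mask`, and `sign_bit` are hypothetical names. `andps` clears the sign bits, giving a packed absolute value, and `xorps` flips them, giving a packed negation. The constant masks are declared 16-byte aligned because they are used directly as memory operands.

```c
#include <stdint.h>

static _Alignas(16) const uint32_t abs_mask[4] =
    { 0x7FFFFFFFu, 0x7FFFFFFFu, 0x7FFFFFFFu, 0x7FFFFFFFu };
static _Alignas(16) const uint32_t sign_bit[4] =
    { 0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u };

/* Hypothetical helper: packed absolute value. */
static void abs4(float out[4], const float in[4])
{
    __asm__("movups %1, %%xmm0\n\t"
            "andps  %2, %%xmm0\n\t"   /* clear bit 31 of each lane */
            "movups %%xmm0, %0"
            : "=m"(*(float (*)[4])out)
            : "m"(*(const float (*)[4])in), "m"(abs_mask)
            : "xmm0");
}

/* Hypothetical helper: packed negation. */
static void neg4(float out[4], const float in[4])
{
    __asm__("movups %1, %%xmm0\n\t"
            "xorps  %2, %%xmm0\n\t"   /* flip bit 31 of each lane */
            "movups %%xmm0, %0"
            : "=m"(*(float (*)[4])out)
            : "m"(*(const float (*)[4])in), "m"(sign_bit)
            : "xmm0");
}
```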

Shuffle and Unpack Instructions (SSE)

The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in packed single-precision floating-point operands.

Table 3–31 Shuffle and Unpack Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`shufps`	`SHUFPS`	shuffle values in packed single-precision floating-point operands
`unpckhps`	`UNPCKHPS`	unpack and interleave the two high-order values from two packed single-precision floating-point operands
`unpcklps`	`UNPCKLPS`	unpack and interleave the two low-order values from two packed single-precision floating-point operands
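The shuffle and unpack instructions are sketched below, assuming GCC/Clang inline assembly on x86-64; `reverse4` and `interleave_lo` are hypothetical names. In `shufps`, each 2-bit field of the immediate selects a source element. The low two results come from the destination and the high two from the source, so using the same register for both gives a permutation; the immediate 0x1B selects elements 3, 2, 1, 0. `unpcklps` produces { dst[0], src[0], dst[1], src[1] }.

```c
/* Hypothetical helper: reverse the order of four floats. */
static void reverse4(float out[4], const float in[4])
{
    __asm__("movups %1, %%xmm0\n\t"
            "shufps $0x1B, %%xmm0, %%xmm0\n\t"  /* fields 3,2,1,0 -> reversed order */
            "movups %%xmm0, %0"
            : "=m"(*(float (*)[4])out)
            : "m"(*(const float (*)[4])in)
            : "xmm0");
}

/* Hypothetical helper: out = { a[0], b[0], a[1], b[1] }. */
static void interleave_lo(float out[4], const float a[4], const float b[4])
{
    __asm__("movups   %1, %%xmm0\n\t"
            "movups   %2, %%xmm1\n\t"
            "unpcklps %%xmm1, %%xmm0\n\t"
            "movups   %%xmm0, %0"
            : "=m"(*(float (*)[4])out)
            : "m"(*(const float (*)[4])a), "m"(*(const float (*)[4])b)
            : "xmm0", "xmm1");
}
```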

Conversion Instructions (SSE)

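The SSE conversion instructions convert packed and individual doubleword integers into packed and scalar single-precision floating-point values, and convert single-precision floating-point values back into doubleword integers. The cvtps2pi and cvtss2si instructions round according to the rounding-control field of the MXCSR register; the cvttps2pi and cvttss2si instructions always truncate toward zero.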

Table 3–32 Conversion Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`cvtpi2ps`	`CVTPI2PS`	convert packed doubleword integers to packed single-precision floating-point values
`cvtps2pi`	`CVTPS2PI`	convert packed single-precision floating-point values to packed doubleword integers
`cvtsi2ss`	`CVTSI2SS`	convert doubleword integer to scalar single-precision floating-point value
`cvtss2si`	`CVTSS2SI`	convert scalar single-precision floating-point value to a doubleword integer
`cvttps2pi`	`CVTTPS2PI`	convert with truncation packed single-precision floating-point values to packed doubleword integers
`cvttss2si`	`CVTTSS2SI`	convert with truncation scalar single-precision floating-point value to scalar doubleword integer
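The difference between rounding and truncating conversions is sketched below, assuming GCC/Clang inline assembly on x86-64 and the default MXCSR setting of round to nearest even; `to_int_rounded`, `to_int_truncated`, and `from_int` are hypothetical names.

```c
/* Hypothetical helper: convert using the MXCSR rounding mode. */
static int to_int_rounded(float x)
{
    int r;
    __asm__("cvtss2si %1, %0" : "=r"(r) : "x"(x));
    return r;
}

/* Hypothetical helper: convert with truncation toward zero. */
static int to_int_truncated(float x)
{
    int r;
    __asm__("cvttss2si %1, %0" : "=r"(r) : "x"(x));
    return r;
}

/* Hypothetical helper: doubleword integer to scalar float (low lane only). */
static float from_int(int i)
{
    float f;
    __asm__("cvtsi2ss %1, %0" : "=x"(f) : "r"(i));
    return f;
}
```

With round to nearest even, 2.5 converts to 2 and -1.5 converts to -2 under `cvtss2si`, while `cvttss2si` gives 2 and -1.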

MXCSR State Management Instructions (SSE)

The MXCSR state management instructions save and restore the state of the MXCSR control and status register.

Table 3–33 MXCSR State Management Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`ldmxcsr`	`LDMXCSR`	load `%mxcsr` register from a 32-bit memory operand
`stmxcsr`	`STMXCSR`	store `%mxcsr` register state to a 32-bit memory operand
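A typical save, modify, and restore sequence is sketched below, assuming GCC/Clang inline assembly on x86-64; `get_mxcsr`, `set_mxcsr`, and `convert_toward_zero` are hypothetical names. MXCSR bits 13 and 14 hold the rounding-control field: 00 is round to nearest even, 01 is toward negative infinity, 10 is toward positive infinity, and 11 is toward zero. The sketch temporarily selects round toward zero for a `cvtss2si` and then restores the saved register, including its sticky exception flags.

```c
#include <stdint.h>

/* Hypothetical helper: read MXCSR with stmxcsr (memory operand only). */
static uint32_t get_mxcsr(void)
{
    uint32_t csr;
    __asm__ volatile("stmxcsr %0" : "=m"(csr));
    return csr;
}

/* Hypothetical helper: write MXCSR with ldmxcsr (memory operand only). */
static void set_mxcsr(uint32_t csr)
{
    __asm__ volatile("ldmxcsr %0" : : "m"(csr));
}

/* Hypothetical helper: convert with cvtss2si under RC = 11b (toward zero). */
static int convert_toward_zero(float x)
{
    uint32_t saved = get_mxcsr();
    int r;
    set_mxcsr((saved & ~0x6000u) | (3u << 13));   /* bits 13-14 = 11b */
    __asm__ volatile("cvtss2si %1, %0" : "=r"(r) : "x"(x));
    set_mxcsr(saved);                             /* restore previous state */
    return r;
}
```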

64–Bit SIMD Integer Instructions (SSE)

The SSE 64–bit SIMD integer instructions perform operations on packed bytes, words, or doublewords in MMX registers.

Table 3–34 64–Bit SIMD Integer Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`pavgb`	`PAVGB`	compute average of packed unsigned byte integers
`pavgw`	`PAVGW`	compute average of packed unsigned word integers
`pextrw`	`PEXTRW`	extract word
`pinsrw`	`PINSRW`	insert word
`pmaxsw`	`PMAXSW`	maximum of packed signed word integers
`pmaxub`	`PMAXUB`	maximum of packed unsigned byte integers
`pminsw`	`PMINSW`	minimum of packed signed word integers
`pminub`	`PMINUB`	minimum of packed unsigned byte integers
`pmovmskb`	`PMOVMSKB`	move byte mask (most significant bit of each byte) to a general-purpose register
`pmulhuw`	`PMULHUW`	multiply packed unsigned word integers and store high result
`psadbw`	`PSADBW`	compute sum of absolute differences of packed unsigned byte integers
`pshufw`	`PSHUFW`	shuffle packed word integers in an MMX register
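Two of these MMX-register instructions are sketched below, assuming GCC/Clang inline assembly on x86-64; `sad8` and `avg8` are hypothetical names. `psadbw` leaves the sum of the eight absolute byte differences in the low word of the destination. `pavgb` computes (a + b + 1) >> 1 per byte. Because the MMX registers alias the x87 register stack, each sequence ends with `emms` so that later x87 code is not affected.

```c
#include <stdint.h>

/* Hypothetical helper: sum of |a[i] - b[i]| over eight unsigned bytes. */
static unsigned sad8(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum;
    __asm__("movq   %1, %%mm0\n\t"
            "psadbw %2, %%mm0\n\t"   /* low word = sum of absolute differences */
            "movd   %%mm0, %0\n\t"
            "emms"                   /* release MMX state for x87 code */
            : "=r"(sum)
            : "m"(*(const uint8_t (*)[8])a), "m"(*(const uint8_t (*)[8])b)
            : "mm0");
    return sum;
}

/* Hypothetical helper: out[i] = (a[i] + b[i] + 1) >> 1 for eight bytes. */
static void avg8(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    __asm__("movq  %1, %%mm0\n\t"
            "pavgb %2, %%mm0\n\t"
            "movq  %%mm0, %0\n\t"
            "emms"
            : "=m"(*(uint8_t (*)[8])out)
            : "m"(*(const uint8_t (*)[8])a), "m"(*(const uint8_t (*)[8])b)
            : "mm0");
}
```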

Miscellaneous Instructions (SSE)

The following instructions control caching, prefetching, and instruction ordering.

Table 3–35 Miscellaneous Instructions (SSE)


Solaris Mnemonic	Intel/AMD Mnemonic	Description
`maskmovq`	`MASKMOVQ`	non-temporal store of selected bytes from an MMX register into memory
`movntps`	`MOVNTPS`	non-temporal store of four packed single-precision floating-point values from an XMM register into memory
`movntq`	`MOVNTQ`	non-temporal store of quadword from an MMX register into memory
`prefetchnta`	`PREFETCHNTA`	prefetch data into non-temporal cache structure and into a location close to the processor
`prefetcht0`	`PREFETCHT0`	prefetch data into all levels of the cache hierarchy
`prefetcht1`	`PREFETCHT1`	prefetch data into level 2 cache and higher
`prefetcht2`	`PREFETCHT2`	prefetch data into level 3 cache and higher, or an implementation-specific choice
`sfence`	`SFENCE`	serialize store operations
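The miscellaneous instructions are often combined in a streaming copy, sketched below under the same assumptions (GCC/Clang inline assembly, x86-64). The helper `stream_copy` is hypothetical and requires `n` to be a multiple of 4 and both buffers to be 16-byte aligned, as `movaps` and `movntps` demand. The prefetch distance of 16 floats (64 bytes) is an arbitrary illustrative choice. Non-temporal stores are weakly ordered, so the loop ends with `sfence`.

```c
#include <stddef.h>

/* Hypothetical helper: copy n floats with non-temporal stores. */
static void stream_copy(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        if (i + 16 < n)   /* prefetch ahead only while still inside src */
            __asm__ volatile("prefetchnta %0" : : "m"(src[i + 16]));
        __asm__ volatile("movaps  %1, %%xmm0\n\t"
                         "movntps %%xmm0, %0"   /* non-temporal hint: minimize cache pollution */
                         : "=m"(*(float (*)[4])(dst + i))
                         : "m"(*(const float (*)[4])(src + i))
                         : "xmm0");
    }
    /* Order the weakly ordered non-temporal stores before any later store,
       such as a flag that tells another thread the data is ready. */
    __asm__ volatile("sfence" ::: "memory");
}
```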