x86 Assembly Language Reference Manual

SSE Instructions

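SSE instructions are an extension of the SIMD execution model introduced with the MMX technology. SSE instructions are divided into four subgroups: SIMD single-precision floating-point instructions, MXCSR state management instructions, 64–bit SIMD integer instructions, and miscellaneous instructions that control caching, prefetching, and instruction ordering.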

SIMD Single-Precision Floating-Point Instructions (SSE)

The SSE SIMD instructions operate on packed and scalar single-precision floating-point values located in the XMM registers or memory.

Data Transfer Instructions (SSE)

The SSE data transfer instructions move packed and scalar single-precision floating-point operands between XMM registers and between XMM registers and memory.

Table 3–27 Data Transfer Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| movaps | MOVAPS | move four aligned packed single-precision floating-point values between XMM registers or memory | |
| movhlps | MOVHLPS | move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register | |
| movhps | MOVHPS | move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory | |
| movlhps | MOVLHPS | move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register | |
| movlps | MOVLPS | move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory | |
| movmskps | MOVMSKPS | extract sign mask from four packed single-precision floating-point values | |
| movss | MOVSS | move scalar single-precision floating-point value between XMM registers or memory | |
| movups | MOVUPS | move four unaligned packed single-precision floating-point values between XMM registers or memory | |
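
As a rough illustration of the data transfer instructions, the following C sketch uses the compiler intrinsics from `<xmmintrin.h>`. It assumes a GCC- or Clang-style compiler targeting x86; the function names are hypothetical, and the instruction named in each comment is the one a compiler typically emits, which is not guaranteed.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copy four floats with an unaligned load and store (typically movups). */
void copy4_unaligned(float *dst, const float *src)
{
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, v);
}

/* Return the 4-bit sign mask of four floats (typically movmskps):
   bit i of the result is set when element i has its sign bit set. */
int sign_mask4(const float *src)
{
    return _mm_movemask_ps(_mm_loadu_ps(src));
}

/* Load one float into the low element (typically movss from memory,
   which zeroes the three upper elements), then store all four. */
void low_only(float *dst, const float *src)
{
    _mm_storeu_ps(dst, _mm_load_ss(src));
}
```

The aligned form, `_mm_load_ps` / `_mm_store_ps` (movaps), requires a 16-byte-aligned address and is therefore not used on these arbitrary pointers.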

 

Packed Arithmetic Instructions (SSE)

The SSE packed arithmetic instructions perform arithmetic operations on packed and scalar single-precision floating-point operands.

Table 3–28 Packed Arithmetic Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| addps | ADDPS | add packed single-precision floating-point values | |
| addss | ADDSS | add scalar single-precision floating-point values | |
| divps | DIVPS | divide packed single-precision floating-point values | |
| divss | DIVSS | divide scalar single-precision floating-point values | |
| maxps | MAXPS | return maximum packed single-precision floating-point values | |
| maxss | MAXSS | return maximum scalar single-precision floating-point values | |
| minps | MINPS | return minimum packed single-precision floating-point values | |
| minss | MINSS | return minimum scalar single-precision floating-point values | |
| mulps | MULPS | multiply packed single-precision floating-point values | |
| mulss | MULSS | multiply scalar single-precision floating-point values | |
| rcpps | RCPPS | compute reciprocals of packed single-precision floating-point values | |
| rcpss | RCPSS | compute reciprocal of scalar single-precision floating-point values | |
| rsqrtps | RSQRTPS | compute reciprocals of square roots of packed single-precision floating-point values | |
| rsqrtss | RSQRTSS | compute reciprocal of square root of scalar single-precision floating-point values | |
| sqrtps | SQRTPS | compute square roots of packed single-precision floating-point values | |
| sqrtss | SQRTSS | compute square root of scalar single-precision floating-point values | |
| subps | SUBPS | subtract packed single-precision floating-point values | |
| subss | SUBSS | subtract scalar single-precision floating-point values | |
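
The difference between the packed (`ps`) and scalar (`ss`) forms can be sketched in C with the `<xmmintrin.h>` intrinsics. This is a minimal example under the assumption of a GCC- or Clang-style x86 compiler; `mul_add4` and `add_low` are hypothetical names, and the mnemonics in the comments are the usual, not guaranteed, code generation.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Packed form: out[i] = a[i] * b[i] + c[i] for i = 0..3
   (typically mulps followed by addps). */
void mul_add4(float *out, const float *a, const float *b, const float *c)
{
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, _mm_add_ps(prod, _mm_loadu_ps(c)));
}

/* Scalar form: only element 0 is added (typically addss);
   elements 1..3 of the result are copied unchanged from a. */
void add_low(float *out, const float *a, const float *b)
{
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, `add_low` produces `{11, 2, 3, 4}`: the scalar instruction touches only the low element.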

 

Comparison Instructions (SSE)

The SSE compare instructions compare packed and scalar single-precision floating-point operands.

Table 3–29 Comparison Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| cmpps | CMPPS | compare packed single-precision floating-point values | |
| cmpss | CMPSS | compare scalar single-precision floating-point values | |
| comiss | COMISS | perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register | |
| ucomiss | UCOMISS | perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register | |
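
The two styles of comparison differ in where the result goes: `cmpps` and `cmpss` write an all-ones or all-zeros mask into each destination element, while `comiss` and `ucomiss` set EFLAGS. A small C sketch with `<xmmintrin.h>` intrinsics follows; the function names are hypothetical and the code assumes a GCC- or Clang-style x86 compiler.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Bit i of the result is set when a[i] < b[i].
   The less-than compare (typically cmpps with the LT predicate) yields an
   element mask, and movmskps collects the top bit of each element. */
int less_than_mask4(const float *a, const float *b)
{
    __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(mask);
}

/* Scalar compare through EFLAGS (typically comiss):
   returns 1 when x < y, otherwise 0. */
int scalar_less(float x, float y)
{
    return _mm_comilt_ss(_mm_set_ss(x), _mm_set_ss(y));
}
```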

 

Logical Instructions (SSE)

The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations on packed single-precision floating-point operands.

Table 3–30 Logical Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| andnps | ANDNPS | perform bitwise logical AND NOT of packed single-precision floating-point values | |
| andps | ANDPS | perform bitwise logical AND of packed single-precision floating-point values | |
| orps | ORPS | perform bitwise logical OR of packed single-precision floating-point values | |
| xorps | XORPS | perform bitwise logical XOR of packed single-precision floating-point values | |
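
Because these instructions act on the raw bit patterns, a common use is sign manipulation. The following C sketch (hypothetical function names, `<xmmintrin.h>` intrinsics, GCC- or Clang-style x86 compiler assumed) clears or flips the sign bit of four values at once.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Absolute value of four floats (typically andnps).
   _mm_andnot_ps(m, v) computes (~m) & v, so a mask that holds only the
   sign bits strips them from v. */
void abs4(float *out, const float *in)
{
    const __m128 sign_bits = _mm_set1_ps(-0.0f);  /* 0x80000000 per element */
    _mm_storeu_ps(out, _mm_andnot_ps(sign_bits, _mm_loadu_ps(in)));
}

/* Negate four floats by flipping each sign bit (typically xorps). */
void negate4(float *out, const float *in)
{
    const __m128 sign_bits = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(out, _mm_xor_ps(sign_bits, _mm_loadu_ps(in)));
}
```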

 

Shuffle and Unpack Instructions (SSE)

The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in packed single-precision floating-point operands.

Table 3–31 Shuffle and Unpack Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| shufps | SHUFPS | shuffle values in packed single-precision floating-point operands | |
| unpckhps | UNPCKHPS | unpack and interleave the two high-order values from two single-precision floating-point operands | |
| unpcklps | UNPCKLPS | unpack and interleave the two low-order values from two single-precision floating-point operands | |
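
The element ordering these instructions produce is easy to get wrong, so a short C sketch may help. It uses `<xmmintrin.h>` intrinsics with hypothetical function names and assumes a GCC- or Clang-style x86 compiler. Element 0 is the lowest-addressed float.

```c
#include <xmmintrin.h>  /* SSE intrinsics, _MM_SHUFFLE */

/* Reverse four floats (typically shufps).
   _MM_SHUFFLE(0, 1, 2, 3) selects source elements 3, 2, 1, 0 for
   result elements 0, 1, 2, 3. */
void reverse4(float *out, const float *in)
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Interleave the low halves: {a0, b0, a1, b1} (typically unpcklps). */
void interleave_low(float *out, const float *a, const float *b)
{
    _mm_storeu_ps(out, _mm_unpacklo_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Interleave the high halves: {a2, b2, a3, b3} (typically unpckhps). */
void interleave_high(float *out, const float *a, const float *b)
{
    _mm_storeu_ps(out, _mm_unpackhi_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```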

 

Conversion Instructions (SSE)

The SSE conversion instructions convert packed and individual doubleword integers into packed and scalar single-precision floating-point values, and vice versa.

Table 3–32 Conversion Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| cvtpi2ps | CVTPI2PS | convert packed doubleword integers to packed single-precision floating-point values | |
| cvtps2pi | CVTPS2PI | convert packed single-precision floating-point values to packed doubleword integers | |
| cvtsi2ss | CVTSI2SS | convert doubleword integer to scalar single-precision floating-point value | |
| cvtss2si | CVTSS2SI | convert scalar single-precision floating-point value to a doubleword integer | |
| cvttps2pi | CVTTPS2PI | convert with truncation packed single-precision floating-point values to packed doubleword integers | |
| cvttss2si | CVTTSS2SI | convert with truncation scalar single-precision floating-point value to scalar doubleword integer | |
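
The practical difference between the plain and the "with truncation" conversions shows up on values that are not whole numbers. The C sketch below uses `<xmmintrin.h>` intrinsics with hypothetical function names; it assumes a GCC- or Clang-style x86 compiler and that the MXCSR rounding mode is still at its default, round to nearest even.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Convert using the current MXCSR rounding mode (typically cvtss2si). */
int round_to_int(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Convert by discarding the fraction, regardless of MXCSR
   (typically cvttss2si). */
int trunc_to_int(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* Convert a 32-bit integer to float (typically cvtsi2ss). */
float int_to_float(int n)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), n));
}
```

Under the default rounding mode, `round_to_int(2.5f)` gives 2 and `round_to_int(3.5f)` gives 4 (ties go to the even integer), while `trunc_to_int(-2.9f)` gives -2. The packed variants (`cvtpi2ps`, `cvtps2pi`, `cvttps2pi`) use MMX registers for the integer operand and are not shown here.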

 

MXCSR State Management Instructions (SSE)

The MXCSR state management instructions save and restore the state of the MXCSR control and status register.

Table 3–33 MXCSR State Management Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| ldmxcsr | LDMXCSR | load %mxcsr register | |
| stmxcsr | STMXCSR | save %mxcsr register state | |
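
A typical save, modify, restore sequence on MXCSR can be sketched in C with the `_mm_getcsr` and `_mm_setcsr` intrinsics from `<xmmintrin.h>`, which usually compile to `stmxcsr` and `ldmxcsr`. The helper name `set_rounding_mode` is hypothetical; the `_MM_ROUND_*` constants come from the same header.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_ROUND_* */

/* Replace the rounding-control field of MXCSR with `mode` (one of the
   _MM_ROUND_* constants) and return the previous rounding mode so the
   caller can restore it later. All other MXCSR bits are left unchanged. */
unsigned int set_rounding_mode(unsigned int mode)
{
    unsigned int csr = _mm_getcsr();                       /* stmxcsr */
    _mm_setcsr((csr & ~(unsigned int)_MM_ROUND_MASK)
               | (mode & _MM_ROUND_MASK));                 /* ldmxcsr */
    return csr & _MM_ROUND_MASK;
}
```

A caller would write `unsigned int old = set_rounding_mode(_MM_ROUND_TOWARD_ZERO);`, run the code that needs the changed mode, then call `set_rounding_mode(old)`. Instructions such as `cvtss2si` honor this field, whereas `cvttss2si` always truncates.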

 

64–Bit SIMD Integer Instructions (SSE)

The SSE 64–bit SIMD integer instructions perform operations on packed bytes, words, or doublewords in MMX registers.

Table 3–34 64–Bit SIMD Integer Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| pavgb | PAVGB | compute average of packed unsigned byte integers | |
| pavgw | PAVGW | compute average of packed unsigned word integers | |
| pextrw | PEXTRW | extract word | |
| pinsrw | PINSRW | insert word | |
| pmaxsw | PMAXSW | maximum of packed signed word integers | |
| pmaxub | PMAXUB | maximum of packed unsigned byte integers | |
| pminsw | PMINSW | minimum of packed signed word integers | |
| pminub | PMINUB | minimum of packed unsigned byte integers | |
| pmovmskb | PMOVMSKB | move byte mask | |
| pmulhuw | PMULHUW | multiply packed unsigned integers and store high result | |
| psadbw | PSADBW | compute sum of absolute differences | |
| pshufw | PSHUFW | shuffle packed integer words in MMX register | |
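
Two of these operations, the rounded average and the sum of absolute differences, are sketched below in C using the 64-bit (`__m64`) intrinsics from `<xmmintrin.h>`. The function names are hypothetical, and the sketch assumes a GCC- or Clang-style x86 compiler; some recent compilers implement these intrinsics with XMM registers instead of MMX registers, so the mnemonics in the comments describe the classic mapping only.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* also provides the __m64 type and MMX intrinsics */

/* out[i] = (a[i] + b[i] + 1) >> 1 for four unsigned 16-bit words
   (classically pavgw). */
void average4_u16(uint16_t out[4], const uint16_t a[4], const uint16_t b[4])
{
    __m64 va, vb, avg;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    avg = _mm_avg_pu16(va, vb);
    memcpy(out, &avg, sizeof avg);
    _mm_empty();  /* emms: leave MMX state before any later x87 code */
}

/* Sum of |a[i] - b[i]| over eight unsigned bytes (classically psadbw).
   The instruction leaves the sum in the low word of the result. */
int sad8_u8(const uint8_t a[8], const uint8_t b[8])
{
    __m64 va, vb;
    int sum;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    sum = _mm_cvtsi64_si32(_mm_sad_pu8(va, vb));
    _mm_empty();
    return sum;
}
```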

 

Miscellaneous Instructions (SSE)

The following instructions control caching, prefetching, and instruction ordering.

Table 3–35 Miscellaneous Instructions (SSE)

| Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
| --- | --- | --- | --- |
| maskmovq | MASKMOVQ | non-temporal store of selected bytes from an MMX register into memory | |
| movntps | MOVNTPS | non-temporal store of four packed single-precision floating-point values from an XMM register into memory | |
| movntq | MOVNTQ | non-temporal store of quadword from an MMX register into memory | |
| prefetchnta | PREFETCHNTA | prefetch data into non-temporal cache structure and into a location close to the processor | |
| prefetcht0 | PREFETCHT0 | prefetch data into all levels of the cache hierarchy | |
| prefetcht1 | PREFETCHT1 | prefetch data into level 2 cache and higher | |
| prefetcht2 | PREFETCHT2 | prefetch data into level 3 cache and higher (implementation-specific) | |
| sfence | SFENCE | serialize store operations | |
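
How a non-temporal store, a store fence, and a prefetch hint fit together can be sketched in C with `<xmmintrin.h>` intrinsics. The function names, and the prefetch distance of 16 floats (one 64-byte cache line), are assumptions made for illustration; the code assumes a GCC- or Clang-style x86 compiler.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_stream_ps, _mm_sfence, _mm_prefetch */

/* Fill n floats with `value` using non-temporal stores (typically movntps).
   Assumes dst is 16-byte aligned and n is a multiple of 4. */
void fill_stream(float *dst, size_t n, float value)
{
    const __m128 v = _mm_set1_ps(value);
    size_t i;
    for (i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);
    _mm_sfence();  /* sfence: order the streamed stores before later stores */
}

/* Sum n floats while hinting that data one cache line ahead will be needed
   (typically prefetcht0). The hint does not change the result. */
float sum_with_prefetch(const float *src, size_t n)
{
    float total = 0.0f;
    size_t i;
    for (i = 0; i < n; i++) {
        if (i % 16 == 0 && i + 16 < n)
            _mm_prefetch((const char *)(src + i + 16), _MM_HINT_T0);
        total += src[i];
    }
    return total;
}
```

Whether either technique helps is workload- and processor-dependent; they are hints and ordering tools, not guaranteed speedups.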