
x86 Assembly Language Reference Manual


Updated: November 2020

3.25 SSE Instructions

SSE instructions are an extension of the SIMD execution model introduced with the MMX technology. SSE instructions are divided into four subgroups:

  • SIMD single-precision floating-point instructions that operate on the XMM registers

  • MXCSR state management instructions

  • 64-bit SIMD integer instructions that operate on the MMX registers

  • Instructions that provide cache control, prefetch, and instruction ordering functionality

3.25.1 SIMD Single-Precision Floating-Point Instructions (SSE)

The SSE SIMD instructions operate on packed and scalar single-precision floating-point values located in the XMM registers or memory.

3.25.1.1 Data Transfer Instructions (SSE)

The SSE data transfer instructions move packed and scalar single-precision floating-point operands between XMM registers and between XMM registers and memory.

Table 48 Data Transfer Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| movaps | MOVAPS | move four aligned packed single-precision floating-point values between XMM registers or memory | |
| movhlps | MOVHLPS | move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register | |
| movhps | MOVHPS | move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory | |
| movlhps | MOVLHPS | move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register | |
| movlps | MOVLPS | move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory | |
| movmskps | MOVMSKPS | extract sign mask from four packed single-precision floating-point values | |
| movss | MOVSS | move scalar single-precision floating-point value between XMM registers or memory | |
| movups | MOVUPS | move four unaligned packed single-precision floating-point values between XMM registers or memory | |
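As an illustration of movups and movmskps, the sketch below loads four floats and extracts their sign bits. It assumes an x86-64 target and a GCC- or Clang-compatible compiler whose inline assembler accepts the lowercase, source-first (AT&T-style) mnemonics used in the table; `sign_mask4` is a hypothetical helper name. movups is used because it accepts any alignment, whereas movaps faults on a memory operand that is not 16-byte aligned.

```c
/* Hypothetical helper (assumes x86-64 and GCC/Clang inline asm):
   returns a 4-bit mask whose bit i is the sign bit of v[i]. */
static int sign_mask4(const float *v)
{
    int mask;
    __asm__("movups   (%1), %%xmm0\n\t"  /* xmm0 = v[0..3]; no alignment required */
            "movmskps %%xmm0, %0"         /* bit i of mask = sign bit of lane i */
            : "=r"(mask)
            : "r"(v)
            : "xmm0", "memory");
    return mask;
}
```

Because movmskps copies raw sign bits, a negative zero (`-0.0f`) also sets its bit.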

3.25.1.2 Packed Arithmetic Instructions (SSE)

SSE packed arithmetic instructions perform packed and scalar arithmetic operations on packed and scalar single-precision floating-point operands.

Table 49 Packed Arithmetic Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| addps | ADDPS | add packed single-precision floating-point values | |
| addss | ADDSS | add scalar single-precision floating-point values | |
| divps | DIVPS | divide packed single-precision floating-point values | |
| divss | DIVSS | divide scalar single-precision floating-point values | |
| maxps | MAXPS | return maximum packed single-precision floating-point values | |
| maxss | MAXSS | return maximum scalar single-precision floating-point value | |
| minps | MINPS | return minimum packed single-precision floating-point values | |
| minss | MINSS | return minimum scalar single-precision floating-point value | |
| mulps | MULPS | multiply packed single-precision floating-point values | |
| mulss | MULSS | multiply scalar single-precision floating-point values | |
| rcpps | RCPPS | compute approximate reciprocals of packed single-precision floating-point values | |
| rcpss | RCPSS | compute approximate reciprocal of scalar single-precision floating-point value | |
| rsqrtps | RSQRTPS | compute approximate reciprocals of square roots of packed single-precision floating-point values | |
| rsqrtss | RSQRTSS | compute approximate reciprocal of square root of scalar single-precision floating-point value | |
| sqrtps | SQRTPS | compute square roots of packed single-precision floating-point values | |
| sqrtss | SQRTSS | compute square root of scalar single-precision floating-point value | |
| subps | SUBPS | subtract packed single-precision floating-point values | |
| subss | SUBSS | subtract scalar single-precision floating-point values | |
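The packed forms apply one operation to all four lanes at once. The following sketch computes four Euclidean lengths with mulps, addps, and sqrtps, assuming an x86-64 target and GCC/Clang-style inline assembly that accepts the table's source-first mnemonics; `hypot4` is a hypothetical name.

```c
/* Hypothetical helper (assumes x86-64 and GCC/Clang inline asm):
   out[i] = sqrt(a[i]*a[i] + b[i]*b[i]) for i = 0..3. */
static void hypot4(const float *a, const float *b, float *out)
{
    __asm__ volatile(
        "movups (%0), %%xmm0\n\t"     /* xmm0 = a[0..3] */
        "movups (%1), %%xmm1\n\t"     /* xmm1 = b[0..3] */
        "mulps  %%xmm0, %%xmm0\n\t"   /* xmm0 = a*a, lane by lane */
        "mulps  %%xmm1, %%xmm1\n\t"   /* xmm1 = b*b */
        "addps  %%xmm1, %%xmm0\n\t"   /* xmm0 = a*a + b*b */
        "sqrtps %%xmm0, %%xmm0\n\t"   /* xmm0 = square root of each lane */
        "movups %%xmm0, (%2)"         /* out[0..3] = xmm0 */
        :
        : "r"(a), "r"(b), "r"(out)
        : "xmm0", "xmm1", "memory");
}
```

In AT&T operand order the destination is the last operand, so `addps %xmm1, %xmm0` leaves the sum in %xmm0.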

3.25.1.3 Comparison Instructions (SSE)

The SSE compare instructions compare packed and scalar single-precision floating-point operands.

Table 50 Comparison Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cmpps | CMPPS | compare packed single-precision floating-point values | |
| cmpss | CMPSS | compare scalar single-precision floating-point values | |
| comiss | COMISS | perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register | |
| ucomiss | UCOMISS | perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register | |
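cmpps does not set EFLAGS; it writes an all-ones or all-zeros mask into each lane, selected by an immediate predicate (1 means "less than"). The sketch below turns that mask into a 4-bit integer with movmskps. It assumes x86-64 and GCC/Clang-style inline assembly with the table's source-first operand order; `lt_mask4` is a hypothetical name.

```c
/* Hypothetical helper (assumes x86-64 and GCC/Clang inline asm):
   bit i of the result is set when a[i] < b[i]. The "less than"
   predicate is false when either lane holds a NaN. */
static int lt_mask4(const float *a, const float *b)
{
    int mask;
    __asm__("movups   (%1), %%xmm0\n\t"        /* xmm0 = a[0..3] */
            "movups   (%2), %%xmm1\n\t"        /* xmm1 = b[0..3] */
            "cmpps    $1, %%xmm1, %%xmm0\n\t"  /* xmm0[i] = (a[i] < b[i]) ? ~0 : 0 */
            "movmskps %%xmm0, %0"              /* collect the four lane masks */
            : "=r"(mask)
            : "r"(a), "r"(b)
            : "xmm0", "xmm1", "memory");
    return mask;
}
```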

3.25.1.4 Logical Instructions (SSE)

The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations on packed single-precision floating-point operands.

Table 51 Logical Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| andnps | ANDNPS | perform bitwise logical AND NOT of packed single-precision floating-point values | |
| andps | ANDPS | perform bitwise logical AND of packed single-precision floating-point values | |
| orps | ORPS | perform bitwise logical OR of packed single-precision floating-point values | |
| xorps | XORPS | perform bitwise logical XOR of packed single-precision floating-point values | |
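Because these instructions work on raw bits, they can manipulate IEEE sign bits directly. andnps computes the complement of the destination ANDed with the source, so loading a sign-bit mask into the destination clears every sign bit. A minimal sketch, assuming x86-64 and GCC/Clang-style inline assembly with source-first operands; `abs4` is a hypothetical name.

```c
/* Hypothetical helper (assumes x86-64 and GCC/Clang inline asm):
   out[i] = |v[i]| computed by clearing each lane's sign bit. */
static void abs4(const float *v, float *out)
{
    /* -0.0f has only the sign bit set (0x80000000) */
    static const float sign_bits[4] = {-0.0f, -0.0f, -0.0f, -0.0f};
    __asm__ volatile(
        "movups (%0), %%xmm0\n\t"    /* xmm0 = sign-bit mask */
        "movups (%1), %%xmm1\n\t"    /* xmm1 = v[0..3] */
        "andnps %%xmm1, %%xmm0\n\t"  /* xmm0 = ~mask & v: sign bits cleared */
        "movups %%xmm0, (%2)"        /* out[0..3] = xmm0 */
        :
        : "r"(sign_bits), "r"(v), "r"(out)
        : "xmm0", "xmm1", "memory");
}
```

Replacing andnps with xorps (keeping the mask in one register and the data in the other) would flip the sign bits instead, negating each value.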

3.25.1.5 Shuffle and Unpack Instructions (SSE)

The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in packed single-precision floating-point operands.

Table 52 Shuffle and Unpack Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| shufps | SHUFPS | shuffle values in packed single-precision floating-point operands | |
| unpckhps | UNPCKHPS | unpack and interleave the two high-order values from two packed single-precision floating-point operands | |
| unpcklps | UNPCKLPS | unpack and interleave the two low-order values from two packed single-precision floating-point operands | |
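For shufps, each 2-bit field of the immediate selects a source lane: the two low fields pick from the destination, the two high fields from the source. With both operands the same register, the immediate 0x1b (fields 3, 2, 1, 0) reverses the four values. The sketch also shows unpcklps interleaving the low halves of two vectors. It assumes x86-64 and GCC/Clang-style inline assembly with source-first operands; `reverse4` and `interleave_lo` are hypothetical names.

```c
/* Hypothetical helpers (assume x86-64 and GCC/Clang inline asm). */

/* out = {v[3], v[2], v[1], v[0]} */
static void reverse4(const float *v, float *out)
{
    __asm__ volatile(
        "movups (%0), %%xmm0\n\t"
        "shufps $0x1b, %%xmm0, %%xmm0\n\t"  /* select lanes 3, 2, 1, 0 */
        "movups %%xmm0, (%1)"
        :
        : "r"(v), "r"(out)
        : "xmm0", "memory");
}

/* out = {a[0], b[0], a[1], b[1]} */
static void interleave_lo(const float *a, const float *b, float *out)
{
    __asm__ volatile(
        "movups   (%0), %%xmm0\n\t"       /* xmm0 = a */
        "movups   (%1), %%xmm1\n\t"       /* xmm1 = b */
        "unpcklps %%xmm1, %%xmm0\n\t"     /* interleave the low two lanes */
        "movups   %%xmm0, (%2)"
        :
        : "r"(a), "r"(b), "r"(out)
        : "xmm0", "xmm1", "memory");
}
```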

3.25.1.6 Conversion Instructions (SSE)

The SSE conversion instructions convert packed and individual doubleword integers into packed and scalar single-precision floating-point values, and vice versa.

Table 53 Conversion Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| cvtpi2ps | CVTPI2PS | convert packed doubleword integers to packed single-precision floating-point values | |
| cvtps2pi | CVTPS2PI | convert packed single-precision floating-point values to packed doubleword integers | |
| cvtsi2ss | CVTSI2SS | convert doubleword integer to scalar single-precision floating-point value | |
| cvtss2si | CVTSS2SI | convert scalar single-precision floating-point value to a doubleword integer | |
| cvttps2pi | CVTTPS2PI | convert with truncation packed single-precision floating-point values to packed doubleword integers | |
| cvttss2si | CVTTSS2SI | convert with truncation scalar single-precision floating-point value to a doubleword integer | |
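The non-truncating forms round according to the rounding-control field of MXCSR, while the "t" forms always round toward zero. The sketch below contrasts cvtss2si and cvttss2si, assuming x86-64, GCC/Clang-style inline assembly, and the default MXCSR rounding mode (round to nearest, ties to even); the function names are hypothetical.

```c
/* Hypothetical helpers (assume x86-64, GCC/Clang inline asm, and the
   default MXCSR rounding mode). The "x" constraint places x in an XMM register. */
static int to_int_rounded(float x)
{
    int r;
    __asm__("cvtss2si %1, %0" : "=r"(r) : "x"(x));   /* rounds per MXCSR.RC */
    return r;
}

static int to_int_truncated(float x)
{
    int r;
    __asm__("cvttss2si %1, %0" : "=r"(r) : "x"(x));  /* always rounds toward zero */
    return r;
}
```

Under the default mode, `to_int_rounded(2.5f)` yields 2 and `to_int_rounded(3.5f)` yields 4 (ties go to the even integer), while `to_int_truncated(-2.7f)` yields -2.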

3.25.2 MXCSR State Management Instructions (SSE)

The MXCSR state management instructions save and restore the state of the MXCSR control and status register.

Table 54 MXCSR State Management Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| ldmxcsr | LDMXCSR | load %mxcsr register | |
| stmxcsr | STMXCSR | save %mxcsr register state | |
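stmxcsr and ldmxcsr take only a 32-bit memory operand, so changing one MXCSR field is a read-modify-write through memory. The sketch below temporarily sets the rounding-control field (bits 13 and 14; 11b selects round toward zero), performs a cvtss2si, and restores the saved state. It assumes x86-64 and GCC/Clang-style inline assembly; all function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers (assume x86-64 and GCC/Clang inline asm). */
static uint32_t read_mxcsr(void)
{
    uint32_t csr;
    __asm__ volatile("stmxcsr %0" : "=m"(csr));   /* store %mxcsr to memory */
    return csr;
}

static void write_mxcsr(uint32_t csr)
{
    __asm__ volatile("ldmxcsr %0" : : "m"(csr));  /* load %mxcsr from memory */
}

/* Convert x with MXCSR.RC temporarily set to round toward zero. */
static int convert_toward_zero(float x)
{
    uint32_t saved = read_mxcsr();
    write_mxcsr(saved | 0x6000u);                 /* RC = 11b (bits 13 and 14) */
    int r;
    __asm__ volatile("cvtss2si %1, %0" : "=r"(r) : "x"(x));
    write_mxcsr(saved);                           /* restore caller's state */
    return r;
}
```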

3.25.3 64-Bit SIMD Integer Instructions (SSE)

The SSE 64-bit SIMD integer instructions perform operations on packed bytes, words, or doublewords in MMX registers.

Table 55 64-Bit SIMD Integer Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| pavgb | PAVGB | compute average of packed unsigned byte integers | |
| pavgw | PAVGW | compute average of packed unsigned word integers | |
| pextrw | PEXTRW | extract word | |
| pinsrw | PINSRW | insert word | |
| pmaxsw | PMAXSW | maximum of packed signed word integers | |
| pmaxub | PMAXUB | maximum of packed unsigned byte integers | |
| pminsw | PMINSW | minimum of packed signed word integers | |
| pminub | PMINUB | minimum of packed unsigned byte integers | |
| pmovmskb | PMOVMSKB | move byte mask | |
| pmulhuw | PMULHUW | multiply packed unsigned word integers and store high result | |
| psadbw | PSADBW | compute sum of absolute differences | |
| pshufw | PSHUFW | shuffle packed words in MMX register | |
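psadbw reduces eight unsigned byte differences to a single sum in the low word of the destination, which makes it useful for block matching. Because these forms use MMX registers, the sketch ends with emms (an MMX instruction not listed in this table) so that later x87 floating-point code sees an empty register stack. It assumes x86-64 and GCC/Clang-style inline assembly; `sad8` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper (assumes x86-64 and GCC/Clang inline asm):
   returns sum of |a[i] - b[i]| over eight unsigned bytes. */
static uint64_t sad8(const uint8_t *a, const uint8_t *b)
{
    uint64_t result;
    __asm__("movq   (%1), %%mm0\n\t"  /* mm0 = a[0..7] */
            "psadbw (%2), %%mm0\n\t"  /* low word of mm0 = sum of |a[i] - b[i]|, rest zeroed */
            "movq   %%mm0, %0\n\t"    /* copy the 64-bit result to memory */
            "emms"                    /* release MMX state for x87 code */
            : "=m"(result)
            : "r"(a), "r"(b)
            : "mm0", "memory");
    return result;
}
```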

3.25.4 Miscellaneous Instructions (SSE)

The following instructions control caching, prefetching, and instruction ordering.

Table 56 Miscellaneous Instructions (SSE)

| Oracle Solaris Mnemonic | Intel/AMD Mnemonic | Description | Notes |
|---|---|---|---|
| maskmovq | MASKMOVQ | non-temporal store of selected bytes from an MMX register into memory | |
| movntps | MOVNTPS | non-temporal store of four packed single-precision floating-point values from an XMM register into memory | |
| movntq | MOVNTQ | non-temporal store of quadword from an MMX register into memory | |
| prefetchnta | PREFETCHNTA | prefetch data into non-temporal cache structure and into a location close to the processor | |
| prefetcht0 | PREFETCHT0 | prefetch data into all levels of the cache hierarchy | |
| prefetcht1 | PREFETCHT1 | prefetch data into level 2 cache and higher | |
| prefetcht2 | PREFETCHT2 | prefetch data into level 3 cache and higher | |
| sfence | SFENCE | serialize store operations | |
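Non-temporal stores such as movntps are weakly ordered, so a copy loop that uses them typically ends with sfence before other code relies on the stored data. The sketch below combines prefetchnta, movntps, and sfence. It assumes x86-64 and GCC/Clang-style inline assembly; `stream_copy` is a hypothetical name, and the 256-byte prefetch distance is an arbitrary illustrative choice (prefetch hints do not fault, so reading ahead past the end of `src` is harmless).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (assumes x86-64 and GCC/Clang inline asm):
   copy n floats from src to dst with non-temporal stores.
   movntps faults on a misaligned destination, so dst must be 16-byte
   aligned and n a multiple of 4; otherwise -1 is returned. */
static int stream_copy(float *dst, const float *src, size_t n)
{
    if (((uintptr_t)dst & 15u) != 0 || (n & 3u) != 0)
        return -1;
    for (size_t i = 0; i < n; i += 4) {
        __asm__ volatile(
            "prefetchnta 256(%1)\n\t"  /* hint: fetch source data ahead */
            "movups  (%1), %%xmm0\n\t" /* load four floats, any alignment */
            "movntps %%xmm0, (%0)"     /* store without polluting the caches */
            :
            : "r"(dst + i), "r"(src + i)
            : "xmm0", "memory");
    }
    __asm__ volatile("sfence" ::: "memory");  /* order the streaming stores */
    return 0;
}
```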