SSE Instructions
SSE instructions are an extension of the SIMD execution model introduced with the
MMX technology. SSE instructions are divided into four subgroups:

SIMD single-precision floating-point instructions that operate on the XMM registers

MXCSR state management instructions

64–bit SIMD integer instructions that operate on the MMX registers

Instructions that provide cache control, prefetch, and instruction ordering functionality
SIMD Single-Precision Floating-Point Instructions (SSE)
The SSE SIMD instructions operate on packed and scalar single-precision floating-point values located
in the XMM registers or memory.
Data Transfer Instructions (SSE)
The SSE data transfer instructions move packed and scalar single-precision floating-point operands between
XMM registers and between XMM registers and memory.
Table 3-27 Data Transfer Instructions (SSE)

Mnemonic    Description
MOVAPS      move four aligned packed single-precision floating-point values between XMM registers or memory
MOVHLPS     move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register
MOVHPS      move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory
MOVLHPS     move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register
MOVLPS      move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory
MOVMSKPS    extract sign mask from four packed single-precision floating-point values
MOVSS       move scalar single-precision floating-point value between XMM registers or memory
MOVUPS      move four unaligned packed single-precision floating-point values between XMM registers or memory
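The data transfer instructions can be exercised from C through the SSE intrinsics in `<xmmintrin.h>`. The following is a minimal sketch, not a definitive mapping: the function names `copy4`, `sign_mask4`, and `high_to_low` are hypothetical, and the instruction named in each comment is the one a compiler typically selects for that intrinsic, which is not guaranteed.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copy four floats through an XMM register using unaligned
 * load and store (typically MOVUPS). */
void copy4(const float *src, float *dst)
{
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, v);
}

/* Return a 4-bit mask; bit i is set when element i has its
 * sign bit set (typically MOVMSKPS). */
int sign_mask4(const float *src)
{
    return _mm_movemask_ps(_mm_loadu_ps(src));
}

/* Build { b[2], b[3], a[2], a[3] }: the high quadword of b is
 * moved into the low quadword of a (typically MOVHLPS). */
void high_to_low(const float *a, const float *b, float *dst)
{
    __m128 r = _mm_movehl_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(dst, r);
}
```

The unaligned forms are used here so that the sketch works with any `float` array; the aligned intrinsics (`_mm_load_ps`, `_mm_store_ps`) require 16-byte aligned addresses.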


Packed Arithmetic Instructions (SSE)
SSE packed arithmetic instructions perform packed and scalar arithmetic operations on packed and scalar
single-precision floating-point operands.
Table 3-28 Packed Arithmetic Instructions (SSE)

Mnemonic    Description
ADDPS       add packed single-precision floating-point values
ADDSS       add scalar single-precision floating-point values
DIVPS       divide packed single-precision floating-point values
DIVSS       divide scalar single-precision floating-point values
MAXPS       return maximum packed single-precision floating-point values
MAXSS       return maximum scalar single-precision floating-point values
MINPS       return minimum packed single-precision floating-point values
MINSS       return minimum scalar single-precision floating-point values
MULPS       multiply packed single-precision floating-point values
MULSS       multiply scalar single-precision floating-point values
RCPPS       compute reciprocals of packed single-precision floating-point values
RCPSS       compute reciprocal of scalar single-precision floating-point values
RSQRTPS     compute reciprocals of square roots of packed single-precision floating-point values
RSQRTSS     compute reciprocal of square root of scalar single-precision floating-point values
SQRTPS      compute square roots of packed single-precision floating-point values
SQRTSS      compute square root of scalar single-precision floating-point values
SUBPS       subtract packed single-precision floating-point values
SUBSS       subtract scalar single-precision floating-point values
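The difference between the packed (PS) and scalar (SS) forms is easiest to see side by side. The sketch below assumes C with the `<xmmintrin.h>` intrinsics; the function names are hypothetical, and the instruction noted in each comment is the usual, not guaranteed, code generation.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Packed add: dst[i] = a[i] + b[i] for i = 0..3 (typically ADDPS). */
void add_packed(const float *a, const float *b, float *dst)
{
    _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Scalar add: dst[0] = a[0] + b[0]; dst[1..3] are copied from a
 * unchanged (typically ADDSS). */
void add_scalar(const float *a, const float *b, float *dst)
{
    _mm_storeu_ps(dst, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Clamp each of four values to [lo, hi] with a packed maximum
 * followed by a packed minimum (typically MAXPS, MINPS). */
void clamp4(const float *x, float lo, float hi, float *dst)
{
    __m128 v = _mm_loadu_ps(x);
    v = _mm_max_ps(v, _mm_set1_ps(lo));
    v = _mm_min_ps(v, _mm_set1_ps(hi));
    _mm_storeu_ps(dst, v);
}
```

The scalar forms operate only on the lowest element and pass the upper three elements of the first operand through, which is why `add_scalar` leaves `dst[1..3]` equal to `a[1..3]`.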


Comparison Instructions (SSE)
The SSE compare instructions compare packed and scalar single-precision floating-point operands.
Table 3-29 Comparison Instructions (SSE)

Mnemonic    Description
CMPPS       compare packed single-precision floating-point values
CMPSS       compare scalar single-precision floating-point values
COMISS      perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register
UCOMISS     perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register
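The two styles of comparison produce different kinds of results: CMPPS and CMPSS write an all-ones or all-zeros mask into each destination element, while COMISS and UCOMISS set EFLAGS. A minimal sketch in C follows, assuming the `<xmmintrin.h>` intrinsics; the function names are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Packed compare: each result element is all ones when
 * a[i] < b[i] and all zeros otherwise (typically CMPPS with the
 * less-than predicate). The sign bits of the result are then
 * collected into a 4-bit integer (typically MOVMSKPS). */
int less_than_mask(const float *a, const float *b)
{
    __m128 m = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(m);
}

/* Scalar compare that yields an integer truth value; compilers
 * typically implement it with COMISS and a flag test. */
int scalar_less(float x, float y)
{
    return _mm_comilt_ss(_mm_set_ss(x), _mm_set_ss(y));
}
```

The mask form is useful for branch-free selection with the logical instructions, whereas the flag-setting form fits ordinary conditional branches.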


Logical Instructions (SSE)
The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations
on packed single-precision floating-point operands.
Table 3-30 Logical Instructions (SSE)

Mnemonic    Description
ANDNPS      perform bitwise logical AND NOT of packed single-precision floating-point values
ANDPS       perform bitwise logical AND of packed single-precision floating-point values
ORPS        perform bitwise logical OR of packed single-precision floating-point values
XORPS       perform bitwise logical XOR of packed single-precision floating-point values
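Because these instructions work on the raw bit patterns, a common use is manipulating the IEEE 754 sign bit. The sketch below is one illustration, assuming C with `<xmmintrin.h>`; `abs4` and `negate4` are hypothetical names, and the constant `-0.0f` is used as a value whose only set bit is the sign bit (this assumes the compiler preserves negative zero, which aggressive floating-point options may not).

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Absolute value of four floats: clear the sign bit with
 * (~sign) & x (typically ANDNPS). */
void abs4(const float *x, float *dst)
{
    __m128 sign = _mm_set1_ps(-0.0f);  /* sign bit set in each element */
    _mm_storeu_ps(dst, _mm_andnot_ps(sign, _mm_loadu_ps(x)));
}

/* Negate four floats: flip the sign bit with sign ^ x
 * (typically XORPS). */
void negate4(const float *x, float *dst)
{
    __m128 sign = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(dst, _mm_xor_ps(sign, _mm_loadu_ps(x)));
}
```

Note the operand order of `_mm_andnot_ps`: the first argument is the one that is complemented.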


Shuffle and Unpack Instructions (SSE)
The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in packed
single-precision floating-point operands.
Table 3-31 Shuffle and Unpack Instructions (SSE)

Mnemonic    Description
SHUFPS      shuffle values in packed single-precision floating-point operands
UNPCKHPS    unpack and interleave the two high-order values from two single-precision floating-point operands
UNPCKLPS    unpack and interleave the two low-order values from two single-precision floating-point operands
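The element ordering of these instructions is easy to get wrong, so a small sketch may help. It assumes C with `<xmmintrin.h>`; the function names are hypothetical. With `_mm_shuffle_ps(a, b, _MM_SHUFFLE(z, y, x, w))` the result is `{ a[w], a[x], b[y], b[z] }`.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Reverse the order of four floats (typically SHUFPS). */
void reverse4(const float *x, float *dst)
{
    __m128 v = _mm_loadu_ps(x);
    /* Selects v[3], v[2], v[1], v[0] for result elements 0..3. */
    _mm_storeu_ps(dst, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Interleave the two low-order elements:
 * { a[0], b[0], a[1], b[1] } (typically UNPCKLPS). */
void interleave_low(const float *a, const float *b, float *dst)
{
    _mm_storeu_ps(dst, _mm_unpacklo_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Interleave the two high-order elements:
 * { a[2], b[2], a[3], b[3] } (typically UNPCKHPS). */
void interleave_high(const float *a, const float *b, float *dst)
{
    _mm_storeu_ps(dst, _mm_unpackhi_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```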


Conversion Instructions (SSE)
The SSE conversion instructions convert packed and individual doubleword integers into packed and
scalar single-precision floating-point values, and vice versa.
Table 3-32 Conversion Instructions (SSE)

Mnemonic    Description
CVTPI2PS    convert packed doubleword integers to packed single-precision floating-point values
CVTPS2PI    convert packed single-precision floating-point values to packed doubleword integers
CVTSI2SS    convert doubleword integer to scalar single-precision floating-point value
CVTSS2SI    convert scalar single-precision floating-point value to a doubleword integer
CVTTPS2PI   convert with truncation packed single-precision floating-point values to packed doubleword integers
CVTTSS2SI   convert with truncation scalar single-precision floating-point value to scalar doubleword integer
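The "with truncation" variants always round toward zero, while the plain variants round according to the rounding control in MXCSR (round to nearest even by default). The following sketch shows the scalar forms only, assuming C with `<xmmintrin.h>`; the function names are hypothetical, and the expected values in the usage note assume the default MXCSR rounding mode.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Float to int using the current MXCSR rounding mode
 * (typically CVTSS2SI). */
int to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Float to int, always rounding toward zero
 * (typically CVTTSS2SI). */
int to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* Int to float (typically CVTSI2SS). */
float to_float(int n)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), n));
}
```

With default rounding, `to_int_rounded(2.5f)` gives 2 and `to_int_rounded(3.5f)` gives 4 (ties go to the even integer), whereas `to_int_truncated(2.9f)` gives 2 and `to_int_truncated(-2.9f)` gives -2.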


MXCSR State Management Instructions (SSE)
The MXCSR state management instructions save and restore the state of the MXCSR
control and status register.
Table 3-33 MXCSR State Management Instructions (SSE)

Mnemonic    Description
LDMXCSR     load %mxcsr register
STMXCSR     save %mxcsr register state
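A typical save, modify, and restore sequence on MXCSR can be sketched in C with the `_mm_getcsr` and `_mm_setcsr` intrinsics from `<xmmintrin.h>`, which compilers typically implement with STMXCSR and LDMXCSR. The function names below are hypothetical; the rounding-control constants (`_MM_ROUND_MASK`, `_MM_ROUND_TOWARD_ZERO`, and so on) come from the same header.

```c
#include <xmmintrin.h>  /* SSE intrinsics and _MM_ROUND_* constants */

/* Return the current rounding-control field of MXCSR. */
unsigned int get_rounding_mode(void)
{
    return _mm_getcsr() & _MM_ROUND_MASK;      /* read MXCSR */
}

/* Set the rounding-control field, leaving all other MXCSR bits
 * untouched. Returns the previous rounding mode so the caller
 * can restore it later. */
unsigned int set_rounding_mode(unsigned int mode)
{
    unsigned int old = _mm_getcsr();           /* read MXCSR */
    _mm_setcsr((old & ~_MM_ROUND_MASK) | mode); /* write MXCSR */
    return old & _MM_ROUND_MASK;
}
```

A caller would save the value returned by `set_rounding_mode(_MM_ROUND_TOWARD_ZERO)`, perform its conversions, and then pass the saved value back to restore the original mode. Compilers may move floating-point operations across MXCSR updates when optimizing, so code that depends on a temporary rounding mode should be checked in the generated assembly.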


64–Bit SIMD Integer Instructions (SSE)
The SSE 64–bit SIMD integer instructions perform operations on packed bytes, words, or
doublewords in MMX registers.
Table 3-34 64–Bit SIMD Integer Instructions (SSE)

Mnemonic    Description
PAVGB       compute average of packed unsigned byte integers
PAVGW       compute average of packed unsigned word integers
PEXTRW      extract word
PINSRW      insert word
PMAXSW      maximum of packed signed word integers
PMAXUB      maximum of packed unsigned byte integers
PMINSW      minimum of packed signed word integers
PMINUB      minimum of packed unsigned byte integers
PMOVMSKB    move byte mask
PMULHUW     multiply packed unsigned integers and store high result
PSADBW      compute sum of absolute differences
PSHUFW      shuffle packed integer word in MMX register
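To make the arithmetic of the averaging and sum-of-absolute-differences instructions concrete, the sketch below models them in portable scalar C rather than using the MMX registers directly. It is a reference model under stated assumptions, not the instructions themselves: the function names are hypothetical, and a 64-bit MMX operand is represented as an array of eight bytes or four words. The averages round up, that is, each result is (a + b + 1) >> 1.

```c
#include <stdint.h>

/* Model of PAVGB: rounded average of eight unsigned bytes. */
void pavgb_model(const uint8_t a[8], const uint8_t b[8], uint8_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(((unsigned)a[i] + b[i] + 1u) >> 1);
}

/* Model of PAVGW: rounded average of four unsigned words. */
void pavgw_model(const uint16_t a[4], const uint16_t b[4], uint16_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)(((uint32_t)a[i] + b[i] + 1u) >> 1);
}

/* Model of PSADBW on a 64-bit operand: sum of the absolute
 * differences of eight unsigned bytes. The sum is at most
 * 8 * 255 = 2040, so it fits in the low word of the result. */
uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (a[i] > b[i]) ? (unsigned)(a[i] - b[i])
                             : (unsigned)(b[i] - a[i]);
    return (uint16_t)sum;
}
```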


Miscellaneous Instructions (SSE)
The following instructions control caching, prefetching, and instruction ordering.
Table 3-35 Miscellaneous Instructions (SSE)

Mnemonic    Description
MASKMOVQ    non-temporal store of selected bytes from an MMX register into memory
MOVNTPS     non-temporal store of four packed single-precision floating-point values from an XMM register into memory
MOVNTQ      non-temporal store of quadword from an MMX register into memory
PREFETCHNTA prefetch data into non-temporal cache structure and into a location close to the processor
PREFETCHT0  prefetch data into all levels of the cache hierarchy
PREFETCHT1  prefetch data into level 2 cache and higher
PREFETCHT2  prefetch data into level 2 cache and higher
SFENCE      serialize store operations
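A brief sketch of how the cache-control instructions are commonly combined, assuming C with `<xmmintrin.h>`. The function names are hypothetical, and the prefetch distance of 64 elements is an arbitrary illustrative choice, not a tuned value. `stream_fill` assumes that `dst` is 16-byte aligned and that `n` is a multiple of 4.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Fill dst[0..n) with value using non-temporal stores
 * (typically MOVNTPS), then fence so the stores are ordered
 * before any later stores (SFENCE).
 * Assumes dst is 16-byte aligned and n is a multiple of 4. */
void stream_fill(float *dst, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);
    _mm_sfence();
}

/* Sum an array while requesting data ahead of use
 * (typically PREFETCHT0). The prefetch is only a hint and
 * does not change the result. */
float sum_with_prefetch(const float *src, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Once per 16 floats (64 bytes), hint at data 64 elements ahead. */
        if ((i & 15) == 0 && i + 64 < n)
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_T0);
        total += src[i];
    }
    return total;
}
```

Non-temporal stores are intended for data that will not be read again soon, since they are designed to minimize cache pollution; whether they help in a given program is something to measure rather than assume.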

