SSE Instructions
SSE instructions are an extension of the SIMD execution model introduced with
the MMX technology. SSE instructions are divided into four subgroups:

SIMD single-precision floating-point instructions that operate on the XMM registers

MXCSR state management instructions

64–bit SIMD integer instructions that operate on the MMX registers

Instructions that provide cache control, prefetch, and instruction ordering functionality
SIMD Single-Precision Floating-Point Instructions (SSE)
The SSE SIMD instructions operate on packed and scalar single-precision floating-point values
located in the XMM registers or memory.
Data Transfer Instructions (SSE)
The SSE data transfer instructions move packed and scalar single-precision floating-point operands between
XMM registers and between XMM registers and memory.
Table 327 Data Transfer Instructions (SSE)

MOVAPS      move four aligned packed single-precision floating-point values between XMM registers or memory
MOVHLPS     move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register
MOVHPS      move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory
MOVLHPS     move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register
MOVLPS      move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory
MOVMSKPS    extract sign mask from four packed single-precision floating-point values
MOVSS       move scalar single-precision floating-point value between XMM registers or memory
MOVUPS      move four unaligned packed single-precision floating-point values between XMM registers or memory
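
As an illustration of the aligned and unaligned moves and of the sign-mask extraction, the following C sketch uses the compiler intrinsics declared in <xmmintrin.h> instead of hand-written assembly. The helper names (copy4_aligned, copy4_unaligned, sign_mask4) are hypothetical, and the instruction named in each comment is what a compiler typically emits for that intrinsic, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* Copy four floats with the aligned form (typically MOVAPS).
   Both pointers must be 16-byte aligned; a misaligned memory
   operand makes MOVAPS fault. */
static void copy4_aligned(float *dst, const float *src)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
    __m128 v = _mm_load_ps(src);     /* typically MOVAPS xmm <- memory */
    _mm_store_ps(dst, v);            /* typically MOVAPS memory <- xmm */
}

/* Copy four floats with no alignment requirement (typically MOVUPS). */
static void copy4_unaligned(float *dst, const float *src)
{
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, v);
}

/* Return the 4-bit sign mask of four floats (typically MOVMSKPS):
   bit i of the result is the sign bit of element i. */
static int sign_mask4(const float *src)
{
    return _mm_movemask_ps(_mm_loadu_ps(src));
}
```

For the input {1.0, -2.0, 3.0, -4.0}, elements 1 and 3 are negative, so sign_mask4 returns binary 1010.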


Packed Arithmetic Instructions (SSE)
SSE packed arithmetic instructions perform packed and scalar arithmetic operations on packed and
scalar single-precision floating-point operands.
Table 328 Packed Arithmetic Instructions (SSE)

ADDPS       add packed single-precision floating-point values
ADDSS       add scalar single-precision floating-point values
DIVPS       divide packed single-precision floating-point values
DIVSS       divide scalar single-precision floating-point values
MAXPS       return maximum packed single-precision floating-point values
MAXSS       return maximum scalar single-precision floating-point values
MINPS       return minimum packed single-precision floating-point values
MINSS       return minimum scalar single-precision floating-point values
MULPS       multiply packed single-precision floating-point values
MULSS       multiply scalar single-precision floating-point values
RCPPS       compute reciprocals of packed single-precision floating-point values
RCPSS       compute reciprocal of scalar single-precision floating-point values
RSQRTPS     compute reciprocals of square roots of packed single-precision floating-point values
RSQRTSS     compute reciprocal of square root of scalar single-precision floating-point values
SQRTPS      compute square roots of packed single-precision floating-point values
SQRTSS      compute square root of scalar single-precision floating-point values
SUBPS       subtract packed single-precision floating-point values
SUBSS       subtract scalar single-precision floating-point values
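
The difference between the packed (PS) and scalar (SS) forms can be sketched in C with the <xmmintrin.h> intrinsics. The function names below are hypothetical. The packed add works on all four elements at once, while the scalar add changes only element 0 and passes elements 1 to 3 of the first operand through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Packed form: add n floats four at a time (typically ADDPS).
   For brevity this sketch requires n to be a multiple of 4. */
static void add_packed(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
}

/* Scalar form (typically ADDSS): only element 0 is added.
   Elements 1..3 of the result are copied from the first operand. */
static void add_scalar_lane0(float *dst, const float *a, const float *b)
{
    __m128 r = _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(dst, r);
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, add_packed yields {11, 22, 33, 44} and add_scalar_lane0 yields {11, 2, 3, 4}.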


Comparison Instructions (SSE)
The SSE compare instructions compare packed and scalar single-precision floating-point operands.
Table 329 Comparison Instructions (SSE)

CMPPS       compare packed single-precision floating-point values
CMPSS       compare scalar single-precision floating-point values
COMISS      perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register
UCOMISS     perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register
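
The two styles of comparison behave differently: CMPPS writes an all-ones or all-zeros mask into each element of the destination, while COMISS reports its result through EFLAGS. A minimal C sketch with the <xmmintrin.h> intrinsics follows; count_below and scalar_less are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Count the elements of x[0..n) that are strictly less than limit.
   For brevity this sketch requires n to be a multiple of 4. */
static size_t count_below(const float *x, size_t n, float limit)
{
    assert(n % 4 == 0);
    const __m128 vlimit = _mm_set1_ps(limit);
    size_t count = 0;
    for (size_t i = 0; i < n; i += 4) {
        /* Typically CMPPS with the less-than predicate: each element
           becomes all ones (true) or all zeros (false). */
        __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(x + i), vlimit);
        /* Collapse the four element masks to four bits (MOVMSKPS). */
        int bits = _mm_movemask_ps(mask);
        count += (size_t)((bits & 1) + ((bits >> 1) & 1) +
                          ((bits >> 2) & 1) + ((bits >> 3) & 1));
    }
    return count;
}

/* Scalar compare through the flags (typically COMISS followed by a
   flag test); returns 1 when a < b, otherwise 0. */
static int scalar_less(float a, float b)
{
    return _mm_comilt_ss(_mm_set_ss(a), _mm_set_ss(b));
}
```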


Logical Instructions (SSE)
The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations
on packed single-precision floating-point operands.
Table 330 Logical Instructions (SSE)

ANDNPS      perform bitwise logical AND NOT of packed single-precision floating-point values
ANDPS       perform bitwise logical AND of packed single-precision floating-point values
ORPS        perform bitwise logical OR of packed single-precision floating-point values
XORPS       perform bitwise logical XOR of packed single-precision floating-point values
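
A common use of the logical instructions is sign-bit manipulation, since the sign of a single-precision value is its top bit. The C sketch below assumes the <xmmintrin.h> intrinsics and uses hypothetical function names. It clears the sign bit with AND NOT to get absolute values and flips it with XOR to negate.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* dst[i] = |src[i]|, four elements at a time. n must be a multiple of 4. */
static void abs_packed(float *dst, const float *src, size_t n)
{
    assert(n % 4 == 0);
    /* -0.0f has only the sign bit set (0x80000000) in every element. */
    const __m128 sign = _mm_set1_ps(-0.0f);
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        /* Typically ANDNPS: (NOT sign) AND v, which clears the sign bit. */
        _mm_storeu_ps(dst + i, _mm_andnot_ps(sign, v));
    }
}

/* dst[i] = -src[i], four elements at a time. n must be a multiple of 4. */
static void negate_packed(float *dst, const float *src, size_t n)
{
    assert(n % 4 == 0);
    const __m128 sign = _mm_set1_ps(-0.0f);
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        /* Typically XORPS: flips the sign bit of every element. */
        _mm_storeu_ps(dst + i, _mm_xor_ps(sign, v));
    }
}
```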


Shuffle and Unpack Instructions (SSE)
The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in
packed single-precision floating-point operands.
Table 331 Shuffle and Unpack Instructions (SSE)

SHUFPS      shuffles values in packed single-precision floating-point operands
UNPCKHPS    unpacks and interleaves the two high-order values from two single-precision floating-point operands
UNPCKLPS    unpacks and interleaves the two low-order values from two single-precision floating-point operands
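
To make the element movement concrete, here is a small C sketch using the <xmmintrin.h> intrinsics; reverse4 and interleave4 are hypothetical names. SHUFPS selects the two low result elements from its first operand and the two high result elements from its second, so passing the same register twice gives an arbitrary permutation of one vector.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Reverse four floats: {v0, v1, v2, v3} -> {v3, v2, v1, v0}. */
static void reverse4(float *dst, const float *src)
{
    assert(dst != NULL && src != NULL);
    __m128 v = _mm_loadu_ps(src);
    /* _MM_SHUFFLE lists the selectors from result element 3 down to 0,
       so (0, 1, 2, 3) means result[3]=v[0] ... result[0]=v[3].
       Typically compiles to SHUFPS with immediate 0x1B. */
    _mm_storeu_ps(dst, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Interleave two 4-float vectors into 8 floats:
   {a0, b0, a1, b1, a2, b2, a3, b3}. */
static void interleave4(float *dst, const float *a, const float *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(dst,     _mm_unpacklo_ps(va, vb)); /* UNPCKLPS: a0 b0 a1 b1 */
    _mm_storeu_ps(dst + 4, _mm_unpackhi_ps(va, vb)); /* UNPCKHPS: a2 b2 a3 b3 */
}
```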


Conversion Instructions (SSE)
The SSE conversion instructions convert between doubleword integers (packed or individual) and
single-precision floating-point values (packed or scalar), in either direction.
Table 332 Conversion Instructions (SSE)

CVTPI2PS    convert packed doubleword integers to packed single-precision floating-point values
CVTPS2PI    convert packed single-precision floating-point values to packed doubleword integers
CVTSI2SS    convert doubleword integer to scalar single-precision floating-point value
CVTSS2SI    convert scalar single-precision floating-point value to a doubleword integer
CVTTPS2PI   convert with truncation packed single-precision floating-point values to packed doubleword integers
CVTTSS2SI   convert with truncation scalar single-precision floating-point value to scalar doubleword integer
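
The practical difference between the plain and the "with truncation" conversions is the rounding rule: CVTSS2SI rounds according to the rounding-control field of MXCSR (round to nearest even by default), while CVTTSS2SI always rounds toward zero. The C sketch below assumes the <xmmintrin.h> intrinsics and an unmodified default MXCSR; the function names are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Out-of-range and NaN inputs do not convert meaningfully: with the
   invalid-operation exception masked, the instructions return the
   "integer indefinite" value 0x80000000. This sketch rejects them. */
static int fits_int32(float x)
{
    return x >= -2147483648.0f && x < 2147483648.0f;
}

/* Typically CVTSS2SI: rounds using the current MXCSR rounding mode. */
static int to_int_rounded(float x)
{
    assert(fits_int32(x));
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Typically CVTTSS2SI: always truncates toward zero. */
static int to_int_truncated(float x)
{
    assert(fits_int32(x));
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* Typically CVTSI2SS: integer to scalar single-precision. */
static float to_float(int n)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), n));
}
```

Under the default rounding mode, 2.7 converts to 3 with rounding and to 2 with truncation, and -2.7 converts to -3 and -2 respectively.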


MXCSR State Management Instructions (SSE)
The MXCSR state management instructions save and restore the state of the MXCSR
control and status register.
Table 333 MXCSR State Management Instructions (SSE)

LDMXCSR     load %mxcsr register
STMXCSR     save %mxcsr register state
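
A typical reason to save and restore MXCSR is to change the rounding mode temporarily. The following C sketch uses _mm_getcsr and _mm_setcsr from <xmmintrin.h>, which typically compile to STMXCSR and LDMXCSR; floor_to_int is a hypothetical name. Note that an optimizing compiler does not necessarily treat MXCSR as an input of the conversion, so this pattern is an illustration rather than a recommendation for production code.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Convert x to an integer, rounding toward negative infinity, and
   leave the caller's MXCSR exactly as it was found. */
static int floor_to_int(float x)
{
    unsigned int saved = _mm_getcsr();        /* STMXCSR: save state */

    /* Rewrite only the rounding-control bits of MXCSR
       (an STMXCSR / LDMXCSR pair under the hood). */
    _MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);

    /* CVTSS2SI honors the rounding mode that was just selected. */
    int result = _mm_cvtss_si32(_mm_set_ss(x));

    _mm_setcsr(saved);                        /* LDMXCSR: restore state */
    assert(_mm_getcsr() == saved);
    return result;
}
```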


64–Bit SIMD Integer Instructions (SSE)
The SSE 64–bit SIMD integer instructions perform operations on packed bytes, words, or
doublewords in MMX registers.
Table 334 64–Bit SIMD Integer Instructions (SSE)

PAVGB       compute average of packed unsigned byte integers
PAVGW       compute average of packed unsigned word integers
PEXTRW      extract word
PINSRW      insert word
PMAXSW      maximum of packed signed word integers
PMAXUB      maximum of packed unsigned byte integers
PMINSW      minimum of packed signed word integers
PMINUB      minimum of packed unsigned byte integers
PMOVMSKB    move byte mask
PMULHUW     multiply packed unsigned integers and store high result
PSADBW      compute sum of absolute differences
PSHUFW      shuffle packed integer word in MMX register
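
The one-line descriptions above hide some arithmetic details, such as the rounding used by the average. The following is a plain C reference model of three of these operations on a 64-bit operand viewed as eight bytes or as individual words. It does not execute the instructions themselves, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One PAVGB element: the sum is formed in wider precision and
   rounded up, that is (a + b + 1) >> 1. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* PAVGB model: average of eight packed unsigned bytes. */
static void pavgb_model(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = avg_u8(a[i], b[i]);
}

/* PSADBW model: sum of |a[i] - b[i]| over eight unsigned bytes.
   The instruction stores this sum in the low word of the destination
   and zeroes the rest; the maximum is 8 * 255, so 16 bits suffice. */
static uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)abs((int)a[i] - (int)b[i]);
    assert(sum <= 8u * 255u);
    return (uint16_t)sum;
}

/* One PMULHUW element: high 16 bits of the 32-bit unsigned product. */
static uint16_t pmulhuw_model(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}
```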


Miscellaneous Instructions (SSE)
The following instructions control caching, prefetching, and instruction ordering.
Table 335 Miscellaneous Instructions (SSE)

MASKMOVQ    non-temporal store of selected bytes from an MMX register into memory
MOVNTPS     non-temporal store of four packed single-precision floating-point values from an XMM register into memory
MOVNTQ      non-temporal store of quadword from an MMX register into memory
PREFETCHNTA prefetch data into non-temporal cache structure and into a location close to the processor
PREFETCHT0  prefetch data into all levels of the cache hierarchy
PREFETCHT1  prefetch data into level 2 cache and higher
PREFETCHT2  prefetch data into level 2 cache and higher
SFENCE      serialize store operations
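
These instructions are usually combined: a loop writes a large buffer with non-temporal stores so the data does not displace useful cache lines, then issues SFENCE so the stores are ordered before any later stores. The C sketch below assumes the <xmmintrin.h> intrinsics; the function names and the prefetch distance of 16 elements are illustrative assumptions, not tuned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Fill dst[0..n) with value using non-temporal stores.
   MOVNTPS requires a 16-byte aligned destination; for brevity this
   sketch also requires n to be a multiple of 4. */
static void fill_streaming(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst % 16) == 0 && n % 4 == 0);
    const __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);   /* typically MOVNTPS */
    _mm_sfence();                    /* SFENCE: order the streamed stores */
}

/* Sum an array while hinting that data 16 elements ahead will be
   needed soon. A prefetch is only a hint and never changes results. */
static float sum_with_prefetch(const float *src, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i % 16 == 0 && i + 16 < n)
            _mm_prefetch((const char *)(src + i + 16), _MM_HINT_T0); /* PREFETCHT0 */
        total += src[i];
    }
    return total;
}
```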

