3.25 SSE Instructions
SSE instructions are an extension of the SIMD execution model introduced
with the MMX technology. SSE instructions are divided into four subgroups:
- SIMD single-precision floating-point instructions that operate on the XMM registers
- MXCSR state management instructions
- 64-bit SIMD integer instructions that operate on the MMX registers
- Instructions that provide cache control, prefetch, and instruction ordering functionality
3.25.1 SIMD Single-Precision Floating-Point Instructions (SSE)
The SSE SIMD instructions operate on packed and scalar single-precision
floating-point values located in the XMM registers or memory.
3.25.1.1 Data Transfer Instructions (SSE)
The SSE data transfer instructions
move packed and scalar single-precision floating-point operands between XMM
registers and between XMM registers and memory.
Table 48 Data Transfer Instructions (SSE)
| Instruction | Description |
|---|---|
| MOVAPS | move four aligned packed single-precision floating-point values between XMM registers or memory |
| MOVHLPS | move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register |
| MOVHPS | move two packed single-precision floating-point values between memory and the high quadword of an XMM register |
| MOVLHPS | move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register |
| MOVLPS | move two packed single-precision floating-point values between memory and the low quadword of an XMM register |
| MOVMSKPS | extract sign mask from four packed single-precision floating-point values |
| MOVSS | move scalar single-precision floating-point value between XMM registers or memory |
| MOVUPS | move four unaligned packed single-precision floating-point values between XMM registers or memory |
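The difference between the aligned and unaligned moves, and what MOVMSKPS returns, can be seen in a minimal C sketch. It assumes an x86 C compiler that provides the SSE intrinsics in `<xmmintrin.h>` (an assumption: this manual documents assembler mnemonics, not intrinsics). The function names are hypothetical, and the comments name the instruction a compiler normally emits for each intrinsic, though it may choose an equivalent sequence.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: copy four floats when neither pointer is known
   to be 16-byte aligned, so the unaligned form (MOVUPS) is used. */
void copy4_unaligned(float *dst, const float *src)
{
    __m128 v = _mm_loadu_ps(src);   /* MOVUPS load  */
    _mm_storeu_ps(dst, v);          /* MOVUPS store */
}

/* Hypothetical helper: return a 4-bit mask whose bit i is the sign bit
   of src[i] (MOVMSKPS). src must be 16-byte aligned because
   _mm_load_ps corresponds to the aligned move (MOVAPS). */
int sign_mask4(const float *src)
{
    __m128 v = _mm_load_ps(src);    /* MOVAPS: requires 16-byte alignment */
    return _mm_movemask_ps(v);      /* MOVMSKPS */
}
```

For example, `sign_mask4` on `{-1, 2, -3, 4}` sets bits 0 and 2, giving 5.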
3.25.1.2 Packed Arithmetic Instructions (SSE)
SSE packed arithmetic
instructions perform packed and scalar arithmetic operations on packed and
scalar single-precision floating-point operands.
Table 49 Packed Arithmetic Instructions (SSE)
| Instruction | Description |
|---|---|
| ADDPS | add packed single-precision floating-point values |
| ADDSS | add scalar single-precision floating-point values |
| DIVPS | divide packed single-precision floating-point values |
| DIVSS | divide scalar single-precision floating-point values |
| MAXPS | return maximum packed single-precision floating-point values |
| MAXSS | return maximum scalar single-precision floating-point values |
| MINPS | return minimum packed single-precision floating-point values |
| MINSS | return minimum scalar single-precision floating-point values |
| MULPS | multiply packed single-precision floating-point values |
| MULSS | multiply scalar single-precision floating-point values |
| RCPPS | compute approximate reciprocals of packed single-precision floating-point values |
| RCPSS | compute approximate reciprocal of scalar single-precision floating-point value |
| RSQRTPS | compute approximate reciprocals of square roots of packed single-precision floating-point values |
| RSQRTSS | compute approximate reciprocal of square root of scalar single-precision floating-point value |
| SQRTPS | compute square roots of packed single-precision floating-point values |
| SQRTSS | compute square root of scalar single-precision floating-point value |
| SUBPS | subtract packed single-precision floating-point values |
| SUBSS | subtract scalar single-precision floating-point values |
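The packed (PS) and scalar (SS) forms differ in how many elements they touch, and RCPPS trades accuracy for speed. A minimal C sketch, assuming the SSE intrinsics in `<xmmintrin.h>`; the function names are hypothetical. The scalar forms compute only the lowest element and copy the upper three from the first source operand, and the RCP results carry a relative error of at most 1.5 × 2^-12, whereas DIVPS is correctly rounded.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = (a[i] + b[i]) * b[i] for all four
   lanes at once (ADDPS, then MULPS). */
void add_mul4(float *out, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(_mm_add_ps(va, vb), vb));
}

/* Hypothetical helper: SQRTSS replaces only the lowest element; the
   upper three elements are passed through from the source. */
void sqrt_low(float *out, const float *a)
{
    __m128 v = _mm_loadu_ps(a);
    _mm_storeu_ps(out, _mm_sqrt_ss(v));
}

/* Hypothetical helper: approximate reciprocals (RCPPS) next to exact
   quotients 1/a[i] (DIVPS) for comparison. */
void recip4(float *approx, float *exact, const float *a)
{
    __m128 v = _mm_loadu_ps(a);
    _mm_storeu_ps(approx, _mm_rcp_ps(v));
    _mm_storeu_ps(exact, _mm_div_ps(_mm_set1_ps(1.0f), v));
}
```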
3.25.1.3 Comparison Instructions (SSE)
The SSE compare instructions compare
packed and scalar single-precision floating-point operands.
Table 50 Comparison Instructions (SSE)
| Instruction | Description |
|---|---|
| CMPPS | compare packed single-precision floating-point values |
| CMPSS | compare scalar single-precision floating-point values |
| COMISS | perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |
| UCOMISS | perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |
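CMPPS and COMISS report their results differently: CMPPS writes an all-ones (true) or all-zeros (false) mask into each 32-bit lane, while COMISS compares only the low elements and sets EFLAGS. A minimal C sketch, assuming the SSE intrinsics in `<xmmintrin.h>`; the function names are hypothetical, and the intrinsic turns the COMISS flags into 0 or 1.

```c
#include <xmmintrin.h>

/* Hypothetical helper: bit i of the result is set when a[i] < b[i].
   CMPPS (less-than predicate) builds per-lane masks and MOVMSKPS
   gathers the top bit of each lane. */
int lt_mask4(const float *a, const float *b)
{
    __m128 m = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(m);
}

/* Hypothetical helper: COMISS on the low elements only. */
int low_less(float x, float y)
{
    return _mm_comilt_ss(_mm_set_ss(x), _mm_set_ss(y));
}
```

With `a = {1, 5, -2, 0}` and `b = {2, 5, -3, 1}`, only lanes 0 and 3 satisfy `a[i] < b[i]`, so `lt_mask4` returns 9.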
3.25.1.4 Logical Instructions (SSE)
The SSE logical instructions perform
bitwise AND, AND NOT, OR, and XOR operations on packed single-precision floating-point
operands.
Table 51 Logical Instructions (SSE)
| Instruction | Description |
|---|---|
| ANDNPS | perform bitwise logical AND NOT of packed single-precision floating-point values |
| ANDPS | perform bitwise logical AND of packed single-precision floating-point values |
| ORPS | perform bitwise logical OR of packed single-precision floating-point values |
| XORPS | perform bitwise logical XOR of packed single-precision floating-point values |
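Because these instructions operate on raw bits, they are typically used to manipulate sign bits and to combine comparison masks. A minimal C sketch, assuming the SSE intrinsics in `<xmmintrin.h>`; the function names are hypothetical. Note that ANDNPS complements its first operand: the result is `(~first) & second`.

```c
#include <xmmintrin.h>

/* Hypothetical helper: |in[i]| by clearing the sign bit. -0.0f has
   only the sign bit set, so ANDNPS(sign, v) = ~sign & v. */
void abs4(float *out, const float *in)
{
    __m128 sign = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(out, _mm_andnot_ps(sign, _mm_loadu_ps(in)));
}

/* Hypothetical helper: -in[i] by flipping the sign bit (XORPS). */
void neg4(float *out, const float *in)
{
    __m128 sign = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(out, _mm_xor_ps(_mm_loadu_ps(in), sign));
}

/* Hypothetical helper: branch-free select, out[i] = a[i] > b[i] ? a[i] : b[i].
   The CMPPS mask chooses lanes via ANDPS, ANDNPS, and ORPS. */
void select_max4(float *out, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    __m128 m  = _mm_cmpgt_ps(va, vb);            /* all-ones where a > b */
    __m128 r  = _mm_or_ps(_mm_and_ps(m, va),     /* keep a where true    */
                          _mm_andnot_ps(m, vb)); /* keep b where false   */
    _mm_storeu_ps(out, r);
}
```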
3.25.1.5 Shuffle and Unpack Instructions (SSE)
The
SSE shuffle and unpack instructions shuffle or interleave single-precision
floating-point values in packed single-precision floating-point operands.
Table 52 Shuffle and Unpack Instructions (SSE)
| Instruction | Description |
|---|---|
| SHUFPS | shuffle values in packed single-precision floating-point operands |
| UNPCKHPS | unpack and interleave the two high-order values from two packed single-precision floating-point operands |
| UNPCKLPS | unpack and interleave the two low-order values from two packed single-precision floating-point operands |
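SHUFPS takes an 8-bit immediate made of four 2-bit element indices: the two low result elements are selected from the first operand and the two high ones from the second. A minimal C sketch, assuming the SSE intrinsics and the `_MM_SHUFFLE` macro from `<xmmintrin.h>`; the function names are hypothetical. Passing the same register as both operands turns SHUFPS into a single-vector permute.

```c
#include <xmmintrin.h>

/* Hypothetical helper: reverse four floats. _MM_SHUFFLE(0, 1, 2, 3)
   selects elements 3, 2, 1, 0 for result positions 0..3. */
void reverse4(float *out, const float *in)
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Hypothetical helper: UNPCKLPS yields {a0, b0, a1, b1};
   UNPCKHPS yields {a2, b2, a3, b3}. */
void interleave4(float *lo, float *hi, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    _mm_storeu_ps(lo, _mm_unpacklo_ps(va, vb));
    _mm_storeu_ps(hi, _mm_unpackhi_ps(va, vb));
}
```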
3.25.1.6 Conversion Instructions (SSE)
The SSE conversion instructions
convert between doubleword integers and single-precision floating-point
values, in both packed and scalar form.
Table 53 Conversion Instructions (SSE)
| Instruction | Description |
|---|---|
| CVTPI2PS | convert packed doubleword integers to packed single-precision floating-point values |
| CVTPS2PI | convert packed single-precision floating-point values to packed doubleword integers |
| CVTSI2SS | convert doubleword integer to scalar single-precision floating-point value |
| CVTSS2SI | convert scalar single-precision floating-point value to a doubleword integer |
| CVTTPS2PI | convert with truncation packed single-precision floating-point values to packed doubleword integers |
| CVTTSS2SI | convert with truncation scalar single-precision floating-point value to a doubleword integer |
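The "with truncation" forms differ from the plain forms only in rounding: CVTSS2SI rounds according to the rounding-control field of MXCSR (round to nearest, ties to even, by default), while CVTTSS2SI always rounds toward zero. A minimal C sketch, assuming the SSE intrinsics in `<xmmintrin.h>` and the default MXCSR setting; the function names are hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical helper: CVTSS2SI, honours MXCSR rounding control. */
int to_int_round(float x) { return _mm_cvtss_si32(_mm_set_ss(x)); }

/* Hypothetical helper: CVTTSS2SI, always truncates toward zero. */
int to_int_trunc(float x) { return _mm_cvttss_si32(_mm_set_ss(x)); }

/* Hypothetical helper: CVTSI2SS writes only the low element; the
   upper three come from the first operand (zeros here). */
float from_int(int i)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), i));
}
```

Under the default rounding mode, `to_int_round(2.5f)` is 2 (ties go to even) while `to_int_trunc(-1.7f)` is -1. Values outside the doubleword range produce the "integer indefinite" value 0x80000000 when the invalid-operation exception is masked.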
3.25.2 MXCSR State Management Instructions (SSE)
The MXCSR state management
instructions save and restore the state of the MXCSR control and status register.
Table 54 MXCSR State Management Instructions (SSE)
| Instruction | Description |
|---|---|
| LDMXCSR | load %mxcsr register from memory |
| STMXCSR | save %mxcsr register state to memory |
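A common use of these two instructions is to save MXCSR, change a field such as rounding control, and restore the original value afterward. A minimal C sketch, assuming the `<xmmintrin.h>` intrinsics `_mm_getcsr` (STMXCSR) and `_mm_setcsr` (LDMXCSR) and the `_MM_ROUND_*` masks from the same header; the function name is hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical helper: convert x with CVTSS2SI while MXCSR rounding
   control is temporarily set to round-toward-zero, then restore the
   caller's MXCSR so other code is unaffected. */
int convert_toward_zero(float x)
{
    unsigned int saved = _mm_getcsr();                        /* STMXCSR */
    _mm_setcsr((saved & ~_MM_ROUND_MASK) | _MM_ROUND_TOWARD_ZERO); /* LDMXCSR */
    int r = _mm_cvtss_si32(_mm_set_ss(x));   /* rounds per the new setting */
    _mm_setcsr(saved);                                        /* LDMXCSR */
    return r;
}
```

Restoring the saved value, rather than writing a fixed constant, preserves any exception masks or flags the caller had set.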
3.25.3 64-Bit SIMD Integer Instructions (SSE)
The SSE 64-bit
SIMD integer instructions perform operations on packed bytes, words, or doublewords
in MMX registers.
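Table 55 64-Bit SIMD Integer Instructions (SSE)
| Instruction | Description |
|---|---|
| PAVGB | compute average of packed unsigned byte integers |
| PAVGW | compute average of packed unsigned word integers |
| PEXTRW | extract a word, selected by an immediate, from an MMX register into a general-purpose register |
| PINSRW | insert a word from a general-purpose register or memory into an MMX register at the position selected by an immediate |
| PMAXSW | maximum of packed signed word integers |
| PMAXUB | maximum of packed unsigned byte integers |
| PMINSW | minimum of packed signed word integers |
| PMINUB | minimum of packed unsigned byte integers |
| PMOVMSKB | move byte mask: build a mask from the most significant bit of each byte in an MMX register and store it in a general-purpose register |
| PMULHUW | multiply packed unsigned word integers and store the high 16 bits of each result |
| PSADBW | compute sum of absolute differences of packed unsigned byte integers |
| PSHUFW | shuffle packed word integers in an MMX register |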
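PSADBW and PAVGB are typical of these instructions' use in video and image code, for example in block matching. A minimal C sketch, assuming an x86 compiler that provides the 64-bit `__m64` intrinsics through `<xmmintrin.h>`; the function names are hypothetical. PAVGB computes the rounded average `(a + b + 1) >> 1`, and PSADBW leaves the sum in the low word of the result. The sketch calls `_mm_empty` (EMMS, an MMX instruction not covered in this section) so that later x87 floating-point code is not affected by the MMX state.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>   /* also provides __m64 and the MMX intrinsics */

/* Hypothetical helper: sum of |a[i] - b[i]| over eight bytes (PSADBW). */
int sad8(const uint8_t *a, const uint8_t *b)
{
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    int sad = _mm_cvtsi64_si32(_mm_sad_pu8(va, vb)); /* sum is in the low word */
    _mm_empty();                                     /* EMMS */
    return sad;
}

/* Hypothetical helper: out[i] = (a[i] + b[i] + 1) >> 1 (PAVGB). */
void avg8(uint8_t *out, const uint8_t *a, const uint8_t *b)
{
    __m64 va, vb, r;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    r = _mm_avg_pu8(va, vb);
    memcpy(out, &r, 8);
    _mm_empty();                                     /* EMMS */
}
```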
3.25.4 Miscellaneous Instructions (SSE)
The following instructions
control caching, prefetching, and instruction ordering.
Table 56 Miscellaneous Instructions (SSE)
| Instruction | Description |
|---|---|
| MASKMOVQ | non-temporal store of selected bytes from an MMX register into memory |
| MOVNTPS | non-temporal store of four packed single-precision floating-point values from an XMM register into memory |
| MOVNTQ | non-temporal store of quadword from an MMX register into memory |
| PREFETCHNTA | prefetch data into non-temporal cache structure and into a location close to the processor, minimizing cache pollution |
| PREFETCHT0 | prefetch data into all levels of the cache hierarchy |
| PREFETCHT1 | prefetch data into level 2 cache and higher |
| PREFETCHT2 | prefetch data into level 3 cache and higher, or an implementation-specific level |
| SFENCE | serialize store operations |
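These instructions are typically combined when writing a large buffer that will not be read again soon. The non-temporal store (MOVNTPS) writes around the cache, and because such stores are weakly ordered, SFENCE makes them globally visible before any later store. A minimal C sketch, assuming the `<xmmintrin.h>` intrinsics `_mm_prefetch`, `_mm_stream_ps`, and `_mm_sfence`; the function name is hypothetical, and the prefetch distance of 64 floats is an arbitrary illustrative value, not a tuned one.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: copy n floats (n a multiple of 4; src and dst
   16-byte aligned, as MOVAPS and MOVNTPS require) without pulling
   dst into the cache. */
void stream_copy(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        if (i + 64 < n)  /* stay inside the array when prefetching ahead */
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_NTA); /* PREFETCHNTA */
        _mm_stream_ps(dst + i, _mm_load_ps(src + i));                 /* MOVNTPS */
    }
    _mm_sfence();  /* SFENCE: order the streaming stores before later stores */
}
```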