E.6 UltraSPARC and VIS Instruction Set Extensions
This section describes extensions that require SPARC-V9. The extensions support enhanced graphics functionality
and improved memory access efficiency.
Note - SPARC-V9 instruction set extensions used in executables may not be portable to other
SPARC-V9 systems.
E.6.1 Graphics Data Formats
The overhead of converting to and from floating-point arithmetic is high, so the
graphics instructions are optimized for short-integer arithmetic. Image components are 8 or 16
bits. Intermediate results are 16 or 32 bits.
E.6.2 Eight-bit Format
A 32-bit word holds one pixel made up of four unsigned 8-bit integers. The integers represent
image intensity values (α, G, B, R). Support is provided for band-interleaved images (which store
all color components of a point together) and band-sequential images (which store all values of one
color component together).
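The two image layouts can be illustrated with a minimal Python sketch. This is only a model of the data layout, not of any instruction. The function names are hypothetical, and placing the first component in the most significant byte of the 32-bit word is an assumption made for the example.

```python
def pack_pixel(components):
    """Pack four unsigned 8-bit components into one 32-bit word.

    Assumption: the first component goes in the most significant byte.
    """
    if len(components) != 4 or any(not 0 <= c <= 255 for c in components):
        raise ValueError("a pixel needs four unsigned 8-bit components")
    word = 0
    for c in components:
        word = (word << 8) | c
    return word


def interleaved_to_sequential(pixels):
    """Convert band-interleaved pixels (one 4-tuple per image point)
    into band-sequential form (one list per color component)."""
    for p in pixels:
        if len(p) != 4:
            raise ValueError("each pixel needs four components")
    return [list(band) for band in zip(*pixels)]
```

For example, `interleaved_to_sequential([(1, 2, 3, 4), (5, 6, 7, 8)])` groups the first components of both points into one list, the second components into another, and so on.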
E.6.3 Fixed Data Formats
A 64-bit word contains four 16-bit signed fixed-point values. This is the fixed
16-bit data format.
A 64-bit word contains two 32-bit signed fixed-point values. This is the fixed
32-bit data format.
An intermediate format of fixed data values provides enough precision and dynamic range
for filtering and simple image computations on pixel values. Pixel multiplication
converts pixel data to fixed data. Pack instructions convert fixed data back to
pixel data by clipping and truncating to an 8-bit unsigned value. The FPACKFIX
instruction supports conversion from 32-bit fixed to 16-bit fixed. Rounding is done
by adding one to the rounding bit position. Use floating-point data for complex
calculations that need more precision or dynamic range.
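The following Python sketch models the two fixed data formats and the rounding rule described above. It is an illustration under stated assumptions, not a description of the hardware: the helper names are hypothetical, the lanes are assumed to be stored in big-endian order within the 64-bit word, and the "rounding bit" is assumed to be the bit just below the bits that are kept.

```python
import struct


def pack_fixed16(values):
    """Pack four signed 16-bit values into a 64-bit word (big-endian lane order assumed)."""
    return int.from_bytes(struct.pack(">4h", *values), "big")


def unpack_fixed16(word):
    """Split a 64-bit word into four signed 16-bit values."""
    return list(struct.unpack(">4h", word.to_bytes(8, "big")))


def pack_fixed32(values):
    """Pack two signed 32-bit values into a 64-bit word (big-endian lane order assumed)."""
    return int.from_bytes(struct.pack(">2i", *values), "big")


def unpack_fixed32(word):
    """Split a 64-bit word into two signed 32-bit values."""
    return list(struct.unpack(">2i", word.to_bytes(8, "big")))


def round_fixed(value, frac_bits):
    """Drop frac_bits low-order bits, rounding by adding one at the
    rounding bit position (assumed to be bit frac_bits - 1) first."""
    if frac_bits <= 0:
        return value
    return (value + (1 << (frac_bits - 1))) >> frac_bits
```

With 8 fraction bits, `round_fixed(0x180, 8)` (1.5) rounds up to 2, while `round_fixed(0x17F, 8)` (just under 1.5) rounds down to 1.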
E.6.4 SHUTDOWN Instruction
All outstanding transactions are completed before the SHUTDOWN instruction completes.
Table E-13 SHUTDOWN Instruction
Opcode | Mnemonic | Argument List | Description
SHUTDOWN | shutdown | | shutdown to enter power-down mode
E.6.5 Graphics Status Register (GSR)
Use the RDASR and WRASR instructions with ASR 0x13 to access the Graphics Status
Register.
Table E-14 Graphics Status Register (GSR)
Opcode | Mnemonic | Argument List | Description
RDASR | rd | %gsr, regrd | read GSR
WRASR | wr | regrs1, reg_or_imm, %gsr | write GSR
E.6.6 Graphics Instructions
Unless otherwise specified, all instruction operands are in floating-point registers. There are 32 double-precision
registers. Pixel values are held in single-precision floating-point registers, and fixed values are held in
double-precision floating-point registers.
The graphics instruction set is mapped into the opcode space reserved for the
Implementation-Dependent Instruction 1 (IMPDEP1) instructions.
Partitioned add/subtract instructions perform two 32-bit or four 16-bit partitioned adds or subtracts
between the corresponding fixed-point values of the source operands.
Table E-15 Graphics Instructions
Opcode | Mnemonic | Argument List | Description
FPADD16 | fpadd16 | fregrs1, fregrs2, fregrd | four 16-bit add
FPADD16S | fpadd16s | fregrs1, fregrs2, fregrd | two 16-bit add
FPADD32 | fpadd32 | fregrs1, fregrs2, fregrd | two 32-bit add
FPADD32S | fpadd32s | fregrs1, fregrs2, fregrd | one 32-bit add
FPSUB16 | fpsub16 | fregrs1, fregrs2, fregrd | four 16-bit subtract
FPSUB16S | fpsub16s | fregrs1, fregrs2, fregrd | two 16-bit subtract
FPSUB32 | fpsub32 | fregrs1, fregrs2, fregrd | two 32-bit subtract
FPSUB32S | fpsub32s | fregrs1, fregrs2, fregrd | one 32-bit subtract
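As an illustration of what "partitioned" means for the add and subtract instructions, the Python sketch below models the four 16-bit case on 64-bit integers. It assumes that each 16-bit lane wraps around modulo 2^16 and that no carry or borrow crosses a lane boundary; this section does not state the overflow behavior, so treat that as an assumption. The function names mirror the mnemonics only for readability.

```python
MASK16 = 0xFFFF


def _partitioned16(rs1, rs2, op):
    """Apply op to each of the four 16-bit lanes of two 64-bit words."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        a = (rs1 >> shift) & MASK16
        b = (rs2 >> shift) & MASK16
        # Masking keeps the result inside the lane, so carries never spill over.
        result |= (op(a, b) & MASK16) << shift
    return result


def fpadd16(rs1, rs2):
    """Model of a four 16-bit partitioned add (wraparound assumed)."""
    return _partitioned16(rs1, rs2, lambda a, b: a + b)


def fpsub16(rs1, rs2):
    """Model of a four 16-bit partitioned subtract (wraparound assumed)."""
    return _partitioned16(rs1, rs2, lambda a, b: a - b)
```

Note how the second lane in `fpadd16(0x0001FFFF00020003, 0x0001000100020003)` overflows to zero without disturbing the lane above it.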
Pack instructions convert to a lower-precision fixed or pixel format.
Table E-16 Pack Instructions
Opcode | Mnemonic | Argument List | Description
FPACK16 | fpack16 | fregrs2, fregrd | four 16-bit packs
FPACK32 | fpack32 | fregrs1, fregrs2, fregrd | two 32-bit packs
FPACKFIX | fpackfix | fregrs2, fregrd | two 32-bit fixed to 16-bit fixed packs
FEXPAND | fexpand | fregrs2, fregrd | four 16-bit expands
FPMERGE | fpmerge | fregrs1, fregrs2, fregrd | two 32-bit merges
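The clip-and-truncate step that pack instructions perform when converting fixed data to pixel data can be sketched as follows. This is a simplified model, not the instruction definition: the scale shift (`scale`) and the position of the binary point (`frac_bits`, defaulting to 7) are assumptions introduced for the example and are not specified in this section.

```python
def pack16_lane(value, scale=0, frac_bits=7):
    """Convert one signed 16-bit fixed value to an unsigned 8-bit pixel.

    Assumptions: the value is first shifted left by `scale` bits, the
    binary point then sits above bit `frac_bits - 1`, and the fraction
    is discarded before clipping to the range 0..255.
    """
    if not -32768 <= value <= 32767:
        raise ValueError("value must fit in a signed 16-bit lane")
    integer = (value << scale) >> frac_bits  # truncate the fraction
    return max(0, min(255, integer))         # clip to unsigned 8 bits


def fpack16(values, scale=0, frac_bits=7):
    """Model of packing four 16-bit fixed values into four 8-bit pixels."""
    if len(values) != 4:
        raise ValueError("expected four 16-bit fixed values")
    return [pack16_lane(v, scale, frac_bits) for v in values]
```

Negative inputs clip to 0 and large inputs clip to 255, so `fpack16([256, -1, 32767, 0], scale=1)` yields `[4, 0, 255, 0]` under these assumptions.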
Partitioned multiply instructions have the following variations.
Table E-17 Partitioned Multiply Instructions
Opcode | Mnemonic | Argument List | Description
FMUL8x16 | fmul8x16 | fregrs1, fregrs2, fregrd | 8x16-bit partition
FMUL8x16AU | fmul8x16au | fregrs1, fregrs2, fregrd | 8x16-bit upper partition
FMUL8x16AL | fmul8x16al | fregrs1, fregrs2, fregrd | 8x16-bit lower partition
FMUL8SUx16 | fmul8sux16 | fregrs1, fregrs2, fregrd | upper 8x16-bit partition
FMUL8ULx16 | fmul8ulx16 | fregrs1, fregrs2, fregrd | lower unsigned 8x16-bit partition
FMULD8SUx16 | fmuld8sux16 | fregrs1, fregrs2, fregrd | upper 8x16-bit partition
FMULD8ULx16 | fmuld8ulx16 | fregrs1, fregrs2, fregrd | lower unsigned 8x16-bit partition
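A rough Python model of a single 8x16-bit partitioned multiply lane is shown below. It assumes that an unsigned 8-bit pixel is multiplied by a signed 16-bit fixed value and that the product is rounded and reduced to 16 bits by dropping its low 8 bits; these details are assumptions for illustration and are not spelled out in this section.

```python
def fmul8x16_lane(pixel, coeff):
    """Multiply an unsigned 8-bit pixel by a signed 16-bit fixed value.

    Assumption: the product is rounded by adding one at bit 7, then the
    low 8 bits are dropped to leave a 16-bit fixed result.
    """
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an unsigned 8-bit value")
    if not -32768 <= coeff <= 32767:
        raise ValueError("coeff must be a signed 16-bit value")
    product = pixel * coeff
    return (product + 0x80) >> 8


def fmul8x16(pixels, coeffs):
    """Model of the four-lane form: one product per pixel/coefficient pair."""
    if len(pixels) != 4 or len(coeffs) != 4:
        raise ValueError("expected four pixels and four coefficients")
    return [fmul8x16_lane(p, c) for p, c in zip(pixels, coeffs)]
```

With a coefficient of 128 (one half when 8 fraction bits are dropped), a pixel value of 100 becomes 50 and a pixel value of 3 rounds up to 2.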
Alignment instructions have the following variations.
Table E-18 Alignment Instructions
Opcode | Mnemonic | Argument List | Description
ALIGNADDRESS | alignaddr | regrs1, regrs2, regrd | find misaligned data access address
ALIGNADDRESS_LITTLE | alignaddrl | regrs1, regrs2, regrd | same as above, but little-endian
FALIGNDATA | faligndata | fregrs1, fregrs2, fregrd | do misaligned data, data alignment
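The way the alignment instructions cooperate on a misaligned 8-byte access can be sketched in Python. The model assumes that the address instruction returns the sum of its operands rounded down to an 8-byte boundary and records the low three bits as a byte offset, and that the data instruction extracts 8 bytes at that offset from two adjacent aligned doublewords. Returning the offset as a second value (instead of storing it in a status register) is a simplification made for the example.

```python
def alignaddr(rs1, rs2):
    """Return (aligned_address, byte_offset) for the address rs1 + rs2."""
    addr = (rs1 + rs2) & 0xFFFFFFFFFFFFFFFF
    return addr & ~0x7, addr & 0x7


def faligndata(first, second, offset):
    """Extract 8 bytes starting at `offset` from two adjacent doublewords."""
    if len(first) != 8 or len(second) != 8:
        raise ValueError("both operands must be 8 bytes long")
    if not 0 <= offset <= 7:
        raise ValueError("offset must be in the range 0..7")
    combined = bytes(first) + bytes(second)
    return combined[offset:offset + 8]


def misaligned_load(memory, addr):
    """Load 8 bytes from any address using two aligned 8-byte reads."""
    base, offset = alignaddr(addr, 0)
    first = memory[base:base + 8]
    second = memory[base + 8:base + 16]
    return faligndata(first, second, offset)
```

For a 16-byte buffer holding the values 0 through 15, `misaligned_load(buffer, 3)` returns bytes 3 through 10.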
Logical operate instructions perform one of sixteen 64-bit logical operations between rs1 and
rs2 (in the standard 64-bit version).
Table E-19 Logical Operate Instructions
Opcode | Mnemonic | Argument List | Description
FZERO | fzero | fregrd | zero fill
FZEROS | fzeros | fregrd | zero fill, single precision
FONE | fone | fregrd | one fill
FONES | fones | fregrd | one fill, single precision
FSRC1 | fsrc1 | fregrs1, fregrd | copy src1
FSRC1S | fsrc1s | fregrs1, fregrd | copy src1, single precision
FSRC2 | fsrc2 | fregrs2, fregrd | copy src2
FSRC2S | fsrc2s | fregrs2, fregrd | copy src2, single precision
FNOT1 | fnot1 | fregrs1, fregrd | negate src1, 1's complement
FNOT1S | fnot1s | fregrs1, fregrd | same as above, single precision
FNOT2 | fnot2 | fregrs2, fregrd | negate src2, 1's complement
FNOT2S | fnot2s | fregrs2, fregrd | same as above, single precision
FOR | for | fregrs1, fregrs2, fregrd | logical OR
FORS | fors | fregrs1, fregrs2, fregrd | logical OR, single precision
FNOR | fnor | fregrs1, fregrs2, fregrd | logical NOR
FNORS | fnors | fregrs1, fregrs2, fregrd | logical NOR, single precision
FAND | fand | fregrs1, fregrs2, fregrd | logical AND
FANDS | fands | fregrs1, fregrs2, fregrd | logical AND, single precision
FNAND | fnand | fregrs1, fregrs2, fregrd | logical NAND
FNANDS | fnands | fregrs1, fregrs2, fregrd | logical NAND, single precision
FXOR | fxor | fregrs1, fregrs2, fregrd | logical XOR
FXORS | fxors | fregrs1, fregrs2, fregrd | logical XOR, single precision
FXNOR | fxnor | fregrs1, fregrs2, fregrd | logical XNOR
FXNORS | fxnors | fregrs1, fregrs2, fregrd | logical XNOR, single precision
FORNOT1 | fornot1 | fregrs1, fregrs2, fregrd | negated src1 OR src2
FORNOT1S | fornot1s | fregrs1, fregrs2, fregrd | same as above, single precision
FORNOT2 | fornot2 | fregrs1, fregrs2, fregrd | src1 OR negated src2
FORNOT2S | fornot2s | fregrs1, fregrs2, fregrd | same as above, single precision
FANDNOT1 | fandnot1 | fregrs1, fregrs2, fregrd | negated src1 AND src2
FANDNOT1S | fandnot1s | fregrs1, fregrs2, fregrd | same as above, single precision
FANDNOT2 | fandnot2 | fregrs1, fregrs2, fregrd | src1 AND negated src2
FANDNOT2S | fandnot2s | fregrs1, fregrs2, fregrd | same as above, single precision
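The less familiar combinations among the logical operations can be written out on 64-bit integers as a quick check of the descriptions above. This Python sketch models only the 64-bit versions; the function names follow the mnemonics for readability and the masking to 64 bits is needed only because Python integers are unbounded.

```python
MASK64 = (1 << 64) - 1


def fnot1(rs1):
    """1's complement of src1."""
    return ~rs1 & MASK64


def fnor(rs1, rs2):
    """Logical NOR."""
    return ~(rs1 | rs2) & MASK64


def fxnor(rs1, rs2):
    """Logical XNOR."""
    return ~(rs1 ^ rs2) & MASK64


def fornot1(rs1, rs2):
    """Negated src1 OR src2."""
    return (~rs1 | rs2) & MASK64


def fornot2(rs1, rs2):
    """src1 OR negated src2."""
    return (rs1 | ~rs2) & MASK64


def fandnot1(rs1, rs2):
    """Negated src1 AND src2."""
    return (~rs1 & rs2) & MASK64


def fandnot2(rs1, rs2):
    """src1 AND negated src2."""
    return (rs1 & ~rs2) & MASK64
```

For example, `fandnot1(0xFF00, 0x0FF0)` keeps only the bits of the second operand that are clear in the first, giving `0x00F0`.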
Pixel compare instructions compare fixed-point values in rs1 and rs2 (two 32-bit
or four 16-bit values).
Table E-20 Pixel Compare Instructions
Opcode | Mnemonic | Argument List | Description
FCMPGT16 | fcmpgt16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1 > src2
FCMPGT32 | fcmpgt32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1 > src2
FCMPLE16 | fcmple16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1 ≤ src2
FCMPLE32 | fcmple32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1 ≤ src2
FCMPNE16 | fcmpne16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1 ≠ src2
FCMPNE32 | fcmpne32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1 ≠ src2
FCMPEQ16 | fcmpeq16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1 = src2
FCMPEQ32 | fcmpeq32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1 = src2
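A pixel compare produces one result bit per lane in an integer register. The Python sketch below models the four 16-bit greater-than case. It assumes signed comparison of each lane and that bit 0 of the result corresponds to the least significant lane; the bit ordering is an assumption, since this section does not define it.

```python
def _signed16(x):
    """Interpret a 16-bit pattern as a signed value."""
    return x - 0x10000 if x & 0x8000 else x


def fcmpgt16(rs1, rs2):
    """Compare four signed 16-bit lanes; set one mask bit per lane where src1 > src2.

    Assumption: mask bit 0 belongs to the least significant lane.
    """
    mask = 0
    for lane in range(4):
        shift = 16 * lane
        a = _signed16((rs1 >> shift) & 0xFFFF)
        b = _signed16((rs2 >> shift) & 0xFFFF)
        if a > b:
            mask |= 1 << lane
    return mask
```

A mask produced this way could then drive a partial store so that only the lanes that passed the comparison are written.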
Edge handling instructions handle the boundary conditions for parallel pixel scan line loops.
Table E-21 Edge Handling Instructions
Opcode | Mnemonic | Argument List | Description
EDGE8 | edge8 | regrs1, regrs2, regrd | 8 8-bit edge boundary processing
EDGE8L | edge8l | regrs1, regrs2, regrd | same as above, little-endian
EDGE16 | edge16 | regrs1, regrs2, regrd | 4 16-bit edge boundary processing
EDGE16L | edge16l | regrs1, regrs2, regrd | same as above, little-endian
EDGE32 | edge32 | regrs1, regrs2, regrd | 2 32-bit edge boundary processing
EDGE32L | edge32l | regrs1, regrs2, regrd | same as above, little-endian
Pixel component distance instructions are used for motion estimation in video compression algorithms.
Table E-22 Pixel Component Distance Instructions
Opcode | Mnemonic | Argument List | Description
PDIST | pdist | fregrs1, fregrs2, fregrd | distance between 8 8-bit components
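For motion estimation, the distance between two groups of eight 8-bit components is commonly taken as a sum of absolute differences. The Python sketch below models that calculation. Treating the distance as a sum of absolute differences and allowing a running total to be passed in (`acc`) are assumptions made for this example, not statements from this section.

```python
def pdist(rs1, rs2, acc=0):
    """Sum of absolute differences between eight unsigned 8-bit components.

    `acc` is an optional running total (an assumption of this sketch) so
    that distances over several 64-bit words can be accumulated.
    """
    total = acc
    for lane in range(8):
        shift = 8 * lane
        a = (rs1 >> shift) & 0xFF
        b = (rs2 >> shift) & 0xFF
        total += abs(a - b)
    return total
```

A block-matching loop would call this once per 8-pixel group and keep the candidate block with the smallest total.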
The three-dimensional array addressing instructions convert three-dimensional fixed-point addresses (in rs1) to
a blocked-byte address. The result is stored in rd.
Table E-23 Three-Dimensional Array Addressing Instructions
Opcode | Mnemonic | Argument List | Description
ARRAY8 | array8 | regrs1, regrs2, regrd | convert 8-bit 3-D address to blocked byte address
ARRAY16 | array16 | regrs1, regrs2, regrd | same as above, but 16-bit
ARRAY32 | array32 | regrs1, regrs2, regrd | same as above, but 32-bit
E.6.7 Memory Access Instructions
These memory access instructions are part of the SPARC-V9 instruction set extensions.
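Table E-24 Memory Access Instructions
imm_asi | Instruction Syntax | Description
ASI_PST8_P | stda fregrd, [regaddr] regmask, imm_asi | eight 8-bit conditional stores to primary address space
ASI_PST8_S | stda fregrd, [regaddr] regmask, imm_asi | eight 8-bit conditional stores to secondary address space
ASI_PST8_PL | stda fregrd, [regaddr] regmask, imm_asi | eight 8-bit conditional stores to primary address space, little endian
ASI_PST8_SL | stda fregrd, [regaddr] regmask, imm_asi | eight 8-bit conditional stores to secondary address space, little endian
ASI_PST16_P | stda fregrd, [regaddr] regmask, imm_asi | four 16-bit conditional stores to primary address space
ASI_PST16_S | stda fregrd, [regaddr] regmask, imm_asi | four 16-bit conditional stores to secondary address space
ASI_PST16_PL | stda fregrd, [regaddr] regmask, imm_asi | four 16-bit conditional stores to primary address space, little endian
ASI_PST16_SL | stda fregrd, [regaddr] regmask, imm_asi | four 16-bit conditional stores to secondary address space, little endian
ASI_PST32_P | stda fregrd, [regaddr] regmask, imm_asi | two 32-bit conditional stores to primary address space
ASI_PST32_S | stda fregrd, [regaddr] regmask, imm_asi | two 32-bit conditional stores to secondary address space
ASI_PST32_PL | stda fregrd, [regaddr] regmask, imm_asi | two 32-bit conditional stores to primary address space, little endian
ASI_PST32_SL | stda fregrd, [regaddr] regmask, imm_asi | two 32-bit conditional stores to secondary address space, little endian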
Note - To select a partial store instruction, use one of the partial store ASIs
with the STDA instruction.
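The effect of an 8-bit conditional (partial) store can be modeled in Python as a masked write of eight bytes. The sketch assumes that the most significant mask bit controls the byte at the lowest address; that bit ordering, the function name, and the use of a `bytearray` as memory are all assumptions made for illustration.

```python
def partial_store8(memory, addr, data, mask):
    """Write only the bytes of `data` whose mask bit is set.

    memory: bytearray standing in for memory
    data:   the 8 bytes of the source register
    mask:   8-bit mask; bit 7 is assumed to select data[0]
    """
    if len(data) != 8:
        raise ValueError("data must be 8 bytes long")
    if not 0 <= mask <= 0xFF:
        raise ValueError("mask must be an 8-bit value")
    for i in range(8):
        if mask & (0x80 >> i):
            memory[addr + i] = data[i]
```

With a mask of `0b10100001`, only the first, third, and last bytes of the doubleword are written and the other five memory bytes keep their previous contents.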
Table E-25 Partial Store Instructions
imm_asi | Instruction Syntax | Description
ASI_FL8_P | ldda [reg_addr] imm_asi, fregrd; stda fregrd, [reg_addr] imm_asi | 8-bit load/store from/to primary address space
ASI_FL8_S | ldda [reg_plus_imm] %asi, fregrd; stda fregrd, [reg_plus_imm] %asi | 8-bit load/store from/to secondary address space
ASI_FL8_PL | | 8-bit load/store from/to primary address space, little endian
ASI_FL8_SL | | 8-bit load/store from/to secondary address space, little endian
ASI_FL16_P | | 16-bit load/store from/to primary address space
ASI_FL16_S | | 16-bit load/store from/to secondary address space
ASI_FL16_PL | | 16-bit load/store from/to primary address space, little endian
ASI_FL16_SL | | 16-bit load/store from/to secondary address space, little endian
Note - To select a short floating-point load and store instruction, use one of the
short ASIs with the LDDA and STDA instructions.
Table E-26 Load and Store Instructions
imm_asi | Instruction Syntax | Description
ASI_NUCLEUS_QUAD_LDD | ldda [reg_addr] imm_asi, regrd; ldda [reg_plus_imm] %asi, regrd | 128-bit atomic load
ASI_NUCLEUS_QUAD_LDD_L | ldda [reg_addr] imm_asi, regrd; ldda [reg_plus_imm] %asi, regrd | 128-bit atomic load, little endian
ASI_BLK_AIUP | ldda [reg_addr] imm_asi, fregrd; stda fregrd, [reg_addr] imm_asi | 64-byte block load/store from/to primary address space, user privilege
ASI_BLK_AIUS | ldda [reg_plus_imm] %asi, fregrd; stda fregrd, [reg_plus_imm] %asi | 64-byte block load/store from/to secondary address space, user privilege
ASI_BLK_AIUPL | | 64-byte block load/store from/to primary address space, user privilege, little endian
ASI_BLK_AIUSL | | 64-byte block load/store from/to secondary address space, user privilege, little endian
ASI_BLK_P | | 64-byte block load/store from/to primary address space
ASI_BLK_S | | 64-byte block load/store from/to secondary address space
ASI_BLK_PL | | 64-byte block load/store from/to primary address space, little endian
ASI_BLK_SL | | 64-byte block load/store from/to secondary address space, little endian
ASI_BLK_COMMIT_P | | 64-byte block commit store to primary address space
ASI_BLK_COMMIT_S | | 64-byte block commit store to secondary address space
Note - To select a block load and store instruction, use one of the
block transfer ASIs with the LDDA and STDA instructions.
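A 64-byte block transfer can be pictured as moving eight consecutive 8-byte values between memory and a group of eight double-precision registers. The Python sketch below models that picture only. The requirement that the address be 64-byte aligned, the representation of registers as a list of eight `bytes` objects, and the function names are assumptions made for this example.

```python
BLOCK_SIZE = 64
REG_SIZE = 8


def block_load(memory, addr):
    """Read one 64-byte block and return it as eight 8-byte register values."""
    if addr % BLOCK_SIZE != 0:
        raise ValueError("block address is assumed to be 64-byte aligned")
    block = bytes(memory[addr:addr + BLOCK_SIZE])
    if len(block) != BLOCK_SIZE:
        raise ValueError("block extends past the end of memory")
    return [block[i:i + REG_SIZE] for i in range(0, BLOCK_SIZE, REG_SIZE)]


def block_store(memory, addr, regs):
    """Write eight 8-byte register values to one 64-byte block."""
    if addr % BLOCK_SIZE != 0:
        raise ValueError("block address is assumed to be 64-byte aligned")
    if len(regs) != 8 or any(len(r) != REG_SIZE for r in regs):
        raise ValueError("expected eight 8-byte register values")
    if addr + BLOCK_SIZE > len(memory):
        raise ValueError("block extends past the end of memory")
    memory[addr:addr + BLOCK_SIZE] = b"".join(regs)
```

Copying a block is then a load followed by a store: `block_store(memory, 0, block_load(memory, 64))`.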