E.6 UltraSPARC and VIS Instruction Set Extensions
This section describes extensions that require SPARC-V9. The extensions support enhanced graphics functionality
and improved memory access efficiency.
Note - SPARC-V9 instruction set extensions used in executables may not be portable to other
SPARC-V9 systems.
E.6.1 Graphics Data Formats
The overhead of converting to and from floating-point arithmetic is high, so the
graphics instructions are optimized for short-integer arithmetic. Image components are 8 or 16
bits. Intermediate results are 16 or 32 bits.
E.6.2 Eight-bit Format
A 32-bit word contains pixels of four unsigned 8-bit integers. The integers represent
image intensity values (α, G, B, R). Support is provided for band interleaved
images (which store the color components of a point together) and band sequential images
(which store all values of one color component together).
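As an illustration of the two layouts, the following Python sketch packs four 8-bit components into a 32-bit word and regroups band interleaved data into band sequential form. The function names, and the choice to place the first component in the most significant byte, are assumptions made for this sketch and are not part of the instruction set definition.

```python
def pack_pixel(c0, c1, c2, c3):
    """Pack four unsigned 8-bit components into one 32-bit word.

    Assumption: the first component occupies the most significant byte.
    """
    for c in (c0, c1, c2, c3):
        if not 0 <= c <= 0xFF:
            raise ValueError("component out of 8-bit range")
    return (c0 << 24) | (c1 << 16) | (c2 << 8) | c3


def unpack_pixel(word):
    """Split a 32-bit word back into its four 8-bit components."""
    return tuple((word >> shift) & 0xFF for shift in (24, 16, 8, 0))


def interleaved_to_sequential(pixels):
    """Regroup band interleaved pixels into one list per band."""
    return [list(band) for band in zip(*pixels)]
```

With two interleaved pixels, `interleaved_to_sequential` returns four lists, one per component, each holding that component's value from every pixel.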
E.6.3 Fixed Data Formats
A 64-bit word contains four 16-bit signed fixed-point values. This is the fixed
16-bit data format.
A 64-bit word contains two 32-bit signed fixed-point values. This is the fixed
32-bit data format.
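The two fixed layouts can be modeled with a short Python sketch that packs signed values into a 64-bit word and unpacks them again. The helper names are hypothetical, and placing the first value in the most significant bits is an assumption of this sketch.

```python
import struct


def pack_fixed16(values):
    """Pack four signed 16-bit values into one 64-bit word."""
    return int.from_bytes(struct.pack(">4h", *values), "big")


def unpack_fixed16(word):
    """Recover the four signed 16-bit values from a 64-bit word."""
    return struct.unpack(">4h", word.to_bytes(8, "big"))


def pack_fixed32(values):
    """Pack two signed 32-bit values into one 64-bit word."""
    return int.from_bytes(struct.pack(">2i", *values), "big")


def unpack_fixed32(word):
    """Recover the two signed 32-bit values from a 64-bit word."""
    return struct.unpack(">2i", word.to_bytes(8, "big"))
```

Negative values are stored in two's complement form, so a lane holding -1 appears as 0xFFFF (16-bit) or 0xFFFFFFFF (32-bit) in the packed word.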
An intermediate format of fixed data values provides enough precision and dynamic range
for filtering and simple image computations on pixel values. Pixel multiplication
converts pixel data to fixed data. Pack instructions convert fixed data to pixel data
by clipping and truncating to an 8-bit unsigned value. The FPACKFIX instruction supports
conversion from 32-bit fixed to 16-bit fixed. Rounding is done by adding one to the
rounding bit position.
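The clip, truncate, and round steps can be sketched in Python as follows. This is a simplified model for one value, not the exact hardware behavior: the `frac_bits` parameter (the number of fractional bits discarded) is a hypothetical stand-in for whatever scaling the application uses, and both function names are invented for this sketch.

```python
def pack_to_pixel(fixed_value, frac_bits):
    """Convert one signed fixed-point value to an unsigned 8-bit pixel.

    frac_bits is a hypothetical parameter: the number of low-order
    fractional bits dropped by truncation.
    """
    truncated = fixed_value >> frac_bits      # truncate: drop the fraction
    return max(0, min(255, truncated))        # clip to the range 0..255


def round_fixed(fixed_value, frac_bits):
    """Round by adding one at the rounding bit position, then truncating.

    The rounding bit is taken to be the highest bit that will be dropped.
    """
    if frac_bits == 0:
        return fixed_value
    return (fixed_value + (1 << (frac_bits - 1))) >> frac_bits
```

Values that exceed the pixel range clip to 255, and negative values clip to 0.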
You should use floating-point data to perform complex calculations needing more precision or
dynamic range.
E.6.4 SHUTDOWN Instruction
All outstanding transactions are completed before the SHUTDOWN instruction completes.
Table E-13
SPARC | Mnemonic | Argument List | Description |
SHUTDOWN | shutdown | | shutdown to enter power down mode |
E.6.5 Graphics Status Register (GSR)
Use the RDASR and WRASR instructions with ASR 0x13 to access the Graphics Status
Register.
Table E-14
SPARC | Mnemonic | Argument List | Description |
RDASR | rd | %gsr, regrd | |
WRASR | wr | regrs1, reg_or_imm, %gsr | |
E.6.6 Graphics Instructions
Unless otherwise specified, floating-point registers contain all instruction operands. There are 32 double-precision
registers. Single-precision floating-point registers contain the pixel values, and double-precision floating-point registers contain the
fixed values.
The graphics instruction set is mapped to the opcode space reserved for the
Implementation-Dependent Instruction 1 (IMPDEP1) instructions.
Partitioned add/subtract instructions perform two 32-bit or four 16-bit partitioned adds or subtracts
between the corresponding fixed-point values of the source operands.
Table E-15
SPARC | Mnemonic | Argument List | Description |
FPADD16 | fpadd16 | fregrs1, fregrs2, fregrd | four 16-bit add |
FPADD16S | fpadd16s | fregrs1, fregrs2, fregrd | two 16-bit add |
FPADD32 | fpadd32 | fregrs1, fregrs2, fregrd | two 32-bit add |
FPADD32S | fpadd32s | fregrs1, fregrs2, fregrd | one 32-bit add |
FPSUB16 | fpsub16 | fregrs1, fregrs2, fregrd | four 16-bit subtract |
FPSUB16S | fpsub16s | fregrs1, fregrs2, fregrd | two 16-bit subtract |
FPSUB32 | fpsub32 | fregrs1, fregrs2, fregrd | two 32-bit subtract |
FPSUB32S | fpsub32s | fregrs1, fregrs2, fregrd | one 32-bit subtract |
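To make the idea of a partitioned operation concrete, the following Python sketch models a four-way 16-bit add and a two-way 32-bit subtract on 64-bit operands. It assumes each lane wraps around on overflow without carrying into its neighbor. The helper `partitioned` is a hypothetical name, and the two wrappers only borrow the mnemonics for readability.

```python
def partitioned(a, b, width, op):
    """Apply op independently to each width-bit lane of two 64-bit words.

    Assumption: each lane result wraps modulo 2**width, and no carry or
    borrow crosses a lane boundary.
    """
    lane_mask = (1 << width) - 1
    result = 0
    for shift in range(0, 64, width):
        x = (a >> shift) & lane_mask
        y = (b >> shift) & lane_mask
        result |= (op(x, y) & lane_mask) << shift
    return result


def fpadd16(a, b):
    """Model of four 16-bit partitioned adds."""
    return partitioned(a, b, 16, lambda x, y: x + y)


def fpsub32(a, b):
    """Model of two 32-bit partitioned subtracts."""
    return partitioned(a, b, 32, lambda x, y: x - y)
```

In this model, adding 0x0001 to a lane holding 0xFFFF yields 0x0000 in that lane and leaves the adjacent lanes unaffected.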
Pack instructions convert to a lower-precision fixed or pixel format.
Table E-16
SPARC | Mnemonic | Argument List | Description |
FPACK16 | fpack16 | fregrs2, fregrd | four 16-bit packs |
FPACK32 | fpack32 | fregrs1, fregrs2, fregrd | two 32-bit packs |
FPACKFIX | fpackfix | fregrs2, fregrd | four 16-bit packs |
FEXPAND | fexpand | fregrs2, fregrd | four 16-bit expands |
FPMERGE | fpmerge | fregrs1, fregrs2, fregrd | two 32-bit merges |
Partitioned multiply instructions have the following variations.
Table E-17
SPARC | Mnemonic | Argument List | Description |
FMUL8x16 | fmul8x16 | fregrs1, fregrs2, fregrd | 8x16-bit partition |
FMUL8x16AU | fmul8x16au | fregrs1, fregrs2, fregrd | 8x16-bit upper partition |
FMUL8x16AL | fmul8x16al | fregrs1, fregrs2, fregrd | 8x16-bit lower partition |
FMUL8SUx16 | fmul8sux16 | fregrs1, fregrs2, fregrd | upper 8x16-bit partition |
FMUL8ULx16 | fmul8ulx16 | fregrs1, fregrs2, fregrd | lower unsigned 8x16-bit partition |
FMULD8SUx16 | fmuld8sux16 | fregrs1, fregrs2, fregrd | upper 8x16-bit partition |
FMULD8ULx16 | fmuld8ulx16 | fregrs1, fregrs2, fregrd | lower unsigned 8x16-bit partition |
Alignment instructions have the following variations.
Table E-18
SPARC | Mnemonic | Argument List | Description |
ALIGNADDRESS | alignaddr | regrs1, regrs2, regrd | find misaligned data access address |
ALIGNADDRESS_LITTLE | alignaddrl | regrs1, regrs2, regrd | same as above, but little-endian |
FALIGNDATA | faligndata | fregrs1, fregrs2, fregrd | do misaligned data, data alignment |
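The way an address alignment step and a data alignment step cooperate can be sketched in Python. The model assumes that the address step produces an 8-byte-aligned address plus a byte offset (held in the Graphics Status Register on real hardware), and that the data step extracts eight bytes at that offset from two adjacent 8-byte values. The function names reuse the mnemonics for readability, and returning the offset as a second value is a simplification made for this sketch.

```python
MASK64 = (1 << 64) - 1


def alignaddr(rs1, rs2):
    """Return (aligned_address, byte_offset) for the address rs1 + rs2.

    Assumption: the low three address bits form the byte offset and the
    returned address has those bits cleared.
    """
    addr = (rs1 + rs2) & MASK64
    return addr & ~0x7, addr & 0x7


def faligndata(rs1, rs2, offset):
    """Extract 8 bytes starting at byte `offset` of the 16-byte value rs1:rs2.

    rs1 supplies the lower-addressed 8 bytes (big-endian view).
    """
    combined = (rs1 << 64) | rs2
    return (combined >> ((8 - offset) * 8)) & MASK64
```

A loop over misaligned data would typically load consecutive aligned 8-byte values and apply the data step to each adjacent pair.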
Logical operate instructions perform one of sixteen 64-bit logical operations between rs1 and
rs2 (in the standard 64-bit version).
Table E-19
SPARC | Mnemonic | Argument List | Description |
FZERO | fzero | fregrd | zero fill |
FZEROS | fzeros | fregrd | zero fill, single precision |
FONE | fone | fregrd | one fill |
FONES | fones | fregrd | one fill, single precision |
FSRC1 | fsrc1 | fregrs1, fregrd | copy src1 |
FSRC1S | fsrc1s | fregrs1, fregrd | copy src1, single precision |
FSRC2 | fsrc2 | fregrs2, fregrd | copy src2 |
FSRC2S | fsrc2s | fregrs2, fregrd | copy src2, single precision |
FNOT1 | fnot1 | fregrs1, fregrd | negate src1, 1's complement |
FNOT1S | fnot1s | fregrs1, fregrd | same as above, single precision |
FNOT2 | fnot2 | fregrs2, fregrd | negate src2, 1's complement |
FNOT2S | fnot2s | fregrs2, fregrd | same as above, single precision |
FOR | for | fregrs1, fregrs2, fregrd | logical OR |
FORS | fors | fregrs1, fregrs2, fregrd | logical OR, single precision |
FNOR | fnor | fregrs1, fregrs2, fregrd | logical NOR |
FNORS | fnors | fregrs1, fregrs2, fregrd | logical NOR, single precision |
FAND | fand | fregrs1, fregrs2, fregrd | logical AND |
FANDS | fands | fregrs1, fregrs2, fregrd | logical AND, single precision |
FNAND | fnand | fregrs1, fregrs2, fregrd | logical NAND |
FNANDS | fnands | fregrs1, fregrs2, fregrd | logical NAND, single precision |
FXOR | fxor | fregrs1, fregrs2, fregrd | logical XOR |
FXORS | fxors | fregrs1, fregrs2, fregrd | logical XOR, single precision |
FXNOR | fxnor | fregrs1, fregrs2, fregrd | logical XNOR |
FXNORS | fxnors | fregrs1, fregrs2, fregrd | logical XNOR, single precision |
FORNOT1 | fornot1 | fregrs1, fregrs2, fregrd | negated src1 OR src2 |
FORNOT1S | fornot1s | fregrs1, fregrs2, fregrd | same as above, single precision |
FORNOT2 | fornot2 | fregrs1, fregrs2, fregrd | src1 OR negated src2 |
FORNOT2S | fornot2s | fregrs1, fregrs2, fregrd | same as above, single precision |
FANDNOT1 | fandnot1 | fregrs1, fregrs2, fregrd | negated src1 AND src2 |
FANDNOT1S | fandnot1s | fregrs1, fregrs2, fregrd | same as above, single precision |
FANDNOT2 | fandnot2 | fregrs1, fregrs2, fregrd | src1 AND negated src2 |
FANDNOT2S | fandnot2s | fregrs1, fregrs2, fregrd | same as above, single precision |
Pixel compare instructions compare the fixed-point values in rs1 and rs2 (two 32-bit
or four 16-bit values).
Table E-20
SPARC | Mnemonic | Argument List | Description |
FCMPGT16 | fcmpgt16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1>src2 |
FCMPGT32 | fcmpgt32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1>src2 |
FCMPLE16 | fcmple16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1≤src2 |
FCMPLE32 | fcmple32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1≤src2 |
FCMPNE16 | fcmpne16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1≠src2 |
FCMPNE32 | fcmpne32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1≠src2 |
FCMPEQ16 | fcmpeq16 | fregrs1, fregrs2, regrd | 4 16-bit compare, set rd if src1=src2 |
FCMPEQ32 | fcmpeq32 | fregrs1, fregrs2, regrd | 2 32-bit compare, set rd if src1=src2 |
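A pixel compare can be modeled in Python as a lane-by-lane signed comparison that produces one result bit per lane in an integer register. The bit assignment used here (bit 0 for the least significant lane) is an assumption of this sketch, as are the helper names `to_signed` and `pixel_compare`.

```python
import operator


def to_signed(value, width):
    """Interpret a width-bit field as a two's complement signed integer."""
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def pixel_compare(a, b, width, predicate):
    """Compare each width-bit lane of two 64-bit words.

    Assumption: result bit n reflects lane n, counting from the least
    significant lane.
    """
    lane_mask = (1 << width) - 1
    result = 0
    for lane in range(64 // width):
        x = to_signed((a >> (lane * width)) & lane_mask, width)
        y = to_signed((b >> (lane * width)) & lane_mask, width)
        if predicate(x, y):
            result |= 1 << lane
    return result


def fcmpgt16(a, b):
    return pixel_compare(a, b, 16, operator.gt)


def fcmple16(a, b):
    return pixel_compare(a, b, 16, operator.le)


def fcmpeq32(a, b):
    return pixel_compare(a, b, 32, operator.eq)
```

The resulting bit mask is the kind of value that a partial store can use to decide which lanes to write.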
Edge handling instructions handle the boundary conditions for parallel pixel scan line loops.
Table E-21
SPARC | Mnemonic | Argument List | Description |
EDGE8 | edge8 | regrs1, regrs2, regrd | 8 8-bit edge boundary processing |
EDGE8L | edge8l | regrs1, regrs2, regrd | same as above, little-endian |
EDGE16 | edge16 | regrs1, regrs2, regrd | 4 16-bit edge boundary processing |
EDGE16L | edge16l | regrs1, regrs2, regrd | same as above, little-endian |
EDGE32 | edge32 | regrs1, regrs2, regrd | 2 32-bit edge boundary processing |
EDGE32L | edge32l | regrs1, regrs2, regrd | same as above, little-endian |
Pixel component distance instructions are used for motion estimation in video compression algorithms.
Table E-22
SPARC | Mnemonic | Argument List | Description |
PDIST | pdist | fregrs1, fregrs2, fregrd | 8 8-bit components, distance between |
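For motion estimation, the distance between two groups of eight 8-bit components is commonly taken as the sum of absolute differences. The Python sketch below models that computation. It assumes that the result accumulates into the destination value, which is passed here as an optional `rd` argument, and it reuses the mnemonic as a function name only for readability.

```python
def pdist(rs1, rs2, rd=0):
    """Sum of absolute differences of eight unsigned 8-bit components.

    Assumption: the sum is added to the incoming destination value rd,
    so repeated calls accumulate a total over a block of pixels.
    """
    total = rd
    for lane in range(8):
        x = (rs1 >> (8 * lane)) & 0xFF
        y = (rs2 >> (8 * lane)) & 0xFF
        total += abs(x - y)
    return total
```

A block-matching loop would call this once per 8-pixel group and keep the candidate block with the smallest accumulated total.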
The three-dimensional array addressing instructions convert three-dimensional fixed-point addresses (in rs1) to
a blocked-byte address. The result is stored in rd.
Table E-23
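SPARC | Mnemonic | Argument List | Description |
ARRAY8 | array8 | regrs1, regrs2, regrd | convert 8-bit 3-D address to blocked byte address |
ARRAY16 | array16 | regrs1, regrs2, regrd | same as above, but 16-bit |
ARRAY32 | array32 | regrs1, regrs2, regrd | same as above, but 32-bit |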
E.6.7 Memory Access Instructions
These memory access instructions are part of the SPARC-V9 instruction set extensions.
Table E-24
ASI | Argument List | Description |
ASI_PST8_P, ASI_PST8_S, ASI_PST8_PL, ASI_PST8_SL | stda fregrd, [reg_addr] regmask, imm_asi | eight 8-bit conditional stores to: primary address space; secondary address space; primary address space, little endian; secondary address space, little endian |
ASI_PST16_P, ASI_PST16_S, ASI_PST16_PL, ASI_PST16_SL | | four 16-bit conditional stores to: primary address space; secondary address space; primary address space, little endian; secondary address space, little endian |
ASI_PST32_P, ASI_PST32_S, ASI_PST32_PL, ASI_PST32_SL | | two 32-bit conditional stores to: primary address space; secondary address space; primary address space, little endian; secondary address space, little endian |
Note - To select a partial store instruction, use one of the partial store ASIs
with the STDA instruction.
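The effect of an 8-bit conditional store can be sketched in Python as a masked write of eight bytes. The mapping of mask bits to bytes (most significant mask bit controls the lowest-addressed byte) is an assumption of this sketch, and `partial_store8` is a hypothetical name.

```python
def partial_store8(memory, addr, data, mask):
    """Store only the bytes of `data` whose mask bit is set.

    memory: a bytearray standing in for the address space
    data:   exactly 8 bytes to store
    mask:   8-bit mask; assumption: bit 7 controls the byte at addr,
            bit 0 controls the byte at addr + 7
    """
    if len(data) != 8:
        raise ValueError("data must be exactly 8 bytes")
    for i in range(8):
        if mask & (0x80 >> i):
            memory[addr + i] = data[i]
```

Bytes whose mask bit is clear keep their previous contents, which is what makes the store usable at the edges of a scan line.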
Table E-25
ASI | Argument List | Description |
ASI_FL8_P | ldda [reg_addr] imm_asi, fregrd; stda fregrd, [reg_addr] imm_asi | 8-bit load/store from/to: primary address space |
ASI_FL8_S | ldda [reg_plus_imm] %asi, fregrd; stda fregrd, [reg_plus_imm] %asi | secondary address space |
ASI_FL8_PL | | primary address space, little endian |
ASI_FL8_SL | | secondary address space, little endian |
ASI_FL16_P | | 16-bit load/store from/to: primary address space |
ASI_FL16_S | | secondary address space |
ASI_FL16_PL | | primary address space, little endian |
ASI_FL16_SL | | secondary address space, little endian |
Note - To select a short floating-point load and store instruction, use one of the
short ASIs with the LDDA and STDA instructions.
Table E-26
ASI | Argument List | Description |
ASI_NUCLEUS_QUAD_LDD, ASI_NUCLEUS_QUAD_LDD_L | ldda [reg_addr] imm_asi, regrd; ldda [reg_plus_imm] %asi, regrd | 128-bit atomic load; 128-bit atomic load, little endian |
ASI_BLK_AIUP | ldda [reg_addr] imm_asi, fregrd; stda fregrd, [reg_addr] imm_asi | 64-byte block load/store from/to: primary address space, user privilege |
ASI_BLK_AIUS | ldda [reg_plus_imm] %asi, fregrd; stda fregrd, [reg_plus_imm] %asi | secondary address space, user privilege |
ASI_BLK_AIUPL | | primary address space, user privilege, little endian |
ASI_BLK_AIUSL | | secondary address space, user privilege, little endian |
ASI_BLK_P | | primary address space |
ASI_BLK_S | | secondary address space |
ASI_BLK_PL | | primary address space, little endian |
ASI_BLK_SL | | secondary address space, little endian |
ASI_BLK_COMMIT_P | | 64-byte block commit store to primary address space |
ASI_BLK_COMMIT_S | | 64-byte block commit store to secondary address space |
Note - To select a block load and store instruction, use one of the block
transfer ASIs with the LDDA and STDA instructions.