This chapter describes the components and functions of the Sun Netra DPS Profiler API.
Profiler API Configuration
You can set two properties (TABLE 3-1) for a process in the software architecture. These properties are configured per process and applied to all threads of that process.
TABLE 3-1 Process Properties
| Property | Description |
| --- | --- |
| profiler_log_table_size | Sets the total number of profile records in the log. The default value is 1024. |
| profiler_user_data_size | Sets the maximum number of 64-bit words of user data that can be logged along with each profile record. The default value is 0. |
Profiler API Data Types
TABLE 3-2 describes the Profiler API data types.
TABLE 3-2 Profiler API Data Types
| Data Type | Description |
| --- | --- |
| teja_profiler_group_t | Represents a group of events. For example, events related to instructions and cache hits or misses can be in one group, while memory-related events can be in another group. Groups are target-specific and are available to the user as preprocessor defines. |
| teja_profiler_event_t | Represents what is measured in a specific group. A group and event combination identifies a unique event. Each bit in the 64-bit value represents a different event, so more than one event can be specified using an event mask. |
| teja_profiler_value_t | Type for the value of an event, that is, the actual value being measured. |
| TEJA_PROFILER_MAX_EVENTS | Maximum number of events that can be measured per group. This value is target-dependent. |
| teja_profiler_values_t | Type for the values of the events. The events array contains the values of the events in the same group. For example: typedef struct teja_profiler_values_t { uint64_t events[TEJA_PROFILER_MAX_EVENTS]; } teja_profiler_values_t; |
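The event-mask idea described for teja_profiler_event_t can be sketched in a few lines of C. This is a minimal illustration, not code from the Netra DPS headers: the typedef is a local stand-in based on the "64-bit value" wording, the two event values are copied from TABLE 3-3, and the helper names mask_of_two and mask_contains are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in (assumption): the manual describes events as bits in a
 * 64-bit value. The real typedef comes from the Netra DPS headers. */
typedef uint64_t teja_profiler_event_t;

/* Event values copied from TABLE 3-3 (DRAM group). */
#define TEJA_PROFILER_CMT_DRAM_MEM_READS  0x1
#define TEJA_PROFILER_CMT_DRAM_MEM_WRITES 0x2

/* Hypothetical helper: combine two events into one event mask. */
static teja_profiler_event_t mask_of_two(teja_profiler_event_t a,
                                         teja_profiler_event_t b)
{
    return a | b;
}

/* Hypothetical helper: return 1 if every bit of event is set in mask. */
static int mask_contains(teja_profiler_event_t mask,
                         teja_profiler_event_t event)
{
    assert(event != 0);   /* an empty event selects nothing */
    return (mask & event) == event;
}
```

Because each event occupies its own bit, a mask built with the bitwise OR operator can later be tested for any single event without ambiguity.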
Profiler API Functions
teja_profiler_start
Description
Starts collecting profile data for the specified events in the specified group. More than one event can be specified as a bit mask, but only one group can be specified per call. To profile more than one group, call the function once for each group.
Syntax
int teja_profiler_start(const teja_profiler_group_t group,
const teja_profiler_event_t event);
Parameters
group - ID of the group for which to start collecting profiler data.
event - Events of the group as a bit mask. Up to two different events can be specified at a time.
When measuring events in the CPU group on the UltraSPARC T1 processor, the user can specify only one event. The second event is always the number of executed instructions and is not specified explicitly.
When measuring events in the DRAM or JBUS group on the UltraSPARC T1 processor, or in any group of events on the UltraSPARC T2 processor, the user can specify two events to be measured at a time. In this case, the event argument in the teja_profiler_start function call has the following format:
event1 | event2
where event1 and event2 are the events to be measured.
Return Values
int - 0 for success and -1 for error.
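The rules above (one call per group, and a limit of one or two events per call depending on the group and processor) could be checked before calling teja_profiler_start, roughly as follows. This is a sketch under stated assumptions: the typedefs and the teja_profiler_start body are local stubs that stand in for the real Netra DPS declarations, and struct group_request, event_count, and start_groups are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Netra DPS declarations (assumptions for illustration). */
typedef uint64_t teja_profiler_group_t;
typedef uint64_t teja_profiler_event_t;

static int stub_start_calls = 0;   /* counts calls made to the stub below */

/* Stub with the documented signature. The real function is provided by the
 * Netra DPS runtime; this one only records that it was called. */
int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event)
{
    (void)group;
    (void)event;
    stub_start_calls++;
    return 0;
}

/* Hypothetical request record: one group, its event mask, and the number of
 * events the target allows in that group (1 or 2 according to the text). */
struct group_request {
    teja_profiler_group_t group;
    teja_profiler_event_t events;
    int max_events;
};

/* Count the event bits set in a mask. */
static int event_count(teja_profiler_event_t mask)
{
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Start profiling for each request, one call per group. Returns 0 on
 * success, or -1 at the first invalid mask or failed start call. */
static int start_groups(const struct group_request *reqs, size_t n)
{
    assert(reqs != NULL);
    for (size_t i = 0; i < n; i++) {
        int count = event_count(reqs[i].events);
        if (count < 1 || count > reqs[i].max_events)
            return -1;
        if (teja_profiler_start(reqs[i].group, reqs[i].events) != 0)
            return -1;
    }
    return 0;
}
```

A caller would fill one group_request per group, for example a limit of 1 for the UltraSPARC T1 CPU group and a limit of 2 for a DRAM group, and pass the array to start_groups. Rejecting an oversized mask up front avoids relying on how a particular target reacts to it, which the manual does not specify.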
teja_profiler_stop
Description
Stops collecting profile data for all events in the specified group. This function has an empty implementation on some targets.
Syntax
int teja_profiler_stop(const teja_profiler_group_t group);
Parameters
group - ID of the group for which to stop collecting profiler data.
Return Values
int - 0 for success and -1 for error.
teja_profiler_update
Description
Takes a snapshot of the current profiling data and saves the snapshot in the log. All the events that were specified for the group with teja_profiler_start are updated. User-defined data to be logged with the profiler log entry can be passed as variable arguments. The maximum number of arguments is set in the software architecture with the profiler_user_data_size process property (see TABLE 3-1).
Syntax
int teja_profiler_update(const teja_profiler_group_t group, ...);
Parameters
group - ID of the group for which the user wants to update profile data.
... - List of channels from which to read. The list must be NULL terminated.
Return Values
int - 0 for success and -1 for error.
teja_profiler_get_values
Description
Takes a snapshot of the current profiling data and returns it in the values parameter. All the events that were specified for the group with teja_profiler_start are returned.
Syntax
int teja_profiler_get_values(const teja_profiler_group_t group,
teja_profiler_values_t *values);
Parameters
group - ID of the group for which the user wants to get the profiler data.
values - User-allocated data structure that is filled with the profiler data.
Return Values
int - Returns overflow information, or -1 for error.
teja_profiler_get_value
Description
Retrieves the value of a given event from a teja_profiler_values_t data structure.
Syntax
teja_profiler_value_t teja_profiler_get_value(teja_profiler_values_t *values, int index);
Parameters
values - Data structure that was filled by teja_profiler_get_values.
index - Index of the event to read (a sequential number starting at 0, up to the maximum number of events that can be specified in a group).
Return Values
teja_profiler_value_t - Returns the value of the given event.
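A common way to use teja_profiler_get_values and teja_profiler_get_value together is to take one snapshot before a section of code and one after it, then subtract per event. The following sketch shows that pattern under stated assumptions: the typedefs, the value 2 for TEJA_PROFILER_MAX_EVENTS, and both function bodies are local stubs (the real ones come from the Netra DPS runtime), and event_delta is a hypothetical helper. The stub returns 0 to stand for "no overflow", which is an assumption because the manual does not define how overflow information is encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Netra DPS declarations (assumptions for illustration). */
#define TEJA_PROFILER_MAX_EVENTS 2   /* placeholder; the real value is target-dependent */
typedef uint64_t teja_profiler_group_t;
typedef uint64_t teja_profiler_value_t;
typedef struct teja_profiler_values_t {
    uint64_t events[TEJA_PROFILER_MAX_EVENTS];
} teja_profiler_values_t;

/* Fake counters that the stubbed snapshot function copies out. */
static uint64_t stub_counters[TEJA_PROFILER_MAX_EVENTS];

/* Stub with the documented signature. Returning 0 is an assumption that
 * stands for "no overflow"; the manual does not define the encoding. */
int teja_profiler_get_values(const teja_profiler_group_t group,
                             teja_profiler_values_t *values)
{
    (void)group;
    if (values == NULL)
        return -1;
    for (int i = 0; i < TEJA_PROFILER_MAX_EVENTS; i++)
        values->events[i] = stub_counters[i];
    return 0;
}

/* Stub with the documented signature: read one event by index. */
teja_profiler_value_t teja_profiler_get_value(teja_profiler_values_t *values,
                                              int index)
{
    assert(values != NULL);
    assert(index >= 0 && index < TEJA_PROFILER_MAX_EVENTS);
    return values->events[index];
}

/* Hypothetical helper: change of one event between two snapshots. */
static teja_profiler_value_t event_delta(teja_profiler_values_t *before,
                                         teja_profiler_values_t *after,
                                         int index)
{
    return teja_profiler_get_value(after, index) -
           teja_profiler_get_value(before, index);
}
```

In real code the return value of each teja_profiler_get_values call should be checked: -1 signals an error, and any other value carries overflow information that may make a simple subtraction misleading.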
teja_profiler_dump
Description
Dumps the profile data to stdout. The dump contains the profiler records collected so far for the specified thread.
Syntax
int teja_profiler_dump(teja_thread_t thread);
Parameters
thread - Thread identifier for which the profiler dump is requested.
Return Values
int - Returns 0 for success and -1 for error.
Processor-Specific Profiler Constants
UltraSPARC T1 Processor-Specific Profiler Groups
TABLE 3-3 lists the specific profiler groups for the UltraSPARC T1 processor.
TABLE 3-3 UltraSPARC T1 Processor-Specific Profiler Groups
| Group | Event or Description | Description |
| --- | --- | --- |
| TEJA_PROFILER_CMT_CPU (0x1) | Captures events related to CPU and caches. The events measured in this group are per strand. The completed instructions count is always an available event for this group, and one additional event can be measured along with the instructions count. The following events are available for this group: | |
| | TEJA_PROFILER_CMT_CPU_SB_FULL (0x1) | Measures the number of store buffer full cycles. |
| | TEJA_PROFILER_CMT_CPU_FP_INSTR_CNT (0x2) | Measures the number of floating-point instructions. |
| | TEJA_PROFILER_CMT_CPU_IC_MISS (0x4) | Measures the number of instruction cache misses. |
| | TEJA_PROFILER_CMT_CPU_DC_MISS (0x8) | Measures the number of data cache misses. |
| | TEJA_PROFILER_CMT_CPU_ITLB_MISS (0x10) | Measures the number of instruction TLB miss traps taken. |
| | TEJA_PROFILER_CMT_CPU_DTLB_MISS (0x20) | Measures the number of data TLB miss traps taken. |
| | TEJA_PROFILER_CMT_CPU_L2_IMISS (0x40) | Measures the number of secondary cache (L2) misses due to instruction cache requests. |
| | TEJA_PROFILER_CMT_CPU_L2_DMISS_LD (0x80) | Measures the number of secondary cache (L2) misses due to data cache load requests. |
| | TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED (0x100) | Measures the number of completed instructions. |
| TEJA_PROFILER_CMT_DRAM_CTL0 | Captures events related to DRAM memory read, write, and queues. There are different groups for different DRAM controllers. The following events can be measured in this group: | |
| | TEJA_PROFILER_CMT_DRAM_MEM_READS (0x1) | Read transactions. |
| | TEJA_PROFILER_CMT_DRAM_MEM_WRITES (0x2) | Write transactions. |
| | TEJA_PROFILER_CMT_DRAM_MEM_READ_WRITE (0x4) | Read + write transactions. |
| | TEJA_PROFILER_CMT_DRAM_BANK_BUSY_STALLS (0x8) | Bank busy stalls. |
| | TEJA_PROFILER_CMT_DRAM_RD_QUEUE_LATENCY (0x10) | Read queue latency. |
| | TEJA_PROFILER_CMT_DRAM_WR_QUEUE_LATENCY (0x20) | Write queue latency. |
| | TEJA_PROFILER_CMT_DRAM_RW_QUEUE_LATENCY (0x40) | Read + write queue latency. |
| | TEJA_PROFILER_CMT_DRAM_WR_BUF_HITS (0x80) | Write-back buffer hits. |
| TEJA_PROFILER_CMT_DRAM_CTL1 | Measures the same events as DRAM controller 0, but for DRAM controller 1. | |
| TEJA_PROFILER_CMT_DRAM_CTL2 | Measures the same events as DRAM controller 0, but for DRAM controller 2. | |
| TEJA_PROFILER_CMT_DRAM_CTL3 | Measures the same events as DRAM controller 0, but for DRAM controller 3. | |
| TEJA_PROFILER_CMT_JBUS | Captures events related to JBus read, write, and cycles. The following events can be measured for this group: | |
| | TEJA_PROFILER_CMT_JBUS_CYCLES (0x1) | JBus cycles. |
| | TEJA_PROFILER_CMT_JBUS_DMA_READS (0x2) | DMA read transactions (inbound). |
| | TEJA_PROFILER_CMT_JBUS_DMA_READ_LATENCY (0x4) | Total DMA read latency. |
| | TEJA_PROFILER_CMT_JBUS_DMA_WRITES (0x8) | DMA write transactions. |
| | TEJA_PROFILER_CMT_JBUS_DMA_WRITE8 (0x10) | DMA WR8 subtransactions. |
| | TEJA_PROFILER_CMT_JBUS_ORDERING_WAITS (0x20) | Ordering waits. |
| | TEJA_PROFILER_CMT_JBUS_PIO_READS (0x40) | PIO read transactions (outbound). |
| | TEJA_PROFILER_CMT_JBUS_PIO_READ_LATENCY (0x80) | Total PIO read latency. |
| | TEJA_PROFILER_CMT_JBUS_AOK_DOK_OFF_CYCLES (0x100) | AOK_OFF or DOK_OFF seen (cycles). |
| | TEJA_PROFILER_CMT_JBUS_AOK_OFF_CYCLES (0x200) | AOK_OFF seen (cycles). |
| | TEJA_PROFILER_CMT_JBUS_DOK_OFF_CYCLES (0x400) | DOK_OFF seen (cycles). |
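When inspecting logged masks, it can help to turn an UltraSPARC T1 CPU-group event mask back into readable names. The sketch below does this with a lookup table whose bit values are copied from TABLE 3-3. The short names are the constant names with the TEJA_PROFILER_CMT_CPU_ prefix removed, and struct event_name and t1_cpu_mask_to_string are hypothetical helpers that are not part of the Profiler API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup entry: one event bit and a short display name. */
struct event_name {
    uint64_t bit;
    const char *name;
};

/* Bit values copied from TABLE 3-3, TEJA_PROFILER_CMT_CPU group. */
static const struct event_name t1_cpu_events[] = {
    { 0x1,   "SB_FULL" },
    { 0x2,   "FP_INSTR_CNT" },
    { 0x4,   "IC_MISS" },
    { 0x8,   "DC_MISS" },
    { 0x10,  "ITLB_MISS" },
    { 0x20,  "DTLB_MISS" },
    { 0x40,  "L2_IMISS" },
    { 0x80,  "L2_DMISS_LD" },
    { 0x100, "INSTR_COMPLETED" },
};

/* Write the names of all events set in mask into buf, separated by '|'.
 * Returns the number of known events found. Output is truncated if buf
 * is too small. */
static int t1_cpu_mask_to_string(uint64_t mask, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int found = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof t1_cpu_events / sizeof t1_cpu_events[0]; i++) {
        if (mask & t1_cpu_events[i].bit) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     found ? "|" : "", t1_cpu_events[i].name);
            found++;
        }
    }
    return found;
}
```

The same table-driven approach would work for the DRAM and JBus groups by swapping in their bit values from TABLE 3-3.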
UltraSPARC T2 Processor-Specific Profiler Groups
TABLE 3-4 lists the specific profiler groups for the UltraSPARC T2 processor.
TABLE 3-4 UltraSPARC T2 Processor-Specific Profiler Groups
| Group | Event or Description | Description |
| --- | --- | --- |
| TEJA_PROFILER_CMT_CPU (0x1) | Captures events related to CPU and caches. The events measured in this group are per strand. You can specify up to two independent events to be measured concurrently. The following events are available for this group: | |
| | TEJA_PROFILER_CMT2_COMPLETED_BRANCHES | Number of completed branches. |
| | TEJA_PROFILER_CMT2_TAKEN_BRANCHES | Number of branches taken. |
| | TEJA_PROFILER_CMT2_FGU_ARITHMATIC_INSTR | Number of floating-point arithmetic instructions executed. |
| | TEJA_PROFILER_CMT2_LOAD_INSTR | Number of load instructions executed. |
| | TEJA_PROFILER_CMT2_STORE_INSTR | Number of store instructions executed. |
| | TEJA_PROFILER_CMT2_SETHI_INSTR | Number of sethi instructions executed. |
| | TEJA_PROFILER_CMT2_OTHER_INSTR | Number of all other instructions executed. |
| | TEJA_PROFILER_CMT2_ATOMICS | Number of atomic operations executed. |
| | TEJA_PROFILER_CMT2_ALL_INSTR | Total number of instructions executed. |
| | TEJA_PROFILER_CMT2_ICACHE_MISSES | Number of instruction cache misses. |
| | TEJA_PROFILER_CMT2_DCACHE_MISSES | Number of L1 data cache misses. |
| | TEJA_PROFILER_CMT2_L2_INSTR_MISSES | Number of secondary cache (L2) misses due to instruction cache requests. |
| | TEJA_PROFILER_CMT2_L2_LOAD_MISSES | Number of secondary cache (L2) misses due to data cache load requests. |
| | TEJA_PROFILER_CMT2_ITLB_REF_L2 | For each ITLB miss, counts the number of accesses the ITLB hardware tablewalk makes to L2 when hardware tablewalk is enabled. |
| | TEJA_PROFILER_CMT2_DTLB_REF_L2 | For each DTLB miss, counts the number of accesses the DTLB hardware tablewalk makes to L2 when hardware tablewalk is enabled. |
| | TEJA_PROFILER_CMT2_ITLB_MISS_L2 | For each ITLB miss, counts the number of accesses the ITLB hardware tablewalk makes to L2 that miss in L2 when hardware tablewalk is enabled. Note: Depending on the hardware tablewalk configuration, each ITLB miss might issue from 1 to 4 requests to L2 to search TSBs. |
| | TEJA_PROFILER_CMT2_DTLB_MISS_L2 | For each DTLB miss, counts the number of accesses the DTLB hardware tablewalk makes to L2 that miss in L2 when hardware tablewalk is enabled. Note: Depending on the hardware tablewalk configuration, each DTLB miss might issue from 1 to 4 requests to L2 to search TSBs. |
| | TEJA_PROFILER_CMT2_STREAM_LD_TO_PCX | Counts the number of SPU load operations to L2. |
| | TEJA_PROFILER_CMT2_STREAM_ST_TO_PCX | Counts the number of SPU store operations to L2. |
| | TEJA_PROFILER_CMT2_CPU_LD_TO_PCX | Counts the number of CPU loads to L2. |
| | TEJA_PROFILER_CMT2_CPU_IFETCH_TO_PCX | Counts the number of I-fetches to L2. |
| | TEJA_PROFILER_CMT2_CPU_ST_TO_PCX | Counts the number of CPU stores to L2. |
| | TEJA_PROFILER_CMT2_MMU_LD_TO_PCX | Counts the number of MMU loads to L2. |
| | TEJA_PROFILER_CMT2_DES_3DES_OP | Increments for each CWQ or ASI operation that uses the DES/3DES unit. |
| | TEJA_PROFILER_CMT2_AES_OP | Increments for each CWQ or ASI operation that uses the AES unit. |
| | TEJA_PROFILER_CMT2_RC4_OP | Increments for each CWQ or ASI operation that uses RC4. |
| | TEJA_PROFILER_CMT2_MD5_SHA1_SHA256_OP | Increments for each CWQ or ASI operation that uses MD5, SHA-1, or SHA-256. |
| | TEJA_PROFILER_CMT2_MA_OP | Increments for each CWQ or ASI modular arithmetic operation. |
| | TEJA_PROFILER_CMT2_CRC_TCPIP_CKSUM | Increments for each iSCSI CRC or TCP/IP checksum operation. |
| | TEJA_PROFILER_CMT2_DES_3DES_BUSY_CYCLE | Increments each cycle when the DES/3DES unit is busy. |
| | TEJA_PROFILER_CMT2_AES_BUSY_CYCLE | Number of busy cycles encountered when attempting to execute the AES operation. |
| | TEJA_PROFILER_CMT2_RC4_BUSY_CYCLE | Number of busy cycles encountered when attempting to execute the RC4 operation. |
| | TEJA_PROFILER_CMT2_MD5_SHA1_SHA256_BUSY_CYCLE | Number of busy cycles encountered when attempting to execute the MD5, SHA-1, or SHA-256 operation. |
| | TEJA_PROFILER_CMT2_MA_BUSY | Increments each cycle when the modular arithmetic unit is busy. |
| | TEJA_PROFILER_CMT2_CRC_MPA_CKSUM | Increments each cycle when the CRC/MPA/checksum unit is busy. |
| | TEJA_PROFILER_CMT2_ITLB_MISS | Counts ITLB misses. Includes all misses (successful and unsuccessful tablewalks). |
| | TEJA_PROFILER_CMT2_DTLB_MISS | Counts DTLB misses. Includes all misses (successful and unsuccessful tablewalks). |
| | TEJA_PROFILER_CMT2_TLB_MISS | Counts both ITLB and DTLB misses (successful and unsuccessful tablewalks). |
| TEJA_PROFILER_CMT_DRAM_CTL0 | Captures events related to DRAM memory read, write, and queues. The events that can be measured are the same as for the UltraSPARC T1 processor (see TABLE 3-3 in UltraSPARC T1 Processor-Specific Profiler Groups). There are different groups for different DRAM controllers. | |
| TEJA_PROFILER_CMT_DRAM_CTL1 | Measures the same events as DRAM controller 0, but for DRAM controller 1. | |
| TEJA_PROFILER_CMT_DRAM_CTL2 | Measures the same events as DRAM controller 0, but for DRAM controller 2. | |
| TEJA_PROFILER_CMT_DRAM_CTL3 | Measures the same events as DRAM controller 0, but for DRAM controller 3. | |
Sun Netra Data Plane Software Suite 2.1 Update 1 Reference Manual | 820-5156-11
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.