CHAPTER 3

Profiler API

This chapter describes the components and functions of the Sun Netra DPS Profiler API. It covers Profiler API configuration, data types, functions, and processor-specific profiler constants.


Profiler API Configuration

You can set two properties (TABLE 3-1) for a process in the software architecture. These properties are configured per process and applied to all threads of that process.


TABLE 3-1 Process Properties

Property - Description

profiler_log_table_size - Sets the total number of profile records in the log. The default value is 1024.

profiler_user_data_size - Sets the maximum number of 64-bit words of user data that can be logged along with each profile record. The default value is 0.



Profiler API Data Types

TABLE 3-2 describes the Profiler API data types.


TABLE 3-2 Profiler API Data Types

Data Type - Description

teja_profiler_group_t - Represents a group of events. For example, events related to instructions and cache hits or misses can be in one group, while memory-related events can be in another group. Groups are target-specific and are available to the user as preprocessor defines.

teja_profiler_event_t - Represents what is measured in a specific group. A group and event combination identifies a unique event. Each bit in the 64-bit value represents a different event, so more than one event can be specified using an event mask.

teja_profiler_value_t - Type for the value of an event, that is, the actual value being measured.

TEJA_PROFILER_MAX_EVENTS - Maximum number of events that can be measured per group. This value is target-dependent.

teja_profiler_values_t - Type for the values of the events. The events array contains the values of the events in the same group. For example:

typedef struct teja_profiler_values_t {
  uint64_t events[TEJA_PROFILER_MAX_EVENTS];
} teja_profiler_values_t;
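As an illustration of the event mask described above, the following minimal sketch combines two events of one group with a bitwise OR and tests which events a mask contains. The 64-bit unsigned typedef is an assumption standing in for the target's profiler header, the event constants use the UltraSPARC T1 DRAM values listed in TABLE 3-3, and event_mask2 and mask_has_event are hypothetical helpers, not part of the Profiler API.

```c
#include <stdint.h>

/* Assumption: the event type is a 64-bit unsigned integer, since the text
   describes a 64-bit value with one bit per event. The real typedef comes
   from the target's profiler header. */
typedef uint64_t teja_profiler_event_t;

/* DRAM group event bits, with the values listed in TABLE 3-3. */
#define TEJA_PROFILER_CMT_DRAM_MEM_READS   0x1
#define TEJA_PROFILER_CMT_DRAM_MEM_WRITES  0x2
#define TEJA_PROFILER_CMT_DRAM_WR_BUF_HITS 0x80

/* Hypothetical helper: combine two events of the same group into one mask. */
static teja_profiler_event_t event_mask2(teja_profiler_event_t a,
                                         teja_profiler_event_t b)
{
  return a | b;
}

/* Hypothetical helper: nonzero if the mask contains the given event bit. */
static int mask_has_event(teja_profiler_event_t mask,
                          teja_profiler_event_t event)
{
  return (mask & event) != 0;
}
```

Because each event occupies its own bit, the mask keeps both events distinguishable after they are combined.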

Profiler API Functions

teja_profiler_start

Description

Starts collecting profile data for the specified events in the specified group. More than one event can be specified as a bit mask. Only one group can be specified per call. To profile more than one group, call the function once for each group.

Syntax

int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event);

Parameters

group - ID of the group for which to start collecting profiler data.

event - Events of the group as a bit mask. Up to two different events can be specified at a time.

When measuring events in the CPU group on the UltraSPARC T1 processor, the user can specify only one event. The second event is always the number of executed instructions, which is not specified explicitly.

When measuring events in a DRAM or JBUS group on the UltraSPARC T1 processor, or in any group on the UltraSPARC T2 processor, the user can specify two events to be measured at a time. In this case, the event argument in the teja_profiler_start function call has the following format:

event1 | event2

where event1 and event2 are events to be measured.

Return Values

int - 0 for success and -1 for error.
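The per-group event limits described above can be checked before the call is made. The following is a minimal sketch, not the library's own validation: the integer typedefs and the value of TEJA_PROFILER_CMT_DRAM_CTL0 are placeholders (the source does not list the DRAM group values), teja_profiler_start is stubbed so that the example builds without the target library, and start_checked is a hypothetical wrapper.

```c
#include <stdint.h>

/* Stand-ins for the target's profiler header. The integer types and the
   DRAM group value are assumptions; the other values are from TABLE 3-3. */
typedef uint64_t teja_profiler_group_t;
typedef uint64_t teja_profiler_event_t;
#define TEJA_PROFILER_CMT_CPU             0x1
#define TEJA_PROFILER_CMT_DRAM_CTL0       0x2  /* placeholder value */
#define TEJA_PROFILER_CMT_CPU_IC_MISS     0x4
#define TEJA_PROFILER_CMT_CPU_DC_MISS     0x8
#define TEJA_PROFILER_CMT_DRAM_MEM_READS  0x1
#define TEJA_PROFILER_CMT_DRAM_MEM_WRITES 0x2

/* Stub so the sketch builds without the target library; it accepts
   every call. */
int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event)
{
  (void)group;
  (void)event;
  return 0;
}

/* Hypothetical wrapper: refuse a mask with more events than max_events
   (1 for the T1 CPU group, 2 otherwise), then start the group. */
static int start_checked(teja_profiler_group_t group,
                         teja_profiler_event_t event, int max_events)
{
  int count = 0;
  teja_profiler_event_t bits = event;

  while (bits != 0) {
    bits &= bits - 1;  /* clear the lowest set bit */
    count++;
  }
  if (count > max_events)
    return -1;
  return teja_profiler_start(group, event);
}
```

With these definitions, start_checked(TEJA_PROFILER_CMT_CPU, TEJA_PROFILER_CMT_CPU_IC_MISS, 1) reaches teja_profiler_start, while passing TEJA_PROFILER_CMT_CPU_IC_MISS | TEJA_PROFILER_CMT_CPU_DC_MISS with a limit of 1 is rejected before the call.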

teja_profiler_stop

Description

Stops collecting profile data for all events in the specified group. This function has an empty implementation on some targets.

Syntax

int teja_profiler_stop(const teja_profiler_group_t group);

Parameters

group - ID of the group for which to stop collecting profiler data.

Return Values

int - 0 for success and -1 for error.

teja_profiler_update

Description

Takes a snapshot of the current profiling data and saves the snapshot in the log. All the events that were specified for the group with teja_profiler_start are updated. User-defined data to be logged with the profiler log entry can be passed as variable arguments. The maximum number of such arguments is set in the software architecture using the profiler_user_data_size process property (TABLE 3-1).

Syntax

int teja_profiler_update(const teja_profiler_group_t group, ...);

Parameters

group - ID of the group for which the user wants to update profile data.

... - User-defined data to log with the profile record, passed as variable arguments.

Return Values

int - 0 for success and -1 for error.

teja_profiler_get_values

Description

Takes a snapshot of the current profiling data and returns it in the values parameter. All the events that were specified for the group with teja_profiler_start are returned.

Syntax

int teja_profiler_get_values(const teja_profiler_group_t group,
                             teja_profiler_values_t *values);

Parameters

group - ID of the group for which the user wants to get the profiler data.

values - User-allocated data structure that will be filled with the profiler data.

Return Values

int - Returns overflow information, or -1 for error.

teja_profiler_get_value

Description

Retrieves the value of a given event from a teja_profiler_values_t data structure.

Syntax

teja_profiler_value_t teja_profiler_get_value(teja_profiler_values_t *values, int index);

Parameters

values - Data structure that was filled by teja_profiler_get_values.

index - Index of the event to read (sequential number from 0 up to the maximum number of events specifiable in a group).

Return Values

teja_profiler_value_t - Returns the value of the given event.
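The two retrieval functions are typically used together: teja_profiler_get_values fills a user-allocated structure, and teja_profiler_get_value extracts one entry from it. The sketch below shows one way to combine them. The typedefs, the value of TEJA_PROFILER_MAX_EVENTS, and both function bodies are stand-ins so that the example builds without the target library (the stub returns fixed counts), and read_event is a hypothetical helper. Because the source does not describe the format of the overflow information, the helper treats only -1 as an error and passes any other return value through unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the target's profiler header; the integer types and the
   MAX_EVENTS value are assumptions. */
typedef uint64_t teja_profiler_group_t;
typedef uint64_t teja_profiler_value_t;
#define TEJA_PROFILER_MAX_EVENTS 2

typedef struct teja_profiler_values_t {
  uint64_t events[TEJA_PROFILER_MAX_EVENTS];
} teja_profiler_values_t;

/* Stub: fills the structure with fixed sample counts. */
int teja_profiler_get_values(const teja_profiler_group_t group,
                             teja_profiler_values_t *values)
{
  (void)group;
  if (values == NULL)
    return -1;
  values->events[0] = 1000;
  values->events[1] = 25;
  return 0;
}

/* Stub: returns the entry at the given index. */
teja_profiler_value_t teja_profiler_get_value(teja_profiler_values_t *values,
                                              int index)
{
  return values->events[index];
}

/* Hypothetical helper: snapshot the group and read one event by index.
   Returns -1 on error, otherwise the teja_profiler_get_values result. */
static int read_event(teja_profiler_group_t group, int index,
                      teja_profiler_value_t *out)
{
  teja_profiler_values_t values;
  int rc;

  if (out == NULL || index < 0 || index >= TEJA_PROFILER_MAX_EVENTS)
    return -1;
  rc = teja_profiler_get_values(group, &values);
  if (rc == -1)
    return -1;
  *out = teja_profiler_get_value(&values, index);
  return rc;
}
```

Checking the index against TEJA_PROFILER_MAX_EVENTS before the call keeps an out-of-range index from reading past the events array.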

teja_profiler_dump

Description

Dumps the profile data to stdout. The profile data consists of the profiler records collected so far for the specified thread.

Syntax

int teja_profiler_dump(teja_thread_t thread);

Parameters

thread - Thread identifier for which the profiler dump is requested.

Return Values

int - Returns 0 for success and -1 for error.
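Taken together, the functions in this section suggest a start, update, stop, dump sequence around the code being measured. The sketch below shows one plausible ordering, under several assumptions: the typedefs (including teja_thread_t as an int) are stand-ins for the target's headers, all four API functions are stubbed to record the call order, teja_profiler_update is called without user data (consistent with the default profiler_user_data_size of 0), and profile_region and sample_work are hypothetical names.

```c
#include <stdint.h>

/* Stand-ins for the target's headers; all types are assumptions. */
typedef uint64_t teja_profiler_group_t;
typedef uint64_t teja_profiler_event_t;
typedef int teja_thread_t;

/* The stubs below record the call order so the sketch builds and can be
   checked without the target library. */
static char call_trace[16];
static int call_count;

static void record(char c)
{
  if (call_count < 15)
    call_trace[call_count++] = c;
}

int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event)
{
  (void)group; (void)event;
  record('s');
  return 0;
}

int teja_profiler_update(const teja_profiler_group_t group, ...)
{
  (void)group;
  record('u');
  return 0;
}

int teja_profiler_stop(const teja_profiler_group_t group)
{
  (void)group;
  record('p');
  return 0;
}

int teja_profiler_dump(teja_thread_t thread)
{
  (void)thread;
  record('d');
  return 0;
}

/* Hypothetical helper: profile one region of work, log a snapshot,
   stop the group, and dump the records for the given thread. */
static int profile_region(teja_profiler_group_t group,
                          teja_profiler_event_t event,
                          teja_thread_t thread, void (*work)(void))
{
  int rc;

  if (teja_profiler_start(group, event) != 0)
    return -1;
  work();
  rc = teja_profiler_update(group);   /* no user data passed */
  if (teja_profiler_stop(group) != 0) /* stop even if update failed */
    rc = -1;
  if (rc != 0)
    return -1;
  return teja_profiler_dump(thread);
}

/* Placeholder for the code being measured. */
static void sample_work(void)
{
  record('w');
}
```

The snapshot is taken before the group is stopped so that the logged record reflects the counters while collection is still active.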


Processor-Specific Profiler Constants

UltraSPARC T1 Processor-Specific Profiler Groups

TABLE 3-3 lists the specific profiler groups for the UltraSPARC T1 processor.


TABLE 3-3 UltraSPARC T1 Processor-Specific Profiler Groups

Group - Description
    Event - Description

TEJA_PROFILER_CMT_CPU (0x1) - Captures events related to the CPU and caches. The events measured in this group are per strand. The completed-instructions count is always measured for this group, and one additional event can be measured along with it. The following events are available for this group:
    TEJA_PROFILER_CMT_CPU_SB_FULL (0x1) - Measures the number of store buffer full cycles.
    TEJA_PROFILER_CMT_CPU_FP_INSTR_CNT (0x2) - Measures the number of floating-point instructions.
    TEJA_PROFILER_CMT_CPU_IC_MISS (0x4) - Measures the number of instruction cache misses.
    TEJA_PROFILER_CMT_CPU_DC_MISS (0x8) - Measures the number of data cache misses.
    TEJA_PROFILER_CMT_CPU_ITLB_MISS (0x10) - Measures the number of instruction TLB miss traps taken.
    TEJA_PROFILER_CMT_CPU_DTLB_MISS (0x20) - Measures the number of data TLB miss traps taken.
    TEJA_PROFILER_CMT_CPU_L2_IMISS (0x40) - Measures the number of secondary cache (L2) misses due to instruction cache requests.
    TEJA_PROFILER_CMT_CPU_L2_DMISS_LD (0x80) - Measures the number of secondary cache (L2) misses due to data cache load requests.
    TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED (0x100) - Measures the number of completed instructions.

TEJA_PROFILER_CMT_DRAM_CTL0 - Captures events related to DRAM memory reads, writes, and queues. Each DRAM controller has its own group. The following events can be measured in this group:
    TEJA_PROFILER_CMT_DRAM_MEM_READS (0x1) - Read transactions.
    TEJA_PROFILER_CMT_DRAM_MEM_WRITES (0x2) - Write transactions.
    TEJA_PROFILER_CMT_DRAM_MEM_READ_WRITE (0x4) - Read + write transactions.
    TEJA_PROFILER_CMT_DRAM_BANK_BUSY_STALLS (0x8) - Bank busy stalls.
    TEJA_PROFILER_CMT_DRAM_RD_QUEUE_LATENCY (0x10) - Read queue latency.
    TEJA_PROFILER_CMT_DRAM_WR_QUEUE_LATENCY (0x20) - Write queue latency.
    TEJA_PROFILER_CMT_DRAM_RW_QUEUE_LATENCY (0x40) - Read + write queue latency.
    TEJA_PROFILER_CMT_DRAM_WR_BUF_HITS (0x80) - Write-back buffer hits.

TEJA_PROFILER_CMT_DRAM_CTL1 - Measures the same events as DRAM controller 0, but for DRAM controller 1.

TEJA_PROFILER_CMT_DRAM_CTL2 - Measures the same events as DRAM controller 0, but for DRAM controller 2.

TEJA_PROFILER_CMT_DRAM_CTL3 - Measures the same events as DRAM controller 0, but for DRAM controller 3.

TEJA_PROFILER_CMT_JBUS - Captures events related to JBus reads, writes, and cycles. The following events can be measured in this group:
    TEJA_PROFILER_CMT_JBUS_CYCLES (0x1) - JBus cycles.
    TEJA_PROFILER_CMT_JBUS_DMA_READS (0x2) - DMA read transactions (inbound).
    TEJA_PROFILER_CMT_JBUS_DMA_READ_LATENCY (0x4) - Total DMA read latency.
    TEJA_PROFILER_CMT_JBUS_DMA_WRITES (0x8) - DMA write transactions.
    TEJA_PROFILER_CMT_JBUS_DMA_WRITE8 (0x10) - DMA WR8 subtransactions.
    TEJA_PROFILER_CMT_JBUS_ORDERING_WAITS (0x20) - Ordering waits.
    TEJA_PROFILER_CMT_JBUS_PIO_READS (0x40) - PIO read transactions (outbound).
    TEJA_PROFILER_CMT_JBUS_PIO_READ_LATENCY (0x80) - Total PIO read latency.
    TEJA_PROFILER_CMT_JBUS_AOK_DOK_OFF_CYCLES (0x100) - AOK_OFF or DOK_OFF seen (cycles).
    TEJA_PROFILER_CMT_JBUS_AOK_OFF_CYCLES (0x200) - AOK_OFF seen (cycles).
    TEJA_PROFILER_CMT_JBUS_DOK_OFF_CYCLES (0x400) - DOK_OFF seen (cycles).
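Because the UltraSPARC T1 CPU group always reports the completed-instructions count alongside the one selected event, the two values can be combined into a rate, for example instruction cache misses per 1000 instructions. The following sketch shows only the arithmetic. The source does not state which index of the values array holds which counter, so the hypothetical helper events_per_kilo_instr takes both counts as plain arguments.

```c
#include <stdint.h>

/* Hypothetical helper: number of events per 1000 completed instructions.
   Returns 0.0 when no instructions were counted, to avoid dividing by
   zero. */
static double events_per_kilo_instr(uint64_t event_count,
                                    uint64_t instr_count)
{
  if (instr_count == 0)
    return 0.0;
  return (double)event_count * 1000.0 / (double)instr_count;
}
```

For instance, 25 misses measured over 1000 completed instructions gives a rate of 25 misses per 1000 instructions.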


UltraSPARC T2 Processor-Specific Profiler Groups

TABLE 3-4 lists the specific profiler groups for the UltraSPARC T2 processor.


TABLE 3-4 UltraSPARC T2 Processor-Specific Profiler Groups

Group - Description
    Event - Description

TEJA_PROFILER_CMT_CPU (0x1) - Captures events related to the CPU and caches. The events measured in this group are per strand. You can specify up to two independent events to be measured concurrently. The following events are available for this group:
    TEJA_PROFILER_CMT2_COMPLETED_BRANCHES - Number of completed branches.
    TEJA_PROFILER_CMT2_TAKEN_BRANCHES - Number of branches taken.
    TEJA_PROFILER_CMT2_FGU_ARITHMATIC_INSTR - Number of floating-point arithmetic instructions executed.
    TEJA_PROFILER_CMT2_LOAD_INSTR - Number of load instructions executed.
    TEJA_PROFILER_CMT2_STORE_INSTR - Number of store instructions executed.
    TEJA_PROFILER_CMT2_SETHI_INSTR - Number of sethi instructions executed.
    TEJA_PROFILER_CMT2_OTHER_INSTR - Number of all other instructions executed.
    TEJA_PROFILER_CMT2_ATOMICS - Number of atomic operations executed.
    TEJA_PROFILER_CMT2_ALL_INSTR - Total number of instructions executed.
    TEJA_PROFILER_CMT2_ICACHE_MISSES - Number of instruction cache misses.
    TEJA_PROFILER_CMT2_DCACHE_MISSES - Number of L1 data cache misses.
    TEJA_PROFILER_CMT2_L2_INSTR_MISSES - Number of secondary cache (L2) misses due to instruction cache requests.
    TEJA_PROFILER_CMT2_L2_LOAD_MISSES - Number of secondary cache (L2) misses due to data cache load requests.
    TEJA_PROFILER_CMT2_ITLB_REF_L2 - For each ITLB miss, counts the number of accesses that the ITLB hardware tablewalk makes to L2 when hardware tablewalk is enabled.
    TEJA_PROFILER_CMT2_DTLB_REF_L2 - For each DTLB miss, counts the number of accesses that the DTLB hardware tablewalk makes to L2 when hardware tablewalk is enabled.
    TEJA_PROFILER_CMT2_ITLB_MISS_L2 - For each ITLB miss, counts the number of accesses that the ITLB hardware tablewalk makes to L2 and that miss in L2, when hardware tablewalk is enabled. Note: Depending on the hardware tablewalk configuration, each ITLB miss might issue from 1 to 4 requests to L2 to search TSBs.
    TEJA_PROFILER_CMT2_DTLB_MISS_L2 - For each DTLB miss, counts the number of accesses that the DTLB hardware tablewalk makes to L2 and that miss in L2, when hardware tablewalk is enabled. Note: Depending on the hardware tablewalk configuration, each DTLB miss might issue from 1 to 4 requests to L2 to search TSBs.
    TEJA_PROFILER_CMT2_STREAM_LD_TO_PCX - Counts the number of SPU load operations to L2.
    TEJA_PROFILER_CMT2_STREAM_ST_TO_PCX - Counts the number of SPU store operations to L2.
    TEJA_PROFILER_CMT2_CPU_LD_TO_PCX - Counts the number of CPU loads to L2.
    TEJA_PROFILER_CMT2_CPU_IFETCH_TO_PCX - Counts the number of I-fetches to L2.
    TEJA_PROFILER_CMT2_CPU_ST_TO_PCX - Counts the number of CPU stores to L2.
    TEJA_PROFILER_CMT2_MMU_LD_TO_PCX - Counts the number of MMU loads to L2.
    TEJA_PROFILER_CMT2_DES_3DES_OP - Increments for each CWQ or ASI operation that uses the DES/3DES unit.
    TEJA_PROFILER_CMT2_AES_OP - Increments for each CWQ or ASI operation that uses the AES unit.
    TEJA_PROFILER_CMT2_RC4_OP - Increments for each CWQ or ASI operation that uses RC4.
    TEJA_PROFILER_CMT2_MD5_SHA1_SHA256_OP - Increments for each CWQ or ASI operation that uses MD5, SHA-1, or SHA-256.
    TEJA_PROFILER_CMT2_MA_OP - Increments for each CWQ or ASI modular arithmetic operation.
    TEJA_PROFILER_CMT2_CRC_TCPIP_CKSUM - Increments for each iSCSI CRC or TCP/IP checksum operation.
    TEJA_PROFILER_CMT2_DES_3DES_BUSY_CYCLE - Increments each cycle when the DES/3DES unit is busy.
    TEJA_PROFILER_CMT2_AES_BUSY_CYCLE - Number of busy cycles encountered when attempting to execute an AES operation.
    TEJA_PROFILER_CMT2_RC4_BUSY_CYCLE - Number of busy cycles encountered when attempting to execute an RC4 operation.
    TEJA_PROFILER_CMT2_MD5_SHA1_SHA256_BUSY_CYCLE - Number of busy cycles encountered when attempting to execute an MD5, SHA-1, or SHA-256 operation.
    TEJA_PROFILER_CMT2_MA_BUSY - Increments each cycle when the modular arithmetic unit is busy.
    TEJA_PROFILER_CMT2_CRC_MPA_CKSUM - Increments each cycle when the CRC/MPA/checksum unit is busy.
    TEJA_PROFILER_CMT2_ITLB_MISS - Counts all ITLB misses, including both successful and unsuccessful tablewalks.
    TEJA_PROFILER_CMT2_DTLB_MISS - Counts all DTLB misses, including both successful and unsuccessful tablewalks.
    TEJA_PROFILER_CMT2_TLB_MISS - Counts both ITLB and DTLB misses, including both successful and unsuccessful tablewalks.

TEJA_PROFILER_CMT_DRAM_CTL0 - Captures events related to DRAM memory reads, writes, and queues. The events that can be measured are the same as for the UltraSPARC T1 processor (see TABLE 3-3 in UltraSPARC T1 Processor-Specific Profiler Groups). Each DRAM controller has its own group.

TEJA_PROFILER_CMT_DRAM_CTL1 - Measures the same events as DRAM controller 0, but for DRAM controller 1.

TEJA_PROFILER_CMT_DRAM_CTL2 - Measures the same events as DRAM controller 0, but for DRAM controller 2.

TEJA_PROFILER_CMT_DRAM_CTL3 - Measures the same events as DRAM controller 0, but for DRAM controller 3.
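Because each DRAM controller is a separate group and teja_profiler_start accepts only one group per call, measuring the same events on all four controllers takes four calls. The following sketch loops over the groups. The group values are placeholders (the source does not list them), the integer typedefs are assumptions, teja_profiler_start is stubbed so that the example builds without the target library, and start_all_dram is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the target's profiler header; the types are assumptions. */
typedef uint64_t teja_profiler_group_t;
typedef uint64_t teja_profiler_event_t;

/* Placeholder group values: the source does not list them. */
#define TEJA_PROFILER_CMT_DRAM_CTL0 0x10
#define TEJA_PROFILER_CMT_DRAM_CTL1 0x11
#define TEJA_PROFILER_CMT_DRAM_CTL2 0x12
#define TEJA_PROFILER_CMT_DRAM_CTL3 0x13

/* Event bits with the values listed in TABLE 3-3. */
#define TEJA_PROFILER_CMT_DRAM_MEM_READS  0x1
#define TEJA_PROFILER_CMT_DRAM_MEM_WRITES 0x2

/* Stub that counts calls so the sketch builds without the target library. */
static int started_groups;

int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event)
{
  (void)group;
  (void)event;
  started_groups++;
  return 0;
}

/* Hypothetical helper: start the same events on every DRAM controller
   group. Returns the number of groups started, or -1 on the first
   failure. */
static int start_all_dram(teja_profiler_event_t event)
{
  static const teja_profiler_group_t groups[] = {
    TEJA_PROFILER_CMT_DRAM_CTL0, TEJA_PROFILER_CMT_DRAM_CTL1,
    TEJA_PROFILER_CMT_DRAM_CTL2, TEJA_PROFILER_CMT_DRAM_CTL3
  };
  size_t i;

  for (i = 0; i < sizeof(groups) / sizeof(groups[0]); i++) {
    if (teja_profiler_start(groups[i], event) != 0)
      return -1;
  }
  return (int)i;
}
```

The same loop shape would apply to teja_profiler_stop when measurement ends, again one call per controller group.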