CHAPTER 3

Profiler API

This chapter describes the configuration, data types, functions, and CMT-specific constants of the Nera DPS Profiler API.


Profiler API Configuration

You can set two properties for a process in the software architecture. These properties are configured per process and applied to all threads of that process.


TABLE 3-1 Process Properties

Property - Description

profiler_log_table_size - Sets the total number of profile records in the log. The default value is 1024.

profiler_user_data_size - Sets the maximum amount of user data, in 64-bit words, that can be logged along with each profile record. The default value is 0.



Profiler API

Profiler API Data Types


TABLE 3-2 Profiler API Data Types

Data Type - Description

teja_profiler_group_t - Represents a group of events. For example, events related to instructions and cache hits or misses can be in one group, while memory-related events can be in another group. Groups are target-specific and are available as preprocessor defines.

teja_profiler_event_t - Represents what is measured in a specific group. A group and event combination identifies a unique event. Each bit in the 64-bit value represents a different event, so more than one event can be specified using an event mask.

teja_profiler_value_t - Type for the value of an event, that is, the actual value being measured.

TEJA_PROFILER_MAX_EVENTS - Maximum number of events that can be measured per group. This value is target-dependent.

teja_profiler_values_t - Type for the values of the events. The events array contains the values of the events in the same group. For example:

typedef struct teja_profiler_values_t {
  uint64_t events[TEJA_PROFILER_MAX_EVENTS];
} teja_profiler_values_t;
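
Because each event is one bit of a 64-bit value, an event mask is built with bitwise OR and tested with bitwise AND. The following is a minimal sketch, not target code: the typedef is an assumed stand-in for the real header, the constant values are copied from TABLE 3-3, and make_cpu_mask and mask_has_event are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed underlying type; the text only says the event value is 64 bits wide. */
typedef uint64_t teja_profiler_event_t;

/* Event bits copied from TABLE 3-3 (CPU group). */
#define TEJA_PROFILER_CMT_CPU_IC_MISS         0x4
#define TEJA_PROFILER_CMT_CPU_DC_MISS         0x8
#define TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED 0x100

/* Hypothetical helper: combines two events of the same group into one mask. */
teja_profiler_event_t make_cpu_mask(void)
{
    return (teja_profiler_event_t)TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED |
           (teja_profiler_event_t)TEJA_PROFILER_CMT_CPU_IC_MISS;
}

/* Hypothetical helper: returns nonzero if every bit of `event` is set in `mask`. */
int mask_has_event(teja_profiler_event_t mask, teja_profiler_event_t event)
{
    return (mask & event) == event;
}
```

With these definitions, make_cpu_mask() selects the completed-instructions and instruction-cache-miss events, and mask_has_event reports whether a given event is part of a mask.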

Profiler API Functions

teja_profiler_start

Description

Starts collecting profile data for the specified events in the specified group. More than one event can be specified as a bit mask, but only one group can be specified per call. To profile more than one group, call the function once for each group.

Function

int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event);

Parameters

group - ID of the group for which to start collecting profiler data.

event - Events of the group as a bit mask.

Return Values

int - 0 for success and -1 for error.
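
The one-call-per-group rule can be sketched as follows. This is an illustration under stated assumptions, not target code: the typedefs and the teja_profiler_start body are stand-in stubs so that the sketch compiles without the target headers, the value of TEJA_PROFILER_CMT_DRAM_CTL0 is assumed because TABLE 3-3 does not list it, and start_cpu_and_dram_profiling is a hypothetical caller. Whether a given target can count both DRAM events at once is also an assumption.

```c
#include <stdint.h>

/* ---- Stand-ins so the sketch compiles without the target headers ---- */
typedef uint64_t teja_profiler_group_t;   /* assumed underlying type */
typedef uint64_t teja_profiler_event_t;   /* 64-bit event mask, per TABLE 3-2 */

#define TEJA_PROFILER_CMT_CPU                 0x1   /* TABLE 3-3 */
#define TEJA_PROFILER_CMT_DRAM_CTL0           0x2   /* value assumed; not in TABLE 3-3 */
#define TEJA_PROFILER_CMT_CPU_IC_MISS         0x4
#define TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED 0x100
#define TEJA_PROFILER_CMT_DRAM_MEM_READS      0x1
#define TEJA_PROFILER_CMT_DRAM_MEM_WRITES     0x2

#define STUB_MAX_GROUPS 8
static teja_profiler_event_t stub_active[STUB_MAX_GROUPS]; /* mask per group */

/* Stub with the documented signature; it only records the requested mask. */
int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event)
{
    if (group >= STUB_MAX_GROUPS || event == 0)
        return -1;
    stub_active[group] = event;
    return 0;
}

/* Hypothetical caller: one call per group, events of a group ORed into one mask. */
int start_cpu_and_dram_profiling(void)
{
    if (teja_profiler_start(TEJA_PROFILER_CMT_CPU,
                            TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED |
                            TEJA_PROFILER_CMT_CPU_IC_MISS) != 0)
        return -1;
    if (teja_profiler_start(TEJA_PROFILER_CMT_DRAM_CTL0,
                            TEJA_PROFILER_CMT_DRAM_MEM_READS |
                            TEJA_PROFILER_CMT_DRAM_MEM_WRITES) != 0)
        return -1;
    return 0;
}
```

Checking each return value separately makes it clear which group failed to start.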

teja_profiler_stop

Description

Stops collecting profile data for all events in the specified group. This function has an empty implementation on some targets.

Function

int teja_profiler_stop(const teja_profiler_group_t group);

Parameters

group - ID of the group for which to stop collecting profiler data.

Return Values

int - 0 for success and -1 for error.
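
A common pattern is to pair each teja_profiler_start call with a teja_profiler_stop call around the code of interest. The sketch below assumes stub implementations of both functions (the stub rejects an empty mask and a stop without a prior start, which real targets might not do), assumed typedefs, and a hypothetical helper named profile_region.

```c
#include <stdint.h>

typedef uint64_t teja_profiler_group_t;   /* assumed underlying type */
typedef uint64_t teja_profiler_event_t;   /* 64-bit event mask, per TABLE 3-2 */
#define TEJA_PROFILER_CMT_CPU         0x1 /* TABLE 3-3 */
#define TEJA_PROFILER_CMT_CPU_DC_MISS 0x8 /* TABLE 3-3 */

static int stub_running;   /* 1 while the stub is "collecting" */

/* Stubs with the documented signatures. */
int teja_profiler_start(const teja_profiler_group_t group,
                        const teja_profiler_event_t event)
{
    (void)group;
    if (event == 0)
        return -1;
    stub_running = 1;
    return 0;
}

int teja_profiler_stop(const teja_profiler_group_t group)
{
    (void)group;
    if (!stub_running)
        return -1;
    stub_running = 0;
    return 0;
}

typedef void (*region_fn)(void *arg);

/* Hypothetical helper: profiles exactly one call of fn(arg). */
int profile_region(teja_profiler_group_t group, teja_profiler_event_t events,
                   region_fn fn, void *arg)
{
    if (teja_profiler_start(group, events) != 0)
        return -1;          /* do not run the region unprofiled */
    fn(arg);
    return teja_profiler_stop(group);
}

/* Example workload: replaces *arg (n) with the sum 1 + 2 + ... + n. */
void sum_to_n(void *arg)
{
    uint64_t *n = arg;
    uint64_t i;
    uint64_t total = 0;
    for (i = 1; i <= *n; i++)
        total += i;
    *n = total;
}
```

Because the stop function can be empty on some targets, a caller should not rely on it to reset counters.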

teja_profiler_update

Description

Takes a snapshot of the current profiling data and saves the snapshot in the log. All the events that were specified for the group with teja_profiler_start are updated. User-defined data to be logged with the profiler log entry can be passed as variable arguments. The maximum number of arguments is specified in the software architecture using the profiler_user_data_size process property.

Function

int teja_profiler_update(const teja_profiler_group_t group, ...);

Parameters

group - ID of the group for which you want to update profile data.

... - User-defined data to log with the profile record, up to the number of 64-bit words set by the profiler_user_data_size process property.

Return Values

int - 0 for success and -1 for error.
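
The sketch below shows one way a caller might tag each log record with a 64-bit word of user data. It is an illustration only: teja_profiler_update is replaced by a stub that stores the user data in a small array, the two STUB_ constants stand in for the process properties of TABLE 3-1, returning -1 when the log is full is an assumption of the stub, and log_batches is a hypothetical caller.

```c
#include <stdarg.h>
#include <stdint.h>

typedef uint64_t teja_profiler_group_t;   /* assumed underlying type */
#define TEJA_PROFILER_CMT_CPU 0x1         /* TABLE 3-3 */

/* Stand-ins for the two process properties of TABLE 3-1. */
#define STUB_LOG_TABLE_SIZE 4   /* profiler_log_table_size (default is 1024) */
#define STUB_USER_DATA_SIZE 1   /* profiler_user_data_size (default is 0) */

typedef struct {
    teja_profiler_group_t group;
    uint64_t user_data[STUB_USER_DATA_SIZE];
} stub_record_t;

static stub_record_t stub_log[STUB_LOG_TABLE_SIZE];
static int stub_log_len;

/* Stub: stores only the user data; a real target also snapshots the counters.
 * Returning -1 on a full log is an assumption of this stub. */
int teja_profiler_update(const teja_profiler_group_t group, ...)
{
    va_list ap;
    int i;

    if (stub_log_len >= STUB_LOG_TABLE_SIZE)
        return -1;
    stub_log[stub_log_len].group = group;
    va_start(ap, group);
    for (i = 0; i < STUB_USER_DATA_SIZE; i++)
        stub_log[stub_log_len].user_data[i] = va_arg(ap, uint64_t);
    va_end(ap);
    stub_log_len++;
    return 0;
}

/* Hypothetical caller: logs one record per batch, tagged with its packet count.
 * Returns the number of records actually logged. */
int log_batches(const uint64_t *packets_per_batch, int batches)
{
    int i;
    for (i = 0; i < batches; i++) {
        /* The cast guarantees that exactly one 64-bit word is passed. */
        if (teja_profiler_update(TEJA_PROFILER_CMT_CPU,
                                 (uint64_t)packets_per_batch[i]) != 0)
            return i;
    }
    return batches;
}
```

Passing each variadic argument as an explicit 64-bit value avoids surprises from default argument promotion.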

teja_profiler_get_values

Description

Takes a snapshot of the current profiling data and returns it in the values parameter. All the events that were specified for the group with teja_profiler_start are returned.

Function

int teja_profiler_get_values(const teja_profiler_group_t group,
                             teja_profiler_values_t *values);

Parameters

group - ID of the group for which you want to get the profiler data.

values - User-allocated data structure that will be filled with the profiler data.

Return Values

int - Returns overflow information, or -1 for error.
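
Because the return value carries either overflow information or the error code, a caller may want to separate the two cases. The following sketch assumes a stub teja_profiler_get_values that returns 0 to mean "no overflow" (the actual encoding is target-specific and not described here), an assumed TEJA_PROFILER_MAX_EVENTS of 2, and a hypothetical wrapper named read_cpu_snapshot.

```c
#include <stddef.h>
#include <stdint.h>

#define TEJA_PROFILER_MAX_EVENTS 2          /* assumed; target-dependent */
typedef uint64_t teja_profiler_group_t;     /* assumed underlying type */
typedef struct teja_profiler_values_t {
    uint64_t events[TEJA_PROFILER_MAX_EVENTS];
} teja_profiler_values_t;
#define TEJA_PROFILER_CMT_CPU 0x1           /* TABLE 3-3 */

/* Fake counter contents used by the stub below. */
static uint64_t stub_counters[TEJA_PROFILER_MAX_EVENTS] = { 5000, 12 };

/* Stub with the documented signature. Returning 0 for "no overflow" is assumed. */
int teja_profiler_get_values(const teja_profiler_group_t group,
                             teja_profiler_values_t *values)
{
    int i;
    if (group != TEJA_PROFILER_CMT_CPU || values == NULL)
        return -1;
    for (i = 0; i < TEJA_PROFILER_MAX_EVENTS; i++)
        values->events[i] = stub_counters[i];
    return 0;
}

/* Hypothetical wrapper: separates the error case from overflow information. */
int read_cpu_snapshot(teja_profiler_values_t *out, int *overflow_info)
{
    int rc = teja_profiler_get_values(TEJA_PROFILER_CMT_CPU, out);
    if (rc == -1)
        return -1;
    *overflow_info = rc;   /* passed through unchanged; meaning is target-specific */
    return 0;
}
```

The caller allocates the teja_profiler_values_t structure, as the Parameters section requires, and the function only fills it.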

teja_profiler_get_value

Description

Retrieves the value of a given event from a teja_profiler_values_t data structure.

Function

teja_profiler_value_t teja_profiler_get_value(teja_profiler_values_t *values, int index);

Parameters

values - Data structure that was filled by teja_profiler_get_values.

index - Index of the event to read (sequential number from 0 up to the maximum number of events specifiable in a group).

Return Values

teja_profiler_value_t - Returns the value of the given event.
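
Two snapshots can be compared event by event to obtain the counts for the code that ran between them. The sketch below is illustrative: teja_profiler_get_value is a stub that reads the events array directly, the typedefs and TEJA_PROFILER_MAX_EVENTS are assumed, the index layout (0 for completed instructions, 1 for the additional event) is an assumption of this sketch, and misses_per_kilo_instr is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define TEJA_PROFILER_MAX_EVENTS 2          /* assumed; target-dependent */
typedef uint64_t teja_profiler_value_t;     /* assumed underlying type */
typedef struct teja_profiler_values_t {
    uint64_t events[TEJA_PROFILER_MAX_EVENTS];
} teja_profiler_values_t;

/* Stub with the documented signature: returns one entry of the events array. */
teja_profiler_value_t teja_profiler_get_value(teja_profiler_values_t *values,
                                              int index)
{
    assert(index >= 0 && index < TEJA_PROFILER_MAX_EVENTS);
    return values->events[index];
}

/* Index layout assumed for this sketch only. */
enum { IDX_INSTR = 0, IDX_MISS = 1 };

/* Hypothetical helper: misses per 1000 completed instructions between two
 * snapshots, using integer division. Returns 0 if no instruction completed. */
uint64_t misses_per_kilo_instr(teja_profiler_values_t *before,
                               teja_profiler_values_t *after)
{
    teja_profiler_value_t instr = teja_profiler_get_value(after, IDX_INSTR)
                                - teja_profiler_get_value(before, IDX_INSTR);
    teja_profiler_value_t miss  = teja_profiler_get_value(after, IDX_MISS)
                                - teja_profiler_get_value(before, IDX_MISS);
    if (instr == 0)
        return 0;
    return (miss * 1000) / instr;
}
```

For example, 40 misses over 20000 completed instructions yields 2 misses per 1000 instructions.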

teja_profiler_dump

Description

Dumps the profile data to stdout. The profile data consists of the profiler records that have been collected so far for the specified thread.

Function

int teja_profiler_dump(teja_thread_t thread);

Parameters

thread - Thread identifier for which the profiler dump is requested.

Return Values

int - Returns 0 for success and -1 for error.


CMT-Specific Profiler Constants

CMT-Specific Profiler Groups


TABLE 3-3 CMT-Specific Profiler Groups

Group or Event - Description

TEJA_PROFILER_CMT_CPU (0x1) - Captures events related to the CPU and cache. The events measured in this group are per CPU strand. The completed instructions count is always available for this group, and one additional event can be measured along with it. The following events are available for this group:

    TEJA_PROFILER_CMT_CPU_SB_FULL (0x1) - Measures the number of store buffer full cycles.
    TEJA_PROFILER_CMT_CPU_FP_INSTR_CNT (0x2) - Measures the number of floating-point instructions.
    TEJA_PROFILER_CMT_CPU_IC_MISS (0x4) - Measures the number of instruction cache misses.
    TEJA_PROFILER_CMT_CPU_DC_MISS (0x8) - Measures the number of data cache misses.
    TEJA_PROFILER_CMT_CPU_ITLB_MISS (0x10) - Measures the number of instruction TLB miss traps taken.
    TEJA_PROFILER_CMT_CPU_DTLB_MISS (0x20) - Measures the number of data TLB miss traps taken.
    TEJA_PROFILER_CMT_CPU_L2_IMISS (0x40) - Measures the number of secondary cache (L2) misses due to instruction cache requests.
    TEJA_PROFILER_CMT_CPU_L2_DMISS_LD (0x80) - Measures the number of secondary cache (L2) misses due to data cache load requests.
    TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED (0x100) - Measures the number of completed instructions.

TEJA_PROFILER_CMT_DRAM_CTL0 - Captures events related to DRAM reads, writes, and queues for DRAM controller 0. Each DRAM controller has its own group. The following events can be measured in this group:

    TEJA_PROFILER_CMT_DRAM_MEM_READS (0x1) - Read transactions.
    TEJA_PROFILER_CMT_DRAM_MEM_WRITES (0x2) - Write transactions.
    TEJA_PROFILER_CMT_DRAM_MEM_READ_WRITE (0x4) - Read + write transactions.
    TEJA_PROFILER_CMT_DRAM_BANK_BUSY_STALLS (0x8) - Bank busy stalls.
    TEJA_PROFILER_CMT_DRAM_RD_QUEUE_LATENCY (0x10) - Read queue latency.
    TEJA_PROFILER_CMT_DRAM_WR_QUEUE_LATENCY (0x20) - Write queue latency.
    TEJA_PROFILER_CMT_DRAM_RW_QUEUE_LATENCY (0x40) - Read + write queue latency.
    TEJA_PROFILER_CMT_DRAM_WR_BUF_HITS (0x80) - Write-back buffer hits.

TEJA_PROFILER_CMT_DRAM_CTL1 - Measures the same events as DRAM controller 0, but for DRAM controller 1.

TEJA_PROFILER_CMT_DRAM_CTL2 - Measures the same events as DRAM controller 0, but for DRAM controller 2.

TEJA_PROFILER_CMT_DRAM_CTL3 - Measures the same events as DRAM controller 0, but for DRAM controller 3.

TEJA_PROFILER_CMT_JBUS - Captures events related to JBus reads, writes, and cycles. The following events can be measured in this group:

    TEJA_PROFILER_CMT_JBUS_CYCLES (0x1) - JBus cycles.
    TEJA_PROFILER_CMT_JBUS_DMA_READS (0x2) - DMA read transactions (inbound).
    TEJA_PROFILER_CMT_JBUS_DMA_READ_LATENCY (0x4) - Total DMA read latency.
    TEJA_PROFILER_CMT_JBUS_DMA_WRITES (0x8) - DMA write transactions.
    TEJA_PROFILER_CMT_JBUS_DMA_WRITE8 (0x10) - DMA WR8 subtransactions.
    TEJA_PROFILER_CMT_JBUS_ORDERING_WAITS (0x20) - Ordering waits.
    TEJA_PROFILER_CMT_JBUS_PIO_READS (0x40) - PIO read transactions (outbound).
    TEJA_PROFILER_CMT_JBUS_PIO_READ_LATENCY (0x80) - Total PIO read latency.
    TEJA_PROFILER_CMT_JBUS_AOK_DOK_OFF_CYCLES (0x100) - AOK_OFF or DOK_OFF seen (cycles).
    TEJA_PROFILER_CMT_JBUS_AOK_OFF_CYCLES (0x200) - AOK_OFF seen (cycles).
    TEJA_PROFILER_CMT_JBUS_DOK_OFF_CYCLES (0x400) - DOK_OFF seen (cycles).
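
The description of the TEJA_PROFILER_CMT_CPU group states that the completed instructions count is always available and that one additional event can be measured along with it. A caller could check an event mask against that rule before starting the profiler. The sketch below is one reading of the rule, not target code: the typedef is assumed, the constant values are copied from TABLE 3-3, and cpu_event_mask_is_valid and CPU_ALL_EVENTS are hypothetical names.

```c
#include <stdint.h>

typedef uint64_t teja_profiler_event_t;   /* assumed underlying type */

/* CPU group event bits, copied from TABLE 3-3. */
#define TEJA_PROFILER_CMT_CPU_SB_FULL         0x1
#define TEJA_PROFILER_CMT_CPU_FP_INSTR_CNT    0x2
#define TEJA_PROFILER_CMT_CPU_IC_MISS         0x4
#define TEJA_PROFILER_CMT_CPU_DC_MISS         0x8
#define TEJA_PROFILER_CMT_CPU_ITLB_MISS       0x10
#define TEJA_PROFILER_CMT_CPU_DTLB_MISS       0x20
#define TEJA_PROFILER_CMT_CPU_L2_IMISS        0x40
#define TEJA_PROFILER_CMT_CPU_L2_DMISS_LD     0x80
#define TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED 0x100

/* Hypothetical: OR of all nine CPU event bits listed above. */
#define CPU_ALL_EVENTS ((teja_profiler_event_t)0x1FF)

/* Hypothetical check: returns 1 if the mask names only CPU group events and
 * at most one event besides the completed instructions count, else 0. */
int cpu_event_mask_is_valid(teja_profiler_event_t mask)
{
    teja_profiler_event_t extra;

    if (mask == 0 || (mask & ~CPU_ALL_EVENTS) != 0)
        return 0;   /* empty mask, or a bit that is not a CPU event */

    /* Drop the instruction count bit, then test for "zero or one bit set". */
    extra = mask & ~(teja_profiler_event_t)TEJA_PROFILER_CMT_CPU_INSTR_COMPLETED;
    return (extra & (extra - 1)) == 0;
}
```

Under this reading, the mask 0x104 (completed instructions plus instruction cache misses) is accepted, while 0xC (instruction and data cache misses together) is rejected.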