
man pages section 8: System Administration Commands


Updated: Wednesday, July 27, 2022
 
 

numatop(8)

Name

numatop - A tool for memory access locality characterization and analysis.

Synopsis

numatop [-s sampling_precision] [-l log_level] [-f log_file]
        [-d dump_file] [-h]

Description

Most modern multiprocessor systems use a Non-Uniform Memory Access (NUMA) design. In a NUMA system, memory and processors are organized so that some parts of memory are closer to a given processor while other parts are farther from it. A processor can access memory that is closer to it much faster than memory that is farther away. Hence, the latency between a processor and different portions of memory in a NUMA machine may differ significantly.

numatop is an observation tool for the runtime memory locality characterization and analysis of processes and threads running on a NUMA system. It helps the user characterize the NUMA behavior of processes and threads and identify where the NUMA-related performance bottlenecks reside. The tool can be used to:

  • Characterize the locality of all running processes and threads to identify those with the poorest locality in the system.

  • Identify the 'hot' memory areas, report average memory access latency, and provide the location where accessed memory is allocated. A 'hot' memory area is where process/thread(s) accesses are most frequent. numatop has a metric called ACCESS% that specifies what percentage of memory accesses are attributable to each memory area.


    Note - numatop records only memory accesses whose latency exceeds a predefined threshold.

  • Provide the call-chain(s) when the process/thread generates certain counter events, such as Remote Memory Access (RMA), Local Memory Access (LMA), Instruction Retired (IR), and CPU cycles (CYCLE). The call-chains help the user locate the source code that generates the events.

  • Provide per-node statistics for memory and CPU utilization. A node is a region of memory in which every byte has the same distance from each CPU.

  • Show, using a user-friendly interface, the list of processes/threads sorted by a chosen metric (by default, CPU utilization), with the top process having the highest CPU utilization in the system and the bottom one having the lowest. Users can also use hotkeys to re-sort the output by these metrics: RMA, LMA, RMA/LMA, CPU cycles per instruction (CPI), and CPU utilization (CPU%).

numatop is a GUI tool that periodically tracks and analyzes the NUMA activity of processes and threads and displays useful metrics. Users can scroll through the current window with the up and down keys, and can use the hotkeys shown at the bottom of the window to switch between windows or to change the running state of the tool. For example, hotkey R refreshes the data in the current window.

The tool supports the Intel Westmere-EX and Sandy Bridge-EP platforms.

Below is a detailed description of the various display windows and the data items that numatop displays:

WIN1 - Monitoring processes and threads

Get the locality characterization of all processes. This is the first window upon startup and numatop's Home window. This window displays a list of processes. The top process has the highest system CPU utilization (CPU%), while the bottom process has the lowest CPU% in the system. Generally, a memory-intensive process is also CPU-intensive, so the processes shown in WIN1 are sorted by CPU% by default. The user can use hotkeys 1, 2, 3, 4, or 5 to re-sort the output by RMA, LMA, RMA/LMA, CPI, or CPU%, respectively.


[KEY METRICS]:
RMA(K): number of Remote Memory Access (unit is 1000).
RMA(K) = RMA / 1000
LMA(K): number of Local Memory Access (unit is 1000).
LMA(K) = LMA / 1000
RMA/LMA: ratio of RMA / LMA.
CPI: CPU cycles per instruction.
CPU%: System CPU utilization (busy time across all CPUs).

[HOTKEY]:
'Q': Quit the application.
'H': WIN1 refresh.
'R': Refresh to show the latest data.
'I': Show the normalized data.
'N': Show the per-node statistics.
<Enter>: Switch to WIN3 for the selected process.
'1': Sort by 'RMA'.
'2': Sort by 'LMA'.
'3': Sort by 'RMA/LMA'.
'4': Sort by 'CPI'.
'5': Sort by 'CPU%'.
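
The WIN1 metrics and sort hotkeys above can be illustrated with a minimal Python sketch. It is not numatop's implementation: the ProcCounters type, its field names, and the sample counter values are hypothetical, and descending order for every sort key is an assumption based on the default CPU% ordering described above.

```python
from dataclasses import dataclass


@dataclass
class ProcCounters:
    """Raw per-process counters (hypothetical names, not numatop's)."""
    pid: int
    rma: int        # Remote Memory Accesses
    lma: int        # Local Memory Accesses
    cycles: int     # CPU cycles (CYCLE)
    ir: int         # Instructions Retired (IR)
    cpu_pct: float  # CPU utilization in percent


def win1_row(c: ProcCounters) -> dict:
    """Derive the WIN1 key metrics from the raw counters."""
    return {
        "PID": c.pid,
        "RMA(K)": c.rma / 1000,
        "LMA(K)": c.lma / 1000,
        # Guard against division by zero when no events were counted.
        "RMA/LMA": c.rma / c.lma if c.lma else 0.0,
        "CPI": c.cycles / c.ir if c.ir else 0.0,
        "CPU%": c.cpu_pct,
    }


# Hotkeys '1'..'5' select the sort column, as listed in [HOTKEY].
SORT_KEYS = {"1": "RMA(K)", "2": "LMA(K)", "3": "RMA/LMA", "4": "CPI", "5": "CPU%"}


def sort_rows(rows, hotkey="5"):
    """Sort rows in descending order by the metric bound to the hotkey."""
    return sorted(rows, key=lambda r: r[SORT_KEYS[hotkey]], reverse=True)


# Made-up counter values for two processes.
samples = [
    ProcCounters(pid=101, rma=250_000, lma=1_000_000,
                 cycles=8_000_000, ir=4_000_000, cpu_pct=12.5),
    ProcCounters(pid=202, rma=900_000, lma=300_000,
                 cycles=9_000_000, ir=3_000_000, cpu_pct=7.0),
]
rows = [win1_row(c) for c in samples]
for row in sort_rows(rows, hotkey="3"):
    print(row)
```

With these values, sorting by RMA/LMA places the process with the poorer locality (PID 202) first, even though it uses less CPU.
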
WIN2 - Monitoring processes and threads (normalized)

Get the normalized locality characterization of all processes.


[KEY METRICS]:
RPI(K): RMA normalized by 1000 instructions.
RPI(K) = RMA / (IR / 1000);
LPI(K): LMA normalized by 1000 instructions.
LPI(K) = LMA / (IR / 1000);
Other metrics remain the same.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'N': Show the per-node statistics.
<Enter>: Switch to WIN3 for the selected process.
'1': Sort by 'RPI'.
'2': Sort by 'LPI'.
'3': Sort by 'RMA/LMA'.
'4': Sort by 'CPI'.
'5': Sort by 'CPU%'.
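
The normalization used in WIN2 can be checked with a short Python sketch. The helper name per_kilo_instructions and the counter values are hypothetical; only the formula events / (IR / 1000) comes from the definitions above.

```python
def per_kilo_instructions(events: int, ir: int) -> float:
    """Events per 1000 retired instructions: events / (IR / 1000)."""
    if ir == 0:
        return 0.0  # nothing retired in this interval; avoid dividing by zero
    return events / (ir / 1000)


# Made-up counters for one process over one sampling interval.
rma, lma, ir = 250_000, 1_000_000, 4_000_000

rpi_k = per_kilo_instructions(rma, ir)  # RPI(K)
lpi_k = per_kilo_instructions(lma, ir)  # LPI(K)
print(f"RPI(K)={rpi_k:.2f} LPI(K)={lpi_k:.2f}")
```

Normalizing by retired instructions lets processes that did different amounts of work in the interval be compared on the same scale.
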
WIN3 - Monitoring the process

Get the locality characterization with node affinity of a specified process.


[KEY METRICS]:
NODE: the node ID.
CPU%: per-node CPU utilization.
Other metrics remain the same.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'N': Show the per-node statistics.
'L': Show the latency information.
'C': Show the call-chain.
<Enter>: Switch to WIN4 for the specified process.
WIN4 - Monitoring all threads

Get the locality characterization of all threads in a specified process.


[KEY METRICS]:
CPU%: per-CPU CPU utilization.
Other metrics remain the same.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'N': Show the per-node statistics.
WIN5 - Monitoring the thread

Get the locality characterization with node affinity of a specified thread.


[KEY METRICS]:
CPU%: per-CPU CPU utilization.
Other metrics remain the same.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'N': Show the per-node statistics.
'L': Show the latency information.
'C': Show the call-chain.
WIN6 - Monitoring memory areas

Get the memory area use with the associated accessing latency of a specified process/thread.


[KEY METRICS]:
ADDR: starting address of the memory area.
SIZE: size of the memory area (K/M/G bytes).
ACCESS%: percentage of memory accesses that are to this memory area.
LAT(ns): the average latency (nanoseconds) of memory accesses.
DESC: description of memory area (from /proc/<pid>/maps).

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'D': Show the memory access node distribution.
'M': Recalculate the address mapping.
<Enter>: Break down the memory area into physical memory on node.
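
The ACCESS% and LAT(ns) columns can be illustrated with a minimal Python sketch that attributes sampled accesses to memory areas. This is not how numatop collects its data: the AREAS and SAMPLES values, the summarize function, and its threshold_ns parameter are all hypothetical, and ACCESS% is assumed here to be relative to the accesses that were recorded and fell inside a known area.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical memory areas as (start_address, size_in_bytes, description).
AREAS = [
    (0x1000, 0x1000, "heap"),
    (0x8000, 0x2000, "stack"),
]

# Hypothetical sampled accesses as (address, latency_ns).
SAMPLES = [(0x1010, 120), (0x1800, 80), (0x8100, 300), (0x9FF0, 100), (0x5000, 50)]


def summarize(areas, samples, threshold_ns=0):
    """Return {area_start: (ACCESS%, LAT(ns))} for the recorded samples."""
    areas = sorted(areas)
    starts = [a[0] for a in areas]
    count = defaultdict(int)
    total_lat = defaultdict(int)
    for addr, lat in samples:
        if lat <= threshold_ns:
            continue  # only accesses slower than the threshold are recorded
        i = bisect_right(starts, addr) - 1  # last area starting at or below addr
        if i < 0:
            continue  # address lies below the first area
        start, size, _desc = areas[i]
        if addr >= start + size:
            continue  # address falls in a gap between areas
        count[start] += 1
        total_lat[start] += lat
    recorded = sum(count.values())
    return {
        start: (100.0 * count[start] / recorded, total_lat[start] / count[start])
        for start in count
    }


for start, (access_pct, lat_ns) in sorted(summarize(AREAS, SAMPLES).items()):
    print(f"ADDR={start:#x} ACCESS%={access_pct:.1f} LAT(ns)={lat_ns:.1f}")
```

Raising threshold_ns drops the faster accesses from the summary, which shifts both ACCESS% and the average latency toward the slower areas.
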
WIN7 - Memory access node distribution overview

Get the percentage of memory accesses originated from the process/thread to each node.


[KEY METRICS]:
NODE: the node ID.
ACCESS%: percentage of memory accesses that are to this node.
LAT(ns): the average latency (nanoseconds) of memory accesses to this node.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'M': Recalculate the address mapping.
WIN8 - Break down the memory area into physical memory on node

Break down the memory area into the physical mapping on node with the associated accessing latency of a process/thread.


[KEY METRICS]:
NODE: the node ID.
Other metrics remain the same.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'M': Recalculate the address mapping.
WIN9 - Call-chain when process/thread generates the specified event

Shows the call-chains when the process generates RMA, LMA, CYCLE, or IR.


[KEY METRICS]:
Call-chain list: a list of call-chains.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.
'1': Show the call-chain for RMA.
'2': Show the call-chain for LMA.
'3': Show the call-chain for CYCLE.
'4': Show the call-chain for IR.
WIN10 - Node Overview

Shows the basic per-node statistics for this system.


[KEY METRICS]:
LG: node ID of this node.
MEM.ALL: total physical memory in this node.
MEM.FREE: free physical memory in this node.
CPU%: per-node CPU utilization.
Other metrics remain the same.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window (WIN).
'R': Refresh to show the latest data.
<Enter>: Show the information of the specified node.
WIN11 - Information of the node

Shows the memory use and CPU utilization for the specified node.


[KEY METRICS]:
CPU: array of logical CPUs which belong to this node.
CPU%: per-node CPU utilization.
Other metrics remain the same.

[HOTKEY]:
'Q': Quit the application.
'H': Switch to WIN1.
'B': Back to previous window.
'R': Refresh to show the latest data.

Options

The following options are supported:

-s sampling_precision

Specifies the sampling precision. The valid values are:

normal

Balance the precision and overhead (default)

high

High sampling precision (high overhead)

low

Low sampling precision, suitable for a heavily loaded system

-l log_level

Specifies the level of logging in the log file. The valid values are:

0

None (default)

1

Unknown (reserved)

2

All

-f log_file

Specifies the log file where output will be written.

-d dump_file

Specifies the dump file where the screen data will be written.

-h

Displays the command's usage.
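
When numatop is launched from a script, the options above can be validated before the command is run. The following Python sketch is illustrative only: build_numatop_argv is a hypothetical helper, not part of numatop, and it only builds the argument list without executing anything.

```python
def build_numatop_argv(precision=None, log_level=None, log_file=None, dump_file=None):
    """Build an argument list for numatop from the documented options."""
    argv = ["numatop"]
    if precision is not None:
        # -s accepts only the three documented precision names.
        if precision not in ("normal", "high", "low"):
            raise ValueError(f"invalid sampling precision: {precision!r}")
        argv += ["-s", precision]
    if log_level is not None:
        # -l accepts 0 (none), 1 (reserved), or 2 (all).
        if log_level not in (0, 1, 2):
            raise ValueError(f"invalid log level: {log_level!r}")
        argv += ["-l", str(log_level)]
    if log_file is not None:
        argv += ["-f", log_file]
    if dump_file is not None:
        argv += ["-d", dump_file]
    return argv


print(build_numatop_argv(precision="high", log_level=2, log_file="/tmp/numatop.log"))
```

The resulting list can be passed to subprocess.run() by a process that has the privileges numatop requires.
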

Examples

Example 1 Launching numatop With Default Behavior

The following command launches the tool with default values for the supported options:

# numatop
Example 2 Launching numatop With High Sampling Precision

The following command launches the tool with high sampling precision:

# numatop -s high
Example 3 Specifying a Log File

The following command sets the log file to /tmp/numatop.log and, with log level 2, writes all log messages to it.

# numatop -l 2 -f /tmp/numatop.log
Example 4 Specifying a Dump File

The following command sets the dump file to /tmp/dump.log and dumps all screen data into it.

# numatop -d /tmp/dump.log

Exit Status

The following exit values are returned:

0

Successful operation.

Other Value

An error occurred.

Usage

You must have root privileges to run numatop.

Attributes

See attributes(7) for descriptions of the following attributes:

ATTRIBUTE TYPE
ATTRIBUTE VALUE
Architecture
x86
Availability
diagnostic/numatop
Interface Stability
Committed