1.1.7 NUMA

Many modern multiprocessor systems use a non-uniform memory access (NUMA) design, in which the performance of a process can depend on whether the memory it accesses is attached to the local CPU or to a remote CPU. Because performance depends on memory locality, the operating system should ideally schedule a process to run on the CPU whose memory controller is connected to the memory that the process accesses.
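
To make the notion of locality concrete, the following minimal sketch lists which CPUs belong to which NUMA node. It assumes the standard Linux sysfs layout (/sys/devices/system/node/nodeN/cpulist); the function names parse_cpulist and node_to_cpus are illustrative and are not part of UEK. On a system without that sysfs tree, the mapping is simply empty.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8-11" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            # An empty list (for example, a memory-only node) yields no CPUs.
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_to_cpus(sysfs_root="/sys/devices/system/node"):
    """Return {node id: [CPU ids]}; empty if the sysfs tree is absent."""
    mapping = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        match = re.search(r"node(\d+)$", os.path.dirname(path))
        if match is None:
            continue
        with open(path) as handle:
            mapping[int(match.group(1))] = parse_cpulist(handle.read())
    return mapping


if __name__ == "__main__":
    for node, cpus in sorted(node_to_cpus().items()):
        print(f"node {node}: CPUs {cpus}")
```

A process running on one of the CPUs listed for a node has local access to that node's memory; access to memory on any other node is remote.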

The following notable NUMA features are implemented in UEK R4:

  • Support for NUMA affinity for unbound workqueues.

  • A new NUMA subsystem provides improved performance on NUMA systems. New NUMA policies attempt to place a process near its memory, and handle cases such as pages shared between processes and transparent huge pages. (3.8, 3.13)

    The following sysctl parameters allow you to enable, disable and tune NUMA scheduling:


    Initial scan delay, in milliseconds, applied to a task when it is first forked.


    Maximum delay in milliseconds between scans of a task's memory.


    Minimum delay in milliseconds between scans of a task's memory.


    Resets the scan delay period.


    Amount of memory in megabytes scanned per scan.

    For more information, see http://lwn.net/Articles/568870/.

  • The numa_balancing sysctl parameter enables or disables automatic NUMA memory balancing.

  • An improved algorithm for NUMA migrations that maximizes the performance of workloads that do not fit on a single NUMA node.

  • Memory zones are allocated by the page allocator in node order on 64-bit NUMA systems by default.
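
The NUMA balancing settings described in the list above can be inspected from a script. The following is a minimal sketch that reads them from /proc/sys/kernel. The tunable names in ASSUMED_TUNABLES are an assumption taken from upstream kernel documentation, and whether a given UEK R4 kernel exposes each of them is not confirmed here; a parameter that is absent is reported as None. The function name read_tunables is illustrative.

```python
import os

# Assumed names, following upstream kernels that expose NUMA balancing
# tunables under /proc/sys/kernel/. Some kernels omit or relocate them.
ASSUMED_TUNABLES = [
    "numa_balancing",
    "numa_balancing_scan_delay_ms",
    "numa_balancing_scan_period_min_ms",
    "numa_balancing_scan_period_max_ms",
    "numa_balancing_scan_size_mb",
]


def read_tunables(names=ASSUMED_TUNABLES, root="/proc/sys/kernel"):
    """Return {name: value string, or None if the parameter is unavailable}."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(root, name)) as handle:
                values[name] = handle.read().strip()
        except OSError:
            # Missing file or insufficient permission: treat as unavailable.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tunables().items():
        print(f"kernel.{name} = {value if value is not None else '(not available)'}")
```

The sketch only reads values. To change a setting, write the new value to the same file as root, or use the sysctl command.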