1.1.2 Core Kernel Functionality

The following notable core kernel features are implemented in UEK R4:

  • The performance of SPECjbb is improved on systems with more than 10 CPUs by removing contention for the global epmutex lock, which is used in EPOLL_CTL_ADD and EPOLL_CTL_DEL operations. For example, in a typical 16-socket run, performance increases from 35k jOPS to 125k jOPS. Benchmarks also exhibit good scaling from 10 sockets to over 40 sockets.
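As a rough illustration of the operations involved, the following Python sketch registers a socket with an epoll instance and then removes it. It assumes a Linux system; Python's select.epoll register() and unregister() methods correspond to EPOLL_CTL_ADD and EPOLL_CTL_DEL respectively. The function name epoll_roundtrip is hypothetical, and the sketch does not measure the contention described above.

```python
import select
import socket


def epoll_roundtrip():
    """Register a socket with epoll, wait until it is readable, then remove it."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        # register() issues epoll_ctl() with EPOLL_CTL_ADD.
        ep.register(reader.fileno(), select.EPOLLIN)
        writer.send(b"ping")
        # Returns a list of (fd, event_mask) tuples for ready descriptors.
        events = ep.poll(timeout=1.0)
        # unregister() issues epoll_ctl() with EPOLL_CTL_DEL.
        ep.unregister(reader.fileno())
        return events, reader.recv(4)
    finally:
        ep.close()
        reader.close()
        writer.close()
```

A workload such as SPECjbb performs many such add and delete operations concurrently from different threads, which is where the lock contention arose.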

  • The sysctl_numa_balancing_settle_count parameter used by the NUMA scheduler has been removed.

  • The following tracepoints are now provided to monitor NUMA scheduler activity:

    sched_move_numa
      Triggered when a task is moved to a node.

    sched_stick_numa
      Triggered when a NUMA migration fails.

    sched_swap_numa
      Triggered when a task is swapped for another task.
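The following Python sketch shows one way that such tracepoints might be switched on or off through the kernel tracing file system. It assumes that the three tracepoints are named sched_move_numa, sched_stick_numa, and sched_swap_numa in the sched event group, and that the tracing directory is mounted at /sys/kernel/debug/tracing; both are assumptions to verify on your system. The function name set_numa_tracepoints is hypothetical.

```python
from pathlib import Path

# Assumed names of the NUMA scheduler tracepoints (sched event group).
NUMA_TRACEPOINTS = ("sched_move_numa", "sched_stick_numa", "sched_swap_numa")


def set_numa_tracepoints(enabled, tracing_root="/sys/kernel/debug/tracing"):
    """Write 1 or 0 to each tracepoint's enable file and return the files written."""
    written = []
    for name in NUMA_TRACEPOINTS:
        enable_file = Path(tracing_root) / "events" / "sched" / name / "enable"
        enable_file.write_text("1" if enabled else "0")
        written.append(enable_file)
    return written
```

On a real system, writing to these files requires root privileges, and the recorded events would typically be read from the trace file in the same tracing directory.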

  • The new SCHED_STACK_END_CHECK kernel debugging option can be used to check for a stack overrun on calls to schedule() on a NUMA system. If the stack end location has been overwritten, the system panics because the content of the corrupted region cannot be trusted.

  • Sysbench performance has been improved by preventing spurious active NUMA migration.

  • CPU clock frequency scaling is available for performance management. The possible governor settings, as displayed by /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, are:

    ondemand

    Sets the CPU clock frequency between the minimum and maximum possible frequencies according to current demand. The following sysfs parameters are adjustable:

      ignore_nice_load
        Whether processes with a nice value count (0) or do not count (1) toward CPU usage. The default value is 0.

      powersave_bias
        How much to reduce the target CPU frequency, expressed in thousandths of the target frequency. A value of 0 disables this feature.

      sampling_down_factor
        A multiplier that the kernel applies to sampling_rate when the CPU is running at its maximum clock frequency. The default value is 1.

      sampling_rate_min
        Minimum sampling rate.

      sampling_rate
        Interval in microseconds between assessments of whether the kernel needs to change the clock frequency.

      up_threshold
        Threshold of average CPU usage, as a percentage, above which the kernel increases the clock frequency.

    ondemand is the default governor setting if tuned is not configured.

    On more recent CPU microarchitectures (for example, Haswell, Broadwell, and later), with which the pstate power scaling driver can interact, this setting is equivalent to powersave. On older CPU microarchitectures (for example, Ivy Bridge, Sandy Bridge, and earlier), ondemand is equivalent to performance because the cores must be kept in a higher power state to minimize CPU latency.


    performance

    Sets the CPU clock frequency to the maximum possible frequency.


    performance is the default governor setting for the tuned throughput-performance profile.

    The performance governor setting is appropriate for some real-time applications, but it might not be appropriate for all workloads. Running a CPU at maximum frequency can prevent turbo mode from being enabled because doing so would exceed the thermal envelope.


    powersave

    Sets the CPU clock frequency to the minimum possible frequency.


    userspace

    Permits a user-space program running as an effective root user to control the CPU clock frequency by using a file named scaling_setspeed in the CPU-device directory under sysfs.
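The governor settings described above can be inspected and changed through sysfs. The following Python sketch is one possible way to do this, assuming the per-CPU scaling_governor files shown earlier; the function names read_governors and set_governor are hypothetical, and the cpu_root parameter exists only so that the functions can be exercised against a test directory.

```python
from pathlib import Path

GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return a {cpu_name: governor} mapping for every CPU that exposes cpufreq."""
    governors = {}
    for gov_file in sorted(Path(cpu_root).glob(GOVERNOR_GLOB)):
        cpu_name = gov_file.parent.parent.name  # for example, "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def set_governor(governor, cpu_root="/sys/devices/system/cpu"):
    """Write the governor name to each CPU's scaling_governor file."""
    changed = []
    for gov_file in sorted(Path(cpu_root).glob(GOVERNOR_GLOB)):
        gov_file.write_text(governor)
        changed.append(gov_file.parent.parent.name)
    return changed
```

On a real system, set_governor requires root privileges, and the kernel rejects governor names that the active scaling driver does not provide. If tuned is running, it might later override a setting made this way.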

    Oracle recommends that you use tuned-adm to select a tuned performance profile for your system that is based on its hardware and software configuration, for example:

    • If your system has Xeon processors or multiple disks, choose a profile such as latency-performance for a cloud server, throughput-performance for a database server, or virtual-host for a virtual host server.


      These profiles set the CPU governor setting to performance, which might not be appropriate for all workloads.

    • For a virtual machine guest, choose the virtual-guest profile.

    • For a laptop, choose a suitable laptop profile such as laptop-ac-powersave or laptop-battery-powersave.

    • For a desktop machine, choose either the desktop or balanced profile.

    You can use the tuned-adm list command to display the available profiles.

    If tuned is not configured, the default CPU governor setting is ondemand, which can cause some bursty, CPU-intensive workloads to run more slowly because of demand hysteresis.

    If necessary, you can create your own performance profiles based on the profiles that are provided in the /etc/tune-profiles directory hierarchy.

    When comparing system performance under different profiles, use benchmarks that simulate your server's typical workload.

    For more information, see the tuned(8) and tuned-adm(1) manual pages, which are available in the tuned package.
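The profile recommendations above can be sketched as a simple lookup that builds a tuned-adm command line. The dictionary keys and the function name tuned_adm_command are hypothetical labels chosen for this illustration; the profile names are those listed above, and the sketch assumes that a profile is selected with the tuned-adm profile subcommand.

```python
# Profile names come from the recommendations above. The keys are
# hypothetical labels for this sketch, not tuned terminology.
RECOMMENDED_PROFILES = {
    "cloud-server": "latency-performance",
    "database-server": "throughput-performance",
    "virtual-host": "virtual-host",
    "virtual-guest": "virtual-guest",
    "laptop-ac": "laptop-ac-powersave",
    "laptop-battery": "laptop-battery-powersave",
    "desktop": "desktop",  # the balanced profile is an equally valid choice
}


def tuned_adm_command(system_kind):
    """Return the argument list that selects the recommended profile."""
    try:
        profile = RECOMMENDED_PROFILES[system_kind]
    except KeyError:
        raise ValueError(f"no recommendation for {system_kind!r}") from None
    return ["tuned-adm", "profile", profile]
```

The returned list could be passed to subprocess.run() by a process running as root. Check the output of tuned-adm list first, because the set of installed profiles varies between systems.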