solaris-kz(7)

Name

solaris-kz - Solaris kernel zone

Description

The solaris-kz brand uses the branded zones framework described in brands(7) to run zones with a separate kernel and OS installation from that used by the global zone.

Installation and Update

A solaris-kz installation is independent of that of the global zone; it is not a pkg linked image and can be modified regardless of the global zone content. A solaris-kz zone can be installed in the same manner as other brands directly from the global zone, or via boot media as described below.

When specifying a manifest for installation, use one suitable for a global zone installation. As kernel zones always install into a known location for the root pool, an installation target disk should not be specified.

Boot environment (BE) management is independent of the global zone. BE creation in the global zone does not create a new BE in the zone. For more information, see the beadm(8) man page.
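For example, to list the zone's own boot environments from the global zone (the zone name myzone is hypothetical):

# zlogin myzone beadm list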

Process Management and Visibility

Because a solaris-kz zone, unlike zones of other brands, runs a separate kernel, some differences are apparent when examining the zone from the global zone.

Processes that are running in a solaris-kz zone are not directly accessible from the global zone. For example, to see the list of processes in a kernel zone named kz-zone, rather than using the ps command with the -z kz-zone option, you need to use the following command:

# zlogin kz-zone ps -e

The global zone and each kernel zone manage their own process ID space. Thus, process ID 1234 may exist in the global zone and in one or more kernel zones; these are distinct processes. If the global zone administrator wishes to kill process 1234 in kz-zone, it should be done with the following command or an equivalent:

# zlogin kz-zone kill 1234

ps(1) and similar tools run from the global zone will see processes associated with managing a solaris-kz zone instance, such as kzhost and zlogin-kz. This can be useful for debugging, but otherwise they are private implementation details.

Similarly, resource management functionality is different. For example, resource controls such as max-processes are not available when configuring a solaris-kz zone, as they are only meaningful when sharing a single kernel instance. That is, a process running inside a solaris-kz zone cannot take up a process table slot in the global zone, as the kernels are independent.

The zonestat utility displays the resource usage of the zone. The output is generally correct, but some values reflect the host rather than the zone. For example, resource control values such as lwps show the LWPs used on the host, not those used inside the zone.
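For example, to watch a kernel zone's resource usage at a five-second interval (the zone name kz-zone is hypothetical):

# zonestat -z kz-zone 5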

The solaris-kz brand uses certain hardware features that may not be available on older systems or in virtualized environments. To detect whether a system supports the solaris-kz brand, install the brand-solaris-kz package and then run the virtinfo command:

# virtinfo -c supported list kernel-zone

If kernel-zone is not shown in the supported list, check syslog for more information. Messages pertaining to kernel zones will contain the string kernel-zone.
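For example, assuming syslog writes to the default /var/adm/messages file, such messages can be found with:

# grep kernel-zone /var/adm/messages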

Stolen time, as reported by mpstat(8), iostat(8), vmstat(8), and other utilities, directly reflects the time when the kernel zone could not run because the host was using CPU resources for other purposes.

Storage Access

A solaris-kz brand zone must reside on one or more devices. A default zfs(8) volume will be created in the global zone's root zpool if the configuration is not customized prior to installation. The devices onto which the zone is installed are specified with device resources that have the bootpri property set to a non-negative integer value. If a device will not be used as a boot device, it must not have the bootpri property set. To unset bootpri, use clear bootpri while in the device resource scope. If multiple bootable devices are present during installation, they will be used for a mirrored root ZFS pool in the zone.

The bootpri property specifies a relative order with respect to other bootpri entries. The default boot order is determined by sorting the entries first by bootpri, where smaller values come before larger ones, and then by id if multiple devices have the same bootpri. For example:

# zonecfg -z vzl-129 info device
device:
    storage: dev:dsk/c0t0d0s0
    id: 0
    bootpri: 2
device:
    storage: dev:dsk/c0t1d0s0
    id: 1
    bootpri: 2
device:
    storage: dev:dsk/c0t2d0s0
    id: 2
    bootpri: 1

In the above example, the boot order is: the device with id 2 (bootpri 1), then id 0, then id 1 (both bootpri 2, ordered by id).

The zonepath cannot be set for a kernel zone. As an implementation detail, it is set to a fixed location using tmpfs(4FS), and it contains no persistent or otherwise user-serviceable data. As the zone root is contained within the root ZFS volume, it is not mounted in the global zone under the zone path, unlike traditional zones. The zone root can only be accessed through the zone itself, for example with zlogin.

A solaris-kz zone cannot directly take advantage of shared-kernel features such as ZFS datasets and file system mounts. Instead, storage is made available to the zone via block devices such as raw disks, ZFS volumes, and lofi devices.

A solaris-kz zone's root is always accessible. Storage can be added by using add device in zonecfg. Only the storage property can be set; the match property is not supported. A storage URI is used to configure the disk device. For more information, see the suri(7) man page.

A local device path storage URI can be set in the storage property. It can refer to a ZFS volume, a raw disk, or a lofi device. The specified device must be a whole disk or LUN; use the device path without any partition or slice suffix. For example:

# zonecfg -z myzone
zonecfg:myzone> add device
zonecfg:myzone:device> set storage=dev:/dev/rdsk/c4t9d0
zonecfg:myzone:device> set id=4
zonecfg:myzone:device> set bootpri=1

The id property can be specified to fix the disk address inside the zone. If it is not specified, one is automatically allocated.

A portable storage URI can also be configured in the storage property to make the zone's configuration portable to other host systems.

For example:

# zonecfg -z myzone
zonecfg:myzone> add device
zonecfg:myzone:device> set storage=nfs://user1:staff@host1/export/file1
zonecfg:myzone:device> set create-size=4g

To see information about the current configuration for device resources, use the info subcommand. For example:

# zonecfg -z myzone info device
device:
    storage: dev:/dev/zvol/dsk/rpool/VARSHARE/zones/myzone/disk0
    id: 0
    bootpri: 0
device:
    storage: nfs://user1:staff@host1/export/file1
    create-size: 4g
    id: 1
    bootpri not specified

You can also shorten the output by specifying the id:

# zonecfg -z myzone info device id=1
device:
    storage: nfs://user1:staff@host1/export/file1
    create-size: 4g
    id: 1
    bootpri not specified

To install a zone to a non-default location, to an iSCSI logical unit, for example, the device resource for the root disk must be modified. For example:

# zonecfg -z myzone
zonecfg:myzone> select device id=0
zonecfg:myzone:device> set storage=iscsi://host/luname.naa.0000abcd

At least one device must have bootpri set to a non-negative integer to indicate that it is bootable. Within a kernel zone, all devices that act as mirrors or spares for the root ZFS pool must be bootable. If the URI of a device cannot be mapped at boot time, for example, because the device is missing or the iSCSI storage is offline, booting the zone will fail.

Only storage devices are supported by add device for the solaris-kz brand.

SCSI Reservations

Setting the allow-mhd property to "true" allows applications to use the mhd(4I) SCSI reservation ioctls on the given device. This is possible only if the backend SCSI device supports reservation. Setting this property has the following impact on the zone:

  • Live migration and suspend/resume of the zone are disabled.

  • Live Zone Reconfiguration is disallowed for such devices.

  • The device cannot be shared at the same time by any other running zone on the host.
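For example, a minimal zonecfg sketch enabling SCSI reservations on an existing device resource (the zone name and device id are hypothetical):

# zonecfg -z myzone
zonecfg:myzone> select device id=1
zonecfg:myzone:device> set allow-mhd=true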

Network Access

Kernel zones must be exclusive stack. Network access is provided by adding net or anet resources for Ethernet datalinks and by adding anet resources for IPoIB datalinks. The datalink specified by these resources will be used as the backend of the datalinks visible in the zone. Both IPoIB and Ethernet network resources can be specified, and the datalinks visible in the zone will be of the corresponding media type. As with storage devices, an ID may be specified to identify the virtual NIC address inside the zone. Adding InfiniBand network links through net resources is not supported.

Kernel zones may themselves host zones (in which case they play the role of the global zone for those zones). Network access for the hosted zones is provided over Ethernet datalinks only, not over IPoIB datalinks. However, because the networking configuration of the kernel zone is partially defined by its zone configuration, hosted zones are restricted in which MAC addresses they may use.

Attempting to boot a zone with a mac-address setting of random is permitted in the following cases:

  1. If anet is configured with allowed-mac-address as any.

  2. If anet is configured with allowed-mac-address as 2:8:20, where 2:8:20 is the default OUI for VNICs.

Attempting to boot a zone with a specific mac-address setting is permitted if the user-specified MAC address matches an entry in the allowed-mac-address list. For more information about the allowed-mac-address anet property, see the zonecfg(8) man page.

To supply additional MAC addresses to a kernel zone, add them to the mac-address property for the relevant resource. For more information, see the zonecfg(8) man page. This makes the mac-address available as a factory address inside the kernel zone.

A hosted zone may then use that MAC address itself. To do this, set the mac-address property of the hosted zone either to the explicit MAC address configured above, or to auto. For details of these settings, see the zonecfg(8) man page.
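For example, a sketch that adds an extra factory MAC address through the nested anet:mac resource listed in the defaults table below (the zone name, anet id, and address are hypothetical):

# zonecfg -z kz-zone
zonecfg:kz-zone> select anet id=0
zonecfg:kz-zone:anet> add mac
zonecfg:kz-zone:anet:mac> set mac-address=2:8:20:12:34:56
zonecfg:kz-zone:anet:mac> end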

Memory Configuration

A fixed amount of host RAM must be allocated to a kernel zone. This is configured by the physical property of the capped-memory resource in zonecfg(8). The given value may be rounded up to a supported platform value. The allocated memory is locked, and hence not pageable to a swap device.

When specifying the physical property, you also need to specify the pagesize-policy property of the capped-memory resource in zonecfg(8). The pagesize-policy property specifies the policy that the solaris-kz brand uses to back its physical memory with large pages. The pagesize-policy property can only be used in conjunction with the physical property. The following keywords are acceptable for the pagesize-policy property:

largest-only

Only the largest page size possible for the kernel zone's physical memory is allocated. If not all pages can be allocated at that size, the zone fails to boot.

largest-available

The largest possible page size is attempted first, scaling down to a smaller page size if all physical memory cannot be allocated at a particular size. The priority is booting the zone.

smallest-only

The smallest page size allowed to boot the kernel zone on the particular platform is chosen.

Clearing the pagesize-policy property, or leaving it unset, is necessary to support the older suspend image format; this allows live migration and resume of a kernel zone from newer systems to older systems. In this case, the smallest page size allowed to boot the kernel zone on the particular platform is chosen.
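For example, a sketch that allocates 8 GB of physical memory with the largest-available policy (the zone name and size are hypothetical; select assumes an existing capped-memory resource, otherwise use add capped-memory):

# zonecfg -z myzone
zonecfg:myzone> select capped-memory
zonecfg:myzone:capped-memory> set physical=8g
zonecfg:myzone:capped-memory> set pagesize-policy=largest-available
zonecfg:myzone:capped-memory> end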

CPU Configuration

As described in zonecfg(8), the virtual-cpu and dedicated-cpu resources can be used to define the CPUs available to the kernel zone. Typically, dedicated-cpu is used to isolate CPU resources for the sole use of the kernel zone, while virtual-cpu is used when sharing CPUs, to provide finer-grained control over the CPU resources available in the kernel zone.

CPU configuration behaves differently depending on how the virtual-cpu and dedicated-cpu resources are configured.

Case where neither the virtual-cpu nor the dedicated-cpu resource is specified:

The kernel zone gets four virtual CPU threads that compete for compute time with all other application threads on the physical host system.

Case where the virtual-cpu resource is specified but not the dedicated-cpu resource:

For a virtual-cpu:ncpus value of N, the kernel zone gets N virtual CPU threads that compete for or share compute time with all other application threads on the physical host system.

Case where the dedicated-cpu resource is specified but not the virtual-cpu resource:

The kernel zone gets a number of virtual CPU threads equal to the equivalent number of CPUs in the dedicated-cpu resource, which might be expressed in terms of CPUs, cores, sockets, and so on. The virtual CPU threads have sole use of the dedicated compute resources and do not share them with other application threads running on the same physical host.

Case where both the virtual-cpu and the dedicated-cpu resources are specified:

This is not a common use case. It allows you to override the number of virtual CPU threads so that it does not exactly match the equivalent number of dedicated CPU resources. However many virtual CPUs are specified, they still have sole use of the dedicated CPU resources. The virtual CPU count may not exceed the dedicated CPU count.

Using a range for the dedicated-cpu resource is not recommended. The number of virtual CPUs created for a kernel zone is fixed at the time the kernel zone is booted. For a dedicated-cpu zone with an ncpus range, the number of CPUs can be anywhere in the range. If more CPUs are later automatically added to the zone's pset, the kernel zone will be unable to use them, causing them to sit idle. When CPUs are automatically removed from the zone's pset, the guest can become severely overcommitted, that is, have more virtual CPUs than physical CPUs, resulting in poor performance.
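For example, a sketch that fixes the zone at eight virtual CPU threads (the zone name and count are hypothetical):

# zonecfg -z myzone
zonecfg:myzone> add virtual-cpu
zonecfg:myzone:virtual-cpu> set ncpus=8
zonecfg:myzone:virtual-cpu> end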

Suspend, Resume, and Warm Migration

Kernel zones may be suspended to disk by the zoneadm suspend command. The running state of the zone is written to disk. As this includes the entire RAM used by the zone, suspending can take a significant amount of time and space.

Suspend and resume are supported for a kernel zone only if it has a suspend resource in its configuration. Within a suspend resource, the path or storage (but not both) must be specified. The path property specifies the name of a file that will contain the suspend image. The directory containing the file must exist and be writable by the root user. Any file system that is mounted prior to the start of svc:/system/zones:default may be used. The storage property specifies the storage URI (see suri(7)) of a disk device that will contain the suspend image. The whole device will be used. This device may not be shared with anything else.
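For example, a sketch that configures a file-backed suspend image (the zone name and path are hypothetical; use select suspend instead of add suspend if the resource already exists):

# zonecfg -z myzone
zonecfg:myzone> add suspend
zonecfg:myzone:suspend> set path=/export/suspend/myzone.image
zonecfg:myzone:suspend> end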

The suspend image is compressed prior to writing. As such, the size of the suspend image will typically be significantly smaller than the size of the zone's RAM. During suspend, a message is printed and logged to the console log indicating the size of the suspend image.

After compression, the suspend image is encrypted using AES-128-CCM. The encryption key is automatically generated by /dev/random (see random(4D) man page) and is stored in the keysource resource's raw property.

If a zone is suspended, the zoneadm boot command will resume it. The boot -R option can be used to boot afresh if a resume is not desired.

If the suspend image and the rest of the zone's storage are accessible by multiple hosts (typically by using the suspend:storage and device:storage properties), the suspend image can be used to support warm migration: use the zoneadm migrate command described in zoneadm(8), but run zoneadm suspend instead of zoneadm shutdown before the migration. This avoids most of the zone startup cost on the destination host, apart from the time spent resuming.

Warm migration does not check for compatibility between the source and destination hosts.

The source and destination hosts must be the same platform. On x86, the vendor (AMD or Intel) as well as the CPU model name must match. On SPARC, the hardware platform must be the same. For example, you cannot warm migrate from a T4 host to a T5 host. If you want to migrate between different hardware platforms, you need to set the cpu-arch property to an appropriate migration class.

The possible migration classes on SPARC platforms are:

generic

The kernel zone can perform CPU-type-independent migration, but not to a system older than a T4.

migration-class1

The kernel zone can perform cross-CPU-type migration between SPARC T4, SPARC T5, SPARC M5, SPARC M6, SPARC M7, SPARC T7, and SPARC S7 systems.

sparc64-class1

The kernel zone can perform cross-CPU-type migration between Fujitsu M10 and Fujitsu SPARC M12 systems.

If no value is set, the kernel zone's CPU migration class is the same as the host's. It can migrate between CPU types that are compatible with the host's CPU class.

Note that the kernel zone's CPU migration class cannot exceed the host's CPU class.

Also note that performance counters are not available when cpu-arch is set to a migration class.

The possible migration classes on Intel platforms are:

migration-class1

The kernel zone can perform cross-CPU-type migration between CPUs of the Nehalem or later microarchitectures. Features supported by this class are: sse, sse2, sse3, sse4.1, sse4.2, ssse3, cx8, cx16, pdcm, popcnt, fpu, pse, pse36, tsc, tscp, msr, pae, mce, sep, pge, cmov, clfsh, mmx, fxsr, htt, ss, ahf64, sysc, nx-bit, long-mode.

migration-class2

The kernel zone can perform cross-CPU-type migration between CPUs of the Westmere or later microarchitectures. Features supported by this class are: all features supported by migration-class1, plus pclmulqdq, aes, 1g-page.

migration-class3

The kernel zone can perform cross-CPU-type migration between CPUs of the Sandy Bridge or later microarchitectures. Features supported by this class are: all features supported by migration-class2, plus xsave, avx.

migration-class4

The kernel zone can perform cross-CPU-type migration between CPUs of the Ivy Bridge or later microarchitectures. Features supported by this class are: all features supported by migration-class3, plus f16c, rdrand, efs.

migration-class5

The kernel zone can perform cross-CPU-type migration between CPUs of the Haswell or later microarchitectures. Features supported by this class are: all features supported by migration-class4, plus fma, movbe, bmi1, bmi2, avx2, lzcnt.

migration-class6

The kernel zone can perform cross-CPU-type migration between CPUs of the Broadwell or later microarchitectures. Features supported by this class are: all features supported by migration-class5, plus rdseed, adx, prfchw.

Note that performance counters are not available when cpu-arch is set to a migration class. In kernel zones generally, only the strand- or hyperthread-specific CPU performance counters are available; some commands, such as busstat and daxstat, which reference other kinds of counters, may not work in kernel zones.

There are no migration classes applicable to AMD CPUs.

If no value is set, the kernel zone can migrate between CPUs of the same microarchitecture, or of the exact same type if the microarchitecture cannot be recognized.

Also, besides the migration class, you may need to specify a host compatibility level in the host-compatible property to ensure that the hardware features supported by the versions of Oracle Solaris running on the source and target hosts match.
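For example, a sketch that sets a migration class on a halted zone (the zone name is hypothetical):

# zonecfg -z myzone set cpu-arch=migration-class1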

On resume, the current configuration of the zone is used to boot, which allows specifying a new configuration. However, there are restrictions, as the resuming zone expects a particular setup. Any incompatibilities will cause boot to fail. For example, the boot process might fail if:

  • The CPU supports different features (for example, see cpuid(4D))

  • The configuration has a different capped-memory value

  • The configuration defines a different number of virtual CPUs

  • A disk is missing (no device resource with a suitable id property)

  • A virtual NIC is missing (no net or anet resource with a suitable id property)

No specific check for storage identification is done. It is the administrator's responsibility to ensure that the device listed under a particular id is the one that the zone is expecting to see.

Live and Cold Migration

Kernel zones can be cold or live migrated to compatible hosts by using the zoneadm migrate command, as described in the zoneadm(8) man page.

For live and cold migration, the following services and packages must be configured:

  • The package pkg://system/management/rad/module/rad-zonemgr must be installed on both the target and the source system.

  • The svc:/system/rad:local or svc:/system/rad:remote instance must be enabled, depending on the RAD URI used by the zoneadm migrate command.

  • The instance svc:/system/rad:local must be enabled on the source system.

For live migration alone, the service svc:/network/kz-migr:stream must additionally be enabled on the destination system.

Live migration has the same compatibility restrictions as described in the Suspend, Resume, and Warm Migration section above.
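For example, once the services above are configured, a live migration might be initiated as follows (the zone and host names are hypothetical; see zoneadm(8) for the supported RAD URIs):

# zoneadm -z myzone migrate ssh://host2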

Auxiliary State

The following auxiliary states (as shown by zoneadm list -is) are defined for this brand:

suspended

The zone has been suspended and will resume on next boot. Note that the zone must be attached before this state is visible.

debugging

The zone is in the running state, but the kernel debugger (kmdb) is running within the zone, so the zone cannot service network requests and the like. Connect to the zone console to interact with the debugger.

panicked

The zone is in the running state, but has panicked. The host is not affected.

migrating-out

The zone is fully running, but is being live migrated to another host.

migrating-in

The zone is booted on the host and is receiving the live migration image; it is not fully running until migration is complete.

no-config

The zone is known to the system, but its configuration is missing. The state of such a zone is always incomplete.

Host Data

Each of a kernel zone's bootable devices contains state information known as host data. This data keeps track of where a zone is in use, if it is suspended, and other state information. Host data is encrypted and authenticated with AES-128-CCM, using the same encryption key used for the suspend image.

As a kernel zone is readied or booted, the host data is read to determine if the kernel zone's boot storage is in use on another system. If it is in use by another system, the kernel zone will enter the unavailable state and an error message will indicate which system is using it. If it is certain that the storage is not in use on the other system, the kernel zone can be repaired by using the -x force-takeover extended option to zoneadm attach. See the warning below before executing this command.

If the encryption key is inaccessible, the host data and any suspend image will not be readable. In such a circumstance, any attempt to ready or boot the zone will cause the zone to enter the unavailable state. If recovery of the encryption key is not possible, the -x initialize-hostdata extended option to the zoneadm attach subcommand can be used to generate a new encryption key and host data. See the warning below before executing this command.


Note -  WARNING: Forcing a takeover or reinitialization of host data makes it impossible to detect whether the zone is in use on any other system. Running multiple instances of a zone that reference the same storage will lead to irreparable corruption of the zone's file systems.
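For example, hedged sketches of the recovery commands described above (the zone name is hypothetical):

# zoneadm -z myzone attach -x force-takeover
# zoneadm -z myzone attach -x initialize-hostdata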

To prevent loss of the encryption key during a manual warm or cold migration, use zonecfg export on the source system to generate a command file to be used on the destination system. For example:

root@host1# zonecfg -z myzone export -f /net/.../myzone.cfg
root@host2# zonecfg -z myzone -f /net/.../myzone.cfg

Because myzone.cfg in this example contains the encryption key, it is important to protect its contents from disclosure.

Configuration

A solaris-kz brand zone can be configured by using the SYSsolaris-kz template.

The following zonecfg(8) resources and properties are not supported for this brand:

anet:address
capped-memory:locked
capped-memory:swap
dataset
device:allow-partition
device:allow-raw-io
fs
file-mac-profile
fs-allowed
ip-type
limitpriv
global-time
max-lwps
max-msg-ids
max-processes
max-sem-ids
max-shm-memory
rctl:zone.max-lofi
rctl:zone.max-swap
rctl:zone.max-locked-memory
rctl:zone.max-shm-memory
rctl:zone.max-shm-ids
rctl:zone.max-sem-ids
rctl:zone.max-msg-ids
rctl:zone.max-processes
rctl:zone.max-lwps
rootzpool
zpool

The following zonecfg(8) resources and properties are supported by the live zone reconfiguration for this brand:

anet (with exceptions stated below)
device
ib-vhca
ib-vhca:port
net (with exceptions stated below)
virtual-cpu

The following zonecfg(8) resources and properties are not supported by the live zone reconfiguration for this brand:

anet:allowed-address
anet:configure-allowed-address
anet:defrouter
anet:evs
anet:vport
capped-cpu (zone.cpu-cap)
capped-memory
cpu-shares (zone.cpu-shares)
dedicated-cpu
hostid
keysource
net:allowed-address
net:configure-allowed-address
net:defrouter
pool
rctl
scheduling-class
cpu-arch
tenant
host-compatible

Any changes made to the listed unsupported resources and properties in the persistent configuration will be ignored by live zone reconfiguration when the configuration is applied to the running zone.

Any attempt to modify the listed unsupported resources and properties in the live configuration will be refused.

Live reconfiguration of virtual-cpu is enabled for a kernel zone with a virtual NUMA topology until its first suspend. After a resume, live reconfiguration of virtual-cpu is disabled for such a kernel zone until its next reboot. Kernel zones without a virtual NUMA topology are not affected by this limitation.

Changes made to anet and net properties supported for the solaris-kz brand must be for the same media type.
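For example, a sketch of a live change to the virtual CPU count (the zone name and count are hypothetical; select assumes an existing virtual-cpu resource, and the apply subcommand of zoneadm(8) applies the persistent change to the running zone):

# zonecfg -z myzone "select virtual-cpu; set ncpus=6; end"
# zoneadm -z myzone apply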

There are specific defaults for properties supported for the solaris-kz brand, as listed below:

Resource        Property                    Default Value
global          zonepath                    /system/zones/%{zonename}
                autoboot                    false
                ip-type                     exclusive
                auto-shutdown               shutdown
net             configure-allowed-address   true
anet            mac-address                 auto
                lower-link                  auto
                link-protection             mac-nospoof
                linkmode                    cm
anet:mac        mac-address                 auto
ib-vhca         smi-enabled                 off
ib-vhca:port    pkey                        auto

Subcommands

For the list of solaris-kz brand-specific subcommand options, see zoneadm(8).

Examples

Example 1 Boot from a particular BE

# zoneadm -z myzone boot -- -Z rpool/ROOT/solaris

Example 2 Boot from an alternate boot device

# zoneadm -z myzone halt
# zoneadm -z myzone boot -- disk2

See Also

ai_manifest(5), archiveadm(8), brands(7), zfs(8), zlogin(1), zoneadm(8), zonecfg(8), zones(7)

Notes

VirtualBox can be used on the same host as kernel zones, but must be configured appropriately. See the VirtualBox documentation for more details.

Because kernel zones run in a separate Oracle Solaris kernel environment, they may crash and dump core just as a kernel in a global zone running on bare metal would. In such a case, the dump is saved in the kernel zone's storage and is found in the same place as any other Oracle Solaris crash dump, subject to the crash dump parameters configured by dumpadm(8).

A core dump of a kernel zone can also be generated from the host environment using the zoneadm savecore subcommand. Additionally, if a kernel zone crashes and attempts to dump a core image but is unable to save the core in the kernel zone's storage, it will request that the host attempt to save a core image as if a zoneadm savecore subcommand had been issued. The core will be saved in a location specified by coreadm(8); this will only succeed if coreadm(8) has a location configured for kernel zone core dumps and has them enabled.