This chapter provides a brief overview of Solaris zones technology and discusses potential problems that may be encountered by developers who are writing resource management applications. For more information on zones, see Part II, Zones, in System Administration Guide: Solaris Containers-Resource Management and Solaris Zones.
A zone is a virtualized operating system environment that is created within a single instance of the Solaris Operating System. Zones are a partitioning technology that provides an isolated, secure environment for applications. When you create a zone, you produce an application execution environment in which processes are isolated from the rest of the system. This isolation prevents a process that is running in one zone from monitoring or affecting processes that are running in other zones. Even a process running with superuser credentials cannot view or affect activity in other zones. A zone also provides an abstract layer that separates applications from the physical attributes of the machine on which the zone is deployed. Examples of these attributes include physical device paths and network interface names.
By default, all systems have a global zone. The global zone has a global view of the Solaris environment, much as the superuser does in the superuser model. All other zones are referred to as non-global zones. A non-global zone is analogous to an unprivileged user in the superuser model. Processes in non-global zones can control only the processes and files within that zone. System administration work is typically performed in the global zone. In rare cases where a system administrator needs to be isolated, privileged applications can be used in a non-global zone. In general, though, resource management activities take place in the global zone.
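Because resource management work normally happens in the global zone, a tool may want to check where it is running before it attempts system-wide operations. The following is a minimal sketch, assuming the Solaris getzoneid(3C) interface and the GLOBAL_ZONEID constant from <zone.h>. The helper names running_in_global_zone and report_zone are hypothetical, and the non-Solaris branch exists only so that the sketch compiles on platforms without zones.

```c
#include <assert.h>
#include <stdio.h>

#ifdef __sun
#include <zone.h>   /* getzoneid(3C), GLOBAL_ZONEID */
#endif

/*
 * Hypothetical helper: returns 1 when the caller runs in the global zone
 * and 0 when it runs in a non-global zone. On platforms without zones the
 * sketch assumes everything is "global" (an assumption, not a Solaris rule).
 */
int running_in_global_zone(void)
{
#ifdef __sun
    return getzoneid() == GLOBAL_ZONEID;
#else
    return 1;
#endif
}

/* Print a short note that describes the view the process has. */
void report_zone(FILE *out)
{
    assert(out != NULL);
    if (running_in_global_zone())
        fprintf(out, "global zone: system-wide view\n");
    else
        fprintf(out, "non-global zone: view limited to this zone\n");
}
```

A resource management tool could call running_in_global_zone() at startup and exit with a clear message instead of failing later on an individual system call.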
IP networking in a zone can be configured in two different ways, depending on whether the non-global zone is given its own exclusive IP instance or shares the IP layer configuration and state with the global zone. The shared-IP type is the default.
Exclusive-IP zones are assigned zero or more network interface names, and for those network interfaces they can send and receive any packets, snoop, and change the IP configuration, including IP addresses and the routing table. Note that those changes do not affect any of the other IP instances on the system.
All applications are fully functional in the global zone as they would be in a conventional Solaris environment. Most applications should run without problem in a non-global environment as long as the application does not need any privileges. If an application does require privileges, then the developer needs to take a close look at which privileges are needed and how a particular privilege is used. If a privilege is required, then a system administrator must assign the appropriate privilege to the application.
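One way to take that close look at run time is to ask whether a privilege is in the effective set before relying on it. The following is a minimal sketch, assuming the Solaris priv_ineffect(3C) function from <priv.h>. The helper name has_privilege is hypothetical, and the -1 result on other platforms is an assumption made only so that the sketch compiles there.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __sun
#include <priv.h>   /* priv_ineffect(3C) and the PRIV_* name macros */
#endif

/*
 * Hypothetical helper: returns 1 if the named privilege is in the
 * effective set of the calling process, 0 if it is not, and -1 if the
 * platform offers no privilege set to query.
 */
int has_privilege(const char *name)
{
    assert(name != NULL);
#ifdef __sun
    return priv_ineffect(name) ? 1 : 0;
#else
    (void)name;     /* no privilege model to consult in this sketch */
    return -1;
#endif
}
```

On Solaris, a caller would typically pass one of the name macros, for example has_privilege(PRIV_PROC_LOCK_MEMORY), and choose a fallback path when the result is 0.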
The known situations that a developer needs to investigate are as follows:
The ioctl(2) system call requires the PRIV_SYS_NET_CONFIG privilege to be able to unlock an anchor on a STREAMS module.
The link(2) and unlink(2) system calls require the PRIV_SYS_LINKDIR privilege to create a link or unlink a directory in a non-global zone. Applications that install or configure software or that create temporary directories could be affected by this limitation.
The PRIV_PROC_LOCK_MEMORY privilege is required for the mlock(3C), munlock(3C), mlockall(3C), munlockall(3C), and plock(3C) functions and the MC_LOCK, MC_LOCKAS, MC_UNLOCK, and MC_UNLOCKAS flags for the memcntl(2) system call. This privilege is a default privilege in a non-global zone. See Privileges in a Non-Global Zone in System Administration Guide: Solaris Containers-Resource Management and Solaris Zones for more information.
The mknod(2) system call requires the PRIV_SYS_DEVICES privilege to create a block (S_IFBLK) or character (S_IFCHR) special file. This limitation affects applications that need to create device nodes on the fly.
The IPC_SET flag in the msgctl(2) system call requires the PRIV_SYS_IPC_CONFIG privilege to increase the number of message queue bytes. This limitation affects any applications that need to resize the message queue dynamically.
The nice(2) system call requires the PRIV_PROC_PRIOCNTL privilege to change the priority of a process. This privilege is available by default in a non-global zone. Another way to change the priority is to bind the non-global zone in which the application is running to a resource pool, although scheduling processes in that zone is ultimately decided by the Fair Share Scheduler.
The P_ONLINE, P_OFFLINE, P_NOINTR, P_FAULTED, P_SPARE, and P_FORCED flags in the p_online(2) system call require the PRIV_SYS_RES_CONFIG privilege to return or change processor operational status. This limitation affects applications that need to enable or disable CPUs.
The PC_SETPARMS and PC_SETXPARMS flags in the priocntl(2) system call require the PRIV_PROC_PRIOCNTL privilege to change the scheduling parameters of a lightweight process (LWP).
System calls that need to manage processor sets (psets), including binding LWPs to psets and setting pset attributes, require the PRIV_SYS_RES_CONFIG privilege. This limitation affects the following system calls: pset_assign(2), pset_bind(2), pset_create(2), pset_destroy(2), and pset_setattr(2).
The SHM_LOCK and SHM_UNLOCK flags in the shmctl(2) system call require the PRIV_PROC_LOCK_MEMORY privilege to lock or unlock shared memory segments. If the application is locking memory for performance purposes, using the intimate shared memory (ISM) feature provides a potential workaround.
The swapctl(2) system call requires the PRIV_SYS_CONFIG privilege to add or remove swapping resources. This limitation affects installation and configuration software.
The uadmin(2) system call requires the PRIV_SYS_CONFIG privilege to use the A_REMOUNT, A_FREEZE, A_DUMP, and AD_IBOOT commands. This limitation affects applications that need to force crash dumps under certain circumstances.
The clock_settime(3RT) function requires the PRIV_SYS_TIME privilege to set the CLOCK_REALTIME and CLOCK_HIRES clocks.
The cpc_bind_cpu(3CPC) function requires the PRIV_CPC_CPU privilege to bind request sets to hardware counters. As a workaround, the cpc_bind_curlwp(3CPC) function can be used to monitor CPU counters for the LWP in question.
The pthread_attr_setschedparam(3C) function requires the PRIV_PROC_PRIOCNTL privilege to change the underlying scheduling policy and parameters for a thread.
The timer_create(3RT) function requires the PRIV_PROC_CLOCK_HIGHRES privilege to create a timer using the high-resolution system clock.
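An application can cope with the situations listed above by attempting the call and treating a privilege failure as a recoverable condition. The following is a minimal sketch that uses mlock(3C) as the example, assuming that a missing PRIV_PROC_LOCK_MEMORY privilege is reported as EPERM. The enum and the helper name try_lock are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

enum lock_result { LOCK_OK, LOCK_NO_PRIVILEGE, LOCK_OTHER_ERROR };

/*
 * Hypothetical helper: try to lock a region in memory and tell the
 * caller whether a failure looks like a missing privilege (EPERM) or
 * some other problem, such as a resource limit.
 */
enum lock_result try_lock(const void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    if (mlock(addr, len) == 0)
        return LOCK_OK;
    return (errno == EPERM) ? LOCK_NO_PRIVILEGE : LOCK_OTHER_ERROR;
}
```

When try_lock returns LOCK_NO_PRIVILEGE, the application can continue with unlocked memory and log a hint that the administrator might need to grant the privilege to the zone. The same pattern applies to the other calls in the list, with the errno values documented for each call.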
The APIs that are provided by the following list of libraries are not supported in a non-global zone. The shared objects are present in the zone's /usr/lib directory, so no link-time errors occur if your code includes references to these libraries. You can inspect your makefiles to determine if your application has explicit bindings to any of these libraries and use pmap(1) while the application is executing to verify that none of these libraries are dynamically loaded.
Zones have a restricted set of devices, consisting primarily of pseudo devices that form part of the Solaris programming API. These pseudo devices include /dev/null, /dev/zero, /dev/poll, /dev/random, /dev/tcp, and so on. Physical devices are not directly accessible from within a zone unless the device has been configured by a system administrator. Because devices are generally shared resources in a system, making devices available in a zone requires the following restrictions so that system security is not compromised:
The /dev name space consists of symbolic links, that is, logical paths, to the physical paths in /devices. The /devices name space, which is available only in the global zone, reflects the current state of attached device instances that have been created by the driver. Only the logical path /dev is visible in a non-global zone.
Processes within a non-global zone cannot create new device nodes. For example, mknod(2) cannot create special files in a non-global zone. The creat(2), link(2), mkdir(2), rename(2), symlink(2), and unlink(2) system calls fail with EACCES if a file in /dev is specified. You can create a symbolic link to an entry in /dev, but that link cannot be created in /dev.
The /dev name space consists of device nodes made up of a default, “safe” set of drivers as well as device nodes that have been specified for the zone by the zonecfg(1M) command.
For non-global zones that are configured to use the shared-IP instance, the following restrictions apply.
The socket(3SOCKET) function requires the PRIV_NET_RAWACCESS privilege to create a raw socket with the protocol set to IPPROTO_RAW or IPPROTO_IGMP. This limitation affects applications that use raw sockets or need to create or inspect TCP/IP headers.
The t_open(3NSL) function requires the PRIV_NET_RAWACCESS privilege to establish a transport endpoint. This limitation affects applications that use the /dev/rawip device to implement network protocols as well as applications that operate on TCP/IP headers.
Each non-global shared-IP zone has its own logical network and loopback interface. Bindings between upper layer streams and logical interfaces are restricted such that a stream may only establish bindings to logical interfaces in the same zone. Likewise, packets from a logical interface can only be passed to upper layer streams in the same zone as the logical interface. Bindings to the loopback address are kept within a zone with one exception: when a stream in one zone attempts to access the IP address of an interface in another zone. While applications within a zone can bind to privileged network ports, they have no control over the network configuration, including IP addresses and the routing table.
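A networking application can detect the raw access restriction when it opens its socket. The following is a minimal sketch, assuming that a missing PRIV_NET_RAWACCESS privilege is reported as EPERM or EACCES. The helper name open_raw_socket and its denied output parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create a raw IP socket. Returns the
 * descriptor on success or -1 on failure. *denied is set to 1 when the
 * failure looks like a missing privilege, and to 0 otherwise.
 */
int open_raw_socket(int *denied)
{
    int fd;

    assert(denied != NULL);
    *denied = 0;
    fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (fd < 0 && (errno == EPERM || errno == EACCES))
        *denied = 1;
    return fd;
}
```

When *denied is set, the application can fall back to ordinary TCP or UDP sockets or report that raw access has to be granted to the zone. On Solaris, programs that use socket(3SOCKET) are typically linked with -lsocket -lnsl.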
Note that these restrictions do not apply to exclusive-IP zones.