This chapter describes SunOS 5.0 through 5.8 virtual memory from the application developer's viewpoint. It identifies the capabilities that virtual memory makes available to application developers and that are not found in systems with fixed memory, and it describes the interfaces that SunOS 5.0 through 5.8 provides to use and control these capabilities.
In a system with fixed memory (non-virtual), the address space of a process occupies and is limited to a portion of the system's main memory.
In SunOS 5.0 through 5.8 virtual memory, the actual address space of a process occupies a file in the swap partition of disk storage (the file is called the backing store). Pages of main memory buffer the active (or recently active) portions of the process address space to provide code for the CPU(s) to execute and data for the program to process.
A page of address space is loaded when a CPU accesses an address that is not currently in memory, which causes a page fault. Because execution cannot continue until the page fault is resolved by reading the referenced page into memory, the process sleeps until the page has been read.
The most obvious difference between the two memory systems for the application developer is that virtual memory lets applications occupy much larger address spaces. Less obvious advantages of virtual memory are much simpler and more efficient file I/O and very efficient sharing of memory between processes.
Because backing store files (the process address space) exist only in swap storage, they are not included in the UNIX named file space. (This makes backing store files inaccessible to other processes.) However, a simple extension allows the logical insertion of all or part of one or more named files in the backing store and treats the result as a single address space. This is called mapping.
With mapping, any part of any readable or writable file can be logically included in a process's address space. Like any other portion of the process's address space, no page of the file is actually loaded into memory until a page fault forces this action. Pages of memory are written to the file only if their contents have been modified. So, reading from and writing to files is completely automatic and very efficient.
More than one process can map a single named file. This provides very efficient memory sharing between processes. All or part of other files can also be shared between processes.
Not all named file system objects can be mapped. Devices that cannot be treated as storage, such as terminal and network device files, are examples of objects that cannot be mapped.
A process address space is defined by all of the files (or portions of files) mapped into the address space. Each mapping is sized and aligned to the page boundaries of the system on which the process is executing. No memory is associated with processes themselves.
A process page maps to only one object at a time, although an object address can be the subject of many process mappings. The notion of a "page" is not a property of the mapped object. Mapping an object provides only the potential for a process to read or write the object's contents.
Mapping makes the object's contents directly addressable by a process. Applications can access the storage resources they use directly rather than indirectly through read and write. Potential advantages include efficiency (elimination of unnecessary data copying) and reduced complexity (single-step updates rather than the read, modify buffer, write cycle). The ability to access an object and have it retain its identity over the course of the access is unique to this access method, and facilitates the sharing of common code and data.
Because the file system name space includes any directory trees that are connected from other systems by NFS, any networked file can also be mapped into a process's address space.
When multiple processes map a file simultaneously, whether to share memory or to share the data contained in the file, simultaneous access to data elements can cause problems. Such processes can cooperate through any of the synchronization mechanisms provided in SunOS 5.0 through 5.8. Because they are very lightweight, the most efficient synchronization mechanisms in SunOS 5.0 through 5.8 are those provided in the threads library. See mutex(3THR), condition(3THR), rwlock(3THR), and semaphore(3THR) in the manual pages for details on their use.
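As an illustration of cooperating through a lock that lives in the shared mapping itself, here is a minimal sketch. It uses the POSIX pthread_mutex interfaces with the PTHREAD_PROCESS_SHARED attribute rather than the Solaris threads calls named above, and it assumes a system where a MAP_SHARED mapping of /dev/zero is inherited across fork(2). The names struct shared_counter and count_in_two_processes() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_counter {		/* hypothetical layout for this sketch */
	pthread_mutex_t lock;
	long value;
};

/*
 * Fork one child; parent and child each add `rounds` to a counter kept
 * in a MAP_SHARED mapping. Returns the final count, or -1 on error.
 */
long
count_in_two_processes(long rounds)
{
	struct shared_counter *sc;
	pthread_mutexattr_t attr;
	long i, total;
	pid_t pid;
	int fd;

	if ((fd = open("/dev/zero", O_RDWR)) == -1)
		return (-1);
	sc = mmap(NULL, sizeof (*sc), PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	(void) close(fd);
	if (sc == MAP_FAILED)
		return (-1);

	/* The mutex must be marked process-shared to work across fork(2). */
	if (pthread_mutexattr_init(&attr) != 0 ||
	    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
	    pthread_mutex_init(&sc->lock, &attr) != 0) {
		(void) munmap((void *)sc, sizeof (*sc));
		return (-1);
	}
	(void) pthread_mutexattr_destroy(&attr);

	if ((pid = fork()) == -1) {
		(void) munmap((void *)sc, sizeof (*sc));
		return (-1);
	}
	for (i = 0; i < rounds; i++) {	/* runs in both processes */
		(void) pthread_mutex_lock(&sc->lock);
		sc->value++;
		(void) pthread_mutex_unlock(&sc->lock);
	}
	if (pid == 0)
		_exit(0);		/* child is done */

	(void) waitpid(pid, NULL, 0);
	total = sc->value;
	(void) pthread_mutex_destroy(&sc->lock);
	(void) munmap((void *)sc, sizeof (*sc));
	return (total);
}
```

Without the lock, the two processes could interleave their read-modify-write of value and lose updates. On some systems the program must be linked with the threads library for this to build.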
mmap(2) establishes a mapping of a named file system object (or part of one) into a process address space. It is the basic memory management function and it is very simple. First open(2) the file, then mmap(2) it with appropriate access and sharing options and go about your business.
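The open-then-map sequence can be sketched as follows. This is a minimal example, not a definitive implementation: count_newlines() is a hypothetical helper that maps a whole file read-only and scans it in place instead of calling read(2) into a buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the whole file read-only and count its newline bytes.
 * Returns the count, or -1 on error.
 */
long
count_newlines(const char *path)
{
	struct stat st;
	char *base;
	long count = 0;
	off_t i;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);
	if (fstat(fd, &st) == -1) {
		(void) close(fd);
		return (-1);
	}
	if (st.st_size == 0) {		/* nothing to map in an empty file */
		(void) close(fd);
		return (0);
	}
	base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	(void) close(fd);		/* the mapping outlives the descriptor */
	if (base == MAP_FAILED)
		return (-1);

	for (i = 0; i < st.st_size; i++)	/* pages fault in on first touch */
		if (base[i] == '\n')
			count++;

	(void) munmap(base, (size_t)st.st_size);
	return (count);
}
```

A call such as count_newlines("/etc/passwd") reads the file purely through page faults. No explicit buffer management is involved.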
The mapping established by mmap(2) replaces any previous mappings for the specified address range.
The flags MAP_SHARED and MAP_PRIVATE specify the mapping type, and one of them must be specified. MAP_SHARED specifies that writes modify the mapped object. No further operations on the object are needed to make the change. MAP_PRIVATE specifies that an initial write to the mapped area creates a copy of the page and all writes reference the copy. Only modified pages are copied.
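The difference between the two mapping types can be observed with a small experiment. In this sketch, poke_first_byte() is a hypothetical helper: it stores one byte through a mapping of the requested type and then reads the same byte back from the file with read(2). It assumes the file exists and holds at least one byte.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store c into byte 0 of the file through a mapping of type maptype
 * (MAP_SHARED or MAP_PRIVATE), then return the byte that the file
 * itself now holds, or -1 on error.
 */
int
poke_first_byte(const char *path, int maptype, char c)
{
	char *p;
	char in_file;
	int fd;

	if ((fd = open(path, O_RDWR)) == -1)
		return (-1);
	p = mmap(NULL, 1, PROT_READ | PROT_WRITE, maptype, fd, 0);
	if (p == MAP_FAILED) {
		(void) close(fd);
		return (-1);
	}
	p[0] = c;			/* shared: modifies the object; */
					/* private: modifies a copy */
	(void) munmap(p, 1);

	/* The descriptor's offset is still 0, so this reads byte 0. */
	if (read(fd, &in_file, 1) != 1) {
		(void) close(fd);
		return (-1);
	}
	(void) close(fd);
	return ((unsigned char)in_file);
}
```

With a file that starts with 'x', a MAP_PRIVATE poke leaves 'x' in the file, while a MAP_SHARED poke replaces it.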
A mapping type is retained across a fork(2).
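The following sketch shows one consequence of the mapping type surviving fork(2). The helper child_write_visible() is hypothetical. It maps a small region of /dev/zero, forks, lets the child store a value through the inherited mapping, and reports what the parent sees afterward.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the value the parent reads after the child stores 42 through
 * a mapping of type maptype, or -1 on error.
 */
int
child_write_visible(int maptype)
{
	int *cell;
	int fd, seen;
	pid_t pid;

	if ((fd = open("/dev/zero", O_RDWR)) == -1)
		return (-1);
	cell = mmap(NULL, sizeof (int), PROT_READ | PROT_WRITE,
	    maptype, fd, 0);
	(void) close(fd);
	if (cell == MAP_FAILED)
		return (-1);

	if ((pid = fork()) == -1) {
		(void) munmap((void *)cell, sizeof (int));
		return (-1);
	}
	if (pid == 0) {
		*cell = 42;		/* child writes via inherited mapping */
		_exit(0);
	}
	if (waitpid(pid, NULL, 0) == -1) {
		(void) munmap((void *)cell, sizeof (int));
		return (-1);
	}
	seen = *cell;			/* zero-filled unless the write was shared */
	(void) munmap((void *)cell, sizeof (int));
	return (seen);
}
```

With MAP_SHARED the parent sees the child's store. With MAP_PRIVATE the child's write lands on its own copy of the page and the parent still reads zero.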
The file descriptor used in an mmap(2) call need not be kept open after the mapping is established. If it is closed, the mapping remains until it is undone by munmap(2) or replaced with a new mapping.
If a mapped file is shortened by a call to truncate, an access to the area of the file that no longer exists causes a SIGBUS signal.
Mapping /dev/zero gives the calling program a block of zero-filled virtual memory of the size specified in the call to mmap(2). The following code fragment demonstrates a use of this to create a block of scratch storage in a program, at an address that the system chooses.
    int fd;
    caddr_t result;

    if ((fd = open("/dev/zero", O_RDWR)) == -1)
            return ((caddr_t)-1);
    result = mmap(0, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    (void) close(fd);
Some devices or files are useful only if accessed by mapping. An example of this is frame buffer devices used to support bit-mapped displays, where display management algorithms function best if they can operate randomly on the addresses of the display directly.
The SunOS 5.0 through 5.8 virtual memory system is a cache system, in which processor memory buffers data from file system objects. There are interfaces to control or interrogate the status of the cache.
mincore(2) determines the residency of the memory pages in the address space covered by mappings in the specified range. Because the status of a page can change after mincore(2) checks it, but before mincore(2) returns the data, returned information can be outdated. Only locked pages are guaranteed to remain in memory.
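A residency query might look like the sketch below. The helper resident_after_touch() is hypothetical. It maps some pages of /dev/zero, touches each one, and counts the pages that mincore(2) reports as resident. The feature-test macro on the first line is an assumption for glibc builds in strict modes and is not needed elsewhere. The casts to void * paper over differences in the declared argument types between systems.

```c
#define _DEFAULT_SOURCE		/* assumption: exposes mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map npages pages, touch each, and return how many mincore(2) reports
 * as resident at the moment of the call. Returns -1 on error.
 */
long
resident_after_touch(size_t npages)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	long resident = 0;
	size_t len, i;
	char *base, *vec;
	int fd;

	if (pagesize <= 0 || npages == 0)
		return (-1);
	len = npages * (size_t)pagesize;
	if ((vec = malloc(npages)) == NULL)	/* one status byte per page */
		return (-1);
	if ((fd = open("/dev/zero", O_RDWR)) == -1) {
		free(vec);
		return (-1);
	}
	base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	(void) close(fd);
	if (base == MAP_FAILED) {
		free(vec);
		return (-1);
	}
	for (i = 0; i < npages; i++)
		base[i * (size_t)pagesize] = 1;	/* fault each page in */

	if (mincore((void *)base, len, (void *)vec) == -1) {
		resident = -1;
	} else {
		for (i = 0; i < npages; i++)
			if (vec[i] & 1)		/* low bit set: page resident */
				resident++;
	}
	(void) munmap(base, len);
	free(vec);
	return (resident);
}
```

The result is only a snapshot. A page counted as resident can be paged out before the caller acts on the answer.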
mlock(3C) causes the pages in the specified address range to be locked in physical memory. References to locked pages (in this or other processes) do not result in page faults that require an I/O operation. Because this operation ties up physical resources and can disrupt normal system operation, use of mlock(3C) is limited to the superuser. The system allows only a configuration-dependent number of pages to be locked in memory. The call to mlock(3C) fails if this limit is exceeded.
munlock(3C) releases the locks on physical pages. If multiple mlock(3C) calls are made on an address range of a single mapping, a single munlock(3C) call releases the locks. However, if different mappings to the same pages are locked with mlock(3C), the pages are not unlocked until the locks on all the mappings are released.
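A lock and unlock pair can be sketched as follows. The helper try_lock_scratch() is hypothetical. Because locking can be refused for lack of privilege or because a limit is exceeded, the sketch treats a failed mlock() as an expected outcome rather than a fatal error.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map len bytes of scratch memory and try to lock it.
 * Returns 1 if the range was locked and then unlocked,
 * 0 if mlock() refused (privilege or limit), -1 on any other error.
 */
int
try_lock_scratch(size_t len)
{
	char *base;
	int fd, rc;

	if (len == 0 || (fd = open("/dev/zero", O_RDWR)) == -1)
		return (-1);
	base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	(void) close(fd);
	if (base == MAP_FAILED)
		return (-1);

	if (mlock(base, len) == -1) {
		(void) munmap(base, len);
		return (0);		/* not permitted, or over the limit */
	}
	base[0] = 1;			/* access needs no paging I/O */
	rc = (munlock(base, len) == 0) ? 1 : -1;
	(void) munmap(base, len);
	return (rc);
}
```

Callers that need the pages locked should check the result and fall back or report the failure instead of assuming the lock was granted.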
A lock is transferred between pages on the "copy-on-write" event associated with a MAP_PRIVATE mapping. Thus, locks on an address range that includes MAP_PRIVATE mappings are retained transparently along with the copy-on-write redirection (see mmap(2) above for a discussion of this redirection).
mlockall(3C) and munlockall(3C) are similar to mlock(3C) and munlock(3C), but they operate on entire address spaces. mlockall(3C) sets locks on all pages in the address space and munlockall(3C) removes all locks on all pages in the address space, whether established by mlock(3C) or mlockall(3C).
sysconf(3C), called with the argument _SC_PAGESIZE, returns the system-dependent size of a memory page. For portability, applications should not embed any constants specifying the size of a page. Note that it is not unusual for page sizes to vary even among implementations of the same instruction set.
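A common use of the page size is rounding a length up to a whole number of pages before mapping or locking. The helper round_up_to_page() below is a hypothetical name used only for this sketch. It queries the page size at run time with sysconf(_SC_PAGESIZE) instead of hard-coding a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Round len up to the next multiple of the system page size.
 * Returns 0 if len is 0 or the page size cannot be determined.
 */
size_t
round_up_to_page(size_t len)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	size_t ps;

	if (pagesize <= 0)
		return (0);
	ps = (size_t)pagesize;
	return ((len + ps - 1) / ps * ps);
}
```

The division form does not assume that the page size is a power of two.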
mprotect(2) assigns the specified protection to all pages in the specified address range. The protection cannot exceed the permissions allowed on the underlying object.
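The limit imposed by the underlying object can be seen with a short experiment. In this sketch, try_add_write() is a hypothetical helper. It maps the first page of a file MAP_SHARED with PROT_READ and then asks mprotect(2) to add PROT_WRITE. The file is assumed to exist. The exact errno reported on refusal is an assumption about the platform (EACCES is typical).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path with oflag (O_RDONLY or O_RDWR), map one page read-only and
 * shared, then try to raise the protection to read/write.
 * Returns 0 if mprotect() succeeded, the errno value if it failed,
 * or -1 if the setup steps failed.
 */
int
try_add_write(const char *path, int oflag)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	void *p;
	int fd, rc;

	if (pagesize <= 0 || (fd = open(path, oflag)) == -1)
		return (-1);
	p = mmap(NULL, (size_t)pagesize, PROT_READ, MAP_SHARED, fd, 0);
	(void) close(fd);
	if (p == MAP_FAILED)
		return (-1);

	rc = 0;
	if (mprotect(p, (size_t)pagesize, PROT_READ | PROT_WRITE) == -1)
		rc = errno;		/* object does not allow writing */
	(void) munmap(p, (size_t)pagesize);
	return (rc);
}
```

When the file was opened O_RDWR the upgrade is allowed. When it was opened O_RDONLY the shared mapping cannot be made writable, because that would exceed the access granted on the object.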
    caddr_t brk(caddr_t addr);
    caddr_t sbrk(intptr_t incr);
brk(2) identifies the lowest data segment location not used by the caller as addr (rounded up to the next multiple of the system page size).
sbrk(2), the alternate function, adds incr bytes to the caller data space and returns a pointer to the start of the new data area.
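Growing the data segment with sbrk(2) can be sketched as follows. The wrapper grow_data_segment() is hypothetical. It relies on the convention that sbrk() returns (void *)-1 on failure and otherwise returns the previous break, which is the start of the newly added area. The feature-test macro is an assumption for glibc builds in strict modes.

```c
#define _DEFAULT_SOURCE		/* assumption: exposes sbrk() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Extend the caller's data segment by incr bytes.
 * Returns a pointer to the start of the new area, or NULL on failure.
 */
void *
grow_data_segment(intptr_t incr)
{
	void *old_break = sbrk(incr);

	if (old_break == (void *)-1)
		return (NULL);
	return (old_break);
}
```

A call such as grow_data_segment(4096) yields 4096 usable bytes starting at the returned address. sbrk(0) reports the current break without changing it.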