Sun Enterprise 10000 Dynamic Reconfiguration User Guide

Memory

If you use memory interleaving between system boards, those system boards cannot be detached because DR does not yet support interboard interleaving. By default, hpost(1M) does not set up boards with interleaved memory. To find out whether interboard interleaving has been enabled, look for the following line in the hpost(1M) file .postrc (see postrc(4)):


mem_board_interleave_ok

If mem_board_interleave_ok is present, you may not be able to detach a board that uses memory interleaving.
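
The check described above can be scripted. The following is a minimal sketch, not part of the DR software: the function name postrc_allows_interleave is hypothetical, the path to .postrc must be supplied by you because its location depends on your SSP setup, and the sketch assumes that directives appear one per line and that a leading # marks a comment.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given postrc file
# contains an active mem_board_interleave_ok directive.
# Assumption: one directive per line; lines starting with '#' are comments,
# so a commented-out directive does not match the anchored pattern below.
postrc_allows_interleave() {
    # $1: path to the .postrc file to inspect (site-specific)
    grep '^[[:space:]]*mem_board_interleave_ok' "$1" >/dev/null
}
```

For example, `postrc_allows_interleave ./.postrc && echo "interboard interleaving is enabled"` prints the message only when the directive is present and not commented out.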

Pageable and Nonpageable Memory

Before you can detach a board, the operating system must vacate the memory on that board. Vacating a board means flushing its pageable memory to swap space and copying its nonpageable memory (that is, kernel and OBP memory) to another memory board. To relocate nonpageable memory, the operating environment on a domain must be temporarily suspended, or quiesced. The length of the suspension depends on the domain I/O configuration and the running workloads. Detaching a board with nonpageable memory is the only time the operating environment is suspended; therefore, you should know where nonpageable memory resides so that you can avoid significantly impacting the operation of the domain. When nonpageable (permanent) memory is on the board, the operating environment must find other memory to receive the copy.

You can use the dr(1M) command drshow(1M) to determine whether the memory on a board is pageable or nonpageable:


% dr
dr> drshow board_number mem

Similarly, you can determine whether the memory on a board is pageable by looking at the DR Memory Configuration window, which is available when you perform a detach operation within Hostview. The DR Memory Configuration window is described in the Sun Enterprise 10000 DR Configuration Guide in the Solaris 8, Update 6, Sun Hardware Answerbook Collection.

Target Memory Constraints

When permanent memory is detached, DR chooses a target memory area to receive a copy of the memory. The DR software automatically checks for total adherence to the target memory constraints, and it does not allow the DR memory operation to continue if it cannot verify total adherence. A DR memory operation can be disallowed for the following reasons:

In Solaris 7 and later releases, if no target board is found, the detach operation is refused, and DR displays an error message. (See Appendix A, DR Error Messages, for more information about DR error messages.)

Correctable Memory Errors

Correctable memory errors indicate that the memory on a system board (that is, one or more of its Dual Inline Memory Modules (DIMMs), or portions of the hardware interconnect) may be faulty and need replacement. When the SSP detects correctable memory errors, it initiates a record-stop dump to save the diagnostic data, and this dump can interfere with a DR detach operation. Therefore, Sun Microsystems suggests that when a correctable memory error causes a record-stop, you let the record-stop dump finish before you initiate a DR detach operation.

If the faulty component causes repeated reporting of correctable memory errors, the SSP performs multiple record-stop dumps. If this happens, you should temporarily disable the dump-detection mechanism on the SSP, allow the current dump to finish, then initiate the DR detach operation. After the detach operation finishes, you should re-enable the dump detection.

To Disable and Re-Enable Dump Detection
  1. Log in to the SSP as the user ssp.

  2. Disable record-stop dump detection:


    SSP% edd_cmd -x stop
    

    This command suspends all event detection on all of the domains.

  3. Monitor the in-progress record-stop dump:


    SSP% ps -ef | grep hpost
    

    In the grep(1) output, the -D option of hpost indicates that a record-stop dump is in progress. Wait until no such hpost process remains before you continue.

  4. Perform the DR detach operation.

  5. Enable event detection:


    SSP% edd_cmd -x start
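
The procedure above can be wrapped in a small script so that event detection is not left disabled by accident. The following is a minimal sketch under stated assumptions, not a supported SSP tool: the function names dump_in_progress, wait_for_dump, and pause_dump_detection_for_detach are hypothetical, the 10-second polling interval is arbitrary, and the detach itself is still performed by the operator. Only edd_cmd -x stop, edd_cmd -x start, and the ps -ef check for hpost with -D come from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: reads a ps(1) listing on stdin and succeeds if it
# shows an hpost process started with the -D option (a record-stop dump).
dump_in_progress() {
    grep 'hpost' | grep -v 'grep' | grep -- ' -D' >/dev/null
}

# Hypothetical helper: poll until no record-stop dump is running.
# $1: seconds to sleep between polls (defaults to 10, an arbitrary choice).
wait_for_dump() {
    while ps -ef | dump_in_progress; do
        sleep "${1:-10}"
    done
}

# Steps 2 through 5 of the procedure. Run as user ssp on the SSP.
pause_dump_detection_for_detach() {
    edd_cmd -x stop || return 1   # step 2: suspend event detection
    wait_for_dump 10              # step 3: let the current dump finish
    echo "Perform the DR detach operation now, then press Return."
    read answer                   # step 4: the operator performs the detach
    edd_cmd -x start              # step 5: resume event detection
}
```

Nothing runs until you call pause_dump_detection_for_detach. Keeping the ps parsing in its own function makes it possible to check the matching logic against sample ps output on a machine that has no SSP software installed.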