7 Monitor Functions and Recovery Procedures

SMC provides several internal monitor functions designed to ensure that the SMC subsystem and all client/server communications are operating correctly.

The SMC monitor subtask periodically performs the following actions, depending on the parameters you set in the SMC MONitor command:

  • Checks TapePlex communications when there are no currently active communication paths, or when the current communication path is not the preferred path.

  • Checks TapePlex communications when there has been no communication with the TapePlex for a period of time.

  • Ensures that the SMC IEFJFRQ exit, where SMC influences z/OS allocation, is active.

  • Cleans up inactive communication tokens.

  • Re-drives pending mounts.

  • Optionally, reports on scratch subpools that have reached their low scratch threshold value.

If you do not enter a MONitor command, SMC performs all of the above actions except scratch threshold reporting. In addition, by default, SMC always attempts to revert to the primary communication path (the first defined server) after an outage.

Refer to the ELS Command, Control Statement, and Utility Reference for more information about the SMC MONitor command.

Communications Monitoring

If SMC monitoring is active, the status of each TapePlex is checked periodically.

If the TapePlex appears to be active, communicating on the local or primary server path (or has PREFprimary set to OFF), is at full service level, and has established communication since the last active check interval, then no further processing is performed.

However, in any of the following circumstances, the SMC attempts to communicate with the TapePlex, restarting at the first defined communication path if PREFprimary ON is set.

  • The TapePlex does not have a current active communication path.

  • The TapePlex is active on a secondary communication path and the default value PREFprimary ON is set.

  • The TapePlex is not at the full service level.

  • The TapePlex has not established communication since the last active check interval.
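
The four circumstances above amount to a single yes/no decision per TapePlex. The following Python sketch illustrates that decision only; it is not SMC code. The `TapePlexStatus` class, its field names, and the function name are hypothetical, and `pref_primary` stands in for the PREFprimary setting.

```python
from dataclasses import dataclass


@dataclass
class TapePlexStatus:
    """Hypothetical snapshot of one TapePlex; field names are illustrative."""
    has_active_path: bool
    on_primary_path: bool  # True for the local or primary server path
    full_service_level: bool
    communicated_since_last_check: bool


def needs_communication_attempt(status: TapePlexStatus,
                                pref_primary: bool = True) -> bool:
    """Return True when any of the four listed circumstances applies."""
    return (
        not status.has_active_path
        # A secondary path only matters when PREFprimary ON is in effect.
        or (pref_primary and not status.on_primary_path)
        or not status.full_service_level
        or not status.communicated_since_last_check
    )


healthy = TapePlexStatus(True, True, True, True)
on_secondary = TapePlexStatus(True, False, True, True)

print(needs_communication_attempt(healthy))
print(needs_communication_attempt(on_secondary))
print(needs_communication_attempt(on_secondary, pref_primary=False))
```

With PREFprimary set to OFF, a TapePlex that is healthy on a secondary path needs no further processing, which matches the exception described earlier in this section.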

Whenever communication is switched from one communication path to another, or communication is successfully established after a period of not communicating to the TapePlex, an SMC message indicating communication switched or active is produced.

When SMC detects an error preventing communication, one of the following SMC messages is produced and remains as a non-deletable message on the console:

  • Message SMC0260 indicates a specific error for a local path or server.

  • Message SMC0261 indicates that there are no defined, non-disabled communication paths for a TapePlex.

The presence of either of these messages indicates that SMC cannot currently communicate with a TapePlex, and cannot influence tape allocation based on server volume information. When this situation occurs, allocation may be directed to drives with a device type that is incompatible with the volume. Oracle recommends setting the ALLOCDef command FAILnoinfo SPECIFIC parameter to fail jobs during allocation to prevent specific tape allocations from being directed to incorrect device types.

Mount Monitor

An important function of the SMC monitor subtask is to ensure that all mounts have been successfully automated.

The monitor subtask periodically checks all UCBs for pending mount status, and compares this status to the last mount request sent by SMC to the server for the device. Mounts that were not sent to the server due to TapePlex or communication outages are re-driven as soon as possible. For other types of mounts, SMC issues message SMC0231 to indicate that the mount monitor has detected an outstanding mount, and then performs different processing for virtual and real tape mounts.

  • For virtual tape mounts, SMC sends the request to the server and receives a response indicating that the mount request was accepted by the server. If the mount remains pending after a pre-determined interval, SMC attempts to re-issue the mount request, with an indication that no response should be generated until the mount completes or fails. If a failure occurs, SMC updates the SMC0231 message with the failure reason (for example, that a VTV could not be recalled from an MVC volume), and the message remains non-deletable until the mount succeeds or the job is canceled.

  • For real mount failures, which may result from hardware outages or other issues where the operator responded "I" (ignore) to an HSC mount WTOR message, SMC waits for a pre-determined interval and then attempts to re-drive the mount.

  • For both real and virtual mounts, only a single attempt is made to re-drive a mount. The SMC0231 message remains outstanding to indicate the reason why a pending mount was not satisfied.
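
The mount monitor behavior described above can be summarized as a small decision function. This is a minimal Python sketch for illustration, not the SMC implementation. The `PendingMount` record, its fields, the device addresses, and the action strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingMount:
    """Hypothetical record of a pending mount; not an SMC data structure."""
    device: str
    virtual: bool
    sent_to_server: bool
    redrive_attempted: bool = False


def next_action(mount: PendingMount) -> str:
    """Choose the monitor's next step for a mount that is still pending."""
    if not mount.sent_to_server:
        # The request never reached the server (TapePlex or communication outage).
        return "re-drive as soon as possible"
    if mount.redrive_attempted:
        # Only a single re-drive attempt is made; SMC0231 stays outstanding.
        return "leave SMC0231 outstanding"
    mount.redrive_attempted = True
    if mount.virtual:
        return "issue SMC0231, wait, re-issue and wait for completion"
    return "issue SMC0231, wait, re-drive"


virtual_mount = PendingMount("0A10", virtual=True, sent_to_server=True)
print(next_action(virtual_mount))
print(next_action(virtual_mount))
```

Calling the function twice for the same mount shows the single-attempt rule: the second call reports that the message is simply left outstanding.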

Note:

SMC cannot detect pending mounts when all of the following conditions are present:

  • The job entry subsystem is JES3.

  • The mount is outstanding on a JES3 LOCAL processor.

  • Either ALLOCDEF DEFER(OFF) has been specified, or the mount was requested before SMC had initialized and the mount request did not request the DEFER option.
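
One reading of the note above is that the two JES3 conditions must both hold, together with either of the two DEFER-related conditions. The Python sketch below encodes that reading. The grouping of the conditions is an assumption, and the function and parameter names are hypothetical.

```python
def pending_mount_detection_unsupported(
    is_jes3: bool,
    on_jes3_local: bool,
    defer_off: bool,
    requested_before_smc_init_without_defer: bool,
) -> bool:
    """Return True when SMC cannot detect the pending mount.

    Assumed grouping: JES3 AND LOCAL processor AND
    (DEFER(OFF) OR requested before SMC initialized without DEFER).
    """
    return (
        is_jes3
        and on_jes3_local
        and (defer_off or requested_before_smc_init_without_defer)
    )


# A JES3 LOCAL processor with ALLOCDEF DEFER(OFF) in effect.
print(pending_mount_detection_unsupported(True, True, True, False))
# The same settings under JES2.
print(pending_mount_detection_unsupported(False, False, True, False))
```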

You can use the SMC Display DRives command to determine the current status of a pending mount within the SMC subsystem. Refer to the ELS Command, Control Statement, and Utility Reference for more information about the statuses displayed by this command.

Recovery Procedures

Because the SMC Mount Monitor checks and re-drives pending mounts, manual procedures to re-drive an outstanding mount should not normally be necessary. However, if a mount re-drive was unsuccessful and the cause of the problem has since been resolved, you can use the SMC RESYNChronize command to force SMC to re-drive outstanding mounts. If the mount remains unsatisfied, you may need to perform a manual recovery.

Inactive TapePlex or Inactive SMC: Preventing Allocation Errors

When a TapePlex becomes inactive, or communication errors prevent SMC from communicating with a TapePlex, allocation may select a device incompatible with a specific volume. To prevent this from occurring, it is recommended that you set the ALLOCDef command FAILnoinfo parameter to SPECIFIC, which will cause jobs to fail in allocation rather than be allocated to an incompatible device.

Certain software products allow you to suspend processing that may require dynamic allocation. For example, if Data Facility Hierarchical Storage Manager (DFSMS/hsm) is installed on the local processor, you can issue commands to prevent this type of processing without stopping DFSMS/hsm.

In JES2, you can postpone common allocations by holding the job queue or purging all initiators. Refer to the appropriate IBM publication for more information about JES2 operator commands.

In JES3, you can use the following modify command to postpone the C/I process for batch jobs while SMC is inactive:

*F X,D=POSTSCAN,MC=00

After communication with the TapePlex is reestablished, or SMC is restarted, use the following modify command to restore the maximum count to its original value, xx:

*F X,D=POSTSCAN,MC=xx
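
The two modify commands differ only in the MC value, so a site that scripts its operator procedures could generate both from one helper. The following is a minimal Python sketch, not part of SMC or JES3. The function names are hypothetical, and the two-digit zero-padded format and the 1 to 99 range are assumptions drawn from the MC=00 and MC=xx notation above.

```python
POSTSCAN_PREFIX = "*F X,D=POSTSCAN,MC="


def postpone_ci_command() -> str:
    """Build the command that postpones the C/I process for batch jobs."""
    return POSTSCAN_PREFIX + "00"


def restore_ci_command(original_max_count: int) -> str:
    """Build the command that restores the original maximum count.

    Assumption: the count is written as two digits, as in MC=xx.
    """
    if not 1 <= original_max_count <= 99:
        raise ValueError("expected a maximum count between 1 and 99")
    return f"{POSTSCAN_PREFIX}{original_max_count:02d}"


print(postpone_ci_command())
print(restore_ci_command(5))
```

Record the original maximum count before issuing the first command, because the restore step needs it.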

Inactive TapePlex or Inactive SMC: Re-Driving Mounts

You can use operating system facilities to identify mounts that were not successfully re-driven by either the SMC Mount Monitor or the SMC RESYNChronize command.

For JES3, if the mount is lost during JES3 mount processing, issue the following command:

*I,S,V

Issue the following command to determine how long the job has been waiting:

*I,J=jjjj,W

where jjjj is the job number.

Issue the following command to determine the volume and drive the job is waiting on:

*CALL,DISPLAY,J=jjjj

If a mount is lost during MVS processing, issue the following MVS command on the system requesting the mount to determine if any drives have a mount request pending:

D R,L

Issue the following command to determine which volser to mount:

D U,,,uuuu,1

where uuuu is the address of the device for which the mount is pending.

If the SMC is inactive but the TapePlex is active, you can use the HSC Mount command to request HSC to perform the mount:

M vvvvv,dddd

where vvvvv is the volser to mount and dddd is the address of the drive.

Refer to the ELS Command, Control Statement, and Utility Reference for more information about the HSC Mount command.
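
The commands in this section follow fixed templates, with only the job number, device address, and volser varying. The Python sketch below assembles them in the order described above, as an aid for building an operator checklist. It does not issue any commands. The function names are hypothetical, and the job number, device address, and volser in the usage lines are made-up sample values.

```python
from typing import List


def jes3_lost_mount_commands(job_number: str) -> List[str]:
    """JES3 inquiries: waiting jobs, wait time, then volume and drive."""
    return [
        "*I,S,V",
        f"*I,J={job_number},W",
        f"*CALL,DISPLAY,J={job_number}",
    ]


def mvs_lost_mount_commands(device_address: str) -> List[str]:
    """MVS displays: pending requests, then the volser for one device."""
    return [
        "D R,L",
        f"D U,,,{device_address},1",
    ]


def hsc_mount_command(volser: str, device_address: str) -> str:
    """HSC Mount command, for use when SMC is inactive but the TapePlex is active."""
    return f"M {volser},{device_address}"


# Sample values only; substitute the real job number, device, and volser.
for command in jes3_lost_mount_commands("1234"):
    print(command)
for command in mvs_lost_mount_commands("0A80"):
    print(command)
print(hsc_mount_command("VOL001", "0A80"))
```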

JES3 Global/Local Considerations

In a JES3 environment, consider the following recovery guidelines when JES3 fails on either a local or global processor.

Inactive JES3 on a Local Processor

When JES3 fails on a local processor, active jobs continue to execute unless they require JES3 services. Drive exclusion still occurs for dynamic allocation requests.

To recover, restart JES3 (LOCAL start). The SMC continues processing and requires no recovery.

Inactive JES3 on a Global Processor

When JES3 fails on a global processor, jobs that are executing continue to execute. Drive exclusion still occurs for dynamic allocation requests.

To recover, restart JES3 or invoke Dynamic System Interchange (DSI) processing.

You can use DSI to reassign the JES3 global function to a JES3 local processor when the global processor becomes inactive or requires maintenance. One of the JES3 local processors becomes the new JES3 global processor. By reassigning the global function to a local processor, the JES3 environment continues processing. The SMC continues processing and requires no recovery.

Refer to the ELS Programming Reference for more information about cross host recovery.

SMC Recovery Procedures (JES2)

This section describes recovery procedures for JES2 environments. Each problem scenario is covered under its own heading below.

Inactive SMC - Active TapePlex

When the SMC fails while one or more TapePlexes remain active, the following functions are not performed:

  • Allocation processing

  • Automation of mount/dismount/swap messages

When this occurs, restart the SMC.

Certain software products allow you to suspend processing that may require dynamic allocation. For example, if Data Facility Hierarchical Storage Manager (DFSMS/hsm) is installed on the local processor, you can issue commands to prevent this type of processing without stopping DFSMS/hsm.

Common allocations can be postponed by holding the job queue or purging all initiators. Refer to the appropriate IBM publication for more information about JES2 operator commands.

If the SMC MOUNTDef AUTOPendmount (ON) option was specified, outstanding mount messages are re-driven.

Active SMC - Inactive TapePlex

When a TapePlex fails or is terminated, volumes and drives owned by that TapePlex become unknown to SMC. The following functions are not performed:

  • Volume lookup for allocation influencing

  • Automated mount processing

When this occurs, restart the TapePlex and issue the SMC RESYNC command. The SMC re-establishes communication with the TapePlex and automates any outstanding mounts, regardless of the SMC MOUNTDef AUTOPendmount setting. See "Automating Mount Requests for Inactive TapePlexes" below for more information.

Certain software products allow you to suspend processing that may require dynamic allocation. For example, if Data Facility Hierarchical Storage Manager (DFSMS/hsm) is installed on the local processor, you can issue commands to prevent this type of processing without stopping DFSMS/hsm.

Common allocations can be postponed by holding the job queue or purging all initiators. Refer to the appropriate IBM publication for more information about JES2 operator commands.

Note:

You can provide a backup path to a remote TapePlex that is automatically activated when the local HSC is discovered to be inactive. See Chapter 3, "SMC and StorageTek TapePlex Management" for more information.

Automating Mount Requests for Inactive TapePlexes

MVS mount requests for drives owned by inactive TapePlexes are automatically re-driven when the corresponding TapePlexes are activated.

Lost MVS Mount Requests for Active TapePlexes

An MVS mount request may be lost when an LMU error occurs. Use this procedure if you suspect lost mounts.

  1. Issue the following MVS command on the system requesting the mount to determine if any drives have a mount request pending:

    D R,L

  2. Issue the following MVS command on the same system to determine which VOLSER to mount:

    D U,,,uuuu,1

  3. If the drive is defined to an HSC TapePlex, issue the HSC Mount command for the volume on the MVS system on which the HSC is active.

SMC Recovery Procedures (JES3)

This section describes recovery procedures for JES3 environments. Each problem scenario is covered under its own heading below.

Inactive SMC - Active TapePlex Subsystem

When the SMC fails while one or more TapePlexes remain active, the following functions are not performed:

  • Allocation processing

  • Automation of mount/dismount/swap messages

When this occurs, restart the SMC.

Certain software products allow you to suspend processing that may require dynamic allocation. For example, if Data Facility Hierarchical Storage Manager (DFSMS/hsm) is installed on the local processor, you can issue commands to prevent this type of processing without stopping DFSMS/hsm.

To postpone the C/I process for batch jobs while SMC is inactive, use the following modify command:

*F X,D=POSTSCAN,MC=00

After the SMC is restarted, restore the maximum count to its original value, xx:

*F X,D=POSTSCAN,MC=xx

If the HSC and MVS/CSC were started with the AMPND startup parameter, outstanding mount messages are re-driven when SMC is restarted and an MVS allocation or mount event occurs. Alternatively, you can issue the SMC RESYNChronize command to re-drive pending mounts under these circumstances.

Active SMC - Inactive TapePlex

When a TapePlex fails or is terminated, volumes and drives owned by that TapePlex become unknown to SMC. The following functions are not performed:

  • Volume lookup for allocation influencing

  • Automated mount processing

When this occurs, restart the TapePlex and issue the SMC RESYNC command. The SMC re-establishes communication with the TapePlex and automates any outstanding mounts, regardless of the SMC MOUNTDef AUTOPendmount setting. See "Automating Mount Requests for Inactive TapePlexes" for more information.

Certain software products allow you to suspend processing that may require dynamic allocation. For example, if Data Facility Hierarchical Storage Manager (DFSMS/hsm) is installed on the local processor, you can issue commands to prevent this type of processing without stopping DFSMS/hsm.

Note:

You can provide a backup path to a remote TapePlex that is automatically activated when the local HSC is discovered to be inactive. See Chapter 1, "Introduction" for more information.

Inactive JES3 on a Local Processor

When JES3 fails on a local processor, active jobs that do not require JES3 services continue to execute. Drive exclusion still occurs for dynamic allocation requests.

To recover, restart JES3 (LOCAL start). The SMC continues processing and requires no recovery.

Inactive JES3 on a Global Processor

When JES3 fails on a global processor, active jobs that do not require JES3 services continue to execute. Drive exclusion still occurs for dynamic allocation requests.

To recover, restart JES3 or invoke Dynamic System Interchange (DSI) processing.

You can use DSI to reassign the JES3 global function to a JES3 local processor when the global processor becomes inactive or requires maintenance. One of the JES3 local processors becomes the new JES3 global processor. By reassigning the global function to a local processor, the JES3 environment continues processing. The SMC continues processing and requires no recovery.

Refer to the ELS Programming Reference or MVS/CSC System Programmer's Guide for more information about cross host recovery.

Automating Mount Requests for Inactive TapePlexes

MVS mount requests for drives owned by inactive TapePlexes are automatically re-driven when the corresponding TapePlexes are activated.

Lost JES3 Mount Requests for Active TapePlexes

A JES3 mount request may be lost when an LMU error occurs. Use this procedure if you suspect lost mounts.

  1. Issue the following JES3 command to determine which jobs are awaiting a volume mount:

    *I,S,V

  2. Issue the following JES3 command to determine how long a job has been waiting:

    *I,J=nnnn,W

  3. Issue the following JES3 command to determine the volume and drive the job is waiting on:

    *CALL,DISPLAY,J=nnnn

  4. If the drive with a pending mount is defined to an HSC TapePlex, issue the HSC Mount command for the volume on the MVS system on which the HSC is active.

Lost MVS Mount Requests for Active TapePlexes

An MVS mount request may be lost when an LMU error occurs. Use this procedure if you suspect lost mounts.

  1. Issue the following MVS command on the system requesting the mount to determine if any drives have a mount request pending:

    D R,L

  2. Issue the following MVS command on the same system to determine which VOLSER to mount:

    D U,,,uuuu,1

  3. If the drive is defined to an HSC TapePlex, issue the HSC Mount command for the volume on the MVS system on which the HSC is active.