8 Using VLE for Disaster Recovery

The use of the VLE (Virtual Library Extension) as a disaster recovery solution provides a simplified and non-disruptive method of performing DR testing, and also recovery from a business disruption event.

The system manages the VLE like a library (ACS). However, because the VLE uses disk storage rather than tape storage, and because it maintains an internal inventory of the VTVs it contains, it offers features that a real library cannot provide:

  • The VLE is a "tapeless" solution, avoiding issues of media management.

  • Data is sent to the VLE using IP, and does not require channel extension.

  • The VLE can perform an MVC audit in a matter of seconds by using its internal database, rather than by mounting and reading an MVC cartridge.

This chapter describes the use of the VLE in a simple two site environment. However, it should be noted that the solution supports any number of sites with any number of VLEs at each site. Also, one of the sites can be a DR-only site, not running MSP LPARs except during a DR test or declared disaster.

The procedure that follows uses this environment: There are two sites, SITE1 and SITE2. Each site has one VSM and one VLE. In this example SITE2 is described as a DR-only site, but SITE2 can also be a production site defined as a mirror image of SITE1.

Note:

The size of the VLE buffer at SITE2 must be sufficient to hold both the migrated production data and the data created during a DR test.
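As a rough planning aid, the sizing rule in this note can be written as simple arithmetic. The following sketch is illustrative only: the function name and the capacity figures are hypothetical and are not VLE specifications.

```python
def site2_vle_has_room(capacity_tb, production_tb, dr_test_tb):
    """Return True if the SITE2 VLE can hold the migrated production data
    plus the data a DR test is expected to create (all values in TB)."""
    return production_tb + dr_test_tb <= capacity_tb


# Hypothetical figures, for illustration only.
print(site2_vle_has_room(capacity_tb=330, production_tb=250, dr_test_tb=60))
print(site2_vle_has_room(capacity_tb=330, production_tb=300, dr_test_tb=60))
```

The first call fits (310 TB needed against 330 TB available); the second does not (360 TB needed).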

Normal Production Mode

During normal production, policies are defined at SITE1 to migrate one copy of the data to the local VLE at SITE1 and a second copy to the remote VLE at SITE2. Additional copies can be created if desired, including copies in another VLE and copies on tape.

The following shows an example of policies defined at SITE1:

SMC definitions are used to assign a MGMTCLAS name of VLEMGMT to data sets with a high level qualifier of "PAYROLL."

POLICY NAME(VLEPOL) MEDIA(VIRTUAL) MGMT(VLEMGMT) +
SUBP(VIRTSCR)
TAPEREQ DSN(PAYROLL.*) POLICY(VLEPOL)

HSC POOLPARM/VOLPARM definitions are used to define the production volumes:

POOLPARM TYPE(MVC)  NAME(LOCAL)  
VOLPARM VOLSER(VLL000-VLL099) 
POOLPARM TYPE(MVC)  NAME(VAULT1)
VOLPARM VOLSER(VLV000-VLV099)
POOLPARM TYPE(SCRATCH) NAME(VIRTSCR)
VOLPARM VOLSER(V00000-V99999) MEDIA(VIRTUAL)

Note:

The MVCs in the pools LOCAL and VAULT1 are VMVCs (virtual MVCs) in the SITE1 and SITE2 VLEs, respectively, and do not have a media type associated with them.

VTCS STORCLAS and MGMTCLAS are used to define the VTCS policies:

STOR NAME(VLE1) STORMNGR(SITE1VLE) MVCPOOL(LOCAL)
STOR NAME(VLE2) STORMNGR(SITE2VLE) MVCPOOL(VAULT1)
MGMT NAME(VLEMGMT) DELSCR(YES) MIGPOL(VLE1,VLE2)

When a job runs with a data set starting with the high level qualifier "PAYROLL," SMC uses the TAPEREQ and POLICY to assign a MGMTCLAS of VLEMGMT to the mount request. VTCS selects a virtual scratch volume in the pool VIRTSCR (range V00000-V99999) and assigns it a MGMTCLAS of VLEMGMT. After the volume is dismounted, one copy is migrated to the local VLE (STORMNGR SITE1VLE) and the second copy is migrated to the remote VLE (STORMNGR SITE2VLE).
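The chain from data set name to migration targets can be traced with a small lookup model. The following is a minimal sketch, not SMC or VTCS logic: the dictionaries simply transcribe the example statements above, the function name resolve is hypothetical, and shell-style matching from fnmatch is assumed as a stand-in for the real TAPEREQ DSN mask rules.

```python
from fnmatch import fnmatchcase

# Definitions transcribed from the SITE1 example statements.
TAPEREQ = [("PAYROLL.*", "VLEPOL")]  # DSN mask -> POLICY name
POLICY = {"VLEPOL": {"mgmt": "VLEMGMT", "subpool": "VIRTSCR"}}
MGMTCLAS = {"VLEMGMT": {"migpol": ["VLE1", "VLE2"]}}
STORCLAS = {
    "VLE1": {"stormngr": "SITE1VLE", "mvcpool": "LOCAL"},
    "VLE2": {"stormngr": "SITE2VLE", "mvcpool": "VAULT1"},
}


def resolve(dsn):
    """Follow TAPEREQ -> POLICY -> MGMTCLAS -> STORCLAS for one data set name.

    Assumption: fnmatch-style matching stands in for SMC's own DSN mask
    rules, which are not reproduced here.
    """
    for mask, policy_name in TAPEREQ:
        if fnmatchcase(dsn, mask):
            policy = POLICY[policy_name]
            targets = [
                (STORCLAS[name]["stormngr"], STORCLAS[name]["mvcpool"])
                for name in MGMTCLAS[policy["mgmt"]]["migpol"]
            ]
            return policy["mgmt"], policy["subpool"], targets
    return None  # no TAPEREQ matched; this sketch applies no default policy


print(resolve("PAYROLL.WEEKLY.BACKUP"))
```

For a PAYROLL data set the sketch returns the management class VLEMGMT, the scratch subpool VIRTSCR, and two migration targets: SITE1VLE with pool LOCAL and SITE2VLE with pool VAULT1.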

Running a DR Test with VLE

The setup process for a DR test at SITE2 is simple and fast, and requires minimal restrictions at SITE1.

The basic steps are:

  1. Create a new CDS at SITE2 containing only basic configuration data.

  2. Mark the SITE1 VMVCs as READONLY to avoid conflicts.

  3. Perform an audit of the virtual production MVCs in the SITE2 VLE. This step populates the CDS with existing virtual metadata. Depending on the number of VTVs in the VLE, this step may take from a few minutes to just under an hour.

  4. Run the DR test workload, using a range of VTVs and MVCs that does not overlap with the production volumes.
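The non-overlap requirement in step 4 can be checked mechanically before the test. The following sketch expands the VOLSER ranges used in this chapter's examples and intersects them. It is an illustration only: the helper names are hypothetical, and it assumes each range shares one prefix and differs only in a fixed-width trailing number.

```python
import re


def expand(volser_range):
    """Expand a range such as 'VLT000-VLT099' into a set of volsers.

    Assumption: both ends share the same prefix and differ only in a
    trailing numeric part of equal width, as in this chapter's examples.
    """
    low, high = volser_range.split("-")
    m_low = re.fullmatch(r"(.*?)(\d+)", low)
    m_high = re.fullmatch(r"(.*?)(\d+)", high)
    if not m_low or not m_high:
        raise ValueError(f"no numeric suffix in {volser_range!r}")
    prefix, first = m_low.groups()
    prefix_high, last = m_high.groups()
    if prefix != prefix_high or len(first) != len(last):
        raise ValueError(f"unsupported range {volser_range!r}")
    width = len(first)
    return {f"{prefix}{n:0{width}d}" for n in range(int(first), int(last) + 1)}


def overlapping(ranges_a, ranges_b):
    """Return the volsers that appear in both lists of ranges."""
    a = set().union(*(expand(r) for r in ranges_a))
    b = set().union(*(expand(r) for r in ranges_b))
    return a & b


# Ranges taken from the example definitions in this chapter.
PRODUCTION = ["VLL000-VLL099", "VLV000-VLV099", "V00000-V99999"]
DR_TEST = ["VLT000-VLT099", "VT0000-VT9999"]

print(len(overlapping(PRODUCTION, DR_TEST)))
```

With the example ranges the intersection is empty, because the DR test volsers carry a T in a position where the production volsers have either a different letter or a digit.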

The remainder of this section gives the details of defining the parameters at the DR site, and describes steps you must take to ensure that the contents of the production VMVCs are not changed during the test.

  1. Creating the DR test CDS.

    1. Use the LIBGEN/SLICREAT process to create the CDS at SITE2. Note that you create this CDS even if you are already running production work at SITE2. The new CDS contains only DR data from SITE1. Also note that you must define at least one ACS in the LIBGEN macros, even if your configuration does not contain physical tape.

    2. Run the SET VOLPARM utility to define the volumes for the DR test:

      POOLPARM TYPE(MVC)  NAME(VAULT1) 
      VOLPARM VOLSER(VLV000-VLV099)
      POOLPARM TYPE(EXTERNAL) NAME(PRODVTVS)
      VOLPARM VOLSER(V00000-V99999) MEDIA(VIRTUAL)
      POOLPARM TYPE(MVC)  NAME(DRMVC)
      VOLPARM VOLSER(VLT000-VLT099)
      POOLPARM TYPE(SCRATCH) NAME(VIRTSCR)
      VOLPARM VOLSER(VT0000-VT9999) MEDIA(VIRTUAL)
      

      Note that the first two pools define volumes created by SITE1 that will be used as input to the test at SITE2. The pool type of EXTERNAL indicates that these are volumes that are not part of a scratch subpool. The last two pools are local pools that will be used as output from the test at SITE2.

    3. Define VTCS MGMTCLAS and STORCLAS that will be used for the DR test:

      STOR NAME(DRVLE) STORMNGR(SITE2VLE) MVCPOOL(DRMVC)
      MGMT NAME(VLEMGMT) DELSCR(YES) MIGPOL(DRVLE)
      
    4. Note that because the MGMTCLAS and scratch subpools in the SITE2 DR system have the same names as the production policies (but different definitions), you can now use the same SMC POLICY and TAPEREQ statements for your SITE2 DR test as you use in your SITE1 production.

    5. Bring up HSC/VTCS on the DR test LPAR.

  2. Mark the production MVCs as READONLY.

    1. This is a critical step in the process and must be done on both the production CDS at SITE1 and the DR test CDS at SITE2. Note that once the MVCs have been defined as READONLY on the production CDS, you can continue to run normal processing, including:

      RECLAIM. Automatic reclaim will not select an MVC in READONLY status.

      SCRATCH. Although VTVs will get updated in the production CDS to be in scratch status, and may be reused, the copy on the VLE read only virtual MVC is unaffected.

      Normal processing to append to or overwrite the VTVs on the VMVCs. The new VTV versions will be migrated to new VMVCs, while the copy on the VLE read only virtual MVC is unaffected.

      Note:

      You cannot, however, run the DRAIN utility against these MVCs, as that removes the VLE copy of the virtual MVC metadata.
    2. Use the ACTMVCGN utility function to select the production MVCs at the production site, using the production CDS. This utility generates control statements to set the READONLY flag on the MVCs it selects, and control statements to turn off the READONLY flag after the test is complete. Specifying the ALL keyword on the ACTMVCGN control statement ensures that full MVCs are selected for READONLY processing, which allows automatic reclaim to run on the production system without affecting the DR test. Include the SLUSMAUD DD statement to generate AUDIT statements for the VMVCs that will be used in the test. If you prefer, you can run the ACTMVCGN utility at the production site to create the production updates, and at the DR site on a mirrored copy of the CDS to create the DR test CDS updates. The following is an example of the JCL to run this utility:

      //ACTMVCGN JOB (ACCT),'ACTMVCGN',NOTIFY=&SYSUID
      //ACTMVCG1  EXEC PGM=SLUADMIN,PARM='MIXED'
      //STEPLIB    DD DSN=hlq.SEALINK,DISP=SHR
      //SLSPRINT   DD SYSOUT=*
      //*      NOTE: CDS DD statements are optional if running at the production
      //*      site with an active HSC LPAR.
      //SLSCNTL    DD DSN=hlq.DBASEPRM,DISP=SHR
      //SLSCNTL2   DD DSN=hlq.DBASESEC,DISP=SHR
      //SLSSTBY    DD DSN=hlq.DBASESBY,DISP=SHR
      //* NOTE: MVCMAINT READONLY(ON) STATEMENTS
      //SLUSMVON   DD DSN=hlq.SLUSMVON,DISP=(NEW,CATLG,DELETE),
      //                         SPACE=(CYL,1)
      //* NOTE: MVCMAINT READONLY(OFF) STATEMENTS
      //SLUSMVOF   DD DSN=hlq.SLUSMVOF,DISP=(NEW,CATLG,DELETE),
      //                         SPACE=(CYL,1)
      //*       NOTE: AUDIT MVC(VVVVVV) STATEMENTS
      //SLUSMAUD   DD DSN=hlq.SLUSMAUD,DISP=(NEW,CATLG,DELETE),
      //                         SPACE=(CYL,1)
      //* NOTE: THE FOLLOWING SELECTS ALL "NON-EMPTY" VMVCS
      //SLSIN      DD *
       ACTMVCGN ALL MVCPOOL(VAULT1)
      /*
      
  3. At the production site, run the MVCMAINT utility function to mark the VMVCs as READONLY.

    //RDONLYON   EXEC PGM=SLUADMIN,PARM='MIXED'
    //STEPLIB    DD DSN=hlq.SEALINK,DISP=SHR
    //SLSPRINT   DD SYSOUT=*
    //* NOTE: EXEC MVCMAINT TO SET READONLY(ON).  Output of
    //*       ACTMVCGN utility.
    //SLSIN      DD DSN=hlq.SLUSMVON,DISP=SHR
    
  4. Bring up the HSC/VTCS at the DR site.

  5. Run an MVC audit of the production VMVCs in the SITE2 VLE using the newly created SITE2 CDS and the output of the ACTMVCGN utility. This step populates the CDS metadata containing the relationships between the VTVs and the VMVCs.

    //AUDIT EXEC PGM=SLUADMIN
    //STEPLIB  DD DSN=hlq.SEALINK,DISP=SHR
    //SLSPRINT DD SYSOUT=*
    //*     NOTE: AUDIT CONTROL STATEMENTS FROM ACTMVCGN UTILITY
    //SLSIN    DD   DSN=hlq.SLUSMAUD,DISP=SHR
    

    Optionally, you can recall VTVs that will be used in the DR test into the VTSS buffer, using another method of selecting VTVs to recall. However, this step is not necessary, since recalls from the VLE buffer are relatively fast.

  6. Run the MVCMAINT utility using the output of the ACTMVCGN READONLY(ON) to set the production VMVCs to READONLY at SITE2 on the DR CDS.

    //RDONLYON   EXEC PGM=SLUADMIN,PARM='MIXED'
    //STEPLIB    DD DSN=hlq.SEALINK,DISP=SHR
    //SLSPRINT   DD SYSOUT=*
    //* NOTE: EXEC MVCMAINT TO SET READONLY(ON).  Output of
    //*       ACTMVCGN utility.
    //SLSIN      DD DSN=hlq.SLUSMVON,DISP=SHR
    
  7. Optional: Before starting your DR test, you may want to run a VTVRPT and an MVCRPT to validate the contents of your DR test CDS.

  8. Run your DR test workload.

    1. Bring up SMC. If you used the same names on your MGMTCLAS and scratch subpools as the production system, you can use your production TAPEREQ and POLICY statements. It is recommended but not required that you use a different TapePlex name for the DR test TapePlex.

    2. Run your DR test workload using the SMC and new HSC/VTCS CDS.

    3. There are no limitations on updating production VTV volumes during the DR test. Data on production VTVs may be appended to (DISP=MOD) or overwritten (DISP=OLD). These updates do not affect the contents of the VTV copy on the READONLY production virtual MVC, and therefore do not affect the production copy of the data.
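The reason updates during the test leave the production copy intact can be illustrated with a toy model. This sketch is deliberately simplified and does not represent VTCS internals: the Vmvc class and migrate function are hypothetical, and real MVC selection is driven by storage class and MVC pool rather than by a simple list scan.

```python
class Vmvc:
    """Toy stand-in for a virtual MVC: a volser, a READONLY flag, VTV copies."""

    def __init__(self, volser, readonly=False):
        self.volser = volser
        self.readonly = readonly
        self.vtvs = {}  # VTV volser -> data version held on this VMVC


def migrate(vtv, version, candidates):
    """Write a VTV version to the first writable VMVC among the candidates."""
    for vmvc in candidates:
        if not vmvc.readonly:
            vmvc.vtvs[vtv] = version
            return vmvc.volser
    raise RuntimeError("no writable VMVC available")


# The production copy of V00001 sits on a READONLY production VMVC.
prod = Vmvc("VLV000", readonly=True)
prod.vtvs["V00001"] = "production version"
dr_out = Vmvc("VLT000")

# The DR test overwrites V00001; the new version lands on the DR test VMVC.
target = migrate("V00001", "DR test version", [prod, dr_out])
print(target, "|", prod.vtvs["V00001"])
```

In the model the new version is written to VLT000, and the copy held on the READONLY VMVC VLV000 still contains the production version.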

Cleanup After a DR Test with VLE

After the DR test is complete, cleanup removes metadata from the VTSS and from the VLE so that the next DR test will not see this data. Note that the DR test HSC/VTCS must remain active until the cleanup is complete. The steps are:

  1. Run the SCRATCH utility function to scratch all VTVs created during the test from both the VTSS and from the VLE DR test VMVCs. When the DELSCR(YES) parameter is specified on the DR test MGMTCLAS, running the scratch utility causes the VTVs to be deleted from both the buffer and from the VLE metadata.

    //SCRATCH EXEC PGM=SLUADMIN
    //STEPLIB  DD DSN=hlq.SEALINK,DISP=SHR
    //SLSPRINT DD SYSOUT=*
    //SLSIN    DD *
    SCRATCH VOL(VT0000-VT9999)
    

    Note that if you have modified any production VTVs using DISP=MOD or DISP=OLD, these VTVs remain in the buffer and on the VLE.

    By scratching the VTVs in the DR test subpool after the test, you minimize the amount of time required to clean up the VTSS, and minimize the amount of data left in the VLE after the completion of the test.

  2. Migrate the VTSS to 0.

    //MIGRTO0   EXEC PGM=SLUADMIN
    //STEPLIB  DD DSN=hlq.SEALINK,DISP=SHR
    //SLSPRINT DD SYSOUT=*
    //SLSIN    DD *
    MIGRATE VTSS(DRVTSS) THRESHLD(0)
    

    This step is required only if the output of your DR test included new versions of production VTVs.

  3. Verify that the DR VTSS is now empty.

    //AUDVTSS    EXEC PGM=SLUADMIN
    //STEPLIB  DD DSN=hlq.SEALINK,DISP=SHR
    //SLSPRINT DD SYSOUT=*
    //SLSIN    DD *
    AUDIT VTSS(DRVTSS) 
    

    Note that if you have modified production VTVs during your DR test, copies of this data, and the metadata, remain in the VLE for the DR test MVC pool (VLT000-VLT099, VTVs V00000-V99999). During the next DR test, these VMVCs will be written starting at the logical beginning of tape, and any data they contain will be removed from the VLE. Since the new DR test CDS has no knowledge of this data, it will not affect the next DR test.

  4. At the production site, use the READONLY(OFF) control cards created by the ACTMVCGN utility at the beginning of the test to put the production VMVCs back into a writable status.

    //RDONLYOF   EXEC PGM=SLUADMIN,PARM='MIXED'
    //STEPLIB    DD DSN=hlq.SEALINK,DISP=SHR
    //SLSPRINT   DD SYSOUT=*
    //* NOTE: EXEC MVCMAINT TO SET READONLY(OFF)
    //SLSIN      DD DSN=hlq.SLUSMVOF,DISP=SHR
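
The three data sets written by the ACTMVCGN job are consumed at different points in the test and cleanup procedures. The following lookup table summarizes that flow as described in this chapter. It is a reading aid only: the table layout and the function name inputs_for are hypothetical.

```python
# DD name -> (statements it holds, CDS each is later run against)
ACTMVCGN_OUTPUTS = {
    "SLUSMVON": (
        "MVCMAINT READONLY(ON)",
        ["SITE1 production CDS", "SITE2 DR test CDS"],
    ),
    "SLUSMAUD": (
        "AUDIT MVC(vvvvvv)",
        ["SITE2 DR test CDS"],
    ),
    "SLUSMVOF": (
        "MVCMAINT READONLY(OFF)",
        ["SITE1 production CDS"],
    ),
}


def inputs_for(cds):
    """List the ACTMVCGN output data sets that are run against a given CDS."""
    return sorted(
        dd for dd, (_, targets) in ACTMVCGN_OUTPUTS.items() if cds in targets
    )


print(inputs_for("SITE1 production CDS"))
print(inputs_for("SITE2 DR test CDS"))
```

The production CDS receives the READONLY(ON) statements before the test and the READONLY(OFF) statements during cleanup, while the DR test CDS receives the AUDIT and READONLY(ON) statements.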
    

Using VLE for Business Continuance

When an outage occurs at SITE1 that requires SITE2 to take over SITE1's workload, the process is almost identical to the DR test procedure.

If a DR test happens to be executing when the SITE1 outage occurs, follow the above process to clean up after the DR test and stop the DR test.

To begin running the SITE1 workload at SITE2, follow the procedure described above for starting a DR test. You will, of course, omit the step of marking the production VMVCs as READONLY on the production CDS, as there is no "production" CDS to update. However, you will use the mirrored copy of the production CDS to generate the MVCMAINT READONLY control cards for the production MVCs in the VLE.

You will also want to use the DR test policies that segregate the VTVs being created and the output VMVCs into a separate range, to avoid any possibility of corrupting the production data until after the business continuance has been verified.
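
The difference between the DR test procedure and the business continuance procedure can be summarized as a filtered step list. The sketch below paraphrases the steps described in this chapter. The step wording, the tags, and the function name are hypothetical and serve only to show which steps change when SITE1 is unavailable.

```python
# (step text, what changes for business continuance)
DR_TEST_STEPS = [
    ("Create the DR CDS at SITE2 (LIBGEN/SLICREAT, SET VOLPARM)", "same"),
    ("Run ACTMVCGN against the production CDS", "mirror"),
    ("Run MVCMAINT READONLY(ON) at SITE1", "omit"),
    ("Bring up HSC/VTCS at SITE2", "same"),
    ("Audit the production VMVCs in the SITE2 VLE", "same"),
    ("Run MVCMAINT READONLY(ON) on the SITE2 CDS", "same"),
    ("Run the workload using the DR test policies", "same"),
]


def business_continuance_steps():
    """Derive the takeover steps from the DR test steps."""
    steps = []
    for text, change in DR_TEST_STEPS:
        if change == "omit":
            continue  # SITE1 is down, so nothing can be updated there
        if change == "mirror":
            text = text.replace(
                "the production CDS", "the mirrored copy of the production CDS"
            )
        steps.append(text)
    return steps


for number, step in enumerate(business_continuance_steps(), start=1):
    print(number, step)
```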

Note:

If production jobs performing DISP=MOD processing for tape data do not have a defined synchronization point, it is possible that the contents of a VTV at the time of an outage may be unpredictable. StorageTek recommends that all disaster recovery procedures be reviewed to ensure predictable synchronization points for tape data.