6 Managing Archives for Digital Preservation

Thus far, this document has discussed managing Oracle Hierarchical Storage Manager and StorageTek QFS Software solutions as ordinary UNIX file systems, where users and applications are regularly creating, modifying, and deleting files. The focus has been on the disk cache, with the archive serving primarily as a highly integrated backup service. In this chapter, we refocus on the archive as a repository and management solution for long-term data preservation. The previously covered management principles and techniques remain relevant. But now the disk cache serves primarily as a means of ingesting files into an archive that does not allow deletion or modification following ingestion.

Exact requirements vary. A repository that retains business or medical records for a legally mandated period may need to discard records periodically. But an archive storing scientific data, historical or genealogical records, or digital music, films, or television programs may need to store content, in effect, forever. For this reason, Oracle HSM supports digital preservation in several ways:

  • Message digests (checksums) let you detect damage, data corruption, and unauthorized modifications to files, so that you can correct hardware problems and replace unsound files with sound copies stored elsewhere in the archive.

  • File fixity attributes work with message digests to ensure that only the super user can alter files that have been fixed. Whenever Oracle HSM stages or archives a fixed file, it revalidates a checksum stored with the fixity attribute to prove that the file remains unchanged.

  • Oracle HSM Write Once Read Many (WORM) file systems let you make files read-only and enforce retention for a specified period. These file systems can be configured so that the super user cannot alter files or file attributes, such as the fixity attribute discussed above.

The chapter starts with a brief review of the basic Oracle HSM data-protection measures that form the foundation of any long-term storage solution. It then explains the tasks that specifically address data preservation.

Configuring File Systems for Preservation

Every preservation solution starts with sound, highly redundant file systems. So review the implementation chapters of the companion Oracle Hierarchical Storage Manager and StorageTek QFS Installation and Configuration Guide, if you have not already done so. Protect access to the archive by providing redundant servers, network connections, and storage devices. Protect file data by configuring at least two additional copies of each file, with each stored on independent media. Archiving one copy to disk or solid-state storage devices and two copies to tape media is ideal in most situations. When possible, ensure that tape blocks are correctly written and read by implementing the Oracle HSM Data Integrity Validation (DIV) feature. Protect file-system metadata by regularly generating dump files and by regularly backing up the archiving logs.

Using Message Digests (Checksums)

Message digests (checksums) let preservationists test archived files for changes that might indicate gradual deterioration, hardware or operator error, or deliberate, unauthorized alterations to the content. A message digest is simply a mathematical summary of a file's contents that has been generated by a one-way cryptographic hash function. Cryptographic hash functions are extremely sensitive to changes in their input data. Even small changes in the input produce large changes in the output. So message digests are ideal for detecting file corruption and unauthorized alterations. Recomputing a file's digest and comparing the resulting value to a stored digest value shows whether the file has changed.
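The sensitivity described above is easy to observe outside of Oracle HSM. The following Python sketch is purely illustrative and uses only the standard hashlib module; the sample byte strings and the helper names `sha256_hex` and `bits_differing` are hypothetical. It stores a digest for some content, then shows that unchanged content matches the stored value while content that differs by a single character does not:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def bits_differing(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hexadecimal digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical sample content; the two strings differ in one character.
original = b"archived record, revision 1"
altered = b"archived record, revision 2"

# Digest recorded when the content was first stored.
stored_digest = sha256_hex(original)

# Recompute and compare: a match indicates that the content is unchanged.
unchanged_matches = sha256_hex(original) == stored_digest
altered_matches = sha256_hex(altered) == stored_digest

print("unchanged content matches stored digest:", unchanged_matches)
print("altered content matches stored digest:", altered_matches)
print("digest bits that differ:", bits_differing(stored_digest, sha256_hex(altered)))
```

A one-character change typically flips roughly half of the 256 output bits, which is why a simple equality test on digests is enough to reveal that a file has changed.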

Oracle Hierarchical Storage Manager file systems can ingest, create, store, and validate message digests using any of the following cryptographic hash functions:

  • SHA1, the 160-bit member of the Secure Hash Algorithm family of cryptographic functions

    The Secure Hash Algorithms are defined in Federal Information Processing Standard (FIPS) Publication 180-4, National Institute of Standards and Technology (2012). Oracle HSM uses SHA1 by default.

  • SHA256, the 256-bit member of the Secure Hash Algorithm family

  • SHA384, the 384-bit member of the Secure Hash Algorithm family

  • SHA512, the 512-bit member of the Secure Hash Algorithm family

  • MD5, the 128-bit Message Digest function defined by the Internet Engineering Task Force (IETF) in Request for Comment (RFC) 1321

  • A proprietary, 128-bit Oracle HSM function that is now mainly useful for backward compatibility with older Storage Archive Manager implementations.
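As a quick cross-check of the digest lengths listed above, the following Python sketch reports the output size of each standard function using the hashlib module. It is illustrative only: the helper name `digest_bits` is hypothetical, the proprietary Oracle HSM function has no hashlib equivalent and is omitted, and MD5 may be unavailable on Python builds that enforce FIPS restrictions:

```python
import hashlib

# Standard functions from the list above, using hashlib's algorithm names.
ALGORITHMS = ["sha1", "sha256", "sha384", "sha512", "md5"]


def digest_bits(name: str) -> int:
    """Return the digest length, in bits, of the named hash function."""
    return hashlib.new(name).digest_size * 8


for name in ALGORITHMS:
    print(f"{name}: {digest_bits(name)} bits")
```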

Users can supply an existing digest value when a file is ingested into the repository, or they can have the file system compute one, either immediately or when the file is first archived. Oracle HSM file systems store digest values with the file system metadata, using a special file attribute. Once the attribute is set, the file system recomputes a digest and validates it against the stored value whenever the corresponding file is rearchived and, optionally, whenever the file is staged from archival media to the disk cache.
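The recompute-and-compare cycle can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the Oracle HSM implementation: the function names `file_digest` and `validate`, the chunk size, and the temporary test file are all assumptions made for the example. The file is read in chunks so that large files need not fit in memory:

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Compute a file's digest incrementally, one chunk at a time."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def validate(path: str, stored_digest: str, algorithm: str = "sha256") -> bool:
    """Recompute the digest and compare it with the stored value."""
    return file_digest(path, algorithm) == stored_digest.lower()


# Demonstration with a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dat")
    with open(path, "wb") as handle:
        handle.write(b"x" * 14975)

    stored = file_digest(path)           # value recorded at ingest
    ok_before = validate(path, stored)   # file is unchanged

    with open(path, "ab") as handle:     # simulate a later alteration
        handle.write(b"changed")
    ok_after = validate(path, stored)    # stored value no longer matches

print(ok_before, ok_after)  # prints: True False
```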

Note, however, that the Oracle HSM media migration feature copies files to new media without recalculating checksums (for information on media migration see Chapter 8, "Migrating to New Storage Media"). If a file is not copied correctly, there is thus a small risk that the corruption will not be detected until the file is restaged and validated. Using Data Integrity Validation (DIV) minimizes this risk (see the Oracle Hierarchical Storage Manager and StorageTek QFS Installation and Configuration Guide for details).

Before you start using message digests, make sure that the host can handle the required calculations without undue reductions in performance. You can then carry out the tasks described in the sections that follow, as needed.

Make Sure that File-System Host Performance Will Be Adequate

If you plan to make significant use of message digests, make sure that the file-system host has enough computing resources for adequate performance. Most modern platforms incorporate dedicated cryptographic hardware that can efficiently perform specialized calculations without consuming central processor cycles. Be sure to take advantage of these capabilities when available.

To check the capabilities of a potential file-system host, proceed as follows:

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. Make sure that the host operating system is Solaris 11.1 or higher. Use the command uname -v.

    Earlier versions of the operating system do not support hardware acceleration of hash functions. In the example, the host operating system is Solaris 11.2:

    root@mds1:~# uname -v
    11.2
    root@mds1:~# 
    
  3. Display the instruction set architecture. At the command prompt, enter the command isainfo -v:

    root@mds1:~# isainfo -v 
    
  4. If the Solaris 11 host is an Oracle Sun SPARC T3 or later system, the output of the isainfo -v command should list instruction sets that support the sha512, sha256, sha1, and md5 cryptographic algorithms.

    In the example, the SPARC host provides hardware acceleration for the SHA1, SHA2, and MD5 algorithm families:

    root@mds1:~# isainfo -v 
    64-bit sparcv9 applications
            crc32c cbcond pause mont mpmul sha512 sha256 sha1 md5 camellia kasumi 
            des aes ima hpc vis3 fmaf asi_blk_init vis2 vis popc
    root@mds1:~# 
    
  5. If the Solaris host is an x86/64 system, it will support SHA-1 hardware acceleration if the output of the isainfo -v command includes the ssse3 (Supplemental Streaming SIMD Extensions 3) instruction set.

    In the example, the host x86/64 system includes the ssse3 instruction set:

    root@mds1:~# isainfo -v 
    64-bit amd64 applications
            avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 
            sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 
    root@mds1:~# 
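When many candidate hosts must be checked, the inspection in steps 4 and 5 can be scripted. The Python sketch below is an illustration under stated assumptions: it parses captured isainfo -v output supplied as text (the two samples are copied from the examples above), the function names are hypothetical, and the rule that ssse3 implies SHA-1 acceleration on x86/64 is taken directly from step 5:

```python
# Instruction-set names that indicate hash acceleration on SPARC (step 4).
HASH_FLAGS = {"sha512", "sha256", "sha1", "md5"}

SPARC_SAMPLE = """64-bit sparcv9 applications
        crc32c cbcond pause mont mpmul sha512 sha256 sha1 md5 camellia kasumi
        des aes ima hpc vis3 fmaf asi_blk_init vis2 vis popc
"""

X86_SAMPLE = """64-bit amd64 applications
        avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3
        sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
"""


def instruction_sets(isainfo_output: str) -> set:
    """Collect the names listed on the indented lines of isainfo -v output."""
    flags = set()
    for line in isainfo_output.splitlines():
        # Header lines ("64-bit ... applications") start in column one;
        # the lines that list instruction sets are indented.
        if line[:1].isspace():
            flags.update(line.split())
    return flags


def accelerated_hashes(isainfo_output: str) -> set:
    """Return the hash algorithms that the output suggests are accelerated."""
    flags = instruction_sets(isainfo_output)
    found = flags & HASH_FLAGS
    if "ssse3" in flags:
        found.add("sha1")  # per step 5: ssse3 enables SHA-1 acceleration
    return found


print(sorted(accelerated_hashes(SPARC_SAMPLE)))
print(sorted(accelerated_hashes(X86_SAMPLE)))
```

On a live host, the text could be captured with Python's subprocess module instead of being pasted in as a string.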
    

Supply a Message Digest and Enable Validation for a File

When you are archiving files that already have associated message digests, proceed as follows.

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. At the command prompt, enter the command ssum -a algorithm -h digest -G [-u] filename, where:

    • -a algorithm identifies the cryptographic hashing function that the file system should use when validating the file against the supplied message digest.

    • -h digest identifies the message digest that the file system should use to validate the file.

    • -G specifies immediate validation. The file system sets the hash file attribute to the value of the supplied message digest, independently calculates a message digest for the file, and compares the result to the stored value. If the supplied and calculated digests match, the file system sets the validated attribute for the file. It then sets the generate attribute so that validity is rechecked whenever the file is rearchived.

    • -u sets the use file attribute (optional). Whenever the file is staged, the file system recalculates the digest and validates the result against the value stored in the hash attribute.

    • filename is the path and name of the file.

    In the example, we supply a SHA256 digest and ask the file system to immediately recalculate and validate the digest value for the file data10 against the supplied value. When we check the file attributes with the command sls -D -h data10, we see that the generate and validated file attributes have been set, the algorithm attribute has been set to SHA-256, and the digest value has been stored in the hash attribute:

    root@mds1:~# ssum -h f03ce01b3828...f7459503007e -a sha256 -G data10
    root@mds1:~# sls -D -h data10
    data10:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14975  admin id:      0  inode:    90217.1
      project: user.root(1)
      access:      Jul 16 16:14  modification: Jul 16 16:14
      changed:     Jul 16 16:15  attributes:   Jul 16 16:14
      creation:    Jul 16 16:14  residence:    Jul 16 16:14
      checksum: generate validated  algorithm: SHA-256
      hash: f03ce01b3828...f7459503007e
    root@mds1:~# 
    
  3. When necessary, edit the file as you normally would.

    In the example, we have modified a file named data10m since it was last archived. The sls -D -h command shows that the S (stale) flag has been set on both copies, since neither reflects the most recent changes. When we check the SHA-256 digest value for the modified file using the Solaris digest command, we see that the file's hash attribute also stores an out-of-date digest value:

    root@mds1:~# sls -D -h data10m
    data10m:
      mode: -rw-r--r--    links:   1  owner: root        group: root
      length:     14983  admin id:      0  inode:    90307.1
      project: user.root(1)
      copy 1: S----- Jul 17 16:47        dd.1    dk diskarchive f221
      copy 2: S----- Jul 20 11:31       a8d.1    li VOL002
      access:      Jul 20 11:32  modification: Jul 20 11:31
      changed:     Jul 17 16:37  attributes:   Jul 17 16:36
      creation:    Jul 17 16:36  residence:    Jul 17 16:36
      checksum: generate  algorithm: SHA-256
      hash: f03ce01b3828...f7459503007e
    root@mds1:~# digest -a sha256 data10m
    56c55bb421cc...71ac2ac0b7b0
    root@mds1:~# 
    
  4. If necessary, you can change the digest attributes of a modified file prior to rearchiving.

    In the example, we change the digest algorithm from SHA256 to SHA1, with immediate effect:

    root@mds1:~# ssum -a sha1 -G data10m
    root@mds1:~# sls -D -h data10m
    data10m:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90307.1
      project: user.root(1)
      release -a;
      copy 1: S----- Jul 20 13:00        e0.1    dk diskarchive f224
      copy 2: S----- Jul 20 13:05       a93.1    li VOL002
      access:      Jul 20 16:39  modification: Jul 20 16:39
      changed:     Jul 17 16:37  attributes:   Jul 17 16:36
      creation:    Jul 17 16:36  residence:    Jul 20 16:29
      checksum: generate validated  algorithm: SHA-1
      hash: 92003525f0f8...53e29d0718c8
    root@mds1:~# 
    
  5. Otherwise, wait for the file system to archive the modified file and automatically update the digest-related attributes.

    When a modified file is archived, the file system recalculates the digest value, stores the new value to the hash attribute, and sets the S (stale) flag on any archived copies of older versions of the file. In the example, we have edited the file data10m without altering the digest attributes. The archiver has created a new copy 1 on disk, as scheduled, and updated the hash attribute. A copy of the unmodified file remains on tape, flagged S (stale), until it is time for the archiver to create copy 2:

    root@mds1:~# sls -D -h data10m
    data10m:
      mode: -rw-r--r--    links:   1  owner: root        group: root
      length:     14983  admin id:      0  inode:    90307.1
      project: user.root(1)
      copy 1: ------ Jul 17 16:47        dd.1    dk diskarchive f221
      copy 2: S----- Jul 20 11:31       a8d.1    li VOL002
      access:      Jul 20 11:32  modification: Jul 20 11:31
      changed:     Jul 17 16:37  attributes:   Jul 17 16:36
      creation:    Jul 17 16:36  residence:    Jul 17 16:36
      checksum: generate  algorithm: SHA-256
      hash: 56c55bb421cc...71ac2ac0b7b0
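The attribute changes described for the -G option in step 2 can be pictured with a small model. The Python sketch below is a toy illustration, not Oracle HSM code: the class `ChecksumAttributes`, the function `supply_and_validate`, and the sample data are hypothetical, and raising an error when the digests do not match is an assumption, since the text above describes only the matching case:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChecksumAttributes:
    """Toy stand-in for the digest-related attributes shown by sls -D -h."""
    algorithm: str = ""
    hash_value: str = ""
    flags: set = field(default_factory=set)


def supply_and_validate(data: bytes, algorithm: str, supplied: str,
                        use: bool = False) -> ChecksumAttributes:
    """Model a supplied digest with immediate validation."""
    # Store the supplied digest, then calculate one independently.
    attrs = ChecksumAttributes(algorithm=algorithm, hash_value=supplied.lower())
    computed = hashlib.new(algorithm, data).hexdigest()

    if computed != attrs.hash_value:
        # Assumption: this sketch simply refuses a mismatched digest.
        raise ValueError("supplied digest does not match file content")

    attrs.flags.add("validated")  # supplied and calculated digests match
    attrs.flags.add("generate")   # recheck whenever the file is rearchived
    if use:
        attrs.flags.add("use")    # also recheck whenever the file is staged
    return attrs


content = b"example payload"                     # hypothetical file content
supplied = hashlib.sha256(content).hexdigest()   # digest delivered with the file

attrs = supply_and_validate(content, "sha256", supplied)
print(sorted(attrs.flags))  # prints: ['generate', 'validated']
```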
    

Generate a Message Digest and Enable Validation for a File

To generate a digest for a file and enable file validation, proceed as follows:

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. At the command prompt, enter the command ssum -a algorithm -g|G [-u] filename, where:

    • -a algorithm specifies the cryptographic hashing function that the file system will use when generating a message digest for the file.

    • -g sets the generate file attribute for the file. The first time that the file is archived, the file system calculates a message digest. Whenever the file is rearchived, the file system recalculates the digest and validates the result against the stored value.

    • -G sets the generate and validated file attributes for the file. The file system immediately calculates a message digest and stores the result in the hash attribute. Whenever the file is archived, the file system recalculates the digest and validates the result against the stored value.

    • -u sets the use file attribute (optional). Whenever the file is staged, the file system recalculates the digest and validates the result against the value stored in the hash attribute.

    • filename is the path and name of the file.

    In the example, we ask the file system to use the SHA256 algorithm to calculate the digest for the file data11 prior to archiving. When we check the file attributes with the command sls -D -h data11, we see that the generate file attribute has been set and the algorithm attribute has been set to SHA-256. The file has not yet been archived, so the digest value has not yet been calculated and stored in the hash attribute:

    root@mds1:~# ssum -a sha256 -g data11
    root@mds1:~# sls -D -h data11
    data11:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14975  admin id:      0  inode:    90218.1
      project: user.root(1)
      access:      Jul 16 16:14  modification: Jul 16 16:14
      changed:     Jul 16 16:22  attributes:   Jul 16 16:14
      creation:    Jul 16 16:14  residence:    Jul 16 16:14
      checksum: generate  algorithm: SHA-256
      hash:
    root@mds1:~# 
    
  3. When necessary, edit the file as you normally would.

    In the example, we have modified a file named data11m since it was last archived. The sls -D -h command shows that the S (stale) flag has been set on both copies, since neither reflects the most recent changes. When we check the SHA-256 digest value for the modified file using the Solaris digest command, we see that the file's hash attribute also stores an out-of-date digest value:

    root@mds1:~# sls -D -h data11m
    data11m:
      mode: -rw-r--r--    links:   1  owner: root        group: root
      length:     14983  admin id:      0  inode:    90307.1
      project: user.root(1)
      copy 1: S----- Jul 17 16:47        dd.1    dk diskarchive f221
      copy 2: S----- Jul 20 11:31       a8d.1    li VOL002
      access:      Jul 20 11:32  modification: Jul 20 11:31
      changed:     Jul 17 16:37  attributes:   Jul 17 16:36
      creation:    Jul 17 16:36  residence:    Jul 17 16:36
      checksum: generate  algorithm: SHA-256
      hash: f03ce01b3828...f7459503007e
    root@mds1:~# digest -a sha256 data11m
    56c55bb421cc...71ac2ac0b7b0
    root@mds1:~# 
    
  4. If necessary, you can change the digest attributes of a modified file prior to rearchiving.

    In the example, we change the digest algorithm from SHA256 to SHA1, with immediate effect:

    root@mds1:~# ssum -a sha1 -G data11m
    root@mds1:~# sls -D -h data11m
    data11m:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90307.1
      project: user.root(1)
      release -a;
      copy 1: S----- Jul 20 13:00        e0.1    dk diskarchive f224
      copy 2: S----- Jul 20 13:05       a93.1    li VOL002
      access:      Jul 20 16:39  modification: Jul 20 16:39
      changed:     Jul 17 16:37  attributes:   Jul 17 16:36
      creation:    Jul 17 16:36  residence:    Jul 20 16:29
      checksum: generate validated  algorithm: SHA-1
      hash: 92003525f0f8...53e29d0718c8
    root@mds1:~# 
    
  5. Otherwise, wait for the file system to archive the modified file and automatically update the digest-related attributes.

    When a modified file is archived, the file system recalculates the digest value, stores the new value to the hash attribute, and sets the S (stale) flag on any archived copies of older versions of the file.

    In the example, we have edited the file data11m without altering the digest attributes. The archiver has created a new copy 1 on disk, as scheduled, and updated the hash attribute. A copy of the unmodified file remains on tape, flagged S (stale), until it is time for the archiver to create copy 2:

    root@mds1:~# sls -D -h data11m
    data11m:
      mode: -rw-r--r--    links:   1  owner: root        group: root
      length:     14983  admin id:      0  inode:    90307.1
      project: user.root(1)
      copy 1: ------ Jul 17 16:47        dd.1    dk diskarchive f221
      copy 2: S----- Jul 20 11:31       a8d.1    li VOL002
      access:      Jul 20 11:32  modification: Jul 20 11:31
      changed:     Jul 17 16:37  attributes:   Jul 17 16:36
      creation:    Jul 17 16:36  residence:    Jul 17 16:36
      checksum: generate  algorithm: SHA-256
      hash: 56c55bb421cc...71ac2ac0b7b0
    

Generate a Message Digest and Enable Validation for Each File in a Directory

To recursively generate a digest and set the validation attributes for every file in a directory, proceed as follows:

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. At the command prompt, enter the command ssum -a algorithm -g|G [-u] -r directoryname, where:

    • -a algorithm specifies the cryptographic hashing function that the file system will use when generating message digests.

    • -g sets the generate file attribute for each file. The first time that a file is archived, the file system calculates a message digest for the file. Whenever the file is rearchived, the file system recalculates the digest and validates the result against the stored value.

    • -G sets the generate and validated file attributes for each file. The file system immediately calculates a message digest and stores the result in the hash attribute. Whenever the file is archived, the file system recalculates the digest and validates the result against the stored value.

    • -u sets the use file attribute (optional). Whenever the file is staged, the file system recalculates the digest and validates the result against the stored value.

    • -r recursively applies the command to all files in the specified directory.

    • directoryname is the path and name of the directory.

    In the first example, we tell the file system to use the SHA256 algorithm to calculate the digest for the files in the directory datasetA prior to archiving. When we check the file attributes with the command sls -D -h datasetA, we see that, for each file, the generate file attribute has been set and the algorithm attribute has been set to SHA-256. The files have not yet been archived, so the digest values have not as yet been calculated and stored in the hash attribute:

    root@mds1:~# ssum -a sha256 -g -r datasetA
    root@mds1:~# sls -D -h datasetA
    datasetA/pdata0:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90232.1
      project: user.root(1)
      access:      Jul 16 16:47  modification: Jul 16 16:47
      changed:     Jul 16 16:47  attributes:   Jul 16 16:47
      creation:    Jul 16 16:47  residence:    Jul 16 16:47
      checksum: generate  algorithm: SHA-256
      hash: 
    ...
    datasetA/pdata20:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90234.1
      project: user.root(1)
      access:      Jul 16 16:47  modification: Jul 16 16:47
      changed:     Jul 16 16:47  attributes:   Jul 16 16:47
      creation:    Jul 16 16:47  residence:    Jul 16 16:47
      checksum: generate  algorithm: SHA-256
      hash: 
    ...
    root@mds1:~# 
    

    In the second example, we ask the file system to use the SHA256 algorithm to immediately calculate the digest for the files in the directory datasetB prior to archiving. When we check the file attributes with the command sls -D -h datasetB, we see that, for each file, the generate and validated file attributes have been set, the algorithm attribute has been set to SHA-256, and the digest value has been calculated and stored in the hash attribute:

    root@mds1:~# ssum -a sha256 -G -r datasetB
    root@mds1:~# sls -D -h datasetB
    datasetB/qdata0:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90232.1
      project: user.root(1)
      access:      Jul 16 16:47  modification: Jul 16 16:47
      changed:     Jul 16 16:47  attributes:   Jul 16 16:47
      creation:    Jul 16 16:47  residence:    Jul 16 16:47
      checksum: generate validated  algorithm: SHA-256
      hash: 4d2800eb82b3...520341edde95
    ...
    datasetB/qdata12:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90234.1
      project: user.root(1)
      access:      Jul 16 16:47  modification: Jul 16 16:47
      changed:     Jul 16 16:47  attributes:   Jul 16 16:47
      creation:    Jul 16 16:47  residence:    Jul 16 16:47
      checksum: generate validated  algorithm: SHA-256
      hash: 5b057f1b7b48...88c590d47dec
    ...
    root@mds1:~# 
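Outside of the file system, the same recursive idea can be used to prepare or cross-check digests for a data set before ingestion. The Python sketch below is illustrative only: the function `digest_tree` and the temporary directory contents are hypothetical, and it does not set any Oracle HSM file attributes. It walks a directory tree and computes one digest per file:

```python
import hashlib
import os
import tempfile


def digest_tree(root: str, algorithm: str = "sha256") -> dict:
    """Map each file's path, relative to root, to its hexadecimal digest."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            hasher = hashlib.new(algorithm)
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so that large files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    hasher.update(chunk)
            results[os.path.relpath(path, root)] = hasher.hexdigest()
    return results


# Demonstration on a throwaway directory with one nested subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    dataset = os.path.join(tmp, "dataset")
    os.makedirs(os.path.join(dataset, "sub"))
    samples = [("file0", b"first"), (os.path.join("sub", "file1"), b"second")]
    for relative, content in samples:
        with open(os.path.join(dataset, relative), "wb") as handle:
            handle.write(content)
    digests = digest_tree(dataset)

for relative, value in sorted(digests.items()):
    print(relative, value)
```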
    

Validate the Message Digest of a File During Staging

When required, you can validate a file before it is staged to the disk cache for use. Proceed as follows:

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. At the command prompt, enter the command ssum -u [-a algorithm [-h digest] -g|G] filename, where:

    • -u specifies validation prior to staging by setting the use file attribute. When the use attribute is set for a file, the file system will not copy the file from archival media to the disk cache until it has generated a message digest and successfully validated the result against the value stored in the file's hash attribute.

    • -a algorithm, -h digest, and -g|G are optional parameters that set the required algorithm, hash, and generate attributes on the file if the attributes have not been set previously.

    • filename is the path and name of the file.

    In the example, we have already enabled validation for the file data102. As the command sls -D -h data102 shows, the generate and validated file attributes have been set, the algorithm attribute has been set to SHA-256, and the digest value has been calculated and stored in the hash attribute:

    root@mds1:~# ssum -a sha256 -G data102
    root@mds1:~# sls -D -h data102
    data102:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14979  admin id:      0  inode:    90264.1
      project: user.root(1)
      access:      Jul 16 17:34  modification: Jul 16 17:34
      changed:     Jul 16 17:34  attributes:   Jul 16 17:34
      creation:    Jul 16 17:34  residence:    Jul 16 17:34
      checksum: generate validated  algorithm: SHA-256
      hash: baae932ce1cf...93166a2e36b5
    root@mds1:~# 
    

    So we can set the use attribute to ensure that the file system validates the file prior to staging. The command sls -D -h data102 shows that the use attribute is now set:

    root@mds1:~# ssum -u data102
    root@mds1:~# sls -D -h data102
    data102:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14979  admin id:      0  inode:    90264.1
      project: user.root(1)
      access:      Jul 16 17:34  modification: Jul 16 17:34
      changed:     Jul 16 17:34  attributes:   Jul 16 17:34
      creation:    Jul 16 17:34  residence:    Jul 16 17:34
      checksum: generate use validated  algorithm: SHA-256
      hash: baae932ce1cf...93166a2e36b5
    root@mds1:~# 
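The behavior that the use attribute requests, validating first and copying only on success, can be sketched as follows. This Python example is a conceptual illustration and not the Oracle HSM stager: the function `stage_with_validation`, the exception `ValidationError`, and the file names are all hypothetical:

```python
import hashlib
import os
import shutil
import tempfile


class ValidationError(Exception):
    """Raised when a recomputed digest does not match the stored value."""


def stage_with_validation(archive_copy: str, cache_path: str,
                          stored_digest: str, algorithm: str = "sha256") -> str:
    """Copy archive_copy to cache_path only if its digest validates."""
    hasher = hashlib.new(algorithm)
    with open(archive_copy, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    if hasher.hexdigest() != stored_digest.lower():
        raise ValidationError(f"digest mismatch for {archive_copy}")
    shutil.copyfile(archive_copy, cache_path)
    return cache_path


with tempfile.TemporaryDirectory() as tmp:
    archive_copy = os.path.join(tmp, "archive_copy")
    with open(archive_copy, "wb") as handle:
        handle.write(b"preserved content")
    stored = hashlib.sha256(b"preserved content").hexdigest()

    # A sound copy validates and is staged.
    staged = stage_with_validation(archive_copy, os.path.join(tmp, "cache_good"), stored)
    staged_ok = os.path.exists(staged)

    # Simulate corruption of the archived copy, then try to stage again.
    with open(archive_copy, "ab") as handle:
        handle.write(b"!")
    bad_target = os.path.join(tmp, "cache_bad")
    try:
        stage_with_validation(archive_copy, bad_target, stored)
        rejected = False
    except ValidationError:
        rejected = True
    bad_target_created = os.path.exists(bad_target)

print(staged_ok, rejected, bad_target_created)  # prints: True True False
```

Checking the digest before the copy, rather than after, means that a damaged archive copy never reaches the destination where applications would read it.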
    

Change Message Digesting and Validation Attributes Before a File is Archived

If a file has not been made immutable and has not yet been archived, you can change message digesting and validation attributes using the procedure below.

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. If necessary, change the digesting algorithm. At the command prompt, enter the command ssum -a newalgorithm filename, where:

    • -a newalgorithm specifies the cryptographic hash function that replaces the previously specified digesting algorithm.

    • filename is the path and name of the file.

    In the example, our preservation policies require the highly collision-resistant SHA256 function. But as the command sls -D -h shows, we inadvertently specified the SHA1 algorithm when we set the digest attributes of the file data319. Since the file has not yet been archived, we can successfully change the algorithm to SHA256:

    root@mds1:~# sls -D -h data319
    data319:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90301.1
      project: user.root(1)
      access:      Jul 17 15:27  modification: Jul 17 15:27
      changed:     Jul 17 15:28  attributes:   Jul 17 15:27
      creation:    Jul 17 15:27  residence:    Jul 17 15:27
      checksum: generate  algorithm: SHA-1
      hash: 
    root@mds1:~# ssum -a sha256 data319
    root@mds1:~# sls -D -h data319
    data319:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90301.1
      project: user.root(1)
      access:      Jul 17 15:27  modification: Jul 17 15:27
      changed:     Jul 17 15:28  attributes:   Jul 17 15:27
      creation:    Jul 17 15:27  residence:    Jul 17 15:27
      checksum: generate  algorithm: SHA-256
      hash: 
    root@mds1:~# 
    
  3. If necessary, clear the digest attributes and restore the default file settings. At the command prompt, enter the command ssum -d filename, where:

    • -d resets the file digest attributes to their default values.

    • filename is the path and name of the file.

    In the example, we did not mean to configure message digesting and validation for the file data44. But, as the command sls -D -h shows, we have inadvertently done so. Since the file has not yet been archived, we can successfully clear generate and use, the attributes that control digest validation during archiving and staging. The data in the validated, algorithm, and hash attributes remains but does not affect the file system's behavior:

    root@mds1:~# sls -D -h data44
    data44:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90292.1
      project: user.root(1)
      access:      Jul 17 14:58  modification: Jul 17 14:57
      changed:     Jul 17 14:58  attributes:   Jul 17 14:57
      creation:    Jul 17 14:57  residence:    Jul 17 14:57
      checksum: generate use validated  algorithm: SHA-256
      hash: 3b4b15f8f69c...bae62c7e7568
    root@mds1:~# ssum -d data44
    root@mds1:~# sls -D -h data44
    data44:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90292.1
      project: user.root(1)
      access:      Jul 17 14:58  modification: Jul 17 14:57
      changed:     Jul 17 14:58  attributes:   Jul 17 14:57
      creation:    Jul 17 14:57  residence:    Jul 17 14:57
      checksum: validated  algorithm: SHA-256
      hash:  3b4b15f8f69c...bae62c7e7568
    root@mds1:~# 
    
  4. If necessary, reset any required message digesting and validation attributes before the file is archived. At the command prompt, enter the command ssum with the appropriate options and file name.

    In the example, we decide to re-enable message digesting on the file data44 and validate digests prior to archiving. But we do not require validation prior to staging. So we restore the generate attribute but not the use attribute:

    root@mds1:~# ssum -g data44
    root@mds1:~# sls -D -h data44
    data44:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14983  admin id:      0  inode:    90292.1
      project: user.root(1)
      access:      Jul 17 14:58  modification: Jul 17 14:57
      changed:     Jul 17 14:58  attributes:   Jul 17 14:57
      creation:    Jul 17 14:57  residence:    Jul 17 14:57
      checksum: generate validated  algorithm: SHA-256
      hash:  3b4b15f8f69c...bae62c7e7568
    root@mds1:~# 
    

Making Files Immutable

Preservation requirements frequently call for mechanisms that assure file fixity. The archive must both prevent changes and prove that such changes have not occurred. To provide fixity, Oracle HSM archival file systems combine the message digests and digest-related file attributes discussed above with additional attributes that render the file immutable. Once a file has been made immutable, only those with super-user authority can change its status. If you combine immutability with a strict Write Once Read Many (WORM) file system, even super users will be unable to make changes (for details, see "Understanding WORM File Systems").

You can make a file immutable in either of the following situations:

Supply a Message Digest and Make a File Immutable

When you are archiving a file that already has an associated message digest and need to ensure that the file remains unchanged after ingestion into the archive, proceed as follows.

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. At the command prompt, enter the command ssum -a algorithm [-h digest] -F filename, where:

    • -a algorithm identifies the cryptographic hashing function that the file system should use when validating the file against the supplied message digest.

    • -h digest identifies the message digest that the file system should use to validate the file.

    • -F specifies immediate validation and immutability, and sets the fixity, generate, validated, and use file attributes. The file system immediately calculates and validates a message digest. When the file is staged or archived, the file system recalculates and revalidates a message digest.

    • filename is the path and name of the file.

    In the example, we supply a SHA-256 digest and tell the file system to recalculate the digest, validate the value for the file data20, and make the file immutable. When we check the file attributes with the command sls -D -h data20, we see that the fixity, generate, use, and validated file attributes have been set, the algorithm attribute has been set to SHA-256, and the supplied digest value has been stored in the hash attribute:

    root@mds1:~# ssum -h bfaefde932cf...d450892eda63 -a sha256 -F data20
    root@mds1:~# sls -D -h data20
    data20:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14979  admin id:      0  inode:    90264.1
      project: user.root(1)
      access:      Jul 16 17:34  modification: Jul 16 17:34
      changed:     Jul 16 17:34  attributes:   Jul 16 17:34
      creation:    Jul 16 17:34  residence:    Jul 16 17:34
      checksum: fixity generate use validated  algorithm: SHA-256
      hash: bfaefde932cf...d450892eda63
    root@mds1:~# 
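
If the producer of a file did not deliver a digest with it, you can compute one before ingestion with any standard SHA-256 implementation. The following Python sketch shows one way to do so. It is not part of Oracle HSM, the function name file_sha256 is hypothetical, and it assumes that the hexadecimal string it returns is in the form that ssum -h expects, which you should confirm against the ssum man page.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of the file at `path` as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large archive files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Computing the digest as close to the point of creation as possible gives the later validation the most value, since any damage that occurs in transit is then detected at ingestion.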
    

Generate a Message Digest and Make a File Immutable

When you need to ensure that a file remains unchanged after ingestion into the archive and have no existing message digest to supply, let the file system generate one. Proceed as follows.

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. At the command prompt, enter the command ssum -a algorithm -F filename, where:

    • -a algorithm identifies the cryptographic hashing function that the file system should use to generate the message digest.

    • -F sets the fixity, generate, validated, and use file attributes. The file system immediately calculates and validates a message digest. When the file is staged or archived, the file system recalculates and revalidates a message digest.

    • filename is the path and name of the file.

    In the example, we tell the file system to calculate a SHA-256 digest, validate the value for the file data200, and make the file immutable. When we check the file attributes with the command sls -D -h data200, we see that the fixity, generate, use, and validated file attributes have been set, the algorithm attribute has been set to SHA-256, and the digest value has been calculated and stored in the hash attribute:

    root@mds1:~# ssum -a sha256 -F data200
    root@mds1:~# sls -D -h data200
    data200:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14979  admin id:      0  inode:    90264.1
      project: user.root(1)
      access:      Jul 16 17:34  modification: Jul 16 17:34
      changed:     Jul 16 17:34  attributes:   Jul 16 17:34
      creation:    Jul 16 17:34  residence:    Jul 16 17:34
      checksum: fixity generate use validated  algorithm: SHA-256
      hash: efde93cc12cf...d496602e36dd
    root@mds1:~# 
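
Conceptually, the revalidation that takes place when a fixed file is staged or archived is a recompute-and-compare operation. The sketch below illustrates that idea in Python, outside of Oracle HSM. It is not the product's implementation, and the names digest_matches and stored_hex are hypothetical.

```python
import hashlib
import hmac


def digest_matches(data: bytes, stored_hex: str, algorithm: str = "sha256") -> bool:
    """Recompute the digest of `data` and compare it with the stored value."""
    recomputed = hashlib.new(algorithm, data).hexdigest()
    # compare_digest checks the full strings rather than stopping early.
    return hmac.compare_digest(recomputed, stored_hex.lower())


original = b"archived record"
stored = hashlib.sha256(original).hexdigest()  # value recorded at ingestion

print(digest_matches(original, stored))         # data unchanged
print(digest_matches(original + b"!", stored))  # data altered
```

A mismatch does not say what changed, only that the data no longer corresponds to the recorded digest, which is the signal to replace the file with a sound copy.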
    

Checking File Digest and Fixity Attributes

To view the message digest and fixity attributes of one or more files, use the Oracle HSM directory listing command, sls. Proceed as follows.

List Message Digesting and Validation Attributes

  1. Log in to the file system host as root:

    root@mds1:~# 
    
  2. At the command prompt, enter the command sls -D -h filename, where:

    • -D specifies a detailed display of file attributes.

    • -h includes the hash (digest) value in the display.

    • filename identifies one or more files by path and name.

    In the example, we see the file digest attributes for the file data02 listed in the checksum and hash fields of the display:

    root@mds1:~# sls -D -h data02
    data02:
      mode: -rw-r--r--    links:   1  owner: root        group: root      
      length:     14975  admin id:      0  inode:    90217.1
      project: user.root(1)
      access:      Jul 16 16:14  modification: Jul 16 16:14
      changed:     Jul 16 16:15  attributes:   Jul 16 16:14
      creation:    Jul 16 16:14  residence:    Jul 16 16:14
      checksum: generate use validated  algorithm: SHA-256
      hash: f03ce01b3828...f7459503007e
    root@mds1:~# 
    
    • The hash attribute stores the message digest for the file, f03ce01b3828...f7459503007e.

    • The algorithm attribute shows that the SHA-256 cryptographic hashing function generated the stored message digest.

    • The generate attribute shows that the file system independently recalculates the message digest and validates it against the stored value whenever the file is archived.

    • The use attribute shows that the file system independently recalculates the message digest and validates it against the stored value whenever the file is staged.

    • The validated attribute shows that the independently calculated message digest matched the value stored in the hash attribute when last checked.

    • The fixity attribute appears if the file has been made immutable.
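
When many files have to be checked, it can be convenient to read these attributes programmatically. The following Python sketch parses the checksum and hash lines from text laid out like the example above. It assumes that real sls -D -h output follows the same layout, and the function name parse_digest_attributes is hypothetical.

```python
def parse_digest_attributes(sls_output: str) -> dict:
    """Extract checksum flags, algorithm, and hash from `sls -D -h` text."""
    result = {"flags": set(), "algorithm": None, "hash": None}
    for line in sls_output.splitlines():
        line = line.strip()
        if line.startswith("checksum:"):
            # Flags precede the "algorithm:" label on the same line.
            flags, _, algorithm = line[len("checksum:"):].partition("algorithm:")
            result["flags"] = set(flags.split())
            result["algorithm"] = algorithm.strip() or None
        elif line.startswith("hash:"):
            result["hash"] = line[len("hash:"):].strip()
    return result


# Sample text copied from the example above (hash value elided as shown there).
sample = """\
data02:
  checksum: generate use validated  algorithm: SHA-256
  hash: f03ce01b3828...f7459503007e
"""

attrs = parse_digest_attributes(sample)
print(sorted(attrs["flags"]), attrs["algorithm"])
print("immutable:", "fixity" in attrs["flags"])
```

For data02, the fixity flag is absent, so the sketch reports that the file has not been made immutable.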

Using WORM File Systems

When legal or archival considerations so require, you can create write-once read-many (WORM) directories and files in any Oracle HSM file system that has been configured to support them. This section focuses on understanding WORM file systems and on specific tasks that you need to perform when working with WORM files and directories, including:

  • WORM-enabling directories

  • activating WORM protection for a file

  • finding and listing WORM files.

For information on enabling WORM support for a file system, see the Oracle Hierarchical Storage Manager and StorageTek QFS Installation and Configuration Guide.

Understanding WORM File Systems

Write Once Read Many (WORM) file systems protect data by letting users make files read-only for the duration of a specified retention period. WORM-enabled Oracle HSM file systems support default and customizable file-retention periods, data and path immutability, and subdirectory inheritance of the WORM setting.

Depending on how your file systems are configured, you use one of two Oracle HSM WORM modes:

  • standard compliance mode (the default)

  • emulation mode

In a file system that is mounted under the standard WORM mode, a user WORM-enables directories and starts the read-only retention period for files by executing the command chmod 4000 path_name, where path_name is the path and name of the file or directory. This sets UNIX setuid (set user ID upon execution) permission. Setting setuid permission on a file that also has execute permission is a security risk, so, in standard WORM mode, only non-executable files can be made read-only.

In a file system that is mounted under the WORM emulation mode, a user WORM-enables directories and starts the read-only retention period for files by executing the command chmod 555 path_name, where path_name is the path and name of a writable file or directory. Since emulation mode does not require setuid permission, executable files can be made read-only and assigned retention periods.

Both standard and emulation modes have a strict WORM implementation and a less restrictive, lite implementation. Neither the strict nor the lite implementations allow changes to data or paths once retention has been triggered on a file or directory. Both set the default retention period to 43,200 minutes (30 days). But the lite implementations relax some restrictions for root users.

The strict implementations do not let anyone shorten the specified retention period or delete files or directories prior to the end of the retention period. They also do not let anyone use sammkfs to delete volumes that hold currently retained files and directories. The strict implementations are thus well-suited to meeting the most demanding legal, regulatory compliance, and preservation requirements.

The lite implementations let root users shorten retention periods, delete files and directories, and delete volumes using the sammkfs command. This provides a high level of protection against casual data loss and provides more flexibility when administering file systems and storage resources. But file systems that allow super users this degree of control may not meet some legal and regulatory compliance requirements.
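
The differences between the strict and lite implementations can be summarized as a small decision rule. The Python sketch below is only a model of the rules as described in this section, not of how the file system enforces them. The action names and the function is_permitted are hypothetical.

```python
from datetime import timedelta

# Default retention period stated in the text: 43,200 minutes.
DEFAULT_RETENTION = timedelta(minutes=43_200)

# Actions that only the lite implementations permit, and only for root.
ROOT_ONLY_LITE_ACTIONS = {"shorten_retention", "delete_retained", "sammkfs_volume"}


def is_permitted(action: str, lite: bool, is_root: bool) -> bool:
    """Model whether an action on retained data is allowed (illustrative)."""
    if action == "modify_data_or_path":
        return False  # not allowed by either implementation during retention
    if action in ROOT_ONLY_LITE_ACTIONS:
        return lite and is_root
    raise ValueError(f"unknown action: {action}")


print(DEFAULT_RETENTION.days)
print(is_permitted("shorten_retention", lite=True, is_root=True))
print(is_permitted("shorten_retention", lite=False, is_root=True))
```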

You can create both hard and soft links to WORM files. Hard links can only be created to files that reside in a WORM-capable directory, and a hard link has the same WORM characteristics as the original file. Soft links to WORM files can be created in any directory in an Oracle HSM file system, but a soft link cannot use the WORM features.

For full information on creating and configuring WORM file systems, see the Oracle Hierarchical Storage Manager and StorageTek QFS Installation and Configuration Guide in the Customer Documentation Library.

WORM-Enable a Directory

When you WORM-enable a directory, you add support for WORM files, but do not otherwise change the characteristics of the directory. Users can continue to create and edit files within a WORM-enabled directory, and WORM-enabled directories that do not contain WORM files can be deleted. To WORM-enable a directory, proceed as follows:

  1. Log in to the file-system server.

    user@mds1:~# 
    
  2. See if the directory has already been WORM-enabled. Use the command sls -Dd directory, where directory is the path and name of the directory. Look for the attribute worm-capable in the output of the command.

    Directories are often already WORM-enabled because, when a user WORM-enables a directory, all current and future child directories inherit the WORM capability (for full information on the command, see the sls man page). In the first example, we find that our target directory, /hsm/hsmfs1/records/2013, is already WORM-enabled:

    user@mds1:~# sls -Dd /hsm/hsmfs1/records/2013/
    /hsm/hsmfs1/records/2013:
      mode: drwxr-xr-x    links:   2  owner: root        group: root      
      length:      4096  admin id:      0  inode:     1048.1
      project: user.root(1)
      access:      Mar  3 12:15  modification: Mar  3 12:15
      changed:     Mar  3 12:15  attributes:   Mar  3 12:15
      creation:    Mar  3 12:15  residence:    Mar  3 12:15
      worm-capable        retention-period: 0y, 30d, 0h, 0m
    

    But in the second example, we find that our target directory, /hsm/hsmfs1/documents, is not WORM-enabled:

    user@mds1:~# sls -Dd /hsm/hsmfs1/documents
    /hsm/hsmfs1/documents
      mode: drwxr-xr-x    links:   2  owner: root        group: root      
      length:      4096  admin id:      0  inode:     1049.1
      project: user.root(1)
      access:      Mar  3 12:28  modification: Mar  3 12:28
      changed:     Mar  3 12:28  attributes:   Mar  3 12:28
      creation:    Mar  3 12:28  residence:    Mar  3 12:28
    
  3. If the directory is not WORM-enabled and if the file system was mounted with the worm_capable or worm_lite mount option, enable WORM support with the Solaris command chmod 4000 directory-name, where directory-name is the path and name of the directory that will hold the WORM files.

    The command chmod 4000 sets the setuid (set user ID upon execution) attribute on the directory and enables standard WORM support. In the example, we WORM-enable the directory /hsm/hsmfs1/documents and check the result with sls -Dd. The operation succeeds and the directory is WORM-enabled:

    user@mds1:~# chmod 4000 /hsm/hsmfs1/documents
    user@mds1:~# sls -Dd /hsm/hsmfs1/documents
    /hsm/hsmfs1/documents
      mode: drwxr-xr-x    links:   2  owner: root        group: root      
      length:      4096  admin id:      0  inode:     1049.1
      project: user.root(1)
      access:      Mar  3 12:28  modification: Mar  3 12:28
      changed:     Mar  3 12:28  attributes:   Mar  3 12:28
      creation:    Mar  3 12:28  residence:    Mar  3 12:28
      worm-capable        retention-period: 0y, 30d, 0h, 0m
    
  4. If the directory is not WORM-enabled and if the file system was mounted with the worm_emul or emul_lite mount option, enable WORM support with the Solaris command chmod 555 directory-name, where directory-name is the path and name of the directory that will hold the WORM files.

    The command chmod 555 removes write permissions for the directory and enables WORM-emulation support. In the example, we WORM-enable the directory /hsm/hsmfs1/documents and check the result using the command sls -Dd. The operation succeeds and the directory is WORM-enabled:

    user@mds1:~# chmod 555 /hsm/hsmfs1/documents
    user@mds1:~# sls -Dd /hsm/hsmfs1/documents
    /hsm/hsmfs1/documents
      mode: drwxr-xr-x    links:   2  owner: root        group: root      
      length:      4096  admin id:      0  inode:     1049.1
      project: user.root(1)
      access:      Mar  3 12:28  modification: Mar  3 12:28
      changed:     Mar  3 12:28  attributes:   Mar  3 12:28
      creation:    Mar  3 12:28  residence:    Mar  3 12:28
      worm-capable        retention-period: 0y, 30d, 0h, 0m
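
The choice between chmod 4000 and chmod 555 depends only on the WORM mount option in effect. The Python sketch below captures that mapping for use in scripts. It is a sketch under the assumption that the mount options are available as a comma-separated string, and the names WORM_TRIGGER_MODES and worm_trigger_mode are hypothetical.

```python
# Mount options named in the procedure, mapped to the chmod mode that
# triggers WORM behavior for each.
WORM_TRIGGER_MODES = {
    "worm_capable": 0o4000,  # standard WORM mode
    "worm_lite": 0o4000,     # standard WORM mode
    "worm_emul": 0o555,      # WORM emulation mode
    "emul_lite": 0o555,      # WORM emulation mode
}


def worm_trigger_mode(mount_options: str) -> int:
    """Return the chmod mode for a comma-separated mount-option string."""
    for option in mount_options.split(","):
        mode = WORM_TRIGGER_MODES.get(option.strip())
        if mode is not None:
            return mode
    raise ValueError("file system is not mounted with a WORM option")


print(oct(worm_trigger_mode("rw,worm_capable")))
print(oct(worm_trigger_mode("emul_lite")))
```

The sketch only selects the mode. It deliberately does not call chmod, because applying either mode in a WORM-capable file system has lasting effects.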
    

Activate WORM Protection for a File

When you activate WORM protection on a file in a WORM-enabled directory, the file system no longer allows modifications to the file data or the path to the data until the retention period expires. So you must use care. To activate WORM protection, proceed as follows:

  1. Log in to the file-system server.

    user@mds1:~# 
    
  2. If you need to retain the file for some period other than the default for the file system, specify the required retention time by changing the access time for the file. Use the Solaris command touch -a -texpiration-date path-name, where:

    • expiration-date is a string of numerals consisting of a four-digit year, a two-digit month, a two-digit day of the month, a two-digit hour of the day, a two digit minute within the hour, and, optionally, a two-digit second within the minute.

    • path-name is the path and name of the file.

    Note that Oracle Solaris UNIX utilities such as touch cannot extend a retention period beyond 10:14 PM on 01/18/2038. These utilities use signed 32-bit numbers to represent time in seconds starting from 01/01/1970, so they cannot express later times. Use a default retention period if you need to retain files beyond this cut-off date.

    In the example, we set the retention period for the file to expire in four years, on October 14, 2019 at 11:59 AM:

    user@mds1:~# touch -a -t201910141159  /hsm/hsmfs1/plans/master.odt
    
  3. If the file system was mounted with the worm_capable or worm_lite mount option, activate WORM protection with the Solaris command chmod 4000 path-name, where path-name is the path and name of the file.

    The chmod 4000 command sets the setuid (set user ID upon execution) attribute on the specified file. Setting this attribute on an executable file is insecure. So, if the file system was mounted with the worm_capable or worm_lite mount option, you cannot set WORM protections on files that have UNIX execute permission.

    In the example, we activate WORM protection for the file master.odt. We check the result with sls -Dd, and note that the retention attribute is now set to active, and the retention-period is set to four years:

    user@mds1:~# chmod 4000 /hsm/hsmfs1/plans/master.odt
    user@mds1:~# sls -Dd /hsm/hsmfs1/plans/master.odt
    /hsm/hsmfs1/plans/master.odt:
      mode: -r-xr-xr-x    links:   1  owner: root        group: root      
      length:       104  admin id:      0  inode:     1051.1
      project: user.root(1)
      access:      Mar  4 2018  modification: Mar  3 13:14
      changed:     Mar  3 13:16  retention-end: Apr  2 14:16 2014
      creation:    Mar  3 13:16  residence:    Mar  3 13:16
      retention:   active        retention-period: 4y, 0d, 0h, 0m
    
  4. If the file system was mounted with the worm_emul or emul_lite mount option, activate WORM protection with the Solaris command chmod 555 path-name, where path-name is the path and name of the file.

    The command chmod 555 removes write permissions for the file. Because it does not set the setuid attribute, you can WORM-protect executable files, if required. In the example, we activate WORM retention for the file master.odt. We check the result with sls -Dd, and note that the retention attribute is now set to active, and the retention-period is set to four years:

    user@mds1:~# chmod 555 /hsm/hsmfs1/plans/master.odt
    user@mds1:~# sls -Dd /hsm/hsmfs1/plans/master.odt
    /hsm/hsmfs1/plans/master.odt:
      mode: -r-xr-xr-x    links:   1  owner: root        group: root      
      length:       104  admin id:      0  inode:     1051.1
      project: user.root(1)
      access:      Mar  4 2018  modification: Mar  3 13:14
      changed:     Mar  3 13:16  retention-end: Apr  2 14:16 2014
      creation:    Mar  3 13:16  residence:    Mar  3 13:16
      retention:   active        retention-period: 4y, 0d, 0h, 0m
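
The expiration string passed to touch and the 32-bit cut-off mentioned in the procedure can both be checked before you run the command. The Python sketch below formats an expiration time as a four-digit year followed by month, day, hour, and minute, and rejects times that a signed 32-bit count of seconds cannot represent. The names MAX_32BIT_TIME and touch_timestamp are hypothetical, and because touch interprets the string in the time zone of the host, the sketch assumes that you pass a time expressed in that zone.

```python
from datetime import datetime, timezone

# Largest instant a signed 32-bit count of seconds since 1970-01-01 UTC can hold.
MAX_32BIT_TIME = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)


def touch_timestamp(expiration: datetime) -> str:
    """Format a timezone-aware expiration time as YYYYMMDDhhmm for touch -t."""
    if expiration.tzinfo is None:
        raise ValueError("expiration must be timezone-aware")
    if expiration > MAX_32BIT_TIME:
        raise ValueError("expiration exceeds the signed 32-bit time limit")
    return expiration.strftime("%Y%m%d%H%M")


print(MAX_32BIT_TIME.isoformat())
print(touch_timestamp(datetime(2019, 10, 14, 11, 59, tzinfo=timezone.utc)))
```

The cut-off is printed in UTC. The 10:14 PM, 01/18/2038 figure given in the procedure is the same instant expressed five hours behind UTC.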
    

Find and List WORM Files

To find and list WORM files that meet specified search criteria, use the sfind command. Proceed as follows:

  1. Log in to the file-system server.

    user@mds1:~# 
    
  2. To list files that are WORM-protected and being actively retained, use the command sfind starting-directory -ractive, where starting-directory is the path and name for the directory where you want the listing process to start.

    user@mds1:~# sfind /hsm/hsmfs1/ -ractive 
    /hsm/hsmfs1/documents/2013/master-plan.odt
    /hsm/hsmfs1/documents/2013/schedule.ods
    /hsm/hsmfs1/records/2013/progress/report01.odt
    /hsm/hsmfs1/records/2013/progress/report02.odt
    /hsm/hsmfs1/records/2013/progress/report03.odt ...
    user@mds1:~# 
    
  3. To list WORM-protected files for which the retention period has expired, use the command sfind starting-directory -rover, where starting-directory is the path and name for the directory where you want the listing process to start.

    user@mds1:~# sfind /hsm/hsmfs1/ -rover 
    /hsm/hsmfs1/documents/2007/master-plan.odt
    /hsm/hsmfs1/documents/2007/schedule.ods
    user@mds1:~# 
    
  4. To list WORM-protected files for which the retention period will expire after a specified date and time, use the command sfind starting-directory -rafter expiration-date, where:

    • starting-directory is the path and name for the directory where you want the listing process to start

    • expiration-date is a string of numerals consisting of a four-digit year, a two-digit month, a two-digit day of the month, a two-digit hour of the day, a two digit minute within the hour, and, optionally, a two-digit second within the minute.

    In the example, we list any files for which the retention period expires after January 1, 2015 at one minute after midnight:

    user@mds1:~# sfind /hsm/hsmfs1/ -rafter 201501010001
    /hsm/hsmfs1/documents/2013/master-plan.odt
    user@mds1:~# 
    
  5. To list WORM-protected files that must remain in the file system for at least a specified amount of time, use the command sfind starting-directory -rremain time-remaining, where:

    • starting-directory is the location in the directory tree where the search starts.

    • time-remaining is a string of non-negative integers paired with the following units of time: y for years, d for days, h for hours, m for minutes.

    In the example, we find all files under the directory /hsm/hsmfs1/ that will be retained for at least three more years:

    user@mds1:~# sfind /hsm/hsmfs1/ -rremain 3y
    /hsm/hsmfs1/documents/2013/master-plan.odt
    user@mds1:~# 
    
  6. To list WORM-protected files that must remain in the file system for more than a specified amount of time, use the command sfind starting-directory -rlonger time, where:

    • starting-directory is the location in the directory tree where the search starts.

    • time is a string of non-negative integers paired with the following units of time: y for years, d for days, h for hours, m for minutes.

    In the example, we find all files under the directory /hsm/hsmfs1/ that will be retained for more than three years and ninety days:

    user@mds1:~# sfind /hsm/hsmfs1/ -rlonger 3y90d
    /hsm/hsmfs1/documents/2013/master-plan.odt
    user@mds1:~# 
    
  7. To list WORM-protected files that must remain in the file system permanently, use the command sfind starting-directory -rpermanent.

    In the example, we find that no files under the directory /hsm/hsmfs1/ are being retained permanently:

    user@mds1:~# sfind /hsm/hsmfs1/ -rpermanent
    user@mds1:~#
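
The time strings used in the steps above, such as 3y and 3y90d, are easy to mistype when they are assembled by a script. The Python sketch below validates such a string and splits it into a count for each unit. It follows only the description given in this procedure (non-negative integers paired with y, d, h, or m). Whether sfind accepts repeated units or requires a particular order is not covered here, and the function name parse_retention_time is hypothetical.

```python
import re

UNITS = ("y", "d", "h", "m")
_PAIR = re.compile(r"(\d+)([ydhm])")


def parse_retention_time(text: str) -> dict:
    """Split a string such as '3y90d' into a count for each unit."""
    # Anything left over after removing valid number-unit pairs is an error.
    if not text or _PAIR.sub("", text):
        raise ValueError(f"not a valid time string: {text!r}")
    counts = dict.fromkeys(UNITS, 0)
    for number, unit in _PAIR.findall(text):
        counts[unit] += int(number)
    return counts


print(parse_retention_time("3y"))
print(parse_retention_time("3y90d"))
```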