Backing Up the Store

To make backups of your KVStore, use the CLI snapshot command to take snapshots of the nodes in the store. To maintain consistency, no topology changes should be in progress when you create a snapshot. Restoring a snapshot relies on the store configuration having exactly the same topology that was in effect when you created the snapshot.

When you create a snapshot, it is stored in a subdirectory on the Storage Node (SN). These snapshots do not become persistent backups until they are copied to separate storage. It is your responsibility to copy each snapshot to another location, preferably on a different machine, for data safety.

Due to the distributed nature and scale of Oracle NoSQL Database, it is unlikely that a single machine has the resources to contain snapshots for the entire store. This document does not address where and how you should store your snapshots.

Taking a Snapshot

Note:

To prevent a snapshot from being inconsistent or unusable, do not take snapshots while any configuration (topology) changes are in progress. At the time of the snapshot, use the ping command and save the output that identifies the Masters, for later use during a load or restore. For more information, see Managing Snapshots.

To create a snapshot from the Admin CLI, use the snapshot create command:

kv-> snapshot create -name <snapshot name>

A snapshot consists of a set of hard links to the data files on each node in the current topology. A snapshot provides consistency across all partition records within the same shard, but not across partitions in independent shards. To minimize any potential inconsistencies, the snapshot utility performs its operations in parallel as much as possible.

To create a snapshot with a name of your choice, use snapshot create -name <name>.
kv-> snapshot create -name Thursday
Created snapshot named 110915-153514-Thursday on all 3 nodes
Successfully backup configurations on sn1, sn2, sn3

Snapshot Activities

Creating a snapshot of the Oracle NoSQL Database store performs these activities:
  • Backs up the data files
  • Backs up the configuration and environment files required for restore activities

To complete a full set of snapshot files, the snapshot command backs up the Storage Node data files and configuration files, and adds any other files required for a restore.

Copying a Snapshot

Keeping a snapshot in place for a short time, so that it can be used to roll back the store after an upgrade, is reasonable. In that scenario, if the upgrade can be verified relatively quickly and the snapshot is no longer needed, it might be sufficient to delete the snapshot without copying it.

To ensure data safety in the event of disk or other hardware failures, it is recommended that you convert these snapshots into persistent backups. Otherwise, if the machine suffers a disk or other hardware failure, or if store files are deleted or overwritten, the snapshot is lost along with the live data for the store maintained on that machine.

To convert a snapshot into a persistent backup, copy the snapshot to another location on a different machine. You can later use the persistent backup to restore the store after a disk or hardware failure.
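
For example, a minimal sketch of an off-store copy, assuming a hypothetical backup host named backuphost, a hypothetical destination directory /backups/mystore, and the snapshot directory layout shown later in this section:

 # backuphost and /backups/mystore are hypothetical; substitute your own backup target.
 > scp -r /var/kvroot/mystore/sn1/rg1-rn1/snapshots/110915-153514-Thursday \
       backuphost:/backups/mystore/sn1-rg1-rn1/

Any equivalent copy tool (for example, rsync) works; the important point is that the copy resides on storage that does not fail along with the Storage Node.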

Deleting a Snapshot

To remove an existing snapshot, use snapshot remove -name <name>.
kv-> snapshot remove -name 110915-153514-Thursday
Removed snapshot 110915-153514-Thursday
To remove all snapshots currently stored in the store, use snapshot remove -all.
kv-> snapshot create -name Thursday
Created snapshot named 110915-153700-Thursday on all 3 nodes
kv-> snapshot create -name later
Created snapshot named 110915-153710-later on all 3 nodes
kv-> snapshot remove -all
Removed all snapshots

Managing Snapshots

When you create a snapshot, the utility collects data from every Replication Node in the system, including Masters and replicas. If the operation does not succeed for any one node in a shard, the entire snapshot fails.

When you are preparing to take the snapshot, you can use the ping command to identify which nodes are currently running as Masters. Each shard has a Master, identified by the MASTER keyword. For example, in the sample output below, replication node rg3-rn1, running on Storage Node sn1, is the current Master of its shard:

java -Xmx64m -Xms64m \
-jar KVHOME/lib/kvstore.jar ping -port 5000 -host node01 \
-security USER/security/admin/security
Pinging components of store mystore based upon topology sequence #316
300 partitions and 3 storage nodes
Time: 2018-09-28 06:57:10 UTC   Version: 18.3.2
Shard Status: healthy:3 writable-degraded:0 read-only:0 offline:0
Admin Status: healthy
Zone [name=Boston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online:9 offline:0 maxDelayMillis:1 maxCatchupTimeSecs:0
Storage Node [sn1] on node01:5000
   Zone: [name=Boston id=zn1 type=PRIMARY]
   Status: RUNNING
   Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c
      Admin [admin1]          Status: RUNNING,MASTER
      Rep Node [rg1-rn1]      Status: RUNNING,REPLICA
        sequenceNumber:231 haPort:5011 available storage size:14 GB delayMillis:1 catchupTimeSecs:0
      Rep Node [rg2-rn1]      Status: RUNNING,REPLICA
        sequenceNumber:231 haPort:5012 available storage size:12 GB delayMillis:1 catchupTimeSecs:0
      Rep Node [rg3-rn1]      Status: RUNNING,MASTER
        sequenceNumber:227 haPort:5013 available storage size:13 GB
Storage Node [sn2] on node02:6000
   Zone: [name=Boston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
   Status: RUNNING
   Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c
      Rep Node [rg1-rn2]      Status: RUNNING,MASTER
        sequenceNumber:231 haPort:6010 available storage size:15 GB
      Rep Node [rg2-rn2]      Status: RUNNING,REPLICA
        sequenceNumber:231 haPort:6011 available storage size:18 GB delayMillis:1 catchupTimeSecs:0
      Rep Node [rg3-rn2]      Status: RUNNING,REPLICA
        sequenceNumber:227 haPort:6012 available storage size:12 GB delayMillis:1 catchupTimeSecs:0
Storage Node [sn3] on node03:7000
   Zone: [name=Boston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
   Status: RUNNING
   Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c
      Rep Node [rg1-rn3]      Status: RUNNING,REPLICA
        sequenceNumber:231 haPort:7010 available storage size:11 GB delayMillis:1 catchupTimeSecs:0
      Rep Node [rg2-rn3]      Status: RUNNING,MASTER
        sequenceNumber:231 haPort:7011 available storage size:11 GB
      Rep Node [rg3-rn3]      Status: RUNNING,REPLICA
        sequenceNumber:227 haPort:7012 available storage size:10 GB delayMillis:1 catchupTimeSecs:0

You should save the above information and associate it with the respective snapshot, for later use during a load or restore. If you decide to create an off-store copy of the snapshot, you should copy the snapshot data for only one of the nodes in each shard. If possible, copy the snapshot data taken from the node that was serving as the Master at the time the snapshot was taken.
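
In the sample output above, the Masters at the time of the snapshot are rg3-rn1 (on sn1), rg1-rn2 (on sn2), and rg2-rn3 (on sn3). A minimal sketch of copying just those nodes' snapshot data, one node per shard, to a hypothetical backup host (the destination paths are illustrative, and each Storage Node is assumed to use the same kvroot path as the earlier examples):

 # Run each command on the Storage Node host that holds the snapshot
 # (node01, node02, and node03 in this example); backuphost is hypothetical.
 > scp -r /var/kvroot/mystore/sn1/rg3-rn1/snapshots/110915-153514-Thursday \
       backuphost:/backups/mystore/rg3/
 > scp -r /var/kvroot/mystore/sn2/rg1-rn2/snapshots/110915-153514-Thursday \
       backuphost:/backups/mystore/rg1/
 > scp -r /var/kvroot/mystore/sn3/rg2-rn3/snapshots/110915-153514-Thursday \
       backuphost:/backups/mystore/rg2/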

Note:

Snapshots include the admin database, which may be required if the store needs to be restored from this snapshot.

Snapshot data for the local Storage Node is stored in a directory inside the KVROOT directory. That is, for each Storage Node in the store, there is a directory named:

KVROOT/<store>/<SN>/<resource>/snapshots/<snapshot_name>/files

where:

  • <store> is the name of the store.

  • <SN> is the name of the Storage Node.

  • <resource> is the name of the resource running on the Storage Node. Typically, this is the name of a replication node.

  • <snapshot_name> is the name of the snapshot.

Snapshot data consists of a number of files. For example:

 > ls /var/kvroot/mystore/sn1/rg1-rn1/snapshots/110915-153514-Thursday
00000000.jdb 00000002.jdb 00000004.jdb 00000006.jdb
00000001.jdb 00000003.jdb 00000005.jdb 00000007.jdb 

Note:

To preserve storage, purge obsolete snapshots on a periodic basis.
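
For example, you can list the snapshots that currently exist and then remove the ones you no longer need from the Admin CLI. A minimal sketch, reusing the snapshot name from the earlier example:

kv-> show snapshots
kv-> snapshot remove -name 110915-153514-Thursday
Removed snapshot 110915-153514-Thursday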

Impact of Erasure with Snapshots

Snapshot-based backups create hard links to the original data files, and erasure ignores files that have hard links to them. As a result, until the backups are copied to their target location (the off-store copy is complete) and the corresponding hard links are removed (by running the snapshot remove command), erasure does not process obsolete data in those files.
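
You can check whether a data file is still pinned in this way by looking at its hard-link count. A minimal sketch, using the snapshot path from the earlier example:

 # A link count greater than 1 means the snapshot still shares this file
 # with the live store, so erasure will not process obsolete data in it.
 > stat -c %h /var/kvroot/mystore/sn1/rg1-rn1/snapshots/110915-153514-Thursday/00000000.jdb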

Avoiding Disk Usage Violation

The storage engine does not consider the data consumed by snapshots when it collects information about disk space usage. Initially, the files in a snapshot are part of the live data of the store. Over time, though, as older files are cleaned and deleted, their presence in the snapshot causes them to be retained, using disk space that the storage engine does not take into account. This can lead to a disk usage violation, in which case further writes to the store are disabled. To avoid this problem, delete snapshots at regular intervals.
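
To see how much disk space snapshots are holding on a Storage Node (space the storage engine does not account for), you can inspect the snapshot directories directly. A minimal sketch, assuming the kvroot path used in the earlier examples:

 # Report the size of each snapshot directory on this Storage Node.
 # Files still shared with the live store through hard links are counted here as well.
 > du -sh /var/kvroot/mystore/sn1/*/snapshots/*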