If a disk has failed, or is in the process of failing, you can replace it while the store continues running. This section walks you through the steps of replacing a failed disk in a way that preserves data availability.
The following example deploys a KVStore to a set of three machines, each with three disks: one for the KVStore root directory and two for replication data. Use the storagedir flag of the makebootconfig command to specify the storage locations of the other two disks.
> java -jar KVHOME/lib/kvstore.jar makebootconfig \
    -root /opt/ondb/var/kvroot \
    -port 5000 \
    -admin 5001 \
    -host node09 \
    -harange 5010,5020 \
    -num_cpus 0 \
    -memory_mb 0 \
    -store-security none \
    -capacity 2 \
    -storagedir /disk1/ondb/data \
    -storagedir /disk2/ondb/data
With a boot configuration such as that shown above, the directory structure that is created and populated on each machine would then be:
 - Machine 1 (SN1) -      - Machine 2 (SN2) -      - Machine 3 (SN3) -
/opt/ondb/var/kvroot     /opt/ondb/var/kvroot     /opt/ondb/var/kvroot
  log files                log files                log files
  /store-name              /store-name              /store-name
    /log                     /log                     /log
    /sn1                     /sn2                     /sn3
      config.xml               config.xml               config.xml
      /admin1                  /admin2                  /admin3
        /env                     /env                     /env

/disk1/ondb/data         /disk1/ondb/data         /disk1/ondb/data
  /rg1-rn1                 /rg1-rn2                 /rg1-rn3
    /env                     /env                     /env

/disk2/ondb/data         /disk2/ondb/data         /disk2/ondb/data
  /rg2-rn1                 /rg2-rn2                 /rg2-rn3
    /env                     /env                     /env
In this case, configuration information and administrative data are stored in a location separate from all of the replication data. The replication data itself is stored by each distinct Replication Node service on separate physical media as well. Storing data in this way provides failure isolation and typically makes disk replacement less complicated and time-consuming.
To replace a failed disk:
1. Determine which disk has failed. To do this, you can use standard system monitoring and management mechanisms. In this example, suppose disk2 on Storage Node 3 fails and needs to be replaced.
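For example, on a Linux host you might confirm the failure with the kernel log and smartmontools. The device name /dev/sdc below is an assumption for the disk mounted at /disk2; substitute the actual device on your system:

# Check the kernel log for I/O errors on the suspect device (assumed /dev/sdc)
> dmesg | grep -i sdc
# Query the disk's SMART health status (requires the smartmontools package)
> sudo smartctl -H /dev/sdc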
2. Given the directory structure, determine which Replication Node service to stop. With the structure described above, the store writes replicated data to disk2 on Storage Node 3 through the rg2-rn3 service, so rg2-rn3 must be stopped before replacing the failed disk.
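If you are unsure which Replication Node uses the failed storage directory, you can inspect the deployed topology from the administrative CLI. The show topology command is part of the standard CLI; its verbose form typically lists each Replication Node along with its storage directory:

kv-> show topology -verbose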
3. Use the plan stop-service command to stop the affected service (rg2-rn3). This prevents the system from making further attempts to communicate with the service, which reduces the error output related to a failure you are already aware of.
kv-> plan stop-service -service rg2-rn3
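Plans run asynchronously by default; if you want the command to return only after the service has actually stopped, the -wait flag can be appended. Before pulling the disk, you can also confirm from the same CLI that the service is down; ping is a standard command, and a stopped Replication Node is typically reported as UNREACHABLE:

kv-> ping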
4. Remove the failed disk (disk2) using whatever procedure is dictated by the operating system, disk manufacturer, or hardware platform.
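On a typical Linux host, the disk's filesystem would be unmounted before physical removal; the /disk2 mount point follows the example layout above:

> sudo umount /disk2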
5. Install a new disk using any appropriate procedures.
6. Format the new disk and mount it so that it provides the same storage directory as before; in this case, /disk2/ondb/data.
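What follows is a minimal sketch of this step on Linux, assuming the new disk appears as /dev/sdc, an ext4 filesystem, and an assumed user named oracle that runs the KVStore processes; adjust all three to your environment:

# Create a filesystem on the replacement disk (assumed device name)
> sudo mkfs.ext4 /dev/sdc
# Mount it at the same mount point used before the failure
> sudo mount /dev/sdc /disk2
# Recreate the storage directory expected by the Storage Node
> sudo mkdir -p /disk2/ondb/data
# Give ownership to the (assumed) user that runs the KVStore processes
> sudo chown -R oracle:oracle /disk2/ondb

In practice you would also add a matching /etc/fstab entry so the mount survives a reboot.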
7. With the new disk in place, use the plan start-service command to start the rg2-rn3 service.
kv-> plan start-service -service rg2-rn3
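After the plan completes, you can check that the node has rejoined its replication group; verify configuration and ping are standard CLI commands for this, and the node may report a catching-up state until recovery finishes:

kv-> verify configuration
kv-> ping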
It can take a considerable amount of time for the disk to recover all of its data, depending on how much data resided on the disk before the failure. Note also that the system may experience additional network traffic and load while the new disk is being repopulated.
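One rough way to watch the repopulation from the operating system is to monitor the size of the Replication Node's environment directory on the new disk; the path below follows the example layout:

> du -sh /disk2/ondb/data/rg2-rn3/env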