Sun Java System Calendar Server 6.3 Administration Guide

22.5 Dealing with Calendar Server Database Issues

This section covers various issues involving the Calendar Server (Berkeley Database) databases.

22.5.1 Finding Berkeley Database Tools

Many of the troubleshooting steps in this section require access to the Berkeley database utility programs. A version of these utility programs is included in the Calendar Server bundle, but it is not supported. You might want to obtain more information directly from Sleepycat Software (http://www.oracle.com/database/berkeley-db/index.html).


22.5.1.1 To Access the Berkeley Database Utilities

Set and export the LD_LIBRARY_PATH environment variable to reflect the following directory:

cal-svr-base/SUNWics5/cal/tools/unsupported/bin/
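
As a minimal sketch of the step above, the following shell fragment prepends the tools directory to LD_LIBRARY_PATH. The CAL_SVR_BASE variable is a hypothetical stand-in for cal-svr-base and defaults to /opt, as in the Solaris examples in this chapter. The fragment also adds the cal/lib directory, because the db_verify example later in this chapter sets LD_LIBRARY_PATH to that directory instead.

```shell
# Assumption: CAL_SVR_BASE is a hypothetical stand-in for cal-svr-base.
CAL_SVR_BASE="${CAL_SVR_BASE:-/opt}"
BDB_TOOLS_DIR="$CAL_SVR_BASE/SUNWics5/cal/tools/unsupported/bin"
BDB_LIB_DIR="$CAL_SVR_BASE/SUNWics5/cal/lib"

# Prepend both directories, keeping any existing value after them.
LD_LIBRARY_PATH="$BDB_TOOLS_DIR:$BDB_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

Prepending, rather than overwriting, keeps any library directories that other tools in the same shell session might rely on.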

22.5.1.2 List of Available Tools

The following table lists some of the commonly used Berkeley database tools (utility programs).

Berkeley Database Tool    Description

db_archive       Writes the path names of log files that are no longer in use to the standard output, one path name per line.
db_checkpoint    A daemon process that monitors the database log and periodically calls the checkpoint routine to checkpoint it.
db_deadlock      Traverses the database environment lock region and aborts a lock request each time it detects a deadlock or a lock request that has timed out.
db_dump          Writes the specified file to standard output in a flat-text format understood by the db_load utility.
db_load          Reads from the standard input and loads it into the database file specified. If the file does not already exist, db_load creates it.
db_printlog      Debugging utility that dumps log files in human-readable format.
db_recover       Restores the database to a consistent state after an unexpected application, database, or system failure.
db_stat          Displays statistics for the database environment.
db_verify        Verifies the structure of one or more files and the databases they contain.

Procedure: Detecting and Fixing Database Deadlocks

If the Berkeley database is in a deadlock state, you must reset the database. It is important to detect this condition as early as possible.

To enable the system to periodically check the databases to detect a deadlock state and inform the Administrator:

  1. Log in as an administrator with permission to change the configuration.

  2. Stop Calendar Server services by issuing the stop-cal command.

  3. Change to the /etc/opt/SUNWics5/config directory.

  4. Save your old ics.conf file by copying and renaming it.

  5. Edit the ics.conf, if necessary, to have the following value:

    local.caldb.deadlock.autodetect="yes"


    Note –

    When this parameter is set to “yes”, Calendar Server launches the db_deadlock daemon, which monitors the lock region.
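
The backup-and-edit steps in this procedure can be sketched as a small shell function. This is only an illustration, not a supported tool: the function name set_ics_param and the backup file naming are hypothetical, and the sketch assumes that ics.conf stores each parameter on its own line in the name="value" form shown above.

```shell
# set_ics_param FILE NAME VALUE   (hypothetical helper, not part of Calendar Server)
# Saves a timestamped copy of FILE, then sets NAME="VALUE",
# replacing an existing assignment or appending a new one.
set_ics_param() {
    conf=$1 name=$2 value=$3
    cp -p "$conf" "$conf.bak.$(date +%Y%m%d%H%M%S)" || return 1
    tmp="$conf.tmp.$$"
    # Keep every line whose key (text before the first "=") is not NAME.
    awk -F= -v n="$name" '{ k = $1; gsub(/[ \t]/, "", k); if (k != n) print }' \
        "$conf" > "$tmp" || return 1
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

With Calendar Server stopped, a call such as set_ics_param /etc/opt/SUNWics5/config/ics.conf local.caldb.deadlock.autodetect yes would apply the setting described in this procedure.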


22.5.2 Detecting Database Corruption

Calendar database corruption can have various causes: system resource contention, hardware failures, application errors, database failures, and human error. This section describes how to detect calendar database corruption.

22.5.2.1 Database Corruption Basics

No one can guarantee corruption-free databases, but you can minimize data loss and operational downtime. Closely monitoring the database and Calendar Server is key to detecting corruption early. Frequent and complete backups are the key to recovering from corruption once it is found.

There are two levels of corruption possible in a calendar database:

22.5.2.2 Monitoring Log Files

Monitor the Calendar Server log files, including the alarm logs, for any error messages that might indicate database corruption.

You should inspect the log files on a regular basis for ALERT, CRITICAL, ERROR, and WARNING level errors and, if found, examine the events for possible problems with the operation of Calendar Server. The NOTICE and INFORMATION level log events are generated during normal operation of Calendar Server and are provided to help you monitor server activity.

Never remove any transaction log files in the database directory. The transaction log files contain the transaction updates (additions, modifications, or deletions), and removing them can corrupt the calendar database beyond recovery.


Note –

When requesting technical support for Calendar Server, you might be asked to provide the log files for help in resolving problems.
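
One way to make the regular inspection routine is a small scan function such as the following sketch. It assumes that the log files end in .log and that the level name appears literally on each log line; neither detail is confirmed by this section, so adjust the pattern to your installation. The function name scan_cal_logs is hypothetical.

```shell
# scan_cal_logs DIR   (hypothetical helper)
# Prints file name, line number, and text of every log line that
# mentions one of the four severity levels worth investigating.
scan_cal_logs() {
    logdir=$1
    # /dev/null forces grep to prefix matches with the file name
    # even when only one log file exists.
    grep -n -e ALERT -e CRITICAL -e ERROR -e WARNING "$logdir"/*.log /dev/null
}
```

The function only reads the log files; it never touches the transaction log files in the database directory.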


Procedure: To Check for Calendar Database Corruption

Use the check command to scan for corruptions in the calendar database, including calendar properties (calprops) and events and todos (tasks). If the check command finds an inconsistency that cannot be resolved, it reports the situation in its output.

The check command does not check for corruption in the alarm or group scheduling engine (GSE) databases.

  1. Log in as a user who has administration rights to the system where Calendar Server is installed.

  2. Calendar Server can be either running or stopped; however, if possible, stop Calendar Server.

  3. Make a copy of your calendar database, if you haven’t already done so.

    Copy only the database (.db) files. You don’t need to copy any share (__db.*) or log (log.*) files.

  4. Change to the cal-svr-base/SUNWics5/cal/sbin directory.

    For example, on Solaris Operating Systems for the default directory, enter:

    cd /opt/SUNWics5/cal/sbin

  5. Run the check command on the copy of your calendar database:

    ./csdb check dbdir > /tmp/check.out 2>&1

    If you don’t specify dbdir, check uses the database in the current directory.

    The check command can generate a lot of information, so consider redirecting all output, including stdout and stderr, to a file (as shown in the example).

  6. When check has finished, review the output file. If your database is corrupted, run the rebuild command.

    (See 22.5.6 Rebuilding a Corrupted Calendar Database.)
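
The copy step in this procedure (database files only, no share or log files) can be sketched as follows. The function name copy_db_files and the directory names in the usage note are hypothetical.

```shell
# copy_db_files SRC DEST   (hypothetical helper)
# Copies only the *.db files, leaving the share (__db.*) and
# log (log.*) files behind, as the procedure recommends for check.
copy_db_files() {
    src=$1 dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/*.db; do
        [ -f "$f" ] || continue      # skip the literal pattern when nothing matches
        cp -p "$f" "$dest"/ || return 1
    done
}
```

After copy_db_files /var/opt/SUNWics5/csdb /tmp/caldb.copy, the check command can be run against /tmp/caldb.copy with its output redirected to a file, leaving the production database untouched.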

22.5.3 Dealing with Transaction Log Files Suddenly Very Large and Numerous

It is possible that your automatic purge configuration settings do not properly account for the client user interface your end users prefer. The sudden appearance of large numbers of large transaction log files can be simply the result of a long delay in purging Delete log records. If this delay is purposefully done to accommodate users of the Connector for Microsoft Outlook or the Sync Tool, then the appearance of the large and numerous transaction log files is to be expected. No further action is required. Eventually the system will catch up. However, if your end users are using the Communications Express client, returning the automatic purge settings to their defaults should fix the problem.

22.5.4 Preventing Service Interruptions When Your Database is Corrupted (Read-only Mode)

This section covers how to keep your corrupted database accessible while you are in recovery mode.

22.5.4.1 Using Read-only Mode

If you are encountering database corruption, one way to prevent service interruptions is to put your database in read-only mode. This mode allows end users to read database entries, but does not allow additions, modifications, or deletions. If an end user attempts to add, modify or delete any calendar data, the system gives an error message. In addition, administrator tools that add, modify or delete calendar events and todos will not work while the database is in read-only mode.


Note –

If the database is corrupted to the point that it can’t be read, you must interrupt service long enough to restore a backup. The quickest way to restore a backup is to have a good hot backup. See 22.5.8.1 Before You Restore.


Procedure: To Put a Database in Read-only Mode

  1. Log in as an administrator with permission to change the configuration.

  2. Stop Calendar Server services by issuing the stop-cal command.

  3. At a command line, change to the directory where the ics.conf is located:

    cd /etc/opt/SUNWics5/config

  4. Specify read-only mode for the calendar, by setting the parameter as follows:

    caldb.berkeleydb.readonly="yes"

  5. Restart Calendar Server by issuing the start-cal command.

    cal-svr-base/SUNWics5/cal/sbin/start-cal

    You must restart the services in order for the ics.conf changes to take effect.

22.5.5 Handling Common Database Failures

This section covers a few of the common database failures and suggests remedies.

Procedure: csadmind Won’t Start or Crashes During Startup

Since csadmind is the service that handles both the group scheduling engine (GSE) and the alarm dispatch engine, this could have been caused by offending entries in the GSE queue or the alarm queue.

Remedies:

  1. If csadmind is not running, issue the stop-cal command immediately.

    Leaving Calendar Server running could cause transaction logs to accumulate, which could further corrupt the database and make reconciling the transaction log files with the database take much longer.

  2. Verify that all Calendar Server processes are stopped.

    For instructions on how to verify that all processes are stopped, see To Stop Child Processes.

  3. Try restarting csadmind again by issuing the start-cal -csadmind command.

    If it starts successfully, make sure the two queues are functioning by performing the following steps:

    1. Check the GSE queue using csschedule.

    2. Check the alarm queue using dbrig.

      For instructions on running csschedule and dbrig, see Appendix D, Calendar Server Command-Line Utilities Reference.

  4. If csadmind crashes with a dump, analyze the pstack.

    If you notice any GSE related functions in the trace (they will have the letters GSE in them), look at the first entry in the GSE queue and the referenced entry in the events database. Most of the time, the event referred to in the GSE entry is the offending entry. To fix this problem:

    1. Remove the GSE entry using csschedule.

    2. Remove the offending event from the database using cscomponents.

      For instructions on running csschedule and cscomponents, see Appendix D, Calendar Server Command-Line Utilities Reference.

  5. If the entries are not corrupted, then it could be a special case that the calendar server could not handle.

    Take the following steps:

    1. Take a calendar environment snapshot of the corrupted database, and contact customer support.

      To create an environmental backup:

      1. Use the db_checkpoint utility found at:

        cal-svr-base/SUNWics5/cal/tools/unsupported/bin/db_checkpoint

      2. Run db_archive -s.

        Use the -s option to identify all the database files and copy them to a removable medium, such as a CD, DVD, or tape.

      3. Run db_archive -l.

        Use the -l option to identify all the log files and copy unapplied log files to a removable-medium device.

    2. To avoid service interruptions, place your calendar database into a read-only state temporarily, and revert to a hot backup copy.

      • Placing your calendar database into a read-only state temporarily prevents any add, modify or delete transactions from taking place. End users will get an error message when they try to add, modify or delete any calendar data. Administrator tools that add, modify or delete calendar events and todos also will not work while the database is in read-only mode.

        To put your calendar database in read-only mode, edit the ics.conf file and set the following parameter to “yes”, as shown:

        caldb.berkeleydb.readonly="yes"

      • Revert to a hot backup copy, using the instructions found in 22.5.8 Restoring an Automatic Backup Copy.

        With csstored configured and enabled, a hot backup is available that should be within minutes of being up to date. Always verify your hot backup copy (run db_verify) to make sure that it is not also corrupted.

  6. If all else fails, perform the dump and reload procedure to see if it can salvage the database.

    This procedure is described in 22.5.7 Using the Dump and Load Procedure to Recover a Calendar Database.
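
The environment snapshot described in step 5 of this procedure can be sketched as a shell function. The function name snapshot_env is hypothetical. The sketch relies on the standard Berkeley DB options -1 (checkpoint once, then exit) and -h (database environment directory), and on db_archive printing file names relative to that directory. It copies all log files that db_archive -l lists, which is a superset of the unapplied log files mentioned above.

```shell
# snapshot_env BDB_BIN DBHOME DEST   (hypothetical helper)
# BDB_BIN is the directory that holds the Berkeley DB utilities.
snapshot_env() {
    bin=$1 home=$2 dest=$3
    mkdir -p "$dest" || return 1
    # Force a single checkpoint so the database files and logs are in step.
    "$bin/db_checkpoint" -1 -h "$home" || return 1
    # db_archive -s lists the database files; -l lists all log files.
    for f in $("$bin/db_archive" -s -h "$home") $("$bin/db_archive" -l -h "$home"); do
        cp -p "$home/$f" "$dest"/ || return 1
    done
}
```

The destination can be a staging directory that is then written to the removable medium of your choice.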

Procedure: Services Hung and End Users Can’t Connect (Orphaned Locks)

This condition may be caused by a control thread that holds a Berkeley DB database page lock and quits without releasing it. To confirm the problem, run pstack on the cshttpd processes and on csadmind. (pstack is a standard UNIX utility found at /usr/bin/pstack.) The output should show threads that are waiting to acquire a lock.

To fix the problem, restart Calendar Server, as follows:

  1. Change to the directory where start-cal resides.

    cd cal-svr-base/SUNWics5/cal/sbin

  2. Issue the start-cal command.

    ./start-cal

Procedure: csdb rebuild Never Finishes (Database Looping)

Database looping is usually caused by corruption in the database files. Because the cause is database corruption, the database might be unrecoverable. There are several options:

  1. Revert to the hot backup.

    If the corruption occurred recently, you can use one of your hot backups.

  2. Use your catastrophe archival recovery process.

    For a suggested process, see 22.5.8 Restoring an Automatic Backup Copy.

  3. Use the dump and reload procedure, 22.5.7 Using the Dump and Load Procedure to Recover a Calendar Database.

22.5.6 Rebuilding a Corrupted Calendar Database

This section describes how to use the csdb rebuild command.

22.5.6.1 rebuild Overview

The rebuild command scans a calendar database and checks the calendar properties (calprops), events, and todos (tasks) for corruption. If the rebuild command finds an inconsistency, it generates a rebuilt calendar database (.db files) in the cal-svr-base/SUNWics5/cal/sbin/rebuild_db directory.

The rebuild command without the -g option rebuilds all databases except the group scheduling engine (GSE) database. To also rebuild the GSE database, include the -g option.

To determine if the GSE database has any entries, run the csschedule -v list command and then let the GSE finish processing the entries before you run the rebuild command.

Procedure: To Rebuild a Calendar Database

  1. Log in as a user who has administration rights to the system where Calendar Server is installed.

  2. Stop Calendar Server.

  3. Make a copy of your calendar databases, placing them into the /tmp/db directory.

    Copy the database (.db) files and the log (log.*) files. You don’t need to copy any share (__db.*) files.

  4. Change to the cal-svr-base/SUNWics5/cal/sbin directory.

    For example, on Solaris Operating Systems, for the default directory, enter:

    cd /opt/SUNWics5/cal/sbin


    Note –

    If disk space is a problem for the sbin directory, run the rebuild command in a different directory.


  5. Run the rebuild command on the copy of your calendar database:

    ./csdb rebuild /tmp/db /tmp/

    If you don’t specify a database path, rebuild uses the current directory. The /tmp/ parameter specifies the destination directory for the rebuilt database.

    To also rebuild the GSE database, include the -g option.

    The rebuild command can generate a lot of information, so consider redirecting all output, including stdout and stderr, to a file.


    Note –

    Always rebuild your calendar database using the latest backup copy.

    However, if you have experienced a significant loss of data and you have periodically backed up your database and have more than one copy available, rebuild from the latest copy to the oldest one. (The only drawback is that calendar components that were deleted will reappear in the rebuilt database.)

    For example, if you have three sets of backup calendar database files in directories db_0601, db_0615, and db_0629, run the rebuild command in the following sequence:


    ./csdb rebuild db_0629 
    ./csdb rebuild db_0615 
    ./csdb rebuild db_0601

    The rebuild command then writes the rebuilt database to the cal-svr-base/SUNWics5/cal/sbin/rebuild_db directory.


  6. When rebuild has finished, review the output in the rebuild.out file.

    If the rebuild was successful, the last line in the rebuild.out file should be:


    Calendar database has been rebuilt

  7. After you have verified that rebuild was successful in the previous step, copy the rebuilt database (.db) files from the rebuild_db directory to your production database.

  8. If you have any share (__db.*) or log (log.*) files from the corrupted database, move them to another directory.

  9. Restart Calendar Server.
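
The success check in step 6 can be automated with a short function. This sketch assumes that the rebuild output was redirected to a file such as rebuild.out; the function name rebuild_succeeded is hypothetical. It compares the last non-blank line of the file with the success message quoted in step 6.

```shell
# rebuild_succeeded FILE   (hypothetical helper)
# Succeeds only if the last non-blank line of FILE is the
# success message that the rebuild command prints.
rebuild_succeeded() {
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    [ "$last" = "Calendar database has been rebuilt" ]
}
```

A script could then guard the copy in step 7 with a test such as: if rebuild_succeeded rebuild.out; then ... fi.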

22.5.6.2 Sample Rebuild Output

The following example shows the command and the output that it generated:


# ./csdb -g rebuild
Building calprops based on component information.
Please be patient, this may take a while...
Scanning events database...
512 events scanned
Scanning todos database...
34 todos scanned
Scanning events database...
512 events scanned
Scanning todos database...
34 todos scanned
Scanning deletelog database...
15 deletelog entries scanned
Scanning gse database...
21 gse entries scanned
Scanning recurring database...
12 recurring entries scanned
Successful components db scan
Calendar database has been rebuilt
Building components based on calprops information.
Please be patient, this may take a while...
Scanning calprops database to uncover events...
25 calendars scanned
Scanning calprops database to uncover todos...
25 calendars scanned
Successful calprops db scan
Calendar database has been rebuilt
            

Note –

The preceding sample output shows the events and the todos databases scanned twice each. This is not an error. The command scans the first time to verify the information in the calendar properties database and then scans again to make sure that the calendar properties database is accessible.


22.5.7 Using the Dump and Load Procedure to Recover a Calendar Database


22.5.7.1 Dump and Load Overview

Use the dump and load procedure to try to recover a corrupted database. The dump and load procedure uses the Berkeley database db_dump and db_load utilities, which Calendar Server includes in the following directory:

cal-svr-base/SUNWics5/cal/tools/unsupported/bin

The db_dump utility reads a database file and writes the database entries to an output file, using a format that is compatible with the db_load utility.

For documentation about the db_dump and db_load utilities, refer to the Sleepycat Software Web site:

http://www.sleepycat.com/docs/utility/index.html

Your success in recovering a database using the db_dump and db_load utilities depends on the degree of corruption of your database. You might need to try several db_dump options before you successfully recover your database. If your database is severely corrupted, however, recovery might not be possible, and you might need to revert to the last good hot backup or archive backup of your database.


Note –

Before you perform the dump and load procedure, your calendar database must be Berkeley DB version 3.2.9, or later. If you have an earlier version, first run the cs5migrate utility to upgrade your calendar database.

For the most up to date version of cs5migrate, call Sun technical support.


Procedure: To Perform the Dump and Load Procedure

  1. Log in as the user and group under which Calendar Server is running, such as icsuser and icsgroup, or as superuser (root).

  2. Stop Calendar Server, if necessary.

  3. Back up your corrupted database using a utility such as csbackup, the Sun StorEdge Enterprise Backup™ software, or Legato Networker®.

    For more information refer to Chapter 17, Backing Up and Restoring Calendar Server Data.

  4. Dump each corrupted database file using the db_dump utility.

    The database files are ics50calprops.db, ics50journals.db, ics50alarms.db, ics50events.db, ics50todos.db, and ics50gse.db.

    Run db_dump using the following options, in order, until your database is recovered (or until you determine that the database can’t be recovered):

    • No options for minor database corruption.

    • -r option for moderate database corruption.

    • -R option for severe database corruption. The -R option dumps more data than the -r option, including partial and deleted records, from the corrupted database.

      For example, to run db_dump with the -r option:

      db_dump -r ics50events.db > ics50events.db.txt

  5. Load the output file into a new database file using the db_load utility.

    For example:

    db_load new.ics50events.db < ics50events.db.txt

    If db_load reports an odd number of keys or data entries, edit the db_dump output file, and remove the odd key or data entries. Then run db_load again.

  6. Repeat the previous two steps for the other corrupted database files.

    That is, run db_dump and then db_load for each of the other corrupted database files.

  7. Rebuild the recovered database files using the csdb rebuild command, as described in 22.5.6 Rebuilding a Corrupted Calendar Database.

    When rebuild has finished, review the output in the output file. If the rebuild was successful, the last line in the rebuild.out file should be:


    Calendar database has been rebuilt

    If the csdb rebuild command was not successful, dump your database using the next db_dump option (-r or -R).

    If the db_dump -R option does not recover your corrupted database, contact your Sun Microsystems technical support or sales account representative for assistance. In the meantime, you might need to revert to the last good backup of your database.
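
The escalation through the db_dump options in steps 4 and 5 can be sketched as a loop. The function name dump_and_load is hypothetical. Note the weaker success test: the sketch stops as soon as db_dump and db_load both exit cleanly, whereas the procedure above judges success only after csdb rebuild has run on the reloaded files. Run it from a directory that holds a copy of the database files.

```shell
# dump_and_load BDB_BIN DBFILE   (hypothetical helper)
# Tries db_dump with no option, then -r, then -R, loading each
# dump into new.DBFILE until one dump/load pair exits cleanly.
dump_and_load() {
    bin=$1 db=$2
    for opt in "" -r -R; do
        rm -f "new.$db"
        # $opt is left unquoted so the empty first pass adds no argument.
        if "$bin/db_dump" $opt "$db" > "$db.txt" &&
           "$bin/db_load" "new.$db" < "$db.txt"; then
            echo "recovered $db (db_dump option: ${opt:-none})"
            return 0
        fi
    done
    return 1
}
```

If csdb rebuild later fails on the reloaded file, repeat the dump manually with the next option, as described in step 7.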

22.5.8 Restoring an Automatic Backup Copy

If you have used the automatic backup feature described in Chapter 9, Configuring Automatic Backups (csstored), you can use the hot backup copy when your live database is corrupted.

This section covers how to restore the two different automatic backups: hot backups and archive backups.

22.5.8.1 Before You Restore

Before you restore a backup, be sure that you have:

Procedure: To Restore a Hot Backup

Hot backups should be your first choice of backup when your live database is corrupted. To restore a hot backup, follow these steps:

  1. Identify any log files that were unapplied or open for writing in the corrupted live database directory.

  2. Close the log that was open for writing. It contains the most recent transactions.

  3. Create a new (recovery) directory.

  4. Copy the current hot backup copy into the new recovery database directory.

  5. Copy the log.* files from your corrupted live database directory into your new recovery database directory.

  6. If you are keeping an archive copy of the database, copy the logs that had not been applied to the live database into the archive directory, so your archive backup copy will be complete.

  7. Run db_recover with the -c -h options specified against the new recovery database.

    For example, if your new recovery directory is called recoverydb, then the command would be as follows:

    db_recover -c -h recoverydb

  8. Leave the log.* files in the new recovery directory.

    The db_recover program applied the log files to the new recovery databases, but starting with version 4.2, Berkeley DB expects the log files to remain.

  9. Run db_verify against the database files in the new recovery directory. To run db_verify:

    1. Stop the Calendar Server using these commands.

      cd /opt/SUNWics5/cal/sbin

      ./stop-cal

    2. Create another copy of the Calendar Server database (csdb) using this command.

      cp -Rp /var/opt/SUNWics5/csdb /var/opt/SUNWics5/csdb.db_verify

    3. Run db_verify on the copy of csdb.


      Note –

      Do not run db_verify on the original csdb.


      LD_LIBRARY_PATH=/opt/SUNWics5/cal/lib
      export LD_LIBRARY_PATH
      cd /opt/SUNWics5/cal/tools/unsupported/bin
      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50alarms.db
      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50calprops.db
      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50events.db
      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50gse.db
      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50journals.db
      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50recurring.db
      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50todos.db
      ./db_verify -o -h /var/opt/SUNWics5/csdb.db_verify ics50deletelog.db

      Note –

      Run db_verify with -o option for ics50deletelog.db.


      If db_verify completes successfully, it does not print any error messages. If a database file is corrupted, db_verify reports errors. For example:


      ./db_verify -h /var/opt/SUNWics5/csdb.db_verify ics50todos.db
      db_verify:Page 612: last item on page sorted greater than parent entry
      db_verify: Page 612: incorrect next_pgno 885 found in leaf chain (should be 501)
      db_verify: Page 0: page 501 encountered a second time on free list
      db_verify: DB->verify: ics50todos.db: DB_VERIFY_BAD: Database verification failed
      
  10. Run csdb -v list against the new recovery directory.

  11. If the new recovery directory passed all three preceding recovery steps, replace the old corrupted live database with the new recovery database.

  12. Copy the new live database into your hot backup directory to function as the new snapshot.

    All new logs will be applied to this copy until the next regular snapshot is taken.

  13. Start Calendar Server.

  14. If the new recovery directory failed any of the steps, identify an uncorrupted older hot backup as follows:

    1. Working backward through your hot backups, find the most recent copy that is not corrupted by running db_verify and csdb -v list on each in turn.

    2. The first hot backup copy that passes can be restored to your live database directory.

      Replace the corrupted live database with the clean hot backup, as described in To Restore a Hot Backup. (Be sure to read 22.5.8.1 Before You Restore first.)

    3. If none of your hot backups work and you do not have archive backups to try, call technical support. If you do have archive backups, follow the procedure in To Restore an Archive Backup. (See also 22.5.8.1 Before You Restore.)
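
The db_verify commands listed in step 9 can be wrapped in a loop so that one failure does not hide the others. This is a sketch only; the function name verify_all is hypothetical. It uses the file names and the -o option for ics50deletelog.db exactly as they appear in step 9, and it assumes LD_LIBRARY_PATH is already set as shown there.

```shell
# verify_all BDB_BIN DBDIR   (hypothetical helper)
# Runs db_verify on each calendar database file in DBDIR and
# returns nonzero if any file fails verification.
verify_all() {
    bin=$1 dir=$2 status=0
    for db in ics50alarms ics50calprops ics50events ics50gse \
              ics50journals ics50recurring ics50todos ics50deletelog; do
        opt=
        # Step 9 runs db_verify with -o for ics50deletelog.db only.
        if [ "$db" = ics50deletelog ]; then opt=-o; fi
        if ! "$bin/db_verify" $opt -h "$dir" "$db.db"; then
            echo "FAILED: $db.db" >&2
            status=1
        fi
    done
    return $status
}
```

As in step 9, point the function at a copy of the database directory, never at the original csdb.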

Procedure: To Restore an Archive Backup

If you do not have an uncorrupted hot backup, but have archive backups and their transaction logs, you can restore the most current uncorrupted version of the archived database by performing the following steps:

  1. Identify any log files that were unapplied or open for writing in the corrupted live database directory.

  2. Close the log that was open for writing. It contains the most recent transactions.

  3. Create a new (recovery) directory.

  4. Copy the most recent archive copy and its log files into the new recovery database directory.

  5. Copy any unapplied log.* files from your corrupted live database directory into your new recovery database directory.

  6. Run db_recover with the -c -h options specified against the new recovery database.

    For example, if your new recovery directory is called recoverydb, then the command would be as follows:

    db_recover -c -h recoverydb

  7. Leave the log.* files in the new recovery directory.

    The db_recover program applied the log files to the new recovery databases, but starting with version 4.2, Berkeley DB expects the log files to still be there.

  8. Run db_verify against the database files in the new recovery directory.

    The steps in the To Restore a Hot Backup procedure explain how to run db_verify.

  9. Run csdb -v list against the new recovery directory.

  10. If the new recovery directory passed all three preceding recovery steps, replace the old corrupted live database with the new recovery database.

  11. Copy the new live database into your hot backup directory to function as the new snapshot.

  12. Start Calendar Server.

  13. If the new recovery directory failed any of the steps, identify an uncorrupted older archive backup as follows:

    1. Working backward through your archive backup copies, find the most recent copy that is not corrupted by running the three recovery programs against each of them in turn: db_recover -c -h, db_verify, and csdb -v list.

    2. The first archive copy that passes can be restored to your live database directory.

      Replace the corrupted live database with the clean archive backup, as shown in To Restore an Archive Backup.

    3. If none of your archive backups work, call technical support.
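
Steps 3 through 6 of this procedure (and the matching steps of the hot backup procedure) can be sketched as one function. The function name build_recovery_dir is hypothetical. The sketch assumes that the backup directory is flat (no subdirectories) and copies every log.* file from the live directory rather than only the unapplied ones, so review it against your own backup layout before relying on it.

```shell
# build_recovery_dir BDB_BIN BACKUP_DIR LIVE_DIR RECOVERY_DIR   (hypothetical helper)
build_recovery_dir() {
    bin=$1 backup=$2 live=$3 rec=$4
    mkdir "$rec" || return 1                 # refuse to reuse an existing directory
    cp -p "$backup"/* "$rec"/ || return 1    # backup copy (and its logs, for an archive)
    for f in "$live"/log.*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$rec"/ || return 1       # log files from the corrupted live database
    done
    # -c: catastrophic recovery; -h: database environment directory.
    "$bin/db_recover" -c -h "$rec"
}
```

The log.* files stay in the recovery directory afterward, as step 7 requires.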

22.5.9 Repairing Custom Backup Scripts


22.5.9.1 Berkeley Tools Now Compiled with a Dynamic Library

If you have created a custom backup script using the Berkeley database tools, such as db_recover, you may find that it will no longer work after upgrading to Calendar Server. The reason for this is that the earlier versions of Calendar Server compiled the tools with a static library. The tools are now compiled with a dynamic library, libdb-4.2.so.

22.5.9.2 To Repair a Custom Backup Script

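To use the new dynamic library with your existing custom scripts, set and export the LD_LIBRARY_PATH environment variable so that it names the directory that contains libdb-4.2.so. For example, the db_verify example earlier in this chapter uses the following setting for a default Solaris installation:

LD_LIBRARY_PATH=/opt/SUNWics5/cal/lib
export LD_LIBRARY_PATH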