Sun StorageTek 5800 System Version 1.1.1 Release Notes
This document contains important information about the Sun StorageTek 5800 System, Version 1.1.1. Read this document so that you are aware of issues or requirements that can affect the installation and operation of the 5800 system.
This document contains the following sections:
This section includes a description of the major enhancements available with Version 1.1.1.
When Sun service personnel install, upgrade, or expand the 5800 system hardware, they also update the service tags on the system that describe the hardware. You can register these service tags with Sun to allow you to identify your equipment and expedite service calls.
You can use the new CLI command logdump to collect information from the system and send it back to Sun via Hypertext Transfer Protocol over Secure Socket Layer (HTTPS).
You can install multiple full-cell 5800 systems as a multicell hive. (Half-cell systems are not supported in a multicell hive.)
Sun has tested and qualified operating with as many as eight full-cell systems in a hive.
Version 1.1.1 includes hot-swappable disks that can be removed and replaced easily while the system is operational.
Upgrading from Version 1.1 to Version 1.1.1 should be performed only by Sun service personnel. Please keep in mind that all nodes and disks on the 5800 system must be online and operating correctly before the upgrade.
If the service person has to replace nodes and/or disks, you will have to wait at least 12 hours after replacement while the system completes a Data Reliability check before the upgrade can begin.
Consult Sun service personnel for complete information about the upgrade process.
If your 5800 system uses the Sun Fire X2100 Server for a service node, you may notice some issues with the power LED.
To determine if your system uses the Sun Fire X2100 Server for a service node, compare the service node on your system with the front and back views of the Sun Fire X2100 Server, as shown in FIGURE 1 and FIGURE 2.
FIGURE 1 Front View of the Sun Fire X2100 Server
FIGURE 2 Rear View of the Sun Fire X2100 Server
If your system uses the Sun Fire X2100 Server, you can determine whether the service node is powered on or off while the power LED is lit by pressing the CD/DVD drive eject button. If the LED on the caddy starts to blink, the service node is powered on. If the LED on the caddy does not light, the service node is powered off.
If you receive email alerts and syslog messages indicating that one or more nodes are offline, inspect the 5800 system to determine if the secondary switch (the top switch on the system) is active. FIGURE 3 shows the components on the back of the switch. If the port connection status LEDs are mostly lit for the top switch, that secondary switch is active.
If the secondary (top) switch is active, contact Sun service immediately to arrange for a replacement of the primary (bottom) switch.
Until the primary switch is replaced, the system might encounter the following issues while operating from the secondary switch:
The maximum heat output of the 5800 system is listed incorrectly in the Sun StorageTek 5800 Site Preparation Guide. The correct maximum heat output is as follows:
TABLE 1 lists the remaining power available for additional equipment in a rack when a 5800 system is installed in the rack.
During a restore operation, you must reboot the cell after the restore of the first (most recent) tape is completed. Use the CLI reboot command to reboot the cell. When the cell comes back online after the reboot, continue restoring from the remaining tapes.
Query and WebDAV functionality are not available until after the reboot. During approximately the first 12 hours after the reboot, some objects from the first tape might not be accessible via query and WebDAV, although objects restored from the remaining tapes will be accessible as soon as they are restored.
The emulator directory in the Software Developer’s Kit (SDK) .zip archive has been renamed in Version 1.1.1 to openedition. This directory contains the open edition software that allows you to test client applications without having to connect to a 5800 system.
This section lists the client operating systems from which you can run applications that store, retrieve, and query data on the 5800 system. The applications can be written in either the C or Java programming languages, using the 5800 system application programming interface (API):
You can access the 5800 system GUI using the following browsers:
The browser must be running Version 1.5 or 1.6 of the Java Runtime Environment.
You can access the data on the 5800 system using Web-based Distributed Authoring and Versioning (WebDAV).
You can read the data on the system using WebDAV from any Hypertext Transfer Protocol (HTTP) browser running on any system that is on the same network as the 5800 system.
For full read and write access to data on the 5800 system, you can use the free software cadaver (a command-line WebDAV client for Unix) or neon (an HTTP and WebDAV client library with a C interface). Consult the following URL for more information about cadaver and neon:
Mac OS X allows you to mount the 5800 system as a network share and gain read and write access via WebDAV to the data on the 5800 system.
The WebDAV implementation on the 5800 system has also been tested with the KDE Konqueror browser, Version 3, and the Internet Explorer browser, Version 6 and later, which provide full read/write WebDAV access.
The use of WebDAV in multicell configurations is not supported.
This section provides information about functional limitations and bugs that were described in the Version 1.1 release notes, and which have been resolved in this Version 1.1.1 product release.
This section provides information about functional limitations and bugs in this version of the product release. Note that if a recommended workaround is available for a bug, it follows the bug description.
Bug 6403951 - The open edition software supports the Delete Record operation of NameValueObjectArchive.delete and hc_delete_ez. However, the open edition software does not remove the underlying data file when the last metadata record is deleted. The semantics are correct, but the underlying space is not reclaimed.
Bug 6406170 - When you make a configuration change, certain properties require a reboot to take effect. Once the change is entered, however, you can no longer determine the current value, since the cellcfg command shows the new (pending) value instead. You also cannot tell that the displayed value is a pending value and that a reboot is still required.
Bug 6413553 - When you access a virtual view from a browser (issue a WebDAV query), the system might not return complete results since the number of entries listed in a WebDAV directory is limited by the size of the file system cache on the system on which the WebDAV query was issued. The maximum number of results displayed is 5000.
Bug 6450745 - In rare cases, the query engine might hang in the starting or stopped state.
Workaround - Try rebooting the system to create the query engine and repopulate it with metadata. The process could take 12 to 48 hours.
Bug 6451150 - Sometimes when you issue the CLI commands shutdown or reboot, the system returns the messages “It is not safe to shut down the system” or “It is not safe to reboot the system.” These messages indicate that the system is in the process of initializing the query engine.
Workaround - Although you can continue with the shutdown or reboot process, for best performance, wait until the query engine is fully initialized before proceeding.
Bug 6458653 - To ensure the integrity of data on the 5800 system, the system must be operated only on a secure, internal network.
Bug 6464055 - In the schema definition file, you can specify a metadata field as queryable = false. If you later change the schema definition file to indicate that queryable = true for that field, any data that you add to the system after the change includes that field as a queryable field. However, data that was previously stored on the system is not updated and is not queryable with that field.
Bug 6464866 - It is not possible to clear metadata schema after it is configured.
Workaround - If you need to clear fields from the metadata schema, either wipe all hive data (which clears the schema as a side effect), or contact Sun support for assistance.
Bug 6481476 - The system may respond to some queries with an out-of-memory error message.
Workaround - When developing queries using the Java API, set maxFetchsize in the range of 2000 to 5000.
Bug 6489627 - When the system first starts up, data operations may fail, even if the CLI reports that data services are online.
Workaround - Wait for all disks to come online (run the sysstat or hwstat -v commands to determine the number of disks online). If the problem persists after all working disks are online, retry the operations from the client following the best practices described in the Sun StorageTek 5800 System Client API Reference Guide. If necessary, contact Sun service to replace bad drives.
Bug 6491877 - If clients attempt a large number of concurrent deletes, the system might go offline.
Workaround - Avoid large numbers of concurrent deletes; if the system goes offline, reboot to bring it back online.
Bug 6495883 - After a disk or node fails, delete operations to the system might fail for up to three minutes.
Workaround - Retry the deletions after three minutes.
Bug 6500528 - If a record is deleted from the 5800 system open edition software using DeleteRecord, a WebDAV view might still show a link to the data, although not the data itself.
Workaround - Stop and restart the open edition software.
Bug 6501640 - Time stamps on stores, retrieves, and queries in the SDK example programs might seem inconsistent.
Workaround - When planning stores, retrieves, and queries using the SDK Java example programs, be aware of the following:
Bug 6502605 - The system erroneously allows you to change attributes such as queryable for nonextensible namespaces.
Workaround - Do not change attributes for nonextensible namespaces.
Bug 6507353 - C API core dumps if a query result is freed after the session is freed.
Workaround - Do not call hc_session_free() before the resultset is freed with hc_qrs_free().
Bug 6516036 - The first attempt to do a restore operation after the 5800 system is rebooted might fail with a message of Connection Refused.
Workaround - Retry the restore operation; it is expected to work on the second try.
Bug 6518738 - While the system is performing a backup operation, it might generate multiple alert messages about nodes joining and leaving the system.
Workaround - You can safely ignore these messages.
Bug 6520374 - If you stop the open edition software and then restart it soon afterward, the restart might fail with Java errors.
Workaround - Try restarting the open edition software again.
Bug 6522009 - After you delete a file in a WebDAV view, the file might still appear to be present.
Workaround - Wait approximately five minutes and the file should no longer be displayed.
Bug 6531153 - You might not be able to access the CLI on a 5800 system from a system running Linux with a Kernel version greater than 2.6.17.
Workaround - Disable window scaling on the Linux system using the following command:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
Or, use sysctl to turn off window scaling.
Bug 6533145 - A query to the data on the 5800 system that includes metadata fields that are stored in more than one table might fail.
Workaround - Make sure that fields that are queried together are grouped in the same table.
Bug 6535947 - A query to the data on the 5800 system that includes a high number (more than 40, for example) of large string metadata attributes might cause the query to fail.
Workaround - Limit the number of large string metadata attributes in a query to fewer than 40.
Bug 6538378 - The 5800 system open edition software might display a number of WARN!! EOF error messages.
Workaround - These innocuous messages can be safely ignored.
Bug 6539494 - The CLI sensors command and the GUI Environmental Status Panel might erroneously indicate that nodes 1, 3, and 13 are disabled.
Workaround - Check the CLI hwstat command and the GUI Cell Summary Panel to determine if the nodes are actually disabled.
Bug 6539500 - For the first 5 or 10 minutes after a node goes offline, the CLI sensors command and the GUI Environmental Status Panel might erroneously report active voltages, temperatures, and fan speeds for the node.
Workaround - Wait a few minutes and the CLI and GUI should report sensor data as disabled for the offline node.
Bug 6541837 - Rarely, an add or delete metadata operation for an object might fail if the system has not released the lock put on that object during a previous operation.
Workaround - If the symptom persists for more than 30 minutes, reboot the system.
Bug 6542247 - You cannot use an SMTP server that requires authentication to receive system alert emails.
Workaround - Configure the 5800 system with an SMTP server that does not require authentication.
Bug 6554457 - In some cases, switch 1 might fail over to switch 2, but no email alert might be sent to indicate this.
Bug 6557612 - If the network cable connection on a node is experiencing transient failures, the 5800 system might log missing heartbeat messages as well as messages indicating that the switch has failed over.
Workaround - Report the symptom to Sun service, and schedule a replacement of the node or network cable.
Bug 6558322 - If a client stores a large (greater than 1000 MB) object as the 5800 system is nearing capacity, the system might generate a warning message about being unable to store the object. Also, the healing processes on the system might not be able to remove any fragments of the object that were successfully stored.
Workaround - Do not store objects larger than 1000 MB when the system is nearing capacity. A cell has reached capacity when any one of its disks has reached 80% raw utilization. To display the raw utilization of the disks in a cell, issue the CLI command df -p.
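The capacity rule in this workaround can be expressed as a small client-side check. The following Python sketch is illustrative only. The function names are hypothetical, the disk utilization percentages are assumed to have been read by hand from the df -p output (the sketch does not parse that output), and 1 MB is assumed to mean 1,048,576 bytes.

```python
# Illustrative sketch only; function names and the MB definition are assumptions.
LARGE_OBJECT_BYTES = 1000 * 1024 * 1024  # "1000 MB", assuming 1 MB = 1,048,576 bytes
CAPACITY_THRESHOLD_PERCENT = 80.0        # raw utilization at which a cell is at capacity


def cell_at_capacity(disk_utilization_percent):
    """Return True if any one disk has reached 80% raw utilization."""
    return any(u >= CAPACITY_THRESHOLD_PERCENT for u in disk_utilization_percent)


def should_defer_store(object_size_bytes, disk_utilization_percent):
    """Return True if the object is large and the cell has reached capacity."""
    return (object_size_bytes > LARGE_OBJECT_BYTES
            and cell_at_capacity(disk_utilization_percent))
```

A client could call should_defer_store() before each large store and postpone the operation when it returns True.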
Bug 6562925 - The system does not reject the metadata schema file when the metadata name field contains one or more Unicode supplemental characters.
Workaround - Do not use Unicode supplemental characters for the metadata name field.
Bug 6566083 - If the wipe command fails and then you try to issue it again immediately, it might fail again.
Workaround - If the wipe command fails, reboot the system and then try the command again.
Bug 6570304 - Some hardware or software failures can cause a node to reboot repeatedly. Such a situation will be accompanied by email alerts and/or external syslog messages indicating that the node is leaving and joining.
Workaround - Power the node down by holding down the power button on the node that is exhibiting this behavior. Call Sun service to arrange to have the node replaced.
Bug 6570324 - The reboot -all command fails if the system is running on the secondary switch. The command requires both switches to be online.
Bug 6573144 - During the process of expanding a cell from 8 to 16 nodes, store operations to the system may time out.
Workaround - When programming applications, use retry loops within API calls to handle timeouts during cell expansion. One immediate retry should be sufficient in the great majority of cases.
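A retry loop of the kind described in this workaround might look like the following Python sketch. It is not taken from the 5800 system client API: store_with_retry is a hypothetical helper, and the store operation is assumed to be any callable that raises TimeoutError when it times out. The default of one immediate retry follows the guidance above.

```python
# Illustrative sketch only; store_with_retry and its use of TimeoutError are assumptions.
def store_with_retry(store_operation, retries=1):
    """Call store_operation(), retrying immediately if it times out.

    store_operation is any zero-argument callable that performs the store
    and raises TimeoutError when the operation times out.
    """
    attempts = retries + 1
    for attempt in range(attempts):
        try:
            return store_operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller handle the failure
```

The same pattern applies to applications written against the C or Java API, with the timeout detected through whatever error reporting that API provides.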
Bug 6580181 - You cannot use a backup made of an 8-node system to restore data to a 16-node system until at least one backup session is created from the 16-node system.
Workaround - After you expand the system from 8 nodes to 16 nodes, perform one backup of any length from the 16-node system. You can then use this and any previous backups of the 8-node system to restore data to the 16-node system.
Bug 6582274 - When multiple system parameters are changed using the cellcfg command, some of the changes might not generate alerts.
Workaround - If you receive an alert indicating that a parameter has been changed using cellcfg, keep in mind that other parameters might also have been changed. Use the cellcfg command to check the current settings of all parameters.
Bug 6582486 - A connection attempt from a client to the 5800 system might fail with a java.net.ConnectionException error.
Workaround - Retry the connection.
Bug 6584310 - If you issue the wipe command and then retry the command without waiting for the wipe to be complete, the system might disable disks.
Workaround - Wait for one wipe action to complete before retrying the command. If the system has already started disabling disks, reboot the system, and then reenable the disabled disks.
Bug 6584329 - A restore operation might not work correctly if the system has not been wiped and then rebooted before the restore begins.
Bug 6585878 - If a disk fails, or if a Sun service technician disables a disk, you might see severe error messages in the external syslog host.
Workaround - You can ignore these messages; they simply reflect the fact that the disk has failed or been disabled.
Bug 6588218 - Some valid C API queries to the 5800 system might return the error code HCERR_BAD_REQUEST, which seems to indicate that the query is not valid.
Workaround - Use hc_session_get_status() to determine if the error string from the query contains the substring Relalg server involved in current operation failed. If so, retry the query.
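The decision in this workaround comes down to a substring test on the error string. The following Python sketch shows only that logic. The function name is hypothetical, and the error code is represented by its name as a string for illustration. In a C application, the error string would come from hc_session_get_status(), as described above.

```python
# Illustrative sketch only; the function name and argument types are assumptions.
RETRYABLE_SUBSTRING = "Relalg server involved in current operation failed"


def is_retryable_bad_request(error_code, error_string):
    """Return True if a bad-request result looks like the transient failure."""
    return (error_code == "HCERR_BAD_REQUEST"
            and RETRYABLE_SUBSTRING in error_string)
```

When the function returns True, retry the query. Otherwise, treat the query itself as invalid.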
Bug 6589653 - If you are running the 5800 system open edition software on a system running Red Hat Version 4, you might not be able to shut down the open edition software via the browser.
Workaround - Kill the process manually. The simplest way to do this is to start the open edition software from a dedicated command prompt without running it in the background, and use Ctrl-C when it is idle to exit the program.
Bug 6595040 - If a restore operation fails, you may have to wait approximately 10 minutes while the system reclaims socket resources.
Bug 6601977 - When a system starts up or shuts down, it might not send a complete set of email and log alerts for all nodes and disks.
Bug 6603323 - Issuing the reboot -all command might cause a switch “split-brain” situation, in which neither switch is fully functioning as the primary switch and both switches are performing some of the primary switch’s duties.
Workaround - Call Sun service to help you troubleshoot and correct the problem.
Bug 6609313 - When you store an object with the storeObject API function, the object_ctime reported for the object might not match the object_ctime that is actually stored with the object.
Workaround - To determine the object_ctime that was actually stored with the object, retrieve the metadata for the object after the store operation is completed. The system metadata retrieved will include the object_ctime that was actually stored with the object and inserted in the query engine.
Bug 6612017 - If you issue a query on a metadata field of type binary, entries in the query engine that include the first bits specified in the query are returned as matches, even if the entry includes more bits than were specified in the query. For example, suppose an entry for binary field, bfield, contains the value ABCDEFGHIJ. A query on bfield = “ABCD” will return a match to that entry.
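The matching behavior described for this bug amounts to a prefix comparison. The Python sketch below models it and shows one possible client-side filter that keeps only exact matches. Both functions are hypothetical illustrations, not part of the 5800 system API, and the filter assumes that the client has already retrieved the field value of each returned entry.

```python
# Illustrative sketch only; both function names are hypothetical.
def engine_matches(stored_value, query_value):
    """Model the described behavior: a stored value matches when it
    begins with the value specified in the query."""
    return stored_value.startswith(query_value)


def exact_matches(returned_values, query_value):
    """Client-side filter that keeps only values identical to the query value."""
    return [value for value in returned_values if value == query_value]
```

With the example above, engine_matches("ABCDEFGHIJ", "ABCD") is True, while exact_matches(["ABCDEFGHIJ", "ABCD"], "ABCD") keeps only the second value.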
Bug 6612146 - You cannot start a restore on a cell that has been wiped in preparation for the restore.
Workaround - After you wipe the cell, reboot it.
Bug 6612244 - During the process of expanding a cell from 8 to 16 nodes, while the celladm expand command is running, you cannot back up data from the cell.
Bug 6615347 - Storing a very large file (greater than 400 GB) might fail with an error similar to the following:
com.sun.honeycomb.common.ArchiveException: Failed to get system metadata from the fragments
Workaround - Break the file into smaller pieces and then retry the store operation.
Bug 6619221 - In the C API, time stamps supplied to hc_nvr_add_timestamp and dates supplied to hc_nvr_add_date should be in the range 00:00:00 January 1, 1970 UTC to 00:00:00 January 1, 2038 UTC. This is due to limitations in converting “seconds since the epoch” to the human-readable dates used for metadata storage. The Java interface is not as restrictive; however, dates outside of these limits stored in Java might not be retrieved properly by the C API.
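An application can check a date against these limits before storing it. The following Python sketch is one way to do so. The names are hypothetical, and treating both bounds as inclusive is an assumption, since the text above does not say whether the end points themselves are accepted.

```python
# Illustrative sketch only; names and inclusive bounds are assumptions.
from datetime import datetime, timezone

C_API_MIN = datetime(1970, 1, 1, tzinfo=timezone.utc)  # 00:00:00 January 1, 1970 UTC
C_API_MAX = datetime(2038, 1, 1, tzinfo=timezone.utc)  # 00:00:00 January 1, 2038 UTC


def in_c_api_range(moment):
    """Return True if a timezone-aware datetime falls within the stated range.

    moment must be timezone-aware; comparing a naive datetime raises TypeError.
    """
    return C_API_MIN <= moment <= C_API_MAX
```

Running the same check in Java clients that share data with C clients helps avoid storing dates that the C API might not retrieve properly.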
Bug 6621320 - In the Java API and SDK RetrieveMetadata program, the SystemRecord.isIndexed() method always returns false.
Workaround - Ignore the SystemRecord.isIndexed() value.
Bug 6627590 - The signature for hc_query_ez() in the C API hcclient.h has an int variable named max_records. This int variable should be renamed results_per_fetch to make its function more apparent. The Sun StorageTek 5800 System Client API Reference Guide refers to the variable as results_per_fetch, but in the code it is named max_records. Functionality is not affected.
Bug 6643867 - Do not attempt to delete data while restoring data to the system because the restore process might fail.
Workaround - If the restore process does fail, restart the restoration.
Bug 6653812 - If you issue the sysstat command with the -i or --interval option, the Estimated Free Space for online cells and the number of online disks are not updated.
Workaround - Use a script to run sysstat at repeated intervals.
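A script of this kind can be as simple as a loop that runs a command and sleeps between runs. The following Python sketch is one possibility. The helper name run_repeatedly is hypothetical, and how the sysstat command is reached from the client (for example, over ssh to the administrative address) is site-specific and assumed here.

```python
# Illustrative sketch only; run_repeatedly is a hypothetical helper.
import subprocess
import time


def run_repeatedly(command, interval_seconds, count):
    """Run command count times, sleeping interval_seconds between runs.

    command is an argument list, as accepted by subprocess.run.
    Returns the captured standard output of each run.
    """
    outputs = []
    for i in range(count):
        result = subprocess.run(command, capture_output=True, text=True)
        outputs.append(result.stdout)
        if i < count - 1:
            time.sleep(interval_seconds)
    return outputs


# Hypothetical usage; the login name and address are placeholders:
#   run_repeatedly(["ssh", "admin@<admin-address>", "sysstat"], 60, 5)
```

Because each run is a fresh invocation of sysstat, the reported values do not depend on the -i or --interval option.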
Bug 6662213 - If the 5800 system is operating on the secondary switch, the GUI does not start when the administrative client is running Java version 1.6.
Workaround - Running Java version 1.5 on the administrative client allows the GUI to start.
Bug 6662951 - If you use the GUI to shut down one cell in a multicell hive, the GUI interface might hang.
Workaround - Use the CLI to manage the system until the cell is restarted.
Bug 6669944 - If the primary switch fails and the system is running with the secondary switch, the hwstat command erroneously reports the secondary switch as offline.
Bug 6671766 - If you issue the hwstat command on a multicell system on which the service node has failed, the system might take up to four minutes to display the results of the command.
Bug 6672229 - If you accidentally pull an active disk, you must wait at least ten seconds before pushing it back in. If you push the disk in too early, it will be disabled.
Workaround - If the disk is disabled, use hwcfg -E to reenable it. Note that the system will have to run fsck to verify the disk, so it might take about 15 minutes before the disk is enabled.
Bug 6672943 - The system cannot process more than one disk being pulled from the system or pushed back into the system simultaneously.
Workaround - Pull and/or push one disk at a time.
Bug 6673454 - If you merge cells to create a multicell system, the service tag information on the cells might be lost.
Workaround - Re-enter service tag information using the CLI.
Bug 6679593 - On a Windows client, if you open a WebDAV view as a web folder, you cannot map that folder to a drive letter.
Workaround - Do not open the WebDAV view as a web folder; use a web browser to see the view instead.
Bug 6687072 - The logdump command might fail to run and report an error of Disc quota exceeded if the /var/adm partition is out of space.
Workaround - Access the master node from the service node and delete all zipped message logs (for example, messages.2.gz, messages.3.gz, etc.) to free up space on the /var/adm/ partition.
Bug 6691729 - Email alerts might not be sent during heavy input/output load.
Bug 6698513 - When using WebDAV to access files, you cannot write a file larger than 2166444680 bytes.
Workaround - Break the large file into smaller files before writing it, or use the Java or C API to store the large file.
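Breaking a file into pieces that stay within the WebDAV limit can be scripted on the client. The following Python sketch is one way to do it. The function name and the .partNNNN naming scheme are hypothetical, and reassembling the pieces after retrieval is left to the client application.

```python
# Illustrative sketch only; split_file and the piece naming scheme are assumptions.
import os

WEBDAV_MAX_BYTES = 2166444680  # limit stated for Bug 6698513


def split_file(path, max_piece_bytes=WEBDAV_MAX_BYTES, buffer_bytes=1024 * 1024):
    """Split path into numbered pieces no larger than max_piece_bytes.

    Returns the list of piece paths, in order.
    """
    piece_paths = []
    with open(path, "rb") as source:
        index = 0
        while True:
            written = 0
            piece_path = f"{path}.part{index:04d}"
            with open(piece_path, "wb") as piece:
                while written < max_piece_bytes:
                    # Never read past the remaining room in this piece.
                    chunk = source.read(min(buffer_bytes, max_piece_bytes - written))
                    if not chunk:
                        break
                    piece.write(chunk)
                    written += len(chunk)
            if written == 0:
                os.remove(piece_path)  # nothing left to write; drop the empty piece
                break
            piece_paths.append(piece_path)
            index += 1
    return piece_paths
```

Concatenating the pieces in the returned order reproduces the original file.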
Bug 6700747 - The com.sun.honeycomb.client.NameValueRecord.getString function might return a com.sun.honeycomb.common.NoSuchValueException exception indicating that the string was not in NameValueRecord map. This exception is not documented in the JavaDoc.
Bug 6703189 - Sometimes when the 5800 system reboots, the system returns the error message Cannot extract Document from body.
Workaround - You can ignore this error message; it does not indicate a problem with the reboot.
Bug 6711236 - If you issue the version command within the open edition software for Version 1.1.1, the system erroneously returns Version 1.1.
The following table lists the documents for this product. The online documentation is available at:
http://docs.sun.com/app/docs/prod/stortek.5800
Sun StorageTek 5800 System Regulatory and Safety Compliance Manual
If you need help installing or using this product, go to:
http://www.sun.com/service/contacting
Copyright © 2008, Sun Microsystems, Inc. All Rights Reserved.