1 General Maintenance Information

This chapter contains topics about general maintenance of Oracle Exadata Database Machine.

Note:

  • For ease of reading, the name "Oracle Exadata Rack" is used when information refers to both Oracle Exadata Database Machine and Oracle Exadata Storage Expansion Rack.

  • All procedures in this chapter are applicable to Oracle Exadata Database Machine and Oracle Exadata Storage Expansion Rack.

1.1 Overview of Roles and Responsibilities

Most IT organizations have teams of database administrators, system administrators, network administrators, and storage administrators. These administrators are responsible for system implementation and ongoing operations. In an Oracle Exadata Database Machine environment, it is usually more efficient and effective for the database administrator to take the lead role in Oracle Exadata Database Machine management, with assistance from the system administrator. This is because Oracle Exadata Database Machine is engineered to run Oracle Database, and administration is specific to Oracle Database and Oracle Exadata Storage Server Software. The other teams may have distinct responsibilities or provide second-level support.

Initial system deployment is usually performed by Oracle engineers. The database administrator's primary responsibilities begin with typical operational tasks. Table 1-1 lists some of the common tasks required to manage an Oracle Exadata Database Machine environment.

Table 1-1 Common Administrative Tasks for Oracle Exadata Database Machine Management

Slow performance
  Administrators: Database administrator, system administrator
  Actions:
    • Receive alerts from Oracle Enterprise Manager Grid Control that performance thresholds have been exceeded.
    • Review system performance, CPU, memory, and I/O on all servers for unusual trends.
    • Review database performance, wait events, locking, parallelism, and execution plans.

Patch application or upgrades
  Administrators: Database administrator, system administrator
  Actions:
    • Apply Oracle Exadata Storage Server Software patches or upgrades, and Sun Datacenter InfiniBand Switch 36 switch firmware upgrades.
    • Apply Oracle Database patches, and Oracle Grid Infrastructure patches or upgrades.

System outage or failure
  Administrators: Database administrator, system administrator
  Actions:
    • Connect to Integrated Lights Out Manager (ILOM), verify the current system state, identify the hardware issue or restart the system, and review logs for root cause analysis.
    • Check surviving instances for errors, monitor performance on surviving instances, and verify that application functionality has not been disrupted.

Suspected network issues
  Administrators: Database administrator, system administrator
  Actions:
    • Inspect network interfaces for errors or dropped packets, check whether any switches have restarted, and escalate to the network administration team as needed.
    • Inspect database-side performance to assess impact, if any.

Backup database
  Administrators: Database administrator, system administrator
  Actions:
    • Run database backup routines, and ensure that database server backups are completed.

Failed disk replacement
  Administrators: Database administrator, system administrator
  Actions:
    • Receive alerts about hardware replacement, verify that Auto Service Request (ASR) has opened a service request, and verify that operators will allow the field service technician into the data center to replace the drive, or provide a spare drive.

Usually there is one individual or group that has primary responsibility for any issue that arises. This individual or group receives the first contact from Oracle Enterprise Manager Grid Control, the help desk, or the operations team when there is an issue on the system. For Oracle Exadata Database Machine, the primary contact is typically the database administrator. If the database administrator needs assistance from another team to resolve the issue, then the teams collaborate, but ownership of the issue should remain clear.

1.1.1 Understanding the Administrative Differences with Oracle Exadata Database Machine

Most administration tasks on Oracle Exadata Database Machine servers are similar to those on traditional database servers and storage servers, but there are some differences.

The following list shows the differences and exceptions for Oracle Exadata Database Machine servers:

  • Database servers, Sun Datacenter InfiniBand Switch 36 switches, and other components of Oracle Exadata Database Machine have configuration settings based on testing and performance criteria. Before changing configuration settings, such as database server firmware or kernel parameters, to satisfy company policy or for other reasons, review the potential impact on Oracle Exadata Database Machine.

  • Restarting a server incorrectly can disrupt the database. The storage servers have special procedures and guidelines that must be followed to minimize disruption, such as offlining grid disks before restarting the server, and not restarting more than one server at a time.

  • Exadata Storage Servers cannot be modified the same way as the database servers. Network changes, such as those for the NTP servers or DNS servers, are done using the ipconf utility. Network changes cannot be done manually by editing the configuration files. In addition, no software or additional packages can be installed on the storage servers. This restriction includes monitoring software. Exadata Storage Server system updates are provided by Oracle Exadata Storage Server Software upgrades.

  • Exadata Storage Servers do not require backups, and have a self-maintained internal USB drive that can be used for cell recovery. No backup clients can be installed on the storage servers.

  • Oracle wait events in Oracle Real Application Clusters databases using Exadata Storage Servers may include events with %cell% in the name. These events are related to the storage servers.

  • The V$CELL views include rows for any database using Exadata Storage Servers.

  • Oracle Automatic Storage Management (Oracle ASM) disk path names are of the format o/cell_ip_address/cell_griddisk_name, such as the following:

    o/192.168.10.1/data_CD_01_dm01cel01
    
  • SQL execution plans may include the keyword STORAGE, as in TABLE ACCESS STORAGE FULL, to indicate that some operations may be offloaded to Exadata Storage Servers.

  • Operations such as backup and recovery use Oracle Recovery Manager (RMAN), and all data for backup and recovery continues to pass through the database instances. The backup clients for RMAN should be installed on the database servers in Oracle Exadata Database Machine to facilitate integration with enterprise backup solutions in the same way as in traditional environments.

  • The practice of deploying one or more non-production environments for development, testing, and quality assurance still applies to Oracle Exadata Database Machine environments.
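The ASM disk path format described above can be split mechanically, for example in a script that maps ASM disks back to the storage cells that serve them. The following Python sketch is illustrative only; the function and type names are hypothetical and not part of any Oracle tool.

```python
import ipaddress
from typing import NamedTuple


class AsmDiskPath(NamedTuple):
    cell_ip: str   # IP address of the Exadata Storage Server (cell)
    griddisk: str  # grid disk name, for example data_CD_01_dm01cel01


def parse_asm_disk_path(path: str) -> AsmDiskPath:
    """Split a path of the form o/cell_ip_address/cell_griddisk_name."""
    parts = path.split("/")
    if len(parts) != 3 or parts[0] != "o":
        raise ValueError(f"not an Exadata ASM disk path: {path!r}")
    _, cell_ip, griddisk = parts
    ipaddress.ip_address(cell_ip)  # raises ValueError for a malformed address
    if not griddisk:
        raise ValueError(f"missing grid disk name in {path!r}")
    return AsmDiskPath(cell_ip, griddisk)


if __name__ == "__main__":
    disk = parse_asm_disk_path("o/192.168.10.1/data_CD_01_dm01cel01")
    print(disk.cell_ip, disk.griddisk)
```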

1.2 Powering On and Off Oracle Exadata Rack

1.2.1 Non-emergency Power Procedures

This section contains the procedures for powering on and off the components of Oracle Exadata Rack in an orderly fashion.

1.2.1.1 Powering On Oracle Exadata Rack

Oracle Exadata Rack is powered on either by pressing the power button on the front of each server, or by logging in to the ILOM interface and applying power to the system. When a database server is powered on and the operating system boots, Oracle Clusterware starts automatically if it is installed. Oracle Clusterware then starts all resources that are configured to start automatically.

The power on sequence is as follows:

  1. Rack, including switches.

    Ensure the switches have had power applied for a few minutes to complete power-on configuration before starting Exadata Storage Servers.

  2. Exadata Storage Servers.

    Ensure all Exadata Storage Servers complete the boot process before starting the database servers. It may take five to ten minutes for all services to start.

  3. Database servers.
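The power-on order can be expressed as a small script. The following Python sketch assumes step 1 (rack and switches) has already been done, and takes two caller-supplied, hypothetical callables: power_on(host), which might run the ipmitool command shown in the next section, and is_up(host), which might test whether a server has finished booting. The 900-second default timeout is an assumption chosen to cover the five to ten minutes mentioned above.

```python
import time
from typing import Callable, Iterable, List


def wait_until_up(hosts: Iterable[str], is_up: Callable[[str], bool],
                  timeout_s: float = 900, interval_s: float = 30,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until every host reports up; return False if timeout_s elapses first."""
    pending = set(hosts)
    deadline = clock() + timeout_s
    while True:
        # Keep only the hosts that are still booting.
        pending = {h for h in pending if not is_up(h)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def power_on_servers(cells: List[str], db_servers: List[str],
                     power_on: Callable[[str], None],
                     is_up: Callable[[str], bool], **wait_options) -> None:
    """Steps 2 and 3: start all storage servers, wait for them, then start the database servers."""
    for cell in cells:
        power_on(cell)
    if not wait_until_up(cells, is_up, **wait_options):
        raise TimeoutError("storage servers did not finish booting; database servers not started")
    for db in db_servers:
        power_on(db)
```

Injecting sleep and clock keeps the waiting logic testable without real delays.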

1.2.1.1.1 Powering On Servers Remotely Using ILOM

Servers can be powered on remotely using the ILOM interface.

The ILOM can be accessed using the Web console, the command-line interface (CLI), IPMI, or SNMP. For example, to apply power to server dm01cel01 using IPMI, where dm01cel01-ilom is the host name of the ILOM for the server to be powered on, run the following command from a server that has IPMItool installed:

# ipmitool -I lanplus -H dm01cel01-ilom -U root chassis power on

The preceding command causes the system to prompt for the password.
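When several servers must be powered on, the same ipmitool command can be generated for each ILOM host name. This Python sketch only builds the argument list; the host names are examples following the dm01cel01-ilom pattern above, and omitting any password option keeps the interactive password prompt.

```python
import shlex
from typing import List

# Actions accepted by "ipmitool chassis power".
_POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def ipmitool_power_argv(ilom_host: str, action: str = "on", user: str = "root") -> List[str]:
    """Build the ipmitool chassis power command for one ILOM host name.

    No password option is included, so ipmitool prompts for the password.
    """
    if action not in _POWER_ACTIONS:
        raise ValueError(f"unsupported chassis power action: {action!r}")
    return ["ipmitool", "-I", "lanplus", "-H", ilom_host, "-U", user,
            "chassis", "power", action]


if __name__ == "__main__":
    for host in ["dm01cel01-ilom", "dm01cel02-ilom"]:
        # Print the command; subprocess.run(argv, check=True) would execute it.
        print(shlex.join(ipmitool_power_argv(host)))
```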

1.2.1.2 Powering Off Oracle Exadata Rack

The power off sequence for Oracle Exadata Rack is as follows:

  1. Database servers (Oracle Exadata Database Machine only).
  2. Exadata Storage Servers.
  3. Rack, including switches.

1.2.1.2.1 Powering Off Database Servers

Before restarting or shutting down a database server, stop Oracle Clusterware. Oracle Clusterware is stopped using the following command:

crsctl stop cluster

The following procedure is the recommended shutdown procedure for database servers:

  1. Stop Oracle Clusterware using the following command:
    # GRID_HOME/grid/bin/crsctl stop cluster
    

    If any resources managed by Oracle Clusterware are still running after issuing the crsctl stop cluster command, then the command fails. Use the -f option to unconditionally stop all resources, and stop Oracle Clusterware.

  2. Shut down the operating system using the following command:
    # shutdown -h now
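The two-step procedure, including the fallback to crsctl stop cluster -f, is sketched below in Python. The Grid home path is a hypothetical placeholder, and the sketch assumes crsctl returns a non-zero exit status when the stop fails. The runner parameter lets the logic be exercised without stopping a real server.

```python
import subprocess
from typing import Callable, List

# Hypothetical path; substitute the crsctl binary in your Grid Infrastructure home.
CRSCTL = "/u01/app/grid/bin/crsctl"


def run(argv: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(argv).returncode


def shutdown_db_server(runner: Callable[[List[str]], int] = run,
                       crsctl: str = CRSCTL) -> List[List[str]]:
    """Stop Oracle Clusterware, forcing only if needed, then halt the OS.

    Returns the commands that were issued, in order.
    """
    issued: List[List[str]] = []

    def call(argv: List[str]) -> int:
        issued.append(argv)
        return runner(argv)

    if call([crsctl, "stop", "cluster"]) != 0:
        # Resources were still running: -f stops them unconditionally.
        if call([crsctl, "stop", "cluster", "-f"]) != 0:
            raise RuntimeError("Oracle Clusterware did not stop; not shutting down")
    call(["shutdown", "-h", "now"])
    return issued
```

To preview the commands without running them, pass runner=lambda argv: print(" ".join(argv)) or 0.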
    
1.2.1.2.2 Powering Off Exadata Storage Servers

Exadata Storage Servers are powered off and restarted using the Linux shutdown command. The following command shuts down an Exadata Storage Server immediately:

# shutdown -h now

When powering off Exadata Storage Servers, all storage services are automatically stopped.

The following command restarts an Exadata Storage Server immediately:

# shutdown -r now

Note the following when powering off Exadata Storage Servers:

  • All database and Oracle Clusterware processes should be shut down prior to shutting down more than one Exadata Storage Server.

  • Powering off one Exadata Storage Server does not affect running database processes or Oracle ASM.

  • Powering off or restarting Exadata Storage Servers can impact database availability.

  • The shutdown commands can be used to power off or restart database servers.

See Also:

  • "Shutting Down Exadata Storage Server" if the databases or Oracle Clusterware will remain operational while powering down Exadata Storage Server

  • SHUTDOWN(8) manual page for details.

1.2.1.2.3 Powering Off Multiple Servers at the Same Time

The dcli utility can be used to run the shutdown command on multiple servers at the same time. Do not run the dcli utility from a server that will be shut down. For example, to shut down all Exadata Storage Servers using the dcli utility, run the command from a database server. The following command shows the command syntax:

# dcli -l root -g group_name shutdown -h now

In the preceding syntax, group_name is a file that lists the servers to shut down, for example cell_group for all Exadata Storage Servers, or dbs_group for all database servers.

The following command shows the syntax to shut down all Exadata Storage Servers at the same time:

# dcli -l root -g cell_group shutdown -h now
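A group file is a plain text file that lists one host name per line. The following Python sketch writes such a file and builds the matching dcli argument list; the file location and host names are hypothetical examples.

```python
from pathlib import Path
from typing import Iterable, List


def write_group_file(path: Path, hosts: Iterable[str]) -> Path:
    """Write a dcli group file with one host name per line, skipping blanks."""
    names = [h.strip() for h in hosts if h.strip()]
    if not names:
        raise ValueError("a group file needs at least one host")
    path.write_text("\n".join(names) + "\n")
    return path


def dcli_argv(group_file: Path, command: List[str], user: str = "root") -> List[str]:
    """Build a dcli call that runs command as user on every host in group_file."""
    return ["dcli", "-l", user, "-g", str(group_file), *command]
```

For example, dcli_argv(write_group_file(Path("cell_group"), ["dm01cel01", "dm01cel02"]), ["shutdown", "-h", "now"]) produces the arguments of the cell_group command above.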

Example 1-1 shows the power off procedure for Oracle Exadata Rack when using the dcli utility to shut down multiple servers at the same time. The commands are run from a database server.

Example 1-1 Powering Off Oracle Exadata Rack Using the dcli Utility

  1. Stop Oracle Clusterware on all database servers using the following command:

    # GRID_HOME/grid/bin/crsctl stop cluster -all
    
  2. Shut down all remote database servers using the following command:

    # dcli -l root -g remote_dbs_group shutdown -h now
    

    In the preceding command, remote_dbs_group is the file that contains a list of all the remote database servers.

  3. Shut down all Exadata Storage Servers using the following command:

    # dcli -l root -g cell_group shutdown -h now
    

    In the preceding command, cell_group is the file that contains a list of all Exadata Storage Servers.

  4. Shut down the local database server using the following command:

    # shutdown -h now
    
  5. Remove power from the rack.
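The order in Example 1-1 matters: the local database server is shut down last, because dcli must not be run from a server that is being shut down. The following Python sketch derives the contents of remote_dbs_group and cell_group and returns the commands in order. The host names and the crsctl path are placeholders, and removing power from the rack (step 5) remains a manual step.

```python
from typing import Dict, List, Tuple

# Placeholder taken from the procedure above; substitute the real Grid home.
CRSCTL = "GRID_HOME/grid/bin/crsctl"


def rack_power_off_plan(db_servers: List[str], cells: List[str], local_host: str,
                        crsctl: str = CRSCTL) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Return (group file contents, ordered commands) for Example 1-1."""
    if local_host not in db_servers:
        raise ValueError("run Example 1-1 from one of the database servers")
    groups = {
        # The local server is excluded: dcli must not shut down the host it runs on.
        "remote_dbs_group": [h for h in db_servers if h != local_host],
        "cell_group": list(cells),
    }
    steps = [[crsctl, "stop", "cluster", "-all"]]
    if groups["remote_dbs_group"]:
        steps.append(["dcli", "-l", "root", "-g", "remote_dbs_group", "shutdown", "-h", "now"])
    steps.append(["dcli", "-l", "root", "-g", "cell_group", "shutdown", "-h", "now"])
    # The local database server goes last.
    steps.append(["shutdown", "-h", "now"])
    return groups, steps
```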

1.2.1.3 Powering On and Off Network Switches

The network switches do not have power switches. They power off when power is removed, by way of the power distribution unit (PDU) or at the breaker in the data center.

1.2.2 Emergency Power-off Considerations

If there is an emergency, then power to Oracle Exadata Rack should be halted immediately. The following emergencies may require powering off Oracle Exadata Rack:

  • Natural disasters such as earthquake, flood, hurricane, tornado or cyclone.

  • Abnormal noise, smell or smoke coming from the machine.

  • Threat to human safety.

1.2.2.1 Emergency Power-off Procedure

To perform an emergency power-off procedure for Oracle Exadata Rack, turn off power at the circuit breaker or pull the emergency power-off switch in the computer room. After the emergency, contact Oracle Support Services to restore power to the machine.

1.2.2.2 Emergency Power-off Switch

Emergency power-off (EPO) switches are required when computer equipment contains batteries capable of supplying more than 750 volt-amperes for more than five minutes. Systems that have these batteries include internal EPO hardware for connection to a site EPO switch or relay. Use of the EPO switch removes power from Oracle Exadata Rack.

1.2.3 Cautions and Warnings

The following cautions and warnings apply to Oracle Exadata Rack:

  • Do not touch the parts of this product that use high-voltage power. Touching them might result in serious injury.

  • Do not power off Oracle Exadata Rack unless there is an emergency. In that case, follow the Emergency Power-off Procedure.

  • Keep the front and rear cabinet doors closed. Failure to do so might cause system failure or result in damage to hardware components.

  • Keep the top, front, and back of the cabinets clear to allow proper airflow and prevent overheating of components.

  • Use only the supplied hardware.

1.3 Understanding Auto Service Request

Auto Service Request (ASR) is designed to automatically open service requests when specific Oracle Exadata Rack hardware faults occur.

To enable this feature, the Oracle Exadata Rack components must be configured to send hardware fault telemetry to the ASR Manager software. This service covers components in Exadata Storage Servers and database servers, such as disks and flash cards.

ASR Manager must be installed on a server that has connectivity to Oracle Exadata Rack, and an outbound Internet connection using HTTPS or an HTTPS proxy. Oracle recommends that ASR Manager be installed on a server outside of Oracle Exadata Rack. The following are some of the reasons for the recommendation:

  • If the server that has ASR Manager installed goes down, then ASR Manager is unavailable for the other components of Oracle Exadata Database Machine. This is very important when there are several Oracle Exadata Database Machines using ASR at a site.

  • In order to submit a service request (SR), the server must be able to access the Internet.

Note:

ASR can only use the management network. Ensure the management network is set up to allow ASR to run.

When a hardware problem is detected, ASR Manager submits a service request to Oracle Support Services. In many cases, Oracle Support Services can begin work on resolving the issue before the database administrator is even aware the problem exists.

Prior to using ASR, the following must be set up:

  • Oracle Premier Support for Systems or Oracle/Sun Limited Warranty

  • Technical contact responsible for Oracle Exadata Rack

  • Valid shipping address for Oracle Exadata Rack parts

An e-mail message is sent to the technical contact for the activated asset to announce that the service request has been created. The following are examples of the disk failure Simple Network Management Protocol (SNMP) traps sent to ASR Manager.

Note:

  • ASR is applicable only for component faults. Not all component failures are covered, though the most common components such as disk, fan, and power supplies are covered.

  • ASR is not a replacement for other monitoring mechanisms, such as SMTP and SNMP alerts, within the customer data center. ASR is a complementary mechanism that expedites and simplifies the delivery of replacement hardware. ASR should not be used for downtime events in high-priority systems. For high-priority events, contact Oracle Support Services directly.

  • There are occasions when a service request may not be automatically filed. This can happen because of the unreliable nature of the SNMP protocol, or loss of connectivity to the ASR Manager. Oracle recommends that customers continue to monitor their systems for faults, and call Oracle Support Services if they do not receive notice that a service request has been automatically filed.

  • ASR can monitor Sun Datacenter InfiniBand Switch 36 switches that have firmware release 2.1.2 and later in Oracle Exadata Database Machines running release 11.2.3.3.0 or later. Switches may need a field engineer to set the entitlement serial number.

Example 1-2 Example of Exadata Storage Server SNMP Trap

This example shows the SNMP trap for an Exadata Storage Server disk failure. The corresponding hardware alert code is HALRT-02001, reported in the sunHwTrapFaultMessageID field.

2011-09-07 10:59:54 server1.example.com [UDP: [192.85.884.156]:61945]:
RFC1213-MIB::sysUpTime.0 = Timeticks: (52455631) 6 days, 1:42:36.31
SNMPv2-SMI::snmpModules.1.1.4.1.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapHardDriveFault
SUN-HW-TRAP-MIB::sunHwTrapSystemIdentifier = STRING: Sun Oracle Database Machine 1007AK215C
SUN-HW-TRAP-MIB::sunHwTrapChassisId = STRING: 0921XFG004
SUN-HW-TRAP-MIB::sunHwTrapProductName = STRING: SUN FIRE X4270 M2 SERVER
SUN-HW-TRAP-MIB::sunHwTrapSuspectComponentName = STRING: SEAGATE ST32000SSSUN2.0T; Slot: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultClass = STRING: NULL
SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02001
SUN-HW-TRAP-MIB::sunHwTrapFaultUUID = STRING: acb0a175-70b8-435f-9622-38a9a55ee8d3
SUN-HW-TRAP-MIB::sunHwTrapAssocObjectId = OID: SNMPv2-SMI::zeroDotZero
SUN-HW-TRAP-MIB::sunHwTrapAdditionalInfo = STRING: Exadata Storage Server: cellname Disk Serial Number: E06S8K server1.example.com failure trap.

Example 1-3 Example of Oracle Database Server SNMP Trap

This example shows the SNMP trap for an Oracle database server disk failure. The corresponding hardware alert code is HALRT-02007, reported in the sunHwTrapFaultMessageID field.

2011-09-09 10:59:54 dbserv01.example.com [UDP: [192.22.645.342]:61945]:
RFC1213-MIB::sysUpTime.0 = Timeticks: (52455631) 6 days, 1:42:36.31
SNMPv2-SMI::snmpModules.1.1.4.1.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapHardDriveFault
SUN-HW-TRAP-MIB::sunHwTrapSystemIdentifier = STRING: Sun Oracle Database Machine 1007AK215C
SUN-HW-TRAP-MIB::sunHwTrapChassisId = STRING: 0921XFG004
SUN-HW-TRAP-MIB::sunHwTrapProductName = STRING: SUN FIRE X4170 M2 SERVER
SUN-HW-TRAP-MIB::sunHwTrapSuspectComponentName = STRING: HITACHI H103030SCSUN300G Slot: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultClass = STRING: NULL
SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02007
SUN-HW-TRAP-MIB::sunHwTrapFaultUUID = STRING: acb0a175-70b8-435f-9622-38a9a55ee8d3
SUN-HW-TRAP-MIB::sunHwTrapAssocObjectId = OID: SNMPv2-SMI::zeroDotZero
SUN-HW-TRAP-MIB::sunHwTrapAdditionalInfo = STRING: Exadata Database Server: db03 Disk Serial Number: HITACHI H103030SCSUN300GA2A81019GGDE5E dbserv01.example.com failure trap.
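For scripting or log review, the SUN-HW-TRAP-MIB variable bindings in traps like those above can be collected into a dictionary, for example to extract the HALRT alert code from sunHwTrapFaultMessageID. This Python sketch assumes the NAME = TYPE: VALUE layout shown in the examples, with one binding per line; values that wrap onto a following line are not joined. The function names are hypothetical.

```python
import re
from typing import Dict

# One variable binding per line, e.g.
#   SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02001
_BINDING = re.compile(
    r"^(?P<mib>[\w-]+)::(?P<name>[\w.]+) = (?P<type>[\w-]+): (?P<value>.*)$"
)


def parse_sun_hw_trap(text: str) -> Dict[str, str]:
    """Collect the SUN-HW-TRAP-MIB fields of one trap into a dictionary."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = _BINDING.match(line.strip())
        if match and match.group("mib") == "SUN-HW-TRAP-MIB":
            fields[match.group("name")] = match.group("value").strip()
    return fields


def alert_code(text: str) -> str:
    """Return the hardware alert code (for example HALRT-02001), or "" if absent."""
    return parse_sun_hw_trap(text).get("sunHwTrapFaultMessageID", "")
```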

1.3.1 Installing and Configuring ASR

Oracle recommends installing Oracle ASR on a standalone server running Oracle Solaris or Linux.

After installation is complete, configure fault telemetry destinations for the servers on Oracle Exadata Database Machine. The Oracle Exadata Database Machine servers can be set up during initial configuration. Oracle Exadata Database Machine Deployment Assistant collects the configuration information, and then configures the servers.

Note:

When configuring the ILOM alert settings, do not remove the rules at the top of the rule list. To add a new rule, enter the rule at the bottom of the rule list.

  • To install and configure Oracle ASR after initial configuration, refer to the installation and configuration information in Oracle Auto Service Request Quick Installation Guide for Oracle Exadata Database Machine.

1.4 Monitoring the System Using Oracle Enterprise Manager Cloud Control

Oracle Exadata Database Machine can be monitored by Oracle Enterprise Manager Cloud Control agents using the Oracle Exadata Plug-in and the Oracle Systems Infrastructure Plug-in. The Oracle Exadata Database Machine is discovered and monitored as a system target in Oracle Enterprise Manager Cloud Control. Individual database servers, storage servers, and switches are grouped together under the system target for the Oracle Exadata Database Machine so they can be monitored as a group.

The Exadata Storage Server metrics are collected and managed by Management Server (MS). When used with Oracle Enterprise Manager Cloud Control, the metrics are presented as Oracle Enterprise Manager Cloud Control metrics.

All Exadata server alerts are delivered to Oracle Enterprise Manager Cloud Control using SNMP. The Exadata hardware and software components are monitored by ILOM and Exadata software in the following ways:

  • Hardware components are monitored by ILOM. When a hardware component reports a failure or an exceeded threshold, ILOM reports the failure as an SNMP trap to MS. MS processes the trap, creates an alert, and delivers the alert to the Oracle Enterprise Manager Cloud Control agent.

  • Hardware and software components are also monitored by MS directly. When MS detects a failure or an exceeded threshold, it creates an alert and delivers the alert to the Oracle Enterprise Manager Cloud Control agent.

From the end-user perspective, there is no difference between the two types of alerts. The alert message contains the corrective action to resolve the alert.

1.5 Monitoring the System Using Oracle Configuration Manager

Oracle Configuration Manager collects configuration information and uploads it to the Oracle repository.

When the configuration information is uploaded daily, Oracle Support Services can analyze the data and provide better service. When a service request is logged, the configuration data is associated with the service request. The following are some of the benefits of Oracle Configuration Manager:

  • Reduced time for problem resolution

  • Proactive problem avoidance

  • Improved access to best practices, and the Oracle knowledge base

  • Improved understanding of the customer's business needs

  • Consistent responses and services

The Oracle Configuration Manager software is installed and configured in each ORACLE_HOME directory on a server. For clustered databases, only one instance is configured for Oracle Configuration Manager. A configuration script is run on every database on the server. The Oracle Configuration Manager collectors then send their data to a centralized Oracle repository.

1.6 Changing Component Passwords

The passwords for the components can be changed after initial configuration.

1.6.1 Changing the Database Server Passwords

The user account and GRUB passwords can be changed on the database servers. The default user accounts on the database server are root and the software owner account. Typically, the software owner account is oracle or grid.

1.6.1.1 Changing the User Account Password on the Database Server

Use the following command at the operating system prompt to change the password for a user account on the database server:

passwd user_name

In the preceding command, user_name is the name of the account to change.

1.6.1.2 Changing the GRUB Account Password on the Database Server

Run the following command to change the GRUB account password on the database server:

/opt/oracle.cellos/host_access_control grub-password

1.6.2 Changing the Exadata Storage Server Passwords

The default user accounts on Exadata Storage Servers are root, celladmin, and cellmonitor.

  • To change one of the passwords for Exadata Storage Server, use the operating system passwd command.

    At the operating system prompt, use the following command, where user_name is the name of the account to change:

    passwd user_name
    

1.6.3 Changing the Power Distribution Unit Password

The default user account for the power distribution unit (PDU) is admin. The following procedure describes how to change the password for the PDU:

  1. Use a Web browser to access the PDU metering unit by entering the IP address for the unit in the address line of the browser. The Current Measurement page appears.
  2. Click Network Configuration in the upper left of the page.
  3. Log in as the admin user on the PDU metering unit.
  4. Locate the Admin/User fields. Only letters and numbers are allowed for user names and passwords.
  5. Enter up to five users and passwords in the Admin/User fields.
  6. Designate each user to be either an administrator or user.
  7. Click Submit to set the users and passwords.
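The constraints in the procedure above (at most five accounts, letters and numbers only, and a role of administrator or user) can be checked before the entries are typed into the PDU Web page. This Python sketch assumes "letters and numbers" means ASCII A-Z, a-z, and 0-9; the function name and the (name, password, role) tuple layout are hypothetical.

```python
import re
from typing import List, Tuple

MAX_PDU_USERS = 5
_ALNUM = re.compile(r"[A-Za-z0-9]+")  # assumption: ASCII letters and digits only


def validate_pdu_accounts(accounts: List[Tuple[str, str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the entries follow the rules."""
    problems = []
    if len(accounts) > MAX_PDU_USERS:
        problems.append(f"at most {MAX_PDU_USERS} users are allowed, got {len(accounts)}")
    for name, password, role in accounts:
        if not _ALNUM.fullmatch(name):
            problems.append(f"user name {name!r} must contain only letters and numbers")
        # The password itself is never echoed in a message.
        if not _ALNUM.fullmatch(password):
            problems.append(f"password for {name!r} must contain only letters and numbers")
        if role not in ("administrator", "user"):
            problems.append(f"role for {name!r} must be 'administrator' or 'user'")
    return problems
```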

1.6.4 Changing the ILOM Password

The default user account for the Integrated Lights Out Manager (ILOM) is root. The following procedure describes how to change the password for the ILOM:

  1. Log in to the ILOM using SSH.
  2. Use the following command to change the password:
    set /SP/users/user_name password
    

    In the preceding command, user_name is the user account to be changed. The following is an example of the command:

    set /SP/users/user1 password
    
    Changing password for user /SP/users/user1/password...
    Enter new password:********
    Enter new password again:********
    New password was successfully set for user /SP/users/user1
    

1.6.5 Changing the InfiniBand Switch Password

This procedure describes how to change a password for the InfiniBand switch.

The default user accounts for the InfiniBand switch are root, ilom-admin, ilom-user, ilom-operator, and nm2user.

  1. Log in to the InfiniBand switch with SSH using the following command:
    ssh user_name@switch_name
    

    In the preceding command, user_name is the name of the user, and switch_name is the name of the InfiniBand switch.

  2. Check the firmware version of the switch.
  3. Use the ILOM to change the password using the following commands:
    ssh -l ilom-admin switch_name
    
    set /SP/users/user_name password
    

See Also:

Sun Datacenter InfiniBand Switch 36 Administration Guide for Firmware Version 2.1 at http://docs.oracle.com/cd/E36265_01/html/E36266/gentextid-2626.html#scrolltoc

1.6.6 Changing the Cisco Ethernet Switch Password

You can change the passwords for both serial port access and SSH access to the switch.

There are two methods to access the switch: through the serial port or through SSH. When using serial port access, there are no user accounts, so only the enable password is required. When the switch is accessed using SSH, you must supply a user account and password before you can change the password. The admin user is created during installation of the system, and you can use this user to access the switch using SSH.

Changing the Password for Serial Port Access

  1. Access the switch using telnet, SSH, or the serial port. If you use the serial port, you are not prompted for a user name or password; the switch prompt appears directly.

    For example:

    my_host> ssh admin@my_switch 
    Using keyboard-interactive authentication. 
    Password:
    

    Note:

    • In Oracle Linux 5 Update 5 and later, telnet was removed for security reasons. You may need to install the telnet client package on the compute node before you can access the switch using telnet.

    • Before using SSH to access the switch, you must enable SSH in the switch following the steps described in "Configuring SSH on Cisco Catalyst 4948 Ethernet Switch" (My Oracle Support Doc ID 1415044.1).

  2. Change to enable mode using the following command:

    Switch> enable
    
  3. Set the password as follows:

    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# no enable password 
    Switch(config)# enable secret new_password 
    Switch(config)# end 
    Switch# write memory
    *Sep 15 14:25:05.893:%SYS-5-CONFIG_I:Configured from console by console
    Building configuration...
    Compressed configuration from 2502 bytes to 1085 bytes [OK ]
    
  4. Save the current configuration using the following command:

    Switch# copy running-config startup-config
    
  5. Exit from the session.

    Switch# exit
    

Changing a User Password for Telnet or SSH Access

  1. Access the switch using telnet, SSH, or the serial port. If you use the serial port, you are not prompted for a user name or password; the switch prompt appears directly.

    For example:

    my_host> ssh admin@my_switch 
    Using keyboard-interactive authentication. 
    Password:
    
  2. Change to enable mode using the following command:
    Switch> enable
    Password:
    
  3. Verify that passwords are stored in encrypted form.

    Use the following command to check that the service password-encryption setting is enabled:

    Switch# show running-config all | include service password-encryption
    service password-encryption
    

    If the output shows no service password-encryption, then passwords are stored in clear text in the switch configuration. You can change this setting, as shown in Step 5.

  4. Enter configuration mode.
    Switch# configure terminal 
    Enter configuration commands, one per line. End with CNTL/Z. 
    
  5. If password encryption is set to no service password-encryption, then change it to service password-encryption.
    Switch(config)# service password-encryption
    
  6. Change the password for a specific user.
    Switch(config)# username user_name password new_password
    
  7. Exit configuration mode and save the changes.
    Switch(config)# end 
    Switch# write memory
    

1.6.7 Changing the KVM Password

The default user account for the KVM is Admin. The following procedure describes how to change the password for the KVM:

  1. Pull the KVM tray out from the front of the rack, and open it using the handle.
  2. Touch the touch pad.
  3. Toggle between the host and KVM interface by pressing the Ctrl key on the left side twice, similar to a double-click on a mouse.
  4. Select Local from User Accounts.
  5. Click Admin under Users.
  6. Set a password for the Admin account. Do not modify any other parameters.
  7. Click Save.

1.7 Determining the Server Model

Use the following command to determine the model of the cell or database server:

/usr/sbin/exadata.img.hw --get model

Refer to Table 1-2 for the server models used in each Oracle Exadata Database Machine.

Table 1-2 Oracle Exadata Database Machine Server Models

Oracle Exadata Database Machine                 Database Server Model                Exadata Storage Server Model
----------------------------------------------  -----------------------------------  -----------------------------------
Oracle Exadata Database Machine X7-2            ORACLE SERVER X7-2                   ORACLE SERVER X7-2L (High Capacity)
                                                                                     ORACLE SERVER X7-2L_EXTREME_FLASH
Oracle Exadata Database Machine X7-8            ORACLE SERVER X7-8                   ORACLE SERVER X7-2L (High Capacity)
                                                                                     ORACLE SERVER X7-2L_EXTREME_FLASH
Oracle Exadata Database Machine X6-2            ORACLE SERVER X6-2                   ORACLE SERVER X6-2L
                                                                                     ORACLE SERVER X6-2L_EXTREME_FLASH
Oracle Exadata Database Machine X6-8            ORACLE SERVER X5-8                   ORACLE SERVER X6-2L
                                                                                     ORACLE SERVER X6-2L_EXTREME_FLASH
Oracle Exadata Database Machine X5-2            ORACLE SERVER X5-2                   ORACLE SERVER X5-2L
Oracle Exadata Database Machine X5-8            ORACLE SERVER X5-8                   ORACLE SERVER X5-2L
Oracle Exadata Database Machine X4-2            SUN SERVER X4-2                      SUN SERVER X4-2L
Oracle Exadata Database Machine X4-8 Full Rack  SUN SERVER X4-8                      SUN SERVER X4-2L
                                                                                     ORACLE SERVER X5-2L
Oracle Exadata Database Machine X3-2            SUN FIRE X4170 M3                    SUN FIRE X4270 M3
Oracle Exadata Database Machine X3-8 Full Rack  Sun Fire X4800 M2                    SUN FIRE X4270 M3
Oracle Exadata Database Machine X2-2            SUN FIRE X4170 M2 SERVER             SUN FIRE X4270 M2 SERVER
Oracle Exadata Database Machine X2-8 Full Rack  Sun Fire X4800 or Sun Fire X4800 M2  SUN FIRE X4270 M2 SERVER
Oracle Exadata Database Machine                 SUN FIRE X4170 SERVER                SUN FIRE X4275 SERVER
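You can script the lookup in Table 1-2. The following is a minimal sketch, not an Oracle-supplied tool. The function name `exadata_machine_for_db_model` is hypothetical. It covers only database server models, because several storage server models (for example ORACLE SERVER X5-2L) appear in more than one row. It compares case-insensitively because the table mixes "SUN FIRE" and "Sun Fire".

```shell
# Hypothetical helper (not part of Exadata): map a database server model string,
# as reported by /usr/sbin/exadata.img.hw --get model, to the Oracle Exadata
# Database Machine row(s) of Table 1-2. Storage server models are not mapped.
exadata_machine_for_db_model() {
  # Compare in upper case because Table 1-2 mixes "SUN FIRE" and "Sun Fire".
  model=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$model" in
    "ORACLE SERVER X7-2")       echo "X7-2" ;;
    "ORACLE SERVER X7-8")       echo "X7-8" ;;
    "ORACLE SERVER X6-2")       echo "X6-2" ;;
    "ORACLE SERVER X5-8")       echo "X6-8 or X5-8" ;;   # listed for both rows
    "ORACLE SERVER X5-2")       echo "X5-2" ;;
    "SUN SERVER X4-2")          echo "X4-2" ;;
    "SUN SERVER X4-8")          echo "X4-8 Full Rack" ;;
    "SUN FIRE X4170 M3")        echo "X3-2" ;;
    "SUN FIRE X4800 M2")        echo "X3-8 Full Rack or X2-8 Full Rack" ;;
    "SUN FIRE X4800")           echo "X2-8 Full Rack" ;;
    "SUN FIRE X4170 M2 SERVER") echo "X2-2" ;;
    "SUN FIRE X4170 SERVER")    echo "Oracle Exadata Database Machine (last row of Table 1-2)" ;;
    *) echo "unknown database server model: $1" >&2; return 1 ;;
  esac
}
```

For example, on a database server: `exadata_machine_for_db_model "$(/usr/sbin/exadata.img.hw --get model)"`. Where a model maps to more than one row, the function reports every matching row rather than guessing.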

1.8 Monitoring Ambient Temperature of Servers

Keeping the environmental temperature within the design specification for Oracle Exadata Rack helps achieve maximum efficiency and the targeted component service lifetimes. Validating the ambient temperature range has minimal impact on the system. The impact of any corrective action depends on the environmental conditions.

Temperatures outside the ambient temperature range of 21 to 23 degrees Celsius (70 to 74 degrees Fahrenheit) affect all components within Oracle Exadata Rack, possibly causing performance problems and shortened service lifetimes.

Use the following command as the root user on the first database server in the cluster to verify the ambient temperature range of all servers:

dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root \
  'ipmitool sunoem cli "show /SYS/T_AMB" | grep value'

The following is an example of the output from the command:

dm01db01: value = 21.440 degree C
dm01db02: value = 21.440 degree C
dm01db03: value = 22.190 degree C
...
dm01db08: value = 21.940 degree C
dm01cel01: value = 22.000 degree C
dm01cel02: value = 22.000 degree C
dm01cel03: value = 23.000 degree C
...
dm01cel14: value = 22.080 degree C

If any server reports a value outside the ambient temperature range, investigate and correct the problem. Check the following items:

  • Sufficient air flow into the rack

  • Room temperature is within the specified range

  • Rear of rack is clear of obstructions
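To find out-of-range servers without reading every line, you can pipe the dcli output through a small filter. This sketch assumes each line has the form shown in the example output above (`host: value = NN.NNN degree C`). The function name `check_ambient` is hypothetical. The default bounds of 21 and 23 degrees Celsius come from this section.

```shell
# Hypothetical helper: flag servers whose ambient temperature is outside the
# range [lo, hi] degrees C (default 21 to 23, inclusive). Reads dcli output
# lines such as "dm01db01: value = 21.440 degree C" on stdin.
# Exit status: 1 if any server is out of range, 0 otherwise.
check_ambient() {
  awk -v lo="${1:-21}" -v hi="${2:-23}" '
    $2 == "value" && $3 == "=" {
      host = $1
      sub(/:$/, "", host)           # strip the trailing colon added by dcli
      t = $4 + 0                    # temperature in degrees C
      if (t < lo + 0 || t > hi + 0) {
        printf "%s %.3f OUT OF RANGE\n", host, t
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }'
}
```

For example: `dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root 'ipmitool sunoem cli "show /SYS/T_AMB" | grep value' | check_ambient`. The bounds are inclusive, so a reading of exactly 23.000 degrees, as for dm01cel03 above, counts as within range.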

1.9 Replacing a Disk Controller Battery Backup Unit

The disk controller battery backup unit (disk controller BBU) resides on a drive tray in database servers and Exadata Storage Servers. The disk controller BBU can be replaced without downtime for the server or the applications running on the server. The following sections describe how to replace the disk controller BBU.

Note:

The procedures in this section do not apply to on-controller battery backup units. Replacing those units requires a system shutdown because the system must be opened to access the controller card.

1.9.1 Replacing a Disk Controller BBU on a Database Server

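This section describes how to replace a disk controller BBU on a database server. The procedure has three steps: prepare the disk controller BBU for removal, replace the disk controller BBU, and enable and verify the new disk controller BBU.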

Note:

After any maintenance procedure, Oracle recommends using the Exachk tool. The tool is available with My Oracle Support note 1070954.1.

1.9.1.1 Step 1: Prepare the Disk Controller BBU for Removal

On certain X3-2, X4-2, and X4-8 database nodes, and on X3-2, X3-8, X4-2, and X4-8 storage servers, the BBU is remote mounted and can be accessed without a system shutdown. However, you must still prepare it for removal from the RAID HBA to avoid the risk of data corruption on the disk volumes. There is no remote mount BBU option for X3-8 database nodes.

If your system has a remote mount BBU, follow the steps in "For Systems with Remote Mount BBU". If your system does not have a remote mount BBU, see "For Systems That Do Not Have a Remote Mount BBU".

For Systems with Remote Mount BBU

The following steps are for systems with remote mount BBU. If your system does not have a remote mount BBU, see "For Systems That Do Not Have a Remote Mount BBU".

  1. Log in as the root user.

  2. Get the version of the image that is running on the server in the rack that requires service.

    # imageinfo -ver
    11.2.3.2.1.130302
    

    The version is the first five parts, "11.2.3.2.1" in the example. The last part is the image date.

  3. Prepare the disk controller BBU for removal.

    If you are running version 12.1.2.1.0 or later:

    1. Drop the disk controller BBU for replacement:

      DBMCLI> alter dbserver bbu drop for replacement
      
    2. Verify:

      DBMCLI> list dbserver attributes bbustatus
      dropped for replacement
      

    If you are running a version from 11.2.3.3.0 up to, but not including, 12.1.2.1.0:

    1. Drop the disk controller BBU for replacement:

      # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -drop_bbu_for_replacement
      
    2. Verify:

      # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -list_bbu_status
      BBU status: dropped for replacement.
      
      

    If you are running version 11.2.3.2.x:

    1. Locate the server in the rack being serviced, and turn on the indicator light.

      Exadata Storage Servers are identified by a number 1 through 18. Number 1 is the lowest Storage Server in the rack, installed in RU2, and the numbers count up toward the top of the rack.

      Exadata Database Nodes are identified by a number 1 through 8. Number 1 is the lowest database node in the rack, installed in RU16.

      Turn on the locate indicator light for easier identification of the server being serviced. If the server number has been identified, then the Locate Button on the front panel may be pressed.

      To turn on the indicator light remotely, use any of the following methods:

      From a login to CellCLI on Exadata Storage Servers:

      CellCLI> alter cell led on
      

      From a login to the server's ILOM:

      -> set /SYS/LOCATE value=Fast_Blink
      

      From a login to the server's root account:

      # ipmitool chassis identify force
      Chassis identify interval: indefinite
      
    2. Check that the HBA can see the battery, and check its current status.

      Note:

      If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

      # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
      

      The output should show that the battery is still visible. Depending on the fault, it may also show low voltage or other issues. If the BBU has hard failed and is no longer accessible to the HBA, the command may return an error reading the BBU.

    3. Verify the current cache policy for all logical volumes.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      

      The default cache policy should be "WriteBack" for all volumes. If the battery is functioning normally, the current cache policy is reported as "WriteBack". If the battery has failed, the current cache policy may be reported as "WriteThrough".

    4. Set the cache policy for all logical volumes to WriteThrough cache mode, which does not use the battery.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    5. Verify the current cache policy for all logical volumes is now WriteThrough.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
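Which branch of step 3 applies depends on the image version from step 2. The following sketch applies to database servers only. It takes the full `imageinfo -ver` string, keeps the first five fields, and compares them numerically against the version thresholds above. The function names and the printed labels (`dbmcli`, `exadata_mon_hw_asr`, `megacli`, `unsupported`) are hypothetical; they only name the branch to follow.

```shell
# version_ge A B: succeed if dotted version A >= B, comparing each field
# numerically (so 12.1.2.1.0 > 11.2.3.3.0). Missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}

# Hypothetical helper: print which "prepare the disk controller BBU" branch
# applies on a database server, given the output of "imageinfo -ver".
dbserver_bbu_prep_method() {
  v=$(printf '%s\n' "$1" | cut -d. -f1-5)   # drop the trailing image date
  if version_ge "$v" 12.1.2.1.0; then
    echo dbmcli               # DBMCLI> alter dbserver bbu drop for replacement
  elif version_ge "$v" 11.2.3.3.0; then
    echo exadata_mon_hw_asr   # exadata_mon_hw_asr.pl -drop_bbu_for_replacement
  else
    case "$v" in
      11.2.3.2.*) echo megacli ;;      # manual WriteThrough steps with MegaCli64
      *)          echo unsupported ;;  # not covered by this procedure
    esac
  fi
}
```

For example, `dbserver_bbu_prep_method "$(imageinfo -ver)"` prints `megacli` for the image 11.2.3.2.1.130302 shown in step 2.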
      

 

For Systems That Do Not Have a Remote Mount BBU

The following steps are for systems that do not have a remote mount BBU. If your system has a remote mount BBU, see "For Systems with Remote Mount BBU".

Because the battery is not remotely mounted, you must shut down the node whose battery requires replacement.

  1. Switch all RAID disk volumes to WriteThrough mode so that all data in the RAID cache memory is flushed to disk and is not lost when the battery is replaced.

    1. Set the cache policy of all logical volumes to WriteThrough cache mode.

      Note:

      If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    2. Verify the current cache policy for all logical volumes is now WriteThrough, which does not use the battery.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      
  2. Shut down the server operating system.

    1. Perform the steps in My Oracle Support note 1093890.1, "Steps to Shutdown/Startup the Exadata and RDBMS Services and Cell/Compute Nodes on an Exadata Configuration".

    2. Shut down CRS services prior to powering down the database node. Run the following commands as the root user:

      # . oraenv
      ORACLE_SID = [root] ? +ASM1
      The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
      

      In the example output above, the 1 of +ASM1 refers to the database node number. For example, for database node 3, the value would be +ASM3.

      # $ORACLE_HOME/bin/crsctl stop crs
      

      Or:

      # <GI_HOME>/bin/crsctl stop crs
      

      <GI_HOME> is typically set to /u01/app/11.2.0/grid, but this can vary depending on your setup.

    3. Verify that CRS is down. There should be no CRS processes running, so the following command should return no output.

      # ps -ef | grep css | grep -v grep
      
    4. Shut down the server operating system.

      Linux:

      # shutdown -hP now
      

      Solaris:

      # shutdown -y -i 5 -g 0
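When you script the shutdown, you can avoid the interactive `oraenv` prompt shown above. This is a minimal sketch. It assumes `oraenv` is on root's `PATH`, as in the example, and that the node number follows the +ASM<n> convention described above. Setting `ORAENV_ASK=NO` makes `oraenv` use the `ORACLE_SID` already in the environment instead of prompting. Both function names are hypothetical.

```shell
# Hypothetical helper: ASM instance name for a database node number,
# following the +ASM<n> convention (node 3 -> +ASM3).
asm_sid_for_node() {
  case "$1" in
    ''|*[!0-9]*) echo "node number must be numeric: '$1'" >&2; return 1 ;;
  esac
  printf '+ASM%s\n' "$1"
}

# Hypothetical helper: set the Grid Infrastructure environment for the local
# ASM instance without the interactive ORACLE_SID prompt. Call it in the
# current shell (not a subshell) so that the exported variables persist.
set_asm_env() {
  ORACLE_SID=$(asm_sid_for_node "$1") || return 1
  ORAENV_ASK=NO          # tell oraenv to use ORACLE_SID instead of prompting
  export ORACLE_SID ORAENV_ASK
  . oraenv
}
```

For example, on database node 3: `set_asm_env 3 && "$ORACLE_HOME/bin/crsctl" stop crs`.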
      

1.9.1.2 Step 2: Replace the Disk Controller BBU

In this step, you remove the old disk controller BBU and replace it with the new BBU.

Exadata X3-2, X4-2, or X4-8 Compute Nodes and X3-2, X3-8, X4-2, or X4-8 Storage Cell nodes with the Remote Battery

These steps apply to Exadata nodes based on X3-2, X4-2, X4-8 and X3-2L, X4-2L servers with the remote battery installed.

  1. Locate the battery slot marked with an orange and white BBU label.

    X3-2 and X4-2 Compute nodes: this is the upper rightmost slot on the front of the chassis, labeled BBU (previously designated "HDD7").

    X4-8 Compute nodes: this is in the lower slot, second from the left, on the rear of the chassis labeled BBU.

    X3-2L and X4-2L Storage cells: this is the right-hand slot on the rear of the chassis above PS1, labeled BBU (previously designated "REAR HDD 1").

  2. Unlatch and carefully slide out the old BBU carrier.

  3. Insert and carefully slide in the new BBU carrier, and latch it closed.

Exadata X3-2L or X4-2L Storage Cell nodes without the Remote Battery

Replace the existing HBA BBU with a remote-mounted battery kit (part 7060020) following the CAP detailed in Support Note 1561949.1.

Exadata X3-2 or X4-2 Database Machine Compute nodes without the Remote Battery

Replace the existing HBA BBU with a remote-mounted battery kit (part 7060020) following the CAP detailed in Support Note 1561949.1.

Exadata X3-8 Database Machine Compute nodes

These steps apply to Exadata nodes based on Sun Server X2-8 servers (formerly Sun Fire X4800 M2).

  1. Remove CMOD0 from the server and set it on a flat, antistatic surface.

  2. Remove the CMOD top cover.

  3. Remove the HBA REM with BBU attached:

    1. Lift the REM ejector handle and rotate it to its fully open position.

    2. Lift the connector end of the REM and pull the REM away from the retaining clip on the front support bracket.

  4. Remove the old BBU from the REM:

    1. Use a No. 1 Phillips screwdriver to remove the 3 retaining screws that secure the battery to the REM card. Do NOT attempt to remove any screws from the top side of the REM and battery pack; those screws hold the standoffs that provide the bottom screw holes and should remain with the battery pack.

    2. Detach the battery pack including circuit board from the REM by gently lifting it from its circuit board connector.

  5. Install the new BBU on the REM.

    1. Attach the battery pack circuit board connector to mate with the REM's connector.

    2. Use a No. 1 Phillips screwdriver to secure the battery to the REM. If the new BBU comes with a package of new screws, use those screws. Do not reuse the screws from the old BBU.

  6. Re-install the HBA REM with BBU attached.

    1. Ensure that the REM ejector lever is in the closed position. The lever should be flat with the REM support bracket.

    2. Position the REM so that the battery is facing downward and the connector is aligned with the connector on the motherboard.

    3. Slip the opposite end of the REM under the retaining clips on the front support bracket and ensure that the notch on the edge of the REM is positioned around the alignment post on the bracket.

    4. Carefully lower and position the connector end of the REM until the REM contacts the connector on the motherboard, ensuring that the connectors are aligned. To seat the connector, carefully push the REM downward until it is in a level position.

  7. Install the cover on the CMOD.

  8. Reinsert the CMOD into the CMOD0 slot.

1.9.1.3 Step 3: Enable and Verify the New Disk Controller BBU

Like "Step 1: Prepare the Disk Controller BBU for Removal", this step has two subsections: one for systems with a remote mount BBU and one for systems without one.

 

For Systems with Remote Mount BBU

For systems with remote mount BBU, the system was not shut down at the end of "Step 1: Prepare the Disk Controller BBU for Removal".

If you are using image version 11.2.3.3.0 or later:

  1. Log in as the root user.

  2. Verify the disk controller BBU battery state is present and seen by the RAID controller. It may take several minutes for the new BBU battery to be detected.

    Note:

    If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

    # /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0 | grep BBU
    BBU : Present
    BBU : Yes
    Cache When BBU Bad : Disabled
    
  3. Re-enable the disk controller BBU and disk cache.

    # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -reenable_bbu
    HDD disk controller battery has been reenabled.
    
  4. Verify the disk controller BBU battery state is operational.

    # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -list_bbu_status
    BBU status: present
    
  5. Verify the current logical disk drive cache policy uses writeback mode:

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    ... <repeated for each logical volume present>
    
  6. If the current cache policy is WriteThrough mode, and not WriteBack, then check the status of the battery.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -getbbustatus -a0|grep Battery
    BatteryType: iBBU08
    Battery State : Operational
    Battery Pack Missing : No
    Battery Replacement required : No
    

    If the "Battery State" is anything other than "Operational" or "Optimal" (exact term depends on image version), investigate and correct the problem before continuing.

    The following table shows which image versions report "Optimal" and which report "Operational".

    Hardware  Exadata image version   Battery State   RAID f/w version
    --------  ---------------------   -------------   ----------------
    X4        12.1.2.1.0              Optimal         12.12.0-0178
    X4        12.1.1.1.1              Optimal         12.12.0-0178
    X3        11.2.3.3.0              Optimal         12.12.0-0178
    X3        11.2.3.2.2              Optimal         12.12.0-0178
    X3        11.2.3.2.1              Operational     12.12.0-0140
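The healthy value is "Operational" on some image versions and "Optimal" on others, so a check that accepts either works across versions. This sketch assumes the `Battery State : value` line format shown in the example output above. The function name `battery_state_ok` is hypothetical.

```shell
# Hypothetical helper: succeed if MegaCli64 -adpbbucmd -getbbustatus output,
# read on stdin, reports a healthy battery. Depending on the image version,
# the healthy state is "Operational" or "Optimal".
# Exit status: 0 healthy, 1 other state, 2 no "Battery State" line found.
battery_state_ok() {
  awk -F':' '
    /^[ \t]*Battery State[ \t]*:/ {
      state = $2
      gsub(/^[ \t]+|[ \t]+$/, "", state)   # trim spaces around the value
      found = 1
    }
    END {
      if (!found) { print "no Battery State line found" > "/dev/stderr"; exit 2 }
      if (state == "Operational" || state == "Optimal") exit 0
      print "Battery State: " state > "/dev/stderr"
      exit 1
    }'
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -getbbustatus -a0 | battery_state_ok`. An exit status of 1 corresponds to the "investigate and correct the problem" case above.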
    

If you are using image version 11.2.3.2.x:

  1. Log in as the root user.

  2. Turn off the server's locate LED.

    # ipmitool chassis identify off
    Chassis identify interval: off
    
  3. Wait approximately 5 minutes for the HBA to recognize and start communicating with the new BBU.

  4. Verify the HBA battery status is Operational and charging.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set the cache policy of all logical drives to WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    ... <repeated for each logical volume present>
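Instead of scanning the cache policy lines by eye, you can run the verification as a filter that fails if any logical volume's current policy differs from the expected mode. This sketch assumes the `Current Cache Policy: Mode, ...` line format shown above. The function name `cache_policy_is` is hypothetical. The same check applies when you verify WriteThrough mode before removing the battery.

```shell
# Hypothetical helper: succeed only if every "Current Cache Policy" line in
# MegaCli64 -ldpdinfo output (read on stdin) starts with the expected mode,
# for example WriteBack or WriteThrough. Prints any line that does not match.
# Exit status: 0 all match, 1 mismatch found, 2 no policy lines found.
cache_policy_is() {
  awk -v want="$1" '
    /Current Cache Policy:/ {
      n++
      policy = $0
      sub(/.*Current Cache Policy:[ \t]*/, "", policy)  # keep text after the label
      split(policy, parts, ",")                          # parts[1] is the write mode
      if (parts[1] != want) { print "unexpected: " $0; bad = 1 }
    }
    END {
      if (n == 0) { print "no Current Cache Policy lines found"; exit 2 }
      exit (bad ? 1 : 0)
    }'
}
```

For example: `/opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu | cache_policy_is WriteBack`.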
    

 

For Systems That Do Not Have a Remote Mount BBU

For systems that do not have a remote mount BBU, you shut down the system at the end of "Step 1: Prepare the Disk Controller BBU for Removal". In this section you restart the system and enable the new BBU.

  1. Wait for ILOM to boot, then power on the server by pressing the power button.

  2. Connect to the server's console.

    To connect to the console from the ILOM Web browser (preferred), open the "Remote Control -> Redirection" tab and click the "Launch Remote Console" button. On ILOM 3.1.x systems, you can also launch the remote console from the initial Summary Information screen.

    To connect to the console from the ILOM CLI:

    > start /SP/console
    
  3. From the server's console, monitor the system boot, watching the LSI controller BIOS in particular while it loads. If it displays a warning about drives with preserved cache, choose "D" to discard the cache and continue. This is not an issue because ASM resynchronizes the disks after boot. If it displays a warning that drives are in write-through mode due to a low battery, choose to continue.

    The Exadata boot should then continue normally, showing the Exadata boot splash screen followed by the normal OS boot messages. On the ILOM serial console, expect long pauses between outputs during the later boot steps. The default console is the graphics console, so the Exadata boot splash screen is not displayed on the serial console.

  4. Once full boot is completed, log in as the root user and verify the new battery is seen and is charging.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set the cache policy of all logical drives to WriteBack cache mode, which uses the battery.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
    
  7. Verify the database services were started automatically.

    1. Verify that CRS is running.

      # . oraenv
      ORACLE_SID = [root] ? +ASM1
      The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
      
      # crsctl check crs
      CRS-4638: Oracle High Availability Services is online
      CRS-4537: Cluster Ready Services is online
      CRS-4529: Cluster Synchronization Services is online
      CRS-4533: Event Manager is online
      

      In the above output the 1 of +ASM1 refers to the database node number. For example, for database node #3, the value would be +ASM3.

    2. Validate that instances are running.

      # ps -ef | grep pmon | grep -v grep
      

      It should return one line for the ASM instance and one line for each database instance.
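You can also check the `crsctl check crs` output automatically. This sketch assumes each status line starts with a `CRS-NNNN:` message number and that a healthy component's line ends with "is online", as in the example above. The function name `crs_all_online` is hypothetical.

```shell
# Hypothetical helper: succeed if every CRS-NNNN status line from
# "crsctl check crs" (read on stdin) ends with "is online".
# Prints any line that does not.
# Exit status: 0 all online, 1 some component not online, 2 no status lines.
crs_all_online() {
  awk '
    /^CRS-[0-9]+:/ {
      n++
      if ($0 !~ /is online[ \t]*$/) { print "not online: " $0; bad = 1 }
    }
    END {
      if (n == 0) { print "no CRS status lines found"; exit 2 }
      exit (bad ? 1 : 0)
    }'
}
```

For example, after sourcing `oraenv` for the ASM instance: `crsctl check crs | crs_all_online`.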

1.9.2 Replacing a Disk Controller BBU on Exadata Storage Server

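This section describes how to replace a disk controller BBU on Exadata Storage Server. The procedure has three steps: prepare the disk controller BBU for removal, replace the disk controller BBU, and enable the new disk controller BBU.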

Note:

After any maintenance procedure, Oracle recommends using the Exachk tool. The tool is available with My Oracle Support note 1070954.1.

1.9.2.1 Step 1: Prepare the Disk Controller BBU for Removal

On certain X3-2, X4-2, and X4-8 database nodes, and on X3-2, X3-8, X4-2, and X4-8 storage servers, the BBU is remote mounted and can be accessed without a system shutdown. However, you must still prepare it for removal from the RAID HBA to avoid the risk of data corruption on the disk volumes. There is no remote mount BBU option for X3-8 database nodes.

For Systems with Remote Mount BBU

Perform the steps in this section if your system has a remote mount BBU. If your system does not have a remote mount BBU, perform the steps in "For Systems That Do Not Have a Remote Mount BBU".

  1. Log in as the root user.

  2. Get the version of the image that is running on the server in the rack that requires service.

    # cellcli -e LIST CELL ATTRIBUTES releaseVersion
    11.2.3.2.1
    
  3. Drop the disk controller BBU.

    If you are running version 11.2.3.3.0 or later:

    1. Drop the disk controller BBU for replacement. Run the following command as the celladmin or root user:

      # cellcli -e ALTER CELL BBU DROP FOR REPLACEMENT
      HDD disk controller battery has been dropped for replacement
      
    2. Verify that the BBU was dropped for replacement:

      # cellcli -e LIST CELL ATTRIBUTES bbustatus
      dropped for replacement.
      

    If you are running version 11.2.3.2.x:

    1. Locate the server in the rack being serviced, and turn on the indicator light.

      Exadata Storage Servers are identified by a number 1 through 18. Number 1 is the lowest Storage Server in the rack, installed in RU2, and the numbers count up toward the top of the rack.

      Exadata Database Nodes are identified by a number 1 through 8. Number 1 is the lowest database node in the rack, installed in RU16.

      Turn on the locate indicator light for easier identification of the server being serviced. If the server number has been identified, then the Locate Button on the front panel may be pressed.

      To turn on the indicator light remotely, use any of the following methods:

      From a login to CellCLI on Exadata Storage Servers:

      CellCLI> ALTER CELL LED ON
      

      From a login to the server's ILOM:

      -> set /SYS/LOCATE value=Fast_Blink
      

      From a login to the server's root account:

      # ipmitool chassis identify force
      Chassis identify interval: indefinite
      
    2. Check that the HBA can see the battery, and check its current status.

      Note:

      If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

      # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
      

      The output should show that the battery is still visible. Depending on the fault, it may also show low voltage or other issues. If the BBU has hard failed and is no longer accessible to the HBA, the command may return an error reading the BBU.

    3. Verify the current cache policy for all logical volumes.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      

      The default cache policy should be WriteBack for all volumes. If the battery is functioning normally, the current cache policy is reported as WriteBack. If the battery has failed, the current cache policy may be reported as WriteThrough.

    4. Set the cache policy for all logical volumes to WriteThrough cache mode, which does not use the battery.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    5. Verify the current cache policy for all logical volumes is now WriteThrough.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
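The `bbustatus` values in this section appear both with and without a trailing period: "dropped for replacement." here, and "normal" when the BBU is re-enabled later. A comparison that ignores surrounding whitespace and a trailing period works for both checks. This is a sketch, and the function name `bbu_status_is` is hypothetical.

```shell
# Hypothetical helper: compare the output of
#   cellcli -e LIST CELL ATTRIBUTES bbustatus
# (read on stdin) with an expected status. Leading and trailing whitespace
# and a trailing period are ignored before comparing.
bbu_status_is() {
  want=$1
  actual=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:].]*$//')
  if [ "$actual" = "$want" ]; then
    return 0
  fi
  echo "BBU status is '$actual', expected '$want'" >&2
  return 1
}
```

For example: `cellcli -e LIST CELL ATTRIBUTES bbustatus | bbu_status_is 'dropped for replacement'` before removing the old BBU, and `... | bbu_status_is normal` after re-enabling the new one.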
      

For Systems That Do Not Have a Remote Mount BBU

Perform the steps in this section if your system does not have a remote mount BBU. If your system has a remote mount BBU, see "For Systems with Remote Mount BBU".

Because the battery is not remotely mounted, you must shut down the node whose battery requires replacement.

  1. Switch all RAID disk volumes to WriteThrough mode so that all data in the RAID cache memory is flushed to disk and is not lost when the battery is replaced.

    1. Set the cache policy of all logical volumes to WriteThrough cache mode.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    2. Verify the current cache policy for all logical volumes is now WriteThrough, which does not use the battery:

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      
  2. Shut down the server operating system.

    Note the following when powering off Exadata Storage Servers:

    • Verify there are no other storage servers with disk faults. Shutting down a storage server while another disk is failing may cause database processes and Oracle ASM to crash if it loses both disks in the partner pair when this server's disks go offline.

    • Powering off one Exadata Storage Server with no disk faults in the rest of the rack will not affect running database processes or Oracle ASM.

    • All database and Oracle Clusterware processes should be shut down prior to shutting down more than one Exadata Storage Server. Refer to the Exadata Owner's Guide for details if this is necessary.

    Oracle ASM drops an offline disk when the ASM disk repair timer expires. If a storage server stays offline longer than the disk repair timer, ASM drops its disks, and the resulting rebalance when the server is restored can impact database performance. The default DISK_REPAIR_TIME attribute value of 3.6 hours should be adequate for replacing components, but you may need to increase it if you need more time.

    1. Check the disk repair time by logging into ASM and running the following query.

      SQL> SELECT dg.name,a.value FROM v$asm_attribute a, v$asm_diskgroup dg
       WHERE a.name = 'disk_repair_time' AND a.group_number = dg.group_number;
      

      As long as the value is large enough to comfortably replace the components being replaced, there is no need to change it.

      If you need to change it, you can use this statement:

      SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time'='8.5H';
      
    2. Check whether ASM can tolerate the grid disks going offline. The following command should return Yes in the asmdeactivationoutcome column for every grid disk listed.

      # cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome
      ...sample ...
      DATA_CD_09_cel01 ONLINE Yes
      DATA_CD_10_cel01 ONLINE Yes
      DATA_CD_11_cel01 ONLINE Yes
      RECO_CD_00_cel01 ONLINE Yes
      RECO_CD_01_cel01 ONLINE Yes
      ...repeated for all griddisks....
      

      If one or more disks do not return asmdeactivationoutcome='Yes', check the respective disk group and restore the data redundancy for that disk group. Once the disk group data redundancy is fully restored, re-run the command to verify that asmdeactivationoutcome='Yes' for all grid disks. Once all disks return asmdeactivationoutcome='Yes', proceed to the next step.

      Note:

      Shutting down the cell services when one or more grid disks do not return asmdeactivationoutcome='Yes' will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.

    3. Inactivate all grid disks on the cell that needs to be powered down for maintenance. This could take up to 10 minutes or longer.

      # cellcli
      ...sample ...
      CellCLI> ALTER GRIDDISK ALL INACTIVE
      GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_02_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_00_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_01_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_02_dmorlx8cel01 successfully altered
      ...repeated for all griddisks...
      
    4. Verify that the grid disks are now offline. The output should show asmmodestatus='UNUSED' or 'OFFLINE' and asmdeactivationoutcome=Yes for all grid disks once the disks are offline and inactive in ASM.

      CellCLI> LIST GRIDDISK ATTRIBUTES name,status,asmmodestatus,asmdeactivationoutcome
      DATA_CD_00_dmorlx8cel01 inactive OFFLINE Yes
      DATA_CD_01_dmorlx8cel01 inactive OFFLINE Yes
      DATA_CD_02_dmorlx8cel01 inactive OFFLINE Yes
      RECO_CD_00_dmorlx8cel01 inactive OFFLINE Yes
      RECO_CD_01_dmorlx8cel01 inactive OFFLINE Yes
      RECO_CD_02_dmorlx8cel01 inactive OFFLINE Yes
      ...repeated for all griddisks...
      
    5. Once all disks are offline and inactive, you can shut down the cell.

      # shutdown -hP now
      

      When powering off Exadata Storage Servers, all storage services are automatically stopped.
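Shutting down cell services while any grid disk reports an asmdeactivationoutcome other than Yes can dismount a disk group. It can therefore help to check the `LIST GRIDDISK` output mechanically before inactivating the grid disks. This sketch assumes whitespace-separated columns in the order name, asmmodestatus, asmdeactivationoutcome, as in the command above. The function name `griddisks_safe_to_deactivate` is hypothetical. A multi-word outcome message is printed in full.

```shell
# Hypothetical helper: succeed only if every grid disk in the output of
#   cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome
# (read on stdin) reports asmdeactivationoutcome = Yes (the last column).
# Lines starting with "..." (elisions in sample output) are ignored.
# Exit status: 0 safe, 1 some disk not safe, 2 no grid disks in the input.
griddisks_safe_to_deactivate() {
  awk '
    NF >= 3 && $1 !~ /^\.\.\./ {
      n++
      if ($NF != "Yes") { print "not safe: " $0; bad = 1 }
    }
    END {
      if (n == 0) { print "no grid disks found in input"; exit 2 }
      exit (bad ? 1 : 0)
    }'
}
```

For example: `cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome | griddisks_safe_to_deactivate && echo "OK to inactivate"`. Treating empty input as a failure keeps a cellcli error from being mistaken for a safe result.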

1.9.2.2 Step 2: Replace the Disk Controller BBU

1.9.2.3 Step 3: Enable the New Disk Controller BBU

Like "Step 1: Prepare the Disk Controller BBU for Removal", this section has two subsections: one for systems with a remote mount BBU and one for systems without one.

For Systems with Remote Mount BBU

Perform the steps in this section if your system has a remote mount BBU. In this scenario, the system was not shut down at the end of "Step 1: Prepare the Disk Controller BBU for Removal".

If you are running image version 11.2.3.3.0 or later:

  1. Log in as the celladmin or root user.

  2. Re-enable the BBU.

    # cellcli -e alter cell bbu reenable
    HDD disk controller battery has been reenabled
    
  3. Verify the disk controller BBU battery state is operational.

    # cellcli -e list cell attributes bbustatus
    normal
    

    If the "BBU status" is anything other than "normal", then investigate and correct the problem before continuing.

If you are running image version 11.2.3.2.x:

  1. Log in as the root user.

  2. Turn off the server's locate LED.

    # ipmitool chassis identify off
    Chassis identify interval: off
    
  3. Wait approximately 5 minutes for the HBA to recognize and start communicating with the new BBU.

  4. Verify the HBA battery status is Operational and charging.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set the cache policy of all logical drives to WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    ... <repeated for each logical volume present>
    

For Systems That Do Not Have a Remote Mount BBU

At the end of "Step 1: Prepare the Disk Controller BBU for Removal", systems without a remote mount BBU were shut down. You now have to restart the system.

  1. Wait for ILOM to boot, then power on the server by pressing the power button.

  2. Connect to the server's console.

    To connect to the console from the ILOM Web browser (preferred), open the "Remote Control -> Redirection" tab and click the "Launch Remote Console" button. On ILOM 3.1.x systems, you can also launch the remote console from the initial Summary Information screen.

    To connect to the console from the ILOM CLI:

    > start /SP/console
    
  3. From the server's console, monitor the system boot, watching the LSI controller BIOS in particular while it loads. If it displays a warning about drives with preserved cache, choose "D" to discard the cache and continue. This is not an issue because ASM resynchronizes the disks after boot. If it displays a warning that drives are in write-through mode due to a low battery, choose to continue.

    The Exadata boot should then continue normally, showing the Exadata boot splash screen followed by the normal OS boot messages. On the ILOM serial console, expect long pauses between outputs during the later boot steps. The default console is the graphics console, so the Exadata boot splash screen is not displayed on the serial console.

  4. Once full boot is completed, log in as the root user and verify the new battery is seen and is charging.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set the cache policy of all logical drives to WriteBack cache mode, which uses the battery.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
    
  7. Return the cell to service.

    1. Activate the grid disks.

      # cellcli
      CellCLI> alter griddisk all active
      GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_02_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_00_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_01_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_02_dmorlx8cel01 successfully altered
      ...etc...
      
    2. Verify that all disks are active.

      CellCLI> list griddisk
      DATA_CD_00_dmorlx8cel01         active
      DATA_CD_01_dmorlx8cel01         active
      DATA_CD_02_dmorlx8cel01         active
      RECO_CD_00_dmorlx8cel01         active
      RECO_CD_01_dmorlx8cel01         active
      RECO_CD_02_dmorlx8cel01         active
      ...etc...
      
    3. Verify that all grid disks have been successfully put online. Wait until 'asmmodestatus' is 'ONLINE' for all grid disks. The following example shows the output early in the activation process.

      CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
      DATA_CD_00_dmorlx8cel01 active ONLINE Yes
      DATA_CD_01_dmorlx8cel01 active ONLINE Yes
      DATA_CD_02_dmorlx8cel01 active ONLINE Yes
      RECO_CD_00_dmorlx8cel01 active SYNCING Yes
      RECO_CD_01_dmorlx8cel01 active ONLINE Yes
      ...etc...
      

      In the example above, 'RECO_CD_00_dmorlx8cel01' is still in the 'SYNCING' process. Oracle ASM synchronization is complete only when ALL grid disks show 'asmmodestatus=ONLINE'. This process can take some time, depending on how busy the machine is now and how busy it was while this server was down for repair.
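The checks in steps 6 and 7 can also be scripted. The following bash functions are a minimal sketch, not an Oracle-supplied tool. They assume the output formats shown in this procedure: one "Current Cache Policy:" line per logical drive from MegaCli64 -ldpdinfo, and whitespace-separated name, status, asmmodestatus, and asmdeactivationoutcome columns from CellCLI. The function names and the 60-second polling interval are hypothetical choices.

```shell
#!/usr/bin/env bash
# Minimal sketch; function names and the polling interval are hypothetical.

# Reads "MegaCli64 -ldpdinfo -a0" output on stdin. Succeeds only if at least
# one "Current Cache Policy:" line was seen and every one reports WriteBack.
all_cache_writeback() {
  awk '/Current Cache Policy:/ {
         n++
         if ($0 !~ /Current Cache Policy: WriteBack/) bad++
       }
       END { exit !(n > 0 && bad == 0) }'
}

# Reads "list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome"
# output on stdin. Prints each grid disk whose asmmodestatus is not ONLINE and
# succeeds only if every listed grid disk is ONLINE.
all_griddisks_online() {
  awk 'NF >= 3 {
         n++
         if ($3 != "ONLINE") { print $1, $3; bad++ }
       }
       END { exit !(n > 0 && bad == 0) }'
}

# Polls CellCLI on the storage server until Oracle ASM has resynchronized all
# grid disks. An empty or failed listing counts as "not ready yet".
wait_for_asm_online() {
  until cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome \
        | all_griddisks_online; do
    echo "Waiting for all grid disks to become ONLINE..."
    sleep 60
  done
  echo "All grid disks are ONLINE."
}
```

For example, `/opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | all_cache_writeback` returns success only when every logical drive is in WriteBack mode, and `wait_for_asm_online` returns only after the last grid disk leaves the SYNCING state.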

1.10 Overview of the dbmsrv Service

  • For releases 12.1.2.1.2 and later

    Starting with the 12.1.2.1.2 release, Management Server (MS) on the database nodes no longer uses sudo, so no sudoers configuration is needed.

  • For releases before 12.1.2.1.2

    For security reasons, Management Server on the database nodes does not run as root. However, it needs root permission to run certain utilities that monitor the system (for example, disk status, ILOM, and power supply units) and to send ASRs and alerts. To achieve this, a sudoers configuration file, dbmsvc_sudo_conf, is added so that the dbmmgmt user can run these utilities with root privileges.

    Do not disable the dbmmgmt service or dbserverd, and do not edit the sudoers configuration file. If entries are removed from the file, the dbmmgmt service may not be able to monitor some parts of the system. For example, if a disk fails, an ASR might not be sent in time, which can disrupt the database node and delay recovery.

1.11 Changing User IDs and Group IDs for dbmsrv

Starting with Oracle Exadata Database Machine 12.1.2.1.0:

  • The database nodes now run the Management Server (MS). Previously MS ran only on the storage nodes.

  • The database nodes now run a new service called Database Machine Service (dbmsrv). This new service is based on the MS that runs on the storage servers and provides enhanced management capabilities to the database nodes.

To manage the new dbmsrv service, new users and groups were added:

Table 1-3 Users for dbmsrv

User         Default User ID
dbmsvc       12137
dbmadmin     12138
dbmmonitor   12139

Table 1-4 Groups for dbmsrv

Group        Default Group ID
dbmsvc       11137
dbmadmin     11138
dbmmonitor   11139
dbmusers     11140

You can change the user ID and group ID of the DBM users if there are conflicts with the default values (for example, if you are using LDAP or another directory, or if you are using session management tools that require different values from the default values).

These steps are specific to the DBM Service users and groups only. Do not use them to modify the user and group IDs for other Oracle products.

To change the default DBM user or group IDs:

  1. Shut down the dbserver services. Run the following command as root or the dbmadmin user.
    dbmcli -e alter dbserver shutdown services all
    
  2. Change the group ID of the user. Run the following command as root:
    groupmod -g <NEW_GID> <GROUPNAME>
    

    For example:

    groupmod -g 3001 dbmusers
    
  3. Update the files containing the old group ID. Run the following command as root:
    find / -gid <OLD_GID> -exec chgrp -h <NEW_GID> {} \;
    

    For example:

    find / -gid 11140 -exec chgrp -h 3001 {} \;
    
  4. Change the user ID. This has to be done after changing the group ID or you will get a "GID does not exist" error. Run the following command as root:
    usermod -u <NEW_UID> -g <NEW_GID> <USER>
    

    For example:

    usermod -u 2998 -g 3001 dbmsvc
    
  5. Update the files containing the old user ID. Run the following command as root:
    find / -uid <OLD_UID> -exec chown -h <NEW_UID> {} \;
    

    For example:

    find / -uid 12137 -exec chown -h 2998 {} \;
    
  6. Restore the setuid and setgid bits on the dbrsMain executable so that it can run; running chgrp and chown on the file cleared them. Run the following command as root:
    chmod 6550 /opt/oracle/dbserver/dbms/bin/dbrsMain
    
  7. Restart the dbserver services. Run the following command as the dbmadmin user.
    dbmcli -e alter dbserver startup services all
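Because the order of these commands matters, it can help to review the complete sequence before running any of it. The following bash function is a hypothetical helper, not part of Exadata: it only prints the commands from steps 1 through 7 for one group and one user and does not execute anything. Like the example in step 4, it sets the user's primary group to the new GID; adjust that command if the user's primary group is a different group.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does NOT run) the command sequence from steps 1-7
# for one DBM group and one DBM user, so that it can be reviewed first.
# Usage: print_dbm_id_change_plan GROUP OLD_GID NEW_GID USER OLD_UID NEW_UID
print_dbm_id_change_plan() {
  if [ "$#" -ne 6 ]; then
    echo "usage: print_dbm_id_change_plan GROUP OLD_GID NEW_GID USER OLD_UID NEW_UID" >&2
    return 1
  fi
  local group="$1" old_gid="$2" new_gid="$3"
  local user="$4" old_uid="$5" new_uid="$6"
  local id
  # Refuse anything that is not a plain non-negative integer ID.
  for id in "$old_gid" "$new_gid" "$old_uid" "$new_uid"; do
    case "$id" in
      ''|*[!0-9]*) echo "Invalid numeric ID: '$id'" >&2; return 1 ;;
    esac
  done
  # Group changes come before the user change (see step 4).
  cat <<EOF
dbmcli -e alter dbserver shutdown services all
groupmod -g ${new_gid} ${group}
find / -gid ${old_gid} -exec chgrp -h ${new_gid} {} \;
usermod -u ${new_uid} -g ${new_gid} ${user}
find / -uid ${old_uid} -exec chown -h ${new_uid} {} \;
chmod 6550 /opt/oracle/dbserver/dbms/bin/dbrsMain
dbmcli -e alter dbserver startup services all
EOF
}
```

For example, `print_dbm_id_change_plan dbmusers 11140 3001 dbmsvc 12137 2998` prints the same commands as the examples in the steps above.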
    

1.12 State of Cell and Database Nodes During Configuration Changes

Table 1-5 shows whether cell and database nodes need to be offline or online when you make a change to a configuration.

Table 1-5 State of Cell and Database Nodes for Operations

Operation                                                         Cell Node  Database Node
DNS server update                                                 Online     Online
NTP server update                                                 Online     Online
Time zone update                                                  Offline    Online
Admin network IP address, netmask, gateway, or hostname change    Offline    Online
Client network IP address, netmask, gateway, or hostname change   Offline    Online
ILOM IP address change                                            Offline    Online*
Other ILOM parameter change                                       Online*    Online*
InfiniBand IP, netmask, or hostname change                        Offline    Online
pkey change                                                       Offline    Online

* Online if the ipmitool sunoem getval/setval command is supported.

1.13 Rescue Plan

In Exadata releases earlier than 12.2.1.1.0, after a storage server or database server rescue, you need to re-run multiple commands to configure items such as IORM plans, thresholds, and storage server and database server notification settings.

Oracle Exadata release 12.2.1.1.0 introduces a new attribute called rescuePlan for the cell and dbserver objects. When you are done configuring your database servers and storage servers, save the value of the rescuePlan attribute to a file. Save the file to a remote server, because the data on the rescued server is erased during a rescue. After you rescue the server, retrieve the file from the remote server and run it to restore the settings. See Example 1-6 and Example 1-7.

For security reasons, the rescue plan does not include configurations that require a password.

Example 1-4 Rescue Plan for a Storage Cell

The rescuePlan attribute for a storage server could look like this:

$ cellcli -e list cell attributes rescuePlan

CREATE ROLE "admin"

GRANT PRIVILEGE all actions ON diagpack all attributes WITH all options TO ROLE "admin"

CREATE ROLE "diagRole"

GRANT PRIVILEGE download ON diagpack all attributes WITH all options TO ROLE "diagRole"

GRANT PRIVILEGE create ON diagpack all attributes WITH all options TO ROLE "diagRole"

GRANT PRIVILEGE list ON diagpack all attributes WITH all options TO ROLE "diagRole"

ALTER CELL accessLevelPerm="remoteLoginEnabled", diagHistoryDays="7", metricHistoryDays="7", notificationMethod="mail,snmp",
 notificationPolicy="warning,critical,clear", snmpSubscriber=((host="localhost", port=162, community="public", type=asr)), 
 bbuLearnCycleTime="2016-10-17T02:00:00-07:00", bbuLearnSchedule="MONTH 1 DATE 17 HOUR 2 MINUTE 0", 
 alertSummaryStartTime="2016-09-21T17:00:00-07:00", alertSummaryInterval=weekly, 
 hardDiskScrubInterval=biweekly, hardDiskScrubFollowupIntervalInDays="14"

ALTER IORMPLAN objective=basic

Example 1-5 Rescue Plan for a Database Server

The rescuePlan attribute for a database server could look like this:

$ dbmcli -e list dbserver attributes rescuePlan

CREATE ROLE "listdbserverattrs"

GRANT PRIVILEGE list ON dbserver ATTRIBUTES bbuStatus, coreCount WITH all options TO ROLE "listdbserverattrs"

ALTER DBSERVER diagHistoryDays="7", metricHistoryDays="7", bbuLearnSchedule="MONTH 1 DATE 17 HOUR 2 MINUTE 0", 
 alertSummaryStartTime="2016-09-26T08:00:00-07:00", alertSummaryInterval=weekly, pendingCoreCount="128" force

Example 1-6 Creating a Rescue Plan script for a cell

The following command stores the commands in the rescuePlan attribute to a file called rescue_cell.cli located on a remote server.

$ cellcli -e list cell attributes rescuePlan >& /location/on/remote/server/rescue_cell.cli

If you need to rescue the server, you can run the script after the server rescue to restore the settings. The following command runs the rescue_cell.cli file using the CellCLI start command:

$ cellcli -e start /location/on/remote/server/rescue_cell.cli

Example 1-7 Creating a Rescue Plan script for a database server

The following command stores the commands in the rescuePlan attribute to a file called rescue_db.cli located on a remote server.

$ dbmcli -e list dbserver attributes rescuePlan >& /location/on/remote/server/rescue_db.cli

If you need to rescue the server, you can run the script after the server rescue to restore the settings. The following command runs the rescue_db.cli file using the DBMCLI start command:

$ dbmcli -e start /location/on/remote/server/rescue_db.cli
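The commands in Example 1-6 and Example 1-7 can be wrapped in a small function so that each saved plan is named after its host and date. This is a minimal bash sketch, not an Oracle-supplied script. The output directory (assumed to be a mount of the remote server) and the file-naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (not an Oracle-supplied script).
# Saves the rescuePlan of the local server to OUTDIR/rescue_<object>_<host>_<date>.cli
# so that it can be replayed with the matching CLI's START command after a rescue.
# Usage: save_rescue_plan CLI OBJECT OUTDIR
#   CLI     cellcli (storage server) or dbmcli (database server)
#   OBJECT  cell or dbserver
#   OUTDIR  directory on a remote server (hypothetical mount point)
save_rescue_plan() {
  local cli="$1" object="$2" outdir="$3"
  local host="${HOSTNAME:-$(uname -n)}"
  host="${host%%.*}"                       # short host name
  local outfile="${outdir}/rescue_${object}_${host}_$(date +%Y%m%d).cli"

  # Capture standard output only, so CLI error text never lands in the script.
  if ! "$cli" -e list "$object" attributes rescuePlan > "$outfile"; then
    echo "Could not read rescuePlan with $cli" >&2
    rm -f "$outfile"
    return 1
  fi
  # An empty plan is almost certainly a problem; do not keep it.
  if [ ! -s "$outfile" ]; then
    echo "rescuePlan output was empty; nothing saved" >&2
    rm -f "$outfile"
    return 1
  fi
  echo "$outfile"
}
```

For example, run `save_rescue_plan cellcli cell /mnt/rescue_plans` on a storage server or `save_rescue_plan dbmcli dbserver /mnt/rescue_plans` on a database server, and after a rescue replay the printed file with `cellcli -e start <file>` or `dbmcli -e start <file>`. Unlike the `>&` redirection in the examples above, which also captures standard error, the sketch redirects only standard output, so an error message cannot end up in the saved script.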

1.14 ExaWatcher Charts

ExaWatcher is a utility that collects performance data on the storage servers and database servers of an Exadata system. The data collected includes operating system statistics, such as iostat, cell statistics (cellsrvstat), and network statistics.

To extract the data collected by ExaWatcher, run GetExaWatcherResults.sh and specify the start and end times of the desired time range. The results are placed in a compressed archive file in a directory called ExtractedResults.

For example:

$ GetExaWatcherResults.sh --from 08/24/2016_17:00:00 --to 08/25/2016_17:00:00

Note:

You can use the -c or --scp option with GetExaWatcherResults.sh to copy the resulting archive file to a different location.

In release 12.2.1.1.0, GetExaWatcherResults.sh also generates HTML pages that contain charts for IO, CPU utilization, CPU detail, cell server statistics, and alert history. The IO and CPU utilization charts use data from iostat, the CPU detail charts use data from mpstat, and the cell server statistics charts use data from cellsrvstat. Alert history is retrieved for the specified time frame.

You can find the new charts in the resulting archive file, in a subdirectory named Charts.ExaWatcher.<hostname>/<timestamp>_<duration>/, for example, Charts.ExaWatcher.xxxxceladm13.oracle.com/2016_08_24_17_00_00_01h00m00s_0.

To view the HTML pages, move the archive file to a machine with a local browser that has internet access. Decompress the bz2 file, and then extract the resulting tar file with tar -xvf. Then open Charts.ExaWatcher.<hostname>/<timestamp>_<duration>/index.html in the browser. The left panel on that page shows the following menu:

Figure 1-1 ExaWatcher Menu in the Left Panel


Note:

For screen reader users, the menu items are navigated using the UP/DOWN arrow keys and activated using the SPACE bar. The TAB key will move you to the frame on the right side.

The CellSrvStat menu item is available only when run against a storage server. The Alert History menu item is available only if there were alerts during the requested timeframe.

1.14.1 IO Charts

There are two pages for IO statistics:

1.14.1.1 IO Stat Summary

IOStat Summary shows a summary of IO performance for the entire server. The four charts shown on this page are:

Table 1-6 Statistics for IOStat Summary

Statistic                     Description
Flash IOPs, Hard Disk IOPs    Total reads per second, writes per second, and IO per second
                              (reads per second + writes per second) for the server.
                              This uses r/s and w/s from iostat.
Flash MB/s, Hard Disk MB/s    Total read MB per second, write MB per second, and IO MB per
                              second. This uses rsec/s and wsec/s from iostat, converted
                              into MB.

The statistics are shown for flash and hard disks, when applicable. On Exadata Extreme Flash, there are no hard disks. On database servers, there are no flash devices.
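To reproduce these totals from raw iostat data, for example when checking a single interval by hand, the following awk-based bash function is a minimal sketch, not ExaWatcher's own code. It assumes iostat -x output that contains r/s, w/s, rsec/s, and wsec/s columns (newer sysstat releases report rkB/s and wkB/s instead), 512-byte sectors, and 1 MB = 1048576 bytes; ExaWatcher may define MB differently. The optional device regular expression is a hypothetical parameter for separating, for example, flash devices from hard disks.

```shell
#!/usr/bin/env bash
# Minimal sketch, not ExaWatcher code. Reads "iostat -x" output on stdin
# (one or more reports) and prints one line of totals per report:
#   IOPS = sum(r/s) + sum(w/s)
#   MB/s = (sum(rsec/s) + sum(wsec/s)) * 512 / 1048576
# Usage: iostat_summary [DEV_REGEX] < iostat_output   (DEV_REGEX defaults to ".")
iostat_summary() {
  awk -v dev="${1:-.}" '
    # Print and reset the totals for the report that just ended.
    function report() {
      if (have)
        printf "reads/s=%.1f writes/s=%.1f IOPS=%.1f read_MB/s=%.2f write_MB/s=%.2f MB/s=%.2f\n",
               r, w, r + w, rs * 512 / 1048576, ws * 512 / 1048576, (rs + ws) * 512 / 1048576
      r = w = rs = ws = 0; have = 0
    }
    # A header line starts a new report; map column names to field numbers.
    $1 ~ /^Device/ {
      report()
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      inreport = ("r/s" in col) && ("w/s" in col) && ("rsec/s" in col) && ("wsec/s" in col)
      if (!inreport) print "iostat header lacks r/s, w/s, rsec/s or wsec/s; report skipped" > "/dev/stderr"
      next
    }
    # A blank line ends the device rows of the current report.
    NF == 0 { inreport = 0; next }
    # Add up the selected devices.
    inreport && $1 ~ dev {
      r += $col["r/s"]; w += $col["w/s"]
      rs += $col["rsec/s"]; ws += $col["wsec/s"]
      have = 1
    }
    END { report() }'
}
```

For example, `iostat -x 5 3 | iostat_summary` prints one line per report; note that iostat's first report covers the time since boot rather than the first interval.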

If you suspect an I/O performance problem, compare the IOPs and MB/s statistics for the storage servers against the data sheet to determine whether the storage is at maximum capacity. You can also correlate high read times observed in the database with the service time and average wait time from iostat to determine whether the storage server might be causing them. Note that database read times typically include I/Os satisfied from flash cache as well as from hard disk. In addition, these charts let you visualize any peaks during the time frame.

The partial screenshot below shows the IOPs and MB/s charts for flash and hard disks.

Figure 1-2 IO Summary Charts


Below each chart, there is a range selector that you can use to drill down to a specific time within the chart. Moving the range selector on any chart affects all charts on the page.

Note:

The range selector is not accessible to screen readers. Also, only the first value of each chart data point is accessible to a screen reader.

Figure 1-3 IO Summary Charts Showing Range Selector


When you use the range selector, the displayed chart changes to show only the data for the time range specified by the range selector.

1.14.1.2 I/O Stat Detail

IOStat Detail shows performance for each disk on the storage server. The following charts are shown on this page:

Table 1-7 Statistics for IO Stat Detail

Statistic                                   Description
Flash Service Time, Hard Disk Service Time  Average service time per disk contrasted against the range of wait times.
Flash Wait Time, Hard Disk Wait Time        Average wait time per disk.

By default, each chart includes a line that shows the average across all disks on the server. The shaded background indicates the minimum and maximum range for the statistic. You can choose to display individual disks by using the drop-down selector.

If the shaded background covers a wide range, the disks may be performing differently. In that case, look more closely at each individual disk on the storage server to see whether there is an imbalance. A narrow range indicates that the disks are performing similarly.

The individual disk IOPs and MB/s for a storage server can also be compared to the data sheet numbers to see if the disks are potentially hitting their maximum capacity.
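The average line and the shaded minimum/maximum band can also be approximated from raw iostat data. The following bash function is a minimal sketch, not ExaWatcher code. It summarizes one iostat -x column (for example await, or svctm where your sysstat version still reports it) across the selected disks in each report; the function name and parameters are hypothetical. A large spread relative to the average is the text equivalent of a wide shaded band.

```shell
#!/usr/bin/env bash
# Minimal sketch, not ExaWatcher code. For each "iostat -x" report on stdin,
# prints the average, minimum, maximum, and spread (max - min) of one column
# across the selected disks.
# Usage: iostat_spread STAT [DEV_REGEX] < iostat_output
#   STAT       column name from the iostat header, e.g. await or svctm
#   DEV_REGEX  which devices to include (defaults to ".")
iostat_spread() {
  awk -v stat="$1" -v dev="${2:-.}" '
    # Print and reset the statistics for the report that just ended.
    function report() {
      if (n)
        printf "%s avg=%.2f min=%.2f max=%.2f spread=%.2f disks=%d\n",
               stat, sum / n, min, max, max - min, n
      n = sum = 0
    }
    # A header line starts a new report; locate the requested column.
    $1 ~ /^Device/ {
      report()
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      inreport = (stat in col)
      if (!inreport) print "no " stat " column in iostat header" > "/dev/stderr"
      next
    }
    # A blank line ends the device rows of the current report.
    NF == 0 { inreport = 0; next }
    # Track min, max, and sum for the selected devices.
    inreport && $1 ~ dev {
      v = $col[stat] + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v; n++
    }
    END { report() }'
}
```

For example, `iostat -x 5 3 | iostat_spread await` prints one summary line per report for all devices.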

1.14.2 CPU Charts

The CPU charts show CPU utilization for the server. These statistics are from iostat (avg-cpu: %user, %system, %iowait).

1.14.3 CPU Detail

The CPU detail charts show detailed information for CPU usage, including the average CPU utilization per CPU ID. These statistics are from mpstat.

Figure 1-6 CPU Detail Charts


1.14.4 Cell Server Charts

Cell server statistics are useful for tracking features that are specific to Exadata storage servers. This page displays statistics related to Smart Flash Cache and Smart IOs.

Figure 1-7 Cell Server Charts


1.14.5 Alert History

This page displays alerts that were present during the specified time frame. Alerts can be raised for errors or other issues that may cause IO performance problems on the servers.