1 General Maintenance Information

This chapter contains topics about general maintenance of Oracle Exadata Database Machine.

Note:

  • All procedures in this chapter are applicable to Oracle Exadata and Oracle Exadata Storage Expansion Rack.
  • For ease of reading, the name "Oracle Exadata Rack" is used when information refers to both Oracle Exadata and Oracle Exadata Storage Expansion Rack.

1.1 Overview of Roles and Responsibilities

You should determine which individuals or groups are responsible for resolving any issue that arises.

Most IT organizations have teams of database administrators, system administrators, network administrators, and storage administrators. These administrators are responsible for system implementation and ongoing operations. In an Oracle Exadata environment, it is usually more efficient and effective for the database administrator to take the lead role for Oracle Exadata management, with assistance from the system administrator. This is because Oracle Exadata is engineered to run Oracle Database, and administration is specific to Oracle Database and Oracle Exadata System Software. The other teams may have distinct responsibilities or act as a second level of support.

Usually there is one individual or group that has primary responsibility for any issue that arises. This individual or group receives the first contact from Oracle Enterprise Manager Cloud Control, the help desk, or operations team when there is an issue on the system. For Oracle Exadata, the primary contact is typically the database administrator. If the database administrator needs assistance from another team, then the two teams collaborate to resolve the issue. Ownership of the issue should remain clear.

1.1.1 Common Administrative Tasks for Oracle Exadata Management

Initial system deployment is usually performed by Oracle engineers. The primary responsibilities for the database administrator begin with typical operational tasks.

Table 1-1 Common Administrative Tasks for Oracle Exadata Management

Task or Event: Slow performance
Administrator: Database administrator, system administrator
Actions:
  • Receive alerts from Oracle Enterprise Manager Cloud Control that performance thresholds have been exceeded.
  • Review system performance, CPU, memory and I/O on all servers for unusual trends.
  • Review database performance, wait events, locking, parallelism, and execution plans.

Task or Event: Patch application or upgrades
Administrator: Database administrator, system administrator
Actions:
  • Apply Oracle Exadata System Software patches or upgrades, and RDMA Network Fabric switch firmware upgrades.
  • Apply Oracle Database patches, Oracle Grid Infrastructure patches or upgrades.

Task or Event: System outage or failure
Administrator: Database administrator, system administrator
Actions:
  • Connect to Integrated Lights Out Manager (ILOM), verify current system state, identify hardware issue or restart system, and review logs for root cause analysis.
  • Check running instances for errors, monitor performance on running instances, and verify application functionality has not been disrupted.

Task or Event: Suspected network issues
Administrator: Database administrator, system administrator
Actions:
  • Inspect network interfaces for errors or dropped packets, check if any switches have restarted, and escalate to network administration team, as needed.
  • Inspect database-side performance to assess impact, if any.

Task or Event: Backup database
Administrator: Database administrator, system administrator
Actions:
  • Run database backup routines, and ensure database server backups are completed.

Task or Event: Failed disk replacement
Administrator: Database administrator, system administrator
Actions:
  • Receive alerts about hardware replacement, verify Oracle Auto Service Request (ASR) has opened a service request, and verify operators will allow field service technician in the data center to replace drive or provide spare drive.

1.1.2 Understanding the Administrative Differences with Oracle Exadata

Most administration tasks on Oracle Exadata servers are similar to those on traditional database servers and storage servers, but there are some differences.

The following list shows the differences and exceptions for Oracle Exadata servers:

  • Configuration settings for Oracle Exadata database servers, RDMA Network Fabric switches, and other components are based on testing and performance criteria. Before changing configuration settings, such as database server firmware or kernel parameters, because of company policy or for other reasons, review the potential impact on Oracle Exadata.

  • Restarting a server incorrectly can disrupt the database. The storage servers have special procedures and guidelines that must be followed to minimize disruption, such as off-lining grid disks before restarting the server, and not restarting more than one server at a time.

  • Storage servers cannot be modified the same way as the database servers. Network changes, such as those for the NTP servers or DNS servers, are done using the ipconf utility. Network changes cannot be done manually by editing the configuration files. In addition, no software or additional packages can be installed on the storage servers. This restriction includes monitoring software. Storage server system updates are provided by Oracle Exadata System Software upgrades.

  • Storage servers do not require backups. Each storage server has a self-maintained internal USB drive or M.2 device that can be used for cell recovery. Backup clients cannot be installed on the storage servers.

  • Oracle wait events in Oracle Real Application Clusters (Oracle RAC) databases using storage servers may include events with %cell% in the name. These events are related to the storage servers.

  • The Oracle Database V$CELL views include rows for any database using Oracle Exadata Storage Server.

  • Oracle Automatic Storage Management (Oracle ASM) disk path names are of the format o/cell_ip_address/cell_griddisk_name, such as the following:

    o/192.168.10.1/data_CD_01_dm01cel01
    
  • SQL plans may include the keyword storage, as in TABLE ACCESS STORAGE FULL, to indicate that some operations may be offloaded to the storage servers.

  • Operations such as backup and recovery use Oracle Recovery Manager (RMAN), and all data for backup and recovery continues to pass through the database instances. The backup clients for RMAN should be installed on the database servers in Oracle Exadata to facilitate integration with enterprise backup solutions in the same way as in traditional environments.

  • The practice of deploying one or more non-production environments for development, testing, and quality assurance still applies to Oracle Exadata environments.
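
The Oracle ASM disk path format listed above (o/cell_ip_address/cell_griddisk_name) can be split mechanically, for example when matching disks to storage servers. The following is a minimal sketch in POSIX shell. The function name parse_asm_disk_path is hypothetical and is not an Oracle utility, and the sketch assumes the path has exactly the three-part form shown above.

```shell
# Split an Oracle ASM disk path of the form o/cell_ip_address/cell_griddisk_name.
# parse_asm_disk_path is a hypothetical helper, not an Oracle utility.
parse_asm_disk_path() {
  # Accept only paths with the "o/" prefix and two non-empty parts after it.
  case "$1" in
    o/?*/?*) ;;
    *) echo "not an Exadata grid disk path: $1" >&2; return 1 ;;
  esac
  rest=${1#o/}                      # drop the leading "o/"
  # Text before the first "/" is the cell IP; text after it is the grid disk.
  printf 'cell_ip=%s griddisk=%s\n' "${rest%%/*}" "${rest#*/}"
}

parse_asm_disk_path "o/192.168.10.1/data_CD_01_dm01cel01"
# prints: cell_ip=192.168.10.1 griddisk=data_CD_01_dm01cel01
```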

1.2 Powering On and Off Oracle Exadata Rack

This section contains the procedures for powering on and off the components of an Oracle Exadata Rack.

1.2.1 Non-emergency Power Procedures

When the outage is planned, use these procedures for powering on and off the components of Oracle Exadata Rack in an orderly fashion.

1.2.1.1 Powering On Oracle Exadata Rack

Oracle Exadata Rack is powered on either by pressing the power button on the front of the servers, or by logging in to the ILOM interface and applying power to the system. When a database server is powered on and the operating system boots, Oracle Clusterware is automatically started, if it is installed. Oracle Clusterware then starts all resources that are configured to start automatically.

The power on sequence is as follows:

  1. Rack, including switches.

    Ensure the switches have had power applied for a few minutes to complete power-on configuration before starting Exadata Storage Servers.

  2. Exadata Storage Servers.

    Ensure all Exadata Storage Servers complete the boot process before starting the database servers. It may take five to ten minutes for all services to start.

  3. Database servers.

1.2.1.2 Powering On Servers Remotely using ILOM

Servers can be powered on remotely using the Integrated Lights Out Manager (ILOM) interface.

The ILOM can be accessed using the Web console, the command-line interface (CLI), IPMI, or SNMP. For example, to apply power to server dm01cel01 using IPMI, where dm01cel01-ilom is the host name of the ILOM for the server to be powered on, run the following command from a server that has IPMItool installed:

# ipmitool -I lanplus -H dm01cel01-ilom -U root chassis power on

The preceding command causes the system to prompt for the password.
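
When several servers are powered on remotely, the power-on sequence still applies: storage servers first, then database servers. The following sketch only prints the ipmitool commands in that order so they can be reviewed before anything is run. The function name print_power_on_plan and all ILOM host names other than dm01cel01-ilom are placeholders chosen for illustration.

```shell
# Print, in the documented order, the ipmitool commands that would power on
# the servers. Nothing is executed here; the output is a plan to review.
# The ILOM host names are placeholders; replace them with your own.
CELL_ILOMS="dm01cel01-ilom dm01cel02-ilom"
DB_ILOMS="dm01db01-ilom dm01db02-ilom"

print_power_on_plan() {
  echo "# Step 1: confirm the rack and switches have had power for a few minutes"
  echo "# Step 2: storage servers"
  for ilom in $CELL_ILOMS; do
    echo "ipmitool -I lanplus -H $ilom -U root chassis power on"
  done
  echo "# Step 3: database servers, after all storage servers finish booting"
  for ilom in $DB_ILOMS; do
    echo "ipmitool -I lanplus -H $ilom -U root chassis power on"
  done
}

print_power_on_plan
```

Each printed ipmitool command prompts for the ILOM password when it is run.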

1.2.1.3 Powering Off Oracle Exadata Rack

Power off the components of the Oracle Exadata Rack in the correct order.

The power off sequence for Oracle Exadata Rack is as follows:

  1. Database servers (Oracle Exadata only).
  2. Exadata Storage Servers.
  3. Rack, including switches.

1.2.1.3.1 Powering Off Database Servers

Stop Oracle Clusterware before restarting or shutting down a database server. Oracle Clusterware is stopped using the following command:

crsctl stop cluster

The following procedure is the recommended shutdown procedure for database servers:

  1. Stop Oracle Clusterware using the following command:
    # GRID_HOME/grid/bin/crsctl stop cluster
    

    If any resources managed by Oracle Clusterware are still running after issuing the crsctl stop cluster command, then the command fails. Use the -f option to unconditionally stop all resources, and stop Oracle Clusterware.

  2. Shut down the operating system using the following command:
    # shutdown -h now
    
1.2.1.3.2 Powering Off Oracle Exadata Storage Servers

Oracle Exadata Storage Servers are powered off and restarted using the Linux shutdown command.

The following command shuts down Oracle Exadata Storage Server immediately:

# shutdown -h now

When powering off Oracle Exadata Storage Servers, all storage services are automatically stopped.

If you use the -r option, then the shutdown command shuts down and then restarts Oracle Exadata Storage Server. The now argument indicates that you want to stop the server immediately.

# shutdown -r now

Another system command to reboot a server is the reboot command. However, shutdown -r now is the preferred command to restart a server. Never use the reboot -f command to shut down Oracle Exadata Storage Servers.

Note:

Do not run successive shutdown or reboot commands. Doing so is essentially the same as running reboot -f.

Note the following when powering off Oracle Exadata Storage Server:

  • All database and Oracle Clusterware processes should be shut down prior to shutting down more than one Oracle Exadata Storage Server.

  • Powering off one Oracle Exadata Storage Server does not affect running database processes or Oracle Automatic Storage Management (Oracle ASM).

  • Powering off or restarting Oracle Exadata Storage Servers can impact database availability.

  • The shutdown commands can be used to power off or reboot Oracle Exadata Storage Server.

See Also:

  • "Shutting Down Exadata Storage Server" if the databases or Oracle Clusterware will remain operational while powering down Oracle Exadata Storage Server

  • SHUTDOWN(8) manual page for details.

1.2.1.3.3 Powering Off Multiple Servers at the Same Time

The dcli utility can be used to run the shutdown command on multiple servers at the same time. Do not run the dcli utility from a server that will be shut down. For example, to shut down all Exadata Storage Servers using the dcli utility, run the command from a database server. The command syntax is as follows:

# dcli -l root -g group_name shutdown -h now

In the preceding syntax, group_name is the file that contains the list of servers to shut down: cell_group for all Exadata Storage Servers, or dbs_group for all database servers.

The following command shows the syntax to shut down all Exadata Storage Servers at the same time:

# dcli -l root -g cell_group shutdown -h now
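
Because dcli must not be run from a server that will be shut down, it can help to check the group file before issuing the command. The following is a minimal sketch of such a check. The function name host_not_in_group is hypothetical and is not part of dcli. The sketch assumes the group file lists one host name per line, and that the name passed in uses the same form (short or fully qualified) as the entries in the file.

```shell
# Return 0 only when the given host name is absent from a dcli group file.
# host_not_in_group is a hypothetical helper, not part of dcli.
# Assumption: the group file holds one host name per line.
host_not_in_group() {
  host="$1"
  group_file="$2"
  # -x matches whole lines only; -F treats the host name as a fixed string.
  if grep -qxF "$host" "$group_file"; then
    echo "refusing: $host is listed in $group_file" >&2
    return 1
  fi
  return 0
}

# Demonstration with a temporary group file and placeholder host names.
demo_group=$(mktemp)
printf '%s\n' dm01cel01 dm01cel02 > "$demo_group"
host_not_in_group dm01db01 "$demo_group" && echo "dm01db01: not in the group"
host_not_in_group dm01cel01 "$demo_group" || echo "dm01cel01: run dcli from another server"
rm -f "$demo_group"
```

In practice the check would be placed in front of the dcli command, so that the shutdown is issued only when the local host name is not in the target group.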

Example 1-1 shows the power off procedure for Oracle Exadata Rack when using the dcli utility to shut down multiple servers at the same time. The commands are run from a database server.

Example 1-1 Powering Off Oracle Exadata Rack Using the dcli Utility

  1. Stop Oracle Clusterware on all database servers using the following command:

    # GRID_HOME/grid/bin/crsctl stop cluster -all
    
  2. Shut down all remote database servers using the following command:

    # dcli -l root -g remote_dbs_group shutdown -h now
    

    In the preceding command, remote_dbs_group is the file that contains a list of all the remote database servers.

  3. Shut down all Exadata Storage Servers using the following command:

    # dcli -l root -g cell_group shutdown -h now
    

    In the preceding command, cell_group is the file that contains a list of all Exadata Storage Servers.

  4. Shut down the local database server using the following command:

    # shutdown -h now
    
  5. Remove power from the rack.

1.2.1.4 Powering On and Off Network Switches

The network switches do not have power switches. They power off when power is removed, by way of the power distribution unit (PDU) or at the breaker in the data center.

1.2.2 Emergency Power-off Considerations

If there is an emergency, then power to Oracle Exadata Rack should be halted immediately. The following emergencies may require powering off Oracle Exadata Rack:

  • Natural disasters such as earthquake, flood, hurricane, tornado, or cyclone.

  • Unusual noise, smell, or smoke coming from the machine.

  • Threat to human safety.

1.2.2.1 Emergency Power-off Procedure

To perform an emergency power-off procedure for Oracle Exadata Rack, turn off power at the circuit breaker or pull the emergency power-off switch in the computer room. After the emergency, contact Oracle Support Services to restore power to the machine.

1.2.2.2 Emergency Power-off Switch

Emergency power-off (EPO) switches are required when computer equipment contains batteries capable of supplying more than 750 volt-amperes for more than five minutes. Systems that have these batteries include internal EPO hardware for connection to a site EPO switch or relay. Use of the EPO switch removes power from Oracle Exadata Rack.

1.2.3 Cautions and Warnings

The following cautions and warnings apply to Oracle Exadata Rack:

  • Do not touch the parts of this product that use high-voltage power. Touching them might result in serious injury.

  • Do not power off Oracle Exadata Rack unless there is an emergency. In that case, follow the Emergency Power-off Procedure.

  • Keep the front and rear cabinet doors closed. Failure to do so might cause system failure or result in damage to hardware components.

  • Keep the top, front, and back of the cabinets clear to allow proper airflow and prevent overheating of components.

  • Use only the supplied hardware.

1.3 Using Auto Service Request to Manage Hardware Faults

Auto Service Request (ASR) is designed to automatically open service requests when specific Oracle Exadata Rack hardware faults occur.

1.3.1 Understanding Auto Service Request

When a hardware problem is detected, Oracle ASR Manager submits a service request to Oracle Support Services. In many cases, Oracle Support Services can begin work on resolving the issue before the database administrator is even aware the problem exists.

To enable this feature, the Oracle Exadata components must be configured to send hardware fault telemetry to the Oracle ASR Manager software. This service covers components in storage servers and Oracle Database servers, such as disks and flash cards.

Oracle ASR Manager must be installed on a server that has connectivity to Oracle Exadata, and an outbound Internet connection using HTTPS or an HTTPS proxy. Oracle recommends that Oracle ASR Manager be installed on a server outside of Oracle Exadata. The following are some of the reasons for the recommendation:

  • If the server or the rack containing Oracle ASR Manager goes down, then Oracle ASR Manager is unavailable for all of the Oracle Exadata components that it supports. This is very important to consider when several Oracle Exadata systems use the Oracle ASR Manager.

  • In order to submit a service request (SR), the server must be able to access the Internet.

Note:

Oracle ASR can only use the management network. Ensure the management network is set up to allow Oracle ASR to run.

Prior to using Oracle ASR, the following must be set up:

  • Oracle Premier Support for Systems or Oracle/Sun Limited Warranty

  • Technical contact responsible for Oracle Exadata

  • Valid shipping address for Oracle Exadata parts

An e-mail message is sent to the technical contact for the activated asset to confirm that the service request was created. Example 1-2 and Example 1-3 show disk failure Simple Network Management Protocol (SNMP) traps sent to Oracle ASR Manager.

Note:

  • Oracle ASR is applicable only for component faults. Not all component failures are covered, though the most common components such as disk, fan, and power supplies are covered.

  • Oracle ASR is not a replacement for other monitoring mechanisms, such as SMTP and SNMP alerts, within the customer data center. Oracle ASR is a complementary mechanism that expedites and simplifies the delivery of replacement hardware. Oracle ASR should not be used for downtime events in high-priority systems. For high-priority events, contact Oracle Support Services directly.

  • There are occasions when a service request may not be automatically filed. This can happen because of the unreliable nature of the SNMP protocol, or loss of connectivity to the Oracle ASR Manager. Oracle recommends that customers continue to monitor their systems for faults, and call Oracle Support Services if they do not receive notice that a service request has been automatically filed.

  • Oracle ASR can monitor Sun Datacenter InfiniBand Switch 36 switches that have firmware release 2.1.2 and later in Oracle Exadata systems running Oracle Exadata System Software release 11.2.3.3.0 or later. Switches may need a field engineer to set the entitlement serial number.

Example 1-2 Example of Exadata Storage Server SNMP Trap

This example shows the SNMP trap for a storage server disk failure. The corresponding hardware alert code is the sunHwTrapFaultMessageID value, HALRT-02001.

2011-09-07 10:59:54 server1.example.com [UDP: [192.85.884.156]:61945]:
RFC1213-MIB::sysUpTime.0 = Timeticks: (52455631) 6 days, 1:42:36.31
SNMPv2-SMI::snmpModules.1.1.4.1.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapHardDriveFault
SUN-HW-TRAP-MIB::sunHwTrapSystemIdentifier = STRING: Sun Oracle Database Machine
1007AK215C
SUN-HW-TRAP-MIB::sunHwTrapChassisId = STRING: 0921XFG004
SUN-HW-TRAP-MIB::sunHwTrapProductName = STRING: SUN FIRE X4270 M2 SERVER
SUN-HW-TRAP-MIB::sunHwTrapSuspectComponentName = STRING: SEAGATE ST32000SSSUN2.0T;
Slot: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultClass = STRING: NULL
SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02001
SUN-HW-TRAP-MIB::sunHwTrapFaultUUID = STRING: acb0a175-70b8-435f-9622-38a9a55ee8d3
SUN-HW-TRAP-MIB::sunHwTrapAssocObjectId = OID: SNMPv2-SMI::zeroDotZero
SUN-HW-TRAP-MIB::sunHwTrapAdditionalInfo = STRING: Exadata Storage Server: 
cellname  Disk Serial Number:   E06S8K 
server1.example.com failure trap. 

Example 1-3 Example of Oracle Database Server SNMP Trap

This example shows the SNMP trap from an Oracle database server disk failure. The corresponding hardware alert code is the sunHwTrapFaultMessageID value, HALRT-02007.

2011-09-09 10:59:54 dbserv01.example.com [UDP: [192.22.645.342]:61945]:
RFC1213-MIB::sysUpTime.0 = Timeticks: (52455631) 6 days, 1:42:36.31
SNMPv2-SMI::snmpModules.1.1.4.1.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapHardDriveFault
SUN-HW-TRAP-MIB::sunHwTrapSystemIdentifier = STRING: Sun Oracle Database Machine
1007AK215C
SUN-HW-TRAP-MIB::sunHwTrapChassisId = STRING: 0921XFG004
SUN-HW-TRAP-MIB::sunHwTrapProductName = STRING: SUN FIRE X4170 M2 SERVER
SUN-HW-TRAP-MIB::sunHwTrapSuspectComponentName = STRING: HITACHI H103030SCSUN300G
Slot: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultClass = STRING: NULL
SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02007
SUN-HW-TRAP-MIB::sunHwTrapFaultUUID = STRING: acb0a175-70b8-435f-9622-38a9a55ee8d3
SUN-HW-TRAP-MIB::sunHwTrapAssocObjectId = OID: SNMPv2-SMI::zeroDotZero
SUN-HW-TRAP-MIB::sunHwTrapAdditionalInfo = STRING: Exadata Database Server: db03 
Disk Serial Number: HITACHI H103030SCSUN300GA2A81019GGDE5E 
dbserv01.example.com failure trap. 
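
When reviewing saved trap text like the examples above, the hardware alert code can be pulled out of the sunHwTrapFaultMessageID field. The following is a minimal sketch that reads trap text on standard input. The function name extract_fault_id is hypothetical, and the sketch assumes the field appears on one line in the "name = STRING: value" layout shown in the examples.

```shell
# Print the value of sunHwTrapFaultMessageID from trap text read on stdin.
# extract_fault_id is a hypothetical helper name used for illustration.
extract_fault_id() {
  # Keep only lines that carry the field, and print the first token after "STRING:".
  sed -n 's/^.*sunHwTrapFaultMessageID = STRING: *\([^ ]*\).*$/\1/p'
}

# Two lines taken from Example 1-2.
extract_fault_id <<'EOF'
SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02001
EOF
# prints: HALRT-02001
```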

1.3.2 Installing and Configuring ASR

Oracle recommends installing Oracle Auto Service Request (ASR) on a standalone server running Oracle Solaris or Oracle Linux.

After installation is complete, configure fault telemetry destinations for the servers on Oracle Exadata. The Oracle Exadata servers can be set up during initial configuration. Oracle Exadata Deployment Assistant (OEDA) collects the configuration information, and then configures the servers.

Note:

When configuring the Integrated Lights Out Manager (ILOM) alert settings, do not remove the rules at the top of the rule list. To add a new rule, enter the rule at the bottom of the rule list.

To install and configure Oracle ASR after initial configuration, refer to the installation and configuration information available in Oracle Auto Service Request Quick Installation Guide for Oracle Exadata Database Machine.

1.4 Monitoring the System Using Oracle Enterprise Manager Cloud Control

Oracle Exadata Database Machine can be monitored by Oracle Enterprise Manager Cloud Control agents using the Oracle Exadata Plug-in and the Oracle Systems Infrastructure Plug-in. Oracle Exadata Database Machine is discovered and monitored as a system target in Oracle Enterprise Manager Cloud Control. Individual database servers, storage servers, and switches are grouped together under the system target for Oracle Exadata Database Machine so they can be monitored as a group.

The Oracle Exadata Storage Server metrics are collected and managed by Management Server (MS). When used with Oracle Enterprise Manager Cloud Control, the metrics are presented as Oracle Enterprise Manager Cloud Control metrics.

All Exadata server alerts are delivered to Oracle Enterprise Manager Cloud Control using SNMP. The Exadata hardware and software components are monitored by Integrated Lights Out Manager (ILOM) and Oracle Exadata System Software in the following ways:

  • Hardware components are monitored by ILOM. When a hardware component reports a failure or an exceeded threshold, ILOM reports the failure as an SNMP trap to MS. MS processes the trap, creates an alert, and delivers the alert to the Oracle Enterprise Manager Cloud Control agent.

  • Hardware and software components are also monitored by MS directly. When a failure occurs or a threshold is exceeded, MS processes the trap, creates an alert, and delivers the alert to the Oracle Enterprise Manager Cloud Control agent.

From the end-user perspective, there is no difference between the two types of alerts. The alert message contains the corrective action to resolve the alert.

1.5 Monitoring the System Using Oracle Configuration Manager

Oracle Configuration Manager collects configuration information and uploads it to the Oracle repository.

When the configuration information is uploaded daily, Oracle Support Services can analyze the data and provide better service. When a service request is logged, the configuration data is associated with the service request. The following are some of the benefits of Oracle Configuration Manager:

  • Reduced time for problem resolution

  • Proactive problem avoidance

  • Improved access to best practices, and the Oracle knowledge base

  • Improved understanding of the customer's business needs

  • Consistent responses and services

The Oracle Configuration Manager software is installed and configured in each ORACLE_HOME directory on a server. For clustered databases, only one instance is configured for Oracle Configuration Manager. A configuration script is run on every database on the server. The Oracle Configuration Manager collectors then send their data to a centralized Oracle repository.

1.6 Changing Component Passwords

The passwords for the components can be changed after initial configuration.

1.6.1 Changing the Database Server Passwords

The user account passwords and the GRUB password can be changed on the database servers. The default user accounts on the database server are root and the software owner account. Typically, the software owner account is oracle or grid.

1.6.1.1 Changing the User Account Password on the Database Server

As the root user, use the passwd command to change operating system user passwords.

At the operating system prompt, use the following command, where user_name identifies the user account that you want to change:

# passwd user_name

1.6.1.2 Changing the GRUB Account Password on the Database Server

Use the host_access_control script to change the GRUB account password on database servers.

Run the following command as the root user to change the GRUB account password on the database server:

# /opt/oracle.cellos/host_access_control grub-password

1.6.2 Changing the Exadata Storage Server Passwords

As the root user, use the passwd command to change operating system user passwords on Exadata Storage Server.

The default user accounts on Exadata Storage Servers are root, celladmin, and cellmonitor.

At the operating system prompt, use the following command, where user_name identifies the user account that you want to change:

# passwd user_name

1.6.3 Changing the Power Distribution Unit Password

The default user account for the power distribution unit (PDU) is admin. The following procedure describes how to change the password for the PDU:

  1. Use a Web browser to access the PDU metering unit by entering the IP address for the unit in the address line of the browser. The Current Measurement page appears.
  2. Click Network Configuration in the upper left of the page.
  3. Log in as the admin user on the PDU metering unit.
  4. Locate the Admin/User fields. Only letters and numbers are allowed for user names and passwords.
  5. Enter up to five users and passwords in the Admin/Users fields.
  6. Designate each user to be either an administrator or user.
  7. Click Submit to set the users and passwords.

1.6.4 Changing the ILOM Password

The default user account for the Integrated Lights Out Manager (ILOM) is root. The following procedure describes how to change the password for the ILOM:

  1. Log in to the ILOM using SSH.
  2. Use the following command to change the password:
    set /SP/users/user_name password
    

    In the preceding command, user_name is the user account to be changed. The following is an example of the command:

    set /SP/users/user1 password
    
    Changing password for user /SP/users/user1/password...
    Enter new password:********
    Enter new password again:********
    New password was successfully set for user /SP/users/user1
    

1.6.5 Changing the InfiniBand Switch Password

This procedure describes how to change a password for the InfiniBand switch.

The default user accounts for the InfiniBand switch are root, ilom-admin, ilom-user, ilom-operator, and nm2user.

  1. Log in to the InfiniBand switch with SSH using the following command:
    ssh user_name@switch_name
    

    In the preceding command, user_name is the name of the user, and switch_name is the name of the InfiniBand switch.

  2. Check the firmware version of the switch.
  3. Use the ILOM to change the password using the following commands:
    ssh -l ilom-admin switch_name
    
    set /SP/users/user_name password
    

See Also:

Sun Datacenter InfiniBand Switch 36 Administration Guide for Firmware Version 2.1 at http://docs.oracle.com/cd/E36265_01/html/E36266/gentextid-2626.html#scrolltoc

1.6.6 Changing the Cisco Management Network Switch or RoCE Network Fabric Switch Password

To change the password for the Cisco Management Network Switch or RDMA over Converged Ethernet (RoCE) switch, you must connect to the switch and run a command on the switch.

1.6.6.1 Changing the Password for Cisco 93xx Switches

Use the change-password command to change the password for the Cisco Nexus 93108-1G, Cisco Nexus 9348, or Cisco Nexus 9336C-FX2 switch.

  1. Access the switch using ssh or via the serial port.
    my_host$ ssh admin@my_switch 
    User Access Verification 
    Password:
  2. Change the password.
    switch# change-password
      Enter old password:
      Enter new password:
      Confirm new password: 
  3. Copy the running configuration to the startup configuration.
    switch# copy running-config startup-config
  4. Exit from the session.
    switch# exit

1.6.6.2 Changing the Cisco 4948 Ethernet Switch Password

You can change the passwords for both serial port access and SSH access to the switch.

There are two different methods to access the switch. One is through a serial port and the other is through ssh.

  • Serial port: When using serial port access, there are no user accounts, so the enable password is all that is required.

  • ssh: When the switch is accessed via ssh, you must supply a user account and password before you can change the password.

During the installation of the system, the admin user is created and you can use this user to access the switch using ssh.

1.6.6.2.1 Changing the Cisco 4948 Ethernet Switch Password for Serial Port Access

Use this procedure to change the enable password, which is the only password required for serial port access.

  1. Access the switch using telnet, ssh, or via the serial port.
    If you use the serial port for access, you are not prompted for a user name or password. The switch prompt appears directly.
    my_host> ssh admin@my_switch 
    Using keyboard-interactive authentication. 
    Password:

    Note:

    • In Oracle Linux 5 update 5 or higher, telnet was removed for security reasons. You may need to install the telnet client package on the compute node before you can access the switch using telnet.

    • Before using SSH to access the switch, you must enable SSH in the switch following the steps described in "Configuring SSH on Cisco Catalyst 4948 Ethernet Switch" (My Oracle Support Doc ID 1415044.1).

  2. Change to enable mode.
    Switch> enable
    
  3. Set the password.
    Switch# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)# no enable password
    Switch(config)# enable secret new_password
    Switch(config)# end
    Switch# write memory
    *Sep 15 14:25:05.893: %SYS-5-CONFIG_I: Configured from console by console
    Building configuration...
    Compressed configuration from 2502 bytes to 1085 bytes [OK]
    
  4. Save the current configuration.
    Switch# copy running-config startup-config
    
  5. Exit from the session.
    Switch# exit
    
1.6.6.2.2 Changing the Cisco 4948 Ethernet Switch Password for Telnet or SSH Access

Use this procedure to change the password of a user account that is used for Telnet or SSH access to the switch.

  1. Access the switch using telnet, ssh, or via the serial port.
    If you use the serial port for access, you are not prompted for a user name or password. The switch prompt appears directly.
    my_host> ssh admin@my_switch 
    Using keyboard-interactive authentication. 
    Password:
  2. Change to enable mode.
    Switch> enable
    Password:
  3. Verify the password will be sent in encrypted format.

    Use the following command to check that service password-encryption is set.

    Switch# show running-config all | include service password-encryption
    service password-encryption

    If this is set to no service password-encryption, then passwords will be sent in clear text. You can change this setting, as shown in Step 5.

  4. Enter configuration mode.
    Switch# configure terminal 
    Enter configuration commands, one per line. End with CNTL/Z. 
    
  5. If password encryption is set to no service password-encryption, then change it to service password-encryption.
    Switch(config)# service password-encryption
  6. Change the password for a specific user.
    Switch(config)# username user_name password new_password
  7. Exit configuration mode and save the changes.
    Switch(config)# end 
    Switch# write memory
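The check in Step 3 can be illustrated with a short Python sketch. It assumes you have captured the text printed by show running-config all | include service password-encryption into a string; the function name is hypothetical and is not part of any Cisco or Oracle tool.

```python
def password_encryption_enabled(show_output):
    """Interpret captured 'show running-config' text.

    Returns True for 'service password-encryption' and False for
    'no service password-encryption'. Raises ValueError if neither
    line is present in the captured text.
    """
    for line in show_output.splitlines():
        text = line.strip()
        if text == "service password-encryption":
            return True
        if text == "no service password-encryption":
            return False
    raise ValueError("no service password-encryption line found")


if __name__ == "__main__":
    captured = "service password-encryption\n"
    print(password_encryption_enabled(captured))
```

If the function returns False, Step 5 applies before you change any password.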

1.6.7 Changing the KVM Password

The default user account for the KVM is Admin. The following procedure describes how to change the password for the KVM:

  1. Pull the KVM tray out from the front of the rack, and open it using the handle.
  2. Touch the touch pad.
  3. Toggle between the host and KVM interface by pressing the Ctrl key on the left side twice, similar to a double-click on a mouse.
  4. Select Local from User Accounts.
  5. Click Admin under Users.
  6. Set a password for the Admin account. Do not modify any other parameters.
  7. Click Save.

1.7 Determining the Server Model

Use the exadata.img.hw command to determine the model of the cell or database server.

/usr/sbin/exadata.img.hw --get model

Reference the following table for the server model names and numbers for Oracle Exadata.

Table 1-2 Oracle Exadata Server Models

Oracle Exadata                 Database Server Model                Exadata Storage Server Model
-----------------------------  -----------------------------------  ------------------------------------
Oracle Exadata X10M            ORACLE SERVER E5-2L                  ORACLE SERVER X10-2L (High Capacity)
                                                                    ORACLE SERVER X10-2L_EXTREME_FLASH
Oracle Exadata X9M-2           ORACLE SERVER X9-2                   ORACLE SERVER X9-2L (High Capacity)
                                                                    ORACLE SERVER X9-2L_EXTREME_FLASH
Oracle Exadata X9M-8           ORACLE SERVER X8-8                   ORACLE SERVER X9-2L (High Capacity)
                                                                    ORACLE SERVER X9-2L_EXTREME_FLASH
Oracle Exadata X8M-2           ORACLE SERVER X8-2                   ORACLE SERVER X8-2L (High Capacity)
                                                                    ORACLE SERVER X8-2L_EXTREME_FLASH
Oracle Exadata X8-2            ORACLE SERVER X8-2                   ORACLE SERVER X8-2L (High Capacity)
                                                                    ORACLE SERVER X8-2L_EXTREME_FLASH
Oracle Exadata X8M-8           ORACLE SERVER X8-8                   ORACLE SERVER X8-2L (High Capacity)
                                                                    ORACLE SERVER X8-2L_EXTREME_FLASH
Oracle Exadata X8-8            ORACLE SERVER X8-8                   ORACLE SERVER X8-2L (High Capacity)
                                                                    ORACLE SERVER X8-2L_EXTREME_FLASH
Oracle Exadata X7-2            ORACLE SERVER X7-2                   ORACLE SERVER X7-2L (High Capacity)
                                                                    ORACLE SERVER X7-2L_EXTREME_FLASH
Oracle Exadata X7-8            ORACLE SERVER X7-8                   ORACLE SERVER X7-2L (High Capacity)
                                                                    ORACLE SERVER X7-2L_EXTREME_FLASH
Oracle Exadata X6-2            ORACLE SERVER X6-2                   ORACLE SERVER X6-2L
                                                                    ORACLE SERVER X6-2L_EXTREME_FLASH
Oracle Exadata X6-8            ORACLE SERVER X5-8                   ORACLE SERVER X6-2L
                                                                    ORACLE SERVER X6-2L_EXTREME_FLASH
Oracle Exadata X5-2            ORACLE SERVER X5-2                   ORACLE SERVER X5-2L
Oracle Exadata X5-8            ORACLE SERVER X5-8                   ORACLE SERVER X5-2L
Oracle Exadata X4-2            SUN SERVER X4-2                      SUN SERVER X4-2L
Oracle Exadata X4-8 Full Rack  SUN SERVER X4-8                      SUN SERVER X4-2L
                                                                    ORACLE SERVER X5-2L
Oracle Exadata X3-2            SUN FIRE X4170 M3                    SUN FIRE X4270 M3
Oracle Exadata X3-8 Full Rack  Sun Fire X4800 M2                    SUN FIRE X4270 M3
Oracle Exadata X2-2            SUN FIRE X4170 M2 SERVER             SUN FIRE X4270 M2 SERVER
Oracle Exadata X2-8 Full Rack  Sun Fire X4800 or Sun Fire X4800 M2  SUN FIRE X4270 M2 SERVER
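
Because several Oracle Exadata systems share the same database server model, the model string alone does not always identify the system. The following Python sketch shows one way to look up the candidate systems from the database server column of Table 1-2. The dictionary and function names are hypothetical, and the sketch assumes that the string printed by exadata.img.hw --get model matches the table entry apart from letter case and spacing.

```python
# Database server models from Table 1-2, mapped to the systems that use them.
DB_SERVER_MODEL_TO_SYSTEMS = {
    "ORACLE SERVER E5-2L": ["Oracle Exadata X10M"],
    "ORACLE SERVER X9-2": ["Oracle Exadata X9M-2"],
    "ORACLE SERVER X8-8": ["Oracle Exadata X9M-8", "Oracle Exadata X8M-8",
                           "Oracle Exadata X8-8"],
    "ORACLE SERVER X8-2": ["Oracle Exadata X8M-2", "Oracle Exadata X8-2"],
    "ORACLE SERVER X7-2": ["Oracle Exadata X7-2"],
    "ORACLE SERVER X7-8": ["Oracle Exadata X7-8"],
    "ORACLE SERVER X6-2": ["Oracle Exadata X6-2"],
    "ORACLE SERVER X5-8": ["Oracle Exadata X6-8", "Oracle Exadata X5-8"],
    "ORACLE SERVER X5-2": ["Oracle Exadata X5-2"],
    "SUN SERVER X4-2": ["Oracle Exadata X4-2"],
    "SUN SERVER X4-8": ["Oracle Exadata X4-8 Full Rack"],
    "SUN FIRE X4170 M3": ["Oracle Exadata X3-2"],
    "SUN FIRE X4800 M2": ["Oracle Exadata X3-8 Full Rack",
                          "Oracle Exadata X2-8 Full Rack"],
    "SUN FIRE X4170 M2 SERVER": ["Oracle Exadata X2-2"],
    "SUN FIRE X4800": ["Oracle Exadata X2-8 Full Rack"],
}


def systems_for_db_server_model(model):
    """Return the Table 1-2 systems whose database servers use this model."""
    # Normalize case and whitespace before the lookup (an assumption about
    # how the command output may differ from the table text).
    key = " ".join(model.upper().split())
    return DB_SERVER_MODEL_TO_SYSTEMS.get(key, [])


if __name__ == "__main__":
    print(systems_for_db_server_model("ORACLE SERVER X5-8"))
```

An empty list means the string is not a database server model listed in the table, for example because the command was run on a storage server.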

1.8 Monitoring Ambient Temperature of Servers

Maintaining environmental temperature conditions within design specification for Oracle Exadata Rack helps achieve maximum efficiency and targeted component service lifetimes. The impact of validating the ambient temperature range is minimal. The impact of corrective actions will vary depending on the environmental conditions.

The ambient temperature range for operating Oracle Exadata Rack is 5 to 32 degrees Celsius (41 to 89.6 degrees Fahrenheit). However, for maximum efficiency and component longevity, the optimal temperature range is 21 to 23 degrees Celsius (70 to 74 degrees Fahrenheit). For details, see Temperature and Humidity Requirements.

Use the following command as the root user on the first database server in the cluster to verify the temperature on all servers in the cluster:

dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root 'ipmitool sunoem cli "show /SYS/T_AMB" | grep value'

The following is an example of the output from the command:

dm01db01: value = 21.440 degree C
dm01db02: value = 21.440 degree C
dm01db03: value = 22.190 degree C
...
dm01db08: value = 21.940 degree C
dm01cel01: value = 22.000 degree C
dm01cel02: value = 22.000 degree C
dm01cel03: value = 23.000 degree C
...
dm01cel14: value = 22.080 degree C

If any reported value is outside the ambient temperature range, then investigate and correct the problem. The following items should be checked:

  • Sufficient air flow into the rack

  • Room temperature is within the specified range

  • Rear of rack is clear of obstructions
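
For racks with many servers, the comparison against the documented ranges can be scripted. The following Python sketch parses text in the format of the example output above and labels each server against the operating range (5 to 32 degrees Celsius) and the optimal range (21 to 23 degrees Celsius). The function name and the three status labels are hypothetical; the sketch assumes you have already captured the dcli output as a string.

```python
import re

OPERATING_RANGE_C = (5.0, 32.0)   # documented operating range
OPTIMAL_RANGE_C = (21.0, 23.0)    # documented optimal range

# Matches lines such as "dm01db01: value = 21.440 degree C".
LINE_RE = re.compile(
    r"^(?P<host>\S+):\s*value = (?P<temp>-?\d+(?:\.\d+)?) degree C"
)


def classify_temperatures(dcli_output):
    """Map each host to (temperature, status); unparsable lines are skipped."""
    results = {}
    for line in dcli_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        temp = float(match["temp"])
        if not OPERATING_RANGE_C[0] <= temp <= OPERATING_RANGE_C[1]:
            status = "out of range"
        elif OPTIMAL_RANGE_C[0] <= temp <= OPTIMAL_RANGE_C[1]:
            status = "optimal"
        else:
            status = "acceptable"
        results[match["host"]] = (temp, status)
    return results


if __name__ == "__main__":
    sample = "dm01db01: value = 21.440 degree C\ndm01cel03: value = 23.000 degree C"
    for host, (temp, status) in classify_temperatures(sample).items():
        print(host, temp, status)
```

Any server labeled "out of range" calls for the checks listed above.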

1.9 Replacing a Disk Controller Battery Backup Unit

The disk controller battery backup unit (disk controller BBU) resides on a drive tray in the database servers and Exadata Storage Servers. The disk controller BBU can be replaced without downtime for the server or the applications running on the server.

Note:

The procedures in this section do not apply to on-controller battery backup units. Replacement of those units requires a system shutdown because the system must be opened to access the controller card.

1.9.1 Replacing a Disk Controller BBU on a Database Server

This section describes how to replace a disk controller BBU on a database server.

Note:

After any maintenance procedure, Oracle recommends using the Oracle EXAchk tool. The tool is available with My Oracle Support note 1070954.1.

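The high-level steps are:

  • Step 1: Prepare the Disk Controller BBU for Removal
  • Step 2: Replace the Disk Controller BBU
  • Step 3: Enable and Verify the New Disk Controller BBU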

1.9.1.1 Step 1: Prepare the Disk Controller BBU for Removal

The method of removing the Disk Controller BBU depends on the Oracle Exadata Database Machine model.

On certain Oracle Exadata Database Machine X3-2, X4-2, and X4-8 database nodes, and Oracle Exadata Database Machine X3-2, X4-2, X3-8, and X4-8 storage servers, the BBU is remote mounted and does not require a system shutdown to be accessed. However, you must still prepare it for removal from the RAID HBA to avoid the risk of data corruption to the disk volumes.

Note:

There is no remote mount BBU option for Oracle Exadata Database Machine X3-8 database nodes.

1.9.1.1.1 Preparing Systems with Remote Mount BBU

This topic describes how to prepare to remove the disk controller BBU on systems with a remote mount BBU.

If your system does not have a remote mount BBU, see Preparing Systems That Do Not Have a Remote Mount BBU.

  1. Log in as the root user.
  2. Get the version of the image that is running on the server in the rack that requires service.
    # imageinfo -ver
    11.2.3.2.1.130302
    

    The version is the first five parts, such as 11.2.3.2.1 in the example. The last part is the image date.

  3. Prepare the disk controller BBU for removal.

    If you are running version 12.1.2.1.0 or later:

    1. Drop the disk controller BBU for replacement:
      DBMCLI> alter dbserver bbu drop for replacement
    2. Verify the BBU status has been updated.
      DBMCLI> list dbserver attributes bbustatus
               dropped for replacement

    If you are running version 11.2.3.3.0 or later, but earlier than 12.1.2.1.0:

    1. Drop the disk controller BBU for replacement.
      # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -drop_bbu_for_replacement
      
    2. Verify the status has been updated.
      # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -list_bbu_status
      BBU status: dropped for replacement.
      

    If you are running version 11.2.3.2.x:

    1. Locate the server in the rack being serviced, and turn on the indicator light.

      Exadata Storage Servers are identified by a number 1 through 18, where 1 is the lowest Storage Server in the rack installed in RU2, counting up to the top of the rack.

      Exadata Database Nodes are identified by a number 1 through 8, where 1 is the lowest database node in the rack, installed in RU16.

      Turn on the locate indicator light for easier identification of the server being serviced. If the server number has been identified, then the Locate Button on the front panel may be pressed. To turn on the indicator light remotely, use any of the following methods:

      • From a login to CellCLI on Exadata Storage Servers:

        CellCLI> alter cell led on
        
      • From a login to the server's ILOM:

        -> set /SYS/LOCATE value=Fast_Blink
        
      • From a login to the server's root account:

        # ipmitool chassis identify force
        Chassis identify interval: indefinite
        
    2. Check that the HBA can see the battery, and check its current status.

      Note:

      If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

      # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
      

      The default output should show that the battery is still visible, and may show low voltage or other issues depending on the fault. If the battery has hard failed and is no longer accessible to the HBA, the command may return an error reading the BBU.

    3. Verify the current cache policy for all logical volumes.
      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      

      The default cache policy should be WriteBack for all volumes. If the battery is functioning normally, the current cache policy is reported as WriteBack. However, if the battery has failed, the current cache policy may be reported as WriteThrough.

    4. Set the cache policy for all logical volumes to WriteThrough cache mode, which does not use the battery.
      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    5. Verify the current cache policy for all logical volumes is now WriteThrough.
      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
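
The version-dependent branching in Step 3 can be summarized in a small Python sketch. It takes the string printed by imageinfo -ver, keeps the first five parts as the version, and returns a label for the branch that applies. The function names and the returned labels are hypothetical and exist only for illustration.

```python
def parse_image_version(imageinfo_ver):
    """Return the first five parts of an image version as a tuple of ints.

    The sixth part, when present, is the image date and is ignored.
    """
    parts = imageinfo_ver.strip().split(".")
    if len(parts) < 5:
        raise ValueError("expected at least five version parts")
    return tuple(int(part) for part in parts[:5])


def bbu_prepare_method(imageinfo_ver):
    """Pick the preparation branch described in Step 3 for this version."""
    version = parse_image_version(imageinfo_ver)
    if version >= (12, 1, 2, 1, 0):
        return "DBMCLI"
    if version >= (11, 2, 3, 3, 0):
        return "exadata_mon_hw_asr.pl"
    if version[:4] == (11, 2, 3, 2):
        return "manual MegaCli64 steps"
    return "not covered by this procedure"


if __name__ == "__main__":
    print(bbu_prepare_method("11.2.3.2.1.130302"))
```

Comparing tuples of integers avoids the ordering mistakes that plain string comparison makes with multi-digit version parts.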
      
1.9.1.1.2 Preparing Systems That Do Not Have a Remote Mount BBU

This topic describes how to prepare to remove the disk controller BBU on systems without a remote mount BBU.

If the system does not have the remote mounted battery installed, you need to shut down the node for which the battery requires replacement.

If your system has a remote mount BBU, see Preparing Systems with Remote Mount BBU.

  1. Revert all the RAID disk volumes to WriteThrough mode.

    This ensures that all data in the RAID cache memory is flushed to disk and is not lost when the battery is replaced.

    Note:

    If you are running Oracle Exadata System Software 19.1.0 or later, substitute /opt/MegaRAID/storcli/storcli64 for /opt/MegaRAID/MegaCli/MegaCli64 in the following commands.

    1. Set all logical volumes cache policy to WriteThrough cache mode.

      Note:

      If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    2. Verify the current cache policy for all logical volumes is now WriteThrough, which does not use the battery.
      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      
  2. Shut down the server operating system.
    1. Perform the steps in Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration (My Oracle Support Doc ID 1093890.1).
    2. Change the environment to point to the Oracle Grid Infrastructure Home.

      Run the following commands as the root user, where the 1 of +ASM1 refers to the database node number:

      # . oraenv
      ORACLE_SID = [root] ? +ASM1
      The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
      

      For example, for database node 3, the value would be +ASM3.

    3. Shut down Oracle Clusterware Services prior to powering down the database node.

      Run the following commands as the root user:

      # $ORACLE_HOME/bin/crsctl stop crs
      

      Or:

      # Grid_home/bin/crsctl stop crs
      

      Grid_home is typically set to /u01/app/11.2.0/grid, but this can vary depending on your installation configuration.

    4. Verify that Oracle Clusterware Services have been stopped.

      There should be no Clusterware processes running.

      # ps -ef | grep css
      
    5. Shut down the server operating system.
      • Linux:

        # shutdown -hP now
        
      • Solaris:

        # shutdown -y -i 5 -g 0
        
1.9.1.2 Step 2: Replace the Disk Controller BBU

In this step, you remove the old disk controller BBU and replace it with the new BBU.

Exadata X3-2, X4-2, or X4-8 Compute Nodes and X3-2, X3-8, X4-2, or X4-8 Storage Cell nodes with the Remote Battery

These steps apply to Exadata nodes based on X3-2, X4-2, X4-8 and X3-2L, X4-2L servers with the remote battery installed.

  1. Locate the battery slot marked with an orange and white BBU label.

    X3-2 and X4-2 Compute nodes: this is the upper right-most slot on the front of the chassis, labeled BBU (previously designated "HDD7").

    X4-8 Compute nodes: this is the lower slot, second from the left, on the rear of the chassis, labeled BBU.

    X3-2L and X4-2L Storage cells: this is the right-hand slot on the rear of the chassis above PS1, labeled BBU (previously designated "REAR HDD 1").

  2. Unlatch and carefully slide out the old BBU carrier.

  3. Insert and carefully slide in the new BBU carrier, and latch it closed.

Exadata X3-2L or X4-2L Storage Cell nodes without the Remote Battery

Replace the existing HBA BBU with a remote-mounted battery kit (part 7060020) following the CAP detailed in Support Note 1561949.1.

Exadata X3-2 or X4-2 Database Machine Compute nodes without the Remote Battery

Replace the existing HBA BBU with a remote-mounted battery kit (part 7060020) following the CAP detailed in Support Note 1561949.1.

Exadata X3-8 Database Machine Compute nodes

These steps are relevant to Exadata nodes based on X2-8 servers (formerly x4800m2).

Note:

The Exadata X3-8 Database Machine Compute nodes are based on X2-8 servers. See Table 1-2.

  1. Remove CMOD0 from the server and set it on a flat, antistatic surface.

  2. Remove the CMOD top cover.

  3. Remove the HBA REM with BBU attached:

    1. Lift the REM ejector handle and rotate it to its fully open position.

    2. Lift the connector end of the REM and pull the REM away from the retaining clip on the front support bracket.

  4. Remove the old BBU from the REM:

    1. Use a No. 1 Phillips screwdriver to remove the 3 retaining screws that secure the battery to the REM card. Do NOT attempt to remove any screws from the top side of the REM and battery pack; those screws hold the standoffs that provide the bottom screw holes and should remain with the battery pack.

    2. Detach the battery pack including circuit board from the REM by gently lifting it from its circuit board connector.

  5. Install the new BBU on the REM.

    1. Attach the battery pack circuit board connector to mate with the REM's connector.

    2. Use a No. 1 Phillips screwdriver to secure the battery to the REM. If the BBU comes with a package of new screws, then use the new screws. Do not reuse the screws from the old BBU.

  6. Re-install the HBA REM with BBU attached.

    1. Ensure that the REM ejector lever is in the closed position. The lever should be flat with the REM support bracket.

    2. Position the REM so that the battery is facing downward and the connector is aligned with the connector on the motherboard.

    3. Slip the opposite end of the REM under the retaining clips on the front support bracket and ensure that the notch on the edge of the REM is positioned around the alignment post on the bracket.

    4. Carefully lower and position the connector end of the REM until the REM contacts the connector on the motherboard, ensuring that the connectors are aligned. To seat the connector, carefully push the REM downward until it is in a level position.

  7. Install the cover on the CMOD.

  8. Return the CMOD to the CMOD0 slot in the unit.

1.9.1.3 Step 3: Enable and Verify the New Disk Controller BBU

Similar to Step 1: Prepare the Disk Controller BBU for Removal, this step has two subsections:

For Systems with Remote Mount BBU

For systems with remote mount BBU, the system was not shut down at the end of Step 1: Prepare the Disk Controller BBU for Removal.

If you are using Oracle Exadata System Software version 11.2.3.3.0 or later:

Note:

If you are running Oracle Exadata System Software 19.1.0 or later, substitute /opt/MegaRAID/storcli/storcli64 for /opt/MegaRAID/MegaCli/MegaCli64 in the following commands.

  1. Log in as the root user.

  2. Verify the disk controller BBU battery state is present and seen by the RAID controller. It may take several minutes for the new BBU battery to be detected.

    Note:

    If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

    # /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0 | grep BBU
    BBU : Present
    BBU : Yes
    Cache When BBU Bad : Disabled
    
  3. Re-enable the disk controller BBU and disk cache.

    • If you are running Oracle Exadata System Software version 12.1.2.1.0 or later:

      DBMCLI> ALTER DBSERVER BBU REENABLE
    • If you are running Oracle Exadata System Software version earlier than 12.1.2.1.0:

      # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -reenable_bbu
      HDD disk controller battery has been reenabled.
      
  4. Verify the disk controller BBU battery state is operational.

    • If you are running Oracle Exadata System Software version 12.1.2.1.0 or later:

      DBMCLI> LIST DBSERVER ATTRIBUTES bbustatus
    • If you are running Oracle Exadata System Software version earlier than 12.1.2.1.0:

      # /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -list_bbu_status
      BBU status: present
      
  5. Verify the current logical disk drive cache policy uses writeback mode:

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    ... <repeated for each logical volume present>
    
  6. If the current cache policy is WriteThrough mode, and not WriteBack, then check the status of the battery.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -getbbustatus -a0|grep Battery
    BatteryType: iBBU08
    Battery State : Operational
    Battery Pack Missing : No
    Battery Replacement required : No
    

    If the "Battery State" is anything other than "Operational" or "Optimal" (exact term depends on image version), investigate and correct the problem before continuing.

    The following table shows which battery state term each image version reports.

    Model   Exadata image version   Battery State   RAID f/w version
    -----   ---------------------   -------------   ----------------
    X4      12.1.2.1.0              Optimal         12.12.0-0178
    X4      12.1.1.1.1              Optimal         12.12.0-0178
    X3      11.2.3.3.0              Optimal         12.12.0-0178
    X3      11.2.3.2.2              Optimal         12.12.0-0178
    X3      11.2.3.2.1              Operational     12.12.0-0140
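
Because the healthy battery state is reported as either "Operational" or "Optimal", a scripted check should accept both terms. The following Python sketch parses text in the format of the -getbbustatus example output above. The function names are hypothetical, and the sketch checks only the Battery State field.

```python
ACCEPTABLE_STATES = {"Operational", "Optimal"}


def parse_bbu_status(output):
    """Split 'key : value' lines from the captured output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def battery_state_ok(output):
    """Return True if Battery State is Operational or Optimal."""
    return parse_bbu_status(output).get("Battery State") in ACCEPTABLE_STATES


if __name__ == "__main__":
    sample = (
        "BatteryType: iBBU08\n"
        "Battery State : Operational\n"
        "Battery Pack Missing : No\n"
        "Battery Replacement required : No\n"
    )
    print(battery_state_ok(sample))
```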
    

If you are using image version 11.2.3.2.x:

  1. Log in as the root user.

  2. Turn off the server's locate LED.

    # ipmitool chassis identify off
    Chassis identify interval: off
    
  3. Wait approximately 5 minutes for the HBA to recognize and start communicating with the new BBU.

  4. Verify the HBA battery status is Operational and charging.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set all logical drives cache policy to WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    ... <repeated for each logical volume present>
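
When a server has many logical volumes, the WriteBack verification can be automated. The following Python sketch scans text in the format of the -ldpdinfo example output above and reports which "Current Cache Policy" lines do not start with WriteBack. The function name is hypothetical, and the positions it returns simply count the matching lines in order of appearance; they are not MegaCli volume identifiers.

```python
def volumes_not_in_writeback(ldpdinfo_output):
    """Return 0-based positions of Current Cache Policy lines not in WriteBack."""
    not_writeback = []
    position = 0
    for line in ldpdinfo_output.splitlines():
        text = line.strip()
        if not text.startswith("Current Cache Policy:"):
            continue
        policy = text.split(":", 1)[1].strip()
        if not policy.startswith("WriteBack"):
            not_writeback.append(position)
        position += 1
    return not_writeback


if __name__ == "__main__":
    sample = (
        "Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU\n"
        "Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU\n"
    )
    print(volumes_not_in_writeback(sample))
```

An empty list means every logical volume found in the output is currently in WriteBack mode.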
    

 

For Systems That Do Not Have a Remote Mount BBU

For systems that do not have a remote mount BBU, you shut down the system at the end of Step 1: Prepare the Disk Controller BBU for Removal. In this section you restart the system and enable the new BBU.

Note:

If you are running Oracle Exadata System Software 19.1.0 or later, substitute /opt/MegaRAID/storcli/storcli64 for /opt/MegaRAID/MegaCli/MegaCli64 in the following commands.

  1. Power on the server by pressing the power button.
  2. After ILOM has booted, power on the server by pressing the power button, and then connect to the server's console.

    To connect to the console from the ILOM Web browser (preferred): Access the "Remote Control -> Redirection" tab and click the "Launch Remote Console" button. On ILOM 3.1.x systems, the console can be launched from the initial Summary Information screen.

    To connect to the console from the ILOM CLI:

    > start /SP/console
    
  3. From the server's console, monitor the system booting. Watch in particular the LSI controller BIOS while it is loading. If it displays a warning message about drives with preserved cache, then choose "D" to discard the cache and continue. This is not an issue because Oracle ASM resynchronizes the disks after boot. If it displays a warning message that drives are in write-through mode due to a low battery, then choose to continue.

    The Exadata boot should then continue normally, showing the Exadata boot splash screen followed by the normal OS boot messages. On the ILOM serial console, there may be long pauses between screen outputs during subsequent boot steps because the default console is the graphics console, and the Exadata boot splash screen will not display.

  4. Once full boot is completed, log in as the root user and verify the new battery is seen and is charging.
    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set all logical drives cache policy to WriteBack cache mode using the battery.
    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.
    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
    
  7. Verify the database services were started automatically.
    1. Verify that CRS is running.
      # . oraenv
      ORACLE_SID = [root] ? +ASM1
      The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
      
      # crsctl check crs
      CRS-4638: Oracle High Availability Services is online
      CRS-4537: Cluster Ready Services is online
      CRS-4529: Cluster Synchronization Services is online
      CRS-4533: Event Manager is online
      
      In the above output the 1 of +ASM1 refers to the database node number. For example, for database node #3, the value would be +ASM3.
    2. Validate that instances are running.
      # ps -ef |grep pmon
      
      The command should return a record for the Oracle ASM instance and a record for each database instance.
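
The Clusterware check in this step can also be scripted. The following Python sketch looks for the four "is online" lines shown in the crsctl check crs example above and reports any service that is not confirmed online. The function name is hypothetical, and the list of expected services is taken only from that example output.

```python
EXPECTED_SERVICES = (
    "Oracle High Availability Services",
    "Cluster Ready Services",
    "Cluster Synchronization Services",
    "Event Manager",
)


def services_not_online(crsctl_output):
    """Return the expected services that lack an 'is online' line."""
    online = set()
    suffix = " is online"
    for line in crsctl_output.splitlines():
        _code, separator, message = line.strip().partition(": ")
        if separator and message.endswith(suffix):
            online.add(message[: -len(suffix)])
    return [name for name in EXPECTED_SERVICES if name not in online]


if __name__ == "__main__":
    sample = (
        "CRS-4638: Oracle High Availability Services is online\n"
        "CRS-4537: Cluster Ready Services is online\n"
        "CRS-4529: Cluster Synchronization Services is online\n"
        "CRS-4533: Event Manager is online\n"
    )
    print(services_not_online(sample))
```

An empty list means all four services in the example output were reported online.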

1.9.2 Replacing a Disk Controller BBU on an Exadata Storage Server

This section describes how to replace a disk controller BBU on an Exadata Storage Server.

Note:

After any maintenance procedure, Oracle recommends using the Oracle EXAchk tool. The tool is available with My Oracle Support note 1070954.1.

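The high-level steps are:

  • Step 1: Prepare the Disk Controller BBU for Removal
  • Step 2: Replace the Disk Controller BBU
  • Step 3: Enable the New Disk Controller BBU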

1.9.2.1 Step 1: Prepare the Disk Controller BBU for Removal

On certain X3-2, X4-2, and X4-8 database nodes, and X3-2, X4-2, X3-8, and X4-8 storage servers, the BBU is remote mounted and does not require a system shutdown to be accessed. However, you must still prepare it for removal from the RAID HBA to avoid the risk of data corruption to the disk volumes. Note that there is no remote mount BBU option for X3-8 database nodes.

For Systems with Remote Mount BBU

Perform the steps in this section if your system has a remote mount BBU. If your system does not have a remote mount BBU, perform the steps in "For Systems That Do Not Have a Remote Mount BBU".

  1. Log in as the root user.
  2. Get the version of the image that is running on the server in the rack that requires service.
    # cellcli -e LIST CELL ATTRIBUTES releaseVersion
    11.2.3.2.1
    
  3. Drop the disk controller BBU.

    If you are running version 11.2.3.3.0 or later:

    1. Drop the disk controller BBU for replacement. Run the following command as the celladmin or root user:
      # cellcli -e ALTER CELL BBU DROP FOR REPLACEMENT
      HDD disk controller battery has been dropped for replacement
      
    2. Verify that the BBU was dropped for replacement:
      # cellcli -e LIST CELL ATTRIBUTES bbustatus
      dropped for replacement.

    If you are running version 11.2.3.2.x:

    1. Locate the server in the rack being serviced, and turn on the indicator light.

      Exadata Storage Servers are identified by a number 1 through 18, where 1 is the lowest Storage Server in the rack installed in RU2, counting up to the top of the rack.

      Exadata Database Nodes are identified by a number 1 through 8, where 1 is the lowest database node in the rack, installed in RU16.

      Turn on the locate indicator light for easier identification of the server being serviced. If the server number has been identified, then the Locate Button on the front panel may be pressed.

      To turn on the indicator light remotely, use any of the following methods:

      From a login to CellCLI on Exadata Storage Servers:

      CellCli> ALTER CELL LED ON
      

      From a login to the server's ILOM:

      -> set /SYS/LOCATE value=Fast_Blink
      

      From a login to the server's root account:

      # ipmitool chassis identify force
      Chassis identify interval: indefinite
      
    2. Check that the HBA can see the battery, and check its current status.

      Note:

      If you are running on Solaris, use /opt/MegaRAID/MegaCli in place of /opt/MegaRAID/MegaCli/MegaCli64 in the commands below.

      # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
      

      The default output should show that the battery is still visible, and may show low voltage or other issues depending on the fault. If the battery has hard failed and is no longer accessible to the HBA, the command may return an error reading the BBU.

    3. Verify the current cache policy for all logical volumes.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      

      The default cache policy should be WriteBack for all volumes. If the battery is functioning normally, the current cache policy is reported as WriteBack. However, if the battery has failed, the current cache policy may be reported as WriteThrough.

    4. Set the cache policy for all logical volumes to WriteThrough cache mode, which does not use the battery.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    5. Verify the current cache policy for all logical volumes is now WriteThrough.

      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      

For Systems That Do Not Have a Remote Mount BBU

Perform the steps in this section if your system does not have a remote mount BBU. If your system has a remote mount BBU, see "For Systems with Remote Mount BBU".

If the system does not have the remote mounted battery installed, you need to shut down the node for which the battery requires replacement.

Note:

If you are running Oracle Exadata System Software 19.1.0 or later, substitute /opt/MegaRAID/storcli/storcli64 for /opt/MegaRAID/MegaCli/MegaCli64 in the following commands.

  1. Revert all the RAID disk volumes to WriteThrough mode to ensure that all data in the RAID cache memory is flushed to disk and is not lost when the battery is replaced.
    1. Set all logical volumes cache policy to WriteThrough cache mode.
      # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0
      
    2. Verify the current cache policy for all logical volumes is now WriteThrough, which does not use the battery:
      # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
      
  2. Shut down the server operating system.

    Note the following when powering off Exadata Storage Servers:

    • Verify there are no other storage servers with disk faults. Shutting down a storage server while another disk is failing may cause database processes and Oracle ASM to crash if it loses both disks in the partner pair when this server's disks go offline.
    • Powering off one Exadata Storage Server with no disk faults in the rest of the rack will not affect running database processes or Oracle ASM.
    • All database and Oracle Clusterware processes should be shut down prior to shutting down more than one Exadata Storage Server. Refer to the Exadata Owner's Guide for details if this is necessary.

    Oracle ASM drops a disk if the disk stays offline longer than the disk repair time. Powering off or restarting an Exadata Storage Server can therefore affect database performance if the server is offline for longer than the ASM disk repair timer allows. The default DISK_REPAIR_TIME attribute value of 3.6 hours should be adequate for replacing components, but you can change it if you need more time.

    1. Check the disk repair time by logging into ASM and running the following query.
      SQL> SELECT dg.name,a.value FROM v$asm_attribute a, v$asm_diskgroup dg
       WHERE a.name = 'disk_repair_time' AND a.group_number = dg.group_number;
      

      As long as the value is large enough to comfortably replace the components being replaced, there is no need to change it.

      If you need to change it, you can use this statement:

      SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time'='8.5H';
      
    2. Check whether Oracle ASM can tolerate the grid disks going offline. The following command should return Yes for each grid disk listed.
      # cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome
      ...sample ...
      DATA_CD_09_cel01 ONLINE Yes
      DATA_CD_10_cel01 ONLINE Yes
      DATA_CD_11_cel01 ONLINE Yes
      RECO_CD_00_cel01 ONLINE Yes
      RECO_CD_01_cel01 ONLINE Yes
      ...repeated for all griddisks....
      

      If one or more grid disks do not return asmdeactivationoutcome='Yes', check the respective disk group and restore the data redundancy for that disk group. Once the disk group data redundancy is fully restored, re-run the command to verify that asmdeactivationoutcome='Yes' for all grid disks. Once all disks return asmdeactivationoutcome='Yes', proceed to the next step.

      Note:

      Shutting down the cell services when one or more grid disks do not return asmdeactivationoutcome='Yes' will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.

    3. Inactivate all grid disks on the cell that needs to be powered down for maintenance. This can take 10 minutes or longer.

      # cellcli
      ...sample ...
      CellCLI> ALTER GRIDDISK ALL INACTIVE
      GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_02_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_00_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_01_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_02_dmorlx8cel01 successfully altered
      ...repeated for all griddisks...
      
    4. Verify that the grid disks are now offline. The output should show asmmodestatus='UNUSED' or 'OFFLINE' and asmdeactivationoutcome=Yes for all grid disks once the disks are offline and inactive in ASM.

      CellCLI> LIST GRIDDISK ATTRIBUTES name,status,asmmodestatus,asmdeactivationoutcome
      DATA_CD_00_dmorlx8cel01 inactive OFFLINE Yes
      DATA_CD_01_dmorlx8cel01 inactive OFFLINE Yes
      DATA_CD_02_dmorlx8cel01 inactive OFFLINE Yes
      RECO_CD_00_dmorlx8cel01 inactive OFFLINE Yes
      RECO_CD_01_dmorlx8cel01 inactive OFFLINE Yes
      RECO_CD_02_dmorlx8cel01 inactive OFFLINE Yes
      ...repeated for all griddisks...
      
    5. Once all disks are offline and inactive, you can shut down the cell.
      # shutdown -hP now
      
      When powering off Exadata Storage Servers, all storage services are automatically stopped.
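
The checks in steps 2 and 4 lend themselves to a small script. The following is a minimal sketch, not part of the Oracle tooling. It assumes one grid disk per output line with asmdeactivationoutcome as the last column, and the function name griddisks_not_safe is hypothetical. The sample rows, including the value No, are placeholders rather than real output.

```shell
# Sketch: list grid disks that are NOT safe to deactivate.
# Reads LIST GRIDDISK ATTRIBUTES output on stdin; assumes the last
# column of each row is asmdeactivationoutcome.
griddisks_not_safe() {
    # Print the name (first column) of every row whose last column is not "Yes".
    awk 'NF >= 2 && $NF != "Yes" { print $1 }'
}

# Demo on placeholder data (not real cell output).
sample='DATA_CD_00_cel01 ONLINE Yes
DATA_CD_01_cel01 ONLINE Yes
RECO_CD_00_cel01 ONLINE No'

unsafe=$(printf '%s\n' "$sample" | griddisks_not_safe)
if [ -n "$unsafe" ]; then
    echo "Do not proceed; restore redundancy first for: $unsafe"
else
    echo "All grid disks report asmdeactivationoutcome=Yes"
fi
```

On a storage server, the same function could read live output, for example: cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome | griddisks_not_safe. An empty result means every grid disk returned Yes.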
1.9.2.2 Step 2: Replace the Disk Controller BBU
1.9.2.3 Step 3: Enable the New Disk Controller BBU

Similar to "Step 1: Prepare the Disk Controller BBU for Removal", this section has two subsections:

For Systems with Remote Mount BBU

Perform the steps in this section if your system has a remote mount BBU. In this scenario, the system was not shut down at the end of "Step 1: Prepare the Disk Controller BBU for Removal".

If you are running image version 11.2.3.3.0 or later:

  1. Log in as the celladmin or root user.

  2. Re-enable the BBU.

    # cellcli -e alter cell bbu reenable
    HDD disk controller battery has been reenabled
    
  3. Verify the disk controller BBU battery state is operational.

    # cellcli -e list cell attributes bbustatus
    normal
    

    If the "BBU status" is anything other than "normal", then investigate and correct the problem before continuing.

If you are running image version 11.2.3.2.x:

  1. Log in as the root user.

  2. Turn off the server's locate LED.

    # ipmitool chassis identify off
    Chassis identify interval: off
    
  3. Wait approximately 5 minutes for the HBA to recognize and start communicating with the new BBU.

  4. Verify the HBA battery status is Operational and charging.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set all logical drives cache policy to WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu
    Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    ... <repeated for each logical volume present>
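
The verification in the last step can also be scripted. This is a sketch under the assumption that the MegaCli64 output contains one "Current Cache Policy:" line per logical drive, as in the sample above. The function name all_writeback is hypothetical.

```shell
# Sketch: succeed only if every "Current Cache Policy" line on stdin
# starts with WriteBack, and at least one such line was seen.
all_writeback() {
    awk -F': *' '
        /^[[:space:]]*Current Cache Policy/ {
            seen++
            if ($2 !~ /^WriteBack/) bad++
        }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}
```

For example: /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i bbu | all_writeback && echo "all logical drives use WriteBack". Treating "no matching lines" as a failure avoids reporting success when the command produced no output.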
    

For Systems That Do Not Have Remote Mount BBU

At the end of "Step 1: Prepare the Disk Controller BBU for Removal", systems without a remote mount BBU were shut down. You now have to restart the system.

  1. Power on the server by pressing the power button.

  2. After ILOM has booted, power on the server by pressing the power button, and then connect to the server's console.

    To connect to the console from the ILOM Web browser (preferred): Access the "Remote Control -> Redirection" tab and click the "Launch Remote Console" button. On ILOM 3.1.x systems, the console can be launched from the initial Summary Information screen.

    To connect to the console from the ILOM CLI:

    > start /SP/console
    
  3. From the server's console, monitor the system booting. Watch in particular the LSI controller BIOS while it is loading. If it gives a warning message regarding drives with preserved cache, then choose "D" to discard the cache and continue. This is not an issue because Oracle ASM resynchronizes the disk after boot. If it gives a warning message that drives are in write-through mode due to a low battery, then choose to continue.

    The Exadata boot should then continue normally, showing the Exadata boot splash screen followed by the normal OS boot messages. Note that there may be a long pause between screen outputs on the ILOM serial console during subsequent boot steps because the default console is the graphics console, and the Exadata boot splash screen will not display there.

  4. Once full boot is completed, log in as the root user and verify the new battery is seen and is charging.

    # /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0
    
  5. Set all logical drives cache policy to WriteBack cache mode using the battery.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0
    
  6. Verify the current cache policy for all logical drives is now using WriteBack cache mode.

    # /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU
    
  7. Return the cell back to service.

    1. Activate the grid disks.

      # cellcli
      CellCLI> alter griddisk all active
      GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
      GridDisk DATA_CD_02_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_00_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_01_dmorlx8cel01 successfully altered
      GridDisk RECO_CD_02_dmorlx8cel01 successfully altered
      ...etc...
      
    2. Verify that all disks are active.

      CellCLI> list griddisk
      DATA_CD_00_dmorlx8cel01         active
      DATA_CD_01_dmorlx8cel01         active
      DATA_CD_02_dmorlx8cel01         active
      RECO_CD_00_dmorlx8cel01         active
      RECO_CD_01_dmorlx8cel01         active
      RECO_CD_02_dmorlx8cel01         active
      ...etc...
      
    3. Verify all grid disks have been successfully put online. Wait until 'asmmodestatus' is in status 'ONLINE' for all grid disks. The following is an example of the output early in the activation process.

      CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
      DATA_CD_00_dmorlx8cel01 active ONLINE Yes
      DATA_CD_01_dmorlx8cel01 active ONLINE Yes
      DATA_CD_02_dmorlx8cel01 active ONLINE Yes
      RECO_CD_00_dmorlx8cel01 active SYNCING Yes
      RECO_CD_01_dmorlx8cel01 active ONLINE Yes
      ...etc...
      

      In the example above 'RECO_CD_00_dmorlx8cel01' is still in the 'SYNCING' process. Oracle ASM synchronization is only complete when ALL grid disks show 'asmmodestatus=ONLINE'. This process can take some time depending on how busy the machine is, and has been while this individual server was down for repair.
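
Waiting for resynchronization can be automated with a polling loop. The following is a sketch only. It assumes the column order name,status,asmmodestatus,asmdeactivationoutcome used above, and the function names count_not_online and wait_until_online are hypothetical.

```shell
# Sketch: count grid disks whose asmmodestatus (3rd column) is not ONLINE.
# Prints -1 when the listing contains no grid disk rows at all, so that
# an empty or failed listing is never mistaken for "all ONLINE".
count_not_online() {
    awk 'NF >= 3 { rows++; if ($3 != "ONLINE") n++ }
         END { if (rows == 0) print -1; else print n + 0 }'
}

# Poll a listing command until every grid disk is ONLINE.
# $1 = seconds between polls, $2 = maximum number of polls,
# remaining arguments = command that prints the grid disk listing.
wait_until_online() {
    interval=$1; tries=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        pending=$("$@" | count_not_online)
        case $pending in
            0)  return 0 ;;
            -1) echo "no grid disk rows in listing; retrying..." ;;
            *)  echo "$pending grid disk(s) not yet ONLINE; waiting..." ;;
        esac
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

A possible invocation on the cell, polling once a minute for up to two hours: wait_until_online 60 120 cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome. The interval and limit are illustrative values; choose them according to how long resynchronization usually takes on your system.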

1.10 Overview of the dbmsrv Service

Starting with Oracle Exadata System Software release 12.1.2.1.0:

  • The database nodes now run the Management Server (MS). Previously MS ran only on the storage nodes.
  • The database nodes now run a new service called Database Machine Service (dbmsrv). This new service is based on the MS that runs on the storage servers and provides enhanced management capabilities to the database nodes.
  • Starting with Oracle Exadata System Software release 12.1.2.1.2, Management Server (MS) on the database nodes does not use sudo any more. This means that configuration for sudoers is no longer needed.

Prior to Oracle Exadata System Software release 12.1.2.1.2:

  • For security reasons, Management Server on the database nodes is not run as root. However, it needs root permission to run certain utilities that monitor the system, such as disk status, ILOM, power supply unit, and to send Oracle Auto Service Request (ASR) messages and alerts. To achieve this, a sudoers configuration file, dbmsvc_sudo_conf, is added to enable the Management Server users on the database nodes to run the utilities with root privilege.

    You should not disable the dbmsrv service or dbserverd, or edit the sudoers configuration file. If the entries in the file are removed, then the dbmsrv service may not be able to monitor some parts of the system. For example, if a disk fails, it might not be possible to send an Oracle ASR message in time, and this may cause a disruption on the database node and delay recovery.

To manage the new Management Server service on the database nodes in Oracle Exadata System Software release 12.1.2.1.0 and later, new users and groups were added.

The users and their IDs are:

dbmsvc: 12137
dbmadmin: 12138
dbmmonitor: 12139

The new groups and their IDs are:

dbmsvc: 11137
dbmadmin: 11138
dbmmonitor: 11139
dbmusers: 11140
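
To see whether the default user IDs listed above collide with accounts in your environment, you could scan the account database before deployment. This is a minimal sketch. It assumes passwd-format input (name:x:uid:gid:...) such as the output of getent passwd, and that your directory service allows enumeration. The function name dbmsrv_uid_conflicts is hypothetical.

```shell
# Sketch: report accounts that use a dbmsrv default UID under another name.
# Reads passwd-format lines on stdin.
dbmsrv_uid_conflicts() {
    awk -F: '
        BEGIN {
            want[12137] = "dbmsvc"
            want[12138] = "dbmadmin"
            want[12139] = "dbmmonitor"
        }
        ($3 in want) && $1 != want[$3] {
            print $1 " uses UID " $3 " (default for " want[$3] ")"
        }
    '
}
```

For example: getent passwd | dbmsrv_uid_conflicts. No output means no conflict was found in the entries that were enumerated. The same approach could be applied to the group IDs with getent group, comparing the third field against 11137 through 11140.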

The following topics describe how to modify user IDs for this service:

1.10.1 Using a Script to Change User IDs and Group IDs for dbmsrv

Starting with Oracle Exadata System Software releases 18.1.12 and 19.1.2, you can use the migrate_ids.sh script to change the user and group IDs for the dbmsrv users.

You can change the user ID and group ID of the dbmsrv service users if there are conflicts with the default values (for example, if you are using LDAP or if you are using session management tools that require different values from the default values).

These steps are specific to the dbmsrv service users and groups only. Do not use them to modify the user and group IDs for other Oracle products.

  1. Navigate to the /opt/oracle.SupportTools directory.
  2. Run the migrate_ids.sh script.

    The migrate_ids.sh script has the following syntax and options:

    migrate_ids.sh [-uid username new_uid] 
                            [-gid group_name new_group_id] 
                            [-skipdirs directory_path [,directory_path ]] 
    
    • -uid: Specify user name and the new uid to migrate the user to a new UID
    • -gid: Specify group name and the new group ID to migrate the group to a new ID
    • -skipdirs: Specify a list of absolute paths of directories to skip during the user or group ID migration.

The script searches all directories to find files that use the uid or gid being migrated so that the script can update the owner or group access to use the new uid or gid. The -skipdirs option allows you to specify which directories do not need to be searched. The specified directories and any files within them are skipped while changing the uid and gid values.

Using the -skipdirs option can be useful if you have large NFS directories that you want to skip to make the migration faster. However, if there are files in the directories being skipped that use the uid or gid being migrated, then those files are not updated. It is your responsibility to make sure that the directories being skipped with this option do not contain such files to ensure successful migration of the IDs.
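
One way to check a directory before listing it in -skipdirs is sketched below. This is not part of migrate_ids.sh, and the function name safe_to_skip_uid is hypothetical. Note that the check itself walks the directory once, so on a large NFS directory it can take as long as the traversal you are trying to avoid.

```shell
# Sketch: succeed (exit 0) only if NO file under the directory is owned
# by the given numeric user ID.
# $1 = directory to inspect, $2 = numeric uid that is being migrated.
safe_to_skip_uid() {
    first=$(find "$1" -uid "$2" 2>/dev/null | head -n 1)
    [ -z "$first" ]
}
```

For example, before skipping a directory while migrating dbmadmin from its default ID: safe_to_skip_uid /some/nfs/dir 12138 || echo "directory contains files owned by uid 12138". The path /some/nfs/dir is a placeholder. A group ID could be checked the same way by using -gid instead of -uid.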

Example 1-4 Migrate the dbmadmin user to a new user ID

This example shows how to migrate only the uid of user dbmadmin to 3001.

migrate_ids.sh -uid dbmadmin 3001

Example 1-5 Migrate the dbmusers group to a new group ID

This example shows how to migrate only the gid of group dbmusers to 4001.

migrate_ids.sh -gid dbmusers 4001

Example 1-6 Migrate all dbmsrv users and groups to new values

This example shows how to migrate all the user and group IDs for dbmsrv to new values.

migrate_ids.sh -uid dbmsvc 3001 -gid dbmsvc 4001
migrate_ids.sh -uid dbmadmin 3002 -gid dbmadmin 4002
migrate_ids.sh -uid dbmmonitor 3003 -gid dbmmonitor 4003
migrate_ids.sh -gid dbmusers 4004

Example 1-7 Migrate a user ID while skipping directories

This example shows how to migrate the user ID of user dbmadmin to 3001 while not searching the files in the /proc or /sys directories.

migrate_ids.sh -uid dbmadmin 3001 -skipdirs /proc,/sys

1.10.2 Manually Changing User IDs and Group IDs for dbmsrv

In releases earlier than Oracle Exadata System Software 18.1.12 and 19.1.2, in which the migrate_ids.sh script was introduced, you must manually change the user and group IDs for the dbmsrv users.

You can change the user ID and group ID of the dbmsrv service users if there are conflicts with the default values (for example, if you are using LDAP or if you are using session management tools that require different values from the default values).

If possible, you should upgrade to the latest version of Oracle Exadata System Software and use the migrate_ids.sh script instead of using the manual procedure.

These steps are specific to the dbmsrv service users and groups only. Do not use them to modify the user and group IDs for other Oracle products.

  1. Shut down the services on the database server. Run the following command as root or the dbmadmin user.
    dbmcli -e alter dbserver shutdown services all
    
  2. Change the group ID of the group.
    1. Change the assigned group ID for the group.
      Run the following command as root, where new_group_ID is the new group ID, and group_name is the name of the group you want to change:
      groupmod -g new_group_ID group_name

      For example:

      groupmod -g 3001 dbmusers
    2. Update the files containing the old group ID.

      Run the following command as root:

      find / -gid old_group_ID -exec chgrp -h new_group_ID {} \;
      

      For example:

      find / -gid 11140 -exec chgrp -h 3001 {} \;
      
  3. Change the user ID.
    This step has to be done after changing the group ID or you will get a "GID does not exist" error.
    1. Change the user ID assigned to the user.
      Run the following command as root, where new_user_ID is the new ID for the user, new_group_ID is the new group ID assigned in the previous step, and username is the name of the user you want to change.
      usermod -u new_user_ID -g new_group_ID username
      

      For example:

      usermod -u 2998 -g 3001 dbmsvc
      
    2. Update the files containing the old user ID.

      Run the following command as root:

      find / -uid old_user_ID -exec chown -h new_user_ID {} \;
      

      For example:

      find / -uid 12137 -exec chown -h 2998 {} \;
      
  4. Reset the setuid bit on the executable files.
    The setuid bit was changed by the chgrp and chown commands. Perform the following sub-steps as root.
    1. Modify the permissions for the dbrsMain executable.
      chmod 6550 /opt/oracle/dbserver/dbms/bin/dbrsMain
    2. Modify the permissions for the exaCmdHelper executable.
      chmod 4550 /opt/oracle/dbserver/dbms/bin/exaCmdHelper
  5. Restart the services on the database server.

    Run the following command as the root or dbmadmin user:

    dbmcli -e alter dbserver startup services all
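
After step 4, you may want to confirm that the permissions were applied as intended. The helper below is a sketch with a hypothetical name, mode_is. It relies only on the exact-match form of find -perm with an octal mode.

```shell
# Sketch: succeed only if the file's permission bits (including the
# setuid and setgid bits) exactly match the given octal mode.
# $1 = file, $2 = expected octal mode, for example 4550.
mode_is() {
    [ -n "$(find "$1" -prune -perm "$2" 2>/dev/null)" ]
}
```

For example: mode_is /opt/oracle/dbserver/dbms/bin/dbrsMain 6550 && mode_is /opt/oracle/dbserver/dbms/bin/exaCmdHelper 4550 && echo "setuid bits restored".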
    

1.11 Configuring Password Expiration for Users Accessing the Server Remotely

You can configure DBSERVER attributes to expire user passwords.

In Oracle Exadata System Software release 19.1.0, there are new DBSERVER attributes for configuring password security for users that access Oracle Exadata System Software servers remotely, such as with REST API or ExaCLI. These attributes determine if the user is able to change the password remotely, the amount of time before a user password expires, and the number of days prior to password expiration that the user receives warning messages. In the default configuration, user passwords do not expire.

Note:

The DBSERVER attributes for password expiration apply only to users created with Oracle Exadata System Software. Password expiration applies only to users that are displayed with the LIST USER command and does not apply to operating system users like dbmadmin or oracle.
  • To allow the user to change the password remotely, use the ALTER DBSERVER command to set the remotePwdChangeAllowed attribute to true.
    If you set the value to false, then the user receives a message indicating that they must contact the server administrator to have their password changed.
    DBMCLI> ALTER DBSERVER remotePwdChangeAllowed=true
  • To change the length of time before a user password expires, use the ALTER DBSERVER command to modify the pwdExpInDays attribute.
    Set the value n to the number of days before the password expires. If pwdExpInDays is set to 0 (the default value), then the user password does not expire.
    DBMCLI> ALTER DBSERVER pwdExpInDays=n
  • To configure the length of the warning period before the password expires, use the ALTER DBSERVER command to modify the pwdExpWarnInDays attribute.
    Set the value n to the number of days to warn the user before the password expires. The default user account password expiration warning time is 7 days.
    DBMCLI> ALTER DBSERVER pwdExpWarnInDays=n
  • To specify the length of time before a user account is locked after the user password expires, use the ALTER DBSERVER command to modify the accountLockInDays attribute.
    Set the value n to the number of days before the user account is locked. The default user account lock time is 7 days.
    DBMCLI> ALTER DBSERVER accountLockInDays=n
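
If you set these attributes on several database servers, it can help to generate the statements from one place. The following sketch only prints the three ALTER DBSERVER statements shown above after validating the values. It does not run them. The function name pwd_policy_statements and the values used in the usage note are illustrative assumptions.

```shell
# Sketch: print ALTER DBSERVER statements for a password policy.
# $1 = pwdExpInDays, $2 = pwdExpWarnInDays, $3 = accountLockInDays
pwd_policy_statements() {
    for n in "$1" "$2" "$3"; do
        case $n in
            ''|*[!0-9]*)
                echo "not a non-negative integer: '$n'" >&2
                return 1 ;;
        esac
    done
    printf 'ALTER DBSERVER pwdExpInDays=%s\n' "$1"
    printf 'ALTER DBSERVER pwdExpWarnInDays=%s\n' "$2"
    printf 'ALTER DBSERVER accountLockInDays=%s\n' "$3"
}
```

For example, pwd_policy_statements 90 14 7 prints statements for a 90-day expiration, a 14-day warning period, and a 7-day lock period. Each printed line can then be reviewed and passed to dbmcli -e.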

1.12 State of Storage Server and Database Servers During Configuration Changes

Before a change to a configuration, determine if the database and storage servers need to be offline or online.

Table 1-3 State of Storage Server and Database Servers for Operations

| Operation | Storage Server | Database Server |
| --- | --- | --- |
| DNS server update | Online | Online |
| NTP server update | Online | Online |
| Time zone update | Offline | Online |
| Admin network IP address, netmask, gateway, or host name change | Offline | Online |
| Client network IP address, netmask, gateway, or host name change | Offline | Online |
| Integrated Lights Out Manager (ILOM) IP address change | Offline | Online if the ipmitool sunoem getval/setval command is supported |
| Other ILOM parameter change | Online if the ipmitool sunoem getval/setval command is supported | Online if the ipmitool sunoem getval/setval command is supported |
| RDMA Network Fabric IP address, netmask, or host name change | Offline | Online |
| Partition key (pkey) change | Offline | Online |

1.13 Rescue Plan

In Exadata releases earlier than 12.2.1.1.0, after a storage server or database server rescue, you need to re-run multiple commands to configure items such as IORM plans, thresholds, and storage server and database server notification settings.

In Oracle Exadata release 12.2.1.1.0, there is a new attribute called rescuePlan for the cell and dbserver objects. When you are done configuring your database servers and storage servers, you should save the value of the rescuePlan attribute to a file. The file should be saved to a remote server because the data on the rescued server will be erased in the event of a rescue. After you rescue the server, you can retrieve the file from the remote server and run the file to restore the settings. See Example 1-10 and Example 1-11.

For security reasons, the rescue plan does not include configurations that require a password.

Example 1-8 Rescue Plan for a Storage Cell

The rescuePlan attribute for a storage server could look like this:

$ cellcli -e list cell attributes rescuePlan

CREATE ROLE "admin"

GRANT PRIVILEGE all actions ON diagpack all attributes WITH all options TO ROLE "admin"

CREATE ROLE "diagRole"

GRANT PRIVILEGE download ON diagpack all attributes WITH all options TO ROLE "diagRole"

GRANT PRIVILEGE create ON diagpack all attributes WITH all options TO ROLE "diagRole"

GRANT PRIVILEGE list ON diagpack all attributes WITH all options TO ROLE "diagRole"

ALTER CELL accessLevelPerm="remoteLoginEnabled", diagHistoryDays="7", metricHistoryDays="7", notificationMethod="mail,snmp",
 notificationPolicy="warning,critical,clear", snmpSubscriber=((host="localhost", port=162, community="public", type=asr)), 
 bbuLearnCycleTime="2016-10-17T02:00:00-07:00", bbuLearnSchedule="MONTH 1 DATE 17 HOUR 2 MINUTE 0", 
 alertSummaryStartTime="2016-09-21T17:00:00-07:00", alertSummaryInterval=weekly, 
 hardDiskScrubInterval=biweekly, hardDiskScrubFollowupIntervalInDays="14"

ALTER IORMPLAN objective=basic

Example 1-9 Rescue Plan for a Database Server

The rescuePlan attribute for a database server could look like this:

$ dbmcli -e list dbserver attributes rescuePlan

CREATE ROLE "listdbserverattrs"

GRANT PRIVILEGE list ON dbserver ATTRIBUTES bbuStatus, coreCount WITH all options TO ROLE "listdbserverattrs"

ALTER DBSERVER diagHistoryDays="7", metricHistoryDays="7", bbuLearnSchedule="MONTH 1 DATE 17 HOUR 2 MINUTE 0", 
 alertSummaryStartTime="2016-09-26T08:00:00-07:00", alertSummaryInterval=weekly, pendingCoreCount="128" force

Example 1-10 Creating a Rescue Plan script for a cell

The following command stores the commands in the rescuePlan attribute to a file called rescue_cell.cli located on a remote server.

$ cellcli -e list cell attributes rescuePlan >& /location/on/remote/server/rescue_cell.cli

If you need to rescue the server, you can run the script after the server rescue to restore the settings. The following command runs the rescue_cell.cli file using the CellCLI start command:

$ cellcli -e start /location/on/remote/server/rescue_cell.cli

Example 1-11 Creating a Rescue Plan script for a database server

The following command stores the commands in the rescuePlan attribute to a file called rescue_db.cli located on a remote server.

$ dbmcli -e list dbserver attributes rescuePlan >& /location/on/remote/server/rescue_db.cli

If you need to rescue the server, you can run the script after the server rescue to restore the settings. The following command runs the rescue_db.cli file using the DBMCLI start command:

$ dbmcli -e start /location/on/remote/server/rescue_db.cli
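
Because the saved file is only useful if it actually contains the plan, a wrapper that refuses to keep an empty result can be helpful. The following is a sketch, not an Oracle-provided utility. The function name save_rescue_plan is hypothetical, and the destination path is the same placeholder used in the examples above.

```shell
# Sketch: run a command that prints a rescue plan and keep the result
# only if the command succeeded and produced some output.
# $1 = destination file, remaining arguments = command to run.
save_rescue_plan() {
    dest=$1; shift
    tmp="$dest.tmp.$$"
    if "$@" > "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        echo "rescue plan not saved: command failed or produced no output" >&2
        return 1
    fi
}
```

For example: save_rescue_plan /location/on/remote/server/rescue_db.cli dbmcli -e list dbserver attributes rescuePlan. Writing to a temporary file first means that a failed run does not overwrite a previously saved plan.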

1.14 Using ExaWatcher Charts to Monitor Oracle Exadata Performance

ExaWatcher is a utility that collects performance data on the storage servers and database servers of an Exadata system. The data collected includes operating system statistics, such as iostat, cell statistics (cellsrvstat), and network statistics.

1.14.1 About ExaWatcher Charts

ExaWatcher collects and presents performance data on the storage servers and database servers of Oracle Exadata for a specified period of time.

To extract the data collected by ExaWatcher, run GetExaWatcherResults.sh and specify the start and end time of the desired time range. The results are contained in a compressed archive file that may be written to your specified directory location.

For example:

$ GetExaWatcherResults.sh --from 08/24/2023_17:00:00 --to 08/25/2023_17:00:00 --resultdir /var/log/oracle/ExaWatcherResults

Note:

You can also use the -c or --scp options with GetExaWatcherResults.sh to copy the resulting archive file to a different location.
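
When you routinely collect the most recent hours of data, the --from and --to values can be computed instead of typed. This sketch assumes the MM/DD/YYYY_HH:MM:SS format shown in the example above and a date command that accepts -d @epoch, as GNU date does. The function name exawatcher_range is hypothetical.

```shell
# Sketch: print "--from <start> --to <end>" covering the last N hours.
# $1 = number of hours to look back.
exawatcher_range() {
    now=$(date +%s)
    from=$((now - $1 * 3600))
    fmt='+%m/%d/%Y_%H:%M:%S'
    printf '%s %s %s %s\n' \
        --from "$(date -d "@$from" "$fmt")" \
        --to "$(date -d "@$now" "$fmt")"
}
```

For example: GetExaWatcherResults.sh $(exawatcher_range 24) --resultdir /var/log/oracle/ExaWatcherResults. The command substitution is intentionally left unquoted so that it expands into four separate arguments.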

GetExaWatcherResults.sh also generates HTML pages that contain charts for IO, CPU utilization, cell server statistics, and alert history. The IO and CPU utilization charts use data from iostat, CPU detail uses data from mpstat, and cell server statistics use data from cellsrvstat. Alert history will be retrieved for the specified time frame.

You can find the new charts in the resulting archive file. In the archive file, there is a subdirectory named: Charts.ExaWatcher.<hostname>/<timestamp>_<duration>/, for example, Charts.ExaWatcher.xxxxceladm13.oracle.com/2016_08_24_17_00_00_01h00m00s_0.

To view the HTML pages, the archive file must be extracted on a machine with a local browser and Internet access. Then, open Charts.ExaWatcher.<hostname>/<timestamp>_<duration>/index.html in a browser. The left panel on that page shows the following menu:

Figure 1-1 ExaWatcher Menu in the Left Panel


Note:

For screen reader users, the menu items are navigated using the UP/DOWN arrow keys and activated using the SPACE bar. The TAB key will move you to the frame on the right side.

The CellSrvStat menu item is available only when run against a storage server. The Alert History menu item is available only if there were alerts during the requested time frame.

1.14.2 Requirements for Using ExaWatcher Charts

To view the HTML pages, the generated archive file must be moved to a machine with a local browser that has access to the internet.

Due to the complexity of the ExaWatcher charts, if the Oracle Exadata Rack resides in a restricted environment, and the generated HTML files or archive file cannot be moved to an environment that has access to the internet, then you will not be able to view the ExaWatcher charts.

1.14.3 IO Charts

IO charts show IO performance for an entire server or for individual disks in the storage server.

The following pages are available for IO statistics:

1.14.3.1 IO Stat Summary

IOStat Summary shows a summary of IO performance for the entire server. The four charts shown in this page are:

Table 1-4 Statistics for IOStat Summary

| Statistic | Description |
| --- | --- |
| Flash IOPs, Hard Disk IOPs | Total reads per second, writes per second, and IO per second (reads per second + writes per second) for the server. This uses r/s and w/s from iostat. |
| Flash MB/s, Hard Disk MB/s | Total read MB per second, write MB per second, and IO MB per second. This uses rsec/s and wsec/s from iostat, converted into MB. |

The statistics are shown for flash and hard disks, when applicable. On Exadata Extreme Flash, there are no hard disks. On database servers, there are no flash devices.
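
The arithmetic behind the summary statistics can be illustrated with a short sketch. It is an approximation of the calculation described above, not the ExaWatcher implementation. It assumes that iostat sectors are 512 bytes and that 1 MB is 1048576 bytes, and it expects pre-extracted input lines of the form: device r/s w/s rsec/s wsec/s. The function name iostat_totals is hypothetical.

```shell
# Sketch: total IOPS and MB/s across all devices in one iostat sample.
# Input columns: device r/s w/s rsec/s wsec/s
iostat_totals() {
    awk '{ r += $2; w += $3; rs += $4; ws += $5 }
         END {
             # IOPS = reads/s + writes/s; MB/s = sectors/s * 512 / 1048576
             printf "reads/s=%.1f writes/s=%.1f IOPS=%.1f MB/s=%.2f\n",
                    r, w, r + w, (rs + ws) * 512 / 1048576
         }'
}
```

With two devices reporting 100 and 200 reads per second, 50 and 150 writes per second, and 8192 sectors per second in total, the result is 500 IOPS and 4.00 MB/s.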

If there is a suspected I/O performance problem, the IOPs and the MB/s statistics for the storage servers can be compared to the data sheet to determine if the storage is at maximum capacity. High read times observed on the database can also be correlated to the service time and average wait time from iostat, to determine if the high times could potentially be due to the storage server. Note that the database times would typically include IOs that are satisfied from flash cache, as well as hard disk. In addition, these charts enable you to visualize any peaks during the time frame.

The partial screenshot below shows the IOPs and MB/s charts for flash and hard disk.

Below each chart, there is a range selector that you can use to drill down to a specific time within the chart. Moving the range selector on any chart affects all charts on the page.

Note:

The range selector is not accessible to screen readers. Also, not all values presented in the chart are accessible to a screen reader. Only the first value of each chart data point is.

Figure 1-3 IO Summary Charts Showing Range Selector


When you use the range selector, the displayed chart changes to show only the data for the time range specified by the range selector.

1.14.3.2 I/O Stat Detail

IOStat Detail shows performance for each disk on the storage server. The following charts are shown in this page:

Table 1-5 Statistics for IO Stat Detail

| Statistic | Description |
| --- | --- |
| Flash Service Time, Hard Disk Service Time | Average service time per disk contrasted against the range of wait times. |
| Flash Wait Time, Hard Disk Wait Time | Average wait time per disk |

By default, the charts include a line that depicts the average across all disks on the server. The shaded, background image indicates the minimum and maximum range for the statistic. You can choose to display individual disks by using the drop down selector.

If the background image has a wide range, then this can indicate possible differences in disk performance. You can use this metric to look more closely at each individual disk on the storage server to see if there is an imbalance. If the background image has a narrow range, then that indicates the disks are performing similarly.
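
The average line and the shaded minimum and maximum range correspond to simple statistics over the per-disk values. The following sketch computes them for one sample so that a wide range can be spotted without the charts. The input format (disk name followed by a value, such as a service time in milliseconds) and the function name disk_spread are assumptions for illustration.

```shell
# Sketch: average, minimum, maximum, and range of per-disk values.
# Input columns: disk value
disk_spread() {
    awk '{
             v = $2 + 0                      # force numeric comparison
             if (NR == 1) { min = v; max = v }
             if (v < min) min = v
             if (v > max) max = v
             sum += v
         }
         END {
             if (NR) printf "avg=%.2f min=%.2f max=%.2f range=%.2f\n",
                            sum / NR, min, max, max - min
         }'
}
```

A large range relative to the average suggests that one or more disks behave differently from the rest and are worth inspecting individually.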

The individual disk IOPs and MB/s for a storage server can also be compared to the data sheet numbers to see if the disks are potentially hitting their maximum capacity.

1.14.4 CPU Charts

The CPU charts show CPU utilization for the server. These statistics are from iostat (avg-cpu: %user, %system, %iowait).

1.14.5 CPU Detail

The CPU detail charts show detailed information for CPU usage, including the average CPU utilization per CPU ID. These statistics are from mpstat.

1.14.6 Cell Server Charts

Cell server statistics are useful for tracking features that are specific to Exadata storage servers. This page displays statistics related to Smart Flash Cache and Smart IOs.

1.14.7 Alert History

This page displays alerts that were present during the specified time frame. Alerts may be raised from errors or issues, which may result in IO performance issues on the servers.