Oracle® Server X5-8 Service Manual

Updated: March 2018
 
 

Troubleshooting System Cooling Issues

Maintaining the proper internal operating temperature is crucial to the health of the server. To prevent server shutdown and damage to components, address over-temperature and hardware-related issues as soon as they occur. If your server has a temperature fault, the cause of the problem might be one of the following:

External Ambient Temperature Too High

If the ambient temperature in the server space is too high, the cool air that is pulled into the server cannot cool the server sufficiently to prevent the internal temperature from rising. This can cause poor performance or component failure.

Action: Check the ambient temperature of the server space against the environmental specifications for the server. If the temperature is not within the required operating range, remedy the situation immediately.

Prevention: Periodically check the ambient temperature of the server space to ensure that it is within the required range, especially if you have made any changes to the server space (for example, added servers). The temperature must be consistent and stable.
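The periodic check described above can be sketched in a few lines of Python. This is only an illustration, not an Oracle tool: the function name `check_ambient` and its parameters are hypothetical, and the operating range and allowed swing must be taken from the environmental specifications for the server. The sketch flags any reading outside the supplied range and also reports when the readings vary by more than the supplied limit, as a rough stand-in for "consistent and stable".

```python
from typing import List


def check_ambient(readings_c: List[float], low_c: float, high_c: float,
                  max_swing_c: float) -> List[str]:
    """Return a list of problems found in a series of ambient readings.

    readings_c  -- temperatures in degrees Celsius, oldest first
    low_c/high_c -- operating range (assumed to come from the server's
                    environmental specifications; not defined here)
    max_swing_c -- largest acceptable spread between readings
                   (a hypothetical stability criterion)
    """
    if not readings_c:
        return ["no readings supplied"]

    problems = []

    # Flag every reading that falls outside the required operating range.
    for i, temp in enumerate(readings_c):
        if temp < low_c or temp > high_c:
            problems.append(
                f"reading {i}: {temp:.1f} C is outside "
                f"{low_c:.1f}-{high_c:.1f} C"
            )

    # Flag an unstable environment even if every reading is in range.
    swing = max(readings_c) - min(readings_c)
    if swing > max_swing_c:
        problems.append(
            f"temperature varied by {swing:.1f} C "
            f"(limit {max_swing_c:.1f} C)"
        )

    return problems
```

An empty list means that no problem was detected for the limits supplied. For example, `check_ambient(readings, low_c, high_c, max_swing_c)` could be run over a day of logged readings after servers are added to the space.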

Airflow Blockage

The server cooling system uses fans to pull cool air in through the front intake vents and exhaust warm air out through the back panel vents. If the front or back vents are blocked, airflow through the server is disrupted and the cooling system cannot function properly, which causes the server internal temperature to rise.

Action: Inspect the server front and back panel vents for blockage from dust or debris. Additionally, inspect the server interior for improperly installed components or cables that can block the flow of air through the server.

Prevention: Periodically inspect and clean the server vents using a vacuum cleaner. Ensure that all components, such as cards, cables, fans, air baffles, and dividers, are properly installed.

Hardware Component Failure

Fan modules and power supply fans drive the server cooling system. When one of these components fails, the server internal temperature can rise. This rise in temperature can cause other components to enter an over-temperature state. Additionally, some components, such as processors, might overheat when they are failing, which can also generate an over-temperature event.

To reduce the risk related to component failure, power supplies and fan modules are installed in pairs to provide redundancy. Redundancy ensures that if one component in a pair fails, the remaining component can continue to maintain the subsystem. For example, power supplies serve a dual function: they provide both power and airflow. If one power supply fails, the other functioning power supply is able to maintain both the power and the cooling subsystems.

Action: Investigate the cause of the over-temperature event, and replace failed components immediately. For hardware troubleshooting information, see Troubleshooting Server Hardware Faults.

Prevention: Maintain redundant systems and replace failed components immediately.
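The pairing logic described in this section can be illustrated with a short Python sketch. It is a hypothetical model, not part of any Oracle software: the function name `redundancy_report` and the pair labels in the usage note are assumptions made for illustration. Each pair is classified as redundant (both components working), degraded (one working, so the subsystem is maintained but redundancy is lost), or failed (neither working).

```python
from typing import Dict, Tuple


def redundancy_report(pairs: Dict[str, Tuple[bool, bool]]) -> Dict[str, str]:
    """Classify each redundant pair from the health of its two members.

    pairs maps a pair label to (first_ok, second_ok), where True means
    the component is functioning. Labels are chosen by the caller.
    """
    report = {}
    for name, (first_ok, second_ok) in pairs.items():
        if first_ok and second_ok:
            report[name] = "redundant"   # full redundancy in place
        elif first_ok or second_ok:
            report[name] = "degraded"    # survivor carries the subsystem
        else:
            report[name] = "failed"      # subsystem no longer maintained
    return report
```

For example, passing `{"power supply pair": (True, False)}` yields `"degraded"`, which corresponds to the case above where one power supply fails and the other maintains both power and cooling. A degraded pair is the signal to replace the failed component immediately, before the remaining one also fails.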