HP ML310 ISS Technology Update, Volume 7 Number 1 - Newsletter - Page 10
Fan failure recovery in ProLiant DL and ML servers
HP ProLiant DL and ML servers implement several technologies for handling high-temperature situations that result from a fan failure. Depending on the server model, what the server was doing when the fan failed, and the system configuration, these technologies behave differently, as described in the scenarios within this article.

HP ProLiant features for self-managing the thermal environment

This article focuses on the thermal management features of the ProLiant 300 and 500 series DL and ML servers that use the Integrated Lights-Out (iLO) management controller. The iLO controller resides on the system board of a host server. It contains its own management processor, memory, and network interface, which allow it to operate independently of the host server. Among other features, the iLO controller monitors the actual temperatures within the system using thermal sensors strategically located to protect essential components. The ProLiant 100 series ML and DL servers, by contrast, use the ServerEngines Pilot BMC.

The HP ProLiant iLO2 Management Controller Driver, referred to as the health driver (Windows) or hpasm package (Linux), determines the presence and status of the fans in the system and reports on their redundancy in the OS logs. All temperature monitoring and fan control by the iLO2 controller takes place regardless of the state of the host operating system (OS). The main functions of the health driver are to make environmental information available to processes running on the host (for example, HP Systems Insight Manager and Insight Agents) and to complete tasks involving the host OS, such as a graceful shutdown.

Advanced Configuration and Power Interface (ACPI) shutdown

If a ProLiant DL or ML server experiences a component failure that requires a system shutdown, the health driver (if configured to do so) can initiate a shutdown.
This shutdown will occur as if a system administrator had initiated it. If the health driver is not running, the iLO firmware will simulate a power button press by using the ACPI mechanism. However, the shutdown is not triggered if the management console is locked. In this case, the system will continue to operate unless a critical temperature is reached, which will trigger a loss of power to the system.

In general, ProLiant servers attempt to operate in degraded conditions as long as possible without risking data corruption. In the unlikely event that multiple fans fail, or in conditions where appropriate cooling cannot be maintained, the health driver and/or iLO controller will attempt to shut down the OS to help prevent data corruption, data loss, and system failure. When the health driver is operational, it tells the iLO controller to initiate a graceful shutdown. This shutdown takes effect 60 seconds after it has been determined that appropriate cooling conditions cannot be maintained. When the health driver is not operational, the iLO controller implements a graceful shutdown through the power button and ACPI mechanisms.

ProLiant DL and ML servers with redundant fans respond to fan failures based upon the number of fans that fail, the state of the server when the failure occurs, and the presence and configuration of the health driver.
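
The shutdown behavior described in this section can be summarized as a small decision function. The following Python sketch is illustrative only and is not HP firmware or driver code: the names `ThermalState`, `Action`, and `choose_action` are hypothetical, and two points the article leaves open are filled with labeled assumptions (the console lock is taken to affect only the ACPI power button path, and a running health driver that is not configured for shutdown is taken to leave the server running).

```python
from dataclasses import dataclass
from enum import Enum

# Delay stated in the article between determining that cooling cannot be
# maintained and the driver-initiated graceful shutdown taking effect.
GRACEFUL_SHUTDOWN_DELAY_S = 60


class Action(Enum):
    CONTINUE_OPERATING = "keep running, possibly in a degraded condition"
    DRIVER_GRACEFUL_SHUTDOWN = "health driver asks iLO for a graceful shutdown"
    ACPI_POWER_BUTTON = "iLO simulates a power button press through ACPI"
    POWER_LOSS = "critical temperature reached, power is removed"


@dataclass
class ThermalState:
    """Hypothetical snapshot of the conditions the article mentions."""
    cooling_adequate: bool         # can appropriate cooling be maintained?
    critical_temperature: bool     # has a critical temperature been reached?
    health_driver_running: bool    # is the health driver (or hpasm) operational?
    driver_shutdown_enabled: bool  # is the driver configured to shut down?
    console_locked: bool           # is the management console locked?


def choose_action(state: ThermalState) -> Action:
    """Return the response the article describes for a given state."""
    if state.critical_temperature:
        # Last resort: the system loses power.
        return Action.POWER_LOSS
    if state.cooling_adequate:
        # Servers try to keep operating as long as data is not at risk.
        return Action.CONTINUE_OPERATING
    if state.health_driver_running:
        if state.driver_shutdown_enabled:
            return Action.DRIVER_GRACEFUL_SHUTDOWN
        # Assumption: the article does not say what happens here.
        return Action.CONTINUE_OPERATING
    if state.console_locked:
        # Assumption: the lock blocks only the ACPI power button path.
        return Action.CONTINUE_OPERATING
    return Action.ACPI_POWER_BUTTON


if __name__ == "__main__":
    state = ThermalState(
        cooling_adequate=False,
        critical_temperature=False,
        health_driver_running=False,
        driver_shutdown_enabled=False,
        console_locked=False,
    )
    print(choose_action(state).value)
```

Checking the critical temperature first mirrors the article's ordering of severity: whatever the driver or console state, a critical temperature ends in a loss of power.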