HP Integrity Superdome 2 8/16: HP Superdome 2 Health Management Stack Whitepaper

By embedding the Analysis Engine in the hardware and firmware, HP removed the need for external
diagnostic software. That means:
• Self-healing without operator assistance: failing parts are removed from use
• Built-in alerting without OS assistance: WS-Man alerts to customers, SNMP traps, e-mail, and
  Insight Remote Support with WEBES and HP Systems Insight Manager (HP SIM)
• A centralized reporting repository and a single user interface that show which field-replaceable
  units (FRUs) failed and why
• Agentless fault management: no working partition, and no installed OS, is needed at all
• Consistent reports and alerts for all environments, independent of the OS

Analysis Engine Operation
The HP Superdome 2 Analysis Engine, part of the SD2 firmware, performs system analysis in a way that
was not possible in previous HP Superdome systems. First, it collects the information from every sensor
and component and stores it in a central place in the Onboard Administrator (OA). With all the data in
one place, the built-in SD2 Analysis Engine can automatically analyze the error situation, identify
failed or “suspected” parts, initiate corrective actions, and notify administrators, even before a
reboot has begun. Because HP Superdome 2 includes extensive built-in error-correcting systems, in many
cases the system can heal itself without any noticeable performance degradation.

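To make the benefit of collecting all error data in one place concrete, here is a simplified, hypothetical model in Python. It assumes that each error report lists the FRUs that could explain it and that a single fault caused all of the reports, so intersecting the candidate sets narrows the list of suspects. The class, function, and FRU names are illustrative assumptions; the whitepaper does not describe the SD2 Analysis Engine's actual correlation logic.

```python
# Hypothetical, simplified model of fault correlation. It assumes each
# error report lists the FRUs that could explain it and that one fault
# caused all of the reports. Not the SD2 Analysis Engine's real algorithm.
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    source: str                 # sensor or component that saw the error
    candidates: frozenset[str]  # FRUs that could explain this error


def suspect_frus(reports: list[ErrorReport]) -> set[str]:
    """Return the FRUs that are consistent with every report."""
    if not reports:
        return set()
    suspects = set(reports[0].candidates)
    for report in reports[1:]:
        # Each additional report rules out FRUs that cannot explain it.
        suspects &= report.candidates
    return suspects


if __name__ == "__main__":
    # Illustrative FRU names, not real Superdome 2 identifiers.
    link_error = ErrorReport("link monitor", frozenset({"blade_1", "xfm_2"}))
    memory_error = ErrorReport("memory controller", frozenset({"blade_1", "dimm_3"}))
    print(suspect_frus([link_error]))                # two candidates remain
    print(suspect_frus([link_error, memory_error]))  # {'blade_1'}
```

A single report leaves two candidates; only when both reports are seen together does one FRU stand out. An empty result would mean the single-fault assumption does not hold, and a real analysis would have to consider multiple independent faults.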
When repairs must be made, the SD2 Analysis Engine’s diagnosis helps ensure that the correct repair is
made.

Customers can review the output of every step in this process directly from the HP Superdome 2 OA: from
the raw data in the error log, to the analysis results in the Health Repository, to the
problem-cause-action information in the WS-Man alerts sent to Remote Support tools and HP SIM.

Because all of this is built into the sx3000 chipset and the OA firmware, HP Superdome 2 does not need
OS-based agents to monitor and report platform health and status.

Note: Communication faults (PCIe errors) that occur within a running OS partition are recoverable
using the HP-UX PCI error recovery feature. They are still reported by SysFaultMgmt (SFM) running on
the OS, but will be handled by the SD2 Analysis Engine (AE) in the near future. I/O cards and
direct-attached storage devices are monitored by a set of new OS provider products based on the WBEM
standard.

The Analysis Engine consists of three major components which interact to provide comprehensive built-in
fault management:
1. Error Logging Service (ELS)
2. Core Analysis Engine (CAE)
3. Health Repository
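The three components can be pictured as a small pipeline. The Python sketch below is a minimal, hypothetical model: the class names mirror the component names, but the rule format, the MEM_UNCORRECTABLE error code, the FRU name dimm_3, and the record fields are assumptions, not HP firmware interfaces. Each Diagnosis carries problem, cause, and action fields, loosely mirroring the problem-cause-action content of the WS-Man alerts.

```python
# Hypothetical sketch of how the three Analysis Engine components could
# cooperate. Class names mirror the whitepaper's components; the rule
# format, error codes, and record fields are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ErrorEvent:
    source: str  # FRU or sensor that reported the error
    code: str    # raw error code as logged (hypothetical values)


@dataclass
class Diagnosis:
    fru: str      # suspected field-replaceable unit
    problem: str  # what went wrong
    cause: str    # why it went wrong
    action: str   # what the administrator or service engineer should do


class ErrorLoggingService:
    """ELS: keeps every raw error event so it stays available for review."""

    def __init__(self) -> None:
        self.events: list[ErrorEvent] = []

    def log(self, event: ErrorEvent) -> None:
        self.events.append(event)


class HealthRepository:
    """Health Repository: latest diagnosis for each FRU."""

    def __init__(self) -> None:
        self.status: dict[str, Diagnosis] = {}

    def record(self, diagnosis: Diagnosis) -> None:
        self.status[diagnosis.fru] = diagnosis


# A rule turns one raw event into a diagnosis, or returns None if it does not apply.
Rule = Callable[[ErrorEvent], Optional[Diagnosis]]


class CoreAnalysisEngine:
    """CAE: applies rules to events it has not yet analyzed."""

    def __init__(self, rules: list[Rule], repository: HealthRepository) -> None:
        self.rules = rules
        self.repository = repository
        self._next = 0  # index of the first ELS event not yet analyzed

    def analyze(self, els: ErrorLoggingService) -> list[Diagnosis]:
        new_diagnoses = []
        for event in els.events[self._next:]:
            for rule in self.rules:
                diagnosis = rule(event)
                if diagnosis is not None:
                    self.repository.record(diagnosis)
                    new_diagnoses.append(diagnosis)
        self._next = len(els.events)
        return new_diagnoses


def uncorrectable_memory(event: ErrorEvent) -> Optional[Diagnosis]:
    """Example rule; the code MEM_UNCORRECTABLE is hypothetical."""
    if event.code == "MEM_UNCORRECTABLE":
        return Diagnosis(event.source, "Uncorrectable memory error",
                         "Failing DIMM", "Replace DIMM")
    return None


if __name__ == "__main__":
    els, repo = ErrorLoggingService(), HealthRepository()
    cae = CoreAnalysisEngine([uncorrectable_memory], repo)
    els.log(ErrorEvent("dimm_3", "MEM_UNCORRECTABLE"))
    cae.analyze(els)
    print(repo.status["dimm_3"].action)  # Replace DIMM
```

Keeping raw events in the ELS after analysis, instead of discarding them, matches the point that every stage, from the error log to the Health Repository, can be reviewed from the OA. The CAE's cursor only prevents the same event from being diagnosed twice.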