Compaq DS20E Technical Guide - Page 9
Server Management, Reliability, Availability, and Maintainability - alpha
Server Management

The AlphaServer products support important operational and platform management requirements.

Operational Management

Server/Network Management. Compaq Insight Manager is included with every system. This software tool lets you monitor and control Alpha-based servers. Insight Manager consists of two components: a Windows-based console application and server- or client-based management data collection agents. Management agents monitor over 1,000 management parameters. Key subsystems are instrumented to make health, configuration, and performance data available to the agent software. The agents act on that data by initiating alarms in the event of faults and by providing updated management information, such as network interface or storage subsystem performance statistics.

Remote Server Management. An integrated remote management console (RMC) lets the operator perform several tasks from a serial console: monitor the system power, temperature, and fans; and reset, halt, or power the system on or off, regardless of the operating system or hardware state. Monitoring can be done locally or remotely through a modem.

Platform Management

The AlphaServer DS20E systems support platform management tasks such as manipulating and monitoring hardware performance, configuration, and errors. For example, the operating systems provide a number of tools to characterize system performance and display errors logged in the system error log file. In addition, the system console firmware provides hardware configuration tools and diagnostics to facilitate quick hardware installation and troubleshooting. The system operator can use simple console commands to show the system configuration, devices, boot and operational flags, and recorded errors. The console also aids inventory support by giving access to the serial numbers and revisions of hardware and firmware.
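The alarm behavior attributed to the management agents above can be pictured with a small threshold check. The following is only a sketch in Python under stated assumptions: the parameter names, the limits, and the evaluate function are hypothetical and are not part of Insight Manager or its agents.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    low: float
    high: float


# Hypothetical parameters and limits, chosen for illustration only.
THRESHOLDS = {
    "temperature_c": Threshold(low=5.0, high=50.0),
    "fan_rpm": Threshold(low=1500.0, high=6000.0),
}


def evaluate(readings):
    """Return one alarm string for each reading outside its threshold."""
    alarms = []
    for name, value in readings.items():
        limit = THRESHOLDS.get(name)
        if limit is None:
            continue  # parameter is not monitored in this sketch
        if value < limit.low or value > limit.high:
            alarms.append(
                f"ALARM {name}={value} outside [{limit.low}, {limit.high}]"
            )
    return alarms
```

Calling evaluate with a temperature reading above the assumed upper limit returns a single alarm entry; readings within limits return an empty list.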
Error Reporting

Compaq Analyze, a diagnostic tool used to determine the cause of hardware failures, is installed with the operating systems. It provides automatic background analysis by continuously reading the error log file. It analyzes both single error/fault events and multiple events. When an error condition is detected, it collects the error information and sends it, together with an analysis, to the user. The tool requires a graphics monitor for its output display.

Reliability, Availability, and Maintainability

The AlphaServer DS20E system achieves an unparalleled level of reliability and availability through the careful application of technologies that balance redundancy, error correction, and fault management. Reliability and availability features are built into the CPU, memory, and I/O, and implemented at the system level.

Processor Features

• Error correction code (ECC) protection on the CPU data cache.
• Parity protection on the CPU cache tag store.
• Multi-tiered power-up diagnostics to verify the functionality of the hardware.

With two processors, each CPU runs a set of diagnostic tests in parallel when you power up or reset the system. If any test fails, the failing CPU is configured out of the system. Responsibility for initializing memory and booting the console firmware is transferred to the other CPU, and the boot process continues. This feature ensures that a system can still power up and boot the operating system after a CPU failure. LEDs on the control panel indicate test status and component failure information.

Memory Features

• The memory ECC scheme is designed to provide maximum protection for user data. It corrects single-bit errors and detects double-bit errors and total DRAM failure. It also detects RAM address errors.
• Memory failover. The power-up diagnostics are designed to provide the largest amount of usable memory, configuring around errors.
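The "corrects single-bit errors and detects double-bit errors" behavior of the memory ECC can be illustrated with a toy code. The guide does not describe the actual code used in the DS20E memory subsystem, so the sketch below assumes a textbook extended Hamming (8,4) code in Python; the function names and the 4-bit data width are illustrative only.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 holds the overall parity bit
    # Data bits sit at positions 3, 5, 6, 7; check bits at 1, 2, 4.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits (1..7).
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2  # 0 when the overall parity still holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with intact overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

In this sketch any single flipped bit is repaired and reported as corrected, while any two flipped bits are reported as uncorrectable rather than silently miscorrected. Real memory ECC applies the same idea to much wider words.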
I/O Features

• ECC protection on the switch interconnect and parity protection on the PCI and SCSI buses.
• Extensive error correction built into disk drives.
• Optional internal RAID improves reliability and data security.
• Disk hot swap.
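Parity protection, as listed for the PCI and SCSI buses, detects errors but cannot correct them. The following Python sketch shows the general idea with a single even-parity bit over a word. The function names and the 32-bit default width are assumptions for illustration; on real buses the parity is generated and checked in hardware, and the parity sense and covered signals are defined by each bus specification.

```python
def even_parity_bit(word, width=32):
    """Return the bit that makes the count of 1s in word plus parity even."""
    masked = word & ((1 << width) - 1)
    return bin(masked).count("1") % 2


def check_parity(word, parity_bit, width=32):
    """Return True if the received word agrees with its parity bit."""
    return even_parity_bit(word, width) == parity_bit
```

A single flipped bit changes the count of 1s from even to odd, so the check fails. Two flipped bits cancel out and pass unnoticed, which is why ECC is used where correction or stronger detection is needed.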