Dell PowerVault MD3000i Hardware Owner's Manual - Page 78

Hard Controller Failures and Lockdown Conditions, Invalid Enclosure, ECC Errors


Troubleshooting Your Enclosure

Hard Controller Failures and Lockdown Conditions

Certain events can cause a RAID controller module to fail and/or shut down.
Unrecoverable ECC memory or PCI errors, or critical physical conditions can
cause lockdown. If your RAID storage array is configured for redundant access
and cache mirroring, the surviving controller can normally recover without
data loss or shutdown.
Typical hard controller failures are detailed in the following sections.

Invalid Enclosure

The RAID controller module is supported only in a Dell-supported enclosure.
Upon installation in the enclosure, the controller performs a set of validation
checks. The enclosure status LED is lit steady amber while the RAID controller
module completes these initial tests and until the controllers have booted
successfully. If the RAID controller module detects an enclosure that is not
supported by Dell, the controller aborts startup. The RAID controller module
does not generate any events to alert you to an invalid enclosure, but the
enclosure status LED flashes amber to indicate a fault state.
For full details on the LEDs and their interpretation, see "Back-Panel
Indicators and Features" on page 18.
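
The two enclosure status LED states described for enclosure validation can be summarized in a short lookup. This is only an illustrative sketch in Python: the names `EnclosureLed` and `interpret_status_led` are hypothetical and are not part of any Dell tool, and the sketch covers only the two amber states mentioned in this section, not the full set of indicators documented on page 18.

```python
from enum import Enum


class EnclosureLed(Enum):
    """Enclosure status LED states mentioned in this section (hypothetical names)."""
    STEADY_AMBER = "steady amber"
    FLASHING_AMBER = "flashing amber"


def interpret_status_led(led: EnclosureLed) -> str:
    """Return the meaning this section assigns to an enclosure status LED state."""
    if led is EnclosureLed.STEADY_AMBER:
        # Validation checks are running or the controllers are still booting.
        return "startup validation in progress"
    if led is EnclosureLed.FLASHING_AMBER:
        # A fault state. An invalid enclosure generates no event, so the LED
        # is the only indication in that case.
        return "fault state"
    raise ValueError(f"LED state not covered by this sketch: {led!r}")
```

For example, `interpret_status_led(EnclosureLed.FLASHING_AMBER)` returns `"fault state"`, which after a controller installation may point to an unsupported enclosure.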

ECC Errors

RAID controller firmware can detect ECC errors and can recover from a
single-bit ECC error whether the RAID controller module is in a redundant
or nonredundant configuration. A storage array with redundant controllers
can recover from multi-bit ECC errors as well because the peer RAID
controller module can take over, if necessary.
The RAID controller module will fail over if it experiences up to 10 single-bit
errors or up to three multi-bit errors.
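
The ECC behavior described above can be sketched as a small model. This is not the controller firmware's logic. It is a Python illustration under two stated assumptions: that the counts of 10 single-bit and three multi-bit errors act as thresholds at which failover happens (the phrase "up to" leaves this open), and that a nonredundant configuration cannot recover from a multi-bit error (the text implies this but does not say it outright). All names are hypothetical.

```python
from dataclasses import dataclass

# Counts taken from the text; treating them as thresholds is an assumption.
SINGLE_BIT_LIMIT = 10
MULTI_BIT_LIMIT = 3


def can_recover(error_kind: str, redundant: bool) -> bool:
    """Whether the array can recover from one ECC error of the given kind."""
    if error_kind == "single-bit":
        # Recoverable in both redundant and nonredundant configurations.
        return True
    if error_kind == "multi-bit":
        # Recoverable when a peer controller can take over (assumed: only then).
        return redundant
    raise ValueError(f"unknown ECC error kind: {error_kind!r}")


@dataclass
class EccErrorCounter:
    """Running ECC error counts for one RAID controller module (hypothetical)."""
    single_bit: int = 0
    multi_bit: int = 0

    def record(self, error_kind: str) -> None:
        """Count one ECC error of the given kind."""
        if error_kind == "single-bit":
            self.single_bit += 1
        elif error_kind == "multi-bit":
            self.multi_bit += 1
        else:
            raise ValueError(f"unknown ECC error kind: {error_kind!r}")

    def should_fail_over(self) -> bool:
        """True once either assumed threshold has been reached."""
        return (self.single_bit >= SINGLE_BIT_LIMIT
                or self.multi_bit >= MULTI_BIT_LIMIT)
```

In this model, a counter that has recorded nine single-bit errors does not yet indicate failover, while a tenth single-bit error or a third multi-bit error does.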