HP SureStore 7400 Linux Configuration HP VA 7100/7400 - Page 9

Linux Configuration
HP VA 7100/7400
Rev 2002-01-23
Page 9
2.2 Device “Not Ready – LUN Becoming Ready”
Problem synopsis:
The Linux 2.2 kernel SCSI subsystem does not appear to distinguish between the various kinds of “Not Ready” device states. Some “Not Ready” sense codes indicate that a device is not ready and requires manual intervention to become ready (as is the case when media has been removed from a tape device). Other “Not Ready” codes indicate that the device is not yet ready but is in the process of becoming ready (for example, during device initialization or self-diagnostics). In the latter case, the LUN is said to be “becoming ready”, and all that is required is a retry. During initialization, the VA 7100/7400 can remain in this “becoming ready” state for 15 seconds or longer, which typically causes the Linux SCSI subsystem to take the device offline, resulting in data loss and/or filesystem corruption.
Patch description:
When the device reports “Not Ready – LUN becoming ready”, this patch re-queues the command in the high-level sd driver. In this case the device is known to become available shortly, so waiting is preferable to giving up.
Caveats:
While the device is in this “becoming ready” state, the patch currently inserts no delay between command retry attempts.
Recommendation:
Use this patch at your discretion. Consider it if the array needs to be re-initialized and you notice lengthy “becoming ready” periods.
Files Affected:
../linux/drivers/scsi/sd.c
2.3 Refined SCSI Error Recovery
Problem synopsis:
The Linux 2.2 kernel SCSI subsystem provides a reference (sample) error recovery function called scsi_unjam_host. This routine does not handle timeouts well, and it tends to mark Fibre Channel devices offline with very little provocation. This usually results in data loss and/or filesystem corruption, as well as locked system processes that cannot be removed. When a command exceeds the default 30-second timeout, the error recovery thread is awakened and, after some attempts to reset the missing device, marks it offline, resulting in filesystem corruption and hung processes.
Patch description:
Fortunately, the Linux 2.2.16 (and 2.2.19) SCSI subsystem provides a “new” error handling architecture in which the default error recovery function can be overridden within each low-level driver. This patch to the qlogicfc driver implements the new SCSI error handling mechanism and supplies an alternate error recovery behavior that affects only devices attached to the QLogic adapter. Other SCSI devices on the system continue to use the scsi_unjam_host function and retain their normal recovery behavior. The scsi_unjam_host routine was copied from scsi_error.c to qlogicfc.c and modified as an example of how to customize error recovery for a particular environment. When the modified qlogicfc module is inserted into the kernel, the driver registers its error handling function, now called isp2x00_strategy_handler, with the SCSI host’s eh_strategy_handler function pointer. The isp2x00_strategy_handler function is then called in place of scsi_unjam_host for QLogic-attached devices.