HP SureStore 7400 Linux Configuration HP VA 7100/7400 - Page 9

Rev 2002-01-23

2.2 Device “Not Ready – LUN Becoming Ready”

Problem synopsis: The Linux 2.2 kernel SCSI subsystem does not appear to distinguish between the various kinds of “Not Ready” device states. Some “Not Ready” sense codes indicate that a device is not ready and requires manual intervention to become ready (for example, when media has been removed from a tape device). Other “Not Ready” codes indicate that the device is not yet ready but is in the process of becoming ready (for example, during device initialization or self-diagnostics). In the latter case, the LUN is said to be “becoming ready” and all that is required is a retry. The VA 7100/7400 can remain in this “becoming ready” state for 15 seconds or more during initialization, which typically causes the Linux SCSI subsystem to take the device offline, resulting in data loss and/or filesystem corruption.

Patch description: When the device reports a state of “Not Ready – LUN becoming ready”, this patch re-queues the command at the high-level sd driver. In this case the device is known to become available shortly, so it is preferable to wait rather than to give up.
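The distinction the patch relies on can be sketched in user-space C. This is only an illustration, not the code in sd.c: the enum and function names are hypothetical, and the sketch assumes fixed-format sense data. The numeric values come from the SCSI standard, where sense key 0x02 is NOT READY and additional sense code 0x04 with qualifier 0x01 means the logical unit is in the process of becoming ready.

```c
#include <stddef.h>
#include <stdint.h>

/* Values defined by the SCSI standard. */
#define SENSE_KEY_NOT_READY  0x02  /* sense key: NOT READY */
#define ASC_LUN_NOT_READY    0x04  /* additional sense code: logical unit not ready */
#define ASCQ_BECOMING_READY  0x01  /* qualifier: in process of becoming ready */

/* Hypothetical names for this sketch; they do not come from sd.c. */
enum not_ready_action {
    ACTION_NONE,   /* sense data does not report "Not Ready" */
    ACTION_RETRY,  /* LUN is becoming ready: re-queue the command */
    ACTION_FAIL    /* any other "Not Ready" state: report the error */
};

/* Classify fixed-format sense data (response code 0x70 or 0x71). */
enum not_ready_action classify_not_ready(const uint8_t *sense, size_t len)
{
    uint8_t response_code;

    /* Bytes 12 and 13 (ASC and ASCQ) must be present. */
    if (sense == NULL || len < 14)
        return ACTION_NONE;

    response_code = sense[0] & 0x7f;
    if (response_code != 0x70 && response_code != 0x71)
        return ACTION_NONE;

    /* The sense key is the low nibble of byte 2. */
    if ((sense[2] & 0x0f) != SENSE_KEY_NOT_READY)
        return ACTION_NONE;

    if (sense[12] == ASC_LUN_NOT_READY && sense[13] == ASCQ_BECOMING_READY)
        return ACTION_RETRY;

    return ACTION_FAIL;
}
```

A caller would pass the sense buffer returned with a failed command and re-queue the command only when the result is ACTION_RETRY. Other “Not Ready” conditions, such as those requiring manual intervention, fall through to ACTION_FAIL.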

Caveats: While the device is in this particular “becoming ready” state, there is currently no delay between command retry attempts.

Recommendation: Use this patch at your discretion. Consider it if the array needs to be re-initialized and you notice lengthy “becoming ready” periods.

Files Affected: ../linux/drivers/scsi/sd.c

2.3 Refined SCSI Error Recovery

Problem synopsis: The Linux 2.2 kernel SCSI subsystem provides a reference (sample) error recovery function called scsi_unjam_host. This routine does not handle timeouts well and tends to mark fibre-channel devices offline with very little provocation. When a command exceeds the 30-second default timeout, the error recovery thread is awakened and, after some attempts to reset the missing device, marks it offline. This usually results in data loss and/or filesystem corruption, as well as hung processes that cannot be removed.

Patch description: Fortunately, the Linux 2.2.16 (and 2.2.19) SCSI subsystem provides a “new” error handling architecture in which each low-level driver can override the default error recovery function. This patch to the qlogicfc driver implements the new SCSI error handling mechanism and supplies an alternate error recovery behavior that affects only devices attached to the Qlogic adapter. Other SCSI devices on the system continue to use the scsi_unjam_host function and retain their normal recovery behavior. The scsi_unjam_host routine was copied from scsi_error.c to qlogicfc.c and modified as an example of how to customize error recovery for a particular environment. When the modified qlogicfc module is inserted into the kernel, the driver registers its error handling function, now called isp2x00_strategy_handler, with the SCSI host’s eh_strategy_handler function pointer. The isp2x00_strategy_handler function is then called in place of scsi_unjam_host for Qlogic-attached devices.
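The override mechanism described above amounts to a per-host function pointer that, when set, replaces the default routine. The following user-space sketch mimics that dispatch. It is an illustration only: struct mock_host, enum handler_id and run_error_recovery are hypothetical stand-ins rather than kernel definitions, and the two handler bodies merely record which routine ran.

```c
#include <stddef.h>

/* Hypothetical record of which recovery routine ran last. */
enum handler_id {
    HANDLER_NONE,
    HANDLER_DEFAULT,  /* stand-in for scsi_unjam_host */
    HANDLER_QLOGIC    /* stand-in for isp2x00_strategy_handler */
};

/* Hypothetical stand-in for a SCSI host; not the kernel's structure. */
struct mock_host {
    /* Driver-supplied recovery routine; NULL selects the default. */
    void (*eh_strategy_handler)(struct mock_host *host);
    enum handler_id last_handler;
};

/* Stand-in for the default recovery routine. */
void scsi_unjam_host(struct mock_host *host)
{
    host->last_handler = HANDLER_DEFAULT;
}

/* Stand-in for the driver-specific recovery routine. */
void isp2x00_strategy_handler(struct mock_host *host)
{
    host->last_handler = HANDLER_QLOGIC;
}

/* Run error recovery, preferring the handler the driver registered. */
void run_error_recovery(struct mock_host *host)
{
    if (host->eh_strategy_handler != NULL)
        host->eh_strategy_handler(host);
    else
        scsi_unjam_host(host);
}
```

In this sketch, a host whose eh_strategy_handler points at isp2x00_strategy_handler gets the customized recovery path, while a host that leaves the pointer NULL still goes through scsi_unjam_host. This mirrors the statement that only Qlogic-attached devices are affected.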
