HP SureStore 7400 Linux Configuration HP VA 7100/7400 - Page 9
2.2 Device "Not Ready - LUN Becoming Ready"

Problem synopsis: The Linux 2.2 kernel SCSI subsystem does not appear to distinguish between the various kinds of "Not Ready" device states. Some "Not Ready" sense codes indicate that a device is not ready and will require manual intervention to become ready (for example, when media has been removed from a tape device). Other "Not Ready" codes indicate that the device is not yet ready but is in the process of becoming ready (for example, during device initialization or self-diagnostics). In the latter case the LUN is said to be "becoming ready", and all that is required is a retry. During initialization the VA 7100/7400 can remain in this "becoming ready" state for 15 seconds or more, which typically causes the Linux SCSI subsystem to take the device offline, resulting in data loss and/or filesystem corruption.

Patch description: When the device reports a state of "Not Ready - LUN becoming ready", this patch re-queues the command at the high-level sd driver. In this case we know the device will soon be available, and it is preferable to wait rather than to give up.

Caveats: While the device is in this particular "becoming ready" state, there is currently no delay between command retry attempts.

Recommendation: Use this patch at your discretion. If the array needs to be re-initialized and you notice lengthy "becoming ready" periods, this patch should be considered.

Files Affected: ../linux/drivers/scsi/sd.c

2.3 Refined SCSI Error Recovery

Problem synopsis: The Linux 2.2 kernel SCSI subsystem provides a reference (sample) error recovery function called scsi_unjam_host. This routine does not handle timeouts well, and it tends to mark fibre-channel devices offline with very little provocation. This usually results in data loss and/or filesystem corruption, as well as hung system processes that cannot be killed.
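The distinction that section 2.2 draws between the two kinds of "Not Ready" state can be sketched in a few lines of C. This is only an illustration and does not reproduce the sd.c patch. The numeric values are those defined by the SCSI standard for fixed-format sense data (sense key 0x02 for NOT READY, additional sense code 0x04 with qualifier 0x01 for "logical unit is in process of becoming ready"). The helper name classify_not_ready and the enum are hypothetical.

```c
#include <stddef.h>

/* Values defined by the SCSI standard for fixed-format sense data. */
#define SENSE_KEY_NOT_READY  0x02  /* byte 2, low nibble            */
#define ASC_LUN_NOT_READY    0x04  /* byte 12: additional sense code */
#define ASCQ_BECOMING_READY  0x01  /* byte 13: qualifier             */

/* Hypothetical result type, for illustration only. */
enum not_ready_action {
    NR_NOT_APPLICABLE,  /* sense data is not a usable "Not Ready" report */
    NR_RETRY,           /* LUN is becoming ready: retrying is enough     */
    NR_GIVE_UP          /* any other "Not Ready" cause: do not retry     */
};

/* Hypothetical helper: inspect fixed-format sense data and decide
 * whether the command is worth retrying. */
static enum not_ready_action
classify_not_ready(const unsigned char *sense, size_t len)
{
    unsigned char response_code;

    if (sense == NULL || len < 14)
        return NR_NOT_APPLICABLE;       /* ASC/ASCQ bytes not present */

    response_code = sense[0] & 0x7f;
    if (response_code != 0x70 && response_code != 0x71)
        return NR_NOT_APPLICABLE;       /* not fixed-format sense data */

    if ((sense[2] & 0x0f) != SENSE_KEY_NOT_READY)
        return NR_NOT_APPLICABLE;

    if (sense[12] == ASC_LUN_NOT_READY && sense[13] == ASCQ_BECOMING_READY)
        return NR_RETRY;

    return NR_GIVE_UP;
}
```

A caller would pass the sense buffer returned with a failed command and re-queue the command only when the result is NR_RETRY. Because of the caveat noted in section 2.2, a real implementation may also want a delay between retries.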
In the event of a timeout that exceeds the 30-second default value, the error recovery thread is awakened. After some attempts to reset the missing device, the device is marked offline, resulting in filesystem corruption and hung processes.

Patch description: Fortunately, the Linux 2.2.16 (and 2.2.19) SCSI subsystem provides a "new" error handling architecture in which the default error recovery function can be overridden within each low-level driver. This patch to the qlogicfc driver implements the new SCSI error handling mechanism and supplies an alternate error recovery behavior that affects only devices attached to the Qlogic adapter. Other SCSI devices on the system continue to use the scsi_unjam_host function and retain their normal recovery behavior.

The scsi_unjam_host routine was copied from scsi_error.c to qlogicfc.c and modified as an example of how to customize error recovery for a particular environment. When the modified qlogicfc module is inserted into the kernel, the driver registers its error handling function, now called isp2x00_strategy_handler, with the SCSI host's eh_strategy_handler function pointer. The isp2x00_strategy_handler function is then called in place of scsi_unjam_host for Qlogic-attached devices.

Rev 2002-01-23 Page 9
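The override mechanism described in section 2.3 amounts to a function pointer that, when set by a low-level driver, replaces the default recovery routine for that driver's hosts only. The following user-space sketch models that dispatch. It is not kernel code: struct host_model, run_error_recovery, default_unjam and custom_strategy are hypothetical stand-ins, and the handler bodies are placeholders that only count calls, since the actual recovery logic of isp2x00_strategy_handler is not reproduced here.

```c
#include <stddef.h>

struct host_model;

/* Signature of a recovery strategy in this simplified model. */
typedef void (*strategy_fn)(struct host_model *host);

/* Simplified stand-in for a SCSI host; not the kernel's definition. */
struct host_model {
    const char *driver_name;
    strategy_fn eh_strategy_handler;  /* NULL means "use the default" */
    int default_calls;                /* times the default routine ran */
    int custom_calls;                 /* times the override ran        */
};

/* Placeholder for the default routine (scsi_unjam_host in the text). */
static void default_unjam(struct host_model *host)
{
    host->default_calls++;
}

/* Placeholder for a driver-supplied override
 * (isp2x00_strategy_handler in the text). */
static void custom_strategy(struct host_model *host)
{
    host->custom_calls++;
}

/* Dispatch: use the driver's handler if one is registered,
 * otherwise fall back to the default routine. */
static void run_error_recovery(struct host_model *host)
{
    if (host->eh_strategy_handler != NULL)
        host->eh_strategy_handler(host);
    else
        default_unjam(host);
}
```

In this model a host whose driver registered custom_strategy is recovered by that function, while a host that left the pointer NULL still goes through default_unjam, which mirrors the statement that other SCSI devices on the system retain their normal recovery behavior.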