Comment 24 for bug 286672

TJ (tj) wrote :

This looks to be related to a known issue with some combinations of controller and disk. The warning "link is slow to respond, please be patient" is issued in drivers/ata/libata-core.c::ata_wait_ready().

It waits for the link status to change from 0xff (no device) for up to ATA_TMOUT_FF_WAIT. Currently this is defined as 800 milliseconds in include/linux/libata.h:

 /* FIXME: GoVault needs 2s but we can't afford that without
  * parallel probing. 800ms is enough for iVDR disk
  * HHD424020F7SV00. Increase to 2secs when parallel probing
  * is in place.
  */
 ATA_TMOUT_FF_WAIT = 800,

The commit that introduced ata_wait_ready() was:

commit aa2731ad9ad80ac3fca48bd1c4cf0eceede4810e
Author: Tejun Heo <email address hidden>
Date: Mon Apr 7 22:47:19 2008 +0900

    libata: separate out ata_wait_ready() and implement ata_wait_after_reset()

    Factor out waiting logic (which is common to all ATA controllers) from
    ata_sff_wait_ready() into ata_wait_ready(). ata_wait_ready() takes
    @check_ready function pointer and uses it to poll for readiness. This
    allows non-SFF controllers to use ata_wait_ready() to wait for link
    readiness.

    This patch also implements ata_wait_after_reset() - generic version of
    ata_sff_wait_after_reset() - using ata_wait_ready().

    ata_sff_wait_ready() is reimplemented using ata_wait_ready() and
    ata_sff_check_ready(). Functionality remains the same.

The associated LKML comments give more detail:

http://lkml.org/lkml/2007/5/16/279

"On certain device/controller combination, 0xff status is asserted
after reset and doesn't get cleared during 150ms post-reset wait. As
0xff status is interpreted as no device (for good reasons), this can
lead to misdetection on such cases.

This patch implements ata_wait_after_reset() which replaces the 150ms
sleep and waits upto ATA_TMOUT_FF_WAIT if status is 0xff.
ATA_TMOUT_FF_WAIT is currently 800ms which is enough for
HHD424020F7SV00 to get detected but not enough for Quantum GoVault
drive which is known to take upto 2s.

Without parallel probing, spending 2s on 0xff port would incur too
much delay on ata_piix's which use 0xff to indicate empty port and
doesn't have SCR register, so GoVault needs to wait till parallel
probing."
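
To put rough numbers on the "too much delay" point in the quote, here is a minimal sketch, not kernel code. The function names are made up for illustration, and the parallel case assumes an idealised model in which every port is probed at the same time, so only the slowest wait counts.

```c
#include <stddef.h>

/* Hypothetical helper: worst-case wait when ports are probed one after
 * another, so each port's wait adds to the total. */
static unsigned sequential_wait_ms(const unsigned *wait_ms, size_t nports)
{
    unsigned total = 0;

    for (size_t i = 0; i < nports; i++)
        total += wait_ms[i];
    return total;
}

/* Hypothetical helper: worst-case wait under the idealised assumption that
 * all ports are probed concurrently, so the slowest port dominates. */
static unsigned parallel_wait_ms(const unsigned *wait_ms, size_t nports)
{
    unsigned longest = 0;

    for (size_t i = 0; i < nports; i++)
        if (wait_ms[i] > longest)
            longest = wait_ms[i];
    return longest;
}
```

For a hypothetical controller with four empty ports that all keep reading 0xff, a 2000 ms limit would cost 8000 ms when probed sequentially against 3200 ms with the current 800 ms limit, while the idealised parallel case would cost only 2000 ms.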

As for why this shows up as a kernel oops report, it looks as if the slow-path detection is firing a false positive.

There was a related bug #318978, "Hard drive in Studio XPS 13 and 16 cause a 17-18s resume time", for which a patch (commit b65db6fd5d) was applied to increase ATA_TMOUT_PMP_SRST_WAIT to 5 seconds. In that case the change increased the time allowed for the soft reset.

I suspect this current issue is related to that.

We need to decide whether we can increase ATA_TMOUT_FF_WAIT to 5 seconds. Because it is a timeout and not a delay, increasing the maximum shouldn't affect systems that don't have a slow device, since it is only used if the link indicates a device is present.
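
To illustrate the timeout-versus-delay distinction, here is a minimal userspace sketch. It is not the kernel's ata_wait_ready() code. The struct sim_port, sim_status() and wait_while_ff() names, the 0x50 "ready" value and the polling step are all assumptions made up for the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical simulated port: status reads 0xff until ready_at_ms.
 * Use UINT_MAX for a port that never stops reading 0xff. */
struct sim_port {
    unsigned ready_at_ms;
};

/* Simulated status read: 0xff before ready_at_ms, 0x50 (assumed "ready"
 * value for this sketch) from then on. */
static uint8_t sim_status(const struct sim_port *p, unsigned now_ms)
{
    return now_ms >= p->ready_at_ms ? 0x50 : 0xff;
}

/* Poll every step_ms while the status is 0xff, giving up once timeout_ms
 * has elapsed. Returns the simulated time spent waiting. The loop exits as
 * soon as the status changes, so timeout_ms is an upper bound, not a fixed
 * delay. */
static unsigned wait_while_ff(const struct sim_port *p,
                              unsigned timeout_ms, unsigned step_ms)
{
    unsigned now = 0;

    assert(step_ms > 0);
    while (now < timeout_ms && sim_status(p, now) == 0xff)
        now += step_ms;
    return now;
}
```

With a 50 ms step, a port that is ready at once returns after 0 ms whether the limit is 800 or 5000. A port that clears 0xff at 1200 ms is missed with a limit of 800 and seen after 1200 ms with a limit of 5000. A port that keeps reading 0xff, as the quoted text says empty ata_piix ports do, waits the full limit in both cases.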

I've attached a patch that enables a 5 second maximum wait.

I'll ask other kernel developers to look at this.