Bug #286672 “WARNING: at /build/buildd/linux-2.6.27/kernel/power...” : Bugs : linux package : Ubuntu

Revision history for this message

Pascal d'Hermilly (pascal-tipisoft) wrote on 2008-10-20:

#1

BootDmesg.txt Edit (51.1 KiB, text/plain; charset="utf-8")
CurrentDmesg.txt Edit (8.0 KiB, text/plain; charset="utf-8")
Dependencies.txt Edit (1.0 KiB, text/plain; charset="utf-8")
HalComputerInfo.txt Edit (2.7 KiB, text/plain; charset="utf-8")
LsUsb.txt Edit (727 bytes, text/plain; charset="utf-8")
Lspci.txt Edit (32.0 KiB, text/plain; charset="utf-8")
OopsText.txt Edit (2.0 KiB, text/plain; charset="utf-8")
ProcCpuInfo.txt Edit (1.3 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt Edit (1.4 KiB, text/plain; charset="utf-8")
ProcModules.txt Edit (4.8 KiB, text/plain; charset="utf-8")

Revision history for this message

Dmytro Korzhevin (korg) wrote on 2008-10-25:

#2

I have same problem, after kerneloops install in Ubuntu 8.10

kernel: 2.6.27-7-generic

sudo dpkg -p kerneloops:

Package: kerneloops
Priority: optional
Section: utils
Installed-Size: 164
Maintainer: Ubuntu MOTU Developers <email address hidden>
Architecture: i386
Version: 0.10-2ubuntu1
Depends: libc6 (>= 2.4), libcurl3-gnutls (>= 7.16.2-1), libdbus-1-3 (>= 1.0.2), libdbus-glib-1-2 (>= 0.71), libglib2.0-0 (>= 2.14.0), libgtk2.0-0 (>= 2.14.1), libnotify1 (>= 0.4.4), libnotify1-gtk2.10
Size: 20128
Description: kernel oops tracker
kerneloops is a daemon that collects kernel crash information and then
submits the extracted signature to the kerneloops.org website for
statistical analysis and presentation to the Linux kernel developers.
Original-Maintainer: Matthew Wilcox <email address hidden>

Revision history for this message

Dmytro Korzhevin (korg) wrote on 2008-10-25:

#3

Maby MOTU can upload new version of kerneloops 0.12 package to repository of Ubuntu 8.10 to fix this issue?

Revision history for this message

Thomas Novin (thomasn80) wrote on 2008-11-08:

#4

There is also bug #287453 with the same error but not related to the package kerneloops.

Revision history for this message

sam tygier (samtygier) wrote on 2008-11-25:

#5

also see Bug #298541

Revision history for this message

Leann Ogasawara (leannogasawara) wrote on 2008-12-03:

#6

Hi Pascal,

The message you are seeing is usually related to suspend/resume. Primarily that it took longer than expected to bring up your devices. I'll go ahead an reassign to the kernel team to look at.

Changed in linux:
assignee:	nobody → ubuntu-kernel-team
importance:	Undecided → Medium
status:	New → Triaged

Revision history for this message

Ilja Sekler (ilja-sekler-) wrote on 2008-12-08:

#7

I get this warning on Intrepid resuming from suspend-to-RAM reproducibly with the vanilla kernel 2.6.27.8 as well.

Revision history for this message

Jonas Petersson (catellie) wrote on 2008-12-08:

#8

I got it with 2.6.27-7-generic, but since I upgraded to 2.6.27-10-generic it seems to have disappeared.

Revision history for this message

Ilja Sekler (ilja-sekler-) wrote on 2008-12-21:

#9

The same warning with 2.6.27-11-generic from intrepid-proposed.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2008-12-23: Kernel team bugs

#10

Per a decision made by the Ubuntu Kernel Team, bugs will longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message

Dave Gilbert (ubuntu-treblig) wrote on 2009-02-21:

#11

I've merged a couple more dupes in with this; it was suggested (by someone else!) that perhaps #331415 is a dupe of this and many of these all relate to firmware loading - do all of those who have this problem have stuff that loads firmware?

Revision history for this message

TJ (tj) wrote on 2009-02-21:

#12

This appears to be caused by bug #331415 "request_firmware() fails on resume from suspend" but I need to know what device drivers might be requesting firmware.

Please attach /var/log/kern.log that contains the Oops message - it should also contain clues as to which driver was delayed.

Changed in linux:
assignee:	nobody → intuitivenipple
status:	Triaged → In Progress

Revision history for this message

zippidy_josh (redstone) wrote on 2009-02-21:

#13

kern.log Edit (62.4 KiB, text/plain)

Download full text (3.6 KiB)

Attached a kern.log. I'm running:
Linux redstone-desktop 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
The log doesn't have a 'oops' message, but has:

Feb 20 08:55:53 redstone-desktop kernel: [388341.469605] ------------[ cut here ]------------
Feb 20 08:55:53 redstone-desktop kernel: [388341.469607] WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()
Feb 20 08:55:53 redstone-desktop kernel: [388341.469609] Modules linked in: rt73usb af_packet rfcomm bridge stp bnep sco l2cap bluetooth ppdev ipv6 acpi_cpufreq cpufreq_powersave cpufreq_stats cpufreq_conservative cpufreq_userspace cpufreq_ondemand freq_table container video output sbs sbshc pci_slot wmi battery iptable_filter ip_tables x_tables ac lp rt2500usb arc4 ecb crypto_blkcipher crc_itu_t rt2x00usb snd_hda_intel rt2x00lib snd_usb_audio snd_pcm_oss snd_mixer_oss rfkill snd_pcm led_class uvcvideo joydev mac80211 snd_seq_dummy snd_usb_lib evdev compat_ioctl32 videodev v4l1_compat snd_seq_oss nvidia(P) snd_hwdep cfg80211 snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq parport_pc i2c_core snd_timer snd_seq_device parport serio_raw intel_agp snd psmouse soundcore agpgart iTCO_wdt iTCO_vendor_support snd_page_alloc button shpchp pcspkr pci_hotplug ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic pata_acpi usbhid hid pata_jmicron ata_piix r8169 mii libata ehci_hcd scsi_mod dock uhci_hcd usbcore thermal
Feb 20 08:55:53 redstone-desktop kernel: rocessor fan fbcon tileblit font bitblit softcursor fuse [last unloaded: rt73usb]
Feb 20 08:55:53 redstone-desktop kernel: [388341.469660] Pid: 26102, comm: pm-suspend Tainted: P W 2.6.27-11-generic #1
Feb 20 08:55:53 redstone-desktop kernel: [388341.469663] [<c037d236>] ? printk+0x1d/0x1f
Feb 20 08:55:53 redstone-desktop kernel: [388341.469666] [<c0131e99>] warn_on_slowpath+0x59/0x90
Feb 20 08:55:53 redstone-desktop kernel: [388341.469670] [<c014d305>] ? sched_clock_cpu+0xd5/0x170
Feb 20 08:55:53 redstone-desktop kernel: [388341.469673] [<c014c15f>] ? down_trylock+0x2f/0x40
Feb 20 08:55:53 redstone-desktop kernel: [388341.469675] [<c0132572>] ? try_acquire_console_sem+0x12/0x40
Feb 20 08:55:53 redstone-desktop kernel: [388341.469679] [<c024ed00>] ? kobject_put+0x20/0x50
Feb 20 08:55:53 redstone-desktop kernel: [388341.469682] [<c015d2b4>] suspend_test_finish+0x74/0x80
Feb 20 08:55:53 redstone-desktop kernel: [388341.469684] [<c015d3a6>] suspend_devices_and_enter+0xe6/0x190
Feb 20 08:55:53 redstone-desktop kernel: [388341.469687] [<c015d621>] enter_state+0xd1/0x100
Feb 20 08:55:53 redstone-desktop kernel: [388341.469689] [<c015d6d5>] state_store+0x85/0xd0
Feb 20 08:55:53 redstone-desktop kernel: [388341.469691] [<c015d650>] ? state_store+0x0/0xd0
Feb 20 08:55:53 redstone-desktop kernel: [388341.469693] [<c024ebc4>] kobj_attr_store+0x24/0x30
Feb 20 08:55:53 redstone-desktop kernel: [388341.469695] [<c01ffac7>] sysfs_write_file+0x97/0x100
Feb 20 08:55:53 redstone-desktop kernel: [388341.469698] [<c01b2760>] vfs_write+0xa0/0x110
Feb 20 08:55:53 redstone-desktop kernel: [388341.469700] [<c01ffa30>] ? sysfs_write_file+0x0/0x100
Feb 20 08:55:53 redstone-...

Attached a kern.log.  I'm running:
  Linux redstone-desktop 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
The log doesn't have a 'oops' message, but has:

Feb 20 08:55:53 redstone-desktop kernel: [388341.469605] ------------[ cut here ]------------
Feb 20 08:55:53 redstone-desktop kernel: [388341.469607] WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()
Feb 20 08:55:53 redstone-desktop kernel: [388341.469609] Modules linked in: rt73usb af_packet rfcomm bridge stp bnep sco l2cap bluetooth ppdev ipv6 acpi_cpufreq cpufreq_powersave cpufreq_stats cpufreq_conservative cpufreq_userspace cpufreq_ondemand freq_table container video output sbs sbshc pci_slot wmi battery iptable_filter ip_tables x_tables ac lp rt2500usb arc4 ecb crypto_blkcipher crc_itu_t rt2x00usb snd_hda_intel rt2x00lib snd_usb_audio snd_pcm_oss snd_mixer_oss rfkill snd_pcm led_class uvcvideo joydev mac80211 snd_seq_dummy snd_usb_lib evdev compat_ioctl32 videodev v4l1_compat snd_seq_oss nvidia(P) snd_hwdep cfg80211 snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq parport_pc i2c_core snd_timer snd_seq_device parport serio_raw intel_agp snd psmouse soundcore agpgart iTCO_wdt iTCO_vendor_support snd_page_alloc button shpchp pcspkr pci_hotplug ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic pata_acpi usbhid hid pata_jmicron ata_piix r8169 mii libata ehci_hcd scsi_mod dock uhci_hcd usbcore thermal 
Feb 20 08:55:53 redstone-desktop kernel: rocessor fan fbcon tileblit font bitblit softcursor fuse [last unloaded: rt73usb]
Feb 20 08:55:53 redstone-desktop kernel: [388341.469660] Pid: 26102, comm: pm-suspend Tainted: P        W 2.6.27-11-generic #1
Feb 20 08:55:53 redstone-desktop kernel: [388341.469663]  [<c037d236>] ? printk+0x1d/0x1f
Feb 20 08:55:53 redstone-desktop kernel: [388341.469666]  [<c0131e99>] warn_on_slowpath+0x59/0x90
Feb 20 08:55:53 redstone-desktop kernel: [388341.469670]  [<c014d305>] ? sched_clock_cpu+0xd5/0x170
Feb 20 08:55:53 redstone-desktop kernel: [388341.469673]  [<c014c15f>] ? down_trylock+0x2f/0x40
Feb 20 08:55:53 redstone-desktop kernel: [388341.469675]  [<c0132572>] ? try_acquire_console_sem+0x12/0x40
Feb 20 08:55:53 redstone-desktop kernel: [388341.469679]  [<c024ed00>] ? kobject_put+0x20/0x50
Feb 20 08:55:53 redstone-desktop kernel: [388341.469682]  [<c015d2b4>] suspend_test_finish+0x74/0x80
Feb 20 08:55:53 redstone-desktop kernel: [388341.469684]  [<c015d3a6>] suspend_devices_and_enter+0xe6/0x190
Feb 20 08:55:53 redstone-desktop kernel: [388341.469687]  [<c015d621>] enter_state+0xd1/0x100
Feb 20 08:55:53 redstone-desktop kernel: [388341.469689]  [<c015d6d5>] state_store+0x85/0xd0
Feb 20 08:55:53 redstone-desktop kernel: [388341.469691]  [<c015d650>] ? state_store+0x0/0xd0
Feb 20 08:55:53 redstone-desktop kernel: [388341.469693]  [<c024ebc4>] kobj_attr_store+0x24/0x30
Feb 20 08:55:53 redstone-desktop kernel: [388341.469695]  [<c01ffac7>] sysfs_write_file+0x97/0x100
Feb 20 08:55:53 redstone-desktop kernel: [388341.469698]  [<c01b2760>] vfs_write+0xa0/0x110
Feb 20 08:55:53 redstone-desktop kernel: [388341.469700]  [<c01ffa30>] ? sysfs_write_file+0x0/0x100
Feb 20 08:55:53 redstone-desktop kernel: [388341.469703]  [<c01b28a2>] sys_write+0x42/0x70
Feb 20 08:55:53 redstone-desktop kernel: [388341.469705]  [<c0103f7b>] sysenter_do_call+0x12/0x2f
Feb 20 08:55:53 redstone-desktop kernel: [388341.469707]  [<c0370000>] ? enable_nonboot_cpus+0x60/0xc0
Feb 20 08:55:53 redstone-desktop kernel: [388341.469710]  =======================
Feb 20 08:55:53 redstone-desktop kernel: [388341.469712] ---[ end trace d3ce43fe78c881db ]---

Revision history for this message

TJ (tj) wrote on 2009-02-21: Re: [Bug 286672] Re: WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()

#14

On Sat, 2009-02-21 at 17:44 +0000, zippidy_josh wrote:
> Attached a kern.log. I'm running:
> Linux redstone-desktop 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux

I don't *think* your issue is caused by firmware. The tell-tail is
kernel time-stamps that delay for 60 seconds.

I do see a 7 second delay whilst the SATA drive is being probed:

[388333.633707] sd 0:0:0:0: [sda] Starting disk
[388338.256032] ata1: link is slow to respond, please be patient (ready=0)
[388340.272066] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

But the only firmware reports come after user-space has been unfrozen:

[388341.469762] Restarting tasks ... done.
[388342.292283] r8169: eth0: link down
[388342.292524] ADDRCONF(NETDEV_UP): eth0: link is not ready
[388548.630686] usbcore: deregistering interface driver rt73usb
[388549.005780] phy16: Selected rate control algorithm 'pid'
[388549.007027] Registered led device: rt73usb-phy16:radio
[388549.007051] Registered led device: rt73usb-phy16:assoc
[388549.007073] Registered led device: rt73usb-phy16:quality
[388549.007628] usbcore: registered new interface driver rt73usb
[388553.029264] firmware: requesting rt73.bin

Revision history for this message

Jonas Petersson (catellie) wrote on 2009-02-21:

#15

As mentioned above, I no longer experince this issue, but I can say that I never loaded any firmware that I'm aware of - unless you count ndiswrapper. Previous models of MacBookPro required firmware for the webcam, but that has never been needed on the 4,1 I have. So, from my point of view, firmware is not the real/only cause of this.
[BTW: I'm currently on 2.6.27-12-generic and it seems less stable than -11- just after resuming (2 hangs in 3 days), but that is not related this bug as far as I can tell.]

Revision history for this message

Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote on 2009-02-22:

#16

Same thing happens with the current jaunty kernel on my system after suspend:

[ 412.802413] WARNING: at /build/buildd/linux-2.6.28/kernel/power/main.c:177 suspend_test_finish+0x80/0x90()
[ 412.802415] Component: resume devices
[ 412.802416] Modules linked in: radeon drm binfmt_misc bridge stp bnep ppdev video output input_polldev iptable_filter ip_tables x_tables dm_crypt lp joydev hid_microsoft dcdbas serio_raw psmouse pcspkr iTCO_wdt iTCO_vendor_support wacom snd_hda_intel usbhid snd_pcm_oss snd_mixer_oss snd_pcm parport_pc parport snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd intel_agp agpgart shpchp soundcore snd_page_alloc usb_storage tg3 ehci_hcd uhci_hcd fbcon tileblit font bitblit softcursor fuse
[ 412.802445] Pid: 4676, comm: pm-suspend Not tainted 2.6.28-8-generic #24-Ubuntu
[ 412.802446] Call Trace:
[ 412.802450] [<c013aa10>] warn_slowpath+0x60/0x80
[ 412.802453] [<c015473a>] ? down_trylock+0x2a/0x40
[ 412.802456] [<c013b06d>] ? try_acquire_console_sem+0xd/0x30
[ 412.802459] [<c02bec80>] ? kobject_put+0x20/0x50
[ 412.802462] [<c04ea0c1>] ? printk+0x18/0x1f
[ 412.802465] [<c0165eb0>] suspend_test_finish+0x80/0x90
[ 412.802468] [<c0165f86>] suspend_devices_and_enter+0xc6/0x160
[ 412.802470] [<c04ea0c1>] ? printk+0x18/0x1f
[ 412.802472] [<c0166209>] enter_state+0xc9/0x100
[ 412.802474] [<c01662bd>] state_store+0x7d/0xc0
[ 412.802477] [<c0166240>] ? state_store+0x0/0xc0
[ 412.802480] [<c02beb44>] kobj_attr_store+0x24/0x30
[ 412.802483] [<c020b662>] sysfs_write_file+0x92/0xf0
[ 412.802486] [<c01be8c8>] vfs_write+0x98/0x110
[ 412.802488] [<c020b5d0>] ? sysfs_write_file+0x0/0xf0
[ 412.802490] [<c01be9fd>] sys_write+0x3d/0x70
[ 412.802493] [<c0103f6b>] sysenter_do_call+0x12/0x2f
[ 412.802495] [<c04e0000>] ? da903x_probe+0xc7/0x1e7
[ 412.802497] ---[ end trace 87ad1ccb4fe25fbf ]---

Revision history for this message

TJ (tj) wrote on 2009-02-24:

#17

Fridtjof, could you attach /var/log/kern.log that includes the problem. I want to see what was going on before and after the warning message.

Revision history for this message

Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote on 2009-02-24:

#18

kern.log after waking up from S3 Edit (16.1 KiB, text/plain)

Sure, attached.

Revision history for this message

TJ (tj) wrote on 2009-02-24:

#19

Fridtjof, thank-you. It looks like the same cause as zippidy_josh reported.

[10692.148085] sd 0:0:0:0: [sda] Starting disk
[10697.032030] ata1: link is slow to respond, please be patient (ready=0)
[10698.992081] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Fridtjof, zippyidy_josh, could both attach the system PCI configuration from:

sudo lspci -vvnn

I'm wondering if there's a common hardware device here, or whether it is an issue with the ata_piix / libata drivers.

Revision history for this message

Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote on 2009-02-24:

#20

lspci on a Dell OptiPlex 745 Edit (21.2 KiB, text/plain)

Revision history for this message

zippidy_josh (redstone) wrote on 2009-02-25:

#21

lspci.txt Edit (25.7 KiB, text/plain)

Output of
sudo lspci -vvnn
attached.
Cheers.

Revision history for this message

TJ (tj) wrote on 2009-02-25:

#22

Could both of you attach the details of the disk drive that is slow to respond? There are several potential scenarios I'm exploring:

1. Specific issues with some disk drives. Some drives take a *long* time to clear the status register after a reset.
2. Issues with the particular chipset/controllers (I notice both systems have two 82801* (ICH8/9 Family) IDE controllers
3. Issues with ata_piix driver

The reports:

ls -l /sys/block/sda > /tmp/report.txt
sudo hdparm -I /dev/sda >> /tmp/report.txt

Attach /tmp/report.txt to your reply.

Revision history for this message

Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote on 2009-02-25:

#23

Report from OptiPlex 745 Edit (2.5 KiB, text/plain)

Revision history for this message

TJ (tj) wrote on 2009-02-25:

#24

Increase ATA_TMOUT_FF_WAIT to 5 seconds Edit (446 bytes, text/plain)

This looks to be related to a known issue with some combinations of controller and disk. The warning "link is slow to respond, please be patient" is issued in drivers/ata/libata-core.c::ata_wait_ready().

It waits for the link status to change from 0xff (no device) for up to ATA_TMOUT_FF_WAIT. Currently this is defined as 800 milliseconds in include.linux/libata.h:

/* FIXME: GoVault needs 2s but we can't afford that without
  * parallel probing. 800ms is enough for iVDR disk
  * HHD424020F7SV00. Increase to 2secs when parallel probing
  * is in place.
  */
ATA_TMOUT_FF_WAIT = 800,

The commit that introduced ata_wait_ready() was:

commit aa2731ad9ad80ac3fca48bd1c4cf0eceede4810e
Author: Tejun Heo <email address hidden>
Date: Mon Apr 7 22:47:19 2008 +0900

libata: separate out ata_wait_ready() and implement ata_wait_after_reset()

    Factor out waiting logic (which is common to all ATA controllers) from
    ata_sff_wait_ready() into ata_wait_ready(). ata_wait_ready() takes
    @check_ready function pointer and uses it to poll for readiness. This
    allows non-SFF controllers to use ata_wait_ready() to wait for link
    readiness.

This patch also implements ata_wait_after_reset() - generic version of
ata_sff_wait_after_reset() - using ata_wait_ready().

ata_sff_wait_ready() is reimplemented using ata_wait_ready() and
ata_sff_check_ready(). Functionality remains the same.

The associated LKML comments give more detail:

http://lkml.org/lkml/2007/5/16/279

"On certain device/controller combination, 0xff status is asserted
after reset and doesn't get cleared during 150ms post-reset wait. As
0xff status is interpreted as no device (for good reasons), this can
lead to misdetection on such cases.

This patch implements ata_wait_after_reset() which replaces the 150ms
sleep and waits upto ATA_TMOUT_FF_WAIT if status is 0xff.
ATA_TMOUT_FF_WAIT is currently 800ms which is enough for
HHD424020F7SV00 to get detected but not enough for Quantum GoVault
drive which is known to take upto 2s.

Without parallel probing, spending 2s on 0xff port would incur too
much delay on ata_piix's which use 0xff to indicate empty port and
doesn't have SCR register, so GoVault needs to wait till parallel
probing."

In terms of it causing this kernel oops report it looks as if the problem is with the slow-path detection firing a false positive.

There was a related bug #318978 "Hard drive in Studio XPS 13 and 16 cause a 17-18s resume time" which applied a patch (commit b65db6fd5d) to increase ATA_TMOUT_PMP_SRST_WAIT to 5 seconds. In that case it was increasing the time allowed for the soft reset.

I suspect this current issue is related to that.

We need to decide if we can increase ATA_TMOUT_FF_WAIT to 5 seconds. Because it is a timeout and not a delay increasing the maxmium shouldn't affect systems that don't have a slow device since it is only used if the link indicates a device is present.

I've attached a patch that enables a 5 second maximum wait.

I'll ask other kernel developers to look at this.

This looks to be related to a known issue with some combinations of controller and disk. The warning "link is slow to respond, please be patient" is issued in drivers/ata/libata-core.c::ata_wait_ready().

It waits for the link status to change from 0xff (no device) for up to ATA_TMOUT_FF_WAIT. Currently this is defined as 800 milliseconds in include.linux/libata.h:

/* FIXME: GoVault needs 2s but we can't afford that without
	 * parallel probing.  800ms is enough for iVDR disk
	 * HHD424020F7SV00.  Increase to 2secs when parallel probing
	 * is in place.
	 */
	ATA_TMOUT_FF_WAIT	=  800,

The commit that introduced ata_wait_ready() was:

commit aa2731ad9ad80ac3fca48bd1c4cf0eceede4810e
Author: Tejun Heo <htejun@gmail.com>
Date:   Mon Apr 7 22:47:19 2008 +0900

libata: separate out ata_wait_ready() and implement ata_wait_after_reset()
    
    Factor out waiting logic (which is common to all ATA controllers) from
    ata_sff_wait_ready() into ata_wait_ready().  ata_wait_ready() takes
    @check_ready function pointer and uses it to poll for readiness.  This
    allows non-SFF controllers to use ata_wait_ready() to wait for link
    readiness.
    
    This patch also implements ata_wait_after_reset() - generic version of
    ata_sff_wait_after_reset() - using ata_wait_ready().
    
    ata_sff_wait_ready() is reimplemented using ata_wait_ready() and
    ata_sff_check_ready().  Functionality remains the same.

The associated LKML comments give more detail:

http://lkml.org/lkml/2007/5/16/279

"On certain device/controller combination, 0xff status is asserted
after reset and doesn't get cleared during 150ms post-reset wait.  As
0xff status is interpreted as no device (for good reasons), this can
lead to misdetection on such cases.

This patch implements ata_wait_after_reset() which replaces the 150ms
sleep and waits upto ATA_TMOUT_FF_WAIT if status is 0xff.
ATA_TMOUT_FF_WAIT is currently 800ms which is enough for
HHD424020F7SV00 to get detected but not enough for Quantum GoVault
drive which is known to take upto 2s.

Without parallel probing, spending 2s on 0xff port would incur too
much delay on ata_piix's which use 0xff to indicate empty port and
doesn't have SCR register, so GoVault needs to wait till parallel
probing."

In terms of it causing this kernel oops report it looks as if the problem is with the slow-path detection firing a false positive.

There was a related bug #318978 "Hard drive in Studio XPS 13 and 16 cause a 17-18s resume time" which applied a patch (commit b65db6fd5d) to increase ATA_TMOUT_PMP_SRST_WAIT to 5 seconds. In that case it was increasing the time allowed for the soft reset.

I suspect this current issue is related to that.

We need to decide if we can increase ATA_TMOUT_FF_WAIT to 5 seconds. Because it is a timeout and not a delay increasing the maxmium shouldn't affect systems that don't have a slow device since it is only used if the link indicates a device is present.

I've attached a patch that enables a 5 second maximum wait.

I'll ask other kernel developers to look at this.

Revision history for this message

TJ (tj) wrote on 2009-02-25:

#25

Forgot to mention my primary concern here!

Although the change in the arbitrary time-out will remove the 'link is slow to respond' message it won't of itself remove the kernel oops.

The ideal solution would be parallel processing for multiple devices maybe combined with some intelligence in the kernel's slow-path detection to account for situations that are mandated in specifications (in this case ATA-6 allows such delays) or imposed by hardware constraints.

Revision history for this message

TJ (tj) wrote on 2009-02-26:

#26

Fridtjof, we're looking at this issue in more detail and thinking about back-porting the Jaunty patch from bug #318978 to test kernels for Intrepid. However we noticed that in one of your dmesg reports in comment #16

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/286672/comments/16

It shows the in-use kernel was "2.6.28-8-generic #24-Ubuntu" which should include the patch to deal with #318978.

Could you clarify which kernel versions you are experiencing this with to help us?

Stefan is going to build an Intrepid test kernel with the patch incorporated and will make it available shortly.

Revision history for this message

Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote on 2009-02-26:

#27

Both kernels (intrepid and jaunty) show the same behavior, I tested to 2.6.28 on my intrepid box to check if this problem still existed in the upcoming version.

Revision history for this message

Stefan Bader (smb) wrote on 2009-02-26:

#28

I have uploaded the kernels to http://people.ubuntu.com/~smb/bug286672/ If the problem was with 2.6.28 as well this likely does not change much at this point, but it would show whether increasing the other timeout would help. If you try this kernel, could you please post the resulting dmesg again? Thanks.

Revision history for this message

zippidy_josh (redstone) wrote on 2009-02-27: Re: [Bug 286672] Re: WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()

#29

report.txt Edit (2.8 KiB, text/plain; charset=US-ASCII; name="report.txt")

Attached is report.txt.
Josh

On Wed, Feb 25, 2009 at 4:27 AM, TJ <email address hidden> wrote:

> Could both of you attach the details of the disk drive that is slow to
> respond? There are several potential scenarios I'm exploring:
>
> 1. Specific issues with some disk drives. Some drives take a *long* time to
> clear the status register after a reset.
> 2. Issues with the particular chipset/controllers (I notice both systems
> have two 82801* (ICH8/9 Family) IDE controllers
> 3. Issues with ata_piix driver
>
> The reports:
>
> ls -l /sys/block/sda > /tmp/report.txt
> sudo hdparm -I /dev/sda >> /tmp/report.txt
>
> Attach /tmp/report.txt to your reply.
>
> --
> WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176
> suspend_test_finish+0x74/0x80()
> https://bugs.launchpad.net/bugs/286672
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in “linux” source package in Ubuntu: In Progress
>
> Bug description:
> I installed kerneloops. Seconds later apport says the kernel chrashed... It
> doesn't seems likely since I'm still here.. but something did probably go
> bad.
>
> ProblemType: KernelOops
> Architecture: i386
> DistroRelease: Ubuntu 8.10
> Package: linux-image-2.6.27-7-generic 2.6.27-7.12
> ProcCmdLine: User Name=UUID=a177eaa0-fd11-4509-94e9-d25f0ae59299 ro quiet
> splash
> ProcVersionSignature: Ubuntu 2.6.27-7.12-generic
> SourcePackage: linux
> Title: WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176
> suspend_test_finish+0x74/0x80()
>

Revision history for this message

TJ (tj) wrote on 2009-02-27:

#30

The only thing of note in the disk reports is that both disks are on the first PCI SATA controller and, probably more significant, both devices are Western Digital. Anecdotal reports suggest some WD drives in particular have this characteristic slow reset.

Please try the Intrepid kernel Stefan has prepared and report back with the dmesg after a suspend/resume cycle.

Revision history for this message

Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote on 2009-02-27:

#31

dmesg on OptiPlex 745 Edit (3.6 KiB, text/plain)

2.6.27-11-generic #1 SMP Thu Feb 26 18:51:37 UTC 2009 i686 GNU/Linux

Revision history for this message

TJ (tj) wrote on 2009-02-27:

#32

Fridtjof, thank-you.

As already mentioned there is no real problem here.

[ 110.100086] sd 0:0:0:0: [sda] Starting disk
[ 114.984030] ata1: link is slow to respond, please be patient (ready=0)
[ 116.216086] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 116.232336] ata1.00: configured for UDMA/133

The reason for the Warning:

[ 116.506548] PM: resume devices took 7.100 seconds

is that TEST_SUSPEND_SECONDS is arbitrarily set to 5 seconds. It looks as if we could increase this value slightly in order to avoid these unnecessary Warnings.

Revision history for this message

Stefan Bader (smb) wrote on 2009-02-27:

#33

Sorry, my fault. I was so concentrated on the error path that I forgot the success path. While being at that I added additional WARN statements to cause stack traces when entering the function to wait for the link ready. Hopefully this gives a better understanding from where this is called. The new kernels are at the same place (v2) http://people.ubuntu.com/~smb/bug286672/.

Revision history for this message

zippidy_josh (redstone) wrote on 2009-02-27:

#34

I'm won't have access to that machine again for at least a month, so it'll
be a bit...

Josh

On Feb 27, 2009 9:30 AM, "TJ" <email address hidden> wrote:

The only thing of note in the disk reports is that both disks are on the
first PCI SATA controller and, probably more significant, both devices
are Western Digital. Anecdotal reports suggest some WD drives in
particular have this characteristic slow reset.

Please try the Intrepid kernel Stefan has prepared and report back with
the dmesg after a suspend/resume cycle.

--

WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176
suspend_test_finish+0x74/0x80() https...

Revision history for this message

Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote on 2009-02-28:

#35

dmesg on OptiPlex 745 Edit (3.3 KiB, text/plain)

2.6.27-11-generic #1 SMP Fri Feb 27 19:03:38 UTC 2009
I agree that this is not a critical error, but nonethelesse a kernel oops is not exactly a "nice to have".

Revision history for this message

TJ (tj) wrote on 2009-02-28: Re: [Bug 286672] Re: WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()

#36

On Sat, 2009-02-28 at 08:45 +0000, Fridtjof Busse wrote:

> I agree that this is not a critical error, but nonethelesse a kernel
> oops is not exactly a "nice to have".

That was my point in mentioning TEST_SUSPEND_SECONDS. We are discussing
increasing the current 5-second limit to avoid these false alarms.
Before we can do that however, we have to look at a wide range of
systems and scenarios so as to choose a value that will avoid false
alarms but not miss real problems.

Revision history for this message

TJ (tj) wrote on 2009-03-09:

#37

Progress Update:

Currently analysing log-files collected on Launchpad to determine a suitable value for TEST_SUSPEND_SECONDS.

TJ (tj) on 2009-03-20

tags:

added: test-suspend-seconds

Revision history for this message

TJ (tj) wrote on 2009-03-23:

#38

Proposing to increase TEST_SUSPEND_SECONDS to 12 to avoid false positives based on analysis of all bug reports that have log attachments with non-failure resume times over 5 seconds. Bugs with such logs are tagged "test-suspend-seconds":

https://bugs.launchpad.net/ubuntu/+bugs?field.tag=test-suspend-seconds&search=Search

Revision history for this message

Tormod Volden (tormodvolden) wrote on 2009-04-21:

#39

TJ, I added the test-suspend-seconds tag to a bunch of reports which I think are duplicates. Should I dup them, or would you like run your launchpad mining on them first?

Revision history for this message

TJ (tj) wrote on 2009-04-21:

#40

On Tue, 2009-04-21 at 18:16 +0000, Tormod Volden wrote:
> TJ, I added the test-suspend-seconds tag to a bunch of reports which I
> think are duplicates. Should I dup them, or would you like run your
> launchpad mining on them first?

Thanks Tormod that's really helpful. I'll run a modified script against
them first and inspect them too - sometimes the logs show obvious device
problems.

Jeremy Foshee (jeremyfoshee) on 2011-01-12

Changed in linux (Ubuntu):
assignee:	TJ (intuitivenipple) → nobody
status:	In Progress → Triaged

Revision history for this message

Andy Whitcroft (apw) wrote on 2011-01-25:

#41

The timeout for this warning was increased for Lucid and later. Closing off the development task.

Changed in linux (Ubuntu):
status:	Triaged → Fix Released
assignee:	nobody → Andy Whitcroft (apw)

Ubuntu
linux package

WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()

Bug Description

Duplicates of this bug

Other bug subscribers

Patches

Bug attachments

Remote bug watches

Ubuntulinux package

WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()

Bug Description

Duplicates of this bug

Other bug subscribers

Patches

Bug attachments

Remote bug watches

Ubuntu
linux package