WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Fix Released
|
Medium
|
Andy Whitcroft |
Bug Description
I installed kerneloops. Seconds later apport says the kernel chrashed... It doesn't seems likely since I'm still here.. but something did probably go bad.
ProblemType: KernelOops
Architecture: i386
DistroRelease: Ubuntu 8.10
Package: linux-image-
ProcCmdLine: User Name=UUID=
ProcVersionSign
SourcePackage: linux
Title: WARNING: at /build/
Pascal d'Hermilly (pascal-tipisoft) wrote : | #1 |
- BootDmesg.txt Edit (51.1 KiB, text/plain; charset="utf-8")
- CurrentDmesg.txt Edit (8.0 KiB, text/plain; charset="utf-8")
- Dependencies.txt Edit (1.0 KiB, text/plain; charset="utf-8")
- HalComputerInfo.txt Edit (2.7 KiB, text/plain; charset="utf-8")
- LsUsb.txt Edit (727 bytes, text/plain; charset="utf-8")
- Lspci.txt Edit (32.0 KiB, text/plain; charset="utf-8")
- OopsText.txt Edit (2.0 KiB, text/plain; charset="utf-8")
- ProcCpuInfo.txt Edit (1.3 KiB, text/plain; charset="utf-8")
- ProcInterrupts.txt Edit (1.4 KiB, text/plain; charset="utf-8")
- ProcModules.txt Edit (4.8 KiB, text/plain; charset="utf-8")
Dmytro Korzhevin (korg) wrote : | #2 |
Dmytro Korzhevin (korg) wrote : | #3 |
Maby MOTU can upload new version of kerneloops 0.12 package to repository of Ubuntu 8.10 to fix this issue?
Thomas Novin (thomasn80) wrote : | #4 |
There is also bug #287453 with the same error but not related to the package kerneloops.
sam tygier (samtygier) wrote : | #5 |
also see Bug #298541
Leann Ogasawara (leannogasawara) wrote : | #6 |
Hi Pascal,
The message you are seeing is usually related to suspend/resume. Primarily that it took longer than expected to bring up your devices. I'll go ahead an reassign to the kernel team to look at.
Changed in linux: | |
assignee: | nobody → ubuntu-kernel-team |
importance: | Undecided → Medium |
status: | New → Triaged |
Ilja Sekler (ilja-sekler-) wrote : | #7 |
I get this warning on Intrepid resuming from suspend-to-RAM reproducibly with the vanilla kernel 2.6.27.8 as well.
Jonas Petersson (catellie) wrote : | #8 |
I got it with 2.6.27-7-generic, but since I upgraded to 2.6.27-10-generic it seems to have disappeared.
Ilja Sekler (ilja-sekler-) wrote : | #9 |
The same warning with 2.6.27-11-generic from intrepid-proposed.
Launchpad Janitor (janitor) wrote : Kernel team bugs | #10 |
Per a decision made by the Ubuntu Kernel Team, bugs will longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https:/
Dave Gilbert (ubuntu-treblig) wrote : | #11 |
I've merged a couple more dupes in with this; it was suggested (by someone else!) that perhaps #331415 is a dupe of this and many of these all relate to firmware loading - do all of those who have this problem have stuff that loads firmware?
This appears to be caused by bug #331415 "request_firmware() fails on resume from suspend" but I need to know what device drivers might be requesting firmware.
Please attach /var/log/kern.log that contains the Oops message - it should also contain clues as to which driver was delayed.
Changed in linux: | |
assignee: | nobody → intuitivenipple |
status: | Triaged → In Progress |
zippidy_josh (redstone) wrote : | #13 |
Attached a kern.log. I'm running:
Linux redstone-desktop 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
The log doesn't have a 'oops' message, but has:
Feb 20 08:55:53 redstone-desktop kernel: [388341.469605] ------------[ cut here ]------------
Feb 20 08:55:53 redstone-desktop kernel: [388341.469607] WARNING: at /build/
Feb 20 08:55:53 redstone-desktop kernel: [388341.469609] Modules linked in: rt73usb af_packet rfcomm bridge stp bnep sco l2cap bluetooth ppdev ipv6 acpi_cpufreq cpufreq_powersave cpufreq_stats cpufreq_
Feb 20 08:55:53 redstone-desktop kernel: rocessor fan fbcon tileblit font bitblit softcursor fuse [last unloaded: rt73usb]
Feb 20 08:55:53 redstone-desktop kernel: [388341.469660] Pid: 26102, comm: pm-suspend Tainted: P W 2.6.27-11-generic #1
Feb 20 08:55:53 redstone-desktop kernel: [388341.469663] [<c037d236>] ? printk+0x1d/0x1f
Feb 20 08:55:53 redstone-desktop kernel: [388341.469666] [<c0131e99>] warn_on_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469670] [<c014d305>] ? sched_clock_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469673] [<c014c15f>] ? down_trylock+
Feb 20 08:55:53 redstone-desktop kernel: [388341.469675] [<c0132572>] ? try_acquire_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469679] [<c024ed00>] ? kobject_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469682] [<c015d2b4>] suspend_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469684] [<c015d3a6>] suspend_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469687] [<c015d621>] enter_state+
Feb 20 08:55:53 redstone-desktop kernel: [388341.469689] [<c015d6d5>] state_store+
Feb 20 08:55:53 redstone-desktop kernel: [388341.469691] [<c015d650>] ? state_store+
Feb 20 08:55:53 redstone-desktop kernel: [388341.469693] [<c024ebc4>] kobj_attr_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469695] [<c01ffac7>] sysfs_write_
Feb 20 08:55:53 redstone-desktop kernel: [388341.469698] [<c01b2760>] vfs_write+
Feb 20 08:55:53 redstone-desktop kernel: [388341.469700] [<c01ffa30>] ? sysfs_write_
Feb 20 08:55:53 redstone-...
TJ (tj) wrote : Re: [Bug 286672] Re: WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80() | #14 |
On Sat, 2009-02-21 at 17:44 +0000, zippidy_josh wrote:
> Attached a kern.log. I'm running:
> Linux redstone-desktop 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
I don't *think* your issue is caused by firmware. The tell-tail is
kernel time-stamps that delay for 60 seconds.
I do see a 7 second delay whilst the SATA drive is being probed:
[388333.633707] sd 0:0:0:0: [sda] Starting disk
[388338.256032] ata1: link is slow to respond, please be patient (ready=0)
[388340.272066] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
But the only firmware reports come after user-space has been unfrozen:
[388341.469762] Restarting tasks ... done.
[388342.292283] r8169: eth0: link down
[388342.292524] ADDRCONF(
[388548.630686] usbcore: deregistering interface driver rt73usb
[388549.005780] phy16: Selected rate control algorithm 'pid'
[388549.007027] Registered led device: rt73usb-phy16:radio
[388549.007051] Registered led device: rt73usb-phy16:assoc
[388549.007073] Registered led device: rt73usb-
[388549.007628] usbcore: registered new interface driver rt73usb
[388553.029264] firmware: requesting rt73.bin
Jonas Petersson (catellie) wrote : | #15 |
As mentioned above, I no longer experince this issue, but I can say that I never loaded any firmware that I'm aware of - unless you count ndiswrapper. Previous models of MacBookPro required firmware for the webcam, but that has never been needed on the 4,1 I have. So, from my point of view, firmware is not the real/only cause of this.
[BTW: I'm currently on 2.6.27-12-generic and it seems less stable than -11- just after resuming (2 hangs in 3 days), but that is not related this bug as far as I can tell.]
Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote : | #16 |
Same thing happens with the current jaunty kernel on my system after suspend:
[ 412.802413] WARNING: at /build/
[ 412.802415] Component: resume devices
[ 412.802416] Modules linked in: radeon drm binfmt_misc bridge stp bnep ppdev video output input_polldev iptable_filter ip_tables x_tables dm_crypt lp joydev hid_microsoft dcdbas serio_raw psmouse pcspkr iTCO_wdt iTCO_vendor_support wacom snd_hda_intel usbhid snd_pcm_oss snd_mixer_oss snd_pcm parport_pc parport snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd intel_agp agpgart shpchp soundcore snd_page_alloc usb_storage tg3 ehci_hcd uhci_hcd fbcon tileblit font bitblit softcursor fuse
[ 412.802445] Pid: 4676, comm: pm-suspend Not tainted 2.6.28-8-generic #24-Ubuntu
[ 412.802446] Call Trace:
[ 412.802450] [<c013aa10>] warn_slowpath+
[ 412.802453] [<c015473a>] ? down_trylock+
[ 412.802456] [<c013b06d>] ? try_acquire_
[ 412.802459] [<c02bec80>] ? kobject_
[ 412.802462] [<c04ea0c1>] ? printk+0x18/0x1f
[ 412.802465] [<c0165eb0>] suspend_
[ 412.802468] [<c0165f86>] suspend_
[ 412.802470] [<c04ea0c1>] ? printk+0x18/0x1f
[ 412.802472] [<c0166209>] enter_state+
[ 412.802474] [<c01662bd>] state_store+
[ 412.802477] [<c0166240>] ? state_store+
[ 412.802480] [<c02beb44>] kobj_attr_
[ 412.802483] [<c020b662>] sysfs_write_
[ 412.802486] [<c01be8c8>] vfs_write+
[ 412.802488] [<c020b5d0>] ? sysfs_write_
[ 412.802490] [<c01be9fd>] sys_write+0x3d/0x70
[ 412.802493] [<c0103f6b>] sysenter_
[ 412.802495] [<c04e0000>] ? da903x_
[ 412.802497] ---[ end trace 87ad1ccb4fe25fbf ]---
Fridtjof, could you attach /var/log/kern.log that includes the problem. I want to see what was going on before and after the warning message.
Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote : | #18 |
Fridtjof, thank-you. It looks like the same cause as zippidy_josh reported.
[10692.148085] sd 0:0:0:0: [sda] Starting disk
[10697.032030] ata1: link is slow to respond, please be patient (ready=0)
[10698.992081] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Fridtjof, zippyidy_josh, could both attach the system PCI configuration from:
sudo lspci -vvnn
I'm wondering if there's a common hardware device here, or whether it is an issue with the ata_piix / libata drivers.
Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote : | #20 |
zippidy_josh (redstone) wrote : | #21 |
Could both of you attach the details of the disk drive that is slow to respond? There are several potential scenarios I'm exploring:
1. Specific issues with some disk drives. Some drives take a *long* time to clear the status register after a reset.
2. Issues with the particular chipset/controllers (I notice both systems have two 82801* (ICH8/9 Family) IDE controllers
3. Issues with ata_piix driver
The reports:
ls -l /sys/block/sda > /tmp/report.txt
sudo hdparm -I /dev/sda >> /tmp/report.txt
Attach /tmp/report.txt to your reply.
Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote : | #23 |
- Increase ATA_TMOUT_FF_WAIT to 5 seconds Edit (446 bytes, text/plain)
This looks to be related to a known issue with some combinations of controller and disk. The warning "link is slow to respond, please be patient" is issued in drivers/
It waits for the link status to change from 0xff (no device) for up to ATA_TMOUT_FF_WAIT. Currently this is defined as 800 milliseconds in include.
/* FIXME: GoVault needs 2s but we can't afford that without
* parallel probing. 800ms is enough for iVDR disk
* HHD424020F7SV00. Increase to 2secs when parallel probing
* is in place.
*/
ATA_TMOUT_FF_WAIT = 800,
The commit that introduced ata_wait_ready() was:
commit aa2731ad9ad80ac
Author: Tejun Heo <email address hidden>
Date: Mon Apr 7 22:47:19 2008 +0900
libata: separate out ata_wait_ready() and implement ata_wait_
Factor out waiting logic (which is common to all ATA controllers) from
ata_
@check_ready function pointer and uses it to poll for readiness. This
allows non-SFF controllers to use ata_wait_ready() to wait for link
readiness.
This patch also implements ata_wait_
ata_
ata_
ata_
The associated LKML comments give more detail:
http://
"On certain device/controller combination, 0xff status is asserted
after reset and doesn't get cleared during 150ms post-reset wait. As
0xff status is interpreted as no device (for good reasons), this can
lead to misdetection on such cases.
This patch implements ata_wait_
sleep and waits upto ATA_TMOUT_FF_WAIT if status is 0xff.
ATA_TMOUT_FF_WAIT is currently 800ms which is enough for
HHD424020F7SV00 to get detected but not enough for Quantum GoVault
drive which is known to take upto 2s.
Without parallel probing, spending 2s on 0xff port would incur too
much delay on ata_piix's which use 0xff to indicate empty port and
doesn't have SCR register, so GoVault needs to wait till parallel
probing."
In terms of it causing this kernel oops report it looks as if the problem is with the slow-path detection firing a false positive.
There was a related bug #318978 "Hard drive in Studio XPS 13 and 16 cause a 17-18s resume time" which applied a patch (commit b65db6fd5d) to increase ATA_TMOUT_
I suspect this current issue is related to that.
We need to decide if we can increase ATA_TMOUT_FF_WAIT to 5 seconds. Because it is a timeout and not a delay increasing the maxmium shouldn't affect systems that don't have a slow device since it is only used if the link indicates a device is present.
I've attached a patch that enables a 5 second maximum wait.
I'll ask other kernel developers to look at this.
Forgot to mention my primary concern here!
Although the change in the arbitrary time-out will remove the 'link is slow to respond' message it won't of itself remove the kernel oops.
The ideal solution would be parallel processing for multiple devices maybe combined with some intelligence in the kernel's slow-path detection to account for situations that are mandated in specifications (in this case ATA-6 allows such delays) or imposed by hardware constraints.
Fridtjof, we're looking at this issue in more detail and thinking about back-porting the Jaunty patch from bug #318978 to test kernels for Intrepid. However we noticed that in one of your dmesg reports in comment #16
https:/
It shows the in-use kernel was "2.6.28-8-generic #24-Ubuntu" which should include the patch to deal with #318978.
Could you clarify which kernel versions you are experiencing this with to help us?
Stefan is going to build an Intrepid test kernel with the patch incorporated and will make it available shortly.
Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote : | #27 |
Both kernels (intrepid and jaunty) show the same behavior, I tested to 2.6.28 on my intrepid box to check if this problem still existed in the upcoming version.
Stefan Bader (smb) wrote : | #28 |
I have uploaded the kernels to http://
zippidy_josh (redstone) wrote : Re: [Bug 286672] Re: WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80() | #29 |
- report.txt Edit (2.8 KiB, text/plain; charset=US-ASCII; name="report.txt")
Attached is report.txt.
Josh
On Wed, Feb 25, 2009 at 4:27 AM, TJ <email address hidden> wrote:
> Could both of you attach the details of the disk drive that is slow to
> respond? There are several potential scenarios I'm exploring:
>
> 1. Specific issues with some disk drives. Some drives take a *long* time to
> clear the status register after a reset.
> 2. Issues with the particular chipset/controllers (I notice both systems
> have two 82801* (ICH8/9 Family) IDE controllers
> 3. Issues with ata_piix driver
>
> The reports:
>
> ls -l /sys/block/sda > /tmp/report.txt
> sudo hdparm -I /dev/sda >> /tmp/report.txt
>
> Attach /tmp/report.txt to your reply.
>
> --
> WARNING: at /build/
> suspend_
> https:/
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in “linux” source package in Ubuntu: In Progress
>
> Bug description:
> I installed kerneloops. Seconds later apport says the kernel chrashed... It
> doesn't seems likely since I'm still here.. but something did probably go
> bad.
>
> ProblemType: KernelOops
> Architecture: i386
> DistroRelease: Ubuntu 8.10
> Package: linux-image-
> ProcCmdLine: User Name=UUID=
> splash
> ProcVersionSign
> SourcePackage: linux
> Title: WARNING: at /build/
> suspend_
>
The only thing of note in the disk reports is that both disks are on the first PCI SATA controller and, probably more significant, both devices are Western Digital. Anecdotal reports suggest some WD drives in particular have this characteristic slow reset.
Please try the Intrepid kernel Stefan has prepared and report back with the dmesg after a suspend/resume cycle.
Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote : | #31 |
- dmesg on OptiPlex 745 Edit (3.6 KiB, text/plain)
2.6.27-11-generic #1 SMP Thu Feb 26 18:51:37 UTC 2009 i686 GNU/Linux
Fridtjof, thank-you.
As already mentioned there is no real problem here.
[ 110.100086] sd 0:0:0:0: [sda] Starting disk
[ 114.984030] ata1: link is slow to respond, please be patient (ready=0)
[ 116.216086] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 116.232336] ata1.00: configured for UDMA/133
The reason for the Warning:
[ 116.506548] PM: resume devices took 7.100 seconds
is that TEST_SUSPEND_
Stefan Bader (smb) wrote : | #33 |
Sorry, my fault. I was so concentrated on the error path that I forgot the success path. While being at that I added additional WARN statements to cause stack traces when entering the function to wait for the link ready. Hopefully this gives a better understanding from where this is called. The new kernels are at the same place (v2) http://
zippidy_josh (redstone) wrote : | #34 |
I'm won't have access to that machine again for at least a month, so it'll
be a bit...
Josh
On Feb 27, 2009 9:30 AM, "TJ" <email address hidden> wrote:
The only thing of note in the disk reports is that both disks are on the
first PCI SATA controller and, probably more significant, both devices
are Western Digital. Anecdotal reports suggest some WD drives in
particular have this characteristic slow reset.
Please try the Intrepid kernel Stefan has prepared and report back with
the dmesg after a suspend/resume cycle.
--
WARNING: at /build/
suspend_
Fridtjof Busse (fbusse-deactivatedaccount-deactivatedaccount) wrote : | #35 |
- dmesg on OptiPlex 745 Edit (3.3 KiB, text/plain)
2.6.27-11-generic #1 SMP Fri Feb 27 19:03:38 UTC 2009
I agree that this is not a critical error, but nonethelesse a kernel oops is not exactly a "nice to have".
TJ (tj) wrote : Re: [Bug 286672] Re: WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80() | #36 |
On Sat, 2009-02-28 at 08:45 +0000, Fridtjof Busse wrote:
> I agree that this is not a critical error, but nonethelesse a kernel
> oops is not exactly a "nice to have".
That was my point in mentioning TEST_SUSPEND_
increasing the current 5-second limit to avoid these false alarms.
Before we can do that however, we have to look at a wide range of
systems and scenarios so as to choose a value that will avoid false
alarms but not miss real problems.
Progress Update:
Currently analysing log-files collected on Launchpad to determine a suitable value for TEST_SUSPEND_
tags: | added: test-suspend-seconds |
Proposing to increase TEST_SUSPEND_
https:/
Tormod Volden (tormodvolden) wrote : | #39 |
TJ, I added the test-suspend-
On Tue, 2009-04-21 at 18:16 +0000, Tormod Volden wrote:
> TJ, I added the test-suspend-
> think are duplicates. Should I dup them, or would you like run your
> launchpad mining on them first?
Thanks Tormod that's really helpful. I'll run a modified script against
them first and inspect them too - sometimes the logs show obvious device
problems.
Changed in linux (Ubuntu): | |
assignee: | TJ (intuitivenipple) → nobody |
status: | In Progress → Triaged |
Andy Whitcroft (apw) wrote : | #41 |
The timeout for this warning was increased for Lucid and later. Closing off the development task.
Changed in linux (Ubuntu): | |
status: | Triaged → Fix Released |
assignee: | nobody → Andy Whitcroft (apw) |
I have same problem, after kerneloops install in Ubuntu 8.10
kernel: 2.6.27-7-generic
sudo dpkg -p kerneloops:
Package: kerneloops Maintainer: Matthew Wilcox <email address hidden>
Priority: optional
Section: utils
Installed-Size: 164
Maintainer: Ubuntu MOTU Developers <email address hidden>
Architecture: i386
Version: 0.10-2ubuntu1
Depends: libc6 (>= 2.4), libcurl3-gnutls (>= 7.16.2-1), libdbus-1-3 (>= 1.0.2), libdbus-glib-1-2 (>= 0.71), libglib2.0-0 (>= 2.14.0), libgtk2.0-0 (>= 2.14.1), libnotify1 (>= 0.4.4), libnotify1-gtk2.10
Size: 20128
Description: kernel oops tracker
kerneloops is a daemon that collects kernel crash information and then
submits the extracted signature to the kerneloops.org website for
statistical analysis and presentation to the Linux kernel developers.
Original-