"task blocked for more than 120 seconds" freezes system in Lucid: xfs

Bug #588046 reported by Thomas
This bug affects 8 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

I get "INFO: task xfsdatad/0:3344 blocked for more than 120 seconds." followed by a freeze when doing disk I/O on one or two xfs partitions.
This is yet another variant of bug #494476, where the emphasis is on ext4 and possibly other processes causing the bug to appear. Having read that report and bug #276476, my impression is that it is triggered by disk I/O, regardless of the filesystem.

In one instance, I tried to unrar a large file. After processing 80% of the archive, the process got stuck and the filesystem became unusable ("ls" on another console froze as well); only a reboot helped. Of course, the simple 'shutdown' command doesn't finish, so that's actually a hard reset.
The second instance was during a copy of several gigabytes from one xfs partition to another. Same symptoms.

Given that only a reset resolves the situation, this might eventually lead to damaged file systems and data loss.

This system has been upgraded to 10.04 via 9.10 from 9.04. The problem _did not_ occur under 9.04.

-----------------------------------------------------------
May 31 09:55:22 tirion kernel: [ 2160.252098] INFO: task xfsdatad/0:3344 blocked for more than 120 seconds.
May 31 09:55:22 tirion kernel: [ 2160.252108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 09:55:22 tirion kernel: [ 2160.252117] xfsdatad/0 D 0001674e 0 3344 2 0x00000000
May 31 09:55:22 tirion kernel: [ 2160.252132] f5205ec4 00000046 f5d72000 0001674e 00000000 c0846740 f605b5ec c0846740
May 31 09:55:22 tirion kernel: [ 2160.252153] 6b8e4857 000001d5 c0846740 c0846740 f605b5ec c0846740 c0846740 f6668a80
May 31 09:55:22 tirion kernel: [ 2160.252173] 00000000 000001d5 f605b340 f5205ef8 f605b340 f5b890e4 f5205ef0 c058b315
May 31 09:55:22 tirion kernel: [ 2160.252192] Call Trace:
May 31 09:55:22 tirion kernel: [ 2160.252215] [<c058b315>] rwsem_down_failed_common+0x75/0x1a0
May 31 09:55:22 tirion kernel: [ 2160.252228] [<c058b45d>] rwsem_down_write_failed+0x1d/0x30
May 31 09:55:22 tirion kernel: [ 2160.252241] [<c058b4f6>] call_rwsem_down_write_failed+0x6/0x10
May 31 09:55:22 tirion kernel: [ 2160.252252] [<c058aa14>] ? down_write+0x24/0x30
May 31 09:55:22 tirion kernel: [ 2160.252315] [<f8dfa2c0>] xfs_ilock+0x70/0x90 [xfs]
May 31 09:55:22 tirion kernel: [ 2160.252366] [<f8e1b180>] xfs_setfilesize+0x30/0x70 [xfs]
May 31 09:55:22 tirion kernel: [ 2160.252415] [<f8e1b413>] xfs_end_bio_delalloc+0x13/0x20 [xfs]
May 31 09:55:22 tirion kernel: [ 2160.252466] [<c016369e>] run_workqueue+0x8e/0x150
May 31 09:55:22 tirion kernel: [ 2160.252516] [<f8e1b400>] ? xfs_end_bio_delalloc+0x0/0x20 [xfs]
May 31 09:55:22 tirion kernel: [ 2160.252529] [<c01637e4>] worker_thread+0x84/0xe0
May 31 09:55:22 tirion kernel: [ 2160.252542] [<c0167740>] ? autoremove_wake_function+0x0/0x50
May 31 09:55:22 tirion kernel: [ 2160.252554] [<c0163760>] ? worker_thread+0x0/0xe0
May 31 09:55:22 tirion kernel: [ 2160.252565] [<c01674b4>] kthread+0x74/0x80
May 31 09:55:22 tirion kernel: [ 2160.252575] [<c0167440>] ? kthread+0x0/0x80
May 31 09:55:22 tirion kernel: [ 2160.252587] [<c0104087>] kernel_thread_helper+0x7/0x10
May 31 09:55:22 tirion kernel: [ 2160.252602] INFO: task loop1:4127 blocked for more than 120 seconds.
May 31 09:55:22 tirion kernel: [ 2160.252609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 09:55:22 tirion kernel: [ 2160.252617] loop1 D 004206d7 0 4127 2 0x00000000
May 31 09:55:22 tirion kernel: [ 2160.252630] d296bde8 00000046 f5fa0000 004206d7 00000000 c0846740 f605cf8c c0846740
May 31 09:55:22 tirion kernel: [ 2160.252649] 94b6508d 000001d5 c0846740 c0846740 f605cf8c c0846740 c0846740 f6668a80
May 31 09:55:22 tirion kernel: [ 2160.252669] 00000000 000001d5 f605cce0 c1c08740 f605cce0 d296be30 d296bdf8 c058998a
May 31 09:55:22 tirion kernel: [ 2160.252688] Call Trace:
May 31 09:55:22 tirion kernel: [ 2160.252699] [<c058998a>] io_schedule+0x3a/0x60
May 31 09:55:22 tirion kernel: [ 2160.252711] [<c01c9f2d>] sync_page+0x3d/0x50
May 31 09:55:22 tirion kernel: [ 2160.252721] [<c058a12d>] __wait_on_bit+0x4d/0x70
May 31 09:55:22 tirion kernel: [ 2160.252731] [<c01c9ef0>] ? sync_page+0x0/0x50
May 31 09:55:22 tirion kernel: [ 2160.252742] [<c01ca151>] wait_on_page_bit+0x91/0xa0
May 31 09:55:22 tirion kernel: [ 2160.252754] [<c0167790>] ? wake_bit_function+0x0/0x50
May 31 09:55:22 tirion kernel: [ 2160.252766] [<c01ca3f1>] wait_on_page_writeback_range+0xa1/0x110
May 31 09:55:22 tirion kernel: [ 2160.252780] [<c01ca5f8>] filemap_write_and_wait_range+0x78/0x80
May 31 09:55:22 tirion kernel: [ 2160.252795] [<c022a0a2>] vfs_fsync_range+0x62/0xc0
May 31 09:55:22 tirion kernel: [ 2160.252807] [<c022a1b3>] vfs_fsync+0x33/0x40
May 31 09:55:22 tirion kernel: [ 2160.252820] [<c03f0fa1>] do_bio_filebacked+0xf1/0x160
May 31 09:55:22 tirion kernel: [ 2160.252830] [<c058955c>] ? schedule+0x44c/0x840
May 31 09:55:22 tirion kernel: [ 2160.252845] [<c012a438>] ? default_spin_lock_flags+0x8/0x10
May 31 09:55:22 tirion kernel: [ 2160.252857] [<c058b5ef>] ? _spin_lock_irqsave+0x2f/0x50
May 31 09:55:22 tirion kernel: [ 2160.252868] [<c01678af>] ? finish_wait+0x4f/0x70
May 31 09:55:22 tirion kernel: [ 2160.252879] [<c03f10b6>] loop_thread+0xa6/0x210
May 31 09:55:22 tirion kernel: [ 2160.252891] [<c0167740>] ? autoremove_wake_function+0x0/0x50
May 31 09:55:22 tirion kernel: [ 2160.252901] [<c03f1010>] ? loop_thread+0x0/0x210
May 31 09:55:22 tirion kernel: [ 2160.252912] [<c01674b4>] kthread+0x74/0x80
May 31 09:55:22 tirion kernel: [ 2160.252922] [<c0167440>] ? kthread+0x0/0x80
May 31 09:55:22 tirion kernel: [ 2160.252934] [<c0104087>] kernel_thread_helper+0x7/0x10
May 31 09:55:22 tirion kernel: [ 2160.252944] INFO: task kdmflush:4141 blocked for more than 120 seconds.
May 31 09:55:22 tirion kernel: [ 2160.252950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 09:55:22 tirion kernel: [ 2160.252958] kdmflush D 000337e4 0 4141 2 0x00000000
May 31 09:55:22 tirion kernel: [ 2160.252971] d29a9edc 00000046 f5f18000 000337e4 00000000 c0846740 f692e92c c0846740
May 31 09:55:22 tirion kernel: [ 2160.252991] 9094f856 000001d5 c0846740 c0846740 f692e92c c0846740 c0846740 f63ebc40
May 31 09:55:22 tirion kernel: [ 2160.253010] 00000000 000001d5 f692e680 c1c08740 f692e680 00000002 d29a9eec c058998a
May 31 09:55:22 tirion kernel: [ 2160.253029] Call Trace:
...

several more traces
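
For anyone comparing traces like the ones above across reports, here is a minimal Python sketch that pulls the blocked task, its PID, and the reliable stack frames out of such a log. It is not part of the original report. The function name `parse_hung_tasks` and the two regular expressions are illustrative assumptions based only on the line format shown in this log, and frames marked with "?" are skipped because the kernel flags them as unreliable.

```python
import re

# Header printed by the hung-task detector, e.g.
# "INFO: task xfsdatad/0:3344 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Stack frame, e.g. "[<c058b315>] rwsem_down_failed_common+0x75/0x1a0".
# An optional "? " after the address marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(lines):
    """Return one dict per hung-task report found in the given log lines."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "timeout": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Only collect frames that belong to a report and are not marked "?".
        if frame and current is not None and not frame["unreliable"]:
            current["frames"].append(frame["func"])
    return reports
```

Fed the first trace in this report, the sketch would list `rwsem_down_failed_common` as the innermost frame for `xfsdatad/0`, with `xfs_ilock` and `xfs_setfilesize` further out, which makes it easier to see which lock each blocked task is waiting on.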

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-22-generic 2.6.32-22.33
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-22.33-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: roth 2992 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'I82801DBICH4'/'Intel 82801DB-ICH4 with STAC9750,51 at irq 5'
   Mixer name : 'SigmaTel STAC9750,51'
   Components : 'AC97a:83847650'
   Controls : 36
   Simple ctrls : 23
Date: Mon May 31 22:38:06 2010
HibernationDevice: RESUME=/dev/sda5
MachineType: Dell Computer Corporation Inspiron 8600
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: root=UUID=ec7ab08b-f34d-4d61-8a03-4f9035425956 ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34
SourcePackage: linux
WpaSupplicantLog:

dmi.bios.date: 01/12/2004
dmi.bios.vendor: Dell Computer Corporation
dmi.bios.version: A04
dmi.board.name: 0P3490
dmi.board.vendor: Dell Computer Corporation
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Computer Corporation
dmi.modalias: dmi:bvnDellComputerCorporation:bvrA04:bd01/12/2004:svnDellComputerCorporation:pnInspiron8600:pvr:rvnDellComputerCorporation:rn0P3490:rvr:cvnDellComputerCorporation:ct8:cvr:
dmi.product.name: Inspiron 8600
dmi.sys.vendor: Dell Computer Corporation

Jeremy Foshee (jeremyfoshee) wrote :

Hi Thomas,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Thomas (t-roth) wrote :

I've installed upstream kernel 2.6.34-99-generic and filled those two partitions, copying around 100 GB without a problem; the freeze does not occur.
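
To make this kind of copy test repeatable between kernels, a small write-load script can help. The Python sketch below is a hypothetical illustration, not the procedure used in this report. The function name `write_load`, the file name `ioload.bin`, and the 1 MiB chunk size are all assumptions. It writes a fixed amount of data to a directory and forces it to disk with fsync.

```python
import os

MIB = 1024 * 1024


def write_load(directory, total_mib, name="ioload.bin"):
    """Write total_mib MiB of random data into directory and fsync it.

    Returns the path of the file that was written.
    """
    path = os.path.join(directory, name)
    chunk = os.urandom(MIB)  # one 1 MiB block, reused for every write
    with open(path, "wb") as fh:
        for _ in range(total_mib):
            fh.write(chunk)
        fh.flush()
        # Force the data out of the page cache so the filesystem's
        # writeback path is actually exercised.
        os.fsync(fh.fileno())
    return path
```

Pointing `directory` at a mount point on each affected partition and choosing a `total_mib` in the multi-gigabyte range would roughly approximate the copies described here. Whether that is enough to trigger the freeze on an affected kernel is untested.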

tags: removed: needs-upstream-testing
Thomas (t-roth) wrote :

Tested 2 more upstream kernels from kernel.ubuntu.com/~kernel-ppa/mainline/:

linux-image-2.6.32-02063209-generic_2.6.32-02063209 reproduces the error: just by downloading (1 Mb/s) to the disk. No error messages in the log because of:
kernel: Kernel logging (proc) stopped.
kernel: imklog: Cannot read proc file system, 1

linux-image-2.6.32-0206321505-generic_2.6.32-0206321505 does _not_ reproduce the error: I downloaded several hundred MB, ran unrar, moved files around ...

However, the same problem with the proc fs occurs. Is that a bad thing? Everything seems to work fine, and I would stick with that kernel for now: the 2.6.34 I tested yesterday doesn't recognize my LAN (though it sees my WLAN ;-)), nor does it know the sound chip.

tags: added: kernel-needs-review kernel-uncat
Changed in linux (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → Medium
Andy Whitcroft (apw)
tags: added: kernel-fs
removed: kernel-uncat
Andy Whitcroft (apw)
tags: added: kernel-reviewed
removed: kernel-needs-review
Brad Figg (brad-figg) wrote :

There are a couple other bugs that have similar symptoms. Specifically, https://bugs.launchpad.net/bugs/585092 and https://bugs.launchpad.net/bugs/543617 . Users experiencing these bugs have reported positive results from testing the 2.6.32-25 -proposed kernel. Please give that a test and see if it resolves this bug as well.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Leon (leonbo) wrote :

I've tried the new kernel (2.6.32-25) but it made no difference for me

Leon (leonbo) wrote :

My problem was a broken second disk in a raid 1 array. No problems anymore!

Nathan Grennan (ngrennan) wrote :

Linux version 2.6.32-25-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #44-Ubuntu SMP Fri Sep 17 20:05:27 UTC 2010

This is on ext4.

[193727.468533] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[193727.468859] updatedb.mloc D 0000000000000002 0 3898 3892 0x00000000
[193727.468864] ffff8800972379d8 0000000000000082 0000000000015bc0 0000000000015bc0
[193727.468869] ffff88042d6b5f80 ffff880097237fd8 0000000000015bc0 ffff88042d6b5bc0
[193727.468873] 0000000000015bc0 ffff880097237fd8 0000000000015bc0 ffff88042d6b5f80
[193727.468878] Call Trace:
[193727.468887] [<ffffffff8116d010>] ? sync_buffer+0x0/0x50
[193727.468893] [<ffffffff81541557>] io_schedule+0x47/0x70
[193727.468897] [<ffffffff8116d055>] sync_buffer+0x45/0x50
[193727.468901] [<ffffffff81541dbf>] __wait_on_bit+0x5f/0x90
[193727.468905] [<ffffffff8116d010>] ? sync_buffer+0x0/0x50
[193727.468909] [<ffffffff81541e68>] out_of_line_wait_on_bit+0x78/0x90
[193727.468914] [<ffffffff810845c0>] ? wake_bit_function+0x0/0x40
[193727.468918] [<ffffffff8116d006>] __wait_on_buffer+0x26/0x30
[193727.468923] [<ffffffff811e4fea>] ext4_find_entry+0x1ba/0x4c0
[193727.468928] [<ffffffff8154379e>] ? _spin_lock+0xe/0x20
[193727.468931] [<ffffffff8154379e>] ? _spin_lock+0xe/0x20
[193727.468937] [<ffffffff811560a0>] ? d_rehash+0x50/0x60
[193727.468940] [<ffffffff811e533d>] ext4_lookup+0x4d/0x130
[193727.468944] [<ffffffff8114cd32>] real_lookup+0xe2/0x160
[193727.468947] [<ffffffff8114ecd8>] do_lookup+0xb8/0xf0
[193727.468952] [<ffffffff8108be61>] ? in_group_p+0x31/0x40
[193727.468956] [<ffffffff8114f805>] __link_path_walk+0x765/0xf80
[193727.468960] [<ffffffff8115029a>] path_walk+0x6a/0xe0
[193727.468964] [<ffffffff8115046b>] do_path_lookup+0x5b/0xa0
[193727.468968] [<ffffffff81151137>] user_path_at+0x57/0xa0
[193727.468972] [<ffffffff8154379e>] ? _spin_lock+0xe/0x20
[193727.468978] [<ffffffff812b2fe5>] ? _atomic_dec_and_lock+0x55/0x80
[193727.468984] [<ffffffff81147664>] ? cp_new_stat+0xe4/0x100
[193727.468988] [<ffffffff8114788c>] vfs_fstatat+0x3c/0x80
[193727.468992] [<ffffffff8114793e>] vfs_lstat+0x1e/0x20
[193727.468995] [<ffffffff81147964>] sys_newlstat+0x24/0x50

penalvch (penalvch) wrote :

Thomas, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc1

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: bios-outdated-a14
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired