Idle-priority scheduling bug blocks tasks

Bug #276476 reported by unikob
210
This bug affects 36 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Medium
Andy Whitcroft
Nominated for Hardy by D
Nominated for Jaunty by Brian Rogers
Nominated for Karmic by Lucian Adrian Grijincu

Bug Description

"INFO: task blocked for more than 120 seconds" is a diagnostic message for when a process that wants to run is blocked from running for a long time.

This bug report refers to a specific problem with idle-priority scheduling that has now been fixed. If you are up to date and are still getting the "task blocked" message, it is a different bug, so please file a new bug report.

Tags: cherry-pick
Revision history for this message
unikob (unikob) wrote :
Revision history for this message
unikob (unikob) wrote : Re: INFO: task blocked for more than 120 seconds causes system freeze

Changed to linux (kernel) as it appears it is not a nspluginwrapper bug afterall.

description: updated
Revision history for this message
unikob (unikob) wrote :
Download full text (3.9 KiB)

As this happens frequently I've found out that the system recovers from the freeze and resumes normal operation two to ten minutes after. However the system continues to freeze periodically after that.

Steps to reproduce:

1. Do something that causes a lot of IO operations ie. copy images, mp3's around, download Ibex cd-images from the net etc.

2. Wait until system freezes and check out dmesg from terminal (if you it opens) and you'll see something like this:

[ 9309.344503] INFO: task pdflush:17487 blocked for more than 120 seconds.
[ 9309.344505] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9309.344508] pdflush D 0000000000000001 0 17487 2
[ 9309.344513] ffff8800b2ce14e0 0000000000000046 00000000000256b3 ffffffff803a7def
[ 9309.344518] ffffffff807a5400 ffffffff807a5400 ffffffff807a5400 ffffffff807a5400
[ 9309.344524] ffffffff807a5400 ffffffff807a5400 ffffffff807a5400 ffffffff807a5400
[ 9309.344528] Call Trace:
[ 9309.344534] [<ffffffff803a7def>] ? __up_read+0x8f/0xb0
[ 9309.344538] [<ffffffff80500d65>] schedule_timeout+0x95/0xd0
[ 9309.344560] [<ffffffffa0af9037>] ? xfs_bmap_add_extent+0x237/0x420 [xfs]
[ 9309.344565] [<ffffffff805017e6>] __down+0x76/0xc0
[ 9309.344569] [<ffffffff8026bddc>] down+0x4c/0x50
[ 9309.344590] [<ffffffffa0b3ee23>] xfs_buf_lock+0x23/0x60 [xfs]
[ 9309.344612] [<ffffffffa0b2b9e2>] xfs_getsb+0x22/0x50 [xfs]
[ 9309.344634] [<ffffffffa0b3269a>] xfs_trans_getsb+0x5a/0xa0 [xfs]
[ 9309.344656] [<ffffffffa0b30d90>] xfs_trans_apply_sb_deltas+0x20/0x320 [xfs]
[ 9309.344678] [<ffffffffa0b311ab>] _xfs_trans_commit+0x11b/0x3f0 [xfs]
[ 9309.344700] [<ffffffffa0b227de>] ? xlog_space_left+0xe/0x90 [xfs]
[ 9309.344721] [<ffffffffa0af65d1>] ? xfs_bmap_finish+0x21/0x1a0 [xfs]
[ 9309.344744] [<ffffffffa0b1f518>] xfs_iomap_write_allocate+0x288/0x3e0 [xfs]
[ 9309.344767] [<ffffffffa0b20356>] xfs_iomap+0x2d6/0x310 [xfs]
[ 9309.344773] [<ffffffff80395f66>] ? blk_rq_map_sg+0x16/0x2d0
[ 9309.344793] [<ffffffffa0b3b15b>] xfs_map_blocks+0x3b/0x90 [xfs]
[ 9309.344799] [<ffffffff803b36c6>] ? swiotlb_map_sg_attrs+0x56/0x160
[ 9309.344819] [<ffffffffa0b3c3eb>] xfs_page_state_convert+0x2fb/0x760 [xfs]
[ 9309.344838] [<ffffffffa010338c>] ? ahci_qc_issue+0xc/0xe0 [ahci]
[ 9309.344844] [<ffffffff803a6d30>] ? radix_tree_gang_lookup_tag_slot+0xc0/0xe0
[ 9309.344849] [<ffffffff8038fa24>] ? elv_next_request+0x164/0x290
[ 9309.344870] [<ffffffffa0b3c9b1>] xfs_vm_writepage+0x71/0x120 [xfs]
[ 9309.344875] [<ffffffff802bbeaa>] ? __dec_zone_page_state+0x2a/0x30
[ 9309.344879] [<ffffffff802b4407>] __writepage+0x17/0x50
[ 9309.344883] [<ffffffff802b568d>] write_cache_pages+0x2bd/0x420
[ 9309.344887] [<ffffffff802b43f0>] ? __writepage+0x0/0x50
[ 9309.344890] [<ffffffff802aefab>] ? mempool_alloc+0x5b/0x140
[ 9309.344896] [<ffffffff802b5814>] generic_writepages+0x24/0x30
[ 9309.344916] [<ffffffffa0b3b50c>] xfs_vm_writepages+0x5c/0x80 [xfs]
[ 9309.344921] [<ffffffff802b584d>] do_writepages+0x2d/0x50
[ 9309.344925] [<ffffffff8030ea3b>] __sync_single_inode+0x6b/0x2f0
[ 9309.344930] [<ffffffff8030ed18>] __writeback_single_inode+0x58/0x1b0
[ 9309.344933] [<ffffffff8023bae2>] ? update_cu...

Read more...

Revision history for this message
unikob (unikob) wrote :

Seems to be resemble bug #257331:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/257331

Searches around the interwebs seems to indicate dm-crypt as a potential candidate, although I'm certain I'm not running my disks encrypted.. seems that there are other appearances of this floating around:

http://www.google.fi/search?hl=fi&q=info:+task+blocked+for+120+seconds&start=10&sa=N

Revision history for this message
unikob (unikob) wrote :

The problem went away when I switched to server-kernel. I don't know what the differences are between generic and server kernel, but maybe someone with more knowledge can find this info useful.

Revision history for this message
Ioan (ioan123-gmail) wrote :

server-kernel did not fix my issue - it still freezes.

Interesting fact - if it can help - I am running BOINC, and when the system gets frozen, if I kill BOINC task - everything gets back to normal. BTW, my config is amd64 (on Core2Duo), patched with latest kernel 2.6.27-8

Any help will be highly appreciated, this is so annoying ...

Revision history for this message
unikob (unikob) wrote :

Hi Ioan, could you attach your kern.log also?

I'm running boinc too, but in my case killing boinc did not fix the issue. Anyway I'm changing status to confirmed.

Changed in linux:
status: New → Confirmed
Revision history for this message
unikob (unikob) wrote :

Apparently the bug is present even on the server kernel, as I just got hit with this again.

Uname -a:

Linux ubuntu 2.6.27-8-server #1 SMP Thu Nov 6 18:18:16 UTC 2008 x86_64 GNU/Linux

And killing Boinc rectifies the system usability.

Previous server-kernels worked fine on my machine, what have changed between 2.6.27-7 and 2.6.27-8?

Revision history for this message
Ioan (ioan123-gmail) wrote :

I cannot enclose the system log right now (not at home) - I can do it tonight, but you won't see anything more than you have already posted above (no mention of boincmgr in my syslog). Have you also noticed that pdflush is *always* the first program frozen - could be a coincidence, maybe not

I also tried the kernel options "nohz=off highres=off" - did not fix the issue either

Talking about Boinc - I found on the Internet a forum telling that the current Boinc Linux client can freeze - the beta is supposed to fix the issue - I tried the beta, it doesn't - same issue (in addition to other problems ...)

I will try tonight the server-kernel 2.6.27-7 see how it works for me

Revision history for this message
Ioan (ioan123-gmail) wrote :

server-kernel 2.6.27-7 woks only slightly better: still have the issue, but instead of freezing the entire system, it freezes only 1 application (firefox or mplayer or any other active app you are using).

Also tried "Folding at Home" instead of Boinc (just to see if Boinc is the real cause) - same issue - definitely looks more a kernel issue?

Ioan (ioan123-gmail)
Changed in linux:
assignee: nobody → ubuntu-kernel-team
Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
Martin Lindhe (martinlindhe) wrote : Re: INFO: task blocked for more than 120 seconds causes system freeze

I am also hit by this bug on my work PC (amd64 and fully updated Ubuntu 8.10).

The issue usually shows up every work day. Gnome fully freeze up, even mouse cursor cant move.
If you leave the computer alone for a couple of minutes it resumes to working state.
Often the freeze happens in combination with other network or disk-heavy tasks being run, but I have not figured any way to clearly reproduce it.

I have 2 laptops and 1 home pc that also runs 64-bit ubuntu 8.10 and none of those computers show this issue. I believe it is some hardware combination & kernel issue.

Revision history for this message
kenjo (ken-kenjo) wrote : Re: [Bug 276476] Re: INFO: task blocked for more than 120 seconds causes system freeze

On Thu, 2009-01-08 at 10:38 +0000, Martin Lindhe wrote:
> I am also hit by this bug on my work PC (amd64 and fully updated Ubuntu
> 8.10).
>
> The issue usually shows up every work day. Gnome fully freeze up, even mouse cursor cant move.
> If you leave the computer alone for a couple of minutes it resumes to working state.
> Often the freeze happens in combination with other network or disk-heavy tasks being run, but I have not figured any way to clearly reproduce it.
>
> I have 2 laptops and 1 home pc that also runs 64-bit ubuntu 8.10 and
> none of those computers show this issue. I believe it is some hardware
> combination & kernel issue.
>

Are you also running a background process??

I have changed to 2.6.28 and its much much better but still have had a
few short freezes

Revision history for this message
Martin Lindhe (martinlindhe) wrote :

kenjo wrote:
> Are you also running a background process??
>
> I have changed to 2.6.28 and its much much better but still have had a
> few short freezes
>

What would qualify as a background process? I run several
such as sshd, cupsd, bluetoothd ;)

However i also am running distributed.net client 24/7 on the machine.

Revision history for this message
kenjo (ken-kenjo) wrote :

On Tue, 2009-01-13 at 10:35 +0000, Martin Lindhe wrote:
> kenjo wrote:
> > Are you also running a background process??
> >
> > I have changed to 2.6.28 and its much much better but still have had a
> > few short freezes
> >
>
> What would qualify as a background process? I run several
> such as sshd, cupsd, bluetoothd ;)
>
> However i also am running distributed.net client 24/7 on the machine.
>

I meant any low priority process so that the machine always uses 100%
cpu. This looks like the common thing so far.

Revision history for this message
Martin Lindhe (martinlindhe) wrote :

kenjo wrote:
> On Tue, 2009-01-13 at 10:35 +0000, Martin Lindhe wrote:
>
>> kenjo wrote:
>>
>>> Are you also running a background process??
>>>
>>> I have changed to 2.6.28 and its much much better but still have had a
>>> few short freezes
>>>
>>>
>> What would qualify as a background process? I run several
>> such as sshd, cupsd, bluetoothd ;)
>>
>> However i also am running distributed.net client 24/7 on the machine.
>>
>>
>
> I meant any low priority process so that the machine always uses 100%
> cpu. This looks like the common thing so far.
>
Actually it appears to me that the bug is triggered for me
sometimes when I am running VirtualBox in the background
for several hours. But as mentioned before it can happen
in various situations.

Revision history for this message
Brian Rogers (brian-rogers) wrote : Re: INFO: task blocked for more than 120 seconds causes system freeze

This bug has been fixed upstream, by the following commits:
cce7ade803699463ecc62a065ca522004f7ccb3d
6bc912b71b6f33b041cfde93ca3f019cbaa852bc
e17036dac189dd034c092a91df56aa740db7146d

See this thread:
http://lkml.org/lkml/2009/1/11/70

It also looks like these fixes will be backported to the stable kernel.

Revision history for this message
unikob (unikob) wrote :

Great!

I'll close this bug when I see the fix backported to current kernel in 8.10

Revision history for this message
unikob (unikob) wrote :

Fix(es) Committed on latest stable kernel 2.6.28.2

Changed in linux:
status: Confirmed → Fix Committed
Revision history for this message
Brian Rogers (brian-rogers) wrote :

Only one of the three commits for scheduler issues has been backported in 2.6.28.2, and I can still produce this problem with 2.6.28-6-generic. I believe the critical commit for this issue is 6bc912b71b6f33b041cfde93ca3f019cbaa852bc.

Changed in linux:
status: Fix Committed → Confirmed
description: updated
Andy Whitcroft (apw)
Changed in linux:
assignee: nobody → apw
status: Confirmed → In Progress
Revision history for this message
Andy Whitcroft (apw) wrote :

We are indeed only carrying one of the three upstream commits (for completeness):

    cce7ade803699463ecc62a065ca522004f7ccb3d
    6bc912b71b6f33b041cfde93ca3f019cbaa852bc
    e17036dac189dd034c092a91df56aa740db7146d

We picked up the third of these via the v2.6.28.2 stable updates. I had a chat with the patch originator and they are really recommended as a trio, so I have pulled the other two commits into our tree and built some test kernels based on the latest Jaunty kernel. If those who are affected by this issue could test these kernels and report back here that would be helpful. Kernels can be found at the URL below:

    http://people.ubuntu.com/~apw/lp276476-jaunty/

Changed in linux:
importance: Undecided → Medium
status: In Progress → Incomplete
Revision history for this message
unikob (unikob) wrote :

Tested using Brian Rogers program from LKML:

Current 8.10 chokes albeit doesn't hang, system becomes unusable.

Current Jaunty (of 4.4.2009) gets a little choppy when four instances of the program runs at the same time,.

Jaunty with patched kernel fares best, no choppines when four instances simultaneously running.

Used test case:

Boinc running on four cores worldcommunitygrid, Firefox with Flash x64 playing a video from Myspace, launching the test program from console.

Revision history for this message
unikob (unikob) wrote :

Confirmed on Jaunty Alpha 6.

Changed in linux:
status: Incomplete → Confirmed
Andy Whitcroft (apw)
Changed in linux:
status: Confirmed → In Progress
Revision history for this message
kenjo (ken-kenjo) wrote : Re: [Bug 276476] Re: INFO: task blocked for more than 120 seconds causes system freeze

On Wed, 2009-03-04 at 17:59 +0000, unikob wrote:
> Tested using Brian Rogers program from LKML:
>
> Current 8.10 chokes albeit doesn't hang, system becomes unusable.
>
> Current Jaunty (of 4.4.2009) gets a little choppy when four instances of
> the program runs at the same time,.
>
> Jaunty with patched kernel fares best, no choppines when four instances
> simultaneously running.
>
> Used test case:
>
> Boinc running on four cores worldcommunitygrid, Firefox with Flash x64
> playing a video from Myspace, launching the test program from console.
>

I have upgraded to a 2.6.29 kernel and see no problem any longer.

Revision history for this message
Lars (lars-taeuber) wrote : Re: INFO: task blocked for more than 120 seconds causes system freeze

Hi there!

I still have these problems (sometimes very often) with ubuntu 9.04 jaunty and 2.6.28-11-generic kernel.

Is a fixed kernel in sight?

Thanks
Lars

Revision history for this message
Alexandre Kandalintsev (spam-messir) wrote :

It seems karmic affected too (2.6.30-9-generic)

Revision history for this message
Lars (lars-taeuber) wrote :

Hi!

I vote this bug to be changed to a higher importance.
It is very annoying because it happens quite often a day and more often when the computer is running on high load.
(niced processes => 100% CPU load, e.g. boinc)

When playing mplayer video files (-vo xv) while the computer being on high load you have a very high chance to lock the whole X session for quite some time.
I'll test this with only audio files next time.

Regards
Lars

Revision history for this message
Andy Whitcroft (apw) wrote :

To fix this I would need someone to test the kernels I posted. As the primary fixes to the scheduler we pulled in via a second stable update I will regenerate these test kernels. Ok updated kernels are available at the URL below, if you could test those and report back here:

    http://people.canonical.com/~apw//lp276476-jaunty/

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Revision history for this message
Lars (lars-taeuber) wrote :

Hi Andy,

I can confirm that the amd64 version seems to fix the problem, at least with my setup.
Thank you _very_much_!

Lars

Revision history for this message
Lucian Adrian Grijincu (lucian.grijincu) wrote :

@Andy, Can you please build kernels for karmic?

It happens for me on 2.6.31-3-generic #19-Ubuntu SMP, running on a Intel(R) Pentium(R) 4 CPU 3.20GHz with two hyperthreads.

It does not involve any background processes (as far as I can tell).
It seems to happen when RAM gets tight (I run one or two gitk's over the linux kernel and `free -m` reports little improvement).

As a side note on 2.6.31-3-generic it happens quite frequently, but I don' recall seeing it on 2.6.31-2-generic.
Should I file a separate bug?

Revision history for this message
Lars (lars-taeuber) wrote :

Hi again.

Sorry I have to recall my confirmation.
Here are not the »120 seconds freeze« error messages any more but I had a frozen system. I'm not sure it is caused by the same bug.

But it is definitely harder to get a frozen system.
Lars

Revision history for this message
Sesshomurai (darren-ontrenet) wrote :

Hi,
 I see what seems to be this same problem on my 8.10 server AMI's on EC2. They freeze predictably after a certain period of activity. I found that memory got below 35MB free when it froze and my 'top' listing showed kswapd with nice of -5 appearing 2nd on the process list. My server image has no swap defined. Hope this info helps.

Revision history for this message
Christoph Korn (c-korn) wrote :

This bug is still present in the final karmic.

What is the status with these commits being backported ?

Revision history for this message
Brian Rogers (brian-rogers) wrote :

A new bug should be filed for any 'task blocked' issues remaining in Karmic, since the commits to fix the original issue are already in kernel 2.6.31. This bug should actually be in the 'Fix Released' state, so I'll fix that.

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Revision history for this message
Allen.McIntosh (mcintosh) wrote :

I have been bitten by this several times in Karmic.
For what it's worth I run FC12 on the same machine, and so far I haven't been bitten by it there. (I believe FC12 also uses kernel 2.6.31).

I'll be getting more experience with FC12 now, since crashing every two hours isn't acceptable. I'll post a followup if I see crashes there (or not).

Revision history for this message
BenR (benregier) wrote :

This bug (or a similar one) is still affecting me, and I'm using 2.6.31-16 generic on Karmic.

It was frozen today for about 20 minutes before I gave up and restarted it using SysRq+rseiub.

Does this mean we need to open a new bug report?

[23400.464046] INFO: task i915/0:318 blocked for more than 120 seconds.
[23400.464052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23400.464057] i915/0 D c08145c0 0 318 2 0x00000000
[23400.464065] f6a05f04 00000046 f64f713c c08145c0 f64f73a8 c08145c0 08c8cc8d 0000143a
[23400.464077] c08145c0 c08145c0 f64f73a8 c08145c0 08c8c149 0000143a c08145c0 ef0cc8c0
[23400.464088] f64f7110 f69fec14 f69fec18 ffffffff f6a05f30 c056fdd6 f6901920 f69fec1c
[23400.464100] Call Trace:
[23400.464111] [<c056fdd6>] __mutex_lock_slowpath+0xc6/0x130
[23400.464117] [<c056fcf0>] mutex_lock+0x20/0x40
[23400.464146] [<f83aec0a>] i915_gem_retire_work_handler+0x2a/0x70 [i915]
[23400.464154] [<c0157a7e>] run_workqueue+0x6e/0x140
[23400.464176] [<f83aebe0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[23400.464182] [<c0157bd8>] worker_thread+0x88/0xe0
[23400.464188] [<c015c280>] ? autoremove_wake_function+0x0/0x40
[23400.464195] [<c0157b50>] ? worker_thread+0x0/0xe0
[23400.464200] [<c015bf8c>] kthread+0x7c/0x90
[23400.464205] [<c015bf10>] ? kthread+0x0/0x90
[23400.464211] [<c0104007>] kernel_thread_helper+0x7/0x10

Revision history for this message
Ian! D. Allen (idallen) wrote :

This started happening to me on 8.10 (2.6.27-16-generic) when I started a
real-disk-to-real-disk "cp -a" inside a guest 9.10 running inside VMware.
The host 8.10 is having a very hard time with this.

Yesterday, the time-outs were all firefox:

# grep 'blocked for more than 120 seconds' syslog.0 | tr : ' ' | acol 11 | sort | uniq -c
     48 firefox

Today, it's a mix:

      5 boinc
      3 fetchmail
      1 hal-acl-tool
      6 kjournald
      5 local
      1 mutt_dotlock
      1 pam-foreground-
      1 polkit-ian
      5 procmail
      6 rsync
      1 sadc
      5 setiathome_6.03
      1 sleep
      1 vmnet-dhcpd

Sample:

[68185.828301] INFO: task rsync:21998 blocked for more than 120 seconds.
[68185.828303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68185.828306] rsync D 0000ffff 0 21998 21980
[68185.828309] c0cd1cc8 00000086 ffffffff 0000ffff c9642840 c0cd1c90 00000008 c0cd1c80
[68185.828315] 00000246 c01d8285 c2c2cb60 00071b6d 00000000 f6e319cc 00000000 00000000
[68185.828321] c0514080 f70de000 c0cd0000 c2c2cdd8 c2b17d00 00000000 c02405b9 c2b17d00
[68185.828327] Call Trace:
[68185.828328] [<c01d8285>] ? alloc_buffer_head+0x15/0x40
[68185.828333] [<c02405b9>] ? generic_unplug_device+0x9/0x40
[68185.828338] [<c037febc>] io_schedule+0x2c/0x40
[68185.828341] [<c01d8a7d>] sync_buffer+0x3d/0x50
[68185.828344] [<c038049f>] __wait_on_bit+0x4f/0x70
[68185.828348] [<c01d8a40>] ? sync_buffer+0x0/0x50
[68185.828351] [<c038056d>] out_of_line_wait_on_bit+0xad/0xc0
[68185.828355] [<c01d8a40>] ? sync_buffer+0x0/0x50
[68185.828358] [<c0148700>] ? wake_bit_function+0x0/0x60
[68185.828362] [<c01d8bae>] __wait_on_buffer+0x2e/0x30
[68185.828366] [<f8a75f33>] wait_on_buffer+0x23/0x30 [ext3]
[68185.828383] [<f8a767c6>] __ext3_get_inode_loc+0x256/0x300 [ext3]
[68185.828392] [<c01c8a5b>] ? find_inode_fast+0xb/0x70
[68185.828396] [<c01c9197>] ? get_new_inode_fast+0xe7/0x120
[68185.828399] [<c01c9d1b>] ? iget_locked+0x5b/0x60
[68185.828404] [<f8a768cd>] ext3_iget+0x5d/0x3e0 [ext3]
[68185.828412] [<c01c5c7a>] ? d_rehash+0x4a/0x60
[68185.828417] [<c01c718e>] ? __d_lookup+0xe/0x120
[68185.828422] [<f8a7c169>] ext3_lookup+0x89/0xd0 [ext3]
[68185.828431] [<c03817ed>] ? _spin_lock+0xd/0x10
[68185.828435] [<c01c79bf>] ? d_alloc+0x13f/0x1a0
[68185.828444] [<c01bd1df>] real_lookup+0xaf/0x110
[68185.828447] [<c01bd2cd>] do_lookup+0x8d/0xd0
[68185.828451] [<c01bd84d>] __link_path_walk+0x53d/0xb40
[68185.828454] [<c01be2b4>] path_walk+0x54/0xb0
[68185.828457] [<c01be466>] do_path_lookup+0xb6/0x1a0
[68185.828461] [<c01bf07a>] user_path_at+0x4a/0x80
[68185.828464] [<c01b6f89>] ? cp_new_stat64+0xe9/0x100
[68185.828470] [<c01b74d3>] vfs_lstat_fd+0x23/0x50
[68185.828474] [<c01b7596>] vfs_lstat+0x16/0x20
[68185.828477] [<c01b75b9>] sys_lstat64+0x19/0x30
[68185.828481] [<c0104072>] syscall_call+0x7/0xb
[68185.828484] =======================

Revision history for this message
Ian! D. Allen (idallen) wrote :

> This started happening to me on 8.10 (2.6.27-16-generic) when I started a
> real-disk-to-real-disk "cp -a" inside a guest 9.10 running inside VMware.

In addition to the "blocked for more than 120 seconds" errors, I'm also
seeing characters that I'm typing repeat themselves. Reading LKML,
I see concurrent reports of keyboard character repeating, so it's related.

I'm going to upgrade to 9.10 - this 8.10 has too many problems. (The USB
subsystem is also broken for large file transfers to external disks.)

Revision history for this message
Sesshomurai (darren-ontrenet) wrote :

I believe this - very very bad problem - also happens in karmic according to a poster above.
I'm beginning to think that the blocked task is not the "cause" but the "effect" of this problem.
In other words, perhaps the scheduler or something in the kernel is foobar and it thus results
in a task or more becoming blocked indefinitely.

Revision history for this message
humbhenri (humbhenri) wrote :

I can confirm this bug in karmic too, kernel 2.6.31-16-generic2.6.31-16-generic. System freezes often using chromium, flashplugin eats 100% cpu and then computer hangs for a few minutes; in terminal task block messages popup too.

Revision history for this message
Rodrigo Azevedo (rodrigoams) wrote :

I've the same problem with karmic server (AMD 64), Linux 2.6.31-17-server #54-Ubuntu SMP. System freezes when I'm using a cpu intensive scientific program. With Ubuntu 6.04 I not have this problem. This is very annoying because I installed karmic in 20 computers last week and they started to freeze.

Revision history for this message
mkalbere (info-programmers) wrote :

I had this problem on my eeepc 1201 ION on karmic ubuntu (2.6.31-17-generic).
I upgrade to 2.6.33-020633rc4-generic with the NVIDIA-Linux-x86_64-195.30-pkg2.run nividia drivers.

I submitted my netbook to highload (I found out that the freeze occured mostly on heavy hdd access )

Several hours after .. still no freeze ... seems i's fixed .. I'll repost a feed-back in 3-4 days .

Revision history for this message
mkalbere (info-programmers) wrote :

Sorry bad pronostic, forget my last comment ;-(

Revision history for this message
Richard Huddleston (rhuddusa) wrote :
Download full text (6.6 KiB)

I'm seeing this on
Linux box05 2.6.32-02063209-generic #02063209 SMP Wed Feb 24 10:09:53 UTC 2010 x86_64 GNU/Linux

occurs while coping a disk to nfs share on another box

Feb 24 21:11:29 box05 kernel: [ 600.530615] INFO: task flush-0:25:2623 blocked for more than 120 seconds.
Feb 24 21:11:29 box05 kernel: [ 600.530618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 24 21:11:29 box05 kernel: [ 600.530622] flush-0:25 D 0000000000000001 0 2623 2 0x00000000
Feb 24 21:11:29 box05 kernel: [ 600.530628] ffff8800c7bdd110 0000000000000046 0000000000000000 ffff8800c7bddfd8
Feb 24 21:11:29 box05 kernel: [ 600.530634] ffff880129f45b40 0000000000015a80 0000000000015a80 ffff8800c7bddfd8
Feb 24 21:11:29 box05 kernel: [ 600.530640] 0000000000015a80 ffff8800c7bddfd8 0000000000015a80 ffff880129f45f00
Feb 24 21:11:29 box05 kernel: [ 600.530645] Call Trace:
Feb 24 21:11:29 box05 kernel: [ 600.530651] [<ffffffff81533c52>] io_schedule+0x42/0x60
Feb 24 21:11:29 box05 kernel: [ 600.530668] [<ffffffffa0c31f39>] nfs_wait_bit_uninterruptible+0x9/0x10 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530673] [<ffffffff815343d5>] __wait_on_bit+0x55/0x80
Feb 24 21:11:29 box05 kernel: [ 600.530689] [<ffffffffa0c31f30>] ? nfs_wait_bit_uninterruptible+0x0/0x10 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530706] [<ffffffffa0c31f30>] ? nfs_wait_bit_uninterruptible+0x0/0x10 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530711] [<ffffffff81534478>] out_of_line_wait_on_bit+0x78/0x90
Feb 24 21:11:29 box05 kernel: [ 600.530716] [<ffffffff8107f760>] ? wake_bit_function+0x0/0x30
Feb 24 21:11:29 box05 kernel: [ 600.530732] [<ffffffffa0c31f28>] nfs_wait_on_request+0x28/0x30 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530750] [<ffffffffa0c36199>] nfs_wait_on_requests_locked+0x79/0xd0 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530757] [<ffffffff8111e01b>] ? get_partial_node+0x9b/0xc0
Feb 24 21:11:29 box05 kernel: [ 600.530775] [<ffffffffa0c37253>] nfs_sync_mapping_wait+0xb3/0x140 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530782] [<ffffffff81437a58>] ? pskb_expand_head+0x58/0x1a0
Feb 24 21:11:29 box05 kernel: [ 600.530800] [<ffffffffa0c37366>] nfs_wb_page_priority+0x86/0xe0 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530806] [<ffffffff8112744e>] ? page_cgroup_zoneinfo+0x2e/0x40
Feb 24 21:11:29 box05 kernel: [ 600.530823] [<ffffffffa0c373ce>] nfs_wb_page+0xe/0x10 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530836] [<ffffffffa0c26975>] nfs_release_page+0x35/0x60 [nfs]
Feb 24 21:11:29 box05 kernel: [ 600.530842] [<ffffffff810e58eb>] try_to_release_page+0x2b/0x40
Feb 24 21:11:29 box05 kernel: [ 600.530847] [<ffffffff810f27e7>] shrink_page_list+0x4b7/0x550
Feb 24 21:11:29 box05 kernel: [ 600.530852] [<ffffffff810f2abd>] ? isolate_lru_pages+0x23d/0x280
Feb 24 21:11:29 box05 kernel: [ 600.530858] [<ffffffff810f2e43>] shrink_inactive_list+0x2f3/0x700
Feb 24 21:11:29 box05 kernel: [ 600.530864] [<ffffffff81436ba9>] ? __alloc_skb+0x49/0x180
Feb 24 21:11:29 box05 kernel: [ 600.530870] [<ffffffff810e8f23>] ? move_freepages_block+0x73/0x80
Feb 24 21:11:29 box05 kernel: [ 600.530875] [<ffffffff810f32a1>] shrink_list+0x51/0x...

Read more...

Revision history for this message
Hurukan (pelat-laurent) wrote :

I have the same problem. The freeze happens in disk-heavy tasks being run.. (cp, dd, ...)

I'm on Linux welcome 2.6.31-19-generic #56-Ubuntu SMP Thu Jan 28 02:39:34 UTC 2010 x86_64 GNU/Linux

Here a trace from Virtualbox (when I was creating a disk of 40GB)

[ 600.427705] INFO: task VirtualBox:2495 blocked for more than 120 seconds.
[ 600.427708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 600.427711] VirtualBox D 00000000ffffffff 0 2495 2428 0x00000000
[ 600.427716] ffff8801c89c1a08 0000000000000082 ffff8800280441c8 0000000000015880
[ 600.427722] ffff880214573110 0000000000015880 0000000000015880 0000000000015880
[ 600.427727] 0000000000015880 ffff880214573110 0000000000015880 0000000000015880
[ 600.427732] Call Trace:
[ 600.427737] [<ffffffff811ebf95>] do_get_write_access+0x315/0x5e0
[ 600.427742] [<ffffffff81078a00>] ? wake_bit_function+0x0/0x40
[ 600.427747] [<ffffffff811ec3ec>] jbd2_journal_get_write_access+0x2c/0x50
[ 600.427753] [<ffffffff811d4680>] __ext4_journal_get_write_access+0x30/0x70
[ 600.427758] [<ffffffff811b394b>] ext4_reserve_inode_write+0x6b/0x90
[ 600.427763] [<ffffffff811b39b7>] ext4_mark_inode_dirty+0x47/0x1d0
[ 600.427768] [<ffffffff811b3cb2>] ext4_dirty_inode+0x62/0xb0
[ 600.427773] [<ffffffff81140056>] __mark_inode_dirty+0x36/0x130
[ 600.427778] [<ffffffff81136544>] inode_setattr+0x74/0x170
[ 600.427782] [<ffffffff811b3e7e>] ext4_setattr+0x17e/0x360
[ 600.427787] [<ffffffff811367a3>] notify_change+0x163/0x340
[ 600.427792] [<ffffffff8111dfb9>] do_truncate+0x69/0x90
[ 600.427797] [<ffffffff81420b58>] ? common_perm+0x48/0x90
[ 600.427802] [<ffffffff81128720>] ? get_write_access+0x40/0x70
[ 600.427807] [<ffffffff81129226>] may_open+0x1e6/0x220
[ 600.427812] [<ffffffff8112cb58>] do_filp_open+0x208/0xac0
[ 600.427817] [<ffffffff8127cf42>] ? __strncpy_from_user+0x22/0x60
[ 600.427822] [<ffffffff811376b2>] ? alloc_fd+0x102/0x150
[ 600.427827] [<ffffffff8111d094>] do_sys_open+0x64/0x160
[ 600.427832] [<ffffffff8111d1bb>] sys_open+0x1b/0x20
[ 600.427837] [<ffffffff81012002>] system_call_fastpath+0x16/0x1b

Revision history for this message
Hurukan (pelat-laurent) wrote :
Download full text (3.9 KiB)

When I add kernel.hung_task_timeout_secs = 0 in my sysctl.conf, Most of the time, I have small freeze from 1/2 seconds max...
But sometimes, I have a big freeze (need a reboot). The disk is new (and there is no problem on it - checked before...)

Here a trace:
Feb 28 21:05:10 redshield kernel: [ 1.760466] Hardware name: P55A-UD5
Feb 28 21:05:10 redshield kernel: [ 1.760467] Modules linked in:
Feb 28 21:05:10 redshield kernel: [ 1.760469] Pid: 0, comm: swapper Not tainted 2.6.31-19-generic #56-Ubuntu
Feb 28 21:05:10 redshield kernel: [ 1.760471] Call Trace:
Feb 28 21:05:10 redshield kernel: [ 1.760475] [<ffffffff8105e678>] warn_slowpath_common+0x78/0xb0
Feb 28 21:05:10 redshield kernel: [ 1.760477] [<ffffffff8105e6bf>] warn_slowpath_null+0xf/0x20
Feb 28 21:05:10 redshield kernel: [ 1.760479] [<ffffffff81034335>] hpet_next_event+0x75/0x90
Feb 28 21:05:10 redshield kernel: [ 1.760481] [<ffffffff8103437b>] hpet_legacy_next_event+0xb/0x10
Feb 28 21:05:10 redshield kernel: [ 1.760484] [<ffffffff8108556f>] clockevents_program_event+0x4f/0x90
Feb 28 21:05:10 redshield kernel: [ 1.760486] [<ffffffff81086a60>] tick_dev_program_event+0x40/0xd0
Feb 28 21:05:10 redshield kernel: [ 1.760489] [<ffffffff810863f6>] tick_broadcast_oneshot_control+0x116/0x120
Feb 28 21:05:10 redshield kernel: [ 1.760491] [<ffffffff81085ce0>] tick_notify+0x130/0x1a0
Feb 28 21:05:10 redshield kernel: [ 1.760494] [<ffffffff8152ca37>] notifier_call_chain+0x47/0x90
Feb 28 21:05:10 redshield kernel: [ 1.760496] [<ffffffff8107d6d1>] raw_notifier_call_chain+0x11/0x20
Feb 28 21:05:10 redshield kernel: [ 1.760498] [<ffffffff81085374>] clockevents_notify+0x34/0xa0
Feb 28 21:05:10 redshield kernel: [ 1.760500] [<ffffffff812d7eea>] lapic_timer_state_broadcast+0x41/0x43
Feb 28 21:05:10 redshield kernel: [ 1.760502] [<ffffffff812d8447>] acpi_idle_enter_bm+0x185/0x2bf
Feb 28 21:05:10 redshield kernel: [ 1.760504] [<ffffffff812d82b4>] ?...

Read more...

Revision history for this message
Paul (paulwillems) wrote :
Download full text (3.2 KiB)

Hi,

A friend of my has the same problem. He is using:
Linux smi01 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux (Release 9.10)

Unfortunately the system freezes frequently. Looking into his dmesg learns me that we are facing the same issue as described above. A small tail from dmesg:
[13920.408039] Call Trace:
[13920.408044] [<c0572b76>] __mutex_lock_slowpath+0xc6/0x130
[13920.408047] [<c0572a90>] mutex_lock+0x20/0x40
[13920.408068] [<f84af8fa>] i915_gem_retire_work_handler+0x2a/0x70 [i915]
[13920.408071] [<c01579de>] run_workqueue+0x6e/0x140
[13920.408083] [<f84af8d0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[13920.408086] [<c0157b38>] worker_thread+0x88/0xe0
[13920.408089] [<c015c180>] ? autoremove_wake_function+0x0/0x40
[13920.408091] [<c0157ab0>] ? worker_thread+0x0/0xe0
[13920.408093] [<c015be8c>] kthread+0x7c/0x90
[13920.408095] [<c015be10>] ? kthread+0x0/0x90
[13920.408097] [<c0104047>] kernel_thread_helper+0x7/0x10
[14040.408013] INFO: task i915/0:329 blocked for more than 120 seconds.
[14040.408016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14040.408019] i915/0 D c08185c0 0 329 2 0x00000000
[14040.408024] f646df04 00000046 00000001 c08185c0 f6450f28 c08185c0 e7a7e7b7 00000c02
[14040.408030] c08185c0 c08185c0 f6450f28 c08185c0 e7a7e34d 00000c02 c08185c0 f6445340
[14040.408037] f6450c90 f6931814 f6931818 ffffffff f646df30 c0572b76 c074ce80 f693181c
[14040.408043] Call Trace:
[14040.408050] [<c0572b76>] __mutex_lock_slowpath+0xc6/0x130
[14040.408053] [<c0572a90>] mutex_lock+0x20/0x40
[14040.408075] [<f84af8fa>] i915_gem_retire_work_handler+0x2a/0x70 [i915]
[14040.408079] [<c01579de>] run_workqueue+0x6e/0x140
[14040.408091] [<f84af8d0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[14040.408093] [<c0157b38>] worker_thread+0x88/0xe0
[14040.408096] [<c015c180>] ? autoremove_wake_function+0x0/0x40
[14040.408099] [<c0157ab0>] ? worker_thread+0x0/0xe0
[14040.408101] [<c015be8c>] kthread+0x7c/0x90
[14040.408103] [<c015be10>] ? kthread+0x0/0x90
[14040.408105] [<c0104047>] kernel_thread_helper+0x7/0x10

There is one thing I've observed. With the top command I saw that just one process was continuously running (Normally you will see a few processes). The process was "ica" (iTALC master application)

Tail from kern.log:
Mar 26 20:57:30 smi01 kernel: [14040.408053] [<c0572a90>] mutex_lock+0x20/0x40
Mar 26 20:57:30 smi01 kernel: [14040.408075] [<f84af8fa>] i915_gem_retire_work_handler+0x2a/0x70 [i915]
Mar 26 20:57:30 smi01 kernel: [14040.408079] [<c01579de>] run_workqueue+0x6e/0x140
Mar 26 20:57:30 smi01 kernel: [14040.408091] [<f84af8d0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
Mar 26 20:57:30 smi01 kernel: [14040.408093] [<c0157b38>] worker_thread+0x88/0xe0
Mar 26 20:57:30 smi01 kernel: [14040.408096] [<c015c180>] ? autoremove_wake_function+0x0/0x40
Mar 26 20:57:30 smi01 kernel: [14040.408099] [<c0157ab0>] ? worker_thread+0x0/0xe0
Mar 26 20:57:30 smi01 kernel: [14040.408101] [<c015be8c>] kthread+0x7c/0x90
Mar 26 20:57:30 smi01 kernel: [14040.408103] [<c015be10>] ? kthread+0x0/0x90
Mar 26 20:57:30 smi01 kernel: [14040...

Read more...

Revision history for this message
kenjo (ken-kenjo) wrote :

This particular bug has been fixed it was a scheduler problem.

The printout is a sanity check and there is many many different reasons
for it to be printed. Your problem looks to be related to the X display
driver. You should open a new bug for your problem.

Revision history for this message
Dimitris Kounalakis (dcoun) wrote :

bug exists. During a mv command through nfs with big video files in the client machine got the following output
Client nfs machine:
Linux htpc 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 04:38:19 UTC 2010 x86_64 GNU/Linux
Server nfs machine:
Linux asterias 2.6.31-20-server #58-Ubuntu SMP Fri Mar 12 05:40:05 UTC 2010 x86_64 GNU/Linux

Apr 5 22:15:09 htpc kernel: [44760.940080] INFO: task mv:5523 blocked for more than 120 seconds.
Apr 5 22:15:09 htpc kernel: [44760.940087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 5 22:15:09 htpc kernel: [44760.940093] mv D 0000000000000000 0 5523 5506 0x00000000
Apr 5 22:15:09 htpc kernel: [44760.940104] ffff880069105ae8 0000000000000086 ffff880069105a78 0000000000015880
Apr 5 22:15:09 htpc kernel: [44760.940113] ffff88007b131a60 0000000000015880 0000000000015880 0000000000015880
Apr 5 22:15:09 htpc kernel: [44760.940121] 0000000000015880 ffff88007b131a60 0000000000015880 0000000000015880
Apr 5 22:15:09 htpc kernel: [44760.940128] Call Trace:
Apr 5 22:15:09 htpc kernel: [44760.940143] [<ffffffff810da740>] ? sync_page+0x0/0x50
Apr 5 22:15:09 htpc kernel: [44760.940153] [<ffffffff8152ab18>] io_schedule+0x28/0x40
Apr 5 22:15:09 htpc kernel: [44760.940159] [<ffffffff810da77d>] sync_page+0x3d/0x50
Apr 5 22:15:09 htpc kernel: [44760.940166] [<ffffffff8152b247>] __wait_on_bit+0x57/0x80
Apr 5 22:15:09 htpc kernel: [44760.940173] [<ffffffff810da8ee>] wait_on_page_bit+0x6e/0x80
Apr 5 22:15:09 htpc kernel: [44760.940182] [<ffffffff81078a70>] ? wake_bit_function+0x0/0x40
Apr 5 22:15:09 htpc kernel: [44760.940191] [<ffffffff810e4cf0>] ? pagevec_lookup_tag+0x20/0x30
Apr 5 22:15:09 htpc kernel: [44760.940198] [<ffffffff810dad95>] wait_on_page_writeback_range+0xf5/0x190
Apr 5 22:15:09 htpc kernel: [44760.940206] [<ffffffff810dae57>] filemap_fdatawait+0x27/0x30
Apr 5 22:15:09 htpc kernel: [44760.940213] [<ffffffff810db304>] filemap_write_and_wait+0x44/0x50
Apr 5 22:15:09 htpc kernel: [44760.940249] [<ffffffffa02b813b>] nfs_setattr+0x14b/0x170 [nfs]
Apr 5 22:15:09 htpc kernel: [44760.940256] [<ffffffff810da722>] ? __lock_page+0x62/0x70
Apr 5 22:15:09 htpc kernel: [44760.940264] [<ffffffff81078a70>] ? wake_bit_function+0x0/0x40
Apr 5 22:15:09 htpc kernel: [44760.940270] [<ffffffff810da559>] ? find_get_page+0x19/0xa0

Revision history for this message
Alvin (alvind) wrote :

Per comment #34
Should we move on to separate bugs about the blocked tasks? (like bug 494476)
There are quite a lot of those (rsync, kvm, pdflush, kjournald, and more,...)

Revision history for this message
Sesshomurai (darren-ontrenet) wrote :

Hi,
  I don't think this bug is defined correctly. I believe its not the hung process that CAUSES the system freeze. Its something in the kernel/scheduler that CAUSES a process to HANG, which then precipitates to the entire system.

So I don't recommend looking at individual programs or services to fix this, rather severe problem. Nor do I think its something in a process that triggers this. Something else is going on system/kernel wide.

This is a showstopper for production service providers using Ubuntu server.

Revision history for this message
tom (thomas-gutzler) wrote :

Hi,
sorry for double post (bug 494476) but this one seems to get more attention.

I'm getting the same errors (INFO: task xyz blocked for more than 120 seconds) since I upgraded to ext4 on my 4TB /home partition. It gets a lot of I/O including rsync backups, smbd and nfs fileserver for ~20 people. I have attached a dmesg.

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 9.10
Release: 9.10
Codename: karmic
# uname -a
Linux io 2.6.31-20-server #58-Ubuntu SMP Fri Mar 12 05:40:05 UTC 2010 x86_64 GNU/Linux

Revision history for this message
Neil Broomfield (neil-broomfield) wrote :

I think It could be related to this ext4 bug on kernel.org...hopefully a kernel dev can shed more light on the issue?

https://bugzilla.kernel.org/show_bug.cgi?id=14830

I'm guessing this issue is also present on XFS too given the debug in comment #3

Revision history for this message
BenR (benregier) wrote :

I'm on ext3 and this is affecting me as well, so it may not be limited to ext4 and xfs

Revision history for this message
Marc Nieper-Wißkirchen (marc-nieper-wisskirchen) wrote :

Since upgrading to Ubuntu 10.04, my LTSP clients are showing this problem. The first client starts without any issues. All the other ones display messages like: "[...] INFO: task sshd: 493 blocked for more than 120 seconds..." and similarly for other services.

Revision history for this message
Dane Powell (danepowell) wrote :

I saw this in 9.10, and was very hopeful that upgrading to 10.04 would fix it, but no cigar :( This morning I started seeing these errors again. For me, they only occur on cron tasks. Symptoms include not being able to log in via SSH or IMAP for the duration of the errors.

Revision history for this message
humbhenri (humbhenri) wrote : Re: [Bug 276476] Re: INFO: task blocked for more than 120 seconds causes system freeze

Even in Lucid this error is present. They should worry more about serious
things like this than stupid colors and button placements.
---------- Forwarded message ----------
From: danep <email address hidden>
Date: 2010/5/4
Subject: [Bug 276476] Re: INFO: task blocked for more than 120 seconds
causes system freeze
To: <email address hidden>

I saw this in 9.10, and was very hopeful that upgrading to 10.04 would
fix it, but no cigar :( This morning I started seeing these errors
again. For me, they only occur on cron tasks. Symptoms include not being
able to log in via SSH or IMAP for the duration of the errors.

--
INFO: task blocked for more than 120 seconds causes system freeze
https://bugs.launchpad.net/bugs/276476
You received this bug notification because you are a direct subscriber
of the bug.

--

:.Humberto Pinheiro.:

Revision history for this message
ivan (ivan.tg) wrote : Re: INFO: task blocked for more than 120 seconds causes system freeze

I have this problem too, it appears when i attach my camera to the laptop. Attaching any of the other USB mass storage devices i have does not cause problem. However once triggered the bug prevents any other USB storage device from functioning.

And why is the status 'Fix Released' as even with the latest kernel update (2.6.32-22-generic) the problem exists?
I can confirm the problem does not exist on Hardy with latest 2.6.24 kernel.

OFF TOPIC, ironically i stumbled on this bug while trying to download video from my camera showing reproduction of X server crash caused by typing into the search field of Firefox (3.6.3). Alas, yet another bad surprise with 10.04

Revision history for this message
Derek (bugs-m8y) wrote :

Encountering same issue w/ Ubuntu as a vmware guest. Very unpleasant since I'd only recently made the move to the new virtual server layout.

Has made a previously reliable server almost unusable.

Revision history for this message
Derek (bugs-m8y) wrote :

Rebooting the machine does seem to help.
The "basically unusable" level was after 26 days of uptime.
Karmic, running on ESXi 4

Revision history for this message
Jean-Peer Lorenz (peer.loz) wrote :

Is definitely not fixed in Lucid 10.04. I'm facing this since upgrading to Lucid.

Excerpt from the syslog:
--------------------------
May 18 11:43:05 damballah kernel: [ 2245.176082] usb 4-1: reset full speed USB device using ohci_hcd and address 2
May 18 11:43:40 damballah kernel: [ 2280.216126] INFO: task nautilus:2076 blocked for more than 120 seconds.
May 18 11:43:40 damballah kernel: [ 2280.216136] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 18 11:43:40 damballah kernel: [ 2280.216141] nautilus D 000076ec 0 2076 1143 0x00000000
May 18 11:43:40 damballah kernel: [ 2280.216149] d9f15e08 00200086 f1da0000 000076ec 00000000 c0846740 f3e5291c c0846740
May 18 11:43:40 damballah kernel: [ 2280.216159] 5bdf771b 000001f1 c0846740 c0846740 f3e5291c c0846740 c0846740 f0452000
May 18 11:43:40 damballah kernel: [ 2280.216167] 00000000 000001f1 f3e52670 c7a9a028 c7a9a02c ffffffff d9f15e34 c058a556
May 18 11:43:40 damballah kernel: [ 2280.216176] Call Trace:
May 18 11:43:40 damballah kernel: [ 2280.216194] [<c058a556>] __mutex_lock_slowpath+0xc6/0x130
May 18 11:43:40 damballah kernel: [ 2280.216200] [<c058a475>] mutex_lock+0x25/0x40
May 18 11:43:40 damballah kernel: [ 2280.216214] [<f83ca3e9>] fat_count_free_clusters+0x29/0x260 [fat]
May 18 11:43:40 damballah kernel: [ 2280.216225] [<c016ae38>] ? lock_hrtimer_base+0x28/0x50
May 18 11:43:40 damballah kernel: [ 2280.216232] [<c022002f>] ? mntput_no_expire+0x1f/0xe0
May 18 11:43:40 damballah kernel: [ 2280.216237] [<c022002f>] ? mntput_no_expire+0x1f/0xe0
May 18 11:43:40 damballah kernel: [ 2280.216245] [<f83cc368>] fat_statfs+0xc8/0xe0 [fat]
May 18 11:43:40 damballah kernel: [ 2280.216256] [<c02070bd>] vfs_statfs+0x6d/0x80
May 18 11:43:40 damballah kernel: [ 2280.216261] [<c02070ed>] vfs_statfs64+0x1d/0x40
May 18 11:43:40 damballah kernel: [ 2280.216267] [<c02071eb>] sys_statfs64+0x5b/0x90
May 18 11:43:40 damballah kernel: [ 2280.216273] [<c017ab53>] ? do_futex+0xc3/0x1f0
May 18 11:43:40 damballah kernel: [ 2280.216278] [<c017acf2>] ? sys_futex+0x72/0x120
May 18 11:43:40 damballah kernel: [ 2280.216285] [<c01033ec>] syscall_call+0x7/0xb
May 18 11:43:42 damballah kernel: [ 2282.176066] usb 4-1: reset full speed USB device using ohci_hcd and address 2
May 18 11:44:10 damballah kernel: Kernel logging (proc) stopped.
-------------------

After login a backup cronjob was started (nice = 19!) and the system got nearly unusable.

Should anyone modifiy the bug status to pull some attraction on this issue again?

Revision history for this message
Jean-Peer Lorenz (peer.loz) wrote :

P.S.: Even killing the cronjob did not helped to re-create a useable system. I had to perfom a reboot :(

Revision history for this message
Gregory Moyer (syphr42) wrote :

I have a machine running two guests under kvm, the first guest was upgraded to lucid just after release and has never shown an issue. However, just recently I upgraded the other guest and the host to lucid and now the recently upgraded guest locks up completely with a similar log to the ones above on a daily basis. When it hits this problem, network connections are lost and I am unable to shutdown normally. Sometimes I can issue a virsh shutdown and the guest will start to shut down, but then freeze. I have never been able to recover from this or escape from it ultimately any other way than issuing a virsh destroy and then bringing the guest back up.

Revision history for this message
Jean-Peer Lorenz (peer.loz) wrote :

Should someone change the status of this bug in order to pull some attraction on it again?

Revision history for this message
Brian Rogers (brian-rogers) wrote :

I'm going to update the bug title to help the appropriate action occur.

"INFO: task blocked for more than 120 seconds" is a diagnostic message for when a process that wants to run in blocked from running for a long time. It's a symptom, not a disease (read: bug). Just like you'd file a new bug if you found a new way to crash a program, a new bug should be filed for each unique cause of blocked tasks.

Therefore I'll update the bug title to refer to the cause, not the symptom, so people don't find this bug and think it should be reopened.

summary: - INFO: task blocked for more than 120 seconds causes system freeze
+ Idle-priority scheduling bug blocks tasks
description: updated
Revision history for this message
tom (thomas-gutzler) wrote :

So you're saying people who are still affected by the "task blocked for more than 120 seconds" bug should all file their own report (like Alvin suggested in #50)? Or can we just all add to bug 494476?

Revision history for this message
Brian Rogers (brian-rogers) wrote :

If it appears to have the same cause (in this case, ext4 access or perhaps disk access in general), then follow that bug. If it appears to be triggered by something else, file a new bug.

If in doubt, file a new bug. It can always be marked as a duplicate later if after investigation it's revealed to be the same issue.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

tom,
    Please file a new bug regardless. The Ubuntu Kernel Team is moving to a model whereby all affected reporters open new bug so that specific data can be logged for each of them.

I want all affected reporters to open new bug reports for this issue.

Thanks!

~JFo

Revision history for this message
Sesshomurai (darren-ontrenet) wrote : Re: [Bug 276476] Re: Idle-priority scheduling bug blocks tasks

Hi,
 I think this makes sense. Even though it may result in a common fix for all.

But one thing that might help, is maybe provide a set of steps for how to
file information about this sort of bug. Some of us aren't kernel experts,
but if we can gather the information in a common way when we open a new
bug, it could help you.

Darren

> tom,
> Please file a new bug regardless. The Ubuntu Kernel Team is moving to
> a model whereby all affected reporters open new bug so that specific
> data can be logged for each of them.
>
> I want all affected reporters to open new bug reports for this issue.
>
> Thanks!
>
> ~JFo
>
> --
> Idle-priority scheduling bug blocks tasks
> https://bugs.launchpad.net/bugs/276476
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in “linux” package in Ubuntu: Fix Released
>
> Bug description:
> "INFO: task blocked for more than 120 seconds" is a diagnostic message for
> when a process that wants to run is blocked from running for a long time.
>
> This bug report refers to a specific problem with idle-priority scheduling
> that has now been fixed. If you are up to date and are still getting the
> "task blocked" message, it is a different bug, so please file a new bug
> report.
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/276476/+subscribe
>

Revision history for this message
tom (thomas-gutzler) wrote :

Since there are no additional comments in bug #494476, has anyone launched a new report?
As Darren pointed out above, it might be useful to have a template to look at when submitting related bugs. Searching for bugs where tasks are blocked for more than 120 seconds gives heaps of results but it's difficult to tell if they're related or not - some are even tagged as fixed.

Revision history for this message
Manuel Iglesias Alonso (glesialo) wrote :

Please check this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=16219

I have same problems described here and I have found that in my case it is a kernel bug related to GUID partition tables

I am still writing the bug report.

Revision history for this message
Douglas Bagnall (douglasbagnall) wrote :

For the information of this bug's followers, installing cgroup-bin caused me lock-ups and
"INFO: task * blocked for more than 120 seconds" messages.

https://bugs.launchpad.net/ubuntu/+source/libcgroup/+bug/598335

Revision history for this message
Bruce Edge (bruce-edge) wrote :
Download full text (5.4 KiB)

This is still happening in 10.04.1, both with the default Ubuntu server kernel and with the ppa kernel:

Linux topaz 2.6.35-22-server #33~lucid1-Ubuntu SMP Sat Sep 18 13:29:53 UTC 2010 x86_64 GNU/Linux

Oct 4 09:32:44 topaz kernel: [405000.890225] INFO: task scp:27169 blocked for more than 120 seconds.
Oct 4 09:32:44 topaz kernel: [405000.890231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 4 09:32:44 topaz kernel: [405000.890234] scp D 000000010267dc02 0 27169 29503 0x00000004
Oct 4 09:32:44 topaz kernel: [405000.890241] ffff88012f8b9c08 0000000000000086 ffff880200000000 0000000000015980
Oct 4 09:32:44 topaz kernel: [405000.890248] ffff88012f8b9fd8 0000000000015980 ffff88012f8b9fd8 ffff88011cdb5b80
Oct 4 09:32:44 topaz kernel: [405000.890254] 0000000000015980 0000000000015980 ffff88012f8b9fd8 0000000000015980
Oct 4 09:32:44 topaz kernel: [405000.890259] Call Trace:
Oct 4 09:32:44 topaz kernel: [405000.890272] [<ffffffff81100b60>] ? sync_page+0x0/0x50
Oct 4 09:32:44 topaz kernel: [405000.890279] [<ffffffff8159ca93>] io_schedule+0x73/0xc0
Oct 4 09:32:44 topaz kernel: [405000.890283] [<ffffffff81100b9d>] sync_page+0x3d/0x50
Oct 4 09:32:44 topaz kernel: [405000.890287] [<ffffffff8159d10f>] __wait_on_bit+0x5f/0x90
Oct 4 09:32:44 topaz kernel: [405000.890292] [<ffffffff81100d53>] wait_on_page_bit+0x73/0x80
Oct 4 09:32:44 topaz kernel: [405000.890297] [<ffffffff8107f120>] ? wake_bit_function+0x0/0x40
Oct 4 09:32:44 topaz kernel: [405000.890303] [<ffffffff8110b675>] ? pagevec_lookup_tag+0x25/0x40
Oct 4 09:32:44 topaz kernel: [405000.890307] [<ffffffff8110121d>] filemap_fdatawait_range+0x10d/0x1a0
Oct 4 09:32:44 topaz kernel: [405000.890312] [<ffffffff811012db>] filemap_fdatawait+0x2b/0x30
Oct 4 09:32:44 topaz kernel: [405000.890316] [<ffffffff811015e4>] filemap_write_and_wait+0x44/0x50
Oct 4 09:32:44 topaz kernel: [405000.890334] [<ffffffffa01abf3c>] nfs_setattr+0x14c/0x160 [nfs]
Oct 4 09:32:44 topaz kernel: [405000.890341] [<ffffffff8116bebb>] notify_change+0x16b/0x310
Oct 4 09:32:44 topaz kernel: [405000.890347] [<ffffffff8135cb9e>] ? tty_ldisc_deref+0xe/0x10
Oct 4 09:32:44 topaz kernel: [405000.890352] [<ffffffff81152364>] do_truncate+0x64/0xa0
Oct 4 09:32:44 topaz kernel: [405000.890356] [<ffffffff811525fb>] T.782+0xeb/0x110
Oct 4 09:32:44 topaz kernel: [405000.890360] [<ffffffff8115262e>] sys_ftruncate+0xe/0x10
Oct 4 09:32:44 topaz kernel: [405000.890366] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
Oct 4 09:32:44 topaz kernel: [405000.890371] INFO: task flush-0:26:27661 blocked for more than 120 seconds.
Oct 4 09:32:44 topaz kernel: [405000.890373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 4 09:32:44 topaz kernel: [405000.890376] flush-0:26 D 0000000102694b5b 0 27661 2 0x00000000
Oct 4 09:32:44 topaz kernel: [405000.890383] ffff8801324fda40 0000000000000046 0000000000000000 0000000000015980
Oct 4 09:32:44 topaz kernel: [405000.890388] ffff8801324fdfd8 0000000000015980 ffff8801324fdfd8 ffff88015ab28000
Oct 4 09:32:44 topaz kernel: [405000.890394] 0000000000015980 0000000000015980 ffff880132...

Read more...

Revision history for this message
spazy (marekr) wrote :
Download full text (5.9 KiB)

i have to confirm this problem

Kernel & distribution version
Nov 27 08:34:44 core kernel: [ 0.000000] Linux version 2.6.31-22-server (buildd@allspice) (gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9) ) #68-Ubuntu SMP Tue Oct 26 16:50:02 UTC 2010 (Ubuntu 2.6.31-22.68-serv
er)

kern.log

Nov 27 04:41:54 core kernel: [1242602.033782] INFO: task kjournald2:407 blocked for more than 120 seconds.
Nov 27 04:41:54 core kernel: [1242602.033799] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 27 04:41:54 core kernel: [1242602.033813] kjournald2 D 0000000000000000 0 407 2 0x00000000
Nov 27 04:41:54 core kernel: [1242602.033820] ffff8801205fbad0 0000000000000046 0000000000000002 0000000000015880
Nov 27 04:41:54 core kernel: [1242602.033827] ffff8801210b47c0 0000000000015880 0000000000015880 0000000000015880
Nov 27 04:41:54 core kernel: [1242602.033832] 0000000000015880 ffff8801210b47c0 0000000000015880 0000000000015880
Nov 27 04:41:54 core kernel: [1242602.033839] Call Trace:
Nov 27 04:41:54 core kernel: [1242602.033857] [<ffffffff810da1e0>] ? sync_page+0x0/0x50
Nov 27 04:41:54 core kernel: [1242602.033864] [<ffffffff81529cd8>] io_schedule+0x28/0x40
Nov 27 04:41:54 core kernel: [1242602.033868] [<ffffffff810da21d>] sync_page+0x3d/0x50
Nov 27 04:41:54 core kernel: [1242602.033872] [<ffffffff8152a1f7>] __wait_on_bit+0x57/0x80
Nov 27 04:41:54 core kernel: [1242602.033876] [<ffffffff810da38e>] wait_on_page_bit+0x6e/0x80
Nov 27 04:41:54 core kernel: [1242602.033882] [<ffffffff810785b0>] ? wake_bit_function+0x0/0x40
Nov 27 04:41:54 core kernel: [1242602.033888] [<ffffffff810e4640>] ? pagevec_lookup_tag+0x20/0x30
Nov 27 04:41:54 core kernel: [1242602.033893] [<ffffffff810da835>] wait_on_page_writeback_range+0xf5/0x190
Nov 27 04:41:54 core kernel: [1242602.033899] [<ffffffff811ee9a9>] ? jbd2_journal_file_buffer+0x59/0x80
Nov 27 04:41:54 core kernel: [1242602.033903] [<ffffffff810da8f7>] filemap_fdatawait+0x27/0x30
Nov 27 04:41:54 core kernel: [1242602.033907] [<ffffffff811ef896>] journal_finish_inode_data_buffers+0x56/0x150
Nov 27 04:41:54 core kernel: [1242602.033911] [<ffffffff811f03b6>] jbd2_journal_commit_transaction+0x6d6/0x1100
Nov 27 04:41:54 core kernel: [1242602.033917] [<ffffffff811f5e33>] kjournald2+0x103/0x270
Nov 27 04:41:54 core kernel: [1242602.033921] [<ffffffff81078570>] ? autoremove_wake_function+0x0/0x40
Nov 27 04:41:54 core kernel: [1242602.033926] [<ffffffff811f5d30>] ? kjournald2+0x0/0x270
Nov 27 04:41:54 core kernel: [1242602.033929] [<ffffffff81078186>] kthread+0xa6/0xb0
Nov 27 04:41:54 core kernel: [1242602.033935] [<ffffffff8101312a>] child_rip+0xa/0x20
Nov 27 04:41:54 core kernel: [1242602.033939] [<ffffffff810780e0>] ? kthread+0x0/0xb0
Nov 27 04:41:54 core kernel: [1242602.033942] [<ffffffff81013120>] ? child_rip+0x0/0x20
Nov 27 04:41:54 core kernel: [1242602.033960] INFO: task bash:13216 blocked for more than 120 seconds.
Nov 27 04:41:54 core kernel: [1242602.033970] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 27 04:41:54 core kernel: [1242602.034014] bash D 0000000000000000 0 13216 13215 0x00000000
Nov 27 04:41:54 core...

Read more...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.