Comment 60 for bug 34902

ed_p (edpizzi) wrote : Re: Ralink Wireless USB/PCMCIA/PCI hangs PC

Here is some additional information:

Working from a text console, I found that the "freeze" is a kernel panic. We hit a BUG in bottom-half (bh) context at kernel/timer.c, line 411. That line is in cascade, an inline function called from __run_timers, which is itself inlined into run_timer_softirq, which runs in bh context.

Here's the trace I was working from. The Code: bytes indicate that the BUG is at line 411 of some file:

Code: ... <0f> 0b 9b 01 ee 03 30 c0 ...

Because the machine panics, I couldn't conclusively determine the file from the panic output, but given the stack trace I think it's clearly kernel/timer.c:

 [<c01259a2>] run_timer_softirq+0x132/0x1d0
 [<c01214ff>] __do_softirq+0x4f/0xb0
 [<c0121595>] do_softirq+0x35/0x40
 [<c0121645>] irq_exit+0x35/0x40
 [<c01059cf>] do_IRQ+0x1f/0x30
 [<c010410a>] common_interrupt+0x1a/0x20
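
For anyone who wants to check the "line 411" reading, here is a minimal sketch of how it falls out of the Code: bytes quoted earlier. It assumes the i386 BUG() layout used by kernels of this era when verbose BUG reporting is enabled: a ud2 instruction (0f 0b), then a 16-bit little-endian line number, then a 32-bit pointer to the file name string. The function name decode_bug_record is made up for illustration.

```python
import struct

# Bytes copied from the "Code:" line of the oops, starting at the marked <0f>.
code = bytes.fromhex("0f 0b 9b 01 ee 03 30 c0")


def decode_bug_record(code):
    """Decode an assumed i386 BUG() site: ud2, 16-bit line, 32-bit file pointer."""
    if code[:2] != b"\x0f\x0b":
        raise ValueError("does not start with ud2 (0f 0b)")
    # "<HI" = little-endian, unsigned 16-bit line, unsigned 32-bit pointer.
    line, file_ptr = struct.unpack_from("<HI", code, 2)
    return line, file_ptr


line, file_ptr = decode_bug_record(code)
print(f"line {line}, file name string at {file_ptr:#010x}")
```

The bytes 9b 01 read as 0x019b, which is 411. The pointer only tells us where the file name string lives in kernel memory; it does not name the file, which is why the stack trace is still needed to identify it.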

I'm now testing ways to fix this. Since this is a double-fault, I'm curious whether we could just disable BUG()s in the kernel (by recompiling). I don't know whether the system would recover: the kernel realizes something is wrong, but we panic because the BUG is reported in bh context, not necessarily because the problem is unrecoverable.

I just did a build with big-kernel-lock pre-emption (CONFIG_PREEMPT_BKL) turned off, and I didn't see any trouble, so this may be a pre-emption issue. However, there is a pretty big configuration delta between the kernel that had the problem and the one that didn't, so I'm still trying to figure out what fixed it.
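
One way to narrow down a configuration delta like this is to list only the options that differ between the two kernel .config files. The sketch below is just an illustration of that idea: the helper names are hypothetical, and the two config fragments are invented sample data, not the actual configurations of the kernels tested here.

```python
def parse_config(text):
    """Map each CONFIG_ option to its value; 'n' for '# CONFIG_X is not set'."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(bad_text, good_text):
    """Return {option: (value_in_bad, value_in_good)} for options that differ."""
    bad, good = parse_config(bad_text), parse_config(good_text)
    return {
        name: (bad.get(name), good.get(name))
        for name in sorted(set(bad) | set(good))
        if bad.get(name) != good.get(name)
    }


# Invented sample fragments (assumption): real input would be the two .config files.
bad_config = """\
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_HZ=250
"""
good_config = """\
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set
CONFIG_HZ=250
"""

delta = diff_configs(bad_config, good_config)
for name, (bad_value, good_value) in delta.items():
    print(f"{name}: {bad_value} -> {good_value}")
```

With the real files, each differing option could then be flipped back one at a time (or by halves) to see which change makes the panic return.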

Still looking...