Comment 30 for bug 245779

Dave (ubuntu-comm) wrote:

I've been fighting with this bug for 2 months now. Sometimes I get an uptime of a couple of weeks, sometimes only a couple of days. It's very odd, and it seems unaffected by kernel changes.

My setup is a dual quad-core Xeon E5405 (6 MB cache) on a Supermicro X7DCL-i motherboard with 8 GB of DDR2 ECC RAM. The hardware passed a 48-hour memtest run, so I'm fairly confident it's not a motherboard/RAM/hardware issue. The BIOS is the latest available from Supermicro (8/18/2008).

I've seen the bug at random with both the 2.6.24-18-xen and 2.6.24-19-xen kernels. The process that dies can be anything in Dom0 or a DomU and seems unrelated to the process/module actually being executed. It does seem somewhat related to the order in which processes are loaded (in my case, the order in which the domUs start up). The 2.6.24-18 kernel seems a little more stable, but this could be coincidence.

So far I've seen crashes during a simple ext2 format, in various processes in different domUs, and in various processes in dom0. The offending process tends to be somewhat sticky (which would point to a memory/hardware issue), but I ruled that out above.

The latest incarnation is a clamd process that won't live longer than a few hours without crashing:

[19240.984220] BUG: soft lockup - CPU#0 stuck for 11s! [clamd:2976]
[19240.984230]
[19240.984233] Pid: 2976, comm: clamd Not tainted (2.6.24-18-xen #1)
[19240.984237] EIP: 0061:[<c0327677>] EFLAGS: 00000286 CPU: 0
[19240.984245] EIP is at _spin_lock+0x7/0x10
[19240.984248] EAX: c1c2898c EBX: 00000000 ECX: c1c28980 EDX: 00000d88
[19240.984251] ESI: 50425067 EDI: 00000001 EBP: c0477158 ESP: e7f49ef4
[19240.984254] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[19240.984260] CR0: 8005003b CR2: b70b1000 CR3: 28400000 CR4: 00002660
[19240.984264] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[19240.984267] DR6: ffff0ff0 DR7: 00000400
[19240.984271] [<c01759d5>] mprotect_fixup+0x395/0x800
[19240.984284] [<c013bb90>] autoremove_wake_function+0x0/0x40
[19240.984293] [<c0175fcb>] sys_mprotect+0x18b/0x230
[19240.984299] [<c0105832>] syscall_call+0x7/0xb
[19240.984305] [<c0320000>] vcc_def_wakeup+0x30/0x60
[19240.984310] =======================
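To check whether the same process really keeps getting hit, the soft-lockup lines can be tallied per command name across saved logs. This is a minimal sketch, assuming the message format shown above (`BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`); the helper name `tally_soft_lockups` is hypothetical and not part of any existing tool.

```python
import re
import sys
from collections import Counter

# Matches soft-lockup reports such as:
#   [19240.984220] BUG: soft lockup - CPU#0 stuck for 11s! [clamd:2976]
# Uses search() rather than match(), so the leading timestamp is optional
# and lines with a truncated timestamp still match.
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def tally_soft_lockups(lines):
    """Count soft-lockup reports per command name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


if __name__ == "__main__":
    # Read a saved kernel log from stdin and print the most-hit commands first.
    for comm, n in tally_soft_lockups(sys.stdin).most_common():
        print(f"{n:5d}  {comm}")
```

For example, saved `dmesg` output from each domain could be piped into it. Each Xen domain runs its own kernel, so dom0 and every domU keep separate logs and would need to be collected individually.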

I'm currently running this particular domU on the 2.6.24-21-xen kernel for testing and will report back if it crashes.

Here's /proc/cpuinfo for processor 0, in case it's useful:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
stepping : 6
cpu MHz : 1999.999
cache size : 6144 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu de tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up arch_perfmon pebs bts pni monitor ds_cpl vmx tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4004.96
clflush size : 64