Comment 21 for bug 297058

Wolm (torben-wolm) wrote:

So, it just happened again: my server crashed. This time I am sure it
has nothing to do with the USB drive, since that is no longer attached.

It seems to be an unfortunate combination of a (kernel?) problem and
heavy disk use.

Suddenly, these messages appear in the log:

Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow to respond, please be patient (ready=0)
Oct 23 00:56:13 matrix kernel: [14573764.242683] ata1: device not ready (errno=-16), forcing hardreset
Oct 23 00:56:13 matrix kernel: [14573764.242721] ata1: soft resetting link
Oct 23 00:56:13 matrix kernel: [14573765.081129] ata1.00: configured for UDMA/133
Oct 23 00:56:13 matrix kernel: [14573765.081188] ata1: EH complete
Oct 23 00:56:13 matrix kernel: [14573765.082422] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 23 00:56:13 matrix kernel: [14573765.126583] sd 0:0:0:0: [sda] Write Protect is off
Oct 23 00:56:53 matrix kernel: [14573765.127506] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

These repeat until about 01:19. Then the log goes quiet until a final entry at
07:54, when the server finally crashes (it simply stops responding to network requests, the keyboard, etc.).

I then checked kern.log, which contains many entries like these:

Oct 23 00:54:12 matrix kernel: [14573754.220270] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 23 00:56:13 matrix kernel: [14573754.220348] ata1.00: cmd ca/00:50:14:9f:8d/00:00:00:00:00/e1 tag 0 dma 40960 out
Oct 23 00:56:13 matrix kernel: [14573754.220352] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 23 00:56:13 matrix kernel: [14573754.220465] ata1.00: status: { DRDY }
Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow to respond, please be patient (ready=0)
Oct 23 00:56:13 matrix kernel: [14573764.242683] ata1: device not ready (errno=-16), forcing hardreset
Oct 23 00:56:13 matrix kernel: [14573764.242721] ata1: soft resetting link
Oct 23 00:56:13 matrix kernel: [14573765.081129] ata1.00: configured for UDMA/133
Oct 23 00:56:13 matrix kernel: [14573765.081188] ata1: EH complete
Oct 23 00:56:13 matrix kernel: [14573765.082422] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 23 00:56:13 matrix kernel: [14573765.126583] sd 0:0:0:0: [sda] Write Protect is off
Oct 23 00:56:13 matrix kernel: [14573765.126598] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 23 00:56:53 matrix kernel: [14573765.127506] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This adds some more information about an exception: a command timeout?
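To see how often these events occur, one could count the libata exception lines per device in kern.log. The following is a minimal Python sketch, assuming the line format shown in the excerpt above; the function name count_ata_exceptions and the script name count_ata.py are hypothetical, not part of any existing tool.

```python
import re
import sys
from collections import Counter

# Matches libata exception lines such as (format assumed from the excerpt above):
# "ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen"
EXCEPTION_RE = re.compile(r"\b(ata\d+(?:\.\d+)?): exception Emask (0x[0-9a-f]+)")


def count_ata_exceptions(lines):
    """Count libata exception events per device (e.g. 'ata1.00')."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 count_ata.py /var/log/kern.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for device, n in count_ata_exceptions(log).most_common():
                print(f"{device}: {n} exception(s)")
```

The log path may differ between systems, and rotated logs (kern.log.1 and so on) would have to be passed separately.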

Searching for these entries turns up many people reporting the same problem, and possibly a solution: http://ubuntuforums.org/showthread.php?t=1145513
(The author of that post wonders why there haven't been more reports of this issue...)

Also:
https://bugzilla.redhat.com/show_bug.cgi?id=462425
https://bugzilla.redhat.com/show_bug.cgi?id=404851
http://lkml.org/lkml/2008/11/9/22
http://forums.fedoraforum.org/showthread.php?t=219746

I'm running kernel 2.6.27-11-server. Someone suggested running kernel-rt instead:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/279693 (comment #23)

I haven't tried that yet. I will check whether kernel 2.6.27-14 is available, or otherwise try the -rt
suggestion.

It seems possible to crash the system simply by running "ls -lR /". Not what I expect from a Linux system...

Kind regards
Torben