Comment 34 for bug 297058

chuck-dtol (chuck-colford) wrote :

My apologies in advance if I'm violating any protocols. I'm a bit of a newbie on Linux, but a long-time geek. I may have some info that is useful; if not, feel free to disregard it.

I've been running Ubuntu on my server since version 6. Last fall I switched all three of my desktop systems to Ubuntu and upgraded to 9.10. The key for me was success with VirtualBox, so I could migrate with a few of my old Windows apps intact. All worked well. About a week before Christmas (2009), my root ext3 partition became corrupted. It was a real pain, but I had full backups and did a restore. I became paranoid and learned how to read and watch the logs. My faith in Linux was somewhat shaken. It didn't seem to be a hard drive failure.

Alas, I had another corruption a week ago, and this time my logs showed the dreaded pattern discussed above. My sample (edited for brevity):

kernel: warning: `VirtualBox' uses 32-bit capabilities (legacy support in use)
kernel: device eth0 entered promiscuous mode
kernel: ata1: EH in SWNCQ mode,QC:qc_active 0xFFFC0 sactive 0xFFFC0
kernel: ata1: SWNCQ:qc_active 0x40 defer_bits 0xFFF80 last_issue_tag 0x6
kernel: dhfis 0x40 dmafis 0x0 sdbfis 0x0
kernel: ata1: ATA_REG 0x51 ERR_REG 0x4
kernel: ata1: tag : dhfis dmafis sdbfis sacitve
kernel: ata1: tag 0x6: 1 0 0 1
kernel: ata1.00: exception Emask 0x1 SAct 0xfffc0 SErr 0x0 action 0x6 frozen
kernel: ata1.00: Ata error. fis:0x41
kernel: ata1.00: cmd 61/08:30:5f:4e:77/00:00:1a:00:00/40 tag 6 ncq 4096 out
kernel: res 51/04:08:5f:4e:77/04:00:1a:00:00/40 Emask 0x1 (device error)
kernel: ata1.00: status: { DRDY ERR }
kernel: ata1.00: error: { ABRT }
...
kernel: ata1.00: status: { DRDY }
kernel: ata1: hard resetting link
kernel: ata1: nv: skipping hardreset on occupied port
kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: ata1.00: configured for UDMA/133
kernel: ata1: EH complete

I have been experimenting to get to the bottom of this so I can trust my filesystem. I have tried 3 different SATA drives, different SATA cables, a different PSU and even a different plug-in SATA controller. I still see these SATA link errors on my dual-core AMD 64-bit system.

What I have found (on my system) is a strong correlation between these SATA errors and the use of VirtualBox (I'm using VirtualBox PUEL v3.1.2). My logs are quiet for many days, until I start doing moderate I/O in my Win32 XP guest OS. Then I see SATA errors on my Linux (dual-core AMD 64-bit) host. I've seen this on kernels from 2.6.31-14 up to 2.6.31-17-generic x86_64.
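
For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch that pulls out the SATA error lines so their timing can be compared against when a VirtualBox guest was busy. The patterns are copied from my excerpt above and are certainly not exhaustive. The function names and the `/var/log/kern.log` path in the usage note are my own assumptions and may differ on your system.

```python
import re

# Patterns taken from the log excerpt above; illustrative only, not a
# complete list of libata error messages.
ATA_ERROR_PATTERNS = [
    re.compile(r"ata\d+(\.\d+)?: exception Emask"),
    re.compile(r"ata\d+: hard resetting link"),
    re.compile(r"ata\d+: EH in SWNCQ mode"),
]


def find_ata_errors(lines):
    """Return the log lines that match any of the SATA error patterns."""
    hits = []
    for line in lines:
        if any(p.search(line) for p in ATA_ERROR_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits


def scan_log(path):
    """Scan a kernel log file (path is supplied by the caller)."""
    with open(path, errors="replace") as fh:
        return find_ata_errors(fh)
```

Calling `scan_log("/var/log/kern.log")` (assuming that is where your kernel messages end up) returns the matching lines, timestamps included, which you can then line up against the times you were doing file I/O in the guest.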

By any chance, are those of you with this problem running VirtualBox, and do you notice that this host SATA problem occurs or gets worse when you are doing file I/O in the guest? If so, you are not alone. Again, sorry if this info is unhelpful.