[LUCID-NATTY] ata errors { DRDY ERR } { ABRT }

Bug #591532 reported by Montblanc
This bug affects 10 people
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: (none)
Nominated for Lucid by Montblanc

Bug Description

I finally decided to do a fresh install of Kubuntu 10.04 LTS, so I booted it from the LiveCD (no errors found) with the default kernel parameters. Everything works: KDE starts up and I can mount my disks. But before installing, I wanted to check dmesg just to be sure, and there I found some errors like these:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x08000000
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:01:66:f4/00:00:1b:00:00/e0 tag 0 dma 4096 in
ata1.00: status: { DRDY ERR }
ata1.00: error: { ABRT }
ata1.00: configured for UDMA/100
ata1: EH complete

which did NOT show up on my Karmic system (I'm using kernel 2.6.31-22-generic-pae as I type). I'm also affected by bug #228302, which dates back to Feisty and which I have worked around since Intrepid with the "pata_ali.atapi_dma=1" parameter, so I thought this was some kind of libata regression. I googled around a bit and tried booting the LiveCD with various parameters such as "noapic acpi=off pata_ali.atapi_dma=1", but I still get the errors no matter what. During the LiveCD session I noticed nothing unusual, but I'm afraid that after installation my system could freeze due to lost interrupts, leading to data loss. So I'm still on Karmic for now, and I'd really like to know what's behind these errors and whether there's any fix.

I'm attaching the dmesg output from my Karmic installation, as well as the relevant outputs from Lucid.
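
For readers trying to interpret the `cmd` line in the log quoted above, here is a minimal sketch of how it might be decoded. It assumes the field order used by the libata error handler (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte sectors. The name `decode_taskfile` and the small opcode table are illustrative, not part of any existing tool.

```python
import re

# Field order assumed from the libata error-handler log format:
# command/feature:nsect:lbal:lbam:lbah/
# hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")

# Only the two opcodes named in this report; not a complete table.
OPCODES = {0x25: "READ DMA EXT", 0xC8: "READ DMA"}


def decode_taskfile(line, sector_size=512):
    """Decode the 'cmd xx/xx:...' part of a libata error line (LBA48 layout)."""
    match = re.search(r"cmd\s+([0-9a-fA-F/:]+)", line)
    if match is None:
        raise ValueError("no 'cmd' taskfile found in line")
    values = [int(tok, 16) for tok in re.split(r"[/:]", match.group(1))]
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of taskfile fields")
    tf = dict(zip(FIELDS, values))
    # 48-bit LBA: the 'hob' (high order byte) registers hold bits 24-47.
    lba = (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
           | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])
    sectors = tf["hob_nsect"] << 8 | tf["nsect"]
    return {
        "opcode": OPCODES.get(tf["command"], hex(tf["command"])),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


print(decode_taskfile(
    "ata1.00: cmd 25/00:08:01:66:f4/00:00:1b:00:00/e0 tag 0 dma 4096 in"))
```

For the line quoted in the description, this yields READ DMA EXT for 8 sectors starting at LBA 469001729, i.e. 4096 bytes, which agrees with the `dma 4096 in` at the end of the same line.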

Montblanc (montblanc) wrote :

It's easily reproducible on my hardware: just boot a Lucid LiveCD and look at the dmesg output, so I have set this bug's status to Confirmed. One more thing: the S.M.A.R.T. status is OK, and I get these errors only on Lucid!
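
When comparing dmesg output between kernels or drives, it can help to tally the failures per device. The following is a rough sketch that counts `failed command` lines in dmesg text. The function name and the idea of feeding it saved dmesg output are assumptions; the sample text is the log quoted in the bug description.

```python
import re
from collections import Counter

# Matches lines such as "ata1.00: failed command: READ DMA EXT",
# with or without a leading dmesg timestamp.
FAILED_RE = re.compile(r"(ata\d+\.\d+): failed command: (.+?)\s*$")


def count_failed_commands(dmesg_text):
    """Return a Counter keyed by (device, command) for libata failure lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# The log excerpt from the bug description.
SAMPLE = """\
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x08000000
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:01:66:f4/00:00:1b:00:00/e0 tag 0 dma 4096 in
ata1.00: status: { DRDY ERR }
ata1.00: error: { ABRT }
ata1.00: configured for UDMA/100
ata1: EH complete
"""

for (device, command), n in sorted(count_failed_commands(SAMPLE).items()):
    print(f"{device}  {command}: {n}")
```

Running the same function over dmesg output saved from two different kernels would show whether the failures are confined to one port or one command type.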

Changed in linux (Ubuntu):
status: New → Confirmed
assignee: nobody → Ubuntu Kernel Team (ubuntu-kernel-team)
Changed in linux (Ubuntu):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → nobody
Montblanc (montblanc)
Changed in linux (Ubuntu):
status: Confirmed → New
Jeremy Foshee (jeremyfoshee) wrote :

removed deprecated team assignment.

~JFo

Jeremy Foshee (jeremyfoshee) wrote :

Montblanc,
    When you say SMART status is ok, you mean that you have it disabled, yes? We have known issues with SMART being enabled.

Thanks!

~JFo

Changed in linux (Ubuntu):
status: New → Incomplete
Montblanc (montblanc) wrote :

Jeremy,
SMART is enabled in the BIOS. I ran `smartctl -H /dev/sdX` and it PASSED. Also, when I switch back to Karmic with the 2.6.31 kernel, I don't get those messages! Should I try disabling SMART in the BIOS?

Montblanc (montblanc) wrote :

I disabled SMART monitoring for both my drives in the BIOS, but I get the same error messages. I also tried switching from emulated PATA mode to AHCI mode, but nothing changed. What puzzles me is the line
failed command: READ DMA EXT
yet some users on IRC say they get similar errors and notice nothing unusual.
Could it be that the Lucid kernel is just showing me something that previous kernels did not report?
I would just like to know whether these error messages are harmless, so I can go ahead and install Lucid.
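
As background on what the `{ DRDY ERR }` and `{ ABRT }` flags stand for: they are names of bits in the ATA status and error registers. ERR means the drive reported that the command failed, and ABRT means the drive aborted the command, as opposed to UNC (uncorrectable data) or ICRC (interface CRC error). The sketch below maps register bytes to those names using the standard ATA bit positions. The register values 0x51 and 0x04 are hypothetical, chosen only to reproduce the flags seen in this report, and the set of decoded flags is an assumption about what the kernel prints.

```python
# Bit masks from the ATA status register.
BUSY = 0x80
STATUS_BITS = (("DRDY", 0x40), ("DF", 0x20), ("DRQ", 0x08), ("ERR", 0x01))

# Bit masks from the ATA error register.
ERROR_BITS = (("ICRC", 0x80), ("UNC", 0x40), ("IDNF", 0x10), ("ABRT", 0x04))


def decode_status(status):
    """Return the flag names set in a status register byte."""
    if status & BUSY:
        # While the drive is busy the other status bits are not meaningful.
        return ["Busy"]
    return [name for name, mask in STATUS_BITS if status & mask]


def decode_error(error):
    """Return the flag names set in an error register byte."""
    return [name for name, mask in ERROR_BITS if error & mask]


# Hypothetical register values; bit 0x10 of the status byte is not decoded here.
print(decode_status(0x51), decode_error(0x04))
```

With these example values the output is the same pair of flag sets as in the report: DRDY and ERR for the status, ABRT for the error.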

Changed in linux (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → Medium
tags: added: kernel-needs-review kernel-uncat
tags: added: kernel-reviewed
removed: kernel-needs-review
tags: added: kernel-core
removed: kernel-uncat
Jan Eringa (j-eringa) wrote :

I'm seeing this as well.
Initially I thought it was a faulty SATA card & so replaced it.
But the new card is doing exactly the same thing.

zac.hanson.thurn (zac-jessandzac) wrote :

After upgrading my Acer Aspire One netbook to 10.04 I started seeing this issue also. I also started having disk corruption issues (I am unsure whether these things may be related).

Richard Dawe (richdawe) wrote :

I installed a backported 2.6.35 kernel from Maverick on Lucid, and that seems to have resolved these errors for me:

 rich@theroux:~$ uname -a
 Linux theroux 2.6.35-19-generic #25~lucid1-Ubuntu SMP Wed Aug 25 04:24:28 UTC 2010 i686 GNU/Linux

I found the backport details at <http://www.ubuntuupdates.org/ppa/kernel-ppa?dist=lucid>

I ran these commands, to install the backport:

 sudo add-apt-repository ppa:kernel-ppa/ppa
 sudo aptitude update
 sudo aptitude install linux-image-generic-lts-backport-maverick linux-headers-generic-lts-backport-maverick

The graphical splash screen doesn't seem to work too well on my netbook with 2.6.35, so I disabled it in the GRUB2 config -- see <http://richdawe.livejournal.com/7368.html>.
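
To confirm after a reboot that the backported kernel is actually the one running, the release string reported by `uname -r` can be compared against 2.6.35. A minimal sketch follows; the function names are illustrative, and the 2.6.35 threshold is simply the version reported to help in this comment, not a confirmed fix boundary.

```python
import re


def kernel_version(release):
    """Turn a release string such as '2.6.35-19-generic' into (2, 6, 35)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(release, minimum=(2, 6, 35)):
    """True if the release is the given minimum version or newer."""
    return kernel_version(release) >= minimum


# Release strings taken from this report. On a live system the string
# could come from platform.release() or `uname -r`.
for release in ("2.6.31-22-generic-pae", "2.6.35-19-generic"):
    print(release, is_at_least(release))
```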

Jeff (jesterr) wrote :

Confirmed, issue was resolved for me too by installing a later kernel:

$ uname -a
Linux serv 2.6.35-20-generic #29~lucid1-Ubuntu SMP Tue Sep 7 13:28:30 UTC 2010 i686 GNU/Linux

Montblanc (montblanc) wrote :

I'm running 2.6.35-20-generic-pae on Lucid too, but I keep getting these errors. That said, I've never noticed any difference in behavior; I guess the kernel is just showing me something I always had: DMA issues with my Uli chipset. I can't blame Ubuntu for this, since my system has been running normally for months since upgrading to Linux 2.6.32.
Since it was solved for everyone but me, I would mark this bug as Fix Released, but I'd better wait for a developer to take care of it.

DJ_DEF (dj-def) wrote :

I have this error too. It's not the first time: I noticed a similar one some months ago, but that was due to a badly connected SATA cable. Now, with Ubuntu 10.10, the cable is properly connected and I still see this error often.

Gerry Reno (greno-verizon) wrote :

I've got this same DRDY ERR occurring on one of my machines, and once it started it continues to show this error.

I did a drive test on the drives and no errors were found. So this is definitely something with the kernel.

jonie (jonie) wrote :

Even though a READ_DMA_EXT error would suggest an unreadable sector, I observed this on a drive that has a number of reallocated sectors (just slow to read) and no pending or uncorrectable ones. This portion of the disk passes the most thorough ReadWriteReadCompare test in HDAT2, but if a read of such a sector occurs under one of the newer kernels, the drive immediately spins down, a READ_DMA_EXT error is written to the SMART log, and the drive is not available in the BIOS until the next cold start.

Montblanc (montblanc) wrote :

Can someone kindly nominate this bug for Maverick and Natty?

summary: - ata errors { DRDY ERR } { ABRT } in Lucid
+ [LUCID-NATTY] ata errors { DRDY ERR } { ABRT }
Montblanc (montblanc) wrote :

I'm sorry, I've just read jonie's reply.

@jonie So, according to you, it's just a matter of read time on reallocated sectors? I don't know how to look for them; could you point me in the right direction, please? It makes sense, though, since the system is definitely usable.

One more thing I noticed: after buying another SATA drive, I now get either `failed command READ DMA` or `failed command READ DMA EXT` (might be unrelated), but only on ata4.00 (one of the old drives). The new drive and the other old one show no errors, so I suspect a power supply issue (too many drives attached to one cable?).
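
On the question of how to look for reallocated sectors: the SMART attribute table printed by `smartctl -A /dev/sdX` normally lists them as attribute ID 5, with pending and offline-uncorrectable sectors as IDs 197 and 198. The sketch below pulls those raw values out of the table text. The sample output is made up purely for illustration (it is not from any drive in this report), and the helper name and result keys are assumptions. Some drives report these attributes under different names or omit them, so treat this as a starting point.

```python
# Attribute IDs commonly used for the sector counters discussed above.
WATCHED = {5: "reallocated", 197: "pending", 198: "offline_uncorrectable"}


def sector_counters(smartctl_output):
    """Extract raw values of the watched attributes from `smartctl -A` text."""
    found = {}
    for line in smartctl_output.splitlines():
        cols = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(cols) >= 10 and cols[0].isdigit() and int(cols[0]) in WATCHED:
            found[WATCHED[int(cols[0])]] = int(cols[9])
    return found


# Hypothetical sample in the usual smartctl table layout (values invented).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(sector_counters(SAMPLE))
```

A non-zero reallocated count with zero pending and uncorrectable sectors would match the situation jonie describes.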

Mike Homer (homerhomer) wrote :

I had a similar issue and this seems to have fixed it (fingers crossed):

https://ata.wiki.kernel.org/index.php/Pata_sch

Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Triaged → Won't Fix