Sata disk not identified during install (Ati sb600)

Bug #75055 reported by Bordiga Giacomo
Affects                        Status        Importance  Assigned to  Milestone
linux-source-2.6.17 (Ubuntu)   Won't Fix     Undecided   Unassigned
linux-source-2.6.20 (Ubuntu)   Fix Released  Undecided   Ben Collins

Bug Description

Binary package hint: linux-image-2.6.17-10-generic

Sata disk not identified during install boot.

Ubuntu Edgy x86-64
Motherboard: Asus M2R32-MVP (Latest bios)
  Sata controller: Ati sb600

The kernel sees that the disk is plugged but fails to identify it.
I tried every bios sata setup: Native IDE, Legacy IDE, AHCI.
I also tried the Feisty beta, but I get a kernel panic (MCFG area is not e820-reserved). I got past the panic with acpi=off, but the disk is still not detected.

I also tried openSUSE 10.2 (kernel 2.6.18), and the disk is not found there either.

Arch Linux 0.7.2, by contrast, works (kernel 2.6.17, I think) and finds the disk.

Bordiga Giacomo (gbordiga) wrote :

The kernel output is the same as this:

[17179583.244000] ata1: SATA max UDMA/133 cmd 0xF883E100 ctl 0x0 bmdma 0x0 irq 209
[17179583.244000] ata2: SATA max UDMA/133 cmd 0xF883E180 ctl 0x0 bmdma 0x0 irq 209
[17179583.244000] ata3: SATA max UDMA/133 cmd 0xF883E200 ctl 0x0 bmdma 0x0 irq 209
[17179583.244000] ata4: SATA max UDMA/133 cmd 0xF883E280 ctl 0x0 bmdma 0x0 irq 209
[17179583.628000] ata1: SATA link up 3.0 Gbps (SStatus 113)
[17179583.628000] unexpected IRQ trap at vector a8
[17179613.628000] ata1: qc timeout (cmd 0xec)
[17179613.628000] ata1: dev 0 failed to IDENTIFY (I/O error)

There is a thread on the Ubuntu forum on the same problem.
Someone suggested the kernel parameter acpi=force irqpoll.

However the solution is not good. I get a lot of:
...
[17179583.628000] unexpected IRQ trap at vector a8
[17179583.628000] unexpected IRQ trap at vector a8
[17179583.628000] unexpected IRQ trap at vector a8
...
and something like: the drive seems confused

I just looked at the Arch Linux kernel config file and found that they have MSI disabled. I started Ubuntu with pci=nomsi, and the problem disappeared.
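
For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that picks out the two symptoms: the failed IDENTIFY and the unexpected IRQ traps. The function name `summarize_sata_log` and the two patterns are my own, written against the message wording quoted in this report; other kernel versions may word these messages differently.

```python
import re

# Patterns match the message wording quoted in this bug report only.
IDENTIFY_RE = re.compile(r"(ata\d+): dev (\d+) failed to IDENTIFY")
IRQ_TRAP_RE = re.compile(r"unexpected IRQ trap at vector ([0-9a-f]+)")


def summarize_sata_log(log_text):
    """Return (list of (port, device) that failed IDENTIFY, IRQ trap count)."""
    failed = []
    traps = 0
    for line in log_text.splitlines():
        match = IDENTIFY_RE.search(line)
        if match:
            failed.append((match.group(1), int(match.group(2))))
        if IRQ_TRAP_RE.search(line):
            traps += 1
    return failed, traps


# Lines copied from the kernel output in this report.
LOG = """\
[17179583.628000] ata1: SATA link up 3.0 Gbps (SStatus 113)
[17179583.628000] unexpected IRQ trap at vector a8
[17179613.628000] ata1: qc timeout (cmd 0xec)
[17179613.628000] ata1: dev 0 failed to IDENTIFY (I/O error)
"""

if __name__ == "__main__":
    print(summarize_sata_log(LOG))
```

Feeding it the saved output of dmesg gives a quick way to compare a boot with and without pci=nomsi.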

Bordiga Giacomo (gbordiga) wrote :

I just upgraded to feisty (2.6.20). I'm getting the same problem. Added 2.6.20 as affected.

Ben Collins (ben-collins) wrote :

This seriously sounds like a BIOS bug. Try pci=noacpi or acpi=noirq.

Changed in linux-source-2.6.20:
assignee: nobody → ben-collins
status: Unconfirmed → Needs Info

Bordiga Giacomo (gbordiga) wrote :

With the 2.6.20 kernel I'm suffering from another (probably unrelated) bug, https://launchpad.net/bugs/86169, that causes a kernel panic. Currently I'm booting with acpi=off (which resolves the panic) and pci=nomsi (to get the disk recognized). I tried both pci=noacpi and acpi=noirq, with and without acpi=off. If I omit acpi=off, I get a kernel panic with both options. If I use acpi=off with either option, the disk is not recognized, as described above.

Linux studio-ubuntu 2.6.20-8-generic #2 SMP Tue Feb 13 01:14:41 UTC 2007 x86_64 GNU/Linux

Attached are some files that are probably useful.

Bordiga Giacomo (gbordiga) wrote :

I already contacted Asus for support on this problem. Their response was that my motherboard is not Linux certified, so they can't guarantee that a BIOS update will resolve my problem.

My current bios version is v0804 (2007/01/26).

Kyle McMartin (kyle) wrote :

Try booting with pci=nommconf to deal with the MMCONFIG problem. Between that and pci=nomsi, you should be set for booting with acpi on.

Cheers,
 Kyle

Bryce Harrington (bryce) wrote :

I've run into a somewhat similar bug with herd-5. I was able to boot the Live CD and install with only an IDE drive attached, but when I hooked up a secondary SATA drive, the Live CD would panic during boot, and the already installed system would have various IRQ-related problems, boot up very slowly, and fail to recognize both the SATA drive and the network card.

Using the irqpoll option had no effect.

The fix for me was to turn *on* RAID in the BIOS. No idea why that fixed it, but all the problems disappeared.

    product: Dimension XPS Gen 2
    vendor: Dell Computer Corporation
    BIOS:
         version: A02 (10/20/2003)
          capabilities: pci pnp apm upgrade shadowing escd cdboot bootselect edd int13floppytoshiba int5printscreen int9keyboard int14serial int17printer acpi usb agp ls120boot biosbootspecification netboot

         *-storage
             description: RAID bus controller
             product: 82801ER (ICH5R) SATA Controller
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@00:1f.2
             logical name: scsi0
             logical name: scsi1
             version: 02
             width: 32 bits
             clock: 66MHz
             capabilities: storage bus_master emulated scsi-host
             configuration: driver=ata_piix latency=0
             resources: ioport:fe00-fe07 ioport:fe10-fe13 ioport:fe20-fe27 ioport:fe30-fe33 ioport:fea0-feaf irq:18
           *-disk
                description: SCSI Disk
                product: Maxtor 6Y160M0
                vendor: ATA
                physical id: 0.0.0
                bus info: scsi@0:0.0.0
                logical name: /dev/sda
                version: YAR5
                serial: Y48ZFXGE
                size: 152GB
                capabilities: partitioned partitioned:dos
                configuration: ansiversion=5

Bordiga Giacomo (gbordiga) wrote :

This is my situation before 2.6.20-11.

With pci=nomsi and acpi=off no problems

Without pci=nomsi the disk is not recognised

Without acpi=off kernel panic.

pci=nommconf has no benefit

After the upgrade to 2.6.20-11 I can boot with no problems without the pci=nomsi option (the kernel panic without acpi=off still remains).

It looks like the bug got fixed.

Changed in linux-source-2.6.20:
status: Needs Info → Fix Released

Gord (alyceh) wrote :

I am not an Ubuntu user, rather a Debian (unstable) user. However, since Debian has been of absolutely no use in fixing this problem and Ubuntu has, I might as well comment here.

I too have a M2R32-MVP motherboard. Installing Linux was largely a combination of the latest Debian netinst CD and Kanotix-2006-preview. However, any attempt to actually boot off the new installation was frustrated. Most combinations of boot options resulted in the boot process hanging just after all the SCSI disk partitions were listed. A couple of combinations of boot options seem to result in a successful boot, but because of a plethora of warnings about some unbound interrupt, any text console was not useful.

As I understand the hardware, we should be running with the BIOS set to AHCI (not legacy, native or anything else). Some have mentioned a BIOS setting about capturing Int 19. I don't believe that this setting will noticeably affect things (but I could be wrong). It should probably be enabled, so that the various BIOSes in a computer can boot properly. ACPI should probably be set to 2.0: 1.0 doesn't appear to provide enough support (I have a flaky 21 inch CAD monitor that is sensitive to this). I haven't actually run across any documentation on what the different versions of ACPI are supposed to do, or what they actually provide.

In the patch notes for 2.6.17 (kernel.org?), there is a note about PCI to the effect: fix issues with extended conf space when MMCONFIG disabled because of e820. It is my opinion that either this patch is buggy, or it doesn't go far enough. This MMCONFIG/e820 business was the first thing I noticed in trying to boot either 2.6.18 or 2.6.20 kernels.

This particular motherboard has 2 SATA controllers, the listed ATI SB600 attached to this bug report, and a JMicron 360 for the external SATA connector.

In terms of this unbound interrupt making the text consoles unusable, what will stop this from happening is to set loglevel=4. (loglevel=5 isn't sufficient.) This doesn't fix the problem, it just makes the console usable. As long as this problem persists, any dmesg output is likely unusable as it is just full of this unbound interrupt.

I gather that if you see a message at boot about MMCONFIG not being e820-reserved, you need to set pci=nommconf as a boot option. To actually get the system to boot, in a crippled manner, you also seem to need nomsi (so pci=nommconf,nomsi). This isn't a good solution, as MSI seems to be needed for PCI Express cards (or at least some of them) to work properly. The better near-term solution is to track down which MSI (and MSI-X) devices actually work properly on this motherboard (which I am starting to do), and enable MSI only on those devices with another boot option (device_msi=a,b,c, where a, b, c are hex numbers combining the 16-bit device ID and vendor ID).
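
Since the pci= options can be given comma-separated (pci=nommconf,nomsi) or as separate pci= tokens, it is easy to lose track of what a given boot actually used. Below is a minimal Python sketch that collects them from a kernel command line string. The helper name `pci_options` is my own; on a running Linux system the string would normally come from /proc/cmdline.

```python
def pci_options(cmdline):
    """Collect every option passed through pci=... on a kernel command line."""
    options = set()
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= accepts a comma-separated list, e.g. pci=nommconf,nomsi
            options.update(o for o in token[len("pci="):].split(",") if o)
    return options


if __name__ == "__main__":
    # Example command line built from the options discussed in this thread.
    example = "root=/dev/sda1 ro pci=nommconf,nomsi loglevel=4"
    found = pci_options(example)
    print(sorted(found))
    print("MSI disabled:", "nomsi" in found)
    print("MMCONFIG disabled:", "nommconf" in found)
```

To check the running system, pass it the contents of /proc/cmdline instead of the example string.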

A common boot option in discussions on this problem is irqpoll. I gather irqpoll is the bigger hammer with respect to broken hardware compared to irqfixup. I haven't seen any evidence that it actually helps in this problem, nor that it hurts if set. I am not sure of any performance implications of either of the...

Gord (alyceh) wrote :

Back for more commentary ....

I can't imagine any ordinary Linux user actually reading that MSI-HOWTO. It has little relevance to fixing boot/installation problems, and it doesn't appear to be current. Pity.

If you run 'lspci -vvx | less' (piping the output of lspci -vvx into the less program) and search for "Message Signalled Interrupts", you will find all the PCI devices on your system that should support MSI/MSI-X. Unfortunately, the stanzas of output are long enough that you need to page up to find the PCI device address (of the form NN:NN.N), so you can't use more or pg for paging.

On the Asus M2R32-MVP, the devices are: 00:02.0, 00:12.0, 00:13.[012345], 00:14.1, 00:14.2. On my machine, I have an additional device (video at 01:00.0).
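
The paging back and forth in less can be avoided with a small script. This is only a sketch in Python: it walks saved lspci -vv style text and remembers the most recent device address whenever it sees an MSI capability line. The function name `msi_capable_slots` is mine, the sample text is hand-written for illustration rather than captured from this board, and the extra "MSI:" / "MSI-X:" patterns are an assumption to cover lspci versions that abbreviate the capability name.

```python
import re

# A stanza starts with an address such as "00:12.0" (optionally "0000:00:12.0").
SLOT_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s", re.I)
# Long form as quoted in this thread, plus abbreviated forms (assumed).
MSI_RE = re.compile(r"Message Signalled Interrupts|\bMSI(?:-X)?:")


def msi_capable_slots(lspci_text):
    """Return device addresses whose stanza mentions an MSI capability."""
    slots = []
    current = None
    for line in lspci_text.splitlines():
        match = SLOT_RE.match(line)
        if match:
            current = match.group(1)
        elif current and MSI_RE.search(line) and current not in slots:
            slots.append(current)
    return slots


# Hand-written sample in the general shape of lspci -vv output.
SAMPLE = """\
00:12.0 SATA controller: example SATA device
\tCapabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/2 Enable-
00:14.0 SMBus: example SMBus device
\tControl: I/O+ Mem+ BusMaster-
00:14.2 Audio device: example audio device
\tCapabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
"""

if __name__ == "__main__":
    print(msi_capable_slots(SAMPLE))
```

To use it on a real machine, save the output of lspci -vv to a file and pass the file contents to the function.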

In order to get the device and vendor codes for use in a boot parameter of device_msi=a,b,c, I found that the hwinfo program is "useful". There is far more output in hwinfo than just this, so you have to search for it. Piping into less and having it look for the PCI device numbers is probably the easiest thing to do.

00:02.0 is the RS 480 PCI-X device/function with 0x5a341002

00:12.0 is the SATA part of the ATI SB-600 with 0x43801002

00:13.[012345] is the bank of 6 USB device/functions with:
0 -> 0x43871002
1 -> 0x43881002
2 -> 0x43891002
3 -> 0x438a1002
4 -> 0x438b1002
5 -> 0x43861002

00:14.1 is the IDE part of the ATI SB-600 with 0x438c1002

00:14.2 is the Azalia (sound) part of the ATI SB-600 with 0x43831002

The numbering of devices seems a bit odd to me. 0x4384 and 0x4385 seem to be missing, and 0x4386 is placed oddly.
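
To make the numbers above easier to read, here is a small Python sketch that splits each combined value into its two halves. It assumes the layout suggested by the listing itself: the upper 16 bits are the device ID and the lower 16 bits are the vendor ID (every value ends in 1002, which is ATI's PCI vendor ID). The names `COMBINED_IDS` and `split_pci_id` are my own.

```python
# Combined IDs exactly as listed in this comment, keyed by PCI address.
COMBINED_IDS = {
    "00:02.0": 0x5A341002,
    "00:12.0": 0x43801002,
    "00:13.0": 0x43871002,
    "00:13.1": 0x43881002,
    "00:13.2": 0x43891002,
    "00:13.3": 0x438A1002,
    "00:13.4": 0x438B1002,
    "00:13.5": 0x43861002,
    "00:14.1": 0x438C1002,
    "00:14.2": 0x43831002,
}


def split_pci_id(combined):
    """Split a 32-bit combined ID into (device_id, vendor_id).

    Assumes device ID in the high 16 bits and vendor ID in the low 16 bits.
    """
    return (combined >> 16) & 0xFFFF, combined & 0xFFFF


if __name__ == "__main__":
    for address, combined in sorted(COMBINED_IDS.items()):
        device_id, vendor_id = split_pci_id(combined)
        print(f"{address}  vendor=0x{vendor_id:04x}  device=0x{device_id:04x}")
```

Sorting the output by device ID instead of address makes the gaps in the numbering easier to spot.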

In quite a few discussions of problems in booting/installing, the Local APIC gets mentioned occasionally. I gather this is LOCAL to a chip or function? Anyway, MSI seems to require the presence of a LOCAL APIC. Hence, you can't mix the nolapic parameter with other options in trying to get a system working.

Gord (alyceh) wrote :

Either those device_msi boot parameters were removed, or they never made it in. Hopefully the continuing changes in newer kernels will result in some nice set of boot parameters that works.

Bhuvan (bpasham) wrote :

There is a problem with the ATI SB600 SATA driver on AMD64 (at least there was for me) if you are using 4GB or more of RAM. I had the same problem of drives not being recognized, and I found a detailed explanation of the issue at http://<email address hidden>/msg06310.html (follow the whole thread).
This problem is still present in 7.04 Feisty AMD64. Reduce the RAM to less than 4GB (remove 2GB) and install. Once the installation is complete, add the kernel parameter 'mem=4095MB' to the GRUB configuration. This worked for me.

Leann Ogasawara (leannogasawara) wrote :

Hi Everyone,

The 18 month support period for Edgy Eft 6.10 has reached its end of life. As a result, we are closing the linux-source-2.6.17 Edgy Eft kernel task. However, Bordiga (the original bug reporter) commented that this issue was resolved with the 2.6.20 kernel. For those still experiencing issues, Hardy Heron 8.04 was recently released. It would be helpful if you could test the new release and verify whether this is still an issue - http://www.ubuntu.com/getubuntu/download . You should be able to test your bug using the Hardy LiveCD. If the issue still exists, please open a new bug report against the Hardy kernel (i.e. the "linux" source package). Thanks.

Changed in linux-source-2.6.17:
status: New → Won't Fix