System hangs when reading from multiple VT6421 PCI SATA cards

Bug #398392 reported by Marcello Romani
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: sata-modules-2.6.28-13-generic-di

The system is a 2.4 GHz Celeron on an ASUS motherboard (lspci.txt attached).

There are three Hamlet PCI SATA-1 cards, each with the VIA VT6421 chipset.
Each card has a SATA II hard disk connected to its internal port.
Each disk has the "limit to 1.5Gbps" jumper applied.
The three disks form a software RAID 5 array created with mdadm.
The raid array starts just fine.
There is an ext3 filesystem on the array, from which I need to recover some deleted files using ext3grep.
After starting the array with mdadm, I launch ext3grep with the --dump-names option.
The procedure reaches the phase where ext3grep prints out several "Searching block xxx" lines.
Suddenly, the computer freezes, and only the hardware reset button brings it back.

This happens with the cards inserted into PCI slots 2, 3 and 4 (counting the one nearest the AGP slot as the first).
(/proc/interrupts attached)

I tried inserting the three cards in different PCI slots. Interrupt assignments change, and the problem described is even worse. For example, in some configurations just activating the md array makes the system hang. In others the disk detection made during the boot makes the system hang.

If I connect two of the disks to the same controller, reading from both disks at the same time (for example with dd if=/dev/sda of=/dev/zero in one terminal and dd if=/dev/sdb of=/dev/zero in another one) hangs the system immediately.

I tried booting with the "nosmp" option. The problem was still there, but the system lasted a bit longer before hanging.

I tried to stress test the system by launching dd if=/dev/sdx of=/dev/zero, where x=a,b,c, in three different terminals.
The green lights on the three cards were flashing very fast, indicating disk activity.
The system was perfectly usable (i.e. no freeze).
When I activated the eth0 card with ifconfig eth0 192.168.1.1 up, the system froze.
Note that eth0 shares an interrupt with one of the sata_via controllers (see attached /proc/interrupts file).
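
For anyone who wants to repeat the parallel-read test from a single terminal, here is a minimal sketch in Python. It is not what was run for this report (that was plain dd in three terminals). The names read_all and parallel_read are made up for illustration, and the device paths are passed on the command line.

```python
import sys
import threading


def read_all(path, chunk_size=1 << 20):
    """Read `path` sequentially to the end, discarding the data.

    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def parallel_read(paths):
    """Read every path concurrently, one thread per path.

    Returns {path: bytes_read}. A path whose read raised an exception
    is simply missing from the result.
    """
    results = {}

    def worker(p):
        results[p] = read_all(p)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Example (needs root for block devices): script.py /dev/sda /dev/sdb /dev/sdc
    for path, nbytes in parallel_read(sys.argv[1:]).items():
        print(f"{path}: {nbytes} bytes read")
```

One thread per device is enough here because the threads spend their time blocked in read(), which is what keeps all three controllers busy at once.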

I suspect there is a problem with interrupt sharing.

Some information about my system:

root@marcello-desktop:/home/marcello# uname -a
Linux marcello-desktop 2.6.28-13-generic #45-Ubuntu SMP Tue Jun 30 19:49:51 UTC 2009 i686 GNU/Linux

root@marcello-desktop:/home/marcello# cat /proc/interrupts
           CPU0
  0:        178    XT-PIC-XT        timer
  1:         48    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  3:       2367    XT-PIC-XT        sata_via, ehci_hcd:usb1, VIA8237
  4:          1    XT-PIC-XT
  5:         75    XT-PIC-XT        sata_via, sata_via, uhci_hcd:usb4, uhci_hcd:usb5
  6:          5    XT-PIC-XT        floppy
  7:          1    XT-PIC-XT        parport0
  8:          0    XT-PIC-XT        rtc0
  9:          0    XT-PIC-XT        acpi
 11:       1380    XT-PIC-XT        sata_via, uhci_hcd:usb2, uhci_hcd:usb3, eth0, mga@pci:0000:01:00.0
 12:        142    XT-PIC-XT        i8042
 14:       7416    XT-PIC-XT        pata_via
 15:       6992    XT-PIC-XT        pata_via
NMI:          0   Non-maskable interrupts
LOC:      21503   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0
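
To see at a glance which lines a driver shares, the /proc/interrupts output can be parsed with a few lines of Python. This is only a sketch: the function names parse_interrupts and shared_with are made up, and it assumes a single CPU column and a one-word controller label, as in the listing above. Multi-CPU machines and newer kernels format the file differently.

```python
import re

# A few rows copied from the listing in this report.
SAMPLE = """\
           CPU0
  3: 2367 XT-PIC-XT sata_via, ehci_hcd:usb1, VIA8237
  4: 1 XT-PIC-XT
  5: 75 XT-PIC-XT sata_via, sata_via, uhci_hcd:usb4, uhci_hcd:usb5
 11: 1380 XT-PIC-XT sata_via, uhci_hcd:usb2, uhci_hcd:usb3, eth0, mga@pci:0000:01:00.0
 14: 7416 XT-PIC-XT pata_via
NMI: 0 Non-maskable interrupts
"""

# Assumption: "<irq>: <count> <controller> <handler, handler, ...>" with one CPU column.
ROW = re.compile(r"\s*(\d+):\s+(\d+)\s+(\S+)\s*(.*)$")


def parse_interrupts(text):
    """Return {irq_number: [handler names]} for the numbered IRQ rows."""
    table = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # CPU header and NMI/LOC/ERR summary rows
        irq, _count, _controller, names = m.groups()
        table[int(irq)] = [n.strip() for n in names.split(",") if n.strip()]
    return table


def shared_with(table, driver):
    """Return the IRQs where `driver` shares the line with other handlers."""
    return {irq: h for irq, h in table.items() if driver in h and len(h) > 1}


if __name__ == "__main__":
    for irq, handlers in sorted(shared_with(parse_interrupts(SAMPLE), "sata_via").items()):
        print(irq, ", ".join(handlers))
```

For the listing above this reports IRQs 3, 5 and 11 for sata_via, with eth0 on IRQ 11.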

Revision history for this message
Marcello Romani (marcello-romani) wrote :
tags: added: sata-via vt6421
Marcello Romani (marcello-romani) wrote :

Interface eth0 was active. I added the main source repositories to the package manager and hit the "Reload" button in Synaptic Package Manager.
The system was therefore busy downloading data from the Ubuntu servers.
The download speed was about 70 KB/s.

While this was in progress, I activated the md array involving the three disks connected to the three sata cards.
As soon as the array was assembled and started, the system froze.

Note that eth0 shares an interrupt with one of the sata_via controllers.

Marcello Romani (marcello-romani) wrote :

I was able to avoid the system freeze by entering the BIOS setup utility and disabling almost all integrated peripherals: USB, eth0, serial and parallel ports, game/MIDI port, audio subsystem.

The PATA subsystem was not disabled, and an 80 GB disk (the system disk) was attached to it.

I have since loaded the system with heavy I/O on the three sata disks, and everything is running fine.

As soon as I have the time, I want to repeat the tests by enabling one peripheral at a time to hopefully find the culprit...

Marcello Romani (marcello-romani) wrote :

Ok, I think I found the problem.

In the BIOS settings, neither ACPI nor ACPI APIC was enabled.
Enabling ACPI stopped the system from booting, due to bug 11941 I think, so I left it disabled.
I enabled the ACPI APIC flag and did some stress tests... now the system doesn't hang anymore.

The file /proc/interrupts now reads as follows:

           CPU0
  0:        185   IO-APIC-edge      timer
  1:       1334   IO-APIC-edge      i8042
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:      36697   IO-APIC-edge      i8042
 14:      50498   IO-APIC-edge      pata_via
 15:      20757   IO-APIC-edge      pata_via
 16:     451867   IO-APIC-fasteoi   sata_via, mga@pci:0000:01:00.0
 17:          0   IO-APIC-fasteoi   uhci_hcd:usb3
 18:     457336   IO-APIC-fasteoi   sata_via, uhci_hcd:usb4
 19:     587316   IO-APIC-fasteoi   sata_via, ehci_hcd:usb1
 20:          0   IO-APIC-fasteoi   sata_via
 21:       5658   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5, uhci_hcd:usb6, uhci_hcd:usb7, uhci_hcd:usb8
 22:       2240   IO-APIC-fasteoi   VIA8237
 23:    2688052   IO-APIC-fasteoi   eth0
NMI:          0   Non-maskable interrupts
LOC:     288082   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

Quite different from the previous one :-)

So in the end the problem seems to be caused by not having enabled the IO-APIC in the BIOS settings.
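
A quick way to check which mode a machine is running in is to look at the controller label in the third column of /proc/interrupts (XT-PIC-XT before the BIOS change, IO-APIC-edge or IO-APIC-fasteoi after it). The sketch below does that in Python. The function names controller_kinds and uses_io_apic are hypothetical, and the column layout is assumed to match the single-CPU output pasted in this report.

```python
# Rows copied from the two /proc/interrupts listings in this report.
BEFORE = """\
           CPU0
  5: 75 XT-PIC-XT sata_via, sata_via, uhci_hcd:usb4, uhci_hcd:usb5
 11: 1380 XT-PIC-XT sata_via, uhci_hcd:usb2, uhci_hcd:usb3, eth0, mga@pci:0000:01:00.0
NMI: 0 Non-maskable interrupts
ERR: 0
"""

AFTER = """\
           CPU0
  0: 185 IO-APIC-edge timer
 16: 451867 IO-APIC-fasteoi sata_via, mga@pci:0000:01:00.0
 23: 2688052 IO-APIC-fasteoi eth0
NMI: 0 Non-maskable interrupts
ERR: 0
"""


def controller_kinds(text):
    """Count numbered IRQ rows per interrupt-controller label.

    Assumption: one CPU column, so the label is the third
    whitespace-separated field of each numbered row.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].rstrip(":").isdigit():
            continue  # header row and NMI/LOC/ERR summary rows
        counts[fields[2]] = counts.get(fields[2], 0) + 1
    return counts


def uses_io_apic(text):
    """True if at least one IRQ row is routed through the IO-APIC."""
    return any(kind.startswith("IO-APIC") for kind in controller_kinds(text))


if __name__ == "__main__":
    print("before:", controller_kinds(BEFORE), uses_io_apic(BEFORE))
    print("after: ", controller_kinds(AFTER), uses_io_apic(AFTER))
```

On a live system the same functions can be fed the contents of /proc/interrupts instead of the pasted samples.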

Marcello Romani (marcello-romani) wrote :

Changed status to Invalid because the problem was solved by enabling the IO-APIC.

Changed in linux (Ubuntu):
status: New → Invalid