system hangs when reading from multiple VT6421 PCI sata cards
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Invalid | Undecided | Unassigned | |
Bug Description
Binary package hint: sata-modules-
System is a celeron 2.4GHz on asus motherboard (lspci.txt attached).
There are three Hamlet PCI sata-1 cards, with chipset VIA VT6421.
Each card has a sata II hard disk connected to the internal port.
Each disk has the "limit to 1.5Gbps" jumper applied.
On the three disks is a software raid created with mdadm, raid level 5.
The raid array starts just fine.
There is an ext3 filesystem on the array, from which I need to recover some deleted files using ext3grep.
After starting the array with mdadm, I launch ext3grep with the --dump-names option.
The procedure reaches the phase where ext3grep prints out several "Searching block xxx" lines.
Suddenly, the computer freezes and nothing but hitting the hw reset button can restore it.
This happens with the cards inserted into PCI slots 2, 3 and 4 (counting the one nearest the AGP slot as the first).
(/proc/interrupts attached)
I tried inserting the three cards in different PCI slots. The interrupt assignments change, and the problem described gets even worse. For example, in some configurations just activating the md array makes the system hang. In others, the disk detection performed during boot is enough to hang the system.
If I connect two of the disks to the same controller, reading from both disks at the same time (for example with dd if=/dev/sda of=/dev/zero in one terminal and dd if=/dev/sdb of=/dev/zero in another one) hangs the system immediately.
I tried to boot with the "nosmp" option. The problem was still there, but the system lasted a bit longer before hanging.
I tried to stress test the system by launching dd if=/dev/sdx of=/dev/zero, where x=a,b,c, in three different terminals.
The green lights on the three cards were flashing very fast, indicating disk activity.
The system was perfectly usable (i.e. no freeze).
When I activated the eth0 card with ifconfig eth0 192.168.1.1 up, the system froze.
Note that eth0 shares an interrupt with one sata_via module. (see attached /proc/interrupts file).
I suspect there is a problem with interrupt sharing.
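For anyone trying to reproduce this, the three-terminal dd test described above can be approximated with a single script. This is only a minimal sketch, not the exact commands I ran: the names read_all and parallel_read are made up for illustration, the data is simply discarded instead of being written to /dev/zero, and the device paths in the usage comment are the ones from this report and must be adapted (reading raw disks needs root).

```python
import threading


def read_all(path, chunk=1 << 20):
    """Read `path` sequentially, discard the data, return the bytes read."""
    total = 0
    # buffering=0 gives raw, unbuffered reads, closer to what dd does.
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(chunk)
            if not buf:  # empty read means end of device/file
                break
            total += len(buf)
    return total


def parallel_read(paths):
    """Read all `paths` at the same time, one thread per path."""
    results = {}

    def worker(p):
        results[p] = read_all(p)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical usage, as root, with the disks from this report:
#   parallel_read(["/dev/sda", "/dev/sdb", "/dev/sdc"])
```

Python releases the interpreter lock while a read is blocked, so the three reads really are in flight together, which is the condition that seems to matter here.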
Some information about my system:
root@marcello-
Linux marcello-desktop 2.6.28-13-generic #45-Ubuntu SMP Tue Jun 30 19:49:51 UTC 2009 i686 GNU/Linux
root@marcello-
CPU0
0: 178 XT-PIC-XT timer
1: 48 XT-PIC-XT i8042
2: 0 XT-PIC-XT cascade
3: 2367 XT-PIC-XT sata_via, ehci_hcd:usb1, VIA8237
4: 1 XT-PIC-XT
5: 75 XT-PIC-XT sata_via, sata_via, uhci_hcd:usb4, uhci_hcd:usb5
6: 5 XT-PIC-XT floppy
7: 1 XT-PIC-XT parport0
8: 0 XT-PIC-XT rtc0
9: 0 XT-PIC-XT acpi
11: 1380 XT-PIC-XT sata_via, uhci_hcd:usb2, uhci_hcd:usb3, eth0, mga@pci:
12: 142 XT-PIC-XT i8042
14: 7416 XT-PIC-XT pata_via
15: 6992 XT-PIC-XT pata_via
NMI: 0 Non-maskable interrupts
LOC: 21503 Local timer interrupts
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
SPU: 0 Spurious interrupts
ERR: 0
MIS: 0
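To make the sharing in the listing above easier to spot, here is a rough sketch that picks out the IRQ lines where sata_via shares with other handlers. It assumes the simple layout shown in this report (one count column per CPU, then a single-word controller name such as XT-PIC-XT, then a comma-separated handler list); newer kernels print extra columns and would need a different split. The function name shared_irqs and the SAMPLE excerpt are just for illustration.

```python
SAMPLE = """\
CPU0
3: 2367 XT-PIC-XT sata_via, ehci_hcd:usb1, VIA8237
4: 1 XT-PIC-XT
5: 75 XT-PIC-XT sata_via, sata_via, uhci_hcd:usb4, uhci_hcd:usb5
6: 5 XT-PIC-XT floppy
11: 1380 XT-PIC-XT sata_via, uhci_hcd:usb2, uhci_hcd:usb3, eth0, mga@pci:
14: 7416 XT-PIC-XT pata_via
NMI: 0 Non-maskable interrupts
"""


def shared_irqs(text, driver="sata_via"):
    """Map IRQ number -> handler list for IRQs that `driver` shares."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 [CPU1 ...]
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        # ncpu count columns, one controller column, then the handlers
        fields = rest.split(None, ncpu + 1)
        if len(fields) < ncpu + 2:
            continue  # no handler registered on this IRQ
        handlers = [h.strip() for h in fields[ncpu + 1].split(",") if h.strip()]
        if driver in handlers and len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


for irq, handlers in sorted(shared_irqs(SAMPLE).items()):
    print(irq, ", ".join(handlers))
```

On a live system the same function could be fed the contents of /proc/interrupts instead of SAMPLE. On this machine every sata_via instance shares its line with something else, and IRQ 11 is the one that also carries eth0.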
Interface eth0 was active, I added the main source repositories to the package manager, and I hit the "Reload" button in Synaptic Package Manager.
Therefore the system was busy downloading data from ubuntu servers.
The download speed was about 70KB/s.
While this was in progress, I activated the md array involving the three disks connected to the three sata cards.
As soon as the array was assembled and started, the system froze.
Note that eth0 is sharing an interrupt with a sata_via module.