Comment 5 for bug 579572

Paul Parisi (pparisi) wrote:

Same problem here on two Dell R300 units (configured with three disks in RAID 5). We had three machines (one R610 and two R300s) with virtually identical 64-bit Ubuntu software configurations. When they rebooted one day due to a power outage, the R300s never came back up, but the R610 (which has a newer, better SAS controller) recovered fine. Our Dell R300s have since been removed from production because of this issue and are being replaced with HP units.

We discussed this with Dell, and they say it's the poor-quality SAS controller in their lower-cost hardware, such as the R410 and R300. They do something dodgy with the controllers to get the cost down.

Basically, my understanding of the problem is this:
1. The system starts the boot process.
2. GRUB is loaded via BIOS routines (no problems).
3. The Ubuntu kernel is bootstrapped into memory via BIOS routines and then run (no problems).
4. The Ubuntu kernel loads and uses the proper RAID driver to continue accessing the drives. However, now that it has switched from BIOS routines to the actual driver, the RAID is not ready for access, and hence everything dies.

According to Dell, CentOS probably caters for this issue in its driver. Ubuntu will need to do the same to cater for this fault in the Dell hardware.

The root delay trick (the rootdelay kernel boot parameter) works for us too; however, we don't know what rootdelay value would let us be confident in it for a production environment. We have noticed the RAID slowing down over a period of a couple of months, so a root delay of 120 seconds might not be enough in the future. We just don't know enough about the poor-quality Dell gear to be sure. Dell had nothing further to offer on this, apart from the usual comments about an unsupported OS.
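One way to avoid guessing a single fixed delay is to poll for the root device and give up only after an upper bound, so a quick controller does not slow the boot and a slow array still gets extra time. The Python sketch below only illustrates that idea under stated assumptions: it is not Ubuntu's initramfs code, the name wait_for_root is hypothetical, and the `exists` check is a stand-in for whatever readiness test really applies (a device node existing does not necessarily mean the RAID is ready for I/O).

```python
import os
import time
from typing import Callable, Optional


def wait_for_root(
    device_path: str,
    timeout_s: float = 120.0,
    interval_s: float = 0.5,
    exists: Callable[[str], bool] = os.path.exists,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until device_path looks ready or timeout_s seconds have passed.

    Returns the number of seconds waited on success, or None on timeout.
    `exists`, `clock` and `sleep` are injectable so the logic can be
    exercised without real hardware or real waiting.
    """
    start = clock()
    deadline = start + timeout_s
    while True:
        # Assumption: "path exists" stands in for "RAID is ready for access".
        if exists(device_path):
            return clock() - start
        if clock() >= deadline:
            # Give up; the caller decides what a timeout means.
            return None
        # Never sleep past the deadline.
        sleep(min(interval_s, deadline - clock()))
```

For example, wait_for_root("/dev/sda1", timeout_s=300) (hypothetical device path) would return as soon as the device appears, or None after five minutes. An upper bound still has to be picked, but a generous one no longer costs boot time when the controller happens to be quick.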

Anyway, the fix, if one can be found, needs to be tested on the dodgy SAS controllers installed in the R300s and other low-cost series. Let me know if you need further technical details from our machines and I will post them.