Comment 64 for bug 550559

Revision history for this message
Lars (lars-taeuber) wrote :

Hi,

here is the report about my test with the SAS-HBA.
The problem seems the same the symptoms change.
The result is as bad as before.

For me it is very important to get this tracked down.

SAS-HBA: areca ARC-1300ix-16
module: mvsas

test:
mdadm -C /dev/md3 -l6 -n8 /dev/sdc-h] missing missing
(the two missing hdds prevents this raid from initial sync)

[ 112.537311] md: bind<sdc>
[ 112.565624] md: bind<sdd>
[ 112.592516] md: bind<sde>
[ 112.623219] md: bind<sdf>
[ 112.649941] md: bind<sdg>
[ 112.678637] md: bind<sdh>
[ 112.743525] raid5: device sdh operational as raid disk 5
[ 112.743532] raid5: device sdg operational as raid disk 4
[ 112.743538] raid5: device sdf operational as raid disk 3
[ 112.743543] raid5: device sde operational as raid disk 2
[ 112.743547] raid5: device sdd operational as raid disk 1
[ 112.743552] raid5: device sdc operational as raid disk 0
[ 112.744821] raid5: allocated 8490kB for md3
[ 112.744939] 5: w=1 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744946] 4: w=2 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744953] 3: w=3 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744958] 2: w=4 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744964] 1: w=5 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744969] 0: w=6 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744975] raid5: raid level 6 set md3 active with 6 out of 8 devices, algorithm 2
[ 112.766420] RAID5 conf printout:
[ 112.766427] --- rd:8 wd:6
[ 112.766434] disk 0, o:1, dev:sdc
[ 112.766441] disk 1, o:1, dev:sdd
[ 112.766448] disk 2, o:1, dev:sde
[ 112.766454] disk 3, o:1, dev:sdf
[ 112.766461] disk 4, o:1, dev:sdg
[ 112.766468] disk 5, o:1, dev:sdh
[ 112.766554] md3: detected capacity change from 0 to 12002393063424
[ 112.766852] md3: unknown partition table

everything is just fine till yet.

Now produce high io-load:
mke2fs -j /dev/md3

[ 190.981812] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981821] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981848] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981855] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981885] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981890] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981916] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981922] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981946] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981952] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981974] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981980] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980143] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 221.980152] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980176] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 221.980182] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980209] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 221.980215] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980241] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 221.980247] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980269] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 221.980275] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980296] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 221.980302] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 252.980157] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 252.980166] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[...]
[ 345.980304] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 345.980310] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 345.980343] sd 6:0:1:0: [sdd] Unhandled error code
[ 345.980347] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 345.980355] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5c 10 00 00 68 00
[ 345.980373] end_request: I/O error, dev sdd, sector 220176
[ 345.998441] raid5:md3: read error not correctable (sector 220176 on sdd).
[ 345.998453] raid5: Disk failure on sdd, disabling device.
[ 345.998456] raid5: Operation continuing on 5 devices.
[ 346.018076] raid5:md3: read error not correctable (sector 220184 on sdd).
[ 346.018081] raid5:md3: read error not correctable (sector 220192 on sdd).
[ 346.018087] raid5:md3: read error not correctable (sector 220200 on sdd).
[ 346.018092] raid5:md3: read error not correctable (sector 220208 on sdd).
[ 346.018097] raid5:md3: read error not correctable (sector 220216 on sdd).
[ 346.018102] raid5:md3: read error not correctable (sector 220224 on sdd).
[ 346.018107] raid5:md3: read error not correctable (sector 220232 on sdd).
[ 346.018112] raid5:md3: read error not correctable (sector 220240 on sdd).
[ 346.018117] raid5:md3: read error not correctable (sector 220248 on sdd).
[ 346.018137] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.018141] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.018148] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5c 00 00 00 10 00
[ 346.018190] end_request: I/O error, dev sdd, sector 220160[ 346.027529] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.027535] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.027547] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5b e8 00 00 18 00
[ 346.027592] end_request: I/O error, dev sdd, sector 220136
[ 346.036922] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.036926] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.036933] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5b 80 00 00 68 00
[ 346.036949] end_request: I/O error, dev sdd, sector 220032
[ 346.046250] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.046257] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.046269] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5b 00 00 00 80 00
[ 346.046314] end_request: I/O error, dev sdd, sector 219904
[ 346.055640] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.055644] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.055651] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5a 80 00 00 08 00
[ 346.055666] end_request: I/O error, dev sdd, sector 219776
[ 346.064960] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.064965] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.064977] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5a b0 00 00 50 00
[ 346.065021] end_request: I/O error, dev sdd, sector 219824
[ 346.074336] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.074340] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.074347] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 03 5a 88 00 00 28 00
[ 346.074362] end_request: I/O error, dev sdd, sector 219784
[ 346.083634] sd 6:0:1:0: [sdd] Unhandled error code
[ 346.083641] sd 6:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.083654] sd 6:0:1:0: [sdd] CDB: Read(10): 28 00 00 02 ad 38 00 00 48 00
[ 346.083698] end_request: I/O error, dev sdd, sector 175416
[ 346.092958] sd 6:0:5:0: [sdh] Unhandled error code
[ 346.092962] sd 6:0:5:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.092968] sd 6:0:5:0: [sdh] CDB: Read(10): 28 00 00 03 5c 10 00 00 68 00
[ 346.092984] end_request: I/O error, dev sdh, sector 220176
[ 346.101765] raid5: Disk failure on sdh, disabling device.
[ 346.101767] raid5: Operation continuing on 4 devices.
[...]
[ 346.207013] sd 6:0:0:0: [sdc] Unhandled error code
[ 346.207017] sd 6:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 346.207022] sd 6:0:0:0: [sdc] CDB: Read(10): 28 00 00 02 aa 80 00 00 08 00
[ 346.207032] end_request: I/O error, dev sdc, sector 174720
[ 361.830087] INFO: task flush-9:3:1557 blocked for more than 120 seconds.
[ 361.844397] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 361.873254] flush-9:3 D 0000000000000000 0 1557 2 0x00000000
[ 361.873267] ffff8800c056d620 0000000000000046 0000000000015dc0 0000000000015dc0
[ 361.873289] ffff88011b06b1d0 ffff8800c056dfd8 0000000000015dc0 ffff88011b06ae00
[ 361.873309] 0000000000015dc0 ffff8800c056dfd8 0000000000015dc0 ffff88011b06b1d0
[ 361.873329] Call Trace:
[ 361.873358] [<ffffffffa00da882>] get_active_stripe+0x312/0x3f0 [raid456]
[ 361.873377] [<ffffffff8105ded0>] ? default_wake_function+0x0/0x20
[ 361.873393] [<ffffffffa00de403>] make_request+0x243/0x4b0 [raid456]
[ 361.873409] [<ffffffff81085ff0>] ? autoremove_wake_function+0x0/0x40
[ 361.873425] [<ffffffff8142b6b0>] md_make_request+0xc0/0x130
[...]

So it doesn't look like a libata problem.
The system seems to fast for the HDDs.
I'd like to track this down.

Does someone has an idea what I should test next?
This problem ist independent from (exchanged everything one after another):

- mainboard (different generations of opterons)
- controller (SAS and SATA)
- PSU (2way or 3 way reduntant and a normal singel one)

Thanks
Lars