SRU justification:
Impact: Processes writing to an md RAID 5 array (managed with mdadm) get stuck in uninterruptible sleep under heavy I/O load. Copying data to an XFS partition on a RAID 5 array leaves several related processes permanently stuck in the D+ (uninterruptible sleep) state. This happens when large quantities of data (10-40 GB) are copied. The stuck processes cannot be killed, the system cannot reboot, and the server has to be power cycled.
Fix: The patch from commit 6ed3003c19a96fe18edf8179c4be6fe14abbebbc. Quoting the commit message:
"The fix is to not make any generic_make_request() calls in raid5 make_request until all waiting has been done. We do this by simply setting STRIPE_HANDLE instead of calling handle_stripe(). This causes a performance hit, so this patch also only calls raid5_activate_delayed() at unplug time, never in raid5. This seems to bring back the performance numbers."
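To make the ordering described in the commit message concrete, here is a minimal userspace C model. Everything in it (model_conf, model_stripe, the MODEL_STRIPE_* flags, model_make_request(), model_unplug(), model_submit_io()) is hypothetical. It only loosely mirrors the roles of make_request, STRIPE_HANDLE, handle_stripe(), raid5_activate_delayed() and generic_make_request(), and it is not the kernel code from the commit. The request path only marks stripes. All I/O submission happens later on the unplug path, after any waiting in the request path is over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT the real drivers/md/raid5.c structures. */
#define NSTRIPES 8

enum {
    MODEL_STRIPE_HANDLE  = 1u << 0, /* plays the role of STRIPE_HANDLE */
    MODEL_STRIPE_DELAYED = 1u << 1  /* stripe parked until unplug time */
};

struct model_stripe {
    unsigned int flags;
};

struct model_conf {
    struct model_stripe stripes[NSTRIPES];
    bool in_make_request;  /* true while the request path may still wait */
    size_t ios_submitted;  /* counts simulated generic_make_request() calls */
};

/* Stand-in for generic_make_request(): must never run while the
 * request path may still be waiting. */
static void model_submit_io(struct model_conf *conf)
{
    assert(!conf->in_make_request);
    conf->ios_submitted++;
}

/* Stand-in for handle_stripe(): clears the mark and submits I/O. */
static void model_handle_stripe(struct model_conf *conf, struct model_stripe *sh)
{
    sh->flags &= ~MODEL_STRIPE_HANDLE;
    model_submit_io(conf);
}

/* Request path: only marks the stripe instead of handling it directly. */
static void model_make_request(struct model_conf *conf, size_t idx, bool delay)
{
    conf->in_make_request = true;
    /* ... any waiting (e.g. for a free stripe) would happen here ... */
    conf->stripes[idx].flags |= delay ? MODEL_STRIPE_DELAYED : MODEL_STRIPE_HANDLE;
    conf->in_make_request = false;
}

/* Unplug path: delayed stripes are activated only here, then every
 * marked stripe is handled and its I/O submitted. */
static void model_unplug(struct model_conf *conf)
{
    for (size_t i = 0; i < NSTRIPES; i++) {
        struct model_stripe *sh = &conf->stripes[i];
        if (sh->flags & MODEL_STRIPE_DELAYED) {
            sh->flags &= ~MODEL_STRIPE_DELAYED;
            sh->flags |= MODEL_STRIPE_HANDLE;
        }
    }
    for (size_t i = 0; i < NSTRIPES; i++)
        if (conf->stripes[i].flags & MODEL_STRIPE_HANDLE)
            model_handle_stripe(conf, &conf->stripes[i]);
}
```

For example, three requests (one of them delayed) submit no I/O until model_unplug() runs, which then handles all three stripes. Deferring through a flag keeps the request path free of I/O submission. The cost is that work piles up until unplug, which fits the performance hit the commit message mentions.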
Testing: Without the patch, an md RAID 5 array with an XFS filesystem locks up under heavy data copying, and this is repeatable. With the patch, the lockup does not occur.
The patch was tested by Andrew Cholakian using my PPA build (see previous message).