Comment 144 for bug 438136

In , Martin Pitt (pitti) wrote :

The bigger problem with this is (as you already mentioned) that the raw value is misparsed far too often. Some random examples from bug reports:

  http://launchpadlibrarian.net/34574037/smartctl.txt
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 327697

  http://launchpadlibrarian.net/35971054/smartctl_tests.log
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 65542

  http://launchpadlibrarian.net/36599746/smartctl_tests-deer.log
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 65552

  https://bugzilla.redhat.com/attachment.cgi?id=382378
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 655424

  https://bugzilla.redhat.com/show_bug.cgi?id=506254
reallocated-sector-count 100/100/ 5 FAIL 1900724 sectors Prefail Online
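
One way to see why these raw counts look implausible is to print them in hex and split them into 16-bit words. This is only a sketch: the SMART raw field is 6 bytes wide, but the idea that these drives pack several small counters into it is an assumption on my part, not something a spec confirms, and `split_words` is a made-up helper name.

```python
def split_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, low word first.

    Assumption: the drive packs several counters into the raw field.
    This is a guess for illustration, not a documented format.
    """
    return (raw & 0xFFFF, (raw >> 16) & 0xFFFF, (raw >> 32) & 0xFFFF)


# Raw Reallocated_Sector_Ct values quoted from the bug reports above.
RAW_VALUES = [327697, 65542, 65552, 655424, 1900724]

for raw in RAW_VALUES:
    low, mid, high = split_words(raw)
    print(f"{raw:>8} = 0x{raw:012x}  low={low} mid={mid} high={high}")
```

Under that reading every low word is a small number (between 6 and 180), which would be far more plausible for a disk whose normalized value is still 100 than hundreds of thousands of reallocated sectors.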

It seems that the "no officially accepted spec about SMART attribute decoding" problem also hits here, in the sense that far too many drives get the raw counts wrong. In the 30 or so logs that I looked at in the various Launchpad/RedHat/fd.o bug reports related to this, I did not see a single implausible normalized value, though.

I appreciate the effort of doing vendor-independent bad block checking, but a lot of people get tons of false alarms because of it, and so won't believe the warning any more when a disk really is failing some day.

My feeling is that a more cautious approach would be to compare the normalized value against the threshold for the time being, and to use the raw values if/when they can be made more reliable. At that point we should use something in between logarithmic and linear scaling, though, since by sheer probability large disks will have more bad sectors, and also more reserve sectors, than small ones.
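
To make the proposal concrete, here is a rough sketch of both checks. The first function follows the usual SMART convention that an attribute has failed once its normalized value is at or below the threshold. The second is purely hypothetical: square-root scaling is just one curve that grows faster than logarithmic and slower than linear, and the names and the numbers (`base_limit`, `base_gib`) are invented for illustration.

```python
import math


def attribute_failed(normalized, threshold):
    """Cautious check: trust only the drive's own normalized value."""
    return normalized <= threshold


def raw_limit(disk_gib, base_limit=10.0, base_gib=100.0):
    """Hypothetical raw-count limit that grows with the square root of capacity.

    Assumption: sqrt is picked only as an example of a curve between
    logarithmic and linear; the defaults are made-up numbers.
    """
    return base_limit * math.sqrt(disk_gib / base_gib)


# The libatasmart example above: normalized 100, threshold 5.
print(attribute_failed(100, 5))
# A disk four times the base size gets twice the limit, not four times.
print(raw_limit(400.0))
```

With the normalized check, the "100/100/ 5" disk quoted above would not be flagged, whatever its raw value is decoded as.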