Comment 144 for bug 441941

Colin Watson (cjwatson) wrote :

It would be very helpful to me if people affected by this bug could follow the directions from Felix Zielcke in comment 4, and attach the results here. This is much more useful than describing your hardware or explaining again how serious the bug is! :-)

The one person to provide this information did so in comment 6. Thank you. His information indicates that offsets 0x4A00 to 0x4BFF are overwritten. If other people's results are consistent with that (and that's a big if, which is why we NEED the data requested in comment 4), then the problem is NOT that GRUB 2 is doing something dramatically different from how GRUB Legacy did it, or putting its information in a different place. The problem is:

  GRUB 2's core image is more useful than GRUB Legacy's Stage 1.5, and is thus bigger.
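
To put the offsets reported in comment 6 into sector terms, here is a small sketch of the arithmetic. It assumes traditional 512-byte sectors, which this bug does not state explicitly, and the helper name is made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: traditional 512-byte sectors


def sectors_touched(first_byte, last_byte, sector_size=SECTOR_SIZE):
    """Return the inclusive range of 0-based sector numbers covering the bytes."""
    return first_byte // sector_size, last_byte // sector_size


first, last = sectors_touched(0x4A00, 0x4BFF)
print(first, last)  # 37 37
```

Under that assumption, the overwritten range is exactly one whole sector (number 37, counting the master boot record as sector 0), which would land inside a core image that extends that far into the gap.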

This puts us in a very difficult position. The enhancements are real, and we're using them in Ubuntu; they allow us to do such things as making the boot loader reliable on multiple-disk systems. If you were affected by that unreliability then this is a pretty big deal! Thus, simply going back to GRUB Legacy is not an option for us.

The relevant change in BURG (which, by the way, is not a total rewrite of GRUB 2, but a single-developer fork which sees relatively little development effort compared to GRUB 2) is that it provides a --alt option to grub-install and grub-setup which forces the use of blocklists. In plain English, that means that rather than putting the core image in the gap between the master boot record and the first partition, it instead remembers the list of blocks where /boot/grub/core.img currently lives on disk. That will become unbootable in a different situation, namely when your filesystem decides to move /boot/grub/core.img around for some reason (filesystem recovery, defragmentation, performance optimisations, etc.). Now, presumably those people advocating BURG aren't as badly affected by that, which is fair enough, but I hope people see why I hold the position that it's not obvious that it's worth exchanging one problem for the other.

At this point, I don't want to make any change without more hard data on exactly what these Windows tools are doing. There seem to be several tools involved. Perhaps they're all doing different things, or perhaps they have something in common. Maybe we can spot particular signatures and work around them somehow, e.g. by skipping the relevant sectors. But I can't do anything until I know exactly what's going on. So, at the risk of repeating myself again - if you're affected by this bug, please provide the information requested by Felix Zielcke in comment 4, following the example set in comment 6. Thank you.
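
For anyone curious what "hard data" looks like in practice, the following is a rough sketch of how two dumps of the area after the master boot record could be compared sector by sector. This is not the procedure from comment 4, which remains the thing to follow; the function name, the 512-byte sector size and the 63-sector gap are all assumptions, and the data here is synthetic rather than taken from a real disk.

```python
SECTOR_SIZE = 512  # assumption: traditional 512-byte sectors


def changed_sectors(before, after, sector_size=SECTOR_SIZE):
    """Return the 0-based numbers of sectors that differ between two dumps."""
    length = min(len(before), len(after))
    changed = []
    for offset in range(0, length, sector_size):
        if before[offset:offset + sector_size] != after[offset:offset + sector_size]:
            changed.append(offset // sector_size)
    return changed


# Synthetic demonstration: a 63-sector area of zeros, then the same area with
# bytes 0x4A00-0x4BFF overwritten, mimicking the range reported in comment 6.
before = bytes(63 * SECTOR_SIZE)
after = bytearray(before)
after[0x4A00:0x4C00] = b"\xff" * 0x200

print(changed_sectors(before, bytes(after)))  # [37]
```

If several people's results pointed at the same sectors, that would suggest the tools have something in common; different sectors per tool would suggest they don't.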