Comment 83 for bug 569900

Colin Watson (cjwatson) wrote : Re: mount: mounting /dev/md0 on /root/ failed: Invalid argument

I finally got back to this bug and figured out what was going on. It took a while ...

A few people suggested that what was happening was that the partitioner was creating partitions that extended beyond the end of the disk. That wasn't actually quite right if you looked at the logs in detail and did the arithmetic; they were entirely within the disk, just extending onto the last (incomplete) cylinder, and there's nothing wrong with that in itself. However, there were log messages indicating that the md layer in the kernel thought that an md device was overflowing the disk, and this pointed me in the right direction.

When I tried to fix this bug before, I observed that what was happening was that mdadm was getting confused between /dev/sda and /dev/sda1 (or whatever the last partition happened to be). Since the 0.90 metadata format stores the superblock at the end of the device, there's obvious potential for confusion between a partition extending all the way to the end of the disk and the disk device itself. I fixed this, or so I thought, by constraining the installer's partitioner to never use the last sector of the disk. This fixed the problem in my tests.

Unfortunately, I apparently didn't quite do enough research on exactly what was happening. When I came back to this bug, I read the md(4) manual page, and found this:

  The common format - known as version 0.90 - has a superblock that is 4K long
  and is written into a 64K aligned block that starts at least 64K and less
  than 128K from the end of the device (i.e. to get the address of the
  superblock round the size of the device down to a multiple of 64K and then
  subtract 64K).

(The 1.0 superblock format is similar, but is never more than 12K from the end of the device, so a fix for 0.90 will fix 1.0 too. 1.1 and 1.2 store their superblocks at or near the start of the device, and do not suffer from this problem.)
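
As a rough sketch of the 0.90 rule quoted from md(4) above, the calculation looks something like this in Python. The function and constant names are made up for illustration; they are not taken from mdadm or the kernel.

```python
ALIGN = 64 * 1024  # 64K: both the alignment and the amount subtracted, per md(4)


def sb_offset_090(device_size: int) -> int:
    """Byte offset of the 0.90 superblock within a device of device_size bytes.

    Round the size down to a multiple of 64K, then subtract 64K.
    """
    if device_size < ALIGN:
        raise ValueError("device too small to hold a 0.90 superblock")
    return device_size // ALIGN * ALIGN - ALIGN


disk = 500107862016                   # the 500GB disk size from the logs
offset = sb_offset_090(disk)
print(offset)                         # 500107771904
print(disk - offset)                  # 90112, i.e. 88K from the end of the device
```

The distance from the end always lands in the range 64K (inclusive) to 128K (exclusive), which matches the wording in the manual page.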

So, if you do the mathematics based on partman's current constraints, the result is that Ubuntu will currently get this wrong for any disk whose size is an exact multiple of 1048576 bytes plus any number between 512 and 65536. The 500GB disks common among commenters on this bug report are, according to the logs, 500107862016 bytes long, which is 476940 * 1048576 + 24576. I could never reproduce this in KVM before because my habit is to create disk images which are an exact number of megabytes (I usually just say '10G' or thereabouts), and such an image would never encounter this bug thanks to my previous attempted fix of avoiding the last sector.
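
To make the arithmetic concrete, here is a small Python sketch comparing where the 0.90 superblock of the last partition lands with where a superblock for the whole disk would be expected. It is a simplified model and not partman's actual logic: it assumes the last partition starts on a 64K-aligned offset (1 MiB here) and runs up to, but not including, the last sector of the disk. All names are hypothetical.

```python
SECTOR = 512
ALIGN = 64 * 1024
MIB = 1048576


def sb_offset_090(size: int) -> int:
    """0.90 superblock offset within a device: round down to 64K, subtract 64K."""
    return size // ALIGN * ALIGN - ALIGN


def superblocks_collide(disk_size: int, part_start: int, part_end: int) -> bool:
    """True if the partition's superblock sits exactly where the disk's would be.

    part_end is exclusive (the first byte after the partition).
    """
    part_sb = part_start + sb_offset_090(part_end - part_start)
    return part_sb == sb_offset_090(disk_size)


# The 500GB disk from the logs: an exact number of megabytes plus 24576 bytes.
disk_500g = 476940 * MIB + 24576
assert disk_500g == 500107862016
print(superblocks_collide(disk_500g, MIB, disk_500g - SECTOR))  # True

# A '10G' image: an exact number of megabytes, so skipping the last sector is enough.
disk_10g = 10 * 1024 * MIB
print(superblocks_collide(disk_10g, MIB, disk_10g - SECTOR))    # False
```

In the first case both superblock positions work out to byte 500107771904, so the same superblock is a plausible match for the whole disk and for its last partition.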

The proper fix, then, is for partman to round the disk size down to a multiple of 64K, subtract one further sector, and avoid any sectors after that.
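
A minimal Python sketch of that limit follows, under my reading that the result is the number of bytes partitions may occupy (so the last usable sector ends one sector short of the 64K boundary). This is not partman's code; the names are invented, and it assumes the last partition starts on a 64K-aligned offset.

```python
SECTOR = 512
ALIGN = 64 * 1024
MIB = 1048576


def sb_offset_090(size: int) -> int:
    """0.90 superblock offset within a device: round down to 64K, subtract 64K."""
    return size // ALIGN * ALIGN - ALIGN


def usable_size(disk_size: int) -> int:
    """Bytes available to partitions: disk size rounded down to 64K, minus one sector."""
    return disk_size // ALIGN * ALIGN - SECTOR


def superblocks_collide(disk_size: int, part_start: int, part_end: int) -> bool:
    """True if the partition's superblock sits exactly where the disk's would be."""
    part_sb = part_start + sb_offset_090(part_end - part_start)
    return part_sb == sb_offset_090(disk_size)


disk = 500107862016
limit = usable_size(disk)
print(limit)                                   # 500107836928
print(superblocks_collide(disk, MIB, limit))   # False

# Sweep every sector-sized remainder above a whole number of megabytes.
base = 476940 * MIB
any_collision = any(
    superblocks_collide(base + r, MIB, usable_size(base + r))
    for r in range(0, MIB, SECTOR)
)
print(any_collision)                           # False
```

With this limit, a partition that runs right up to the boundary has its superblock one full 64K block before the position a whole-disk superblock would occupy, whatever the remainder of the disk size is.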