Comment 10 for bug 334994

Andy Whitcroft (apw) wrote :

I should also note that the kernel is not lying; these files are visibly present in sysfs:

    (initramfs) ls /sys/devices/virtual/block/md0/md
    dev-sda1 safe_mode_delay resync_start raid_disks
    reshape_position new_dev component_size layout
    array_state metadata_version chunk_size level
    (initramfs) ls /sys/devices/virtual/block/md1/md
    dev-sda5 safe_mode_delay resync_start raid_disks
    reshape_position new_dev component_size layout
    array_state metadata_version chunk_size level
    (initramfs)

Note the dev-sda1 in the md0/md directory in sysfs, and the dev-sda5 in the md1/md directory. These are the ones it complains about on insertion:

    [ 35.023792] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [ 35.023794] sysfs: duplicate filename 'dev-sda1' can not be created
    [...]
    [ 35.074528] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [ 35.074529] sysfs: duplicate filename 'dev-sda5' can not be created

Whatever registered this directory seems to have done it properly; it has the appropriate links etc. internally:

    (initramfs) ls -l /sys/devices/virtual/block/md0/md/dev-sda1
    lrwxrwxrwx 1 0 0 0 block -> ../../../../../pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/sda1
    -rw-r--r-- 1 0 0 4096 size
    -rw-r--r-- 1 0 0 4096 offset
    -rw-r--r-- 1 0 0 4096 slot
    -rw-r--r-- 1 0 0 4096 errors
    -rw-r--r-- 1 0 0 4096 state

OK, so where do these come from? They are made by bind_rdev_to_array() and undone by unbind_rdev_from_array(). From the logs we can see that the kernel is basically making, unmaking, and remaking the array to degrade it:

    [ 3.371474] md: bind<sda1>
    [ 3.381990] md: bind<sda5>
    [...]
    [ 35.003029] md: md0 stopped.
    [ 35.003043] md: unbind<sda1>
    [ 35.020198] md: export_rdev(sda1)
    [ 35.023745] md: bind<sda1>
    [ 35.023787] ------------[ cut here ]------------
    [ 35.023792] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [ 35.023794] sysfs: duplicate filename 'dev-sda1' can not be created

If we look at unbind_rdev_from_array(), it does not remove the actual entries directly; it schedules a work item to do so later:

    static void unbind_rdev_from_array(mdk_rdev_t * rdev)
    {
        [...]
        synchronize_rcu();
        INIT_WORK(&rdev->del_work, md_delayed_delete);
        kobject_get(&rdev->kobj);
        schedule_work(&rdev->del_work);
    }

And it appears to be this that finally removes the objects:

    static void md_delayed_delete(struct work_struct *ws)
    {
        mdk_rdev_t *rdev = container_of(ws, mdk_rdev_t, del_work);
        kobject_del(&rdev->kobj);
        kobject_put(&rdev->kobj);
    }

So if this was not waited for appropriately, we might well sometimes get back to binding the new one before the scheduled removal has run. Being a race, this would also fit with the transient nature of the issue.

I will patch this to wait for the pending work and see whether that resolves the issue.
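
To make the suspected ordering problem concrete, here is a minimal userspace sketch of it. It is only a model, not kernel code: every `toy_*` name is hypothetical, the "directory" is a flat array of names rather than sysfs, and the "workqueue" is a plain list that runs only when flushed. It assumes the failure mode described above, namely that unbind merely queues the removal, so a bind arriving before the queued work has run still sees the old entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8
#define TOY_MAX_WORK    8

/* Hypothetical stand-in for a sysfs directory: a flat set of names. */
struct toy_dir {
    const char *names[TOY_MAX_ENTRIES];
    int count;
};

/* Hypothetical stand-in for a workqueue: names whose removal is pending. */
struct toy_workqueue {
    struct toy_dir *dir;
    const char *pending[TOY_MAX_WORK];
    int count;
};

/* Return the index of name in the directory, or -1 if it is absent. */
static int toy_find(const struct toy_dir *d, const char *name)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return i;
    return -1;
}

/* Models bind: refuses to create a name that is still present,
 * which corresponds to the "duplicate filename" warning. */
static bool toy_bind(struct toy_dir *d, const char *name)
{
    if (toy_find(d, name) >= 0)
        return false;
    assert(d->count < TOY_MAX_ENTRIES);
    d->names[d->count++] = name;
    return true;
}

/* Models unbind: the entry is NOT removed here, its removal is only queued. */
static void toy_unbind(struct toy_workqueue *wq, const char *name)
{
    assert(wq->count < TOY_MAX_WORK);
    wq->pending[wq->count++] = name;
}

/* Models waiting for the pending work: run every queued removal now. */
static void toy_flush(struct toy_workqueue *wq)
{
    for (int i = 0; i < wq->count; i++) {
        int idx = toy_find(wq->dir, wq->pending[i]);
        if (idx >= 0) {
            /* Remove by moving the last entry into the freed slot. */
            wq->dir->count--;
            wq->dir->names[idx] = wq->dir->names[wq->dir->count];
        }
    }
    wq->count = 0;
}
```

In this model, calling `toy_bind()`, `toy_unbind()` and then `toy_bind()` again for the same name makes the second bind fail, while inserting a `toy_flush()` before the second bind lets it succeed. In the kernel the equivalent wait would presumably be something along the lines of flush_scheduled_work() before rebinding, but where exactly to put it is an open question here, not something this sketch settles.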