Comment 12 for bug 334994

Andy Whitcroft (apw) wrote :

Added some debugging to the teardown code and managed to reproduce this. What we find is that we unbind the device, then attempt a bind on the array, which fails, and only after that do we see the deletes for the unbind complete. Because the delete has not yet happened when the bind runs, the bind fails:

    [ 3.476504] md: bind<sda1>
    [...]
    [ 35.097882] md: md0 stopped.
    [ 35.097897] md: unbind<sda1>
    [ 35.097907] APW: sysfs_remove_link ret<0>
    [ 35.110198] md: export_rdev(sda1)
    [ 35.113254] md: bind<sda1>
    [ 35.113297] ------------[ cut here ]------------
    [ 35.113300] WARNING: at /home/apw/build/jaunty/ubuntu-jaunty/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [...]
    [ 35.115126] APW: deleted something

Here is a run where we happened to mount successfully; note that the delete
falls in the expected place, before the next bind:

    [ 3.479917] md: bind<sda5>
    [...]
    [ 35.118235] md: md1 stopped.
    [ 35.118240] md: unbind<sda5>
    [ 35.118244] APW: sysfs_remove_link ret<0>
    [ 35.140164] md: export_rdev(sda5)
    [ 35.142276] APW: deleted something
    [ 35.143848] md: bind<sda1>
    [ 35.152288] md: bind<sda5>
    [ 35.158571] raid1: raid set md1 active with 1 out of 2 mirrors

If we look at the code for stopping the array we see the following:

    static int do_md_stop(mddev_t * mddev, int mode, int is_open)
    {
    [...]
      rdev_for_each(rdev, tmp, mddev)
       if (rdev->raid_disk >= 0) {
        char nm[20];
        sprintf(nm, "rd%d", rdev->raid_disk);
        sysfs_remove_link(&mddev->kobj, nm);
       }

      /* make sure all md_delayed_delete calls have finished */
      flush_scheduled_work();

      export_array(mddev);
    [...]

Note that we call flush_scheduled_work() to wait for the md_delayed_delete
calls and only then export the array. However, it is export_array() which
triggers these deletes:

    static void export_array(mddev_t *mddev)
    {
    [...]
     rdev_for_each(rdev, tmp, mddev) {
      if (!rdev->mddev) {
       MD_BUG();
       continue;
      }
      kick_rdev_from_array(rdev);
     }
    [...]
    }

It does this via kick_rdev_from_array(), which calls unbind_rdev_from_array():

    static void kick_rdev_from_array(mdk_rdev_t * rdev)
    {
     unbind_rdev_from_array(rdev);
     export_rdev(rdev);
    }

Which triggers the delayed delete:

    static void unbind_rdev_from_array(mdk_rdev_t * rdev)
    {
    [...]
     rdev->sysfs_state = NULL;
     /* We need to delay this, otherwise we can deadlock when
      * writing to 'remove' to "dev/state". We also need
      * to delay it due to rcu usage.
      */
     synchronize_rcu();
     INIT_WORK(&rdev->del_work, md_delayed_delete);
     kobject_get(&rdev->kobj);
     schedule_work(&rdev->del_work);
    }

So in reality we want to wait for these deletes after the export_array(),
not before it. Testing with a patch to do this seems to resolve the issue.
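
To make the ordering problem concrete, here is a small userspace model of it. This is only a sketch and not kernel code: every name prefixed with model_ is hypothetical, the "workqueue" is a plain array, and the sysfs entry is a single flag. It also assumes the worst case for the race, where the queued delete has not run on its own by the time the next bind happens.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_WORK 8

typedef void (*model_work_fn)(void *arg);

struct model_work {
    model_work_fn fn;
    void *arg;
};

/* Hypothetical stand-in for the shared workqueue plus one sysfs entry. */
struct model {
    struct model_work queue[MODEL_MAX_WORK];
    size_t pending;
    int sysfs_entry_present;
};

/* Stand-in for schedule_work(): queue the work, do not run it yet. */
static void model_schedule_work(struct model *m, model_work_fn fn, void *arg)
{
    assert(m->pending < MODEL_MAX_WORK);
    m->queue[m->pending].fn = fn;
    m->queue[m->pending].arg = arg;
    m->pending++;
}

/* Stand-in for flush_scheduled_work(): run everything queued so far. */
static void model_flush_scheduled_work(struct model *m)
{
    for (size_t i = 0; i < m->pending; i++)
        m->queue[i].fn(m->queue[i].arg);
    m->pending = 0;
}

/* Stand-in for md_delayed_delete(): the entry goes away only here. */
static void model_delayed_delete(void *arg)
{
    struct model *m = arg;
    m->sysfs_entry_present = 0;
}

/* Stand-in for export_array(): this is what queues the delayed delete. */
static void model_export_array(struct model *m)
{
    model_schedule_work(m, model_delayed_delete, m);
}

enum model_flush_order { MODEL_FLUSH_BEFORE_EXPORT, MODEL_FLUSH_AFTER_EXPORT };

/*
 * Stop the modelled array with the given flush placement and report
 * whether the old entry is still present when a following bind would
 * run (1 = bind would collide with the old entry, 0 = bind is safe).
 */
int model_stop_then_bind_collides(enum model_flush_order order)
{
    struct model m = { .pending = 0, .sysfs_entry_present = 1 };

    if (order == MODEL_FLUSH_BEFORE_EXPORT)
        model_flush_scheduled_work(&m);   /* nothing is queued yet */

    model_export_array(&m);               /* queues the delete */

    if (order == MODEL_FLUSH_AFTER_EXPORT)
        model_flush_scheduled_work(&m);   /* delete runs before we return */

    return m.sysfs_entry_present;
}
```

In this model, flushing before the export finds an empty queue and so waits for nothing, leaving the delete pending when the stop returns. Flushing after the export guarantees the delete has run. In the real kernel the first ordering is a race rather than a certain failure, which matches the logs above where md1 happened to come up fine.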