Ubuntu
linux package

Comment 12 for bug 334994

Andy Whitcroft (apw) wrote on 2009-03-27:

I added some debugging to the teardown code and managed to reproduce this. What we find is that we unbind, then attempt a bind on the array which fails, and only afterwards do we see the deletes for the unbind complete. This ordering leads to the bind failure:

    [ 3.476504] md: bind<sda1>
    [...]
    [ 35.097882] md: md0 stopped.
    [ 35.097897] md: unbind<sda1>
    [ 35.097907] APW: sysfs_remove_link ret<0>
    [ 35.110198] md: export_rdev(sda1)
    [ 35.113254] md: bind<sda1>
    [ 35.113297] ------------[ cut here ]------------
    [ 35.113300] WARNING: at /home/apw/build/jaunty/ubuntu-jaunty/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [...]
    [ 35.115126] APW: deleted something

Here is a case where we happened to mount successfully; note that the delete
falls in the expected place, before the rebind:

    [ 3.479917] md: bind<sda5>
    [...]
    [ 35.118235] md: md1 stopped.
    [ 35.118240] md: unbind<sda5>
    [ 35.118244] APW: sysfs_remove_link ret<0>
    [ 35.140164] md: export_rdev(sda5)
    [ 35.142276] APW: deleted something
    [ 35.143848] md: bind<sda1>
    [ 35.152288] md: bind<sda5>
    [ 35.158571] raid1: raid set md1 active with 1 out of 2 mirrors

If we look at the code for stopping the array we see the following:

    static int do_md_stop(mddev_t * mddev, int mode, int is_open)
    {
    [...]
      rdev_for_each(rdev, tmp, mddev)
       if (rdev->raid_disk >= 0) {
        char nm[20];
        sprintf(nm, "rd%d", rdev->raid_disk);
        sysfs_remove_link(&mddev->kobj, nm);
       }

      /* make sure all md_delayed_delete calls have finished */
      flush_scheduled_work();

      export_array(mddev);
    [...]

Note that we call flush_scheduled_work() to wait for any pending
md_delayed_delete work and only then export the array. However, it is
export_array() itself which triggers these deletes:

    static void export_array(mddev_t *mddev)
    {
    [...]
     rdev_for_each(rdev, tmp, mddev) {
      if (!rdev->mddev) {
       MD_BUG();
       continue;
      }
      kick_rdev_from_array(rdev);
     }
    [...]
    }

It does this via kick_rdev_from_array(), which calls unbind_rdev_from_array():

    static void kick_rdev_from_array(mdk_rdev_t * rdev)
    {
     unbind_rdev_from_array(rdev);
     export_rdev(rdev);
    }

Which triggers the delayed delete:

    static void unbind_rdev_from_array(mdk_rdev_t * rdev)
    {
    [...]
     rdev->sysfs_state = NULL;
     /* We need to delay this, otherwise we can deadlock when
      * writing to 'remove' to "dev/state". We also need
      * to delay it due to rcu usage.
      */
     synchronize_rcu();
     INIT_WORK(&rdev->del_work, md_delayed_delete);
     kobject_get(&rdev->kobj);
     schedule_work(&rdev->del_work);
    }

So in reality we want to wait for these deletes not before the
export_array() call but after it. Testing with a patch that does this
seems to resolve the issue.
