Comment 11 for bug 555661

Steve Langasek (vorlon) wrote : Re: [Bug 555661] Re: sudo service statd start does not return

On Tue, Apr 20, 2010 at 02:10:56PM -0000, Brian J. Murrell wrote:
> > Yes, because upstart has been confused into thinking it's running when
> > it isn't.

> But why shouldn't upstart be able to determine that it's not actually
> running and either walk through the stop anyway, or noop it, rather than
> hanging indefinitely in the stop operation? That's the real nature of
> this particular bug.

That's the nature of an existing bug, already filed against the upstart
package. If you want, I can mark this bug as a duplicate of that one; but
that doesn't help with the very real bug in nfs-utils regarding the /var
race condition.

> > There are two conflicting use cases here - one where /var is a separate
> > local filesystem, and one where root is on NFS. When using nfsroot, we
> > *can't* wait for the 'local-filesystems' event, as doing so blocks the
> > root filesystem from ever being set up correctly by mountall.

> Right. I did not advocate waiting for local-filesystem but rather
> waiting for "mounted=/var/lib/nfs" (something that does not currently
> exist afaik, hence my explanation as to how I thought such a thing would
> work).

You can, as a local admin, modify your /etc/init/statd.conf to set 'start on
mounted MOUNTPOINT=/var'. There indeed is not a way to specify this that
will work for arbitrary paths that may or may not actually be mountpoints
(including /var), because we only get 'mounted' events for actual
filesystems. It *would* be nice to be able to specify in the
default jobs that the job waits for a particular path, and I've asked Scott
James Remnant for this in the past - but only in passing and at a much lower
priority than a number of other critical bugs related to mountall this
cycle, because it would require a two-way negotiation for upstart to let
mountall know which particular points in the path it needs to send
notifications for.
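
A minimal sketch of that local edit, showing only the start condition and
leaving the rest of the shipped job file untouched. It assumes /var is a
separate filesystem listed in /etc/fstab; on a system where /var lives on the
root filesystem, no 'mounted' event is emitted for it and the job would never
start.

```text
# /etc/init/statd.conf (local modification, start condition only)
# Assumption: /var is its own mountpoint, so mountall emits
# 'mounted MOUNTPOINT=/var' once it is available.
start on mounted MOUNTPOINT=/var
```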

So in short, we shouldn't let resolution of this bug block on the
availability of such a feature.

> The nfsroot is a somewhat special case though in that you don't really
> mount the nfsroot as / but typically you mount it somewhere else and
> then pivot to it. I'm not really sure where the mountall runs in all of
> that but I would think after the pivot is done during normal processing
> of a post-/ mount. If /var is expected to be nfs mounted at that point
> also, then I think you are stuck as you say in
> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/525154/comments/3.
> So maybe this use case is simply invalid. I'm really not sure TBH.
> Although it would be a shame as I can see reasons for such a use case.

mountall runs after init starts. nfsroot may or may not be done using an
initramfs - there's in-kernel support for nfsroot using static IP
configuration. But we can reasonably assume that *if* someone is using
nfsroot, then /var/lib is on the root filesystem, because that's the only
way to make this work even pre-upstart. The problem lies entirely in trying
to express a single job that works both for nfsroot and for non-nfsroot with
/var as a separate partition.

In the case of portmap, there's no trade-off; "start on virtual" is always
correct. In the case of statd, there's a trade-off, and I think breaking
nfsroot is the lesser evil when weighed against breaking /var partitions -
especially since this was already the status quo for nfsroot systems prior
to Apr 17.

So for lucid, I'm still inclined to update the statd job to 'start on
local-filesystems'. Possibly 'start on (local-filesystems and mounting
TYPE=nfs)' - if that doesn't cause NFS mount attempts after the first one to
deadlock in mountall/upstart. I'll have to test this and propose it as an
SRU if it checks out.
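
Written out as job-file stanzas, the two candidates would look roughly like
the sketch below. This is untested; only one 'start on' stanza would be active
at a time, and the second form is the one that still needs checking for
deadlocks on later NFS mount attempts.

```text
# /etc/init/statd.conf (proposed start condition, option 1)
start on local-filesystems

# Option 2, untested: additionally require an NFS mount to be in progress.
# start on (local-filesystems and mounting TYPE=nfs)
```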

> > I think the case where the system unrecoverably hangs on boot (the
> > nfsroot case) has to take precedence here. For your case, you should be
> > able to edit /etc/init/statd.conf as you describe to be 'start on
> > portmap and mounted MOUNTPOINT=/var'; except that this won't stop
> > mountall from trying to mount NFS mounts in parallel at boot time, so if
> > you have such mounts that will fail if statd isn't running, that still
> > doesn't solve your problem.

> Yeah. And what happens in the case where /var is not separate but on /?
> Will that mounted MOUNTPOINT=/var cause the statd init job to not run
> because there is no /var to be mounted? If so, we have not really
> solved the problem in a universal and generic method.

Correct.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden>                              <email address hidden>