Comment 58 for bug 525154

Dave Martin (dave-martin-arm) wrote : Re: mountall for /var races with rpc.statd

particularly @Scott, @Steve

Since I've now hit this bug again, I had another read over the bug thread...

Here are some thoughts... which may be substantially wrong, but hey.

There feels like a disconnect here between the true preconditions for some jobs and the kinds of preconditions that can be specified for upstart jobs in general.

A fundamental problem, if I understand the situation correctly, is that we have cases where the events (things happening dynamically during boot) are not adequate to determine whether/when upstart should consider a job startable -- at least not at the level of the simple boolean combinations that upstart currently understands.

The startability of some jobs depends on other factors (in this case, static system configuration which the administrator expects to customise --- the fstab). If upstart is conservative and waits until _everything_ is mounted, we will fail in some cases, for example when there are NFS mounts in fstab. Alternatively, if upstart is aggressive and tries to start the statd job as soon as it is _probably_ startable, then it might fail to start, and there's not much we can do about it -- that seems to be the current behaviour.

This is a problem because upstart doesn't currently have any sensible methodology for retrying failed jobs. So we either need a way to retry jobs at sensible times, or a more expressive way to determine when jobs should be started.

Conservative approach
==================
The "conservative" approach would be this approximation (which seems to work for me):

    start on (filesystem and (started portmap or mounting TYPE=nfs))

...because "filesystem" really does mean that the whole FHS tree has been mounted, and that the contents of /var are real (not just a stub mountpoint). This won't work for anyone who uses NFS for a mountpoint within the FHS (even if it's not /var and isn't otherwise needed for launching statd), and probably won't work if an NFS filesystem is listed in /etc/fstab (?) -- but it shouldn't cause extra problems when using nfsroot, since the kernel's internal statd is used in that case (I think?)

Better approach?
==============
Ideally, we could write something like:

    start on (mounted-final MOUNTPOINT=/var) and (started portmap or mounting TYPE=nfs)

Where "mounted-final MOUNTPOINT=<path>" means that all necessary mounts have been done to populate <path> with its "real" FHS contents, and the boot process won't mount anything else on top.

This could be implemented in a practical way in mountall if we don't attempt to make it universal --- i.e., we don't ensure that it works for every possible <path>, but we do make it work for the top-level directories defined by the FHS. To emit these events, mountall must parse the whole fstab and then act appropriately on each mount:

  * When <path> is mounted:
      * emit mounted MOUNTPOINT=<path>
      * for d in {each FHS top-level dir}:
            if no explicit mount for d or a parent of d in fstab:
                emit mounted MOUNTPOINT=<d>
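To make the rule above concrete, here's a rough shell sketch of the same logic. (The `emit` stand-in, the helper names, and the demo fstab are all made up for illustration -- real mountall is C code and its event names may differ, and real fstab parsing would also need to handle octal escapes etc.)

```shell
# Sketch of the event-emission rule above (hypothetical helper names).
emit() { echo "emit: $*"; }    # stand-in for an initctl-style event emission

FHS_DIRS="/bin /boot /dev /etc /home /lib /opt /root /sbin /srv /tmp /usr /var"

on_mounted() {    # on_mounted <path> <fstab>
    path="$1" fstab="$2"
    emit "mounted MOUNTPOINT=$path"
    # Once the rootfs is mounted, every FHS top-level dir with no
    # explicit fstab entry (for itself or a non-root parent) is final too.
    [ "$path" = / ] || return 0
    for d in $FHS_DIRS; do
        explicit=no
        while read -r dev mnt rest; do
            case "$dev" in ''|\#*) continue ;; esac   # skip blanks/comments
            case "$d" in
                "$mnt"|"$mnt"/*) [ "$mnt" != / ] && explicit=yes ;;
            esac
        done < "$fstab"
        if [ "$explicit" = no ]; then
            emit "mounted MOUNTPOINT=$d"
        fi
    done
}

# Demo: /var is on NFS, so it must NOT get an implicit event when / is
# mounted; everything else (e.g. /home) does.
cat > /tmp/fstab.demo <<'EOF'
/dev/sda1  /     ext4  defaults  0 1
server:/v  /var  nfs   defaults  0 0
EOF
on_mounted / /tmp/fstab.demo
```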

General approach
==============

The above feels a bit messy and fragile, and doesn't solve the general problem of configuration-dependent job start preconditions. So it might be better to implement this outside mountall, by extending upstart with some extra flexibility for job start conditions. For example:

    start on $eval(mounted-final /var) and (started portmap or mounting TYPE=nfs)

...where $eval(<command> <arguments>) is some magic new upstart event expression syntax which runs an arbitrary command or script and uses its output as part of the event expression. [I'm not suggesting exactly that syntax of course -- I admit it's pretty hideous ;P]

In our case, "mounted-final <path>" is some widget which returns the event expression "mounted MOUNTPOINT=<x>", where <x> is the deepest path listed in fstab that is a parent of, or is equal to, <path>. This is pretty easy to script up. So it really depends on whether upstart can/should be extended to support this kind of thing.
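For what it's worth, the "deepest fstab parent" lookup really is only a few lines of shell. A rough sketch (the function name, demo paths, and fstab contents are made up; a real version would also need to handle fstab escape sequences):

```shell
# mounted_final <path> <fstab>: print an upstart-style event expression
# for the deepest fstab mountpoint that equals, or is an ancestor of,
# the given path. (Hypothetical helper -- sketch only.)
mounted_final() {
    target="$1" fstab="$2" best=/
    while read -r dev mnt rest; do
        case "$dev" in ''|\#*) continue ;; esac   # skip blanks/comments
        # keep the longest mountpoint that is the target or a prefix of it
        case "$target" in
            "$mnt"|"$mnt"/*) [ "${#mnt}" -gt "${#best}" ] && best="$mnt" ;;
        esac
    done < "$fstab"
    echo "mounted MOUNTPOINT=$best"
}

# Demo: with /var on NFS, the final contents of /var/lib/nfs arrive
# when /var itself is mounted.
cat > /tmp/fstab.demo <<'EOF'
/dev/sda1  /     ext4  defaults  0 1
server:/v  /var  nfs   defaults  0 0
EOF
mounted_final /var/lib/nfs /tmp/fstab.demo   # -> mounted MOUNTPOINT=/var
```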

"Retry" approach
==============
Finally, it might be interesting to consider whether it makes sense to define specific retry conditions for jobs. This would allow us to do better at retries than dumb polling. I don't remember exactly what the pattern-matching capabilities for event key values are, but I can imagine something like:

    retry on (mounted MOUNTPOINT=/var) or (mounted MOUNTPOINT=/var/*)
    retry on (mounted MOUNTPOINT=/var(/.*)?) # if regex is supported?

It still feels a bit wrong, though... statd may spuriously start successfully if the rootfs contains /var/lib/nfs but the real /var is subsequently mounted on top of it (maybe after statd was started).

If a retry feature is added, it would be wise to limit the maximum number of retries (as for respawn) or the maximum time period over which retries will be attempted.

Thoughts?