Comment 1 for bug 643289

Steve Langasek (vorlon) wrote:

Building on Clint Byrum's work on bug #525154, I'm much closer now to understanding a possible solution for this issue, but it's going to require some coordination. Details:

- the current idmapd job starts on 'local-filesystems or mounting TYPE=nfs4' because it needs to start whenever an nfs4 filesystem is mounted, but it also needs to wait until /usr and /var/lib are available (/usr because idmapd is located in /usr/sbin; /var/lib because it uses /var/lib/nfs/rpc_pipefs). The only way to wait for /usr and /var/lib is to wait for 'local-filesystems'; it's *possible* that one or both of these filesystems are not local, but that would be a local configuration error anyway.
- the start condition used here is buggy. If 'local-filesystems' is emitted first, idmapd will start up without blocking any later 'mounting' hooks. If 'mounting TYPE=nfs4' is emitted first, there is no way to make the job also wait for the 'local-filesystems' event, so the job can try to start before /usr and /var/lib are usable and wind up in an inconsistent state when idmapd aborts.
- using jobs in the style of portmap-wait and statd-mounting, it is possible to construct a set of jobs that will only start idmapd on local-filesystems, and *also* block any nfs4 mounts until idmapd is started.
- unfortunately, mountall itself appears to block on the result of the 'mounting' hook before doing any further processing of *any* mount point. As a result, if 'local-filesystems' has not yet been emitted when mountall tries to mount the first nfs4 filesystem, we end up in a deadlock: the 'mounting' hook is waiting for idmapd to start; idmapd is waiting for 'local-filesystems' to be emitted; and mountall is waiting for the 'mounting' hook to return before doing any other mounts.
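To make the deadlock in the last point concrete, here is a toy Python model. It is not mountall's actual code (mountall is written in C); it assumes, as a simplification, that mountall walks mount points one at a time and blocks on each 'mounting' hook, that the hook for an nfs4 mount waits for idmapd, and that idmapd starts as soon as every local filesystem is mounted (i.e. when 'local-filesystems' is emitted). The `Mount` type and `serial_mountall` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Mount:
    # Hypothetical record for a mount point in this toy model.
    path: str
    fstype: str
    local: bool


def serial_mountall(mounts):
    """Toy model of a mountall that blocks on each 'mounting' hook.

    Returns the mount points in the order mounted, or raises
    RuntimeError when the hook/idmapd/mountall cycle would deadlock.
    """
    pending_local = {m.path for m in mounts if m.local}
    # Assumption: idmapd starts on 'local-filesystems', i.e. once no
    # local mounts remain.
    idmapd_running = not pending_local
    mounted = []
    for m in mounts:
        if m.fstype == "nfs4" and not idmapd_running:
            # The 'mounting' hook waits for idmapd, idmapd waits for
            # 'local-filesystems', and this serial loop waits for the hook.
            raise RuntimeError(f"deadlock mounting {m.path}")
        mounted.append(m.path)
        pending_local.discard(m.path)
        if not pending_local:
            # 'local-filesystems' is emitted; idmapd may now start.
            idmapd_running = True
    return mounted
```

With an nfs4 `/home` ordered before a local `/var`, `serial_mountall` raises `RuntimeError`; with `/var` first, it mounts everything.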

I see three possible solutions here.

1. Change mountall to be able to do other work while waiting for the 'mounting' hook to return. Conceptually I don't see any reason this isn't possible, so it should just be a matter of code reordering.
2. Change mountall to special case nfs4 mounts so that they are never handled until after local-filesystems is emitted. Yuck for the special-casing, though conceptually not actually different from what we're trying to achieve through the nfs-common upstart jobs.
3. Move idmapd and its dependencies (libevent; libnfsidmap, /usr/lib/libnfsidmap/) to the root filesystem (/sbin, /lib) and move /var/lib/nfs/rpc_pipefs to /var/run/nfs/rpc_pipefs. The latter may be correct in its own right (I'm pretty sure there's nothing on this in-kernel mount point that would count as 'persistent state'); the former doesn't even cover all cases unless we also move the kerberos+ldap stack to /lib, due to /usr/lib/libnfsidmap/umich_ldap.so.
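Option 1 can be sketched in a toy Python model under simplifying assumptions: mountall blocks on an nfs4 mount's 'mounting' hook until idmapd is running, and idmapd starts once every local filesystem is mounted. Instead of blocking, the hypothetical `nonblocking_mountall` below sets aside any nfs4 mount whose hook would still be waiting, keeps mounting everything else, and retries the deferred mounts once idmapd is running. This only illustrates the reordering idea; it is not a patch to mountall, and `Mount` and `nonblocking_mountall` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Mount:
    # Hypothetical record for a mount point in this toy model.
    path: str
    fstype: str
    local: bool


def nonblocking_mountall(mounts):
    """Toy model of option 1: defer mounts whose 'mounting' hook would
    block, keep processing the others, and retry the deferred ones."""
    pending_local = {m.path for m in mounts if m.local}
    # Assumption: idmapd starts on 'local-filesystems'.
    idmapd_running = not pending_local
    queue = list(mounts)
    mounted = []
    while queue:
        progressed = False
        waiting = []
        for m in queue:
            if m.fstype == "nfs4" and not idmapd_running:
                # Hook still blocked on idmapd: revisit on a later pass.
                waiting.append(m)
                continue
            mounted.append(m.path)
            progressed = True
            pending_local.discard(m.path)
            if not pending_local:
                # 'local-filesystems' is emitted; idmapd may now start.
                idmapd_running = True
        if not progressed:
            # Guard against looping forever if nothing can be mounted.
            raise RuntimeError("no mount can make progress")
        queue = waiting
    return mounted
```

On the ordering that deadlocks a strictly serial loop (`/`, then nfs4 `/home`, then local `/var`), this version mounts `/var` before coming back to `/home`.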

I believe option 1 is the most straightforward to SRU and is correct per se, although parts of 3 are probably worth pursuing in their own right as part of an overall effort to improve FHS compliance.