Comment 8 for bug 555661

Brian J. Murrell (brian-interlinx) wrote : Re: [Bug 555661] Re: sudo service statd start does not return

On Mon, 2010-04-19 at 06:24 +0000, Steve Langasek wrote:
> Does statd log anything to syslog in this case?

No. Per my last message, there is no running statd to log anything.
But "service statd stop" still hangs.

> "start/running" indicates that upstart believes it got as far as running
> the 'script' process.

Right.

> So we'll need to debug why this process isn't
> running after boot.

OK. But I think there is a lower layer issue at play here. Why should
"service statd stop" hang, especially if the statd process is not even
running? Hrm. But now (after a few more stop/start attempts) it does
seem to acknowledge that it's stopped:

# status statd
statd stop/killed, process 1380
# service statd stop
stop: Job has already been stopped: statd

So let's try to start it:

# service statd start
[ hangs, until I give up and ^C ]
^C
# status statd
statd start/killed, process 1380
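
A minimal sketch for checking whether the PID that upstart reports (1380
above) still exists. The helper names pid_from_status and
tracked_pid_state are hypothetical, not part of upstart, and the check
assumes it is run as root so that kill -0 is not refused for permission
reasons:

```shell
# Hypothetical helper: pull the PID out of one line of `status` output,
# e.g. "statd start/killed, process 1380". Prints nothing if no PID is shown.
pid_from_status() {
    printf '%s\n' "$1" | sed -n 's/.*, process \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: say whether the PID upstart is tracking still exists.
# Assumes root, so a failing `kill -0` means "no such process".
tracked_pid_state() {
    pid=$(pid_from_status "$1")
    if [ -z "$pid" ]; then
        echo "no pid tracked"
    elif kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid exists"
    else
        echo "pid $pid is gone"
    fi
}
```

Calling it as tracked_pid_state "$(status statd)" would show whether
upstart is still waiting on a process that no longer exists.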

My /etc/init/statd.conf:

# statd - NSM status monitor

description "NSM status monitor"
author "Steve Langasek <email address hidden>"

start on (started portmap or mount TYPE=nfs)
stop on stopping portmap

expect fork
respawn

env DEFAULTFILE=/etc/default/nfs-common

pre-start script
 exec 2>/tmp/statd-pre.conf.debug.$$
 set -x
 if [ -f "$DEFAULTFILE" ]; then
     . "$DEFAULTFILE"
 fi

 [ "x$NEED_STATD" != xno ] || { stop; exit 0; }

 start portmap || true
 status portmap | logger -p local0.debug -t NFS
 status portmap | grep -q start/running # || start portmap
 exec sm-notify
end script

script
 exec 2>/tmp/statd.conf.debug.$$
 set -x

 if [ -f "$DEFAULTFILE" ]; then
     . "$DEFAULTFILE"
 fi

 if [ "x$NEED_STATD" != xno ]; then
  ls -l /var/lib/nfs >&2 || true
  exec rpc.statd -L $STATDOPTS
 fi
end script

Notice the exec/set -x at the start of the pre-start and script
sections. That's an effort to debug this. However, even after many
attempted start/stop operations since this machine was booted, not
a single /tmp/statd{,-pre}.conf.debug.$$ file has appeared, beyond the
three that were written at boot time:

-rw-r--r-- 1 root root 233 2010-04-17 19:09 /tmp/statd-pre.conf.debug.1252
-rw-r--r-- 1 root root 304 2010-04-17 19:09 /tmp/statd-pre.conf.debug.1345
-rw-r--r-- 1 root root 291 2010-04-17 19:09 /tmp/statd.conf.debug.1379

So it seems clear to me that the start/stop operations are not even
getting as far as executing the pre-start and script sections.

The content of those files (i.e. what init tried to do at boot time):

# cat /tmp/statd-pre.conf.debug.1252
+ [ -f /etc/default/nfs-common ]
+ . /etc/default/nfs-common
+ NEED_STATD=
+ STATDOPTS=
+ NEED_IDMAPD=no
+ NEED_GSSD=no
+ [ x != xno ]
+ start portmap
+ status portmap
+ logger -p local0.debug -t NFS
/dev/fd/10: 1: logger: not found

# cat /tmp/statd-pre.conf.debug.1345
+ [ -f /etc/default/nfs-common ]
+ . /etc/default/nfs-common
+ NEED_STATD=
+ STATDOPTS=
+ NEED_IDMAPD=no
+ NEED_GSSD=no
+ [ x != xno ]
+ start portmap
start: Job is already running: portmap
+ true
+ status portmap
+ logger -p local0.debug -t NFS
+ status portmap
+ grep -q start/running
+ exec sm-notify

# cat /tmp/statd.conf.debug.1379
+ [ -f /etc/default/nfs-common ]
+ . /etc/default/nfs-common
+ NEED_STATD=
+ STATDOPTS=
+ NEED_IDMAPD=no
+ NEED_GSSD=no
+ [ x != xno ]
+ ls -l /var/lib/nfs
ls: cannot access /var/lib/nfs: No such file or directory
+ true
+ exec rpc.statd -L
statd: Could not chdir: No such file or directory

What's in that last file is important and is probably the root cause of
the boot time failure: /var is on its own filesystem and is not yet
mounted when statd starts. I have another bug open about this issue.

(But this other bug only addresses the boot time failure and not the
issue at the root of this bug which is the inability to stop/start statd
even after the system is booted and fully running).

IIRC, the bug about statd needing /var/lib/nfs to be mounted was
answered pretty much with a "well, there is no real solution to this
problem" type answer. But I have been giving that some thought.

Using this case as an example for a more general solution...

Having statd start on (started portmap or mount TYPE=nfs) is not good
enough. It has to wait for /var to be mounted. But does that mean it
should just add a dependency on (local?) filesystems being mounted?
That sounds too coarse to me.

As I understand it, currently there are several very coarse signals
emitted for "filesystems mounted". Probably one for "local" filesystems
as well as "remote" filesystems.

But why distinguish those and why have signals that are so coarse? Why
not emit signals for the availability (i.e. is mounted) of various parts
of the overall filesystem tree?

mountall could analyze /etc/fstab and determine the various mount points
in the overall namespace and emit signals when desired portions of the
filesystem namespace are available.
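
As a rough sketch of that fstab analysis (not how mountall actually
works), the lookup below walks up from a path until it reaches a
directory that is listed as a mount point in fstab. The function name
covering_mount is hypothetical, and the sketch assumes a plain fstab
whose second field is the mount point:

```shell
# Hypothetical helper: print the fstab mount point that covers a path.
# Usage: covering_mount PATH [FSTAB]   (FSTAB defaults to /etc/fstab)
# Returns non-zero if no fstab entry covers the path.
covering_mount() {
    path=$1
    fstab=${2:-/etc/fstab}
    while :; do
        # Succeeds if any non-comment fstab line has $path as field 2.
        if awk -v want="$path" '
            $1 !~ /^#/ && NF >= 2 && $2 == want { found = 1 }
            END { exit !found }' "$fstab"; then
            printf '%s\n' "$path"
            return 0
        fi
        [ "$path" = / ] && return 1
        # Strip the last path component; an empty result means the root.
        path=${path%/*}
        [ -n "$path" ] || path=/
    done
}
```

With /var on its own filesystem, covering_mount /var/lib/nfs prints
/var, which is the mount a job waiting for /var/lib/nfs would really be
waiting on.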

So back to our current example, for this particular problem, we can tell
init (in /etc/init/statd.conf with an additional requirement in the
start on of "mounted=/var/lib/nfs") that statd needs /var/lib/nfs to be
available. Now, mountall, after having looked at the fstab, would know
that that will happen when /var is mounted and emit the signal
"mounted: /var/lib/nfs" when /var is mounted.

If mountall had seen in /etc/fstab that /var/lib was its own
filesystem, of course it would mount that before emitting
"mounted: /var/lib", etc.

This does mean that mountall/upstart/init/etc. needs to look at all of
its consumers and find out what portions of the filesystem they are
waiting for, but given the fine granularity that upstart is working at
here, I think that is necessary.