Comment 50 for bug 525154

David Mathog (mathog) wrote : Re: mountall for /var races with rpc.statd

I think it is finally working, but what a mess.

Observations:

1. An explicit NFS mount like:

  mount /mnt/server/directory

will not work from within rc.local while mountall is still running as a daemon.

2. An explicit

  killall -SIGUSR1 mountall

from within rc.local will be ignored by mountall.

3. An explicit

  killall -9 mountall

will kill mountall, but then init goes nuts and throws up one of these:

 General error mounting filesystems
 A maintenance shell will now be started
 etc.

It does this even after one has already logged in on the console!

4. An explicit

  killall -SIGUSR1 mountall

from a console session will allow the NFS mount to proceed, but of course
that can only be done by root, so it isn't a general solution for end users.

5. An explicit

  killall -SIGUSR1 mountall

from an at job will also allow NFS mounts to complete.

6. I have no idea why mountall is hanging around and not starting the NFS connection.
This is not a race condition with statd: statd can be checked and shown to be running while mountall
is still twiddling its metaphorical fingers.

So my "final" solution, until somebody fixes mountall, is
to add this to /etc/rc.local:

set +e # else any failed grep causes an exit
marunning=`ps -ef | grep mountall | grep -v grep`
logger "MATHOG marunning [ $marunning ]"
set -e
if [ -n "$marunning" ]
then
  logger "MATHOG use at to kill mountall"
  at -f /etc/saf2.sh now
else
  logger "MATHOG ma not running, no kill needed"
fi
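
A possible tidier variant of the detection step, offered only as a sketch: the pattern '[m]ountall' matches "mountall" but not grep's own command line, so the second grep is not needed, and "|| true" stops a no-match from tripping "set -e". The helper name find_mountall and the sample ps lines are my assumptions for illustration; they are not part of the rc.local above.

```shell
#!/bin/sh
# Sketch: filter `ps -ef` style output for mountall without matching grep itself.
# find_mountall is a hypothetical helper name, not part of the original script.
find_mountall() {
  # [m]ountall matches the text "mountall" but not the literal "[m]ountall",
  # so the grep process's own command line is never reported.
  # `|| true` keeps a no-match result from aborting a `set -e` script.
  grep '[m]ountall' || true
}

# In rc.local this would be used as: marunning=$(ps -ef | find_mountall)
# Here it is fed canned input (made up for illustration) so it runs anywhere.
sample='root 400 1 0 14:48 ? 00:00:00 mountall --daemon
root 512 1 0 14:48 ? 00:00:00 grep [m]ountall'
marunning=$(printf '%s\n' "$sample" | find_mountall)
echo "marunning [ $marunning ]"
```

With this, the "set +e" / "set -e" bracketing around the ps line should no longer be necessary, though I have only tried the filter on canned input like the above.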

Create the at file "saf2.sh" like:

cat >/etc/saf2.sh <<'EOD' # (name isn't important; quoting EOD keeps $vars and backticks literal)
#!/bin/bash
set +e
#allow time for rc.local to exit, this should probably be shorter
sleep 4
marunning=`ps -ef | grep mountall | grep -v grep`
logger "MATHOG2 marunning? [ $marunning ]"
logger "MATHOG2 trying sigusr1 on mountall from at job"
killall -SIGUSR1 mountall
kstat=$?
marunning=`ps -ef | grep mountall | grep -v grep`
logger "MATHOG2 kstat $kstat marunning STILL? [ $marunning ]"
set -e
EOD
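
One thing to watch when writing a script out through a here-document like the one above: whether the delimiter is quoted decides if $variables and backticks are expanded when the file is written or left for when the script runs. A minimal sketch of the difference, using a temporary directory and a made-up value for kstat (both are assumptions for the demonstration only):

```shell
#!/bin/sh
# Sketch: quoted vs unquoted here-document delimiters.
tmpdir=$(mktemp -d)
kstat=EXPANDED_EARLY   # hypothetical value set in the shell that writes the files

# Unquoted delimiter: $kstat is expanded while the file is being written.
cat >"$tmpdir/unquoted.sh" <<EOD
echo "kstat $kstat"
EOD

# Quoted delimiter: the text is written literally and expanded only at run time.
cat >"$tmpdir/quoted.sh" <<'EOD'
echo "kstat $kstat"
EOD

unquoted=$(cat "$tmpdir/unquoted.sh")
quoted=$(cat "$tmpdir/quoted.sh")
echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -rf "$tmpdir"
```

For a script that is meant to evaluate $? and run ps when the at job fires, the quoted form is the one that leaves those in the file.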

When the system boots with this in place, the nfs/statd messages still appear on the console, but it does mount
the NFS volumes within a few seconds, and these messages show up in /var/log/messages:
Aug 25 14:48:07 saf04 logger: MATHOG marunning [ root 400 1 0 14:48 ? 00:00:00 mountall --daemon ]
Aug 25 14:48:07 saf04 logger: MATHOG use at to kill mountall
Aug 25 14:48:10 saf04 logger: MATHOG2 marunning? [ root 400 1 0 14:48 ? 00:00:00 mountall --daemon ]
Aug 25 14:48:10 saf04 logger: MATHOG2 trying sigusr1 on mountall from at job
Aug 25 14:48:10 saf04 logger: MATHOG2 kstat 0 marunning STILL? [ root 400 1 0 14:48 ? 00:00:00 mountall --daemon ]
Aug 25 14:48:10 saf04 kernel: [ 20.345629] RPC: Registered udp transport module.
Aug 25 14:48:10 saf04 kernel: [ 20.345632] RPC: Registered tcp transport module.
Aug 25 14:48:10 saf04 kernel: [ 20.345633] RPC: Registered tcp NFSv4.1 backchannel transport module.

That is, when the glitch occurs mountall is still running at the time rc.local executes; once that is detected
the at job is sent off, and that finally triggers the NFS mount. Obviously you can dispense with the MATHOG tags;
they are just in there for my debugging.

Oh yes, there was also a bug in "at" that kept it from accepting a job. The solution for that was to uninstall and reinstall
"at" with:

apt-get remove --purge at
apt-get install ubuntu-standard

I wish I could rest easy with this workaround, but the man page for mountall says:

       This is a temporary tool until init(8) itself gains the necessary flexibility to perform this processing;
       you should not rely on its behaviour.

Consequently I'm looking forward to all of this falling apart again when they finally get around to that.