statd incompatible with /var as a separate filesystem

Bug #1371564 reported by Marius Gedminas
This bug affects 3 people
Affects: nfs-utils (Ubuntu)
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a server that stubbornly refuses to mount NFS filesystems on boot. I upgraded it to 14.04 and the problem persists. This time I dug deeper and discovered that the mount upstart jobs are blocked waiting for the statd-mounting jobs, which are in turn waiting for statd to come up, but statd is in the "stop/waiting" state and never starts.

/var/log/syslog shows

    Sep 19 12:21:58 muskatas kernel: [ 9.356501] init: statd main process (1268) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.356511] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.363809] init: statd main process (1272) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.363819] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.370785] init: statd main process (1276) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.370795] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.382239] init: statd main process (1281) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.382250] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.394097] init: statd main process (1285) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.394107] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.400026] init: statd main process (1289) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.400037] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.411247] init: statd main process (1293) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.411258] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.421803] init: statd main process (1297) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.421813] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.429929] init: statd main process (1302) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.429939] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.442795] init: statd main process (1306) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.442805] init: statd main process ended, respawning
    Sep 19 12:21:58 muskatas kernel: [ 9.457698] init: statd main process (1310) terminated with status 1
    Sep 19 12:21:58 muskatas kernel: [ 9.457708] init: statd respawning too fast, stopped

If I 'sudo start statd' after logging in, the statd-mounting jobs are terminated and mountall proceeds to mount the NFS filesystems.

I'm not sure how to debug this further.

Steve Langasek (vorlon) wrote :

What are the contents of /var/log/upstart/statd.log?

What are the contents of /etc/default/nfs-common?

Changed in nfs-utils (Ubuntu):
status: New → Incomplete
Marius Gedminas (mgedmin) wrote :

Aargh, I keep forgetting about /var/log/upstart/!

It's not very informative:

    $ sudo cat /var/log/upstart/statd.log
    UPSTART_EVENTS =

Perhaps this is because statd came up fine when I started it manually, some time after boot? I tried rebooting again, but the contents of that file did not change.

    $ cat /etc/default/nfs-common
    # If you do not set values for the NEED_ options, they will be attempted
    # autodetected; this should be sufficient for most people. Valid alternatives
    # for the NEED_ options are "yes" and "no".

    # Do you want to start the statd daemon? It is not needed for NFSv4.
    NEED_STATD=

    # Options for rpc.statd.
    # Should rpc.statd listen on a specific port? This is especially useful
    # when you have a port-based firewall. To use a fixed port, set this
    # this variable to a statd argument like: "--port 4000 --outgoing-port 4001".
    # For more information, see rpc.statd(8) or http://wiki.debian.org/SecuringNFS
    STATDOPTS=

    # Do you want to start the gssd daemon? It is required for Kerberos mounts.
    NEED_GSSD=

By the way, it seems that I am using NFSv4, which shouldn't need statd at all according to that comment. My /etc/fstab contains

    fridge:/home /home nfs auto,rw,hard,intr,_netdev 0 0
    fridge:/stuff /fridge/stuff nfs auto,rw,hard,intr,_netdev 0 0

but when I 'sudo mount -a' to force NFS mounts, /proc/mounts lists these as 'nfs4'.
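
For anyone wanting to check the same thing on their own machine, here is a minimal sketch that lists NFS mounts together with the filesystem type the kernel reports. The helper name `nfs_mount_types` is made up for illustration, and the code assumes the usual /proc/mounts field order (device, mount point, type, options).

```shell
# nfs_mount_types [MOUNTS_FILE]
# Print "mountpoint fstype" for every NFS mount listed in MOUNTS_FILE
# (default: /proc/mounts). Hypothetical helper, not part of nfs-utils.
nfs_mount_types() {
    mounts_file="${1:-/proc/mounts}"
    # Field 2 is the mount point, field 3 the filesystem type.
    awk '$3 == "nfs" || $3 == "nfs4" { print $2, $3 }' "$mounts_file"
}
```

Running `nfs_mount_types` with no argument reads the live /proc/mounts; a type of 'nfs4' means the mount was negotiated as NFSv4 even if /etc/fstab says 'nfs'.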

Marius Gedminas (mgedmin) wrote :

I've just noticed

    init: portmap main process (1198) terminated with status 127
    init: portmap main process ended, respawning
    init: portmap post-start process (1202) terminated with status 1

in /var/log/syslog, about half a second before statd's startup failures. This might explain them.

Again, no indication why portmap would fail. There's no /var/log/upstart/portmap.log. When I ssh in after boot, 'status portmap' shows 'start/running'. /var/log/boot.log says

     * Starting Upstart job to start rpcbind on boot only[ OK ]
     * Starting Upstart job to start portmap on boot only[ OK ]
     * Stopping Upstart job to start rpcbind on boot only[ OK ]
     * Stopping Upstart job to start portmap on boot only[ OK ]
     * Stopping Mount network filesystems[ OK ]
     * Starting Bridge socket events into upstart[ OK ]
     * Starting RPC port mapper[ OK ]
     * Starting RPC portmapper replacement[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[ OK ]
     * Starting NSM status monitor[fail]
     * Stopping NSM status monitor[ OK ]

which indicates no portmap failures, although it's a bit strange to see both 'RPC port mapper' and 'RPC portmapper replacement'.

(This server has been running Ubuntu since 7.10; who knows what obsolete packages and config files might have accumulated during all those upgrades.)

Marius Gedminas (mgedmin) wrote :

The portmap failure might be a red herring: the 'portmap' package is not installed. /etc/init/portmap.conf exists because the package was removed (automatically during one of the upgrades) but not purged. I've no idea why Upstart says

    # status portmap
    portmap start/running

since there's no portmap process running. (rpcbind is installed and running.)

Marius Gedminas (mgedmin) wrote :

I've purged the remains of the 'portmap' package and rebooted. There are no more mentions of 'portmap' in /var/log/syslog. statd continues to fail in the same fashion.

I'm attaching the entire syslog since last boot, just in case. There's something else I noticed that might be relevant: local filesystems are mounted *after* statd fails to come up.

    Sep 20 12:03:52 muskatas kernel: [ 9.614549] init: statd respawning too fast, stopped
    Sep 20 12:03:52 muskatas kernel: [ 10.204317] EXT4-fs (sdb1): re-mounted. Opts: errors=remount-ro
    Sep 20 12:03:52 muskatas kernel: [ 10.767575] EXT4-fs (sdb6): mounted filesystem with ordered data mode. Opts: (null)
    Sep 20 12:03:52 muskatas kernel: [ 10.781818] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
    Sep 20 12:03:52 muskatas kernel: [ 10.809712] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
    Sep 20 12:03:52 muskatas kernel: [ 10.874940] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
    Sep 20 12:03:52 muskatas kernel: [ 11.036904] systemd-udevd[1441]: failed to execute '/lib/udev/socket:/org/freedesktop/hal/udev_event' 'socket:/org/freedesktop/hal/udev_event': No such file or directory
    Sep 20 12:03:52 muskatas kernel: [ 11.126807] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)
    Sep 20 12:03:52 muskatas kernel: [ 11.552037] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)

sdb1 is root, sdb2 is /usr, sdb3 is /var, dm-1 is /tmp.

Steve Langasek (vorlon) wrote :

statd should not depend on any filesystems other than the root filesystem and the virtual filesystems. If it does, that's a regression in statd.

I do notice that '$ strings /sbin/rpc.statd |grep /' includes a reference to '/var/lib/nfs', so this could be the problem.

Two different things you could try for debugging:

 - create a /var/lib/nfs directory on your root filesystem (sudo mount -obind / /mnt; sudo mkdir -p /mnt/var/lib/nfs; sudo umount /mnt)
 - set STATDOPTS="-F -d" in /etc/default/nfs-common

The first of these will let you debug whether the mount ordering is the cause of the failure. The second should get you more useful output in /var/log/upstart/statd.log. (But you will need to undo it once you're done debugging, since this will interfere with upstart's service readiness detection for statd.)
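
To confirm that /var really is a separate mount before trying the first suggestion, something along these lines might help. This is only a sketch: `is_mount_point` is a made-up helper name, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
# is_mount_point MOUNTPOINT [MOUNTS_FILE]
# Succeeds (exit 0) if MOUNTPOINT appears as a mount point in MOUNTS_FILE
# (default: /proc/mounts). Hypothetical helper, not part of nfs-utils.
is_mount_point() {
    mp="$1"
    mounts_file="${2:-/proc/mounts}"
    # Field 2 of each line is the mount point; exit 0 only if one matches.
    awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}
```

For example, `is_mount_point /var && echo "/var is a separate filesystem"` prints the message only when /var has its own entry in the mount table.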

Steve Langasek (vorlon) wrote :

FWIW I've confirmed locally with strace that statd is looking in /var/lib/nfs (/var/lib/nfs/sm; /var/lib/nfs/state) at start-up. I'm surprised that this is the case; I didn't explicitly test with /var on a separate partition when preparing these upstart jobs, but I think I would've checked for references to /var/lib/nfs in the code. At any rate, this behavior seems to have been in place since at least 12.04.

Given the requirement that this data persist across reboots (in fact, that's more or less the entire purpose of rpc.statd), I don't have a good solution for making this work with a separate /var partition at the moment.

Changed in nfs-utils (Ubuntu):
status: Incomplete → Triaged
summary: - statd fails to come up on boot
+ statd incompatible with /var as a separate filesystem
Marius Gedminas (mgedmin) wrote :

For the record, a sufficient workaround for me is to set NEED_STATD=no in /etc/default/nfs-common, since NFSv4 doesn't need statd.
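
A minimal sketch of applying that setting from a script, assuming the file already contains a NEED_STATD= line as shown earlier in this report. The helper name `set_need_statd_no` is hypothetical; it takes the path as an argument so it can be tried on a copy first.

```shell
# set_need_statd_no CONF_FILE
# Rewrite the NEED_STATD assignment in CONF_FILE to "no" and leave every
# other line untouched. Hypothetical helper, not part of nfs-utils.
set_need_statd_no() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    if sed 's/^NEED_STATD=.*/NEED_STATD=no/' "$conf" > "$tmp"; then
        # Write back with cat so the file keeps its owner and permissions.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

On a real system this would be run as root against /etc/default/nfs-common, followed by a reboot to check that the NFS filesystems mount on boot.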

Maciej Puzio (maciej-puzio) wrote :

For a system with a separate /var partition, a workaround from 2010 appears to work:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/525154/comments/8

For Ubuntu 14.04, this workaround translates to:

    --- /etc/init/statd.conf.ORIG 2013-09-11 16:46:50.000000000 -0500
    +++ etc/init/statd.conf 2015-06-02 16:24:42.029659358 -0500
    @@ -12,7 +12,7 @@
     # TYPE=nfs is handled in the "statd-mounting" job.
     #
     start on (started portmap ON_BOOT=
    - or (virtual-filesystems and started portmap ON_BOOT=y))
    + or (virtual-filesystems and mounted MOUNTPOINT=/var and started portmap ON_BOOT=y))
     stop on stopping portmap

     expect fork
