sudo service statd start does not return

Bug #555661 reported by Brian J. Murrell
This bug affects 2 people
Affects: nfs-utils (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

When I try to start rpc.statd on my Karmic machine, it simply hangs indefinitely (i.e. until I issue a ^C (SIGINT)):

$ sudo status statd
statd stop/killed, process 1427
$ sudo service statd stop
stop: Job has already been stopped: statd
$ sudo service statd start
[ hangs forever ]

ProblemType: Bug
Architecture: i386
Date: Mon Apr 5 07:25:41 2010
DistroRelease: Ubuntu 9.10
NonfreeKernelModules: nvidia
Package: nfs-common 1:1.2.0-2ubuntu8
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-19.56-generic
SourcePackage: nfs-utils
Uname: Linux 2.6.31-19-generic i686

Revision history for this message
Brian J. Murrell (brian-interlinx) wrote :

I should add that starting it manually works just fine:

# rpc.statd -L
# mount /home/brian
# df -h /home/brian
Filesystem Size Used Avail Use% Mounted on
pc:/home/brian 31G 26G 4.0G 87% /autohome/brian

Steve Langasek (vorlon) wrote :

> $ sudo status statd
> statd stop/killed, process 1427

What is process 1427?

Changed in nfs-utils (Ubuntu):
status: New → Incomplete

Brian J. Murrell (brian-interlinx) wrote : Re: [Bug 555661] Re: sudo service statd start does not return

On Mon, 2010-04-05 at 23:19 +0000, Steve Langasek wrote:
> > $ sudo status statd
> > statd stop/killed, process 1427
>
> What is process 1427?

Nothing:

# ps -p1427
  PID TTY TIME CMD

Steve Langasek (vorlon) wrote :

Well, upstart is wedged then. You'll need to reboot to reset it. I can't say if this is an nfs-common or upstart bug without a reproducible test case.

Brian J. Murrell (brian-interlinx) wrote :

OK. This Karmic machine rebooted (due to power outage, unfortunately) and has done it again. No rpc.statd was running after it was booted and when I check it:

# status statd
statd start/running, process 1380
# ps -p1380 fw
  PID TTY STAT TIME COMMAND

So, upstart... you are a lying bastard. Well, let's see if we can fix you...

# service statd stop
[ command hangs forever... or at least the few tens of minutes I waited for it before I ^C'd it ]
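
The mismatch shown above (upstart reporting a PID that ps cannot find) can be checked mechanically. The following is only a rough sketch, not anything upstart ships: the function names tracked_pid, pid_alive and report_job are made up for illustration, and the parsing assumes the "JOB GOAL/STATE, process PID" line format seen in this report.

```shell
#!/bin/sh
# Extract the PID from an upstart status line such as
# "statd start/running, process 1380". Prints nothing if no PID is listed.
tracked_pid() {
    printf '%s\n' "$1" | sed -n 's/.*, process \([0-9][0-9]*\)$/\1/p'
}

# Succeed if the PID is present in the process table.
# kill -0 sends no signal; the /proc test covers processes we may not signal.
pid_alive() {
    [ -n "$1" ] || return 1
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

# Compare the job state upstart reports with the process table.
# $1: job name (used as a label only), $2: one line of `status JOB` output
report_job() {
    pid=$(tracked_pid "$2")
    if [ -z "$pid" ]; then
        echo "$1: no tracked PID"
    elif pid_alive "$pid"; then
        echo "$1: PID $pid exists"
    else
        echo "$1: upstart tracks PID $pid, but no such process exists"
    fi
}

# Status line copied from this report; the result depends on the local machine.
report_job statd "statd start/running, process 1380"
```

On a live system the second argument would presumably come from the real command, e.g. report_job statd "$(status statd)".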

Steve Langasek (vorlon) wrote :

Does statd log anything to syslog in this case?

"start/running" indicates that upstart believes it got as far as running the 'script' process. So we'll need to debug why this process isn't running after boot.

Brian J. Murrell (brian-interlinx) wrote :

On Mon, 2010-04-19 at 06:24 +0000, Steve Langasek wrote:
> Does statd log anything to syslog in this case?

No. Per my last message, there is no running statd to log anything.
But "service statd stop" still hangs.

> "start/running" indicates that upstart believes it got as far as running
> the 'script' process.

Right.

> So we'll need to debug why this process isn't
> running after boot.

OK. But I think there is a lower-layer issue at play here. Why should
"service statd stop" hang, especially if the statd process is not even
running? Hrm. But now (after a few more stop/start attempts) it does
seem to acknowledge that it's stopped:

# status statd
statd stop/killed, process 1380
# service statd stop
stop: Job has already been stopped: statd

So let's try to start it:

# service statd start
[ hangs, until I give up and ^C ]
^C
# status statd
statd start/killed, process 1380

My /etc/init/statd.conf:

# statd - NSM status monitor

description "NSM status monitor"
author "Steve Langasek <email address hidden>"

start on (started portmap or mount TYPE=nfs)
stop on stopping portmap

expect fork
respawn

env DEFAULTFILE=/etc/default/nfs-common

pre-start script
 exec 2>/tmp/statd-pre.conf.debug.$$
 set -x
 if [ -f "$DEFAULTFILE" ]; then
     . "$DEFAULTFILE"
 fi

 [ "x$NEED_STATD" != xno ] || { stop; exit 0; }

 start portmap || true
 status portmap | logger -p local0.debug -t NFS
 status portmap | grep -q start/running # || start portmap
 exec sm-notify
end script

script
 exec 2>/tmp/statd.conf.debug.$$
 set -x

 if [ -f "$DEFAULTFILE" ]; then
     . "$DEFAULTFILE"
 fi

 if [ "x$NEED_STATD" != xno ]; then
  ls -l /var/lib/nfs >&2 || true
  exec rpc.statd -L $STATDOPTS
 fi
end script

Notice the exec/set -x at the start of the pre-start and script
sections. That's an effort to debug this. However, even after many
attempted start/stop operations since this machine has been booted, not
a single /tmp/statd{,-pre}.conf.debug.$$ file has appeared, beyond those
that were written at boot time:

-rw-r--r-- 1 root root 233 2010-04-17 19:09 /tmp/statd-pre.conf.debug.1252
-rw-r--r-- 1 root root 304 2010-04-17 19:09 /tmp/statd-pre.conf.debug.1345
-rw-r--r-- 1 root root 291 2010-04-17 19:09 /tmp/statd.conf.debug.1379

So it seems clear to me that the start/stop operations are not even
getting as far as executing the pre-start and script sections.

The content of those files (i.e. what init tried to do at boot time):

# cat /tmp/statd-pre.conf.debug.1252
+ [ -f /etc/default/nfs-common ]
+ . /etc/default/nfs-common
+ NEED_STATD=
+ STATDOPTS=
+ NEED_IDMAPD=no
+ NEED_GSSD=no
+ [ x != xno ]
+ start portmap
+ status portmap
+ logger -p local0.debug -t NFS
/dev/fd/10: 1: logger: not found

# cat /tmp/statd-pre.conf.debug.1345
+ [ -f /etc/default/nfs-common ]
+ . /etc/default/nfs-common
+ NEED_STATD=
+ STATDOPTS=
+ NEED_IDMAPD=no
+ NEED_GSSD=no
+ [ x != xno ]
+ start portmap
start: Job is already running: portmap
+ true
+ status portmap
+ logger -p local0.debug -t NFS
+ status portmap
+ grep -q start/running
+ exec sm-notify

# cat /tmp/statd.conf.debug.1379
+ [ -f /etc/default/nfs-common ]
+ . /etc/default/nfs-common
+ NEED_STATD=
+ STATDOPTS=
+ NEED_IDMAPD=no
+...


Steve Langasek (vorlon) wrote :

> No. Per my last message, there is no running statd to log anything.

Yes, I was asking if it logged anything before it died.

> But "service statd stop" still hangs.

Yes, because upstart has been confused into thinking it's running when it isn't.

> if [ "x$NEED_STATD" != xno ]; then
> ls -l /var/lib/nfs >&2 || true
> exec rpc.statd -L $STATDOPTS
> fi

Er, this will *definitely* fail. Because this script is marked 'expect fork', it will track the first child process that's forked off by the script - in this case, ls - and consider *that* to be the service. Why is this 'ls' command here?

(This is bug #406397)
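
To see why the 'ls' matters, here is a small stand-alone demonstration of the PID behaviour involved. It is only an illustration of fork versus exec in a shell, not of upstart's tracking code itself; the function name fork_vs_exec is made up, and the inner "sh -c" commands merely stand in for ls and rpc.statd.

```shell
#!/bin/sh
# Print three PIDs: the script shell, a forked child, and the exec'd command.
fork_vs_exec() {
    sh -c '
        echo "script $$"
        # An ordinary command runs in a forked child with a new PID,
        # as the ls in the quoted job does.
        sh -c "echo child \$\$"
        # exec replaces the shell in place, so the PID is unchanged,
        # as with exec rpc.statd.
        exec sh -c "echo service \$\$"
    '
}

fork_vs_exec
```

The "child" line shows a different PID from the "script" line, while the "service" line shows the same one. So a script that only sources the defaults file and then execs the daemon does not fork on its own, whereas any ordinary command run before the exec does. Moving such diagnostics into the pre-start script would probably sidestep the problem, on the assumption that 'expect fork' only applies to the main script.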

> statd: Could not chdir: No such file or directory

> What's in that last script is important and probably the root cause as
> far as the boot time failure and that's that /var is on it's own
> filesystem. I have another bug open about this issue.

Ah; that'll be the root cause of your boot-time failure, then. What's the bug number for this issue? It should be a bug on nfs-utils.

> Having statd start on (started portmap or mount TYPE=nfs) is not good
> enough. It has to wait for /var to be mounted. But does that mean it
> should just add a dependency on (local?) filesystems being mounted?
> That sounds too coarse to me.

There are two conflicting use cases here - one where /var is a separate local filesystem, and one where root is on NFS. When using nfsroot, we *can't* wait for the 'local-filesystems' event, as doing so blocks the root filesystem from ever being set up correctly by mountall. (We just fixed a bug in portmap wrt this - bug #537133).

I think the case where the system unrecoverably hangs on boot (the nfsroot case) has to take precedence here. For your case, you should be able to edit /etc/init/statd.conf as you describe to be 'start on portmap and mounted MOUNTPOINT=/var'; except that this won't stop mountall from trying to mount NFS mounts in parallel at boot time, so if you have such mounts that will fail if statd isn't running, that still doesn't solve your problem.

Brian J. Murrell (brian-interlinx) wrote :

On Tue, 2010-04-20 at 13:37 +0000, Steve Langasek wrote:
>
> Yes, I was asking if it logged anything before it died.

Doesn't appear to have.

> Yes, because upstart has been confused into thinking it's running when
> it isn't.

But why shouldn't upstart be able to determine that it's not actually
running and either walk through the stop anyway, or noop it, rather than
hanging indefinitely in the stop operation? That's the real nature of
this particular bug.

> > if [ "x$NEED_STATD" != xno ]; then
> > ls -l /var/lib/nfs >&2 || true
> > exec rpc.statd -L $STATDOPTS
> > fi
>
> Er, this will *definitely* fail. Because this script is marked 'expect
> fork', it will track the first child process that's forked off by the
> script - in this case, ls - and consider *that* to be the service.

~bleah~

> Why
> is this 'ls' command here?

To try to get more clarification on this /var and statd racing issue.

> Ah; that'll be the root cause of your boot-time failure, then. What's
> the bug number for this issue?

#525154.

> It should be a bug on nfs-utils.

It is. I posted some requested info back in Feb. but nothing was done
with it.

> There are two conflicting use cases here - one where /var is a separate
> local filesystem, and one where root is on NFS. When using nfsroot, we
> *can't* wait for the 'local-filesystems' event, as doing so blocks the
> root filesystem from ever being set up correctly by mountall.

Right. I did not advocate waiting for local-filesystem but rather
waiting for "mounted=/var/lib/nfs" (something that does not currently
exist afaik, hence my explanation as to how I thought such a thing would
work).

The nfsroot is a somewhat special case though in that you don't really
mount the nfsroot as / but typically you mount it somewhere else and
then pivot to it. I'm not really sure where the mountall runs in all of
that but I would think after the pivot is done during normal processing
of a post-/ mount. If /var is expected to be nfs mounted at that point
also, then I think you are stuck as you say in
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/525154/comments/3. So maybe this use case is simply invalid. I'm really not sure TBH. Although it would be a shame as I can see reasons for such a use case.

> I think the case where the system unrecoverably hangs on boot (the
> nfsroot case) has to take precedence here. For your case, you should be
> able to edit /etc/init/statd.conf as you describe to be 'start on
> portmap and mounted MOUNTPOINT=/var'; except that this won't stop
> mountall from trying to mount NFS mounts in parallel at boot time, so if
> you have such mounts that will fail if statd isn't running, that still
> doesn't solve your problem.

Yeah. And what happens in the case where /var is not separate but on /?
Will that mounted MOUNTPOINT=/var cause the statd init job to not run
because there is no /var to be mounted? If so, we have not really
solved the problem in a universal and generic method.

Steve Langasek (vorlon) wrote :

On Tue, Apr 20, 2010 at 02:10:56PM -0000, Brian J. Murrell wrote:
> > Yes, because upstart has been confused into thinking it's running when
> > it isn't.

> But why shouldn't upstart be able to determine that it's not actually
> running and either walk through the stop anyway, or noop it, rather than
> hanging indefinitely in the stop operation? That's the real nature of
> this particular bug.

That's the nature of an existing bug, already filed against the upstart
package. If you want, I can mark this bug as a duplicate of that one; but
that doesn't help with the very real bug in nfs-utils regarding the /var
race condition.

> > There are two conflicting use cases here - one where /var is a separate
> > local filesystem, and one where root is on NFS. When using nfsroot, we
> > *can't* wait for the 'local-filesystems' event, as doing so blocks the
> > root filesystem from ever being set up correctly by mountall.

> Right. I did not advocate waiting for local-filesystem but rather
> waiting for "mounted=/var/lib/nfs" (something that does not currently
> exist afaik, hence my explanation as to how I thought such a thing would
> work).

You can, as a local admin, modify your /etc/init/statd.conf to set 'start on
mounted MOUNTPOINT=/var'. There indeed is not a way to specify this that
will work for arbitrary paths that may or may not actually be mountpoints
(including /var), because we only get 'mounted' events for actual
filesystems. It *would* be nice to be able to specify in the
default jobs that the job waits for a particular path, and I've asked Scott
James Remnant for this in the past - but only in passing and at a much lower
priority than a number of other critical bugs related to mountall this
cycle, because it would require a two-way negotiation for upstart to let
mountall know which particular points in the path it needs to send
notifications for.

So in short, we shouldn't let resolution of this bug block on the
availability of such a feature.
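
Since 'mounted' events are only sent for actual filesystems, whether 'start on mounted MOUNTPOINT=/var' can ever fire on a given machine depends on /var being a real mount point there. A rough way to check, assuming a Linux /proc/mounts table (the helper name is_mountpoint and its optional second argument are made up for this sketch):

```shell
#!/bin/sh
# Succeed if the path appears as a mount point in a mounts table.
# $1: path to check
# $2: file in /proc/mounts format (optional, defaults to /proc/mounts)
is_mountpoint() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mountpoint /var; then
    echo "/var is a separate filesystem; a 'mounted MOUNTPOINT=/var' event is possible"
else
    echo "/var is not a mount point; no 'mounted' event would be sent for it"
fi
```

This only inspects the current mount table, so it says nothing about ordering at boot; it is meant as a quick sanity check before editing statd.conf by hand.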

> The nfsroot is a somewhat special case though in that you don't really
> mount the nfsroot as / but typically you mount it somewhere else and
> then pivot to it. I'm not really sure where the mountall runs in all of
> that but I would think after the pivot is done during normal processing
> of a post-/ mount. If /var is expected to be nfs mounted at that point
> also, then I think you are stuck as you say in
> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/525154/comments/3. So maybe this use case is simply invalid. I'm really not sure TBH. Although it would be a shame as I can see reasons for such a use case.

mountall runs after init starts. nfsroot may or may not be done using an
initramfs - there's in-kernel support for nfsroot using static IP
configuration. But we can reasonably assume that *if* someone is using
nfsroot, then /var/lib is on the root filesystem, because that's the only
way to make this work even pre-upstart. The problem lies entirely in trying
to express a single job that works both for nfsroot and for non-nfsroot with
/var as a separate partition.

In the case of portmap, there's no trade-off; "start on virtual" is always
correct. In the ...


Steve Langasek (vorlon) wrote :

> So for lucid, I'm still inclined to update the statd job to 'start on
> local-filesystems'. Possibly 'start on (local-filesystems and mounting
> TYPE=nfs)' - if that doesn't cause NFS mount attempts after the first one to
> deadlock in mountall/upstart. I'll have to test this and propose it as an
> SRU if it checks out.

Ah, in fact that causes a deadlock in mountall/upstart even before NFS
mounts are attempted. So 'start on local-filesystems' is as close as we can
probably get for lucid.

--
Steve Langasek
Debian Developer, Ubuntu Developer
Give me a lever long enough and a Free OS to set it on, and I can move the world.
http://www.debian.org/

Brian J. Murrell (brian-interlinx) wrote :

On Sat, 2010-04-24 at 22:49 +0000, Steve Langasek wrote:
> On Tue, Apr 20, 2010 at 02:10:56PM -0000, Brian J. Murrell wrote:
> > But why shouldn't upstart be able to determine that it's not actually
> > running and either walk through the stop anyway, or noop it, rather than
> > hanging indefinitely in the stop operation? That's the real nature of
> > this particular bug.
>
> That's the nature of an existing bug, already filed against the upstart
> package. If you want, I can mark this bug as a duplicate of that one;

If that same bug is what is keeping a "sudo service statd" from
returning, even after the machine is fully booted and all local
filesystems, including /var, are mounted, then I would think marking as a
duplicate is fine.

> but
> that doesn't help with the very real bug in nfs-utils regarding the /var
> race condition.

Indeed, but that is bug #525154, so we can carry on that discussion
there, yes?

> You can, as a local admin, modify your /etc/init/statd.conf to set 'start on
> mounted MOUNTPOINT=/var'.

Yeah, but as I'm sure you understand, that doesn't scale very well. :-(

> There indeed is not a way to specify this that
> will work for arbitrary paths that may or may not actually be mountpoints
> (including /var), because we only get 'mounted' events for actual
> filesystems.

Indeed, which I fully understand. Of course, my proposal was a "future"
for upstart/mountall.

> It *would* be nice to be able to specify in the
> default jobs that the job waits for a particular path, and I've asked Scott
> James Remnant for this in the past

Ahhh. So we are of like minds on this, which gives me a bit of
reassurance that I am not on crack thinking about that. :-)

> because it would require a two-way negotiation for upstart to let
> mountall know which particular points in the path it needs to send
> notifications for.

Indeed. I was not sure who did what in terms of reading the conf files
and whatnot, but I had fully realized that prior to mounting, mountall
(or somebody) would have to use the fstab and the /etc/init/*.conf files
to figure out all of the "path mounted" signals that are being waited
for and should be sent.

> So in short, we shouldn't let resolution of this bug block on the
> availability of such a feature.

Indeed, and I'd even be happy to just assume that /var/lib/nfs is
on /var (as that would be the majority of the cases -- where /var is
separate), and work with mounted MOUNTPOINT=/var, but that's
incompatible with those that just put everything in /.

Those seem like two incompatible cases and the admin that runs the
(IMHO) "properly" partitioned system is going to have to touch every one
of his systems to add the "mounted MOUNTPOINT=/var" dependency to his
statd.conf files.

I really know quite little about the workings of upstart at this point,
but I wonder if some sort of temporary, outboard-of-mountall job can be
written to emulate the functionality of scanning the .conf files for
wanted paths, as well as the fstab and emitting signals when a mounted
filesystem makes wanted paths available.
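
The core of such a job would be mapping a wanted path to the filesystem that provides it. Below is a minimal sketch of just that lookup, under stated assumptions: the function name covering_mountpoint is made up, the input is taken to be in fstab format, and the device names in the sample are placeholders. Scanning the .conf files and emitting events is left out.

```shell
#!/bin/sh
# Print the longest fstab mount point that contains the wanted path.
# $1: wanted path, e.g. /var/lib/nfs
# $2: fstab-format file, or - for stdin (optional, defaults to /etc/fstab)
covering_mountpoint() {
    awk -v want="$1" '
        /^[ \t]*#/ || NF < 2 { next }       # skip comments and blank lines
        {
            mp = $2
            # Match on whole path components so /var does not match /variable.
            if (mp == "/") prefix = "/"; else prefix = mp "/"
            if ((want == mp || index(want, prefix) == 1) && length(mp) > length(best))
                best = mp
        }
        END { if (best != "") print best }
    ' "${2:-/etc/fstab}"
}

# Sample table with /var on its own filesystem (device names are placeholders).
covering_mountpoint /var/lib/nfs - <<'EOF'
# device   mountpoint  type  options   dump pass
/dev/sda1  /           ext3  defaults  0    1
/dev/sda2  /var        ext3  defaults  0    2
EOF
```

With the sample table this prints /var; with the /var line removed it would print /, which is the "everything in /" case. A job waiting for /var/lib/nfs could then be released once the mount point printed here has been mounted.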

So, for the case of /var being a separate filesystem, given that my
statd.conf already has:

      ...


Launchpad Janitor (janitor) wrote :

[Expired for nfs-utils (Ubuntu) because there has been no activity for 60 days.]

Changed in nfs-utils (Ubuntu):
status: Incomplete → Expired