Comment 6 for bug 264711

Jeff Oliver (jeffrey-oliver) wrote : Re: expect fork/daemon do not work as expected

Ok, I think I have something here. It feels like a major hack, but as far as I can tell it works. Here's what I did:

I used xinetd as my sample job with the following job file:

# xinetd -
#

start on started network
stop on stopping network

respawn
session leader
expect fork

exec /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid

***********

I made some changes to the code to handle what I was seeing when attempting to run this job:

1) Changed the tracing state machine.
Instead of using distinct states, I changed the trace_state variable to be a set of bits that indicate progress through the state machine. The field is zero if the state machine is inactive; bit 0 is set when it is active, bit 1 when we've hit the first exec, bit 2 when we've hit the fork in the parent process, and bit 3 when we've hit the fork in the child process. When bits 2 and 3 are both set, I figure the process has forked enough, and we move the job on to its next state. That seems to work fine, with only one really visible problem...

2) Changed how jobs are looked up in the pid list. The job hash table holds the list of pids that upstart is currently tracking. When a signal is caught from a child whose pid is not in the list, upstart ignores it. Unfortunately, in my situation I see a signal for a pid we're not tracking yet, because the signal that switches the job over to the new pid has not been processed yet. So I added a function that searches the job list for a pid's parent pid. If the parent is found, we handle that signal as well, since it's related to the job we're working on. There is another catch to that...

3) A place to keep the pid of the newly forked process. For that I added another job->pid[] entry to keep track of the newly forked pid. This seems awfully kludgy, but at the moment I can't think of a better way to deal with the problem. Somehow the tracing state machine needs to anticipate the pid of the new process, and it obviously cannot do that; it can only anticipate that there will be a pid at some point.
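
To make the three changes above a bit more concrete, here is a rough sketch in C. It is not the actual patch: the flag names, the job_sketch struct, and job_find() are all made up for illustration, and only the bit positions come from the description in 1). The extra pid slot is shown as a separate field rather than another job->pid[] entry, and the caller is assumed to have already looked up the parent pid somehow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flag names; only the bit positions come from the text. */
enum {
    TRACE_ACTIVE      = 1u << 0,  /* state machine is active */
    TRACE_EXEC        = 1u << 1,  /* first exec has been seen */
    TRACE_FORK_PARENT = 1u << 2,  /* fork seen in the parent process */
    TRACE_FORK_CHILD  = 1u << 3   /* fork seen in the child process */
};

/* True once both fork events have arrived, in either order. */
static bool trace_fork_complete(unsigned trace_state)
{
    const unsigned both = TRACE_FORK_PARENT | TRACE_FORK_CHILD;
    return (trace_state & both) == both;
}

/* Hypothetical, heavily simplified stand-in for a job record. */
struct job_sketch {
    pid_t    main_pid;     /* pid currently being tracked */
    pid_t    forked_pid;   /* extra slot for the newly forked pid, 0 if unknown */
    unsigned trace_state;  /* bit set described above */
};

/*
 * Find the job a signal belongs to. A direct match on either pid slot
 * wins; otherwise fall back to matching the parent pid, which the caller
 * is assumed to have obtained already (how is outside this sketch).
 */
static struct job_sketch *job_find(struct job_sketch *jobs, size_t njobs,
                                   pid_t pid, pid_t ppid)
{
    for (size_t i = 0; i < njobs; i++) {
        if (jobs[i].main_pid == pid ||
            (jobs[i].forked_pid != 0 && jobs[i].forked_pid == pid))
            return &jobs[i];
    }
    for (size_t i = 0; i < njobs; i++) {
        if (jobs[i].main_pid == ppid)
            return &jobs[i];
    }
    return NULL;
}
```

With this numbering, active + exec + child fork is 0xb and all four bits is 0xf, which matches the trace_state=b and trace_state=f lines in the log below.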

In any case, it seems to work the way it ought to. I can start a forking process, let it fork, then kill it and see it get restarted automatically:

2008-10-15T21:23:44.061227+00:00 fs03 xinetd[19501]: Exiting...
2008-10-15T21:23:44.061624+00:00 fs03 init: waitid - pid=19501, signo=17, code=1, status=0
2008-10-15T21:23:44.061658+00:00 fs03 init: xinetd main process (19501) exited normally
2008-10-15T21:23:44.061691+00:00 fs03 init: xinetd main process ended, respawning
2008-10-15T21:23:44.061724+00:00 fs03 init: xinetd state changed from running to stopping
2008-10-15T21:23:44.061756+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-15T21:23:44.061787+00:00 fs03 init: Handling stopping event
2008-10-15T21:23:44.061820+00:00 fs03 init: xinetd state changed from stopping to killed
2008-10-15T21:23:44.061853+00:00 fs03 init: xinetd state changed from killed to post-stop
2008-10-15T21:23:44.061885+00:00 fs03 init: xinetd state changed from post-stop to starting
2008-10-15T21:23:44.061916+00:00 fs03 init: Handling starting event
2008-10-15T21:23:44.061948+00:00 fs03 init: xinetd state changed from starting to pre-start
2008-10-15T21:23:44.062434+00:00 fs03 init: xinetd state changed from pre-start to spawned
2008-10-15T21:23:44.062646+00:00 fs03 init: xinetd main process (20291)
2008-10-15T21:23:44.062783+00:00 fs03 init: waitid - pid=20291, signo=17, code=4, status=5
2008-10-15T21:23:44.062880+00:00 fs03 init: trace_new pid=20291
2008-10-15T21:23:44.063783+00:00 fs03 init: waitid - pid=20292, signo=17, code=4, status=19
2008-10-15T21:23:44.063942+00:00 fs03 init: trace_new_child pid=20292
2008-10-15T21:23:44.065006+00:00 fs03 xinetd[20292]: xinetd Version 2.3.12 started with libwrap loadavg options compiled in.
2008-10-15T21:23:44.065038+00:00 fs03 xinetd[20292]: Started working: 1 available service
2008-10-15T21:23:44.065038+00:00 fs03 init: trace_state=b
2008-10-15T21:23:44.065286+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-15T21:23:44.065391+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-15T21:23:44.065501+00:00 fs03 init: waitid - pid=20291, signo=17, code=4, status=261
2008-10-15T21:23:44.065604+00:00 fs03 init: xinetd main process (20291) became new process (20292)
2008-10-15T21:23:44.065699+00:00 fs03 init: trace_state=f
2008-10-15T21:23:44.065795+00:00 fs03 init: xinetd state changed from spawned to post-start
2008-10-15T21:23:44.065891+00:00 fs03 init: xinetd state changed from post-start to running
2008-10-15T21:23:44.065995+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-15T21:23:44.066042+00:00 fs03 init: Handling started event
2008-10-15T21:23:44.066233+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-15T21:23:44.066412+00:00 fs03 init: waitid - pid=20291, signo=17, code=1, status=0
2008-10-15T21:23:44.066521+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0