upstart (/sbin/init) fails to start many /etc/init and /etc/rc2.d files

Bug #581291 reported by Mike Bianchi
This bug affects 3 people
Affects            Status      Importance  Assigned to  Milestone
NULL Project       Invalid     Undecided   Unassigned
mountall (Ubuntu)  Incomplete  Medium      Unassigned

Bug Description

A fresh install of Ubuntu 10.04 was failing to start many of the items in /etc/init and /etc/rc2.d, or at least that was the symptom.
Example: at the end of boot the getty processes on /dev/tty[1-6] were _often_ not present.
Example: my home-made entries in /etc/rc2.d apparently did not run (and they have worked for years).

What's different? This is my first AMD 64-bit quad-core machine and it runs very much faster than anything I've ever had before.

(Many days of debugging.)

I think the problem is that Upstart's tendency to run many things in parallel means that subtle dependencies can sometimes be exposed as race conditions. For example, if event A completes before event B starts, all is well, but the other order causes problems.
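The ordering problem described above can be shown with a toy model. This is only a sketch and does not model Upstart itself: the job names A and B, the "console" resource, and the function run_boot are all hypothetical, chosen to show how an undeclared dependency passes in one order and fails in the other.

```python
def run_boot(order):
    """Run the toy jobs in the given order; return the names of jobs that failed."""
    ready = set()   # resources that exist so far
    failed = []

    def job_a():
        ready.add("console")        # A sets up the (hypothetical) console
        return True

    def job_b():
        return "console" in ready   # B silently assumes A has already run

    jobs = {"A": job_a, "B": job_b}
    for name in order:
        if not jobs[name]():
            failed.append(name)
    return failed


print(run_boot(["A", "B"]))  # A completes first: nothing fails
print(run_boot(["B", "A"]))  # B runs first: B fails
```

With a declared dependency, B would wait for A regardless of which one happened to be scheduled first; without one, the outcome depends on timing.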

My workaround has been to add init='/sbin/init --verbose' to the linux /boot/vmlinuz/... line in /boot/grub/grub.cfg by
making my own menuentry item in /etc/grub.d/08_custom .
See man 8 init .
This seems to have the effect of slowing down Upstart, and possibly single-threading some of the startups (because they are held back by the queue of messages to the console?).
((Once again, adding the debugging printout _prevents_ the problem you are trying to debug.))

At this point I'm happy to just leave the --verbose option in, since now my machine seems to boot reliably.

Suggestion: Upstart needs a formal --single-thread option to get around just this sort of problem. Give me a slow boot that always works any day!

I've marked the security vulnerability box because any security boot script that does not run properly could be a problem.

Mike Bianchi (mbianchi-foveal) wrote :
visibility: private → public
Scott James Remnant (Canonical) (canonical-scott) wrote :

Please provide more evidence of these "failures", for example error logs, /var/log/boot.log, etc. Read http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

Upstart is designed to *prevent* race conditions; if you are experiencing some, we'd like to fix them. Upstart does not need a "single thread" option.

Changed in upstart:
status: New → Incomplete
affects: upstart → null
Changed in null:
status: Incomplete → Invalid
Changed in upstart (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Mike Bianchi (mbianchi-foveal) wrote :

I am unable to provide the failure evidence, beyond observations such as "ps axf does not show any getty processes", because upstart was not putting any messages on the console or into log files. I don't see any logging mechanism in /etc/init, probably because it expects failures to go to dmesg or syslog. Before I turned --verbose on, the only init: messages in syslog were for apport and plymouth. Afterwards there are lots of messages, but no problems.

See attachments syslog.old syslog.new .

Now that I think about it, the problem may have to do with /dev/console and syslog not having been properly started when
other things were happening on other cores(?).

Suggestion: How about a tool to create the tree of dependencies of /etc/init/*.conf ?
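A rough sketch of what such a tool could look like follows. It is a heuristic, not a real Upstart parser: it only reads the "start on" stanza of each *.conf file, follows parentheses across lines, and splits the condition on "and"/"or". The function names (start_on_text, dependencies, dependency_map) are made up for this example, and anything more complex than a simple condition may be misread.

```python
import re
from pathlib import Path

# Event names that are followed by a job name in a `start on` condition.
JOB_EVENTS = {"starting", "started", "stopping", "stopped"}


def start_on_text(conf_text):
    """Return the condition of the first `start on` stanza, or '' if none."""
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.split("#", 1)[0].strip()   # drop trailing comments
        match = re.match(r"start\s+on\b(.*)", stripped)
        if not match:
            continue
        text = match.group(1)
        depth = text.count("(") - text.count(")")
        j = i + 1
        # A parenthesised condition may continue on the following lines.
        while depth > 0 and j < len(lines):
            more = lines[j].split("#", 1)[0]
            text += " " + more
            depth += more.count("(") - more.count(")")
            j += 1
        return text
    return ""


def dependencies(conf_text):
    """Return (events, jobs) named in the `start on` condition."""
    clauses = re.split(r"[()]|\s(?:and|or)\s", start_on_text(conf_text))
    events, jobs = set(), set()
    for clause in clauses:
        words = clause.split()
        if not words:
            continue
        if words[0] in JOB_EVENTS and len(words) > 1 and "=" not in words[1]:
            jobs.add(words[1])      # e.g. "stopped rc" depends on job rc
        else:
            events.add(words[0])    # e.g. "runlevel [2345]" -> event runlevel
    return events, jobs


def dependency_map(init_dir="/etc/init"):
    """Map each job (file stem) to the (events, jobs) its `start on` names."""
    return {
        path.stem: dependencies(path.read_text(errors="replace"))
        for path in sorted(Path(init_dir).glob("*.conf"))
    }
```

Calling dependency_map() and printing the result gives a flat list of "job waits for these events and these other jobs", which could then be fed to a graph tool to draw the tree.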

  "Upstart does not need a "single thread" option."

At this point, I disagree. The evidence is that my boots are reliable now and before --verbose they were not.
I strongly suspect that the reason is that --verbose is enforcing an order that is not enforced when it is not present.
I _suspect_ that the problem is not in /sbin/init but in some subtle dependency in the /etc/init/*.conf files that is
expressed in my circumstance but not in others.

I suspect the circumstance has to do with very fast CPUs or multiple cores.

I don't think race conditions are preventable if they are not obvious to those writing the /etc/init/*.conf files.
Get a "start on" wrong and they will happen.

If --single-thread is the wrong answer, then I am asking for something that will be obvious to the next person
who has this problem.

Mike Bianchi (mbianchi-foveal) wrote :

I spoke too soon.

The init='/sbin/init --verbose' trick worked, until I accepted an update that was a new version of the mountall package.
Now I have the old behavior even with --verbose . Sigh.
I don't _know_ that mountall caused the problem. I was working on other stuff and only discovered the problem later.

Remember the old, old advice. If it is working, don't touch _anything_ !

Note that I tried both 2.6.32-22-generic and 2.6.32-21-generic, but that did not fix the problem.

Attached is a syslog that includes the working runs and then the broken ones.
I tried to comment it.

I think there is now evidence of the problem.

Look for when init: tty4 goal changed from stop to start ....

May 16 14:36:04 autoaud-broad1 kernel: [ 0.030000] CPU 3/0x3 -> Node 0
May 16 14:36:04 autoaud-broad1 kernel: [ 0.030000] CPU: Physical Processor ID: 0
May 16 14:36:04 autoaud-broad1 kernel: [ 0.030000] CPU: Processor Core ID: 3

>>>> tty4 is starting when the CPUs are not yet fully initialized?

May 16 14:36:04 autoaud-broad1 init: tty4 goal changed from stop to start

May 16 14:36:04 autoaud-broad1 kernel: [ 0.650088] CPU3: AMD Phenom(tm) II X4 955 Processor stepping 02
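To pull lines like these out of a large syslog, a small filter along the following lines may help. It is only an illustration: the regular expressions assume the classic syslog layout seen in the excerpt above, and the names scan and SAMPLE are invented for this sketch. One assumption worth verifying before drawing conclusions from line order: kernel messages from early boot are usually written to syslog in a batch once the logging daemon starts, so the bracketed kernel time (seconds since boot) may be a better guide to ordering than the position of a line in the file.

```python
import re

# "May 16 14:36:04 host tag: message"
LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<tag>[^:]+):\s(?P<msg>.*)$"
)
GOAL = re.compile(r"^(?P<job>\S+) goal changed from (?P<old>\w+) to (?P<new>\w+)$")
KERNEL_TIME = re.compile(r"^\[\s*(?P<secs>\d+\.\d+)\]")

# Lines copied from the syslog excerpt in this report.
SAMPLE = [
    "May 16 14:36:04 autoaud-broad1 kernel: [ 0.030000] CPU 3/0x3 -> Node 0",
    "May 16 14:36:04 autoaud-broad1 kernel: [ 0.030000] CPU: Physical Processor ID: 0",
    "May 16 14:36:04 autoaud-broad1 kernel: [ 0.030000] CPU: Processor Core ID: 3",
    "May 16 14:36:04 autoaud-broad1 init: tty4 goal changed from stop to start",
    "May 16 14:36:04 autoaud-broad1 kernel: [ 0.650088] CPU3: AMD Phenom(tm) II X4 955 Processor stepping 02",
]


def scan(lines):
    """Yield (line_no, kind, detail) for init goal changes and kernel messages."""
    for line_no, raw in enumerate(lines, start=1):
        m = LINE.match(raw.strip())
        if not m:
            continue
        tag, msg = m.group("tag"), m.group("msg")
        if tag == "init":
            goal = GOAL.match(msg)
            if goal:
                detail = (goal.group("job"), goal.group("old"), goal.group("new"))
                yield line_no, "goal", detail
        elif tag == "kernel":
            t = KERNEL_TIME.match(msg)
            if t:
                # Kernel time in seconds since boot, taken from the brackets.
                yield line_no, "kernel", float(t.group("secs"))


for event in scan(SAMPLE):
    print(event)
```

For a real log, pass an open file instead of SAMPLE, for example scan(open("/var/log/syslog")).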

Mike Bianchi (mbianchi-foveal) wrote :

Having now learned that /var/log/apt/history noted the move from mountall 2.14 to 2.15 today and that
   apt-get install mountall=2.14 sets the package back to what it was, I seem to have returned to the more reliable booting behavior.

Scott James Remnant (Canonical) (canonical-scott) wrote :

Please attach the file I asked for.

security vulnerability: yes → no
affects: upstart (Ubuntu) → mountall (Ubuntu)
Scott James Remnant (Canonical) (canonical-scott) wrote :

Also after a failed boot, could you run "runlevel" - what is the output?

Mike Bianchi (mbianchi-foveal) wrote :

After a successful boot, with init='/sbin/init --verbose' : N 2
After a failed boot, without init= : unknown
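For anyone who wants to check this automatically after each boot, the two outputs above can be told apart with a few lines of Python. This is a minimal sketch under the assumption that runlevel prints either "PREVIOUS CURRENT" (such as "N 2") or the single word "unknown"; the function names are made up for this example.

```python
import subprocess


def parse_runlevel(output):
    """Return (previous, current) from `runlevel` output, or None if unknown."""
    fields = output.split()
    if len(fields) != 2:
        return None             # e.g. "unknown" after a failed boot
    return fields[0], fields[1]


def current_runlevel():
    """Run `runlevel` and parse its output; None if it is missing or unknown."""
    try:
        result = subprocess.run(["runlevel"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_runlevel(result.stdout)


print(parse_runlevel("N 2"))      # the successful boot reported above
print(parse_runlevel("unknown"))  # the failed boot reported above
```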

Mike Bianchi (mbianchi-foveal) wrote :

Also, with init='/sbin/init --verbose' boots are not 100% reliable. Sometimes they still fail.
I have set the number of active CPUs to 3 (was 4) in the BIOS, in desperate hope that this will make things more reliable.

Mike Bianchi (mbianchi-foveal) wrote :

Scott -- sorry -- I missed the request for /var/log/boot.log

Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 581291] Re: upstart (/sbin/init) fails to start many /etc/init and /etc/rc2.d files

On Mon, 2010-05-17 at 12:47 +0000, Mike Bianchi wrote:

> After a successful boot, with init='/sbin/init --verbose' : N 2
> After a failed boot, without init= : unknown
>
Thanks, this matches the symptoms of bug #543506 - investigation is
still ongoing; in most cases it actually looks like a failure of the
non-Upstart services not the Upstart ones.

Scott

Mike Bianchi (mbianchi-foveal) wrote :

I'm moving my participation in this discussion to bug #543506.

Mike Bianchi (mbianchi-foveal) wrote :

> Thanks, this matches the symptoms of bug #543506 - investigation is
> still ongoing; in most cases it actually looks like a failure of the
> non-Upstart services not the Upstart ones.

I don't see how you can say it's a problem with non-Upstart services if the tty[1-6].conf services are not starting.
I would hope that Upstart would be immune to problems in things outside /etc/init such as /etc/rc?.d/ scripts.

Scott James Remnant (Canonical) (canonical-scott) wrote :

On Mon, 2010-05-17 at 15:09 +0000, Mike Bianchi wrote:

> I don't see how you can say it's a problem with non-Upstart services if the tty[1-6].conf services are not starting.
>
If you read those conf files, they depend on the non-Upstart services
being started (runlevel/rc etc.)

Scott

Mike Bianchi (mbianchi-foveal) wrote :

> If you read those conf files, they depend on the non-Upstart services being started (runlevel/rc etc.)

Does that make sense?
It would seem to me that the only requirement for gettys on /dev/tty[1-6] is the ability to login and see the file systems.

Suggestion: use start on runlevel [2345] in /etc/init/tty[1-6].conf

Mike Bianchi (mbianchi-foveal) wrote :

> Suggestion: use start on runlevel [2345] in /etc/init/tty[1-6].conf

Am I expected to make that sort of change to /etc/init/*.conf files?
Maybe there could be two categories:
  /etc/init
  /etc/init_local

where the non-official boot sequence commands would live.

Again, official boot sequence commands should be robust against failures in /etc/init_local .
