[Maverick] Reboot of linux-virtual hangs on EC2

Bug #727814 reported by Stefan Bader
This bug affects 3 people
Affects          Status         Importance   Assigned to     Milestone
linux (Ubuntu)   Invalid        Medium       Unassigned
Maverick         Fix Released   Medium       Stefan Bader

Bug Description

SRU Justification:

Impact: On reboot or shutdown, the current Xen code tries to stop the other CPUs. However, IPI communication is already disabled at that point, so issuing a reboot or shutdown from within an instance with multiple vCPUs hangs.

Fix: Cherry-pick of an upstream patch (merged around 2.6.37) that removes the attempt to stop the other CPUs.

Testcase: From within an EC2 instance, run "sudo reboot". Without the patch the instance never comes back up; with the patch applied it does.

Stefan Bader (smb) wrote :

Does not affect Natty as fix was added with 2.6.37-rc1.

description: updated
Changed in linux (Ubuntu Maverick):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
status: Triaged → Invalid
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Maverick):
status: In Progress → Fix Committed
Brandon Black (blblack) wrote :

Anyone have a userland workaround (other than ec2-reboot-instances of own instance-id, which requires auth keys on the node...) for getting an ec2 node on these kernels to reboot itself successfully?

Also: note that once we have a release kernel w/ the fix, new Maverick AMIs will have to go out before the problem is really solved (or else you can't really reboot to the fixed kernel from a fresh image after update).

Stefan Bader (smb) wrote :

There is a workaround of sorts, though it is slow. The reboot must be initiated from outside the instance (either from the web interface or with the ec2-reboot-instances command-line tool), and one then has to wait 5 minutes, which is a timeout of the Xen commands. After that the instance comes up again.
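
The outside-initiated reboot described here could be scripted roughly as follows. This is only a sketch: it assumes that `ec2-reboot-instances` is installed and configured with credentials on a machine other than the instance, and that an open SSH port is an acceptable sign that the instance is back. The function names, parameters, and polling intervals are made up for illustration. It also assumes that the SSH port may still answer while the guest is hanging in shutdown, so it ignores the port until the five-minute timeout has passed.

```python
import socket
import subprocess
import time

# The roughly five-minute Xen timeout mentioned in the comment above.
XEN_TIMEOUT_SECONDS = 5 * 60


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reboot_from_outside(instance_id, host,
                        run=subprocess.check_call,
                        is_up=ssh_port_open,
                        sleep=time.sleep,
                        poll_interval=30,
                        max_wait=2 * XEN_TIMEOUT_SECONDS):
    """Request a reboot from outside the instance and wait for it to return.

    Returns the number of seconds waited, or raises TimeoutError if the
    host is still unreachable after max_wait seconds.
    """
    # Must run on a machine that holds the EC2 credentials, not on the
    # affected instance itself.
    run(["ec2-reboot-instances", instance_id])

    waited = 0
    while waited < max_wait:
        sleep(poll_interval)
        waited += poll_interval
        # Assumption: only trust the port check once the timeout has passed.
        if waited >= XEN_TIMEOUT_SECONDS and is_up(host):
            return waited
    raise TimeoutError("%s did not come back within %d seconds"
                       % (host, max_wait))
```

The `run`, `is_up`, and `sleep` parameters exist only so the logic can be exercised without touching a real instance; in normal use they would be left at their defaults.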

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-maverick
Stefan Bader (smb) wrote :

Reboot completes promptly again, and halt moves the instance into the terminated state as expected.

tags: added: verification-done-maverick
removed: verification-needed-maverick
Charles Peters II (cp) wrote :

I am using another VPS provider that runs Xen, and after I did a 10.04 to 10.10 upgrade last week I ran into this issue of reboots hanging. Today I enabled proposed and pulled in a few updates, and reboots are working with linux-image-2.6.35-28-virtual (2.6.35-28.50 Ubuntu:10.10/maverick-proposed [amd64]).

Brandon Black (blblack) wrote :

Stefan, your workaround would almost be acceptable if this were the only bug in play. However, some of us boot Maverick AMIs for PV-grub and then use cloud-init to auto-downgrade the kernel to Karmic's or auto-upgrade to Natty's (because let's face it, so far Lucid and Maverick have yet to have a production-capable kernel for EC2 use). For us, the lack of a reboot method that works from a shell script inside the instance itself, without sensitive keys, is a no-go for even initially booting from Maverick AMIs with this bug in them. Luckily the older Maverick AMIs from before this bug was introduced are still available...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-28.50

---------------
linux (2.6.35-28.50) maverick-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #734399

  [ Corentin Chary ]

  * SAUCE: (drop after 2.6.38) eeepc-wmi: reorder keymap
    - LP: #689393
  * SAUCE: (drop after 2.6.38) eeepc-wmi: add wlan key found on 1015P
    - LP: #689393

  [ Keng-Yu Lin ]

  * SAUCE: eeepc-wmi: set the touchpad toggle key code to F22
    - LP: #689393

  [ Tim Gardner ]

  * [Config] CONFIG_BOOT_PRINTK_DELAY=y
    - LP: #733191

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon/bo: add some fallback placements for VRAM only
    objects."
    - LP: #652934
  * eeepc-wmi: add additional hotkeys
    - LP: #689393
  * xen: don't bother to stop other cpus on shutdown/reboot
    - LP: #727814
  * Yama: use thread group leader when creating match
    - LP: #729839
  * mmc: sdhci-pci: add ricoh e822 pci id with device specific quirks
    - LP: #730820
 -- Brad Figg <email address hidden> Sun, 13 Mar 2011 07:01:39 -0700
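
To tell whether a given kernel already carries this change, the version numbers from this report can be compared mechanically. The sketch below is illustrative only: the function name is made up, and it assumes the input is a string containing an Ubuntu kernel version such as the contents of /proc/version_signature. It treats linux 2.6.35-28.50 and upstream 2.6.37 as the first fixed versions, and returns None for any other series because this report says nothing about them. A False result only means the fix is missing; according to the comments above, older Maverick kernels predate the bug.

```python
import re

# First Maverick upload with the fix, per the changelog: 2.6.35-28.50.
FIXED_MAVERICK = (28, 50)
# Upstream series in which the fix was merged, per the bug description.
FIXED_UPSTREAM = (2, 6, 37)

# Matches e.g. "2.6.35-28.50" inside "Ubuntu 2.6.35-28.50-virtual 2.6.35.11".
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def has_reboot_fix(signature):
    """Return True/False if the answer is known from this report, else None."""
    match = _VERSION.search(signature)
    if match is None:
        return None
    major, minor, patch, abi, upload = map(int, match.groups())
    upstream = (major, minor, patch)
    if upstream >= FIXED_UPSTREAM:
        return True
    if upstream == (2, 6, 35):
        return (abi, upload) >= FIXED_MAVERICK
    # Other release series are not covered by this bug report.
    return None
```

On a running system this might be called as `has_reboot_fix(open("/proc/version_signature").read())`, assuming that file exists.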

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Fix Released