Boot failure after upgrading kernel to 2.6.32-25-generic

Bug #659422 reported by Dmitry Torokhov
This bug affects 15 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

After installing 10.04.1 in a 32-bit VM and updating only the kernel package to 2.6.32-25-generic, the system fails to boot because the udevd process segfaults (somewhere in libc.so), so none of the required kernel modules get loaded. Booting into the previous kernel (2.6.32-24-generic) brings the system back to a working state. The initramfs image for 2.6.32-25-generic appears to be in good shape.

The issue affects 32-bit installations only; 64-bit guests boot fine with both the 2.6.32-24 and 2.6.32-25 kernels.

Please note that the apport information was collected on the 2.6.32-24 kernel, not the troublesome 2.6.32-25.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-24-generic 2.6.32-24.39
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-24.39-generic 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: AudioPCI [Ensoniq AudioPCI], device 0: ES1371/1 [ES1371 DAC2/ADC]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: dtor 1308 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'AudioPCI'/'Ensoniq AudioPCI ENS1371 at 0x2080, irq 16'
   Mixer name : 'Cirrus Logic CS4297A rev 3'
   Components : 'AC97a:43525913'
   Controls : 24
   Simple ctrls : 13
CurrentDmesg:

Date: Tue Oct 12 02:52:51 2010
HibernationDevice: RESUME=UUID=6023ab94-c479-408a-8427-16fc521479c6
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release i386 (20100816.1)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb:
 Bus 002 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
 Bus 002 Device 002: ID 0e0f:0003 VMware, Inc. Virtual Mouse
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: VMware, Inc. VMware Virtual Platform
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-24-generic root=UUID=bd0ddac6-2301-414b-9428-f682f86f550f ro quiet splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34.1
RfKill:

SourcePackage: linux
dmi.bios.date: 09/02/2010
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/02/2010:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Revision history for this message
Dmitry Torokhov (dtor) wrote :
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Dmitry Torokhov (dtor) wrote :

Why exactly was it marked 'incomplete' without any additional comments? What information is missing?

Changed in linux (Ubuntu):
status: Incomplete → New
Revision history for this message
Austin Godber (godber-uberhip) wrote :

I can confirm this happens on VMware ESX 4.1. I know for certain it does not happen on the older VMware Server 1.0.x series. When it drops me into the initramfs console, I can run udevadm, and about 50% of the time it segfaults.

Revision history for this message
Austin Godber (godber-uberhip) wrote :

I have gone back through the process and can confirm that on ESX 4.1 the 2.6.32-25-generic kernel segfaults during boot, whereas the -pae kernel does not. The udevadm command segfaults:

udevadm[94]: segfault at 1 ip 00000001 sp bfb55514 error 4 in libdl.so.2
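
For readers unfamiliar with this log format: the number after "error" is the x86 page-fault error code, printed in hexadecimal. The following is a minimal sketch for decoding it, not something posted in this report; the names decode_pf_error and field are made up for illustration.

```shell
#!/bin/sh
# Sketch: decode the hexadecimal "error N" field of an x86 segfault log line.
# Bit meanings follow the x86 page-fault error code layout:
#   bit 0 = protection violation (else page not present)
#   bit 1 = write access (else read)
#   bit 2 = fault happened in user mode (else kernel mode)
#   bit 4 = instruction fetch
decode_pf_error() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 16)) -ne 0 ]; then access="instruction fetch"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

# Pull the error field out of the log line quoted above and decode it.
line='udevadm[94]: segfault at 1 ip 00000001 sp bfb55514 error 4 in libdl.so.2'
field=$(printf '%s\n' "$line" | sed -n 's/.* error \([0-9a-f]*\).*/\1/p')
decode_pf_error "$field"
```

For the line above this prints "user-mode read, page not present". Together with the fault address and instruction pointer both being 1, that is consistent with the process jumping to a bogus address, though the log line alone does not show why.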

After the segfault, the user is placed in the initramfs console. When udevadm is run repeatedly on this console, it segfaults about 50% of the time.
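
An intermittent crash like this is easier to compare across kernels and hypervisor settings when it is counted rather than estimated. Below is a rough sketch of such a counter; count_segfaults is a hypothetical helper, and it relies only on the shell convention that a process killed by a signal is reported with exit status 128 plus the signal number (SIGSEGV is 11, so 139).

```shell
#!/bin/sh
# count_segfaults N CMD [ARGS...]
# Run CMD N times and report how many runs were killed by SIGSEGV.
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        # 139 = 128 + 11, i.e. the command died from SIGSEGV.
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes/$runs runs ended with SIGSEGV"
}
```

One possible invocation would be `count_segfaults 20 udevadm --version`; which udevadm subcommand was actually run in this comment is not stated, so that choice is an assumption.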

Revision history for this message
Austin Godber (godber-uberhip) wrote :

This does not happen on ESX 3.5i or VMware Workstation 7.

Revision history for this message
SteveM (stevem) wrote :

Confirming that upgrading from 2.6.32-24 to 2.6.32-25 breaks the system. Segmentation fault occurs while booting.

Revision history for this message
Austin Godber (godber-uberhip) wrote :

This problem persists in the recent 2.6.32-26-generic kernel.

Revision history for this message
Sébastien CANCHON (scanchon) wrote :

Same problem here (after upgrading from ESX 3.5i to ESXi 4.1i) with a 32-bit virtual machine.
Ubuntu 10.04.1 LTS with kernel 2.6.32-25-generic-pae won't boot (udev segfault): same errors as above.
Upgrading to 2.6.32-26-generic: same problem.
An older installed kernel (2.6.24-23-server) won't boot either: udev reports an error ("libudev: udev_monitor_new_from_netlink: error getting socket: Invalid argument") and there is a segfault (wait-for-root[879]: segfault at 00000030 eip b7fc0f2b esp bfee2a70 error 4).
Other 64-bit VMs (Ubuntu 10.04.1) work fine.

Revision history for this message
Sébastien CANCHON (scanchon) wrote :

Tried with 2.6.32-24-generic: boots without problems!

Revision history for this message
angel4you (5schuster) wrote :

Hello,

We had the same problem in our company, and I found a workaround for it.
In the ESXi 4.1 vSphere Client, select the virtual machine that fails at startup with the "segmentation fault" and open its settings.
There, change the "guest operating system" from "Ubuntu 32 bit" to "Other", version "32 Bit".
VMI must also be disabled.

After that, start the machine and run the dist-upgrade.
Once the update has finished, reboot the system; it should now work.

Revision history for this message
Dmitry Torokhov (dtor) wrote :

I bisected the problem down to this change:

commit 4f797709f4c0f61368c2f0f989cefc805fed0c9e
Author: Brad Figg <email address hidden>
Date: Fri Aug 27 09:10:55 2010 -0700

    mm: make the vma list be doubly linked

    BugLink: http://bugs.launchpad.net/bugs/625392

    commit 297c5eee372478fc32fec5fe8eed711eedb13f3d upstream.

    It's a really simple list, and several of the users want to go backwards
    in it to find the previous vma. So rather than have to look up the
    previous entry with 'find_vma_prev()' or something similar, just make it
    doubly linked instead.

    Tested-by: Ian Campbell <email address hidden>
    Signed-off-by: Linus Torvalds <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
    Acked-by: Steve Conklin <email address hidden>
    Acked-by: Stefan Bader <email address hidden>
    Acked-by: Brad Figg <email address hidden>
    Signed-off-by: Brad Figg <email address hidden>

 include/linux/mm_types.h | 2 +-
 kernel/fork.c | 7 +++++--
 mm/mmap.c | 39 +++++++++++++++++++++++++--------------
 mm/nommu.c | 7 +++++--
 4 files changed, 36 insertions(+), 19 deletions(-)

Reverting it allows me to boot the VM. However, if I compile mainline or stable 2.6.32.21, both of which also contain this commit, the VM boots just fine.

I expect we are witnessing an incompatibility between the patch and the Ubuntu "sauce" patches. I suspect it might be this one:

commit 59b35f3ba2cba40bd260ad19dbf4cf445714044f
Author: Kees Cook <email address hidden>
Date: Thu Apr 30 12:47:47 2009 -0700

    UBUNTU: SAUCE: [x86] implement cs-limit nx-emulation for ia32

but it is too tangled up with other changes to be reverted cleanly to test this theory.
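
For anyone who wants to repeat this kind of search, here is a toy illustration of how git bisect narrows a regression down to one commit. It runs on a throwaway repository created by the script, not on the Ubuntu kernel tree; the function name bisect_demo, the file names, the tag known-good and the commit messages are all made up. In a real kernel bisect the test step would be "build, boot the VM, see whether udevd segfaults" instead of a grep.

```shell
#!/bin/sh
# Toy demonstration of `git bisect run` on a throwaway repository.
# Runs in a subshell so the caller's working directory is left alone.
bisect_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.email "demo@example.invalid"
    git config user.name "Bisect Demo"

    echo ok > status.txt
    n=1
    while [ "$n" -le 10 ]; do
        echo "change $n" >> history.txt
        # The fifth commit plays the role of the regression.
        if [ "$n" -eq 5 ]; then echo broken > status.txt; fi
        git add .
        git commit -q -m "commit $n"
        if [ "$n" -eq 1 ]; then git tag known-good; fi
        n=$((n + 1))
    done

    # Newest commit is bad, the tagged one is good; git picks the midpoints.
    git bisect start HEAD known-good >/dev/null
    # The test command exits 0 on a good tree and 1 on a bad one.
    git bisect run sh -c '! grep -q broken status.txt' >/dev/null
    # Once bisect has finished, refs/bisect/bad names the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    cd /
    rm -rf "$repo"
)

bisect_demo
```

With ten commits the search needs only a handful of test runs, which is what makes bisecting practical even when each test means building and booting a kernel.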

Revision history for this message
Sam Stenvall (negge) wrote :

I can confirm that this happens with kernel 2.6.32-27-generic-pae on VMware ESX 4.1 (build 320092). udevadm segfaults right after the kernel has finished loading, which inevitably leads to a failure to mount the root filesystem.

Eagerly waiting for a fix!

Revision history for this message
Sam Stenvall (negge) wrote :

Forgot to say that both kernels -27 and -26 work just fine on ESX 4.0.

Revision history for this message
Slackjaw77 (alienresidents) wrote :
Revision history for this message
Chris Jakeway (chrisjakeway) wrote :

It looks like the 2.6.35-* versions of the kernel do not have the same problem.

This worked for me:
- If you don't have an earlier kernel version to boot into, disable acceleration for the VM in VMware (right-click: Edit Settings... -> Options -> General -> Disable acceleration).
- Start the VM. It'll run very slowly with acceleration disabled.
- Install one of the 2.6.35 kernel versions. Example: linux-image-2.6.35-25-generic
- Shut down the VM.
- Re-enable Acceleration if you disabled it above.
- Start the VM. Right at the start, as grub boots, hold down a shift key. Select your newly installed 2.6.35-* kernel.
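
As a rough aid for the steps above, this sketch checks whether a kernel release string (as printed by `uname -r`) falls in the range this thread reports as failing: a 2.6.32 kernel with an ABI number of 25 or higher. The cutoff comes only from the reports in this bug, the function name reported_affected is made up, and a match says nothing about whether a particular flavour or hypervisor version is actually affected.

```shell
#!/bin/sh
# reported_affected RELEASE
# Succeeds if RELEASE looks like 2.6.32-N-<flavour> with N >= 25,
# the range described as failing in this bug report.
reported_affected() {
    rel=$1
    case $rel in
        2.6.32-*) ;;
        *) return 1 ;;
    esac
    abi=${rel#2.6.32-}    # e.g. "25-generic"
    abi=${abi%%-*}        # e.g. "25"
    case $abi in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$abi" -ge 25 ]
}
```

It could be used as `reported_affected "$(uname -r)" && echo "kernel is in the reported range"` once the VM is up, to confirm that the 2.6.35 kernel selected in grub is the one actually running.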

Revision history for this message
Jon Buckley (itsafork) wrote :

I am using a few different JumpBoxes. After migrating them from ESXi 4.0 to ESXi 4.1 and booting them back up, two of them show this error message:

Segmentation fault
Gave up waiting for root device

I have already tried switching the OS type in the vSphere device manager from "Ubuntu Linux 32-bit" to "Other Linux 32-bit".

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
RP (darpified) wrote :

Still segfaults all the way up to kernel .32 (from .25), on fully updated virtual machines.

Revision history for this message
Oscar Urra Cuairán (oscar-urra) wrote :

I have the same problem after upgrading VMware ESXi 4.0 to ESXi 4.1.
I've found another workaround that works for me:

In the vSphere manager, shut down the virtual machine and edit its settings: go to the Options tab and select "CPU/MMU Virtualization". There, change the selected option from "Automatic" (the first) to "Use Intel VT-x/AMD-V for instruction set virtualization and software for MMU virtualization" (the third).

After that, I can boot with any 2.6.32-3x kernel.

Revision history for this message
Stephan Dühr (stephan-duehr) wrote :

Changing the VM settings did not work for me (VMware 4.1); 2.6.32-38-generic still failed with a udevd segfault.
Finally I installed the virtual kernel (apt-get install linux-image-virtual), which is suitable for VMware according to the Ubuntu Server FAQ:
https://help.ubuntu.com/community/ServerFaq#What_are_the_differences_between_the_server_and_virtual_kernels.3F

This installs a 2.6.32-38-generic-pae kernel and solves the problem.

Revision history for this message
Brian C. Shensky (bshensky) wrote :

Failed to start up immediately after a fresh install of 32-bit Server (Minimal Virtual Install option) 10.04 LTS on VMware ESXi 4.1. I get the segfault after the udevd failure line, then a drop to busybox.

Changing VMware's identified OS from Ubuntu 32-bit to any other value does not resolve the issue.
Changing VMware's CPU/MMU Virtualization value from Automatic to "Use Intel VT-x/AMD-V" does not resolve the issue.
The only way to boot the VM is to turn Acceleration off.

Once booted, I reverted to the earlier known-good kernel, 2.6.32-24:
$ sudo apt-get install linux-image-2.6.32-24-virtual
$ echo "linux-image-2.6.32-33-virtual hold" | sudo dpkg --set-selections # stock kernel on distro
$ echo "linux-image-2.6.32-41-virtual hold" | sudo dpkg --set-selections # current dist-upgrade target for this distro

Then I turned Acceleration back on and rebooted. Works. W00t!

$ uname -a
Linux myboxname 2.6.32-24-generic-pae #43-Ubuntu SMP Thu Sep 16 15:30:27 UTC 2010 i686 GNU/Linux
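
The two hold commands in this comment can be generalized. A small sketch follows, assuming the arguments are whichever kernel packages should be kept from installing; hold_lines is a hypothetical helper, and it only produces the "package hold" lines that dpkg --set-selections reads on standard input.

```shell
#!/bin/sh
# hold_lines PKG...
# Print one "PKG hold" selection line per package, in the format
# that `dpkg --set-selections` expects on stdin.
hold_lines() {
    for pkg in "$@"; do
        printf '%s hold\n' "$pkg"
    done
}

# Preview the lines for the two kernels named above (nothing is changed yet).
hold_lines linux-image-2.6.32-33-virtual linux-image-2.6.32-41-virtual
```

Piping that output into `sudo dpkg --set-selections` applies the holds, and `dpkg --get-selections | grep hold` shows which packages are currently held. Remember to lift the holds once a fixed kernel is available, otherwise security updates for the kernel will be skipped as well.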

penalvch (penalvch)
tags: added: bisect-done
Alexej (nebu0email.tg)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released