[regression] 2.6.27-7 sometimes fails to boot (iwl3945 issue?)

Bug #263059 reported by Mikael Nilsson
This bug affects 16 people
Affects                 Status        Importance  Assigned to  Milestone
Intel Linux Wireless    Invalid       Critical
linux (Ubuntu)          Fix Released  High        Tim Gardner
  Intrepid              Fix Released  High        Tim Gardner

Bug Description

Problem: Intrepid fails to boot on a variety of laptops using the iwl3945 driver.

Affected versions: all 2.6.27 Intrepid kernels. 2.6.26 is reported not to be affected.

Symptoms: A hard freeze partway through the boot process, near the time when the iwl3945 module is loaded. No kernel panic is printed, and the system is unresponsive even to Sysrq.

Frequency: This happens frequently, but not every time (20%-80% failure rate).

Workarounds: Blacklisting iwl3945 reliably avoids the problem. Delaying the loading of iwl3945 for 5 seconds also reliably avoids the problem.
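
The two workarounds can be sketched in shell. This is only a minimal sketch and is not taken from any comment in this report: the file name blacklist-iwl3945.conf, the helper names and the MODPROBE override are assumptions made for illustration. Only the `blacklist iwl3945` directive and the 5-second delay correspond to the workarounds described above.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the two workarounds.

write_blacklist() {
    # $1: root directory ("/" on a real system; any other path is a dry run).
    # The file name is an assumption; the "blacklist" directive is standard
    # modprobe configuration syntax.
    mkdir -p "$1/etc/modprobe.d"
    printf 'blacklist iwl3945\n' > "$1/etc/modprobe.d/blacklist-iwl3945.conf"
}

delayed_load() {
    # $1: delay in seconds before the (blacklisted) module is loaded by hand.
    # MODPROBE can be overridden for testing; it defaults to the real modprobe.
    sleep "$1"
    "${MODPROBE:-modprobe}" iwl3945
}

# Example (as root, e.g. from a late boot script; the call site is an assumption):
#   write_blacklist /
#   delayed_load 5
```

Blacklisting alone leaves the system without wireless; combining it with the delayed load keeps the driver out of the parallel module loading done early in boot while still bringing it up later.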

Francisco T. (leviatan1) wrote :

The boot freezes when my wifi is ON, when the kernel is loading iwl3945 driver. If wifi is OFF, it doesn't freeze.

Do you have the same problem?

Mikael Nilsson (mini) wrote : Re: [regression] 2.6.27-2 fails to boot on Dell XPS M1710 when wireless enabled

Indeed, that was it.

description: updated
Leann Ogasawara (leannogasawara) wrote :

Hi Guys,

Would either of you be able to boot with the 'quiet' and 'splash' options removed and your wireless enabled, to trigger the hang? Would you then be able to take a digital photo of any errors that appear on your screen prior to the hang and attach it to this bug report? Thanks.
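
As a rough illustration of the request above, the helper below removes the words `quiet` and `splash` from a kernel command line string. It is only a sketch: the function name is hypothetical, the example command line is made up, and it assumes the command line contains no shell glob characters. Where the command line is stored and edited depends on the boot loader and is not covered here.

```sh
#!/bin/sh
# Hypothetical helper: print a kernel command line without "quiet" and "splash".

strip_quiet_splash() {
    # $1: the full kernel command line as a single string.
    out=""
    # Intentionally unquoted so the string is split into words.
    for word in $1; do
        case "$word" in
            quiet|splash) ;;                    # drop these two options
            *) out="${out:+$out }$word" ;;      # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

# Example with a made-up command line:
#   strip_quiet_splash "root=/dev/sda1 ro quiet splash"
```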

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: New → Triaged
Mikael Nilsson (mini) wrote :

Unfortunately, the 2.6.27-2 kernel now fails to boot either way.

The symptoms seem similar to bug #102982, which affected me before (i.e. no particular error messages, and the last I see is about intel_rng).

Mikael Nilsson (mini) wrote :

Or possibly bug #106256. Will check if the boot process resumes after waiting.

Mikael Nilsson (mini) wrote :

Even with wireless OFF, the computer does not continue the boot process even after waiting for a long time (15 mins). It also does not respond to Ctrl-Alt-Delete, which to me suggests a complete hang.

Removing "quiet splash" from the kernel command line, I don't see any error messages. The boot hangs after displaying "setting system clock..."

Francisco T. (leviatan1) wrote :

Me too, I can confirm. Now it doesn't matter if the wireless is on or off. Also, before, it froze if the ethernet cable was connected.

Now it freezes when it is loading hardware drivers. The freeze is random, maybe 1 in every 10 boots.

I can't find the exact reason.

Juan Pablo Salazar Bertín (snifer) wrote :

When you "switch wireless off", do you see the iwl3945 driver still being loaded? (with "quiet splash" removed)
Have you tried disabling wireless in your BIOS config?

I've reported a possible duplicate (bug #267002), please let me know about your results, thanks.

Francisco T. (leviatan1) wrote :

I tried 5 boots with the wireless on (really it's radio ON/OFF):
Once it stopped in the line: [ 13.078867] iwl3945: Detected Intel Wireless WiFi Link 3945ABG
Another time it stopped in the line: [ 13.831927] Synaptics Touchpad, model: 1, fw: 6.1, id: 0xa3a0b3, caps: 0xa04713/0x10008
Other times boot was normal.

No ethernet cable was connected and my BIOS has no option to disable wireless.

Attached is the log of a complete, normal boot.

Mikael Nilsson (mini) wrote :

2.6.27-3 boots normally, even with wireless ON.

Will report if this is a stable situation.

Juan Pablo Salazar Bertín (snifer) wrote :

2.6.27-3 still fails to boot sometimes for me.

Francisco T. (leviatan1) wrote :

Update 2.6.27-3.
After some normal boots (always with wireless ON), the last time again it failed in the line:
iwl3945: Detected Intel Wireless WiFi Link 3945ABG

Leann Ogasawara (leannogasawara) wrote :

Hi everyone,

Francisco, it seems you may be experiencing what Juan has reported at bug 267002. If you and Juan can continue to track the issue at that bug report it would be great. It seems Mikael, the original bug reporter, is no longer experiencing issues with the newer 2.6.27-3 kernel, so it would seem he may have had a slightly different bug than what you both have. For now I am marking Mikael's bug (i.e. this bug) as "Fix Released". Thanks.

Changed in linux:
status: Triaged → Fix Released
Mikael Nilsson (mini) wrote :

Actually, Francisco describes my experience as well - even the -3 kernel fails to boot sometimes. It is NOT connected to wireless ON or OFF - it sometimes fails anyway.

A reboot usually works.

Please reopen.

description: updated
description: updated
Mikael Nilsson (mini)
Changed in linux:
status: Fix Released → Confirmed
Marcin Feder (marfed) wrote : Re: [regression] 2.6.27-3 fails to boot on Dell XPS M1710

I can confirm the same problem on an Asus V6J (nvidia + iwl3945). The system always hangs on "Starting Network Interfaces" when booted with 2.6.27-3. When started with the 2.6.24-19-generic kernel it works properly. When the wireless card is turned off (using BIOS settings), the boot process goes further and stops on CUPS.

Marcin Feder (marfed) wrote :

2.6.27-4 fails to boot too. In my case it _always_ hangs on iwl3945 driver activation. Maybe this bug should have a more general title, e.g. "2.6.27-3 - boot freezes when the iwl3945 is being loaded".

Matthew Wardrop (mister.wardrop) wrote :

I can confirm this too... Because of this, and the better suspend behaviour of 2.6.25, I usually end up using the older kernel.

Matthew Wardrop (mister.wardrop) wrote :

Ah... but I note it is already confirmed... sorry for the spam.

Kind Regards,
Matthew

Yotam Benshalom (benshalom) wrote :

I have the same problem. Curiously, it happens in about 66% of the boots but not in all of them.

Here is what I get when quiet and splash are turned off (copied by hand...)

iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for linux, 1.2.26ks
iwl3945: Copyright (C) 2003-2008 Intel Corporation
iwl3945 0000:05:00.0 PCI INT -> GSI 19 (level, low) -> IRQ 19
iwl3945: Detected Intel Wireless Wifi Link 3945ABG
cs: IO port probe 0x100-0x3af: clean
cs: IO port probe 0x3e0-0x4ff: excluding 0x4d0-0x4d7
cs: IO port probe 0x820-0x8ff: clean
cs: IO port probe 0xc00-0xcf7: clean
cs: IO port probe 0xa00-0xaff: clean
iwl3945: Tunable Channels: 13 802.11bg, 0 802.11a channels
iwl3945 0000:05:00.0 PCI INT A disabled
iwl3945 0000:05:00.0 PCI INT -> GSI 22 (level, low) -> IRQ 22
Setting the system clock ... OK

<<<ETERNAL HANG>>>

(sometimes it happens before the system clock line)

This looks like a general iwl3945 driver problem with Intrepid. Is there more data I can send in order to help solve it?

Yotam Benshalom (benshalom) wrote :

I forgot to mention - this happens in 32-bit system on lg-s1 laptop.

DSHR (s-heuer) wrote :

Still occurs with 2.6.27-4 - currently it takes 2 or 3 tries to boot successfully.

EdwardO (edwardooo) wrote :

Same for me on Dell XPS 1710 too after fresh install of alpha6 and update... Can confirm it happens loading the iwl3945 driver...

Mikael Nilsson (mini) wrote : Re: [regression] 2.6.27-4 fails to boot on Dell XPS M1710

Still happens on 2.6.27-4 (I'm the original reporter).

description: updated
Matthew Wardrop (mister.wardrop) wrote :

For me, it only sometimes shows the "setting system clock" item... And sometimes halts immediately after the ipw3945 output. Probably a race condition of sorts....

Kind Regards,
Matthew

Mikael Nilsson (mini) wrote : Re: [Bug 263059] Re: [regression] 2.6.27-4 fails to boot on Dell XPS M1710

On Thu, 2008-10-02 at 09:57 +0000, Matthew Wardrop wrote:
> For me, it only sometimes shows the "setting system clock" item... And
> sometimes halts immediately after the ipw3945 output. Probably a race
> condition of sorts....

This is exactly my experience.

/Mikael

Yotam Benshalom (benshalom) wrote : Re: [regression] 2.6.27-4 fails to boot on Dell XPS M1710

I get this error too on 2.6.27-4, and it is getting worse. Today I had to make 6 hard boot attempts before I could log in. Is there any news about a solution? Is there perhaps an alternative driver for the Intel 3945?

Mikael Nilsson (mini)
description: updated
Yotam Benshalom (benshalom) wrote : Re: [regression] 2.6.27-4 fails to boot (iwl3945 issue?)

This issue remains with the 2.6.27-5 kernel installed from the repository. I get anywhere from 1 to 6 hangs before a successful boot.

Jakob Petsovits (jpetso) wrote :

I get this error on an HP n6320 (iwl3945, too) and booting works if the firmware (iwlwifi-3945-1.ucode) is not present. Once I put it into /lib/firmware/2.6.27-4-generic/, I get lockups similar to those described above.

DSHR (s-heuer) wrote :

Problem is still there on Lenovo X60S with kernel 2.6.27-5.

4 good boots - 2 hangs.

I am going to check the iwlwifi-3945-ucode-15.28.1.6 firmware ...

Frederic PO (fredericp) wrote :

>> For me, it only sometimes shows the "setting system clock" item... And
>> sometimes halts immediately after the ipw3945 output. Probably a race
>> condition of sorts....
>
> This is exactly my experience.

Same here with Asus A8JS 2.6.27-4-generic.
I'm attaching dmesg output after a successful boot.
Hope it helps.

bimmerd00d (brandon-holloway) wrote :

Holding a key while booting seems to bypass this issue on my Dell Latitude D820 with the Intel 3945ABG card. It fails on setting the system clock every time if I don't hold a key.

Oliver (lobohacks) wrote :

Hi,
same here on my father's laptop, a Samsung R55.
Boot hangs at various points.
My first thought was that it is related to the nvidia module.
Removing it reduced the number of failed boots, but did not fix it.

Can anyone confirm that removing the nvidia module reduces the number of failed boots?

Please fix this one.

regards oliver

Mikael Nilsson (mini) wrote : Re: [regression] 2.6.27-5 sometimes fails to boot (iwl3945 issue?)

As noted, I (original reporter) still experience this on 2.6.27-5.

description: updated
Lex Berger (lexberger) wrote :

Confirming for linux 2.6.27-5 on a Samsung R65

I'm getting the same output as Yotam reporting at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263059/comments/19

Henry Gomersall (hgomersall) wrote :

I can confirm the same bug on a Dell Inspiron 9400 with Intel Corporation PRO/Wireless 3945ABG Network Connection [8086:4222] (rev 02) (from lspci).

System hangs at
iwl3945: Detected Wireless WiFi Link 3945ABG
iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels
<hang>

Loading using recovery mode is a little more reliable.

Keypress workaround may work - indicative of a race hazard?

Francisco T. (leviatan1) wrote :

>Linux portatil 2.6.27-6-generic #1 SMP Tue Oct 7 04:15:04 UTC 2008 i686 GNU/Linux
It still fails.

Yotam Benshalom (benshalom) wrote :

Still fails for me too with 2.6.27-6.

johnn1949 (johnn1949) wrote :

I haven't figured out how to do this correctly but my Bug #277901 seems to be a duplicate of this one.

Ryan Davies (iownsu) wrote :

I'm getting the same issue. However, this computer doesn't have wireless, so I can't disable that to try and boot.

All previous kernels boot except the "Last known configuration".

This is a Compaq M2000
cpu model name : Mobile AMD Sempron(tm) Processor 2800+
cpu MHz : 1591.816

Mikael Nilsson (mini) wrote : Re: [regression] 2.6.27-6 sometimes fails to boot (iwl3945 issue?)

Confirm failure on 2.6.27-6 as well.

description: updated
Mikael Nilsson (mini)
description: updated
Changed in intellinuxwireless:
status: Unknown → Confirmed
Mikael Nilsson (mini)
description: updated
Steve Langasek (vorlon)
Changed in linux:
milestone: none → ubuntu-8.10
Matt Zimmerman (mdz)
Changed in linux:
status: Confirmed → In Progress
Tim Gardner (timg-tpi)
Changed in linux:
assignee: ubuntu-kernel-team → timg-tpi
Matt Zimmerman (mdz)
description: updated
Matt Zimmerman (mdz)
description: updated
(111 of 191 comments hidden)
Tim Gardner (timg-tpi) wrote :

I tried some kernel debug features (without any change in behavior):

CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_LOCKDEP=y
CONFIG_PROVE_LOCKING=y
CONFIG_TRACE_IRQFLAGS=y

Hernando Torque (htorque) wrote :

The only thing we know: a five (probably less) seconds delay stops the hangs.

So I removed everything [up to "Loading hardware drivers..."] before the module gets loaded during a successful delayed boot by either blacklisting it, removing the driver, disabling it in the kernel, and adding "acpi=off" to the kernel line.

Result: I still get hangs, now looking like this: http://img.xrmb2.net/images/874633.png

I'm outta ideas.

Tim Gardner (timg-tpi) wrote :

The i3945 part of this bug is a red herring. It also happens with an ipw2200 on a Thinkpad T42 (bug #284406). I'm wondering if the problem is hardware related. Can anyone confirm this hang using a 64 bit kernel?

Luke12 (luca-venturini) wrote : Re: [Bug 263059] Re: [regression] 2.6.27-7 sometimes fails to boot (iwl3945 issue?)

Again sorry for a "me not" post; using a 64 bit kernel here I have no
problems. Cannot say for a 32 bit kernel though. You can find my lspci
output in earlier comments.

On Fri, 17/10/2008 at 13:23 +0000, Tim Gardner wrote:
> The i3945 part of this bug is a red herring. It also happens with an
> ipw2200 on a Thinkpad T42 (bug #284406). I'm wondering if the problem is
> hardware related. Can anyone confirm this hang using a 64 bit kernel?
>

Scott James Remnant (Canonical) (canonical-scott) wrote :

I've created an extremely minimal test case that replicates this bug.

First I compiled a custom kernel, this had almost all core parts compiled in and only true "drivers" as modules. I attach the config here.

Notably this only leaves iwl3945 and tg3 as modules for me.

The hang still occurred at udevadm trigger time, proving that it wasn't anything core being raced, just ordinary PCI drivers.

Scott James Remnant (Canonical) (canonical-scott) wrote :

Next I eliminated userspace from the problem. I commented out the "start on startup" line from /etc/event.d/rcS and instead added the attached "sysinit" job.

This performs the absolute minimum necessary to get udev running, and sets off the trigger.

I still had the hang, so it's not a race with anything like dbus, HAL, NM, X, etc.

Scott James Remnant (Canonical) (canonical-scott) wrote :

To make sure it wasn't any udev background processing, I cut out all the non-essential udev rules.

I was left with just this:

# ls /etc/udev/rules.d
20-names.rules
40-basic-permissions.rules
40-permissions.rules
80-programs.rules
85-ifupdown.rules
90-modprobe.rules

NOTE that I explicitly removed the network device renaming rules -- the hang still occurred, so this is not a problem with device renaming.

It must simply be a module loading issue.

Scott James Remnant (Canonical) (canonical-scott) wrote :

I blacklisted iwl3945, and the boot goes normally with no hang (replicated about 50 times)

So I tried the following at the shell:

# while true; do modprobe iwl3945; sleep 0.1; modprobe -r iwl3945; sleep 0.1; done

Unfortunately this did not cause a hang, even after hundreds of iterations.

The only difference between that and what udev is doing is that udev may load modules in parallel.

So I blacklisted tg3 as well, then tried the attached shell script.
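
The attached script is not reproduced in this report. A hypothetical sketch of the idea described above, loading iwl3945 and a second module at the same time and then unloading both, might look like the following. The function names, the MODPROBE override and the iteration count are assumptions for illustration; this is not the actual test case from the attachment.

```sh
#!/bin/sh
# Hypothetical parallel module load stress loop (not the attached script).
# MODPROBE can be overridden for a dry run; it defaults to the real modprobe.

parallel_load_cycle() {
    # $1, $2: module names.
    # Start both loads in the background so they can overlap, as udev may do,
    # then wait for both to finish.
    "${MODPROBE:-modprobe}" "$1" &
    "${MODPROBE:-modprobe}" "$2" &
    wait
    # Unload both so the next cycle starts from the same state.
    "${MODPROBE:-modprobe}" -r "$1"
    "${MODPROBE:-modprobe}" -r "$2"
}

stress_parallel_load() {
    # $1, $2: module names; $3: number of cycles to run.
    i=0
    while [ "$i" -lt "$3" ]; do
        parallel_load_cycle "$1" "$2"
        i=$((i + 1))
    done
    echo "completed $i iterations"
}

# Example (as root, with both modules blacklisted so nothing else loads them):
#   stress_parallel_load iwl3945 tg3 1000
```

On an affected kernel the loop would be expected to hang the machine rather than return, so it should only be run on a system that can be hard reset.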

Scott James Remnant (Canonical) (canonical-scott) wrote :

Not only does this hang, frequently; I also saw the following *wonderful* error message a few times:

Uhhuh. NMI received for unknown reason b1 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue.

Substituting tg3 for another driver (ThinkPad users have e1000 anyway) seems to still produce the hang - I had the pcmcia socket as a module and used that instead, that caused the hang.

Repeatedly loading tg3 and the pcmcia socket together does _not_ hang.

My hypothesis is that the iwl family of drivers may leave the PCI bus in an invalid state, which, when combined with another driver load, can cause a hang or at least leave the kernel severely unhappy.

Scott James Remnant (Canonical) (canonical-scott) wrote :

Note that I am able to reproduce the hang with the kill switch both on and off, but it is far more common with the kill switch on (device disabled).

Matt Zimmerman (mdz) wrote : Re: [Bug 263059] Re: [regression] 2.6.27-7 sometimes fails to boot (iwl3945 issue?)

On Fri, Oct 17, 2008 at 04:00:42PM -0000, Scott James Remnant wrote:
> Note that I am able to reproduce the hang with the kill switch both on
> and off, but it is far more common with the kill switch on (device
> disabled)

That's consistent with my testing on David's machine: the frequency went up
when we turned the kill switch on.

--
 - mdz

taiebot65 (taiebot65) wrote :

Hello, I don't know if this is related to this bug, but my system now hangs for more than 15 seconds at each boot and shutdown.
My wifi loads only some of the time, and when it does load, the connection is so weak that I cannot do anything. On the wired connection I get more than 700kb/s; over wifi I get less than 50kb/s.

taiebot@home:~$ lsusb
Bus 005 Device 002: ID 0bda:8187 Realtek Semiconductor Corp. RTL8187 Wireless Adapter
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

Matt Zimmerman (mdz) wrote :

On Sat, Oct 18, 2008 at 04:59:01PM -0000, taiebot65 wrote:
> Hello i don't know if it is related to this bug but now i hang on boot and shutdown for more than 15 second at each boot.
> My wifi load or not load and when it load my connection is so weak i can not do anything. I ve got when i m connecting to the wire more than 700kb/s and with my wifi connected less than 50kb/s.

Your problem is not related to this bug report.

--
 - mdz

xinit (ubuntu-evenflow) wrote :

I'm experiencing the same thing on an HP Compaq nc6320. I haven't tried any workarounds yet, but I have about a 20% success rate at booting cleanly.

lspci:

00:00.0 Host bridge [0600]: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub [8086:27a0] (rev 03)
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)
00:02.1 Display controller [0380]: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller [8086:27a6] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller [8086:27d8] (rev 01)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 [8086:27d0] (rev 01)
00:1c.2 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 [8086:27d4] (rev 01)
00:1c.3 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 [8086:27d6] (rev 01)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 [8086:27c8] (rev 01)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 [8086:27c9] (rev 01)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 [8086:27ca] (rev 01)
00:1d.3 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 [8086:27cb] (rev 01)
00:1d.7 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller [8086:27cc] (rev 01)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev e1)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge [8086:27b9] (rev 01)
00:1f.1 IDE interface [0101]: Intel Corporation 82801G (ICH7 Family) IDE Controller [8086:27df] (rev 01)
00:1f.2 SATA controller [0106]: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller [8086:27c5] (rev 01)
02:06.0 CardBus bridge [0607]: Texas Instruments PCIxx12 Cardbus Controller [104c:8039]
02:06.1 FireWire (IEEE 1394) [0c00]: Texas Instruments PCIxx12 OHCI Compliant IEEE 1394 Host Controller [104c:803a]
02:06.2 Mass storage controller [0180]: Texas Instruments 5-in-1 Multimedia Card Reader (SD/MMC/MS/MS PRO/xD) [104c:803b]
02:06.3 SD Host controller [0805]: Texas Instruments PCIxx12 SDA Standard Compliant SD Host Controller [104c:803c]
02:06.4 Communication controller [0780]: Texas Instruments PCIxx12 GemCore based SmartCard controller [104c:803d]
02:0e.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM5788 Gigabit Ethernet [14e4:169c] (rev 03)
08:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection [8086:4222] (rev 02)

travtek (bddunham) wrote :

I was having the same problem on my lenovo Z61m laptop which also uses the iwl3945 driver. It would consistently fail on every other boot attempt. Since I ran the Update Manager last night, it hasn't failed to boot once. Maybe it is fixed now.

Andres Järv (andresjarv) wrote :

The 2.6.27 kernel has survived 2 cold boots here too. Previously that never happened. I'll test some more.

xinit (ubuntu-evenflow) wrote :

Same here. 2 boots and no problems. Don't see any related fixes in the changelog though.

Hernando Torque (htorque) wrote :

15 boots without a hang. Fixed with 2.6.27-7.12?

Sander Jonkers (jonkers) wrote : Re: [Bug 263059] Re: [regression] 2.6.27-7 sometimes fails to boot (iwl3945 issue?)

What's "2.6.27-7.12"? Is that an Ubuntu kernel, or a Linux kernel?

I'm now running "2.6.27-7-generic #1 SMP Fri Oct 17 22:24:21 UTC 2008 i686
GNU/Linux" (without the .12), and there are no further updates available. Is
that the version that has fixed the bug for you? I can't tell right away
because I'm using the blacklist & modprobe workaround. Is it safe / worthwhile
to remove that workaround?

Sander

On Mon, Oct 20, 2008 at 1:12 AM, Hernando Torque <email address hidden> wrote:

> 15 boots without a hang. Fixed with 2.6.27-7.12?
>
> --
> [regression] 2.6.27-7 sometimes fails to boot (iwl3945 issue?)
> https://bugs.launchpad.net/bugs/263059
> You received this bug notification because you are a direct subscriber
> of a duplicate bug.
>

Alexey Balmashnov (a.balmashnov) wrote :

Sander, 2.6.27-7.12 is an actual version of the package see http://packages.ubuntu.com/intrepid/linux-image-2.6.27-7-generic or package description in your favorite package manager.

Matt Zimmerman (mdz) wrote : Re: [Bug 263059] Re: [regression] 2.6.27-7 sometimes fails to boot (iwl3945 issue?)

On Mon, Oct 20, 2008 at 08:15:32AM -0000, Sander Jonkers wrote:
> What's "2.6.27-7.12"? Is that an Ubuntu kernel, or a Linux kernel?

cat /proc/version_signature

--
 - mdz
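
As a sketch of how that output relates to the package version: /proc/version_signature is assumed here to contain a single line of the form `Ubuntu 2.6.27-7.12-generic` (distribution, upload version, flavour), whereas `uname -r` reports only `2.6.27-7-generic`. The helper name below is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: extract the upload version (e.g. 2.6.27-7.12) from a
# version signature line, assuming the "<distro> <version>-<flavour>" layout.

upload_from_signature() {
    # $1: one signature line, e.g. "Ubuntu 2.6.27-7.12-generic"
    sig=${1#* }                  # drop the leading distribution name
    printf '%s\n' "${sig%-*}"    # drop the trailing "-<flavour>"
}

# Example on a running system:
#   upload_from_signature "$(cat /proc/version_signature)"
```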

Matt Zimmerman (mdz) wrote :

On Sun, Oct 19, 2008 at 09:18:37PM -0000, xinit wrote:
> Same here. 2 boots and no problems. Don't see any related fixes in the
> changelog though.

Notably, this upload disabled the ftrace feature in the kernel. This is a
new feature in 2.6.27 which is suspected to have bugs related to loadable
modules. It may have been the culprit.

  * disable CONFIG_DYNAMIC_FTRACE due to possible memory corruption on
    module unload

--
 - mdz

Hernando Torque (htorque) wrote :

Another 15 boots without a hang. Tried to find a pattern in the syslogs but there is none. iwl3945 seems to usually get loaded earlier but not always.

I've suspected the ftrace change too and am currently building a kernel with this option enabled.

Matt Zimmerman (mdz) wrote :

Could someone with the relevant hardware try Scott's minimal test case under 2.6.27-7.12 and see if it still breaks?

Hernando Torque (htorque) wrote :

Will try it later, for now I can just confirm that enabling CONFIG_DYNAMIC_FTRACE caused hangs again (didn't touch other config parts).

Loïc Minier (lool) wrote :

I booted the old and new kernels today. The old kernel would hang 3 times out of 3 when my wired cable wasn't plugged, and didn't hang 3 times out of 3 when it was plugged (probably a timing issue with e1000e when the cable is plugged).

The new kernel booted successfully 6 times out of 6 (half of these tries with network cable plugged).

I only tried the testcase with the new kernel, and it passes just fine at least 150 times (however I tried triggering the bug with parallel loading in the past myself, with snd-hda-intel, and didn't succeed in getting the hang).

Martin Pitt (pitti) wrote :

Scott's test script runs successfully over 100 iterations with -7.12.

I successfully booted -7.12 with "quiet splash" two times and with "quiet" two times. With either of those, previous kernels almost always hung for me. Booting without "quiet" still works fine (just as in previous kernels).

So this is fixed for me, too, thanks to everyone!

jekle (jekle) wrote :

I had the same problem with >=2.6.27-3

For the past few days (2.6.27-7) the problem seems to be fixed.

I have a Dell Precision M90 Notebook.

Matt Zimmerman (mdz) wrote :

Marking this fixed by:

linux (2.6.27-7.12) intrepid; urgency=low
[...]
  * disable CONFIG_DYNAMIC_FTRACE due to possible memory corruption on
    module unload

Changed in linux:
status: In Progress → Fix Released
Scott James Remnant (Canonical) (canonical-scott) wrote :

I'm up over 1,000 iterations with the new kernel. I am content that disabling ftrace has fixed the problem.

From the available information, and note that I'm *way* out of my comfort zone and area of expertise here, I would hypothesise that having the __init section of another module discarded while iwl3945 is initialising causes memory corruption due to the ftrace bug - and due to the complex initialisation of the PCI device thanks to the kill switch behaviour, this can result in a hang.

Ara Pulido (ara) wrote :

Tested in Sony VGN-SZ140P
This machine wasn't booting with the kernel with the bug. With the 2.6.27-7.12 kernel it boots correctly every time I tried (>20 times).

DSHR (s-heuer) wrote :

I removed all modprobe workarounds and system boots reliably now (lenovo thinkpad X60s).

Mikael Nilsson (mini) wrote :

-7.12 seems to work reliably for me as well. I'm the original reporter.

Changed in intellinuxwireless:
status: Confirmed → In Progress
Andrew Lentvorski (bsder) wrote :

Just wanted to confirm that this works on a Dell Inspiron Mini 9 with a Dell pulled Intel 3945ABG card. Before, my system would quite reliably hang. Now, it works fine.

Thanks for all the hard work fixing this.

Andrew Lentvorski (bsder) wrote :

Spoke too soon, my Mini 9 with a Dell pulled Intel 3945ABG is hanging again.

Even worse, when it hangs the system gets *HOT*. Too hot to actually hold. And it only took about 60 seconds.

Not good.

mockdeep (rtfletch81) wrote :

My 8.10 installation will only boot up at home. Anywhere else it hangs at "Configuring network interfaces". It seems it will only boot up if it is in the presence of my default network connection.

Changed in intellinuxwireless:
status: In Progress → Confirmed
Changed in intellinuxwireless:
status: Confirmed → Invalid
Jason Tackaberry (tack) wrote :

Just upgraded to Intrepid and am hit with this extraordinarily irritating problem. 2.6.27-9.19 has CONFIG_HAVE_DYNAMIC_FTRACE=y. I assume at some point between 7.12 and 9.19 it was reenabled?

Hernando Torque (htorque) wrote :

It was CONFIG_DYNAMIC_FTRACE that got (and still is) disabled. My system's working fine - are you sure the bug described here is what happens to you (look at the load of linked pictures)?
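
The two option names are easy to confuse when grepping a kernel config, because CONFIG_DYNAMIC_FTRACE is a substring of CONFIG_HAVE_DYNAMIC_FTRACE. A minimal sketch of an exact-match check follows; the helper name is hypothetical, and the config file location in the usage comment (/boot/config-$(uname -r)) is an assumption about where the installed kernel's config is kept.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the named option is set to "y" in the
# given kernel config file. The anchored pattern avoids matching
# CONFIG_HAVE_DYNAMIC_FTRACE when asked about CONFIG_DYNAMIC_FTRACE.

config_enabled() {
    # $1: option name, $2: path to a kernel config file
    grep -q "^$1=y\$" "$2"
}

# Example:
#   if config_enabled CONFIG_DYNAMIC_FTRACE "/boot/config-$(uname -r)"; then
#       echo "dynamic ftrace is enabled"
#   else
#       echo "dynamic ftrace is disabled"
#   fi
```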

tags: added: iso-testing
Changed in intellinuxwireless:
importance: Unknown → Critical