kernel panic - kernel stack corrupted

Bug #592745 reported by Matt Price
This bug affects 5 people
Affects: linux (Ubuntu) | Status: Expired | Importance: Undecided | Assigned to: Unassigned

Bug Description

Intermittent kernel panic that can be reliably triggered by running powertop on my Lenovo ThinkPad T410. Syslog gives this error before the crash:

 kernel panic - not syncing: stack protector: kernel stack is corrupted in ffffffff814760bd
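Later comments quote the same panic in a slightly different spelling (`stack-protector: Kernel stack is corrupted in: <address>`). When crashes are collected from several machines, it helps to extract the reported address mechanically so that repeat crashes can be compared. Below is a minimal Python sketch. It assumes only the two spellings seen in this report, and `panic_address` is a hypothetical helper name, not part of any existing tool:

```python
import re
from typing import Optional

# Accept "stack protector" or "stack-protector", any capitalisation, and an
# optional colon after "in", matching the variants quoted in this report.
_PANIC_RE = re.compile(
    r"stack[- ]protector:\s*kernel stack is corrupted in:?\s*([0-9a-f]+)",
    re.IGNORECASE,
)


def panic_address(line: str) -> Optional[int]:
    """Return the address named by a stack-protector panic line, or None."""
    match = _PANIC_RE.search(line)
    return int(match.group(1), 16) if match else None
```

For example, the two panics on the HS22 blade server reported later in this thread parse to `0xf83ccee0` and `0xf825cee0`.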

This is my third time trying to report this bug in Launchpad, so I'll stop here! I would love some help; please let me know what I can do to help with the diagnosis.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-22-generic 2.6.32-22.36
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: HDA Generic [HDA Generic]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: matt 1677 F.... pulseaudio
                      matt 1760 F.... amarok
 /dev/snd/pcmC0D0p: matt 1677 F...m pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf2620000 irq 17'
   Mixer name : 'Intel G45 DEVIBX'
   Components : 'HDA:14f15069,17aa214c,00100302 HDA:80862804,17aa21b5,00100000'
   Controls : 10
   Simple ctrls : 5
Card29.Amixer.info:
 Card hw:29 'ThinkPadEC'/'ThinkPad Console Audio Control at EC reg 0x30, fw 6IHT29WW-1.03'
   Mixer name : 'ThinkPad EC 6IHT29WW-1.03'
   Components : ''
   Controls : 1
   Simple ctrls : 1
Card29.Amixer.values:
 Simple mixer control 'Console',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Fri Jun 11 12:37:39 2010
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=8bc30640-587d-41e8-a9bd-8833b3836ece
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Beta amd64 (20100318)
Lsusb:
 Bus 002 Device 002: ID 8087:0020
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 17ef:480f Lenovo
 Bus 001 Device 002: ID 8087:0020
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 2516CTO
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-22-generic root=UUID=08517b2d-d32a-45d4-b574-a6d38f0dde9c ro crashkernel=384M-2G:64M,2G-:128M quiet splash
ProcEnviron:
 LANG=en_CA.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34.1
RfKill:

SourcePackage: linux
dmi.bios.date: 01/07/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6IET49WW (1.09 )
dmi.board.name: 2516CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6IET49WW(1.09):bd01/07/2010:svnLENOVO:pn2516CTO:pvrThinkPadT410:rvnLENOVO:rn2516CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 2516CTO
dmi.product.version: ThinkPad T410
dmi.sys.vendor: LENOVO

Revision history for this message
Matt Price (matt-price) wrote :
Revision history for this message
Matt Price (matt-price) wrote : Re: [Bug 592745] [NEW] kernel panic - kernel stack corrupted

I should probably have said in my original report:

powertop touches USB, ACPI, and a bunch of other subsystems; maybe it
would be good to involve some of the packagers to see whether they can
predict what's likely to cause this problem, but I don't see how to
add an "also affects powertop" line in the bug. --matt

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Matt,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Matt Price (matt-price)
tags: removed: needs-upstream-testing
Revision history for this message
Matt Price (matt-price) wrote :

Thanks Jeremy. I tested against both linux-image-2.6.35-999-generic_2.6.35-999.201006111153_amd64 and linux-image-2.6.32-02063209-generic_2.6.32-02063209_amd64. The bug is gone on both!!

Now I have a couple of questions:
- will the 2.6.32 mainline kernel eventually make it into lucid-proposed?
- if not, how can I help to locate the moment when the fix was introduced so you can port that fix into lucid?

Also attached is a config diff between the crashy current kernel and the mainline kernel. As far as I can tell, the only option enabled in the current Lucid kernel that is likely to have an impact is:
> CONFIG_THINKPAD_ACPI_ALSA_SUPPORT=y

Would it be helpful for me to build a kernel without ThinkPad ALSA support? (A pointer to the recommended build method would be very helpful; things have changed a lot since I last built an Ubuntu kernel.) And would I likely lose the ability to use the onboard sound card on my laptop, which is a ThinkPad?

Thanks for your help,
Matt
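The usual way to locate where a fix was introduced is bisection. Put the builds in order, with the oldest known to crash and the newest known to work. Test the middle build and keep whichever half still contains the change. Below is a minimal Python sketch. It assumes the outcome flips exactly once across the list. `first_good` and its `is_good` callback are hypothetical names; in this bug, `is_good` would stand for "boot this build and see whether powertop still panics". Inside a kernel git tree, `git bisect` performs the same search at commit granularity.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_good(builds: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the earliest build for which is_good(build) is True.

    Assumes builds are ordered oldest to newest, the newest build is good,
    and every build before the fix is bad while every build after it is good.
    """
    if not builds:
        raise ValueError("need at least one build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[hi] is (assumed) good
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            hi = mid      # the fix is at mid or earlier
        else:
            lo = mid + 1  # the fix came after mid
    return builds[lo]
```

With 64 candidate builds, this needs at most six test boots instead of 64.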

Revision history for this message
Matt Price (matt-price) wrote :
Revision history for this message
ZachG (zgold550) wrote :

I am using an Asus 1201T running Ubuntu Lucid. I experienced the exact same symptoms as Matt (corrupted kernel stack when running powertop, as well as sporadic other crashes). Running a stock kernel also fixed my issue!

Now, if only I could get wifi working again with the stock kernel :)

Revision history for this message
Matt Price (matt-price) wrote : Re: [Bug 592745] Re: kernel panic - kernel stack corrupted

On Thu, 2010-06-17 at 16:15 +0000, ZachG wrote:
> I am using an Asus 1201T running Ubuntu Lucid. I experienced the exact
> same symptoms as Matt (corrupted kernel stack when running powertop as
> well as sporadic other crashes). Running a stock kernel also fixed my
> issue!
>
> Now, if only I could get wifi working again with the stock kernel :)
>

That's my issue as well. I'm wondering whether the Ubuntu wifi
stack is in fact the source of my problems; that and the audio drivers are the
two main config changes that seem like they might affect me. Can you
check the output of lspci and see if your wifi or sound card is the
same as mine? (I'm not on my laptop right now, so I can't check.) Thanks!!

Revision history for this message
ZachG (zgold550) wrote :

Sadly, the issue happened again today using the generic kernel :(. Will try another generic kernel...
(I got wifi working again by reinstalling the rl8192se_dkms package.)

lspci output:
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: ASUSTeK Computer Inc. RS880 PCI to PCI bridge (int gfx)
00:04.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 0)
00:05.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 1)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: ATI Technologies Inc RS780M/RS780MN [Radeon HD 3200 Graphics]
02:00.0 Network controller: Realtek Semiconductor Co., Ltd. Device 8171 (rev 10)
03:00.0 Ethernet controller: Atheros Communications Atheros AR8132 / L1c Gigabit Ethernet Adapter (rev c0)

Revision history for this message
ZachG (zgold550) wrote :

OK, new datapoint. With a new generic kernel and no wifi module built, it was stable with powertop. As soon as I built and modprobed the module, the machine crashed on powertop.

Going to look for a new version of the wifi module now.

Revision history for this message
Matt Price (matt-price) wrote :

At Fri, 18 Jun 2010 23:20:36 -0000,
ZachG wrote:
>
> OK, new datapoint. With a new generic kernel and no wifi module built
> it was stable with powertop. The second I built and modprobed the
> module the machine would crash on powertop.
>
> Going to look for a new version of the wifi module now.

Sounds like this is a duplicate of lp:585938. Does that sound right
to you, Jeremy?

Revision history for this message
Matt Price (matt-price) wrote :

At Sat, 19 Jun 2010 01:28:58 -0000,
matt price wrote:
>
> At Fri, 18 Jun 2010 23:20:36 -0000,
> ZachG wrote:
> >
> > OK, new datapoint. With a new generic kernel and no wifi module built
> > it was stable with powertop. The second I built and modprobed the
> > module the machine would crash on powertop.
> >
> > Going to look for a new version of the wifi module now.

I confirmed on my machine that the issue is with r8192se by running
modprobe -r r8192se_pci
After this, powertop fails to trigger the kernel panic. Reloading the
module brings back the old, crashy behaviour.

I tried the new driver version and the problem is still there! At
least I think so; it's hard for me to remove all traces of my old
dkms build system. I'm going to try that again now. I think an
upstream bug report is in order, but I can't figure out where to file
it; if anyone has any clues, I'd appreciate it. Also, if the powertop
developers could tell us what powertop is doing, it might be possible
to stop the system from spontaneously probing the driver (I imagine it
has something to do with power management) and causing the crash
unexpectedly.
matt
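The isolation test in this comment compares powertop runs with and without `r8192se_pci` loaded. Before each run, it is worth confirming which state the machine is actually in. Below is a minimal Python sketch that reads `/proc/modules`, where each line starts with the name of a loaded module. The function names are hypothetical:

```python
from pathlib import Path
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Parse the text of /proc/modules into a set of module names.

    Each non-empty line starts with the module name, followed by its size,
    use count, dependent modules, state and load address.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def module_loaded(name: str, path: str = "/proc/modules") -> bool:
    """Return True if `name` is currently loaded (Linux only)."""
    return name in loaded_modules(Path(path).read_text())
```

On the affected machines, `module_loaded("r8192se_pci")` would be expected to return `False` after `modprobe -r r8192se_pci`, and `True` again once the module is reloaded.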

Revision history for this message
ZachG (zgold550) wrote :

I've also tried the newer version (v17), which still seems to crash. I saw somebody using ndiswrapper somewhere; I haven't tried that yet. Do you think there's any chance of making progress on this, or should I resign myself to this netbook being wireless-less for quite a while ;(

What's funny is that when I first got this machine (~2 months ago), it worked fine on wifi. There must have been some recent update that introduced the kernel call that triggers the driver bug (the same thing powertop seems to trigger), or maybe an updated kernel was introduced that causes the bad behavior.

Any ideas on how to get older Lucid kernels to try (like the one from launch day)?

I may also try Fedora 13, just to see whether there's something Ubuntu-specific that breaks things.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Matt,
    the kernel should make it in through the next update and release of the kernel.

~JFo

Changed in linux (Ubuntu):
status: Incomplete → Triaged
tags: added: kernel-net
Revision history for this message
ZachG (zgold550) wrote :

Jeremy,

Do I understand right that there is a kernel bug associated with this issue which is to be included in the next update? Is there a patch someplace that I can download and compile myself, or a precompiled one with the patch anywhere?

Revision history for this message
Matt Price (matt-price) wrote :

On Wed, 2010-06-23 at 17:03 +0000, Jeremy Foshee wrote:
> Matt,
> the kernel should make it in through the next update and release of the kernel.
Jeremy,

Do you mean the _fix_ should make it in?? That would be awesome.
Pointers to changelog entries, kernel-team discussions, or anything else
would be much appreciated, too. Thanks again! I really hope that
works.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Matt,
    You asked me when the .32 mainline you tested would be available. All patches contained in that kernel should be included in the next .32 kernel released for Ubuntu.

As to fixes for this issue being included, I would need to have an upstream commit for a fix before that could be calculated. Do you know of an upstream bug on this issue that has a fix included in the current stable?

~JFo

Revision history for this message
Matt Price (matt-price) wrote :

Hi Jeremy,
My mistake; I failed to re-read my earlier comments. The bug is
*not* in fact gone in the mainline kernels. It appeared to be gone
only because the rtl8192se modules are missing from mainline. Once
loaded, rtl8192se_pci continues to cause the crash. I doubt there's
an upstream fix: Realtek has no public bug tracker that I can find, and
their most recent driver does not fix the issue. It'd be great if
someone could identify which call to the driver causes it to crash,
and then stop that call from being made, but I imagine that's
unlikely to happen (and I have no idea how to go about that myself).
Thanks again, m

Revision history for this message
Matt Price (matt-price) wrote :

I had another crash yesterday, but it may be because I had failed to
remove the old driver. Is anyone else still getting sporadic crashes?

Revision history for this message
ZachG (zgold550) wrote :

I am as well, even with a fixed driver. Mine happens most often when I close the lid; I'm not sure whether it has to do with the screensaver or with screen blanking. Sometimes it happens a lot more often than at other times. It is certainly much better than before the driver fix. I have no way to reproduce it consistently yet.

Revision history for this message
Matt Price (matt-price) wrote :

  On 10-07-21 02:08 PM, ZachG wrote:
> I am as well, even with a fixed driver. Mine happens most often when I
> close my lid, either it has to do with the screensaver or screen
> blanking im not sure. Sometimes its a lot more often than others.
> Certainly much better than before the driver fix. No way to
> consistently reproduce yet.
>
Is anyone seeing a complete fix? I just lost half a day's work, and unless
there's some solution I don't understand, I am ready to open my laptop up
and replace my wireless card with a new one. Thanks much,
matt

Revision history for this message
ZachG (zgold550) wrote :

I've been using the iwpriv hack (convert it to a noop) and the updated module and have, knock on wood, not had any issues in a long time.

Revision history for this message
Matt Price (matt-price) wrote :

  On 10-08-17 10:10 PM, ZachG wrote:
> I've been using the iwpriv hack (convert it to a noop) and the updated
> module and have, knock on wood, not had any issues in a long time.
>
Hmm. I realized the dkms module hadn't been compiled for the latest kernel,
and I have converted iwpriv to a no-op. I'll give it a couple of days before
dismantling my computer... Thanks, m

Revision history for this message
Bob Blanchard (blabj) wrote :

I received the same kernel panic, so I'm attaching it to this bug report.

[270989.199500] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: f83ccee0

This is on an IBM HS22 blade server with dual quad-core Intel Xeon 5530 CPUs and 12GB of RAM, running Lucid 2.6.32-24-generic-pae. It's configured as an application server running NoMachine's NX server, providing GNOME sessions to up to 30 users.

In this instance, it has nothing to do with wireless (there are no wireless devices) or powertop. Nothing "known" triggered this panic; the system had been operational for a few days before the panic, with several users running GNOME sessions.

Here is lspci:

00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 13)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:10.0 PIC: Intel Corporation 5520/5500/X58 Physical and Link Layer Registers Port 0 (rev 13)
00:10.1 PIC: Intel Corporation 5520/5500/X58 Routing and Protocol Layer Registers Port 0 (rev 13)
00:11.0 PIC: Intel Corporation 5520/5500 Physical and Link Layer Registers Port 1 (rev 13)
00:11.1 PIC: Intel Corporation 5520/5500 Routing & Protocol Layer Register Port 1 (rev 13)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 13)
00:15.0 PIC: Intel Corporation 5520/5500/X58 Trusted Execution Technology Registers (rev 13)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.4 PC...

Revision history for this message
Bob Blanchard (blabj) wrote :

Well, I didn't have to wait long. Basically one day later, another kernel panic:

[ 1769.714273] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: f825cee0

tags: removed: regression-potential
Revision history for this message
gcc (chris+ubuntu-qwirx) wrote :

I can confirm that:

* powertop crashes my system every time, within about five seconds of starting, just when it starts displaying the first page of info;

* the crash only occurs if the r8192se_pci driver is loaded; the machine seems stable if I rmmod that driver;

* it doesn't seem to be fixed in 2.6.32.59+drm33.24, which is the latest kernel in git for lucid, and presumably the next kernel that will go into lucid-updates.

The panic backtrace also seems to implicate wireless (iwpriv specifically):

Aug 7 20:53:42 classmate kernel: [ 254.849157] r8191se_wx_get_firm_version(): Just Support 92SE tmp
Aug 7 20:53:42 classmate kernel: [ 254.849413] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: c04c6410
Aug 7 20:53:42 classmate kernel: [ 254.849420]
Aug 7 20:53:42 classmate kernel: [ 254.849810] Pid: 2289, comm: iwpriv Tainted: G A 2.6.32.59+drm33.24-cw-custom-dsdt-120731-1 #1
Aug 7 20:53:42 classmate kernel: [ 254.850120] Call Trace:
Aug 7 20:53:42 classmate kernel: [ 254.855314] [<c05873d6>] ? printk+0x1d/0x1f
Aug 7 20:53:42 classmate kernel: [ 254.861481] [<c0587311>] panic+0x47/0xef
Aug 7 20:53:42 classmate kernel: [ 254.866459] [<c014a3ce>] __stack_chk_fail+0x1e/0x20
Aug 7 20:53:42 classmate kernel: [ 254.871734] [<c04c6410>] ? dev_ioctl+0x510/0x520
Aug 7 20:53:42 classmate kernel: [ 254.876686] [<c04c6410>] dev_ioctl+0x510/0x520
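Each line of the call trace above has the form `[<address>] symbol+offset/size`. A `?` before the symbol means the unwinder found that address on the stack but could not confirm it belongs to the active call chain. Below is a minimal Python sketch for splitting such lines into their parts. It assumes the 2.6.32-era x86 format shown above, and `TraceEntry` and `parse_trace_line` are hypothetical names:

```python
import re
from dataclasses import dataclass
from typing import Optional

# "[<c0587311>] panic+0x47/0xef", optionally "? " before the symbol,
# with any syslog/timestamp prefix ignored.
_TRACE_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


@dataclass
class TraceEntry:
    address: int    # address printed in [<...>]
    reliable: bool  # False when the kernel printed "?" before the symbol
    function: str
    offset: int     # distance of address from the start of function
    size: int       # size of function in bytes

    @property
    def function_start(self) -> int:
        return self.address - self.offset


def parse_trace_line(line: str) -> Optional[TraceEntry]:
    """Parse one call-trace line, or return None if it is not one."""
    m = _TRACE_RE.search(line)
    if not m:
        return None
    addr, unsure, func, off, size = m.groups()
    return TraceEntry(int(addr, 16), unsure is None, func, int(off, 16), int(size, 16))
```

Applied to the last trace line above, it reports that `dev_ioctl` starts at `0xc04c5f00`. The address `0xc04c6410` sits at offset `0x510` within that function, and it is the same address the panic message names.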

Renaming iwpriv out of the way also seems to avoid the crash.

Running "iwpriv wlan0 adhoc_peer_list" also seems to provoke the crash, without running powertop. So does "iwpriv -a", which makes this appear to be a duplicate of #585938, as suggested on that bug. I'm currently building a new kernel with the patch linked from that bug (http://people.canonical.com/~apw/lp585938-lucid/).

Revision history for this message
penalvch (penalvch) wrote :

Matt Price, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily kernel folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12-rc2

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
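The tag names above follow a fixed pattern, so the pair to add can be derived from the tested version and the result. Below is a minimal Python sketch. It assumes the version is written as in the example (`v3.12-rc2`), and `upstream_tags` is a hypothetical helper. The tags still have to be edited by hand in Launchpad, and `needs-upstream-testing` should be removed in both cases.

```python
from typing import List


def upstream_tags(version: str, fixed: bool) -> List[str]:
    """Return the two tags to add after testing mainline kernel `version`."""
    prefix = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    return [prefix, f"{prefix}-{version}"]
```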

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired