systematic freezes on any kernel version post 2.6.35-22

Bug #719446 reported by Alan Campbell
This bug affects 3 people
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Invalid       Undecided   Unassigned
Maverick         Fix Released  Undecided   Tim Gardner

Bug Description

On any version of kernel after 2.6.35-22, get freezes

  -- almost always after some tens of seconds
      when copying megabytes of data to hard disk using nautilus,
      whether source is same HD or flash memory

  -- sometimes when allowing an update involving numerous megabytes of
      download (I now seldom allow an update when booted into post-2.6.35-22;
      cleaning up mess after aborted install not worth the hassle)

  -- when launching VirtualBox (again, I avoid trying to run a VM when in post-2.6.35-22 kernel for
      fear of messing up VM file)

  -- occasionally when launching Firefox

I've ruled out an X freeze: once the system has frozen I cannot reach it via SSH over the LAN using PuTTY.

memtest86+ run from boot runs for hours without problems

Tried booting with options

   acpi=off noacpi

fails to boot

Tried booting with options

   nomodeset xdriver=radeon

boots okay but same problem occurs

Tried dropping the page cache (writing a non-zero value to /proc/sys/vm/drop_caches); makes no difference

I now run on 2.6.35-22 all the time (running on latest version of kernel installed: 2.6.35-25-generic
for purposes of this bug report).

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-25-generic 2.6.35-25.44
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.35-25.44-generic 2.6.35.10
Uname: Linux 2.6.35-25-generic i686
NonfreeKernelModules: hsfengine
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: alan 1943 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xc0000000 irq 16'
   Mixer name : 'Realtek ALC883'
   Components : 'HDA:10ec0883,1025140d,00100002 HDA:14f12bfa,1025009f,00090000'
   Controls : 23
   Simple ctrls : 13
Date: Tue Feb 15 16:52:42 2011
HibernationDevice: RESUME=UUID=a9fac5c7-6482-426b-a090-4e9344595cdd
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release i386 (20101007)
Lsusb:
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 002: ID 0402:5602 ALi Corp. M5602 Video Camera Controller
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Acer Aspire 5100
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-25-generic root=UUID=9374e917-5dad-4f46-8e95-38c38802755c ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_GB.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.38.3
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
dmi.bios.date: 11/27/2006
dmi.bios.vendor: Acer
dmi.bios.version: V2.60
dmi.board.name: Navarro
dmi.board.vendor: Acer
dmi.board.version: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAcer:bvrV2.60:bd11/27/2006:svnAcer:pnAspire5100:pvrV2.60:rvnAcer:rnNavarro:rvrN/A:cvnAcer:ct10:cvrN/A:
dmi.product.name: Aspire 5100
dmi.product.version: V2.60
dmi.sys.vendor: Acer

Alan Campbell (entropyreduction) wrote :
Dave Gilbert (ubuntu-treblig) wrote :

Hi Alan,
  Have you tried doing a copy from a text virtual console? (Switch to it with ctrl-alt-f1; you can switch back using ctrl-alt-f7 or f8.)
It's possible you might get a spew of error messages as it hangs on the console.

Dave

Changed in linux (Ubuntu):
status: New → Incomplete
Alan Campbell (entropyreduction) wrote :

Neat. I'm a linux newbie, never occurred to me to try this.

using 2.6.35.24 (??.. latest kernel)

in console: copied about 700MB from /dev/sda3 to same partition: works ok

same copy via nautilus seized up at about 500 MB

In console after reboot: froze, then every minute spat out:

 at (e.g.) 284.544011
 BUG: soft lockup CPU#0 stuck for 61s [kswapd0:26]
 process: kswapd0
 pid: 26
 ti: f71fa000
 task: f7133f70
 task.ti: f71fa000
 stack: (nada)
 calltrace: (nada)
 code: different each time, but can transcribe if you wish

(linux newbie question: I take it within console there's no way to copy or capture what comes up on screen, except pen and paper?)

Can run same test on kernel 2.6.35-23 if any use.

For what it's worth: I asked about this problem a couple of times on

ubuntuforums.org: beginners

http://ubuntuforums.org/showthread.php?p=10251738#post10251738

There was some feeling it might have to do with bad memory; but that left problem of why
bad memory didn't cause a freeze in 2.6.35-22 but did in later versions of kernel.

Dave Gilbert (ubuntu-treblig) wrote :

Hi Alan,
  Thanks for that. Those errors shouldn't happen - so it's pretty likely a kernel bug.
There are a few ways to get those messages;
  1) If you're lucky they'll have been recorded in /var/log/kern.log or /var/log/kern.log.1 or /var/log/debug
  2) A digital camera picture of the message is OK if possible.
  3) If you're logged into a virtual console, say the one you get with ctrl-alt-f1 if you do cat /dev/vcs1 > myfile it'll record the text contents of the screen into myfile. You can also do the same to the other console; e.g. if you're in ctrl-alt-f1 then you can cat /dev/vcs2 > myfile to get the one from ctrl-alt-f2

Dave
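
A minimal sketch of options 1 and 3 above. The function names and the 80-column default are assumptions for illustration, not part of any Ubuntu tool. /dev/vcsN holds the screen as raw character cells with no newlines, so fold is used to restore the rows; reading it normally needs root.

```shell
# Search a kernel log for soft-lockup reports (option 1 above).
# $1: log file, e.g. /var/log/kern.log
find_lockups() {
    grep -n 'soft lockup' "$1"
}

# Dump the text of a virtual console to a file (option 3 above).
# $1: console device, e.g. /dev/vcs1
# $2: output file
# $3: screen width in columns (assumed to be 80 if omitted)
capture_console() {
    fold -w "${3:-80}" "$1" > "$2"
}
```

Typical use, as root, would be find_lockups /var/log/kern.log after a reboot, or capture_console /dev/vcs1 myfile from a second console while the first one still shows the trace.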

Changed in linux (Ubuntu):
status: Incomplete → New
Alan Campbell (entropyreduction) wrote :

Ok, ta. Let me know if I can gather more data and/or test anything.....

Dave Gilbert (ubuntu-treblig) wrote :

Well, if you could get the exact text of that error it might help; but probably the best thing is if I give you the standard response asking you to try the new kernels being developed - if you can give it a go it would be good to know if they fix it or if they give any different/better diagnostics

------------------------

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 719446

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

------------------

Alan Campbell (entropyreduction) wrote :

> Well, if you could get the exact text of that error it might help;

I think the text I sent was it, barring the code; but I'll try to get that too using one of methods you suggested.

Log files not available, so I'll try the other methods you suggest next week.

......<snip>.....

> Please be sure to confirm this issue exists with the latest
> development release of Ubuntu. ISO CD images are available from
> http://cdimage.ubuntu.com/daily/current/ .

Can I pick your brain on this? (But tell me to go away and google if that's best for you).

As said in previous posts I'm pretty new to ubuntu/linux. I'm trying to beat laptop into something that can do most of the things I do on my windows machines, which means running win versions in VirtualBox VMs for the foreseeable future. In my experience VirtualBox is very sensitive to which kernel version (and presumably release) of ubuntu I'm using, and I've already got a slew of other software installed on my current version that may or may not work with latest release.

So is best approach to install development release and kernel in a new partition, and multibooting? If I install development release I assume it will notice existing grub stuff and just add itself to existing grub menu?

......<snip>......

> Also, if you could test the latest upstream kernel available that
> would be great. It will allow additional upstream developers to
> examine the issue. Refer to
> https://wiki.ubuntu.com/KernelMainlineBuilds .

I guess I want current version, pointed to as

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/ ?

I assume I apply that to my existing Maverick install, rather than (or as well as?) to a development release install?

Thanks for any help.

Dave Gilbert (ubuntu-treblig) wrote :

Hi Alan,
  Installing development versions in another partition is a good bet - however being a development version any type of hideous error might occur and destroy the other installation - that's what you get with development.

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/ would probably be the right thing to try for the upstream; if you install that you should still be able to boot back to the old one.

Dave

Alan Campbell (entropyreduction) wrote :

Not very comfortable with trying latest development release, if there's a reasonable chance it will screw up my existing installation: I don't much fancy redoing everything I've done up till now. I don't yet have partition backup/mirroring set up, so I've got no way to easily undo any serious damage.

Latest upstream kernel sounds a bit safer? Any use to you if I install it on my current Maverick installation (as opposed to latest dev release) and test?

Alan Campbell (entropyreduction) wrote :

I installed
 2.6.38-999-generic

from

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

and got ambiguous results.

I got a freeze when copying sets of files of 600M+ in Nautilus,
but only after a half dozen tries

I wasn't able to get a freeze copying similar amounts using cp -r in a console
reached using <ctrl><alt><f1>, despite many tries.

Can't run cat /dev/vcs1 > myfile in a terminal during a soft lockup in 2.6.35.24; I'll get hold of a digital camera next week and record error messages that way.

Alan Campbell (entropyreduction) wrote :

Only way I could get a half-decent picture of <ctrl><alt>f1 terminal screen was in two halves. This is the left half: 2.6.35.25 locked up after cp -r had been running for ten seconds or so.

Alan Campbell (entropyreduction) wrote :

Here's the right side of <ctrl><alt>f1 terminal screen: 2.6.35.25 locked up after cp -r had been running for ten seconds or so.
Left side attached to previous comment

Alan Campbell (entropyreduction) wrote :

Newbie question: assuming 2.6.38-999-generic may contain fixes that eliminate most of the freezes, when does it become a distributed release? Will it become 2.6.35.26? Or is it not that simple?

Pawel Jasnos (pjasnos) wrote :

I'm also having this problem and willing to do a git bisect to find the offending patch. Will get back with any results later.

Pawel Jasnos (pjasnos) wrote :

(N.B. it happens for me with 2.6.35.27 as well, so I'm bisecting between 2.6.35-22 and 2.6.35-25.)

Alan Campbell (entropyreduction) wrote :

Know nada about git bisect (except what I just read).

I have problems on every kernel after 2.6.35-22, so would it be less work to do a git bisect from 2.6.35-22 to 2.6.35-23?

Alan Campbell (entropyreduction) wrote :

I meant, of course, if you did too....

Pawel Jasnos (pjasnos) wrote :

Ok, so this commit is related to this bug (after bisection):

58e15f5029c75b45adbcb25cd76a36d26a6c4297 is the first bad commit
commit 58e15f5029c75b45adbcb25cd76a36d26a6c4297
Author: Christoph Lameter <email address hidden>
Date: Thu Sep 9 16:38:17 2010 -0700

    mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake

    commit aa45484031ddee09b06350ab8528bfe5b2c76d1c upstream.

    Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
    cheaper than scanning a number of lists. To avoid synchronization
    overhead, counter deltas are maintained on a per-cpu basis and drained
    both periodically and when the delta is above a threshold. On large CPU
    systems, the difference between the estimated and real value of
    NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than
    number of real free page in buddy, the VM can allocate pages below min
    watermark, at worst reducing the real number of pages to zero. Even if
    the OOM killer kills some victim for freeing memory, it may not free
    memory if the exit path requires a new page resulting in livelock.

    This patch introduces a zone_page_state_snapshot() function (courtesy of
    Christoph) that takes a slightly more accurate view of an arbitrary vmstat
    counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid
    the watermark being accidentally broken. The estimate is not perfect and
    may result in cache line bounces but is expected to be lighter than the
    IPI calls necessary to continually drain the per-cpu counters while kswapd
    is awake.

    Signed-off-by: Christoph Lameter <email address hidden>
    Signed-off-by: Mel Gorman <email address hidden>
    Signed-off-by: Andrew Morton <email address hidden>
    Signed-off-by: Linus Torvalds <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>

    (cherry picked from commit 87f1cbdee91c60af6dd255226e792a6410d77fbb 2.6.35.6)

    Signed-off-by: Leann Ogasawara <email address hidden>

:040000 040000 56b962edbe8815533041d28a1b4c744bf810c302 b93faeb21bff790a5e90f1370509a4903bb4fd9a M include
:040000 040000 a3d6ce0e31b4552a22ef291e0a66441b78744ac2 4664c82b2f1aa2452751144455f9f133cbf71270 M mm

It *MIGHT* be also connected to the previous commit (7dd373d47c6e0dc124924348265be61990ba0fb6) as I observed some short (0.5 second) freezes while testing it, but no errors.

I'll see if I can revert it in HEAD and whether it helps - will post soon.

Pawel Jasnos (pjasnos) wrote :

Also, it seems that the main factor in reproducing this bug is a single-core processor running an SMP kernel (which is stock in Ubuntu) - any chance someone more knowledgeable in kernel development can have a look at this?

N.B. Alan, yes, I could have started with 2.6.35-23 - this would have saved me ~1 compilation and reboot, probably.
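
To put a rough number on that saving: git bisect halves the candidate range at each step, so it needs about log2(N) build-and-test rounds for N candidate commits. The sketch below uses hypothetical commit counts (the real number of commits between these kernel versions is not given in this thread), and bisect_steps is an illustrative helper, not a git command.

```shell
# Approximate number of build-and-test rounds git bisect needs
# for a range of N candidate commits: ceil(log2(N)).
bisect_steps() {
    n=$1
    steps=0
    span=1
    # Double the covered span until it reaches N; each doubling
    # corresponds to one more bisection round.
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Hypothetical commit counts, for illustration only.
bisect_steps 200   # a range spanning several kernel uploads
bisect_steps 100   # the same range cut in half
```

Halving the starting range removes only one round, which matches the "~1 compilation and reboot" estimate.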

Alan Campbell (entropyreduction) wrote :

I'm impressed. Sounds a time-consuming business.

Dave Gilbert (ubuntu-treblig) wrote :

Hi Pawel,
  Could you file a separate bug for your problem unless Alan can verify that the same works for him - it's always difficult to know if two people's random crashes have the same underlying cause.

Dave

Pawel Jasnos (pjasnos) wrote :

Dave - if Alan verifies this with the HEAD of maverick kernel repository with this patch reverted, would that be a good-enough proof, or should he verify it with the HEAD of mainline kernel with this patch reverted?

Dave Gilbert (ubuntu-treblig) wrote :

Hi Pawel,
  Well I'm not on the kernel team so I'm not going to say for sure; I just know the normal bug policy for the kernel is to keep them separate. However, if you have a single patch that fixes it relative to the same starting point for both of you that does sound pretty conclusive to me.

Dave

Pawel Jasnos (pjasnos) wrote :

Alan - Are you able and willing to build your own kernel? Risks are more or less the same as for installing a development kernel, but it takes a bit of time to compile. If you don't want to build a kernel, but are willing to trust me enough to test one I built, I'll post a generic build here later.

If you can't reproduce the bug with the kernel made with these steps, it means we suffer from the same thing.

If you are willing to build it yourself, please follow the instructions here for installing the tools (steps are the same as for Lucid): https://help.ubuntu.com/community/Kernel/Compile#Tools you'll need

Then, in the terminal, make a directory and download the Maverick repository from git (warning: it's a few hundred megabytes), e.g.
mkdir kernel-custom
cd kernel-custom
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-maverick.git
cd ubuntu-maverick
git checkout -b bug-testing master
git revert 58e15f5029c75b45adbcb25cd76a36d26a6c4297
<<press ctrl+x>>
fakeroot debian/rules clean
AUTOBUILD=1 NOEXTRAS=1 fakeroot debian/rules binary-generic

This should build two .deb packages for you, linux-image (which you need to install) and linux-headers (which isn't necessary for this testing), with version number 2.6.35-28.49, and place them in your kernel-custom directory. After installation, you need to reboot and select the new kernel. You can uninstall it in the standard way (synaptic, dpkg - choose your favourite) after you've finished testing.

Alan Campbell (entropyreduction) wrote :

Yeah, I can do that. Not instantly: sometime next week probably, maybe about Wednesday or Thursday.
Maybe longer, in that to be completely sure I "can't reproduce the bug with the kernel" can take time. But happy to have a go.

Alan Campbell (entropyreduction) wrote :

Before I go any further:

sudo apt-get install fakeroot build-essential crash kexec-tools makedumpfile kernel-wedge

generated following warnings

update-rc.d: warning: kdump start runlevel arguments (2) do not match LSB Default-Start values (0 1 2 3 4 5)

update-rc.d: warning: kdump stop runlevel arguments (none) do not match LSB Default-Stop values (6)

sudo apt-get build-dep linux

generated following :

E: Could not open file /var/lib/apt/lists/download.virtualbox.org_virtualbox_debian_dists_maverick_non-free_source_Sources - open (2: No such file or directory

sudo apt-get install git-core libncurses5 libncurses5-dev libelf-dev asciidoc binutils-dev
completed with no error messages

Okay to proceed?

Pawel Jasnos (pjasnos) wrote :

Looks like the effects of apt being interrupted while it was doing something before. It shouldn't matter for kernel compilation, but you may want to uninstall and reinstall VirtualBox at some point (which should fix the error). You do need the build-dep linux though, so if the error occurred before the installation finished, then there's a (very small) chance that you can get some really weird errors when you attempt to compile the kernel if you don't fix it.

The first two warnings from update-rc.d don't really mean anything bad, it apparently is a configuration bug in Ubuntu ( https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/569980 ) - kdump is used to create kernel crashdumps and all this is saying is that they will only be created during 'normal operation' (runlevel 2) rather than also during startup, shutdown, etc (which have different runlevels).

Pawel Jasnos (pjasnos) wrote :

Oh, and VirtualBox won't be enabled in kernel that you compile - you would need to create and install kernel headers for your git source for that; you can do this by running:

fakeroot debian/rules binary-indep

after building the kernel, which would give you linux-headers(...) package which you need to install so DKMS is able to auto-compile VirtualBox kernel module.

Alan Campbell (entropyreduction) wrote :

I'll just do without VirtualBox when testing custom kernel. I really don't want to mess around with VirtualBox; I've had to rebuild my VMs several times already, and if a VirtualBox uninstall/install goes wrong I'd be doing it again.

Can't tell whether build-dep linux finished.

==================================

git revert 58e15f5029c75b45adbcb25cd76a36d26a6c4297

produced:

Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.

I assume incorrect email address of no significance?

There didn't seem to be any call for <ctrl X>; did I miss something?

during
fakeroot debian/rules clean
get

cd /home/alan/Data/Downloads/Ubuntu/ToInstall/kernel-custom/ubuntu-maverick/debian/build && kernel-wedge gen-control > /home/alan/Data/Downloads/Ubuntu/ToInstall/kernel-custom/ubuntu-maverick/debian/control
Use of uninitialized value $builddep in split at /usr/share/kernel-wedge/commands/gen-control line 32, <KVERS> line 9.
Use of uninitialized value $builddep in split at /usr/share/kernel-wedge/commands/gen-control line 32, <KVERS> line 10.

during
AUTOBUILD=1 NOEXTRAS=1 fakeroot debian/rules binary-generic
get

  LD arch/x86/built-in.o

WARNING: arch/x86/built-in.o(.data+0x2eb8): Section mismatch in reference from the variable powernow_driver to the function .init.text:powernow_cpu_init()
The variable powernow_driver references
the function __init powernow_cpu_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: arch/x86/built-in.o(.data+0x2f78): Section mismatch in reference from the variable longrun_driver to the function .init.text:longrun_cpu_init()
The variable longrun_driver references
the function __init longrun_cpu_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

and more like it.

=====================

  LD arch/x86/kernel/built-in.o

WARNING: arch/x86/kernel/built-in.o(.data+0x2eb8): Section mismatch in reference from the variable powernow_driver to the function .init.text:powernow_cpu_init()
The variable powernow_driver references
the function __init powernow_cpu_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: arch/x86/kernel/built-in.o(.data+0x2f18): Section mismatch in reference from the variable longhaul_driver to the function .init.text:longhaul_cpu_init()
The variable longhaul_driver references
the function __init longhaul_cpu_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: arch/x86/kernel/built-in.o(.data+0x2f78): Section mismatch in reference from the variable longrun_driver to the function .init.text:longrun_cpu_init()
The variable longr...


Pawel Jasnos (pjasnos) wrote :

Kernel compilation produces a lot of warnings and it's ok, as long as it gives you a 'final product' in the form of .deb packages. Errors (as opposed to warnings) will stop the compilation and not produce a .deb package.

To remove packages installed via build dep, you can use this helpful tip:
http://www.webupd8.org/2010/10/undo-apt-get-build-dep-remove-build.html

Alan Campbell (entropyreduction) wrote :

I installed

linux-image-2.6.35-28-generic_2.6.35-28.49_i386.deb

by double clicking; it opened in Ubuntu Software Centre.

Seemed to install successfully.

Then double clicked on

linux-headers-2.6.35-28-generic_2.6.35-28.49_i386.deb

and got "Dependency is not satisfiable: linux-headers-2.6.35-28"

Tried synaptic package manager, navigated to local folder where deb files lived;

could not open them; they were greyed out.

Set permissions on deb files so I could execute them; same.

Stumped.

How difficult would me doing my own git bisect be? Though if I don't have enough knowledge to get this update installed, maybe a git bisect is a bit ambitious.

Pawel Jasnos (pjasnos) wrote :

You don't need to install linux-headers-2.6.35-28-generic_2.6.35-28.49_i386.deb (it's only needed if you want programs that depend on DKMS, like VirtualBox, to work with that particular kernel).

If you DO want to install the headers, start a terminal and go to the folder where your sources sit (if you followed my tutorial exactly):

cd kernel-custom
cd ubuntu-maverick
fakeroot debian/rules binary-indep

which would generate, among other files, linux-headers-2.6.35-28 .

Pawel Jasnos (pjasnos) wrote :

linux-image is your kernel, which should now appear when you start your computer.

Pawel Jasnos (pjasnos) wrote :

(sorry for flooding your mailboxes)
Here's a tutorial how to bisect the kernel:
https://wiki.ubuntu.com/Kernel/KernelBisection
should you really want to do it. It is a bit time-consuming.

Alan Campbell (entropyreduction) wrote :

Ok, I'll try kernel next week. Thanks for all feedback.

Alan Campbell (entropyreduction) wrote :

Hi Pawel,

Been running your custom 2.6.35-28 for a few days, no freezes. So in my opinion the patch you bisected out is probably my culprit as well as yours.

Headers compiled okay as well. VirtualBox won't install in custom 2.6.35-28, probably to do with that compile warning when I built kernel. I'll risk reinstalling VB, or even installing new 4.0 version.

Many thanks for (apparently) isolating bug.

What happens now...presumably developers try to isolate problem in patch and reapply to later official kernel version?

Alan Campbell (entropyreduction) wrote :

Sorry, I'm still in a mess with kernel headers.

In custom kernel the usual invocation to bring virtualBox up to speed:

  /etc/init.d/vboxdrv setup

gets

Error! Your kernel headers for kernel 2.6.35-28-generic cannot be found at
/lib/modules/2.6.35-28-generic/build or /lib/modules/2.6.35-28-generic/source.

despite

fakeroot debian/rules binary-indep

producing

linux-headers-2.6.35-28_2.6.35-28.49_all.deb

and that apparently installing correctly.

And indeed there are neither build or source subfolders under /lib/modules/2.6.35-28-generic/, though there's a build folder under most other /lib/modules/2.6.35-xx-generic/ folders.

Any thing else I can do? I notice fakeroot call also produced a small linux-source-2.6.35_2.6.35-28.49_all.deb. Should I run that maybe?

Or just wait for an official release that corrects the errant patch you found?

Pawel Jasnos (pjasnos) wrote :

Alan: you need to install linux-headers-2.6.35-28-generic_2.6.35-28.49_i386.deb after installing linux-headers-2.6.35-28_2.6.35-28.49_all.deb to get Virtual Box driver to compile.
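
A small sketch of the two checks involved here, assuming the layout quoted in the error message above (a build or source entry under /lib/modules/<release>/). Both function names are hypothetical helpers for illustration, not part of dpkg or DKMS.

```shell
# Report whether kernel headers are reachable for a kernel release.
# $1: kernel release, e.g. the output of `uname -r`
# $2: modules root (assumed to be /lib/modules if omitted)
headers_present() {
    root="${2:-/lib/modules}"
    [ -e "$root/$1/build" ] || [ -e "$root/$1/source" ]
}

# Print header packages in install order: the architecture-independent
# *_all.deb first, then the flavour-specific packages that depend on it.
# File names are taken from the arguments.
order_header_debs() {
    for f in "$@"; do
        case "$f" in *_all.deb) echo "$f" ;; esac
    done
    for f in "$@"; do
        case "$f" in *_all.deb) ;; *) echo "$f" ;; esac
    done
}
```

For example, order_header_debs linux-headers-*.deb lists the _all package first, and each file can then be passed to sudo dpkg -i in that order. Afterwards headers_present "$(uname -r)" should succeed when booted into the custom kernel.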

Changed in linux (Ubuntu):
status: New → Confirmed
Alan Campbell (entropyreduction) wrote :

Thanks. Worked a treat. I had tried the other way around (linux-headers-2.6.35-28-generic_2.6.35-28.49_i386.deb first). All set, VirtualBox fine. Many thanks

Pawel Jasnos (pjasnos) wrote :

Dave,

Any chance the importance of this bug can be increased?
Looking at the code of the offending commit, it may affect anyone running the stock kernel on a single-core processor (the code is ifdeffed-out for non-SMP build configurations).
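
A rough way to check whether a machine matches the configuration suspected here (one CPU, SMP kernel build). This only tests the hypothesis from this thread, not the bug itself; single_cpu_on_smp is a hypothetical helper, and it assumes /proc/cpuinfo lists one "processor" line per logical CPU and that SMP builds carry "SMP" in the uname -v string, as the stock Ubuntu kernels do.

```shell
# Succeeds when exactly one CPU is listed and the kernel version
# string says it is an SMP build.
# $1: cpuinfo file (assumed to be /proc/cpuinfo if omitted)
# $2: kernel version string (assumed to be `uname -v` output if omitted)
single_cpu_on_smp() {
    cpus=$(grep -c '^processor' "${1:-/proc/cpuinfo}")
    version="${2:-$(uname -v)}"
    case "$version" in
        *SMP*) [ "$cpus" -eq 1 ] ;;
        *) return 1 ;;
    esac
}
```

Run without arguments, single_cpu_on_smp && echo "matches suspected configuration" reports on the running system.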

C de-Avillez (hggdh2) wrote :

marking Triaged/High -- it seems Pawel and Alan reduced the issue to a single commit. This may affect all single-core systems.

Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
YannUbuntu (yannubuntu) wrote :

Dear all,
I have a very similar bug, but I am not sure whether it is a duplicate: see bug #706532
Please let me know what I can provide to help.

Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Pawel Jasnos (pjasnos) wrote :

May I just add - as mentioned in one of my posts above, the commit just preceding the one I bisected this bug to may be related, as on the system where this bug occurs there are some very short (quarter to half a second?) freezes in cases when this bug occurs, instead of a hard freeze, which disappear when that commit is removed. Both of those commits modify page allocator (I can find that commit's sha-1 if needed, but it directly precedes this one). I can also report another bug for it if needed, but it may be wiser to simply have a look at both of the commits.
I don't have any experience in kernel development so I may of course be mistaken as to the role of preceding commit.

Also, my way of reproducing this bug, which seemed to work every time:
1. On an affected system, run Open Office, GIMP, Inkscape or any other memory-hogging application.
2. Use scp to copy a large file over the local network (I used a 2 GB DVD image).

Alan Campbell (entropyreduction) wrote :

YannUbuntu : I suppose you could go through same drill, compile your own custom kernel sans the patch, following Pawel's instructions in comment #24. Or you could wait till there's an official kernel build that addresses the problem and see if that makes your bug go away.

Alan Campbell (entropyreduction) wrote :

More fairly stupid newbie questions:

The custom kernel I made following Pawel's instructions works flawlessly, so problem clearly solved.

It installed itself as 2.6.35-28.

Now there's an update waiting to come in with the same version number. Too soon to contain a fix for bug Pawel identified, and bug 719446 not referred to in changelog, so assume I don't want it.

So I locked "linux" in package manager. But sooner or later a kernel version will show up with the bug fixed.

How do I know that's happened? Just follow this page?
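
One way to automate the changelog check described above, as a sketch: Ubuntu changelogs reference Launchpad bugs as "LP: #<number>", so a saved changelog can be searched for this bug's number. changelog_mentions_bug is a hypothetical helper that reads the changelog text on standard input.

```shell
# Succeeds if the changelog text on stdin references the given
# Launchpad bug as "LP: #<number>".
# $1: bug number, e.g. 719446
changelog_mentions_bug() {
    # The trailing group stops 719446 from matching inside 7194460.
    grep -Eq "LP: #$1([^0-9]|\$)"
}
```

For example, with the changelog of a candidate update saved to a file, changelog_mentions_bug 719446 < changelog.txt && echo "fix listed" would print only when the bug is referenced.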

Pawel Jasnos (pjasnos) wrote :

Alan - Yes, this page (and emails you should receive if there are any updates!) should be the best way of knowing when a fix is officially committed - packages containing it should appear fairly soon afterwards. If there's any critical patch added to the kernel, you would need to rebuild it yourself to get it (remembering to revert the commit) until a patch for this bug is officially incorporated into Ubuntu kernel.

Alan Campbell (entropyreduction) wrote :

Ok, many thanks.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Maverick):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
importance: High → Undecided
status: Triaged → Invalid
Tim Gardner (timg-tpi) wrote :

Pawel - your bisect results appear to be consistent with the Ubuntu version of the kernel. There were 3 mm patches added in 2.6.35-23.36, which is when Alan began to see this issue. I presume that the machine you are using is also a single CPU?

Tim Gardner (timg-tpi) wrote :

The first step is to figure out if this bug still exists upstream. In order to do that please install the current Natty kernel from https://launchpad.net/~kernel-ppa/+archive/pre-proposed?field.series_filter=natty . I suggest just copying the deb and installing directly ('cause it's easier to remove afterwards):

wget https://launchpad.net/~kernel-ppa/+archive/pre-proposed/+files/linux-image-2.6.38-8-generic_2.6.38-8.40~pre201103300902_amd64.deb
sudo dpkg -i linux-image-2.6.38-8-generic_2.6.38-8.40~pre201103300902_amd64.deb

reboot, do your testing, then:

sudo dpkg -r linux-image-2.6.38-8-generic
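
Note that dpkg -i takes a .deb file name while dpkg -r takes a package name. A minimal sketch of deriving one from the other, assuming the usual name_version_arch.deb file naming (the helper name `pkg_from_deb` is made up for this example):

```shell
# Hypothetical helper: print the package name encoded in a .deb file name.
pkg_from_deb() {
    file=${1##*/}        # drop any leading directory part
    echo "${file%%_*}"   # keep everything before the first underscore
}

# The authoritative answer comes from the archive itself:
#   dpkg-deb -f some-file.deb Package
```

The file name is only a convention, so dpkg-deb is the safer check if the file has been renamed.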

Revision history for this message
Alan Campbell (entropyreduction) wrote :

dpkg reports
dpkg: error processing linux-image-2.6.38-8-generic_2.6.38-8.40~pre201103300902_amd64.deb (--install):
package architecture (amd64) does not match system (i386)

Despite my having an AMD Turion 64 CPU?

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Alan - I should have looked at your boot message. You've got a 32 bit installation, so use https://launchpad.net/~kernel-ppa/+archive/pre-proposed/+files/linux-image-2.6.38-8-generic_2.6.38-8.40~pre201103300902_i386.deb instead.
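
The .deb has to match the architecture of the installed system, not what the CPU is capable of. A minimal sketch of picking the file name from the dpkg architecture (the helper name `deb_for_arch` is made up for this example; the file name pattern is the one used in this thread):

```shell
# Hypothetical helper: build the test kernel file name for a given
# dpkg architecture string such as i386 or amd64.
deb_for_arch() {
    echo "linux-image-2.6.38-8-generic_2.6.38-8.40~pre201103300902_$1.deb"
}

# On a real system the architecture would come from dpkg:
#   deb_for_arch "$(dpkg --print-architecture)"
# A 64-bit capable CPU shows the "lm" flag in /proc/cpuinfo, but a
# 32-bit installation on it still reports i386 here.
```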

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Pawel - can you attach a proper stack trace when the problem occurs? Or at least a better photo?

Revision history for this message
Alan Campbell (entropyreduction) wrote :

Ok. Won't be until next week. And I've just thrown more RAM at the laptop, so I will have to drop back to earlier kernels and make sure the problem is still there.

Revision history for this message
Pawel Jasnos (pjasnos) wrote :

Tim - Ok, I'll do that soon.

Alan - the code in the patch that I bisected this to is run when RAM is getting full, so to make sure it still occurs with more RAM added you'd need to fill it up (GIMP with a large canvas works for me). My machine is also single CPU, but a Celeron - I'll post the exact info with the stack trace.

Revision history for this message
Pawel Jasnos (pjasnos) wrote :

Some short but intensive testing seems to indicate that, for me, Natty kernel doesn't have this problem, which is good news.

It seems (but I didn't have a chance to test yet) that the fix might be this patch from upstream: http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=commit;h=88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97

I'll test this kernel more and perhaps try to apply this patch to Maverick kernel and see if it solves the problem as I suspect.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Pawel - that is the same commit suggested by upstream to fix this problem. At first glance, however, it looks like kind of an intrusive backport. I'll take a more extensive look at it today.

Revision history for this message
Alan Campbell (entropyreduction) wrote :

Pawel: Interesting. With my 2.6 Gig of effective memory I dropped back to 2.6.35-23, loaded firefox, gimp with 9 images between 10000x5000 and 5000x5000, celestia, stellarium, sciTE. The best I could get was 67% memory utilisation, and I couldn't get a freeze on copying data after many attempts (though the poor old CPU was panting away at 100%).

How big a GIMP canvas did the trick for you?

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Pawel and Alan:

Here is a Lucid test kernel with a backport of upstream commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97. Add my PPA thusly:

echo "deb http://ppa.launchpad.net/timg-tpi/ppa/ubuntu lucid main"|sudo tee /etc/apt/sources.list.d/timg-tpi.list
sudo apt-get update
sudo apt-get -u dist-upgrade

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Pawel and Alan: ignore my previous transmission. That's a backport to Lucid, but you guys are interested in Maverick.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

OK, try this one (when it gets done building):

echo "deb http://ppa.launchpad.net/timg-tpi/ppa/ubuntu maverick main"|sudo tee /etc/apt/sources.list.d/timg-tpi.list
sudo apt-get update
sudo apt-get -u dist-upgrade
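
The Lucid and Maverick instructions differ only in the series name in the sources line. A minimal sketch that builds the line for whichever series is given (the helper name `ppa_line` is made up for this example; the PPA URL is the one above):

```shell
# Hypothetical helper: print the sources.list entry for the test PPA
# for a given Ubuntu series name (lucid, maverick, ...).
ppa_line() {
    echo "deb http://ppa.launchpad.net/timg-tpi/ppa/ubuntu $1 main"
}

# On a real system the series can be read from lsb_release:
#   ppa_line "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/timg-tpi.list
```

This only helps if the PPA actually carries a build for that series, which needs checking first.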

Revision history for this message
Eric Zurcher (eric-zurcher) wrote :

Tim: Your build of 2.6.35-29-generic fixes the problem, at least on my system. Will we be seeing this fix in Natty?

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Eric - this fix is backported from 2.6.38 (which is Natty).

git describe --contains 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97
v2.6.38-rc1~216

Since you've had no prior input into this bug, I presume you were able to first reproduce symptoms similar to those described by Alan and Pawel?
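
For anyone reading the git describe output above: the part before the first ~ or ^ is the earliest tag that contains the commit, and the suffix is the path back from that tag to the commit. A minimal sketch of pulling out the tag (the helper name `tag_of` is made up for this example):

```shell
# Hypothetical helper: strip the ~N / ^N suffix from the output of
# "git describe --contains", leaving only the tag name.
tag_of() {
    echo "${1%%[~^]*}"
}

# In a kernel tree one might run:
#   tag_of "$(git describe --contains 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97)"
```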

Revision history for this message
Eric Zurcher (eric-zurcher) wrote :

Tim - You presume correctly.

I've had problems with this since the introduction of 2.6.35-23, and it's been a real pain in the posterior - I've become better acquainted with Alt-SysRq than I really wanted to be. I'm running on hardware very similar to that of the original bug report.

I'd run my own set of simple tests before finding this thread, and determined that the freezes I was seeing were happening when free memory dropped to near zero, so it's almost certainly the same problem.
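
A minimal sketch of the kind of simple check described here, reporting free memory as a percentage of the total. The helper name `free_pct` is made up for this example, and it is not the test that was actually run; it reads text in /proc/meminfo format from stdin.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and
# print MemFree as a whole-number percentage of MemTotal.
free_pct() {
    awk '/^MemTotal:/ { total = $2 }
         /^MemFree:/  { free = $2 }
         END { if (total > 0) printf "%d\n", free * 100 / total }'
}

# To watch it while filling memory on a real system:
#   while sleep 5; do free_pct < /proc/meminfo; done
```

MemFree does not count page cache that the kernel could reclaim, so a low figure on its own is normal; it is only of interest here because the freezes seemed to coincide with it reaching near zero.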

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Eric - thanks for your response. I'll see about getting this patch into 2.6.35.y stable.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Patch accepted by Andi Kleen, 2.6.35.y stable maintainer.

Revision history for this message
Pawel Jasnos (pjasnos) wrote :

Thanks, it seems to fix it :-).

Tim Gardner (timg-tpi)
tags: removed: needs-upstream-testing
Steve Conklin (sconklin)
Changed in linux (Ubuntu Maverick):
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Accepted linux into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Revision history for this message
Eric Zurcher (eric-zurcher) wrote :

The kernel from maverick-proposed seems to have fixed the problem for me.

Tim Gardner (timg-tpi)
tags: added: verification-done
tags: added: verification-done-maverick
removed: verification-done
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-30.54

---------------
linux (2.6.35-30.54) maverick-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #794114

  [ Upstream Kernel Changes ]

  * Revert "xhci: Fix full speed bInterval encoding."
  * Revert "USB: xhci - also free streams when resetting devices"
  * Revert "USB: xhci - fix math in xhci_get_endpoint_interval()"
  * Revert "USB: xhci - fix unsafe macro definitions"

linux (2.6.35-30.53) maverick-proposed; urgency=low

  [ Upstream Kernel Changes ]

  * xhci: Fix full speed bInterval encoding.
    - LP: #792959

linux (2.6.35-30.52) maverick-proposed; urgency=low

  [ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #790653

  [ Stefan Bader ]

  * Include nls_iso8859-1 for virtual images
    - LP: #732046

  [ Thomas Schlichter ]

  * SAUCE: vesafb: mtrr module parameter is uint, not bool
    - LP: #778043

  [ Tim Gardner ]

  * [Config] Add cachefiles.ko to virtual flavour
    - LP: #770430

  [ Upstream Kernel Changes ]

  * Revert "intel_idle: PCI quirk to prevent Lenovo Ideapad s10-3 boot
    hang"
    - LP: #772560
  * Revert "TPM: Long default timeout fix"
    - LP: #772560
  * Revert "tpm_tis: Use timeouts returned from TPM"
    - LP: #772560
  * Revert "xen: set max_pfn_mapped to the last pfn mapped"
  * CAN: Use inode instead of kernel address for /proc file, CVE-2010-4565
    - LP: #765007
    - CVE-2010-4565
  * xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1,
    CVE-2011-0711
    - LP: #767740
    - CVE-2011-0711
  * Treat writes as new when holes span across page boundaries,
    CVE-2011-0463
    - LP: #770483
    - CVE-2011-0463
  * fs/partitions/ldm.c: fix oops caused by corrupted partition table,
    CVE-2011-1017
    - LP: #771382
    - CVE-2011-1017
  * qla2xxx: Make the FC port capability mutual exclusive.
    - LP: #772560
  * staging: usbip: bugfixes related to kthread conversion
    - LP: #772560
  * staging: usbip: bugfix add number of packets for isochronous frames
    - LP: #772560
  * staging: usbip: bugfix for isochronous packets and optimization
    - LP: #772560
  * staging: hv: Fix GARP not sent after Quick Migration
    - LP: #772560
  * staging: hv: use sync_bitops when interacting with the hypervisor
    - LP: #772560
  * irda: validate peer name and attribute lengths
    - LP: #772560
  * irda: prevent heap corruption on invalid nickname
    - LP: #772560
  * nilfs2: fix data loss in mmap page write for hole blocks
    - LP: #772560
  * ASoC: Explicitly say registerless widgets have no register
    - LP: #772560
  * ALSA: ens1371: fix Creative Ectiva support
    - LP: #772560
  * ROSE: prevent heap corruption with bad facilities
    - LP: #772560
  * Btrfs: Fix uninitialized root flags for subvolumes
    - LP: #772560
  * x86, mtrr, pat: Fix one cpu getting out of sync during resume
    - LP: #772560
  * UBIFS: do not read flash unnecessarily
    - LP: #772560
  * UBIFS: fix oops on error path in read_pnode
    - LP: #772560
  * UBIFS: fix debugging failure in dbg_check_space_info
    - LP: #772560
  * quota: Don't write quota info in dquot_commit()
    - LP: #772560
  * mm: avoid wrapping vm_...

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Fix Released
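
To tell whether an installed Maverick kernel package is at or past the fixed version (2.6.35-30.54), here is a minimal sketch. The helper name `version_ge` is made up for this example, and it assumes a sort that supports -V (GNU coreutils); on a Debian or Ubuntu system, dpkg --compare-versions is the proper tool.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal
# to version $2, using version-aware sorting (sort -V is assumed).
version_ge() {
    lowest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Equivalent check with dpkg itself:
#   dpkg --compare-versions "$installed" ge 2.6.35-30.54 && echo "fix included"
```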